On universally consistent and fully distribution-free rank tests of vector independence
Rate-Optimality of Consistent Distribution-Free Tests of Independence Based on Center-Outward Ranks and Signs
Hongjian Shi∗, Marc Hallin†, Mathias Drton‡, and Fang Han§

Abstract
Rank correlations have found many innovative applications in the last decade. In particular, suitable versions of rank correlations have been used for consistent tests of independence between pairs of random variables. The use of ranks is especially appealing for continuous data as tests become distribution-free. However, the traditional concept of ranks relies on ordering data and is, thus, tied to univariate observations. As a result it has long remained unclear how one may construct distribution-free yet consistent tests of independence between multivariate random vectors. This is the problem we address in this paper, in which we lay out a general framework for designing dependence measures that give tests of multivariate independence that are not only consistent and distribution-free but which we also prove to be statistically efficient. Our framework leverages the recently introduced concept of center-outward ranks and signs, a multivariate generalization of traditional ranks, and adopts a common standard form for dependence measures that encompasses many popular measures from the literature. In a unified study, we derive a general asymptotic representation of center-outward test statistics under independence, extending to the multivariate setting the classical Hájek asymptotic representation results. This representation permits a direct calculation of limiting null distributions for the proposed test statistics. Moreover, it facilitates a local power analysis that provides strong support for the center-outward approach to multivariate ranks by establishing, for the first time, the rate-optimality of center-outward tests within families of Konijn alternatives.
Keywords:
Multivariate ranks and signs, center-outward ranks and signs, multivariate dependence measure, independence test, Hájek representation, Le Cam's third lemma.
Quantifying the dependence between two variables and testing for their independence are among the oldest and most fundamental problems of statistical inference. The (marginal) distributions of the two variables under study, in that context, typically play the role of nuisances, and the need for a nonparametric approach naturally leads, when they are univariate, to distribution-free methods based on their ranks. This paper deals with the multivariate extension of that approach.

∗ Department of Statistics, University of Washington, Seattle, WA 98195, USA; e-mail: [email protected]
† ECARES and Department of Mathematics, Université Libre de Bruxelles, Brussels, Belgium; e-mail: [email protected]
‡ Department of Mathematics, Technical University of Munich, 85748 Garching b. München, Germany; e-mail: [email protected]
§ Department of Statistics, University of Washington, Seattle, WA 98195, USA; e-mail: [email protected]

1 Measuring and testing for dependence

Consider two absolutely continuous random vectors X_1 and X_2, with values in R^{d_1} and R^{d_2}, respectively. The problems of measuring the dependence between X_1 and X_2 and testing their independence when d_1 = d_2 = 1 (call this the univariate case) have a long history that goes back more than a century (Pearson, 1895; Spearman, 1904). The same problem when d_1 and d_2 are possibly unequal and larger than one (the multivariate case) is of equal practical interest but also more challenging. Following early attempts (Wilks, 1935), a large literature has emerged, with renewed interest in recent years.

When the marginal distributions of X_1 and X_2 are unspecified and d_1 = d_2 = 1, rank-based correlations provide a natural and appealing nonparametric approach to testing for independence (the best known examples are due to Spearman (1904) and Kendall (1938); see Chapter III.6 in Hájek and Šidák (1967)). On one hand, they yield distribution-free tests because, under the null hypothesis of independence, their distributions do not depend on the unspecified marginal distributions. On the other hand, they can be designed (Hoeffding, 1948; Blum et al., 1961; Bergsma and Dassios, 2014; Yanagimoto, 1970) to consistently estimate dependence measures that vanish if and only if independence holds, and so detect any type of dependence—something Spearman and Kendall's rank-based correlations cannot.

New subtleties arise, however, when attempting to extend this rank-based approach to the multivariate case. For example, while d_k ranks can be constructed componentwise for each X_k, k = 1, 2, their joint distribution depends on the distribution of the underlying X_k, preventing distribution-freeness of the (d_1 + d_2)-tuple of ranks. As a consequence, the existing tests of multivariate independence based on componentwise ranks (see, e.g., Puri et al. (1970)) are only conditionally distribution-free, which has both computational implications (e.g., through a need for permutation analysis) and statistical implications (as we shall detail soon).

In this paper, we develop a general framework for multivariate analogues of popular rank-based measures of dependence for the univariate case. Our objective is to propose a class of statistics and testing procedures achieving the following list of five desirable properties.
(1) Exact distribution-freeness.
Many statistical tests exploit asymptotic distribution-freeness for computationally efficient distributional approximations yielding pointwise asymptotic control of their size (this is the case, for instance, with Hallin and Paindaveine (2002c,b,a, 2008) due to the estimation of a scatter matrix, or with Taskinen et al. (2003, 2004, 2005)). To be more precise, for any given significance level α ∈ (0, 1), one obtains a sequence of tests φ^(n)_α indexed by the sample size n such that lim_{n→∞} E_P[φ^(n)_α] = α for every distribution P from a class P of null distributions. Generally, however, the size is not asymptotically controlled in a uniform sense, as it should be; that is, one does not have lim_{n→∞} sup_{P∈P} E_P[φ^(n)_α] ≤ α, which may explain poor finite-sample properties (see, e.g., Le Cam and Yang, 2000; Leeb and Pötscher, 2008; Belloni et al., 2014). While uniform inferential validity is impossible to achieve for some problems, e.g., when testing for conditional independence (Shah and Peters, 2020; Azadkia and Chatterjee, 2019), we shall see that it is achievable for testing (unconditional) multivariate independence. Indeed, for strictly distribution-free tests, as developed in this paper, pointwise validity automatically implies uniform validity.
(2) Transformation invariance. A dependence measure µ is said to be invariant under orthogonal transformations, shifts, and global scales if µ(X_1, X_2) = µ(v_1 + a_1 O_1 X_1, v_2 + a_2 O_2 X_2) for any scalars a_k > 0, vectors v_k ∈ R^{d_k}, and orthogonal d_k × d_k matrices O_k, k = 1, 2. This invariance, here simply termed “transformation invariance”, is a natural requirement in cases where the components of X_1, X_2 do not have specific meanings and the observation could, thus, have been recorded in another coordinate system. Such invariance properties are of considerable interest in multivariate statistics (see, e.g., Gieser and Randles, 1997; Taskinen et al., 2003, 2005; Oja et al., 2016).
(3) Consistency. Following terminology from Weihs et al. (2018), a dependence measure µ is called I-consistent within a family P of distributions if, denoting by X_1 and X_2 two random vectors with joint distribution P ∈ P, independence between X_1 and X_2 implies µ(X_1, X_2) = 0; µ is called D-consistent within P if µ(X_1, X_2) = 0 implies that X_1 and X_2 are independent (equivalently, if the non-independence of X_1 and X_2 implies µ(X_1, X_2) ≠ 0). While any reasonable dependence measure should be I-consistent, some of the best-known ones (Pearson's correlation, Spearman's ρ, and Kendall's τ) fail to be D-consistent. If a dependence measure µ is both I- and D-consistent, then the consistency of tests based on an estimator µ^(n) of µ is guaranteed by the (strong or weak) consistency of that estimator. Dependence measures that are both I- and D-consistent (within a large nonparametric family) serve an important purpose as they are able to capture nonlinear dependences and yield consistent tests of independence. Well-known I- and D-consistent dependence measures for the univariate case include Hoeffding's D (Hoeffding, 1948), Blum–Kiefer–Rosenblatt's R (Blum et al., 1961), and Bergsma–Dassios–Yanagimoto's τ* (Bergsma and Dassios, 2014; Yanagimoto, 1970; Drton et al., 2020). Extensions to the multivariate case have been proposed in Gretton et al. (2005), Székely et al. (2007), Heller et al. (2012), Heller et al. (2013), Heller and Heller (2016), Zhu et al. (2017), Weihs et al. (2018), Kim et al. (2019), Deb and Sen (2019), Shi et al. (2020), Berrett et al. (2020), among many others.
(4) Statistical efficiency. Once its size is controlled, the performance of a test may be evaluated by considering its power against local alternatives. For independence tests, the so-called Konijn alternatives (Konijn, 1956) constitute a popular choice for conducting local power analyses (see, among others, Gieser (1993); Gieser and Randles (1997); Taskinen et al. (2003, 2004, 2005); Hallin and Paindaveine (2008)) that we also take up here. Specifically, we will call an independence test rate-optimal against a family of Konijn local alternatives if, within this family, it achieves the detection boundary in the minimax sense.
(5) Computational efficiency.
On top of the aforementioned statistical properties, computing a dependence measure and performing the corresponding test should remain as feasible as possible. We thus give preference to dependence measures and tests with low computational complexity.

The main challenge, with this list of five desirable properties, is the combination of exact distribution-freeness (property (1)) with properties (2)–(5). The solution, as we shall see, involves an adequate multivariate extension of the univariate concepts of ranks and signs.
This paper proposes a class of dependence measures and tests achieving the five properties listed in Section 1.2. Those measures (and the corresponding test statistics) are based on the multivariate notion of center-outward ranks and signs recently introduced in Chernozhukov et al. (2017) and Hallin (2017); see Hallin et al. (2020a) for a complete account. In contrast to earlier related concepts including marginal ranks (Puri and Sen, 1971), spatial ranks (Oja, 2010; Han and Liu, 2018), depth-based ranks (Liu and Singh, 1993; Zuo and He, 2006), and pseudo-Mahalanobis ranks and signs (Hallin and Paindaveine, 2002c), the new concept yields statistics that enjoy exact distribution-freeness (property (1) above) as soon as the underlying probability measure is Lebesgue-absolutely continuous. This allows for a general multivariate strategy, in which the original observations are replaced by functions of their center-outward ranks and signs when forming a dependence measure and the corresponding test statistic. This is also the idea put forward in Shi et al. (2020) and, in a slightly different way, in Deb and Sen (2019), where the focus is on distance covariance between center-outward ranks and signs.

We generalize this approach in two important ways. First, we introduce a class of generalized symmetric covariances (GSCs) along with their center-outward rank-based versions, of which the distance covariance concepts from Deb and Sen (2019) and Shi et al. (2020) are but particular cases. Second, we show how considerable additional flexibility and power result from incorporating score functions in the definition. Our simulations in Section 5.3 exemplify the benefits of this “score-based” approach.

From a theoretical point of view, the present paper also offers a new approach to the asymptotic theory of the proposed rank-based statistics. Indeed, handling the asymptotics of this new class of statistics with the same methods as Shi et al. (2020) and Deb and Sen (2019) would be highly nontrivial and, moreover, would not provide any insights into local powers—an issue (property (4)) receiving much attention also in other contexts (Hallin et al., 2019; Beirlant et al., 2019; Hallin et al., 2020c). Therefore, we develop a completely different method, based on a general asymptotic representation result applicable to all center-outward rank-based GSCs under the null hypothesis of independence and contiguous dependence alternatives. That result (Theorem 5.1) extends to the multivariate setting Hájek's classical asymptotic representation result for univariate ranks (Hájek and Šidák, 1967) and considerably simplifies the derivation of limiting null distributions. Combined with a nontrivial use of Le Cam's third lemma (the limiting distributions here are not Gaussian), our approach moreover allows for the first rate-optimality result in the area; the rate-optimality of the tests proposed in Deb and Sen (2019) and Shi et al. (2020) follows as a particular case.
Outline of the paper
The rest of the paper is organized as follows. Section 2 reviews several important dependence measures from the literature. Generalizing the idea of symmetric rank covariances put forward in Weihs et al. (2018), we show that a single formula unifies them all; we refer to the concept as generalized symmetric covariance (GSC). As further background, Section 3 briefly reviews the notion of center-outward ranks and signs. In Section 4 we then present our streamlined approach to defining multivariate dependence measures, along with their sample counterparts, and highlight some of their basic properties. Section 5 treats the problem of independence testing and develops (Section 5.1) a theory of asymptotic representation for center-outward rank-based GSCs as well as the local power analysis of the corresponding tests (Section 5.2). A numerical study illustrating the potential benefits of a choice of standard score functions (sign test scores, Wilcoxon, and normal scores) is given in Section 5.3. All proofs are deferred to Section 7.
Notation.
A permutation of a finite set S is a bijection σ from S to itself; the two-row notation (1 2 ··· m ; σ(1) σ(2) ··· σ(m)), for instance, is used for the permutation i ↦ σ(i) of S = {1, 2, ..., m}. Let sgn(σ) := +1 if σ is an even permutation and sgn(σ) := −1 if σ is an odd permutation. For all integers n ≥ 1, put ⟦n⟧ := {1, 2, ..., n} and denote by S_n the symmetric group, i.e., the group of all permutations of ⟦n⟧. That group, in general, admits many subgroups: for example, when n = 5, the subgroups

H_τ := {id, (1 2)}  and  H_* := {id, (1 4), (2 3), (1 4)(2 3)}

of S_5 will play an important role in the sequel.

A set consisting of distinct elements x_1, ..., x_n is written either as {x_1, ..., x_n} or {x_i}^n_{i=1}. The corresponding sequence is denoted by [x_1, ..., x_n] or [x_i]^n_{i=1}. An arrangement of a finite set S = {x_1, ..., x_n} is a sequence [x_{σ(i)}]^n_{i=1}, where σ ∈ S_n. An r-arrangement of S is then a sequence [x_{σ(i)}]^r_{i=1} for r ∈ ⟦n⟧; write I^n_r for the family of all (n)_r := n!/(n − r)! possible r-arrangements of ⟦n⟧.

We write 0_d for the zero vector in R^d and ‖·‖ for the Euclidean norm. For two vectors u and v in R^d, we write u ⪯ v if u_ℓ ≤ v_ℓ for all ℓ ∈ ⟦d⟧. Let Arc(u, v) := (2π)^{-1} arccos{u⊤v/(‖u‖‖v‖)} if u, v ≠ 0_d, and Arc(u, v) := 0 otherwise. For a sequence of vectors v_1, ..., v_k, we use (v_1, ..., v_k) as a shorthand for (v_1⊤, ..., v_k⊤)⊤. We write I_d for the d × d identity matrix. For a function f : X → R, we define ‖f‖_∞ := max_{x∈X} |f(x)|. The symbol ⌊·⌋ stands for the floor function, 1(·) for the indicator function.

The cumulative distribution function and the probability distribution of a real-valued random variable/vector Z are denoted as F_Z(·) and P_Z, respectively. The class of probability measures on R^d that are absolutely continuous (with respect to the Lebesgue measure) is denoted as P^ac_d. We use ⇝ and a.s.→ to denote convergence in distribution and almost sure convergence, respectively. For any symmetric kernel h(·) on (R^d)^m, any integer ℓ ∈ ⟦m⟧, and any probability measure P_Z, we write h_ℓ(z_1, ..., z_ℓ; P_Z) for E h(z_1, ..., z_ℓ, Z_{ℓ+1}, ..., Z_m), where Z_1, ..., Z_m are m independent copies of Z ∼ P_Z, and E h := E h(Z_1, ..., Z_m). The product measure of two distributions P_1 and P_2 is denoted P_1 ⊗ P_2.

2 Generalized symmetric covariances

Let X_1 and X_2 be two random vectors with values in R^{d_1} and R^{d_2}, and assume throughout this paper that they are both absolutely continuous with respect to the Lebesgue measure. Weihs et al. (2018, Definition 3) introduced a general approach to defining rank-based measures of dependence between X_1 and X_2, which is based on signed sums over indicator functions that are acted upon by subgroups of the symmetric group. This creates a family of dependence measures they call symmetric rank covariances. In this section, we highlight that their approach can be extended to cover a much wider range of dependence measures including, in particular, the celebrated distance covariance (Székely et al., 2007).
This observation is important, as it enables us to study a broad class of dependence measures under a common standard form. Specifically, we introduce the following family of generalized symmetric covariances (GSCs).

Definition 2.1 (Generalized symmetric covariance). A measure of dependence µ is said to be an m-th order generalized symmetric covariance if there exist two kernel functions f_1 : (R^{d_1})^m → R_{≥0} and f_2 : (R^{d_2})^m → R_{≥0}, and a subgroup H ⊆ S_m containing an equal number of even and odd permutations, such that

µ(X_1, X_2) = µ_{f_1,f_2,H}(X_1, X_2) := E[k_{f_1,f_2,H}((X_{11}, X_{21}), ..., (X_{1m}, X_{2m}))].

Here (X_{11}, X_{21}), ..., (X_{1m}, X_{2m}) are m independent copies of (X_1, X_2), and the dependence kernel function k_{f_1,f_2,H}(·) is defined as

k_{f_1,f_2,H}((x_{11}, x_{21}), ..., (x_{1m}, x_{2m})) := {∑_{σ∈H} sgn(σ) f_1(x_{1σ(1)}, ..., x_{1σ(m)})} {∑_{σ∈H} sgn(σ) f_2(x_{2σ(1)}, ..., x_{2σ(m)})}.   (2.1)

The order m of a GSC, by the requirement that H is a subgroup with equal numbers of even and odd permutations, satisfies m ≥ 2. The same requirement implies the following property—the proof of which follows along the same lines as for Proposition 2 in Weihs et al. (2018)—that justifies the terminology generalized covariance.
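As a concrete illustration, here is a minimal Python sketch (not from the paper; the names are ours) of the dependence kernel (2.1) for user-supplied kernels f_1, f_2 and a subgroup H given as a list of (permutation, sign) pairs; the example at the bottom takes m = 2, H = {id, (1 2)} and the Kendall-type kernel 1(w_1 < w_2) discussed in the next proposition.

```python
import numpy as np

def gsc_kernel(x1, x2, f1, f2, H):
    """Dependence kernel (2.1).  x1 and x2 are (m, d1) and (m, d2) arrays holding
    x_{k1}, ..., x_{km}; H is a list of (permutation-of-range(m), sign) pairs."""
    s1 = sum(sign * f1(x1[list(perm)]) for perm, sign in H)
    s2 = sum(sign * f2(x2[list(perm)]) for perm, sign in H)
    return s1 * s2

# m = 2, H = {identity, (1 2)} and the Kendall-type kernel f(w) = 1(w_1 < w_2):
H_tau = [((0, 1), +1.0), ((1, 0), -1.0)]
f_kendall = lambda w: float(w[0, 0] < w[1, 0])      # w is a (2, 1) array

rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=(2, 1)), rng.normal(size=(2, 1))
print(gsc_kernel(x1, x2, f_kendall, f_kendall, H_tau))  # = sign(x12 - x11) * sign(x22 - x21)
```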
Proposition 2.1. All GSCs are I-consistent. More precisely, the GSC µ_{f_1,f_2,H}(X_1, X_2) is I-consistent in the family of distributions such that, denoting by X_{k1}, ..., X_{km} m independent copies of X_k, E[f_k] := E[f_k(X_{k1}, ..., X_{km})] < ∞, k = 1, 2.

The concept of GSC unifies a surprisingly large number of well-known dependence measures. Moreover, only two types of subgroups are needed, namely,

H^m_τ := ⟨(1 2)⟩ = {(1), (1 2)} ⊆ S_m for m = 2,  and  H^m_* := ⟨(1 4), (2 3)⟩ = {(1), (1 4), (2 3), (1 4)(2 3)} ⊆ S_m for m ≥ 4.

The following result illustrates this fact with four classical examples of univariate dependence measures, namely, the tau of Kendall (1938), the D of Hoeffding (1948), the R of Blum et al. (1961), and the τ* of Bergsma and Dassios (2014) which, as shown by Drton et al. (2020), is connected to the work of Yanagimoto (1970). Below, we write w = (w_1, ..., w_m) ↦ f_k(w), k = 1, 2, for the kernel functions of an m-th order univariate GSC; note that not all components of w need have an impact on f_k(w): see, for instance, the kernel f_1 of the 6th order Blum–Kiefer–Rosenblatt GSC, which maps w = (w_1, ..., w_6) to R_{≥0} but does not depend on w_6 (f_2 does).

Proposition 2.2 (Examples of univariate GSCs).
(a) Kendall's tau is a 2nd order GSC with H = H^2_τ and f_1(w) = f_2(w) = 1(w_1 < w_2) on R²;
(b) Hoeffding's D is a 5th order GSC with H = H^5_* and f_1(w) = f_2(w) = (1/2)·1(max{w_1, w_2} ≤ w_5) on R⁵;
(c) Blum–Kiefer–Rosenblatt's R is a 6th order GSC with H = H^6_* and f_1(w) = (1/2)·1(max{w_1, w_2} ≤ w_5), f_2(w) = (1/2)·1(max{w_3, w_4} ≤ w_6) on R⁶;
(d) Bergsma–Dassios–Yanagimoto's τ* is a 4th order GSC with H = H^4_* and f_1(w) = f_2(w) = 1(max{w_1, w_2} < min{w_3, w_4}) on R⁴.
Remark 2.1. Distinct choices of the kernels f_1 and f_2 do not necessarily imply distinct GSCs. For example, Weihs et al. (2018, Proposition 1(ii)) showed that Hoeffding's D in Proposition 2.2(b) is a 5th order GSC with H = H^5_* also for f_1(w) = f_2(w) = (1/2)·1(max{w_1, w_2} ≤ w_5 < max{w_3, w_4}) on R⁵; similarly, for Blum–Kiefer–Rosenblatt's R, the kernels in Proposition 2.2(c) can be replaced with

f_1(w) = (1/2)·1(max{w_1, w_2} ≤ w_5 < max{w_3, w_4}) on R⁶,
f_2(w) = (1/2)·1(max{w_1, w_2} ≤ w_6 < max{w_3, w_4}) on R⁶.

GSCs similarly unify several noteworthy multivariate dependence measures. We consider here the distance covariance of Székely et al. (2007) and Székely and Rizzo (2013), the multivariate version of Hoeffding's D based on marginal ordering (Weihs et al., 2018, Sec. 2.2, p. 549), and the projection-averaging approach to defining a multivariate extension of Hoeffding's D (Zhu et al., 2017), of Blum–Kiefer–Rosenblatt's R (Kim et al., 2019, Theorem 7.2), and of Bergsma–Dassios–Yanagimoto's τ* (Kim et al., 2019, Theorem 7.3). Here, we write w = (w_1, ..., w_m) ↦ f_k(w) for the kernel functions of an m-th order multivariate GSC for which the dimension of w_ℓ, ℓ = 1, ..., m, is d_k, hence may differ for k = 1 and 2. Again, not all components of w need have an impact on f_k(w): see, for instance, the kernels of the 4th order distance covariance GSC, which map w = (w_1, ..., w_4) to R_{≥0} but depend on neither w_3 nor w_4.

Proposition 2.3 (Examples of multivariate GSCs).
(a) Distance covariance is a 4th order GSC with H = H^4_* and f_k(w) = (1/2)‖w_1 − w_2‖ on (R^{d_k})⁴, k = 1, 2;
(b) Hoeffding's multivariate marginal ordering D is a 5th order GSC with H = H^5_* and f_k(w) = (1/2)·1(w_1, w_2 ⪯ w_5) on (R^{d_k})⁵, k = 1, 2;
(c) Hoeffding's multivariate projection-averaging D is a 5th order GSC with H = H^5_* and f_k(w) = (1/2) Arc(w_1 − w_5, w_2 − w_5) on (R^{d_k})⁵, k = 1, 2;
(d) Blum–Kiefer–Rosenblatt's multivariate projection-averaging R is a 6th order GSC with H = H^6_* and
f_1(w) = (1/2) Arc(w_1 − w_5, w_2 − w_5) on (R^{d_1})⁶,
f_2(w) = (1/2) Arc(w_3 − w_6, w_4 − w_6) on (R^{d_2})⁶;
(e) Bergsma–Dassios–Yanagimoto's multivariate projection-averaging τ* is a 4th order GSC with H = H^4_* and f_k(w) = Arc(w_1 − w_3, w_2 − w_3) + Arc(w_1 − w_4, w_2 − w_4) on (R^{d_k})⁴, k = 1, 2.

Similarly, as noted in Remark 2.1, some kernels f_1 and f_2 presented in Proposition 2.3 are not in the same form as in the original papers, but the equivalence can be proved easily.

It is well known that Kendall's tau is not D-consistent. For example, tau is zero for X_2 = X_1² with X_1 symmetric about 0. All other dependence measures we have introduced are D-consistent, albeit with some variations in the families of distributions for which this holds; see, e.g., the discussions in Examples 2.1–2.3 of Drton et al. (2020). As these dependence measures all involve the group H^m_*, we highlight the following fact.
Lemma 2.1. A GSC µ = µ_{f_1,f_2,H^m_*} with m ≥ 4 is D-consistent in a family P if and only if the pair (f_1, f_2) is D-consistent in P—namely, if and only if

E[ ∏_{k=1,2} { f_k(X_{k1}, X_{k2}, X_{k3}, X_{k4}, X_{k5}, ..., X_{km}) − f_k(X_{k4}, X_{k2}, X_{k3}, X_{k1}, X_{k5}, ..., X_{km}) − f_k(X_{k1}, X_{k3}, X_{k2}, X_{k4}, X_{k5}, ..., X_{km}) + f_k(X_{k4}, X_{k3}, X_{k2}, X_{k1}, X_{k5}, ..., X_{km}) } ]

is finite, nonnegative, and equal to 0 only if X_1 and X_2 are independent.

The following theorem summarizes the D-consistency properties of the GSCs considered in Propositions 2.2 and 2.3.
Theorem 2.1.
The univariate GSCs (b)–(d) in Proposition 2.2 and all multivariate GSCs in Proposition 2.3 are D-consistent in the family {P ∈ P^ac_{d_1+d_2} | E_P[f_k(X_{k1}, ..., X_{km})] < ∞, k = 1, 2} (with f_k, k = 1, 2, denoting their respective kernels).

The invariance/equivariance properties of GSCs depend on those of their kernels. We say that a kernel function f : (R^d)^m → R is orthogonally invariant if, for any orthogonal matrix O ∈ R^{d×d} and any w_1, ..., w_m ∈ R^d,

f(w_1, ..., w_m) = f(Ow_1, ..., Ow_m).   (2.2)

Lemma 2.2. If f_1 and f_2 both are orthogonally invariant, then any GSC of the form µ = µ_{f_1,f_2,H} is orthogonally invariant in the sense that µ(X_1, X_2) = µ(O_1X_1, O_2X_2) for any pair of random vectors (X_1, X_2) and any pair of orthogonal matrices O_1 ∈ R^{d_1×d_1} and O_2 ∈ R^{d_2×d_2}.

The following invariance properties hold for the multivariate GSCs listed in Proposition 2.3.
Proposition 2.4.
The kernels (a), (c)–(e) in Proposition 2.3, hence the corresponding GSCs, are orthogonally invariant.
Turning from theoretical dependence measures to their empirical counterparts, it is clear that any GSC admits a natural unbiased estimator in the form of a U-statistic, which we call the sample generalized symmetric covariance (SGSC).
Definition 2.2 (Sample generalized symmetric covariance). The sample generalized symmetric covariance corresponding to µ = µ_{f_1,f_2,H} is

µ̂^(n) = µ̂^(n)([(x_{1i}, x_{2i})]^n_{i=1}; f_1, f_2, H) := (n choose m)^{-1} ∑_{1 ≤ i_1 < ··· < i_m ≤ n} k_{f_1,f_2,H}((x_{1i_1}, x_{2i_1}), ..., (x_{1i_m}, x_{2i_m})).
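The following self-contained sketch evaluates the SGSC by brute force (enumerating all subsets of size m, so only for small n) and, as a sanity check, recovers Kendall's tau from the GSC of Proposition 2.2(a); the function names are ours and the implementation is purely illustrative.

```python
from itertools import combinations
import numpy as np
from scipy.stats import kendalltau

def sgsc(x1, x2, f1, f2, H, m):
    """Naive SGSC of Definition 2.2: average of the dependence kernel (2.1)
    over all subsets i_1 < ... < i_m; O(n^m) time, for illustration only."""
    vals = []
    for idx in combinations(range(len(x1)), m):
        idx = list(idx)
        s1 = sum(sign * f1(x1[idx][list(p)]) for p, sign in H)
        s2 = sum(sign * f2(x2[idx][list(p)]) for p, sign in H)
        vals.append(s1 * s2)
    return float(np.mean(vals))

# Sanity check: the 2nd order GSC with H_tau and f(w) = 1(w_1 < w_2) is Kendall's tau.
H_tau = [((0, 1), +1.0), ((1, 0), -1.0)]
f = lambda w: float(w[0, 0] < w[1, 0])

rng = np.random.default_rng(1)
x = rng.normal(size=(40, 1))
y = 0.5 * x + rng.normal(size=(40, 1))
print(sgsc(x, y, f, f, H_tau, m=2))          # sample GSC
print(kendalltau(x.ravel(), y.ravel())[0])   # classical Kendall's tau (no ties here)
```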
If the kernel functions f_1 and f_2 are orthogonally invariant, then it also holds that all SGSCs of the form µ̂^(n)(·; f_1, f_2, H) are orthogonally invariant, in the sense of remaining unaffected when the input [(x_{1i}, x_{2i})]^n_{i=1} is transformed into [(O_1x_{1i}, O_2x_{2i})]^n_{i=1}, where O_1 ∈ R^{d_1×d_1} and O_2 ∈ R^{d_2×d_2} are arbitrary orthogonal matrices. Proposition 2.4 thus also yields the orthogonal invariance of the SGSCs associated with kernels (a) and (c)–(e) in Proposition 2.3.

The SGSCs associated with the examples listed in Proposition 2.3, unfortunately, all fail to satisfy the crucial property of distribution-freeness. However, as we show in Section 4, distribution-freeness, along with transformation invariance, can be obtained by computing SGSCs from (functions of) the center-outward ranks and signs of the observations.

3 Center-outward ranks and signs

This section briefly introduces the concepts of center-outward ranks and signs to be used in the sequel. The main purpose is to fix notation and terminology; for a comprehensive coverage, we refer to Hallin et al. (2020a).

We are concerned with defining multivariate ranks for a sample of d-dimensional observations drawn from a distribution in the class P^ac_d of absolutely continuous probability measures on R^d with d ≥ 1. Let S_d and S_{d−1} denote the open unit ball and the unit sphere, respectively, in R^d. Denote by U_d the spherical uniform measure on S_d, that is, the product of the uniform measure on [0, 1] (for the distance to the origin) and the uniform on S_{d−1} (for the direction). The push-forward of a measure Q by a measurable transformation T is denoted as T#Q.

Definition 3.1 (Center-outward distribution function). The center-outward distribution function of a probability measure P ∈ P^ac_d is the P-almost surely unique function F_± that
(i) is the gradient of a convex function on R^d,
(ii) maps R^d to the open unit ball S_d,
(iii) pushes P forward to U_d (i.e., such that F_±#P = U_d).

The sample counterpart F^(n)_± of F_± is based on an n-tuple of data points z_1, ..., z_n ∈ R^d. The key idea is to construct n grid points in the unit ball S_d such that the corresponding discrete uniform distribution converges weakly, as n → ∞, to U_d. For d ≥ 2, the construction proposed in Hallin (2017, Sec. 4.2) starts by factorizing n into

n = n_R n_S + n_0,  n_R, n_S ∈ Z_{>0},  0 ≤ n_0 < min{n_R, n_S},

where in asymptotic scenarios n_R and n_S → ∞, hence n_0/n → 0, as n → ∞. Next consider the intersection points between
– the n_R hyperspheres centered at 0_d, with radii r/(n_R + 1), r ∈ ⟦n_R⟧, and
– n_S distinct unit vectors {s^{(n_S)}_s}_{s∈⟦n_S⟧} dividing the unit circle into n_S arcs of equal length 2π/n_S for d = 2, and that are distributed as regularly as possible on the unit hypersphere for d ≥ 3; the only requirement for asymptotic statements is that the uniform discrete distribution over {s^{(n_S)}_s}^{n_S}_{s=1} converges weakly to the uniform distribution over the sphere S_{d−1} as n_S → ∞.

Letting n := (n_R, n_S, n_0), the grid G^d_n is defined as the set of the n_R n_S points {(r/(n_R + 1)) s^{(n_S)}_s}_{r∈⟦n_R⟧, s∈⟦n_S⟧} as described above and, whenever n_0 > 0, the n_0 points {(2(n_R + 1))^{-1} s^{(n_S)}_s}_{s∈S_0}, where S_0 is chosen as a random sample without replacement from ⟦n_S⟧.
For d = 1, letting n_S = 2, n_R = ⌊n/n_S⌋, and n_0 = n − n_R n_S = 0 or 1, the grid G^d_n reduces to the points {±r/(n_R + 1) : r ∈ ⟦n_R⟧}, along with the origin in case n_0 = 1.

The empirical version F^(n)_± of F_± then is defined as the optimal coupling between the observed data points and the grid G^d_n.

Definition 3.2 (Center-outward ranks and signs). Let z_1, ..., z_n be distinct data points in R^d. Let T be the collection of all bijective mappings between the set {z_i}^n_{i=1} and the grid G^d_n. The sample center-outward distribution function is defined as

F^(n)_± := argmin_{T∈T} ∑^n_{i=1} ‖z_i − T(z_i)‖²,   (3.1)

and (n_R + 1)‖F^(n)_±(z_i)‖ and F^(n)_±(z_i)/‖F^(n)_±(z_i)‖ are called the center-outward rank and center-outward sign of z_i, respectively.
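For d = 2, the sample map F^(n)_± of Definition 3.2 can be obtained with an off-the-shelf linear-sum-assignment solver; the sketch below builds a regular grid of the type described above (n_R radii × n_S equispaced directions, ignoring the n_0 leftover points) and uses squared Euclidean costs. Everything here is an illustrative choice of ours rather than the paper's own implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def center_outward(z, n_R, n_S):
    """Empirical center-outward distribution function (3.1) for d = 2,
    assuming n = n_R * n_S (no leftover grid points)."""
    n, d = z.shape
    assert d == 2 and n == n_R * n_S
    radii = np.arange(1, n_R + 1) / (n_R + 1)               # r/(n_R + 1), r = 1, ..., n_R
    ang = 2 * np.pi * np.arange(n_S) / n_S                  # n_S equispaced directions
    dirs = np.column_stack([np.cos(ang), np.sin(ang)])
    grid = (radii[:, None, None] * dirs[None, :, :]).reshape(-1, 2)
    cost = ((z[:, None, :] - grid[None, :, :]) ** 2).sum(axis=2)
    rows, cols = linear_sum_assignment(cost)                # optimal coupling
    F_n = np.empty_like(z)
    F_n[rows] = grid[cols]
    ranks = (n_R + 1) * np.linalg.norm(F_n, axis=1)         # center-outward ranks
    signs = F_n / np.linalg.norm(F_n, axis=1, keepdims=True)  # center-outward signs
    return F_n, ranks, signs

rng = np.random.default_rng(2)
Z = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 2]], size=200)
F_n, ranks, signs = center_outward(Z, n_R=20, n_S=10)
```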
The next proposition collects some properties of center-outward distribution functions.

Proposition 3.1. Let F_± be the center-outward distribution function of P ∈ P^ac_d. Then,
(i) (Hallin, 2017, Proposition 4.2(i), Hallin et al., 2020a, Proposition 2.1(i),(iii)) F_± is a probability integral transformation of R^d, namely, Z ∼ P if and only if F_±(Z) ∼ U_d;
(ii) (Hallin et al., 2020a, Proposition 2.1(ii)) if Z ∼ P, then ‖F_±(Z)‖ is uniform over [0, 1], F_±(Z)/‖F_±(Z)‖ is uniform over the sphere S_{d−1}, and they are mutually independent.
Writing F^Z_± for the center-outward distribution function of Z ∼ P ∈ P^ac_d,
(iii) (Hallin et al., 2020b, Proposition 2.2) for any v ∈ R^d, a ∈ R_{>0}, and orthogonal d × d matrix O, F^{v+aOZ}_±(v + aOz) = OF^Z_±(z) for all z ∈ R^d.
Letting Z_1, ..., Z_n be independent copies of Z ∼ P ∈ P^ac_d with center-outward distribution function F_±,
(iv) (Hallin, 2017, Proposition 6.1(ii), Hallin et al., 2020a, Proposition 2.5(ii)) for any decomposition n_0, n_R, n_S of n, the random vector [F^(n)_±(Z_1), ..., F^(n)_±(Z_n)] is uniformly distributed over all distinct arrangements of the grid G^d_n;
(v) (del Barrio et al., 2018, Proof of Theorem 3.1, Hallin et al., 2020a, Proof of Proposition 3.3) as n_R and n_S → ∞, for every i ∈ ⟦n⟧, ‖F^(n)_±(Z_i) − F_±(Z_i)‖ a.s.→ 0.

In the sequel, it will be convenient to consider the following classes of distributions:
• the class P^+_d of distributions P ∈ P^ac_d with nonvanishing probability density, namely, with Lebesgue density f such that, for all D > 0 there exist constants λ_{D;f} < Λ_{D;f} ∈ (0, ∞) such that λ_{D;f} ≤ f(z) ≤ Λ_{D;f} for all ‖z‖ ≤ D;
• the class P^conv_d of distributions P ∈ P^ac_d with convex support supp(P) and a density that is nonvanishing over this support, namely, with density f such that, for all D > 0 there exist constants λ_{D;f} < Λ_{D;f} ∈ (0, ∞) such that λ_{D;f} ≤ f(z) ≤ Λ_{D;f} for all z ∈ supp(P) with ‖z‖ ≤ D;
• the class P^±_d of distributions P ∈ P^ac_d that are push-forwards of U_d of the form P = ∇Υ#U_d, with ∇Υ the gradient of a convex function and a homeomorphism from the punctured ball S_d\{0_d} to ∇Υ(S_d\{0_d}) such that ∇Υ({0_d}) is compact, convex, and has Lebesgue measure zero;
• the class P_d of all distributions P ∈ P^ac_d such that, denoting by F^(n)_± the sample distribution function computed from an n-tuple Z_1, ..., Z_n of independent copies of Z ∼ P, max_{1≤i≤n} ‖F^(n)_±(Z_i) − F_±(Z_i)‖ a.s.→ 0 as n_R and n_S → ∞ (a Glivenko–Cantelli property).

By Hallin (2017, Proposition 5.1), del Barrio et al. (2018, Theorem 3.1), del Barrio et al. (2019, Theorem 2.5), and Hallin et al. (2020a, Proposition 2.3), the following inclusions hold among these four classes of distributions.
Proposition 3.2. It holds that P^+_d ⊊ P^conv_d ⊊ P^±_d ⊆ P_d ⊊ P^ac_d.

4 Center-outward dependence measures

We are now ready to present our proposed family of center-outward dependence measures based on the notions of GSCs and center-outward ranks and signs. Throughout, we denote by (X_1, X_2) a pair of random vectors with P_{X_1} ∈ P^ac_{d_1} and P_{X_2} ∈ P^ac_{d_2}, and by (X_{11}, X_{21}), (X_{12}, X_{22}), ..., (X_{1n}, X_{2n}) an n-tuple of independent copies of (X_1, X_2). Let F_{k,±} denote the center-outward distribution function of X_k, and write F^(n)_{k,±}(·) for the sample center-outward distribution function corresponding to {X_{ki}}^n_{i=1}, k = 1, 2.

Our ideas build on Shi et al. (2020) and, in slightly different form, also on Deb and Sen (2019), where the authors introduce multivariate-rank-based dependence measures of the form

µ_dCov(X_1, X_2) := µ_{f^dCov_1, f^dCov_2, H^4_*}(F_{1,±}(X_1), F_{2,±}(X_2))

with sample counterparts

µ~^(n)_dCov = µ̂^(n)([(F^(n)_{1,±}(X_{1i}), F^(n)_{2,±}(X_{2i}))]^n_{i=1}; f^dCov_1, f^dCov_2, H^4_*),

for which f^dCov_k(w_{k1}, ..., w_{k4}) := (1/2)‖w_{k1} − w_{k2}‖, w_{k1}, ..., w_{k4} ∈ R^{d_k}, k = 1, 2, corresponds to the distance covariance kernel; see Proposition 2.3(a). Our generalization of this particular dependence measure involves score functions and requires some further notation. The score functions are continuous functions J_k, k = 1, 2, from the interval [0, 1) to the set R_{≥0} of nonnegative real numbers. Classical examples include the normal or van der Waerden score function J_vdW(u) := (F^{-1}_{χ²_d}(u))^{1/2} (with F_{χ²_d} the χ²_d distribution function), the Wilcoxon score function J_W(u) := u, and the sign test score function J_sign(u) := 1. For k = 1, 2, let

J_k(u) := J_k(‖u‖) u/‖u‖ if u ∈ S_{d_k}\{0_{d_k}},  and  J_k(u) := 0_{d_k} if u = 0_{d_k}.

Define the population and sample scored center-outward distribution functions as G_{k,±}(·) := J_k(F_{k,±}(·)) and G^(n)_{k,±}(·) := J_k(F^(n)_{k,±}(·)), respectively.

Definition 4.1 (Center-outward GSCs). For any GSC µ = µ_{f_1,f_2,H} and score functions J_1 and J_2, define the population and sample center-outward dependence measures as

µ_±(X_1, X_2) = µ_{±;J_1,J_2,f_1,f_2,H}(X_1, X_2) := µ_{f_1,f_2,H}(G_{1,±}(X_1), G_{2,±}(X_2))   (4.1)

and

W~^(n)_µ = W~^(n)_{J_1,J_2,µ_{f_1,f_2,H}} := µ̂^(n)([(G^(n)_{1,±}(X_{1i}), G^(n)_{2,±}(X_{2i}))]^n_{i=1}; f_1, f_2, H),   (4.2)

respectively.
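To fix ideas, here is a small sketch of the three score functions above and of the scored sample map G^(n)_{k,±} = J_k ∘ F^(n)_{k,±}, applied row-wise to the output of a center-outward computation such as the one sketched in Section 3; the helper names are ours, and the van der Waerden score uses the chi-square quantile function as in the definition of J_vdW.

```python
import numpy as np
from scipy.stats import chi2

def J_wilcoxon(u):                    # J_W(u) = u
    return u

def J_sign(u):                        # J_sign(u) = 1
    return np.ones_like(u)

def make_J_vdW(d):                    # J_vdW(u) = (F_{chi2_d}^{-1}(u))^{1/2}
    return lambda u: np.sqrt(chi2.ppf(u, df=d))

def scored_map(F_n, J):
    """Row-wise G^(n)(z_i) = J(||F^(n)(z_i)||) F^(n)(z_i)/||F^(n)(z_i)||, with 0 at the origin."""
    norms = np.linalg.norm(F_n, axis=1, keepdims=True)
    out = np.zeros_like(F_n)
    nz = norms[:, 0] > 0
    out[nz] = J(norms[nz]) * F_n[nz] / norms[nz]
    return out

# Toy usage: F_n taken here as arbitrary points in the open unit ball.
rng = np.random.default_rng(3)
F_n = rng.normal(size=(10, 2))
F_n *= rng.uniform(size=(10, 1)) / np.linalg.norm(F_n, axis=1, keepdims=True)
G_n = scored_map(F_n, make_J_vdW(d=2))
```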
Remark 4.1. Plugging the center-outward ranks and signs into the multivariate dependence measures listed in Section 2, in combination with various score functions, one immediately obtains a large variety of center-outward rank-based GSCs. For instance, f_1 = f^dCov_1, f_2 = f^dCov_2, J_1(u) = J_2(u) = u, and H = H^4_* yield the rank-based distance covariance. As a side note, Bergsma (2006, 2011) studied distance covariance when d_1 = d_2 = 1 (equivalent to κ in his notation) as an extension of the traditional Pearson covariance. There, Bergsma (2006, Lemma 10) implies that, when d_1 = d_2 = 1 and J_1(u) = J_2(u) = u,

µ_{f^dCov_1, f^dCov_2, H^4_*}(G_{1,±}(X_1), G_{2,±}(X_2)) = ∫ (F_{(X_1,X_2)} − F_{X_1}F_{X_2})² dF_{X_1} dF_{X_2},

where the right-hand side is precisely Blum–Kiefer–Rosenblatt's R as proposed in Blum et al. (1961) (see also Proposition 2.2(c) for its definition as a GSC). Therefore, for d_1 = d_2 = 1 and J_1(u) = J_2(u) = u, the rank-based distance covariance coincides with the Blum–Kiefer–Rosenblatt R dependence measure.

Below is a list of center-outward rank-based versions of the widely used dependence measures from Proposition 2.3; each case can be combined with arbitrary scores J_1 and J_2.
Example 4.1. A list of center-outward rank-based SGSCs, obtained by applying the SGSC of Definition 2.2 to the scored center-outward ranks and signs [(G^(n)_{1,±}(X_{1i}), G^(n)_{2,±}(X_{2i}))]^n_{i=1}:
(i) the rank-based distance covariance W~^(n)_dCov, built on the kernels of Proposition 2.3(a);
(ii) the rank-based marginal-ordering Hoeffding statistic W~^(n)_M, built on the kernels of Proposition 2.3(b);
(iii) the rank-based projection-averaging Hoeffding statistic W~^(n)_D, built on the kernels of Proposition 2.3(c);
(iv) the rank-based projection-averaging Blum–Kiefer–Rosenblatt statistic W~^(n)_R, built on the kernels of Proposition 2.3(d);
(v) the rank-based projection-averaging Bergsma–Dassios–Yanagimoto statistic W~^(n)_τ*, built on the kernels of Proposition 2.3(e).
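For item (i), a compact way to combine the pieces is to apply the familiar double-centering formula for distance covariance to the scored center-outward ranks; the sketch below uses the (biased) V-statistic form rather than the exact U-statistic of Definition 2.2, purely for brevity, and the function name is ours.

```python
import numpy as np

def dcov2_v(A, B):
    """Squared sample distance covariance (V-statistic form) of point clouds
    A (n, d1) and B (n, d2) via double-centered pairwise-distance matrices."""
    def centered(M):
        D = np.linalg.norm(M[:, None, :] - M[None, :, :], axis=2)
        return D - D.mean(axis=0) - D.mean(axis=1)[:, None] + D.mean()
    return float((centered(A) * centered(B)).mean())

# In practice A and B are the scored center-outward ranks/signs of the two samples,
# e.g. A = scored_map(F_n1, J1) and B = scored_map(F_n2, J2).
rng = np.random.default_rng(4)
A, B = rng.normal(size=(100, 2)), rng.normal(size=(100, 3))
print(dcov2_v(A, B))
```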
Definition 4.2 (Regular score functions). A score function J : [0, 1) → R_{≥0} is called weakly regular if it is continuous over [0, 1) and satisfies ∫₀¹ J²(u) du > 0. If moreover J is Lipschitz-continuous, strictly monotone, and satisfies J(0) = 0, it is called strongly regular.

The regularity properties of the standard van der Waerden, Wilcoxon, and sign test score functions are as follows.
Proposition 4.1. It holds that
(i) the normal (van der Waerden) and the sign test score functions are weakly but not strongly regular;
(ii) the Wilcoxon score function is strongly regular.
The following proposition then summarizes the main properties of center-outward GSCs and their rank-based sample counterparts.
Proposition 4.2.
Let (X_{11}, X_{21}), ..., (X_{1n}, X_{2n}) denote an n-tuple of independent copies of (X_1, X_2) ∼ P_{(X_1,X_2)} with marginals P_{X_1} ∈ P^ac_{d_1} and P_{X_2} ∈ P^ac_{d_2}. Consider µ_± := µ_{±;J_1,J_2,f_1,f_2,H} and W~^(n)_µ := W~^(n)_{J_1,J_2,µ_{f_1,f_2,H}} as defined in (4.1) and (4.2); let µ*_± := µ_{±;J_1,J_2,f_1,f_2,H^m_*}. Then,

(i) (Exact distribution-freeness) under independence between X_1 and X_2, the distribution of W~^(n)_µ depends neither on P_{X_1} nor on P_{X_2};

(ii) (Transformation invariance) the dependence measure µ_± satisfies

µ_±(X_1, X_2) = µ_±(v_1 + a_1O_1X_1, v_2 + a_2O_2X_2)   (4.3)

for any orthogonal matrices O_k ∈ R^{d_k×d_k}, vectors v_k ∈ R^{d_k}, and scalars a_k ∈ R_{>0}, provided that the kernels f_1 and f_2 are orthogonally invariant;

(iii) (I- and D-consistency)
(a) µ_± is I-consistent in the family {P_{(X_1,X_2)} | P_{X_k} ∈ P^ac_{d_k} and E[f_k([G_{k,±}(X_{ki})]^m_{i=1})] < ∞ for k = 1, 2};
(b) if the pair of kernels is D-consistent in the class {P_{(X_1,X_2)} ∈ P^ac_{d_1+d_2} | E[f_k(X_{k1}, ..., X_{km})] < ∞ for k = 1, 2} (cf. Lemma 2.1), then µ*_± is D-consistent in the family

P^ac_{d_1,d_2,∞} := {P_{(X_1,X_2)} ∈ P^ac_{d_1+d_2} | E[f_k(G_{k,±}(X_{k1}), ..., G_{k,±}(X_{km}))] < ∞ for k = 1, 2},   (4.4)

provided that the score functions J_1 and J_2 are strictly monotone;

(iv) (Strong consistency) if f_k([G^(n)_{k,±}(X_{ki_ℓ})]^m_{ℓ=1}) and f_k([G_{k,±}(X_{ki_ℓ})]^m_{ℓ=1}) are almost surely bounded, that is, if there exists a constant C (depending on f_k, J_k, and P_{X_k}) such that, for any n,

P[|f_k([G^(n)_{k,±}(X_{ki_ℓ})]^m_{ℓ=1})| ≤ C] = 1 = P[|f_k([G_{k,±}(X_{ki_ℓ})]^m_{ℓ=1})| ≤ C],  k = 1, 2,

and

(n)_m^{-1} ∑_{[i_1,...,i_m]∈I^n_m} |f_k([G^(n)_{k,±}(X_{ki_ℓ})]^m_{ℓ=1}) − f_k([G_{k,±}(X_{ki_ℓ})]^m_{ℓ=1})| a.s.→ 0,  k = 1, 2,   (4.5)

then

W~^(n)_µ = W~^(n)_{J_1,J_2,µ_{f_1,f_2,H}} a.s.→ µ_±(X_1, X_2).   (4.6)

Theorem 4.1 (Examples). As long as P_{X_1} ∈ P_{d_1}, P_{X_2} ∈ P_{d_2}, and J_1, J_2 are strongly regular, all the kernel functions in Proposition 2.3(a)–(e) satisfy Condition (4.5).
Remark 4.2. Notice that (4.3) may not hold for the center-outward version of Hoeffding's multivariate marginal ordering D (cf. Proposition 2.3(b)): consider, for instance, X_1 = (Z_1, Z_2) and X_2 = Z_1, where Z_1 and Z_2 are independent standard normal, a_1 = a_2 = 1, v_1 = 0_2, v_2 = 0, O_1 the 2 × 2 rotation with rows (√2/2, −√2/2) and (√2/2, √2/2), O_2 = 1, and J_1(u) = J_2(u) = u.
Remark 4.3. Examining Theorem 4.1, it is evident that the sample center-outward dependence measures with normal score functions, unfortunately, do not necessarily satisfy (4.6), although, in view of Proposition 4.2(iii), their population counterparts are both I- and D-consistent within a sufficiently large nonparametric family of distributions. We believe that this lack of strong consistency is not essential, though, and conjecture that (4.6) still holds for normal scores. Establishing this property, however, is likely to involve nontrivial modifications of the proof of Proposition 3.2.

We conclude this section with some computational issues. Two parts are potentially computationally costly: (i) calculating the center-outward ranks and signs in (3.1), and (ii) computing a GSC µ̂^(n)(·) with n inputs. Problem (3.1) is a linear sum assignment problem (LSAP): given n points z_i and u_j in R^d (here, u_j ∈ S_d) and n² nonnegative costs c_ij := ‖z_i − u_j‖², i, j ∈ ⟦n⟧, it consists of finding an optimal matching, i.e., a bijection σ from ⟦n⟧ to itself such that ∑^n_{i=1} c_{iσ(i)} is minimized. The time complexity of computing the optimal matching and nearly optimal matchings is summarized in the following proposition.
Proposition 4.3. The optimal matching problem (3.1) yielding [G^(n)_{1,±}(X_{1i})]^n_{i=1} and [G^(n)_{2,±}(X_{2i})]^n_{i=1} can be solved in O(n³) time via the refined Hungarian algorithm (Dinic and Kronrod, 1969; Tomizawa, 1971; Edmonds and Karp, 1970, 1972). Moreover,
(i) if we assume that the c_ij, i, j ∈ ⟦n⟧, all are integers and bounded by some (positive) integer N, which can be achieved by scaling and rounding, then there exists an optimal matching algorithm solving the problem in O(n^{5/2} log(nN)) time (Gabow and Tarjan, 1989);
(ii) if d = 2 and the c_ij, i, j ∈ ⟦n⟧, all are integers and bounded by some (positive) integer N, there exists an exact optimal matching algorithm solving the problem in O(n^{3/2+δ} log(N)) time for any arbitrarily small constant δ > 0 (Sharathkumar and Agarwal, 2012);
(iii) if d ≥ 3, there is an algorithm computing a (1 + ε)-approximate perfect matching in

O(n^{3/2} ε^{-d} τ(n, ε) log⁴(n/ε) log(max c_ij / min c_ij)) time,

where a (1 + ε)-approximate perfect matching for ε > 0 is a bijection σ from ⟦n⟧ to itself such that ∑^n_{i=1} c_{iσ(i)} is no larger than (1 + ε) times the cost of the optimal matching, and τ(n, ε) is the query and update time of an ε/c-approximate nearest neighbor data structure for some constant c > 1 (Agarwal and Sharathkumar, 2014).

Once [G^(n)_{1,±}(X_{1i})]^n_{i=1} and [G^(n)_{2,±}(X_{2i})]^n_{i=1} are obtained, a naive approach to the computation of W~^(n), on the other hand, requires at most O(n^m) time. Great speedups are possible, however, in particular cases, and the next proposition summarizes the results for the various center-outward rank-based statistics listed in Example 4.1.
Proposition 4.4. Assuming that [G^(n)_{1,±}(X_{1i})]^n_{i=1} and [G^(n)_{2,±}(X_{2i})]^n_{i=1} have been previously obtained, one can compute
(i) W~^(n)_dCov in O(n²) time (Székely and Rizzo, 2013, Definition 1, Székely and Rizzo, 2014, Definition 2, Proposition 1, Huo and Székely, 2016, Lemma 3.1),
(ii) W~^(n)_M in O(n²(log n)^{d_1+d_2−2}) time (Weihs et al., 2018, p. 557, end of Sec. 5.2),
(iii) W~^(n)_D in O(n²) time (Zhu et al., 2017, Theorem 1),
(iv) W~^(n)_R in O(n²) time as proved in Section 7.3.4,
(v) W~^(n)_τ* in O(n⁴) time by definition.
If, moreover, approximate values are allowed, one can compute
(i) approximate W~^(n)_dCov in O(nK log n) time (Huo and Székely, 2016, Theorem 4.1, Chaudhuri and Hu, 2019, Theorem 3.1),
(ii) approximate W~^(n)_D in O(nK log n) time (Weihs et al., 2018, p. 557),
(iii) approximate W~^(n)_R in O(nK log n) time (Drton et al., 2020, Equation (6.1), Weihs et al., 2018, p. 557, Even-Zohar and Leng, 2019, Corollary 4),
(iv) approximate W~^(n)_τ* in O(nK log n) time (Even-Zohar and Leng, 2019, Corollary 4).
These approximations consider random projections to speed up computation; K stands for the number of random projections. See also Huang and Huo (2017, Sec. 3.1).

5 Testing independence

Besides quantifying the dependence between two groups of random variables, the center-outward GSCs introduced in Section 4 allow for constructing tests of the null hypothesis

H_0: X_1 and X_2 are mutually independent,

based on a sample (X_{11}, X_{21}), ..., (X_{1n}, X_{2n}) of n independent copies of (X_1, X_2). Shi et al. (2020) and, in a slightly different manner, Deb and Sen (2019) studied the particular case of a test based on the Wilcoxon version of the center-outward distance covariance W~^(n)_dCov. Their work, among other results, provides the limiting null distribution of the test statistic, with proofs involving combinatorial limit theorems and “brute-force” calculation of permutation statistics. Although this led to a fairly general combinatorial non-central limit theorem (Shi et al., 2020, Theorems 4.1 and 4.2), the derivation is not intuitive and is difficult to generalize.

In this paper, we take a different and more powerful approach to the asymptotic analysis of rank-based center-outward GSCs. Compared with Shi et al. (2020) and Deb and Sen (2019), this new approach resolves three main issues:
(i) Intuitively, the asymptotic behavior of sample center-outward dependence measures should be predicted by that of oracle versions in which the observations are transformed using the unknown actual center-outward distribution function F_± rather than its sample version F^(n)_±. Here, we prove that this intuition is indeed correct by showing that sample center-outward dependence measures and their oracle versions are asymptotically equivalent.
(ii) Prior work does not perform any power analysis for the new center-outward tests. Here, we fill this gap by proving that these center-outward tests in fact are rate-optimal in the context of the classical Konijn alternatives.
(iii) Finally, our center-outward rank-based tests allow for the incorporation of score functions, which can considerably reinforce their performance.

This novel approach rests on a general asymptotic representation result that extends to center-outward ranks and the multivariate setting the classical Hájek representation method (Hájek and Šidák, 1967), thereby simplifying the derivation of asymptotic null distributions and, via a nontrivial use of Le Cam's third lemma for non-normal limits, enabling local power analysis.

5.1 Asymptotic representation
In order to develop our multivariate asymptotic representation result, we first introduce formally the oracle counterpart to the sample center-outward GSC W~^(n)_µ.

Definition 5.1 (Oracle sample center-outward GSCs). The oracle version of the center-outward rank-based GSC W~^(n)_{J_1,J_2,µ_{f_1,f_2,H}} associated with the GSC µ = µ_{f_1,f_2,H} is

W^(n)_µ = W^(n)_{J_1,J_2,µ_{f_1,f_2,H}} := µ̂^(n)([(G_{1,±}(X_{1i}), G_{2,±}(X_{2i}))]^n_{i=1}; f_1, f_2, H).

In contrast to the rank-based W~^(n)_µ = W~^(n)_{J_1,J_2,µ_{f_1,f_2,H}}, the oracle W^(n)_µ = W^(n)_{J_1,J_2,µ_{f_1,f_2,H}} involves G_{1,±} and G_{2,±}, which are the population scored center-outward distribution functions, hence cannot be computed from the observations. However, the limiting null distribution of W^(n)_µ, unlike that of W~^(n)_µ, follows from standard theory for degenerate U-statistics (Serfling, 1980, Chap. 5.5.2). This point can be summarized as follows.
Proposition 5.1. Let µ = µ_{f_1,f_2,H^m_*} be a GSC with m ≥ 4. Let the kernels f_1, f_2 and the score functions J_1, J_2 satisfy

0 < Var(g_k(W_{k1}, W_{k2})) < ∞,  k = 1, 2,   (5.1)

where W_{ki} := J_k(U_{ki}) with (U_{1i}, U_{2i}), i ∈ ⟦m⟧, independent with distribution U_{d_1} ⊗ U_{d_2},

g_k(w_{k1}, w_{k2}) := E[f_{k,H^m_*}(w_{k1}, w_{k2}, W_{k3}, W_{k4}, ..., W_{km})],   (5.2)

and

f_{k,H^m_*}(x_{k1}, ..., x_{km}) := ∑_{σ∈H^m_*} sgn(σ) f_k(x_{kσ(1)}, ..., x_{kσ(m)}),  k = 1, 2.

Then, under the hypothesis H_0 that X_1 ∼ P_{X_1} ∈ P^ac_{d_1} and X_2 ∼ P_{X_2} ∈ P^ac_{d_2} are independent,

nW^(n)_µ = nW^(n)_{J_1,J_2,µ_{f_1,f_2,H^m_*}} ⇝ ∑^∞_{v=1} λ_{µ,v}(ξ²_v − 1),

where [ξ_v]^∞_{v=1} are independent standard Gaussian random variables and [λ_{µ,v}]^∞_{v=1} are the non-zero eigenvalues of the integral equation

E[g_1(w_1, W_1) g_2(w_2, W_2) ψ((W_1, W_2))] = λψ((w_1, w_2)).   (5.3)

The examples of tests we consider reject for large values of a test statistic that unbiasedly estimates a nonnegative (I- and D-)consistent dependence measure. In these examples, and in particular the one from Shi et al. (2020), it holds that

all eigenvalues of the integral equation (5.3) are non-negative.   (5.4)

However, it should be noted that, in view of the following multivariate representation result, a test of H_0 can be implemented also when (5.4) does not hold.

The following asymptotic representation is the main result of this section.

Theorem 5.1 (Multivariate asymptotic representation). Let f_1, f_2 be kernel functions of order m ≥ 4, and let J_1, J_2 be weakly regular score functions. Denoting by U^(n)_{d_k} the uniform discrete distribution over the grid G^{d_k}_n, let W^(n)_{ki} := J_k(U^(n)_{ki}), where (U^(n)_{1i}, U^(n)_{2i}), i ∈ ⟦m⟧, are independent with distribution U^(n)_{d_1} ⊗ U^(n)_{d_2}. Define g_k, k = 1, 2, as in (5.2), and

g^(n)_k(w_{k1}, w_{k2}) := E[f_{k,H^m_*}(w_{k1}, w_{k2}, W^(n)_{k3}, W^(n)_{k4}, ..., W^(n)_{km})],  k = 1, 2.   (5.5)

Assume that f_k and g_k are Lipschitz-continuous, g^(n)_k converges uniformly to g_k, and

sup_{i_1,...,i_m ∈ ⟦m⟧} E[f²_k([W_{ki_ℓ}]^m_{ℓ=1})] < ∞  and  ∫₀¹ J²_k(u) du < ∞,  k = 1, 2.   (5.6)

Then, under the hypothesis H_0 that X_1 ∼ P_{X_1} ∈ P^ac_{d_1} and X_2 ∼ P_{X_2} ∈ P^ac_{d_2} are independent, the sample center-outward dependence measure W~^(n)_µ based on the GSC µ = µ_{f_1,f_2,H^m_*} and its oracle version W^(n)_µ are asymptotically equivalent in the sense that

W~^(n)_µ − W^(n)_µ = o_P(n^{-1})  as n_R, n_S → ∞.
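Given (numerically computed) eigenvalues λ_{µ,v} of (5.3), the tail quantiles of the weighted chi-square limit in Proposition 5.1 — and hence the critical value q_{µ,1−α} used below — can be approximated by straightforward Monte Carlo, as in the following sketch; the truncation level and the eigenvalues shown are placeholders, not values from the paper. (Alternatively, exact distribution-freeness allows one to simulate the finite-n null law of nW~^(n)_µ once and for all, e.g. with standard normal marginals.)

```python
import numpy as np

def weighted_chi2_quantile(lams, alpha=0.05, B=200_000, seed=0):
    """Monte Carlo (1 - alpha)-quantile of sum_v lam_v (xi_v^2 - 1),
    with xi_v i.i.d. standard normal and a truncated eigenvalue sequence lams."""
    rng = np.random.default_rng(seed)
    xi2 = rng.standard_normal((B, len(lams))) ** 2
    samples = (xi2 - 1.0) @ np.asarray(lams)
    return float(np.quantile(samples, 1 - alpha))

lams = [0.5, 0.25, 0.125, 0.0625]       # placeholder eigenvalues, for illustration only
print(weighted_chi2_quantile(lams))     # approximate critical value q_{mu,1-alpha}
```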
Theorem 5.2. The conclusion of Theorem 5.1 still holds if condition (5.6) is replaced by

f_k is uniformly bounded and almost everywhere continuous,  k = 1, 2.   (5.7)

Proposition 5.2 (Examples). For the kernel functions considered in Proposition 2.3(b)–(e), conditions (5.1), (5.4), and (5.7) are satisfied as soon as X_1 ∼ P_{X_1} ∈ P^ac_{d_1} is independent of X_2 ∼ P_{X_2} ∈ P^ac_{d_2} and the scores J_1, J_2 are weakly regular. If, in addition, J_1 and J_2 also are square-integrable (viz., ∫₀¹ J²_k(u) du < ∞, k = 1, 2), then conditions (5.1), (5.4), and (5.6) are satisfied for the kernel in Proposition 2.3(a).

Combining Theorem 5.1 with Proposition 5.1 yields the limiting null distribution of W~^(n)_µ.

Proposition 5.3 (Limiting null distribution). Suppose the conditions in Proposition 5.1 and Theorem 5.1 hold. Then, for µ = µ_{f_1,f_2,H^m_*} with m ≥ 4, under the hypothesis H_0 that X_1 ∼ P_{X_1} ∈ P^ac_{d_1} and X_2 ∼ P_{X_2} ∈ P^ac_{d_2} are independent,

nW~^(n)_µ = nW~^(n)_{J_1,J_2,µ_{f_1,f_2,H^m_*}} ⇝ ∑^∞_{v=1} λ_{µ,v}(ξ²_v − 1)   (5.8)

with [λ_{µ,v}]^∞_{v=1} and [ξ_v]^∞_{v=1} as defined in Proposition 5.1.

Let α ∈ (0, 1) be any pre-specified significance level, and let W~^(n)_µ be as in Theorem 5.1. It follows from (5.8), recalling also the discussion right after Proposition 5.1, that the sequence of tests

T^(n)_{µ,α} := 1(nW~^(n)_µ > q_{µ,1−α}),

rejecting the null hypothesis H_0 of independence whenever nW~^(n)_µ exceeds

q_{µ,1−α} := inf{x ∈ R : P(∑^∞_{v=1} λ_{µ,v}(ξ²_v − 1) ≤ x) ≥ 1 − α},   (5.9)

where [λ_{µ,v}]^∞_{v=1} and [ξ_v]^∞_{v=1} are as in Proposition 5.1, has asymptotic level α irrespective of P_{X_1} ∈ P^ac_{d_1} and P_{X_2} ∈ P^ac_{d_2}. The following proposition summarizes the properties of T^(n)_{µ,α}.

Proposition 5.4 (Uniform validity and consistency). Let the score functions J_1 and J_2 be weakly regular and let µ = µ_{f_1,f_2,H^m_*} denote a GSC with m ≥ 4 such that Condition (5.1) and one of (5.6) and (5.7) hold. Then,

lim_{n→∞} P(T^(n)_{µ,α} = 1) = α

for any P ∈ P^ac_{d_1} ⊗ P^ac_{d_2}, i.e., any P such that X_1 ∼ P_{X_1} ∈ P^ac_{d_1} and X_2 ∼ P_{X_2} ∈ P^ac_{d_2} are independent. Furthermore, by exact distribution-freeness (Proposition 4.2(i)),

lim_{n→∞} sup_{P ∈ P_{d_1} ⊗ P_{d_2}} P(T^(n)_{µ,α} = 1) = α.

If, in addition, the pair of kernels (f_1, f_2) is D-consistent, the score functions J_1, J_2 are strictly monotone, and (4.6) holds, then, for any fixed alternative P_{(X_1,X_2)} ∈ P^ac_{d_1,d_2,∞} defined in (4.4),

lim_{n→∞} P(T^(n)_{µ,α} = 1) = 1.

5.2 Local power analysis

In this section, we investigate the power of the proposed tests from an asymptotic minimax perspective. To this end, we consider parametrized families of alternatives extending the so-called bivariate Konijn alternatives (Konijn, 1956).
These alternatives are classical in the context of testing for multivariate independence, where they also have been considered by Gieser (1993), Gieser and Randles (1997), Taskinen et al. (2003, 2004, 2005), and Hallin and Paindaveine (2008). Within these families, we establish the rate-optimality (the rate here is the usual root-n rate) of our tests. Other families of bivariate alternatives also have been considered (see Kössler and Rödel (2007) for a survey and Dhar et al. (2016) for a more recent result) and similarly could be extended to the multivariate case; as far as rate-optimality is concerned, the results and the proofs would be quite similar, though.

Konijn families are constructed as follows. Let X*_1 ∼ P_{X*_1} ∈ P^ac_{d_1} and X*_2 ∼ P_{X*_2} ∈ P^ac_{d_2} be two (without loss of generality) mean-zero (unobserved) independent random vectors with densities q_1 and q_2, respectively; let G*_{1,±} and G*_{2,±} denote their respective population scored center-outward distribution functions, P_{X*} ∈ P^ac_{d_1+d_2} their joint distribution, and q_{X*}(x) = q_{X*}((x_1, x_2)) = q_1(x_1)q_2(x_2) their joint density. Define, for δ ∈ R,

X = (X_1, X_2) := [ I_{d_1}  δM_1 ; δM_2  I_{d_2} ] (X*_1, X*_2) = A_δ (X*_1, X*_2) = A_δ X*,   (5.10)

where M_1 ∈ R^{d_1×d_2} and M_2 ∈ R^{d_2×d_1} are two deterministic matrices. For δ = 0, the matrix A_δ is the identity and, thus, is invertible. It follows by continuity that A_δ is also invertible for δ in a sufficiently small neighborhood Θ of 0. For δ ∈ Θ, the density of X can be expressed as

q_X(x; δ) = |det(A_δ)|^{-1} q_{X*}(A^{-1}_δ x),

which is differentiable with respect to δ. Let L(x; δ) := q_X(x; δ)/q_X(x; 0) and L'(x; δ) := (∂/∂δ)L(x; δ).
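A minimal data-generating sketch for (5.10), with independent standard normal X*_1, X*_2 and randomly chosen mixing matrices M_1, M_2 — all illustrative choices of ours; in the local analysis below, δ is taken of the form n^{-1/2}δ_0.

```python
import numpy as np

def konijn_sample(n, M1, M2, delta, rng):
    """Draw n copies of X = A_delta X* as in (5.10), with independent standard
    normal X1* in R^{d1} and X2* in R^{d2} (an illustrative choice of P_{X*})."""
    d1, d2 = M1.shape[0], M2.shape[0]
    X1s = rng.standard_normal((n, d1))
    X2s = rng.standard_normal((n, d2))
    X1 = X1s + delta * X2s @ M1.T        # X1 = X1* + delta * M1 X2*
    X2 = delta * X1s @ M2.T + X2s        # X2 = delta * M2 X1* + X2*
    return X1, X2

rng = np.random.default_rng(5)
d1, d2, n = 2, 3, 500
M1 = rng.standard_normal((d1, d2))
M2 = rng.standard_normal((d2, d1))
X1, X2 = konijn_sample(n, M1, M2, delta=1.0 / np.sqrt(n), rng=rng)
```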
The following additional assumptions will be made on the generating scheme (5.10).

Assumption 5.1.
It is assumed that
(i) the distributions of X have a common support for all δ ∈ Θ so that, without loss of generality, we can assume that X := {x : q_X(x; δ) > 0} does not depend on δ;
(ii) the gradient ∇q_{X*} of x ↦ q_{X*}(x) exists almost everywhere over X with E[∇q_{X*}(X*)/q_{X*}(X*)] = 0;
(iii) the Fisher information I_X(0) := E[L'(X; 0)²] = ∫ (L'(x; 0))² q_X(x; 0) dx of X relative to δ at δ = 0 is finite and strictly positive.

Proposition 5.5 (Examples). Suppose that one of the following two conditions holds:
(i) X*_1 and X*_2 are elliptical with centers 0_{d_1} and 0_{d_2} and covariances Σ_1 and Σ_2, respectively, that is, q_k(x_k) ∝ φ_k(x_k⊤Σ^{-1}_k x_k), k = 1, 2, where φ_k is chosen such that Var(X*_k) = Σ_k and

E[‖Z*_k‖² ρ²_k(‖Z*_k‖²)] < ∞,  k = 1, 2,   (5.11)

where ρ_k(t) := φ'_k(t)/φ_k(t) and Z*_k has density function proportional to φ_k(‖z_k‖²);
(ii) X*_1 and X*_2 are centered multivariate normal or follow centered multivariate t-distributions with degrees of freedom (not necessarily integer-valued) strictly greater than two.
Then Assumption 5.1 is satisfied for any M_1, M_2 such that Σ_1M_2⊤ + M_1Σ_2 ≠ 0.

For a local power analysis, we consider a sequence of local alternatives H^(n)_1 indexed by parameter values δ^(n) := n^{-1/2}δ_0 with δ_0 ≠ 0. In this local model, testing the null hypothesis of independence reduces to testing H_0 : δ_0 = 0 versus H_1 : δ_0 ≠ 0. For given P_{X*_1}, P_{X*_2}, M_1, and M_2, we obtain the following results on the power of the tests T^(n)_{µ,α}.

Theorem 5.3 (Power analysis). Consider a GSC µ = µ_{f_1,f_2,H^m_*} with m ≥ 4 and weakly regular score functions J_1, J_2. Assume that the kernel functions f_1, f_2 are picked from Proposition 2.3 and that the score functions J_1, J_2 satisfy the conditions in Proposition 5.2. Then, if Assumption 5.1 holds, for any β > 0 there exists some sufficiently large constant C_β > 0, only depending on β, such that for all n large enough,

inf_{|δ_0| ≥ C_β} P_{A_{δ^(n)}}(T^(n)_{µ,α} = 1) ≥ 1 − β,

where the infimum is taken over all distributions P_{A_{δ^(n)}} such that |δ_0| ≥ C_β.

Combined with the following result, the theorem yields minimax rate-optimality of the proposed tests against local alternatives.
Theorem 5.4 (Rate-optimality). For any number $\beta > 0$ satisfying $\alpha + \beta < 1$, there exists an absolute constant $c_\beta > 0$ such that, for all sufficiently large $n$,
$$
\inf_{T^{(n)}_\alpha \in \mathcal{T}^{(n)}_\alpha}\ \sup_{|\delta_0| \geq c_\beta} \mathrm{P}_{A_{\delta^{(n)}}}\big(T^{(n)}_\alpha = 0\big) \geq 1 - \alpha - \beta.
$$
Here the infimum is taken over all size-$\alpha$ tests, and the supremum over all distributions $P_{A_{\delta^{(n)}}}$ considered in Theorem 5.3 with $|\delta_0| \geq c_\beta$.

We emphasize that, although Theorem 5.3 only considers the specific examples of Example 4.1, the proof technique generalizes to further cases. A general version of the theorem would, however, involve many highly technical conditions and is therefore omitted for readability.

We conclude this section by summarizing our results for the (sample) center-outward GSCs in Example 4.1. Table 1 gives, for each of them, an overview of the five desirable properties listed in the Introduction. It also indicates consistency of the tests. In all cases, it is assumed that the score functions involved are weakly regular.
This section illustrates the potential benefits of a non-trivial choice of score functions. The tests used in Shi et al. (2020) correspond to using Wilcoxon scores with distance covariance. Instead, we explore here the use of normal scores. The specific example we consider is a Gaussian experiment borrowed from Example 6.1 in Shi et al. (2020).
Example 5.1.
The data are a sample of $n$ independent copies of the multivariate normal vector $(X_1, X_2)$ in $\mathbb{R}^{d_1+d_2}$, with mean zero and covariance matrix $\Sigma_{\tau,\rho} \in \mathbb{R}^{(d_1+d_2)\times(d_1+d_2)}$ given by
$$
\Sigma_{ij} = \Sigma_{ji} = \begin{cases} 1, & i = j, \\ \tau, & i = 1,\ j = 2, \\ \rho, & i = 1,\ j = d_1 + 1, \\ 0, & \text{otherwise}. \end{cases}
$$
For $\tau$, which is a within-group correlation, we consider three values: (a) $\tau = 0$, (b) a moderate value, and (c) a high value (cf. the captions of Figures 1–3). Independence holds if and only if $\rho$, a between-group correlation, is zero.

Table 1: Properties of the center-outward GSCs in Example 4.1 with weakly regular score functions $J_k$. [Columns: $\smash{W^{\sim(n)}_{\mu}}$, $\smash{W^{\sim(n)}_{\mathrm{dCov}}}$, $\smash{W^{\sim(n)}_{M}}$, $\smash{W^{\sim(n)}_{D}}$, $\smash{W^{\sim(n)}_{R}}$, $\smash{W^{\sim(n)}_{\tau^*}}$. Rows: (1) distribution-freeness, (2) transformation invariance (orthogonal transformations, shifts, and global scales; shifts and global scales only for the distance-covariance version), (3) D-consistency, (3') consistency of the test, (4) efficiency, and the computational cost of exact and fast approximate algorithms, with footnotes specifying the relevant families of distributions and the conditions on the score functions $J_k$.]
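For concreteness, a minimal sketch of the covariance matrix $\Sigma_{\tau,\rho}$ of Example 5.1 and of sampling from the corresponding Gaussian model is given below; the particular values of $\tau$, $\rho$, $d_1$, $d_2$, and the sample size are illustrative and are not meant to reproduce the exact simulation settings.

```python
import numpy as np

def sigma_tau_rho(d1, d2, tau, rho):
    """Covariance matrix Sigma_{tau,rho} of Example 5.1: unit variances,
    within-group correlation tau between the first two coordinates of X1,
    between-group correlation rho between coordinate 1 of X1 and
    coordinate 1 of X2, and zeros elsewhere."""
    Sigma = np.eye(d1 + d2)
    Sigma[0, 1] = Sigma[1, 0] = tau
    Sigma[0, d1] = Sigma[d1, 0] = rho
    return Sigma

# Illustrative settings (not the exact values used in the figures).
d1 = d2 = 3
Sigma = sigma_tau_rho(d1, d2, tau=0.5, rho=0.1)
rng = np.random.default_rng(1)
X = rng.multivariate_normal(np.zeros(d1 + d2), Sigma, size=864)
X1, X2 = X[:, :d1], X[:, d1:]   # X1 and X2 are independent iff rho = 0
```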
Figures 1, 2, and 3 below report the rejection frequencies observed in the simulations, carried out at a fixed nominal significance level. The sample size is chosen as $n \in \{216, 432, 864, 1728\}$, the dimensions $d_1 = d_2$ take four values, and the between-group correlation $\rho$ ranges over a grid from $0$ to $0.15$. The permutational critical values for tests (i) and (ii) were computed on the basis of random permutations.

It is evident from the figures that, in this Gaussian experiment, the performance of the normal score–based test (iv) is uniformly better than that of the Wilcoxon score–based one (iii); that superiority increases with the dimension and decreases with the within-group dependence $\tau$. The superiority of both center-outward rank-based tests (iii) and (iv) over the traditional distance covariance test and its marginal rank version is quite significant for high values of the within-group correlation $\tau$.

In this paper we have given a general framework to develop dependence measures and associated independence tests that leverage the new concept of center-outward ranks and signs. The resulting tests are strictly distribution-free, hence can be implemented in relatively small samples. Via the use of a flexible class of generalized symmetric covariances and the incorporation of score functions, our framework allows for a variety of consistent dependence measures. This, as our numerical experiments demonstrate, can lead to significant gains in power.

The theory we develop facilitates the derivation of asymptotic distributions yielding easily computable approximate critical values. The key result is an asymptotic representation that also allows us to establish, for the first time, a rate-optimality property for tests based on center-outward ranks and signs.

While our theory covers a wide range of settings, there remain important issues to be resolved in future work. In particular, the current theory (cf. Theorem 4.1) has not yet confirmed the consistency of the normal-scores based test (against all types of dependence), although such consistency is to be expected.
Figure 1: Empirical powers of the four competing tests in Example 5.1(a) ($\tau = 0$, no within-group correlation). The $y$-axis represents rejection frequencies based on 1,000 replicates, the $x$-axis represents $\rho$ (the between-group correlation), and the blue, green, red, and gold lines represent the performance of (i) Székely and Rizzo's original distance covariance test, (ii) Lin's marginal rank version of the distance covariance test, (iii) Shi–Drton–Han's center-outward Wilcoxon version of the distance covariance test, and (iv) the center-outward normal-score version of the distance covariance test, respectively. [Legend abbreviations: (i) SR, (ii) Lin, (iii) SDH, (iv) SHDH; panels correspond to $n = 216, 432, 864, 1728$ and four $(p, q)$ dimension settings.]

Figure 2: Empirical powers of the four competing tests in Example 5.1(b) (moderate within-group correlation). Axes, colors, legend, and panels are as in Figure 1.
Figure 3: Empirical powers of the four competing tests in Example 5.1(c) (high within-group correlation). Axes, colors, legend, and panels are as in Figure 1.

Proofs
Some further concepts and notation concerning U-statistics are needed in this section. For any symmetric kernel $h$ of order $m$, any integer $\ell \in \llbracket m \rrbracket$, and any probability measure $P_Z$, recall the definition
$$
h_\ell(z_1, \ldots, z_\ell; P_Z) := \mathrm{E}\, h(z_1, \ldots, z_\ell, Z_{\ell+1}, \ldots, Z_m)
$$
of the $\ell$-th projection of the kernel, and define its completely degenerate version
$$
\widetilde h_\ell(z_1, \ldots, z_\ell; P_Z) := h_\ell(z_1, \ldots, z_\ell; P_Z) - \mathrm{E}\, h - \sum_{k=1}^{\ell-1}\ \sum_{1 \le i_1 < \cdots < i_k \le \ell} \widetilde h_k(z_{i_1}, \ldots, z_{i_k}; P_Z).
$$
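As a small illustration of these projections, the following sketch approximates $h_1$ and $\widetilde h_1$ by Monte Carlo for a toy symmetric kernel of order $m = 2$; the kernel and the reference distribution $P_Z$ are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)

def h(z1, z2):
    # toy symmetric kernel of order m = 2 (illustrative choice)
    return np.abs(z1 - z2)

# Reference distribution P_Z: standard normal (illustrative).
Z = rng.standard_normal(100_000)
Eh = np.mean(h(Z[::2], Z[1::2]))     # E h(Z_1, Z_2)

def h_1(z):
    return np.mean(h(z, Z))          # h_1(z; P_Z) = E h(z, Z_2)

def h_1_tilde(z):
    return h_1(z) - Eh               # first completely degenerate component

print(h_1_tilde(0.0), h_1_tilde(1.5))
```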
The cardinality of a set $S$ is denoted as $\mathrm{card}(S)$ and its complement as $S^\complement$. We use $\Rightarrow$ to denote uniform convergence of functions. The cumulative distribution function and probability density function of the univariate standard normal distribution are denoted by $\Phi$ and $\varphi$, respectively. Let $\|X\|_{L_r} := (\mathrm{E}|X|^r)^{1/r}$ stand for the $L_r$-norm of a random variable $X$. We use $\smash{\overset{L_r}{\longrightarrow}}$ to denote convergence of random variables in the $r$-th mean. For random vectors $X_n, X \in \mathbb{R}^d$, we write $X_n \overset{L_r}{\longrightarrow} X$ if $\|X_n - X\| \overset{L_r}{\longrightarrow} 0$. Let $(\mathcal{X}, \mathcal{A})$ be a measurable space, and let $\mathrm{P}$ and $\mathrm{Q}$ be two probability measures on $(\mathcal{X}, \mathcal{A})$: we write $\mathrm{P} \ll \mu$ and $\mathrm{Q} \ll \mu$ if $\mathrm{P}$ and $\mathrm{Q}$ are absolutely continuous with respect to a $\sigma$-finite measure $\mu$ on $(\mathcal{X}, \mathcal{A})$. The total variation and Hellinger distances between $\mathrm{Q}$ and $\mathrm{P}$ are denoted as $\mathrm{TV}(\mathrm{Q}, \mathrm{P}) := \sup_{A \in \mathcal{A}} |\mathrm{Q}(A) - \mathrm{P}(A)|$ and $\mathrm{HL}(\mathrm{Q}, \mathrm{P}) := \{\int (1 - \sqrt{\mathrm{d}\mathrm{Q}/\mathrm{d}\mathrm{P}})^2\, \mathrm{d}\mathrm{P}\}^{1/2}$, respectively. We write $\mathrm{Q}^{(n)} \triangleleft \mathrm{P}^{(n)}$ for "$\mathrm{Q}^{(n)}$ is contiguous to $\mathrm{P}^{(n)}$".
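As a toy illustration of the two distances just recalled (with the Hellinger definition as reconstructed above; the discrete distributions below are purely illustrative):

```python
import numpy as np

# Two discrete distributions on a common three-point space (illustrative).
P = np.array([0.5, 0.3, 0.2])
Q = np.array([0.4, 0.4, 0.2])

TV = 0.5 * np.abs(Q - P).sum()                         # sup_A |Q(A) - P(A)|
HL = np.sqrt(np.sum((1.0 - np.sqrt(Q / P)) ** 2 * P))  # {sum (1 - sqrt(dQ/dP))^2 dP}^{1/2}
print(TV, HL)   # here TV <= HL, the bound used in the proof of Theorem 5.4
```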
Proof of Proposition 2.1. The proof is entirely similar to the proof of Proposition 2 in Weihs et al. (2018) and hence omitted.
Proof of Proposition 2.2.
Items (a)–(d) are borrowed from Weihs et al. (2018, Proposition 1).
Proof of Proposition 2.3.
Item (a) is stated in Bergsma and Dassios (2014, Sec. 3.4). Item (b) is given in Weihs et al. (2018, Proposition 1). Item (c) can be proved using Equation (3) in Zhu et al. (2017). Items (d) and (e) can be proved using Theorems 7.2 and 7.3 in Kim et al. (2019), respectively.
Proof of Lemma 2.1.
Provided that E[ f ] and E[ f ] exist and are finite, we have E (cid:104) k f ,f ,H m ∗ (cid:16) ( X , X ) , . . . , ( X m , X m ) (cid:17)(cid:105) = E (cid:110) (cid:88) σ ∈ H sgn( σ ) f ( X σ (1) , . . . , X σ ( m ) ) (cid:111)(cid:110) (cid:88) σ ∈ H sgn( σ ) f ( X σ (1) , . . . , X σ ( m ) ) (cid:111) = E (cid:110) f ( X , X , X , X , X , . . . , X m ) − f ( X , X , X , X , X , . . . , X m ) − f ( X , X , X , X , X , . . . , X m ) + f ( X , X , X , X , X , . . . , X m ) (cid:111) × (cid:110) f ( X , X , X , X , X , . . . , X m ) − f ( X , X , X , X , X , . . . , X m ) − f ( X , X , X , X , X , . . . , X m ) + f ( X , X , X , X , X , . . . , X m ) (cid:111) . The result follows.
Proof of Theorem 2.1.
Hoeffding (1948, Theorem 3.1) and Yanagimoto (1970, Proposition 3) provethe D-consistency of the pairs of kernels used in Proposition 2.2(b) within the class P ac d + d . Thecorresponding result for the kernels in Proposition 2.2(c) is proved on page 490 of Blum et al. (1961),and that for 2.2(d) is given in Bergsma and Dassios (2014, Theorem 1).The D-consistency, again in the class P ac d + d , of the pairs of kernels used in Proposition 2.3(a)has been shown in Székely et al. (2007, Theorem 3(i)), Lyons (2013, Theorem 3.11) and Lyons(2018, Item (iv)). The result for 2.3(b) is given in Weihs et al. (2018, Theorem 1), that for 2.3(c) inZhu et al. (2017, Proposition 1(i)), and that for 2.3(d) and 2.3(e) in Kim et al. (2019, p. 24–25). Proof of Lemma 2.2.
The lemma directly follows from the definition of µ f ,f ,H (cf. Definition 2.1)and the fact that f and f are both orthogonally invariant (cf. Condition (2.2)). Proof of Proposition 2.4.
To verify that the kernels used in Proposition 2.3(a),(c)–(e) are orthogonally invariant, it suffices to notice that $Ow - Ov = O(w - v)$, $(Ow)^\top(Ov) = w^\top O^\top O v = w^\top v$, and $\|Ow\| = \sqrt{w^\top w} = \|w\|$ for any orthogonal matrix $O \in \mathbb{R}^{d \times d}$ and $w, v \in \mathbb{R}^d$.

Proof of Proposition 3.1.
We give an independent proof of part (iii). In view of Definition 3.1, thereexists a convex function Ψ such that F Z ± = ∇ Ψ . It is obvious that F v + a O Z ± defined implicitly by F v + a O Z ± ( v + a O z ) = OF Z ± ( z ) , satisfies (ii) and (iii) in Definition 3.1. It only remains, thus, to construct a convex function Ψ ∗ such that F v + a O Z ± = ∇ Ψ ∗ . Noting that F v + a O Z ± ( z ) = OF Z ± ( a − O − ( z − v )) , it is easy to check27hat z (cid:55)→ Ψ ∗ ( z ) := a Ψ (cid:0) a − O − ( z − v ) (cid:1) is convex, and thus continuous and almost everywheredifferentiable, with ∇ Ψ ∗ ( v + a O Z ) = O ∇ Ψ( z ) . Proof of Proposition 4.1.
Part (ii) is trivial. We next prove part (i). The function $u \mapsto \big(F^{-1}_{\chi^2_d}(u)\big)^{1/2}$ is continuous over $[0, 1)$, and
$$
\int_0^1 \Big(\big(F^{-1}_{\chi^2_d}(u)\big)^{1/2}\Big)^2 \mathrm{d}u = \int_0^1 F^{-1}_{\chi^2_d}(u)\, \mathrm{d}u = \mathrm{E}\big[F^{-1}_{\chi^2_d}(U)\big] = d,
$$
where $U$ is uniformly distributed over $[0, 1]$, and thus $F^{-1}_{\chi^2_d}(U)$ is chi-square distributed with $d$ degrees of freedom and expectation $d$. Hence, $J_{\mathrm{vdW}}(u)$ is weakly regular; it is not strongly regular, however, since it is unbounded.
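The computation $\mathrm{E}[F^{-1}_{\chi^2_d}(U)] = d$ underlying the weak regularity of the van der Waerden score can be checked numerically; the sketch below does so by Monte Carlo for an illustrative dimension $d$.

```python
import numpy as np
from scipy.stats import chi2

d = 4                                # illustrative dimension
rng = np.random.default_rng(3)
U = rng.uniform(size=1_000_000)

# (J_vdW(U))^2 = F^{-1}_{chi2_d}(U); its mean should equal d, i.e. the
# squared L2([0,1]) norm of J_vdW equals the chi2_d expectation.
J_vdW_squared = chi2.ppf(U, df=d)
print(J_vdW_squared.mean())          # approximately d
```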
Proof of Proposition 4.2(i). This follows immediately from Proposition 3.1(iv) and the independence between $[G^{(n)}_{1,\pm}(X_{1i})]_{i=1}^n$ and $[G^{(n)}_{2,\pm}(X_{2i})]_{i=1}^n$ under the null hypothesis.

Proof of Proposition 4.2(ii).
The desired result follows from combining Lemma 2.2 and Proposi-tion 3.1(iii).
Proof of Proposition 4.2(iii).
We only prove the D-consistency part. Using Lemma 2.1, it remains to prove that the independence of $G_{1,\pm}(X_1)$ and $G_{2,\pm}(X_2)$ implies the independence of $X_1$ and $X_2$. Notice that $F_\pm$ is $P$-almost surely invertible for any $P \in \mathcal{P}^{ac}_d$ (Ambrosio et al., 2008, Section 6.2.3 and Remark 6.2.11), and so is $G_\pm$. The independence claim follows.
The main idea of the proof consists in bounding | W ∼ ( n ) µ − W µ | . Let Y ( n ) ki and Y ki stand for G ( n ) k, ± ( X ki ) and G k, ± ( X ki ) , respectively. Notice that W ∼ ( n ) J ,J ,µ f ,f ,H = ( n ) − m (cid:88) [ i ,...,i m ] ∈ I nm k f ,f ,H (cid:16) ( Y ( n )1 i , Y ( n )2 i ) , . . . , ( Y ( n )1 i m , Y ( n )2 i m ) (cid:17) ,W J ,J ,µ f ,f ,H = ( n ) − m (cid:88) [ i ,...,i m ] ∈ I nm k f ,f ,H (cid:16) ( Y i , Y i ) , . . . , ( Y i m , Y i m ) (cid:17) , k f ,f ,H (cid:16) ( x , x ) , . . . , ( x m , x m ) (cid:17) := (cid:110) (cid:88) σ ∈ H sgn( σ ) f ( x σ (1) , . . . , x σ ( m ) ) (cid:111)(cid:110) (cid:88) σ ∈ H sgn( σ ) f ( x σ (1) , . . . , x σ ( m ) ) (cid:111) . Since f k ([ Y ( n ) ki (cid:96) ] m(cid:96) =1 ) and f k ([ Y ki (cid:96) ] m(cid:96) =1 ) are almost surely bounded by some constant C J k ,f k , we deduce (cid:12)(cid:12)(cid:12) k f ,f ,H (cid:16) [( Y ( n )1 i (cid:96) , Y ( n )2 i (cid:96) )] m(cid:96) =1 (cid:17) − k f ,f ,H (cid:16) [( Y i (cid:96) , Y i (cid:96) )] m(cid:96) =1 (cid:17)(cid:12)(cid:12)(cid:12) ≤ card( H ) · C J ,f · (cid:88) σ ∈ H (cid:12)(cid:12)(cid:12) f (cid:16) [ Y ( n )2 σ ( i (cid:96) ) ] m(cid:96) =1 (cid:17) − f (cid:16) [ Y σ ( i (cid:96) ) ] m(cid:96) =1 (cid:17)(cid:12)(cid:12)(cid:12) + card( H ) · C J ,f · (cid:88) σ ∈ H (cid:12)(cid:12)(cid:12) f (cid:16) [ Y ( n )1 σ ( i (cid:96) ) ] m(cid:96) =1 (cid:17) − f (cid:16) [ Y σ ( i (cid:96) ) ] m(cid:96) =1 (cid:17)(cid:12)(cid:12)(cid:12) , recalling that card( H ) denotes the number of permutations in the subgroup H . Moreover, (cid:12)(cid:12)(cid:12) W ∼ ( n ) J ,J ,µ f ,f ,H − W J ,J ,µ f ,f ,H (cid:12)(cid:12)(cid:12) ≤ card( H ) · C J ,f · (cid:104) ( n ) − m (cid:88) [ i ,...,i m ] ∈ I nm (cid:12)(cid:12)(cid:12) f (cid:16) [ Y ( n )2 i (cid:96) ] m(cid:96) =1 (cid:17) − f (cid:16) [ Y i (cid:96) ] m(cid:96) =1 (cid:17)(cid:12)(cid:12)(cid:12)(cid:105) + card( H ) · C J ,f · (cid:104) ( n ) − m (cid:88) [ i ,...,i m ] ∈ I nm (cid:12)(cid:12)(cid:12) f (cid:16) [ Y ( n )1 i (cid:96) ] m(cid:96) =1 (cid:17) − f (cid:16) [ Y i (cid:96) ] m(cid:96) =1 (cid:17)(cid:12)(cid:12)(cid:12)(cid:105) a . s . −→ . This, together with the fact that W J ,J ,µ f ,f ,H a . s . −→ µ ± ( X , X ) and the strong consistency ofU-statistics, yields W ∼ ( n ) J ,J ,µ f ,f ,H a . s . −→ µ ± ( X , X ) . We first fix some notation and prove a property that will hold for all GSCs µ and associated kernelfunctions considered in Proposition 2.3(a)–(e). For k = 1 , , let y ( n ) ki = J ( u ∗ ( n ) ki ) , where u ∗ ( n ) ki , i ∈ (cid:74) n (cid:75) are the deterministic points forming the grid G d k n . Writing Y ( n ) ki and Y ki for G ( n ) k, ± ( X ki ) and G k, ± ( X ki ) , respectively, let us show that Ξ ( n ) k := sup ≤ i ≤ n (cid:107) Y ( n ) ki − Y ki (cid:107) a . s . −→ , k = 1 , . (7.1)Recall that, by definition of strong regularity, J k is Lipschitz-continuous with some constant L k ,strictly monotone, and satisfies J k (0) = 0 . Then we immediately have | J k ( u ) | ≤ L k for all u ∈ [0 , ,and thus Y ( n ) ki and Y ki are almost surely bounded by L k . As long as P X k ∈ P d k , in order to provethat Ξ ( n ) k a . s . −→ , it suffices to show that (cid:107) J k ( u k ) − J k ( u k ) (cid:107) ≤ L k (cid:107) u k − u k (cid:107) for any u k , u k ∈ R d k with (cid:107) u k (cid:107) , (cid:107) u k (cid:107) < . Without loss of generality, assume that (cid:107) u k (cid:107) ≤ (cid:107) u k (cid:107) . 
If (cid:107) u k (cid:107) = 0 , theclaim is obvious by noticing | J k ( u ) | ≤ L k u for u ∈ [0 , and then (cid:107) J k ( u k ) (cid:107) ≤ L k (cid:107) u k (cid:107) ; otherwise29e have (cid:107) J k ( u k ) − J k ( u k ) (cid:107) ≤ (cid:13)(cid:13)(cid:13) J k ( u k ) − J k (cid:16) (cid:107) u k (cid:107)(cid:107) u k (cid:107) u k (cid:17)(cid:13)(cid:13)(cid:13) + (cid:13)(cid:13)(cid:13) J k (cid:16) (cid:107) u k (cid:107)(cid:107) u k (cid:107) u k (cid:17) − J k ( u k ) (cid:13)(cid:13)(cid:13) = (cid:12)(cid:12)(cid:12) J k ( (cid:107) u k (cid:107) ) − J k ( (cid:107) u k (cid:107) ) (cid:12)(cid:12)(cid:12) + J k ( (cid:107) u k (cid:107) ) (cid:107) u k (cid:107) · (cid:13)(cid:13)(cid:13) (cid:107) u k (cid:107)(cid:107) u k (cid:107) u k − u k (cid:13)(cid:13)(cid:13) ≤ L k (cid:12)(cid:12)(cid:12) (cid:107) u k (cid:107) − (cid:107) u k (cid:107) (cid:12)(cid:12)(cid:12) + L k (cid:13)(cid:13)(cid:13) (cid:107) u k (cid:107)(cid:107) u k (cid:107) u k − u k (cid:13)(cid:13)(cid:13) ≤ L k (cid:107) u k − u k (cid:107) . This completes the proof of (7.1). h = h dCov ) Proof of Theorem 4.1 ( h = h dCov ). Recall that f dCov1 ([ w i ] i =1 ) = 12 (cid:107) w − w (cid:107) and f dCov2 ([ w i ] i =1 ) = 12 (cid:107) w − w (cid:107) , with possibly different dimension for the inputs. Now, (cid:107) Y ( n ) ki − Y ( n ) ki (cid:107) and (cid:107) Y ki − Y ki (cid:107) arealmost surely bounded by L k , since Y ( n ) ki and Y ki are. Next, (cid:12)(cid:12)(cid:12) (cid:107) Y ( n ) ki − Y ( n ) ki (cid:107) − (cid:107) Y ki − Y ki (cid:107) (cid:12)(cid:12)(cid:12) ≤ (cid:107) Y ( n ) ki − Y ki (cid:107) + 12 (cid:107) Y ( n ) ki − Y ki (cid:107) ≤ sup ≤ i ≤ n (cid:107) Y ( n ) ki − Y ki (cid:107) , and we deduce that ( n ) − (cid:88) [ i ,...,i ] ∈ I n (cid:12)(cid:12)(cid:12) (cid:107) Y ( n ) ki − Y ( n ) ki (cid:107) − (cid:107) Y ki − Y ki (cid:107) (cid:12)(cid:12)(cid:12) ≤ sup ≤ i ≤ n (cid:107) Y ( n ) ki − Y ki (cid:107) a . s . −→ . Both conditions in (4.5) are satisfied, and the proof is thus completed. h = h M ) Proof of Theorem 4.1 ( h = h M ). Recall that f M ([ w i ] i =1 ) = f M ([ w i ] i =1 ) = ( w , w (cid:22) w ) , upto a change in input dimension for the two functions. It is obvious that f k ( { Y ( n ) ki (cid:96) } m(cid:96) =1 ) and f k ( { Y ki (cid:96) } m(cid:96) =1 ) are almost surely bounded. Next we verify the second condition in (4.5).We have for k = 1 , , (cid:12)(cid:12)(cid:12) ( Y ( n ) ki , Y ( n ) ki (cid:22) Y ( n ) ki ) − ( Y ki , Y ki (cid:22) Y ki ) (cid:12)(cid:12)(cid:12) ≤ ( B (cid:123) k ; i ,i ,i ,i ,i ) , where B k ; i ,i ,i ,i ,i := (cid:110) (cid:107) Y ( n ) ki − Y ( n ) ki (cid:107) ≥ ( n ) k , (cid:107) Y ( n ) ki − Y ( n ) ki (cid:107) ≥ ( n ) k (cid:111) . Accordingly, ( n ) − (cid:88) [ i ,...,i ] ∈ I n (cid:12)(cid:12)(cid:12) ( Y ( n ) ki , Y ( n ) ki (cid:22) Y ( n ) ki ) − ( Y ki , Y ki (cid:22) Y ki ) (cid:12)(cid:12)(cid:12) ≤ ( n ) − card (cid:110) [ i , i , i ] ∈ I n : (cid:107) Y ( n ) ki − Y ( n ) ki (cid:107) < ( n ) k or (cid:107) Y ( n ) ki − Y ( n ) ki (cid:107) < ( n ) k (cid:111) = ( n ) − card (cid:110) [ i , i , i ] ∈ I n : (cid:107) y ( n ) ki − y ( n ) ki (cid:107) < ( n ) k or (cid:107) y ( n ) ki − y ( n ) ki (cid:107) < ( n ) k (cid:111) a . s . −→ , (7.2)which completes the proof. 30 .3.3.3 Proof of Theorem 4.1 ( h = h D ) Proof of Theorem 4.1 ( h = h D ). Recall that f D ([ w i ] i =1 ) = f D ([ w i ] i =1 ) = Arc ( w − w , w − w ) ,up to a change in input dimension for the two functions. 
Obviously, f k ([ Y ( n ) ki (cid:96) ] m(cid:96) =1 ) and f k ([ Y ki (cid:96) ] m(cid:96) =1 ) are almost surely bounded. To verify the second condition in (4.5), we start by bounding thedifference between Arc ( Y ( n ) ki − Y ( n ) ki , Y ( n ) ki − Y ( n ) ki ) and Arc ( Y ki − Y ki , Y ki − Y ki ) .For k = 1 , , consider ( y k , y k , y k ) ∈ ( R d k ) such that min {(cid:107) y k − y k (cid:107) , (cid:107) y k − y k (cid:107)} ≥ η and ζ ≤ Arc ( y k − y k , y k − y k ) ≤ − ζ, where η and ζ will be specified later on. For ( y (cid:48) k , y (cid:48) k , y (cid:48) k ) ∈ ( R d k ) satisfying (cid:107) y ki − y (cid:48) ki (cid:107) ≤ δ for i = 1 , , , Arc ( y k − y k , y k − y (cid:48) k ) ≤ π arcsin (cid:107) y k − y (cid:48) k (cid:107)(cid:107) y k − y k (cid:107) ≤ π arcsin δη , Arc ( y k − y (cid:48) k , y (cid:48) k − y (cid:48) k ) ≤ π arcsin (cid:107) y k − y (cid:48) k (cid:107)(cid:107) y k − y (cid:48) k (cid:107) ≤ π arcsin δη − δ , Arc ( y k − y k , y k − y (cid:48) k ) ≤ π arcsin (cid:107) y k − y (cid:48) k (cid:107)(cid:107) y k − y k (cid:107) ≤ π arcsin δη , and Arc ( y k − y (cid:48) k , y (cid:48) k − y (cid:48) k ) ≤ π arcsin (cid:107) y k − y (cid:48) k (cid:107)(cid:107) y k − y (cid:48) k (cid:107) ≤ π arcsin δη − δ . Assuming that π (cid:16) δη + 2 arcsin δη − δ (cid:17) ≤ ζ, (7.3)we obtain | Arc ( y k − y k , y k − y k ) − Arc ( y (cid:48) k − y (cid:48) k , y (cid:48) k − y (cid:48) k ) | ≤ π (cid:16) δη + 2 arcsin δη − δ (cid:17) . For δ ≤ / , take η = √ δ and ζ = 3 √ δ/ such that (7.3) holds, π (cid:16) δη + 2 arcsin δη − δ (cid:17) = 12 π (cid:16) √ δ + 2 arcsin √ δ − √ δ (cid:17) ≤ π (cid:16) √ δ + 2 arcsin 2 √ δ (cid:17) ≤ π (cid:16) π √ δ + 2 π √ δ ) (cid:17) = 32 √ δ = ζ. It follows that for δ ≤ / and ( y k , y k , y k ) , ( y (cid:48) k , y (cid:48) k , y (cid:48) k ) ∈ ( R d k ) such that min {(cid:107) y k − y k (cid:107) , (cid:107) y k − y k (cid:107)} ≥ √ δ, √ δ ≤ Arc ( y k − y k , y k − y k ) ≤ − √ δ, and (cid:107) y ki − y (cid:48) ki (cid:107) ≤ δ for i = 1 , , , we have | Arc ( y k − y k , y k − y k ) − Arc ( y (cid:48) k − y (cid:48) k , y (cid:48) k − y (cid:48) k ) | ≤ √ δ. k = 1 , , (cid:12)(cid:12)(cid:12) Arc ( Y ( n ) ki − Y ( n ) ki , Y ( n ) ki − Y ( n ) ki ) − Arc ( Y ki − Y ki , Y ki − Y ki ) (cid:12)(cid:12)(cid:12) ≤ (cid:113) Ξ ( n ) k · ( A k ; i ,i ,i ,i ,i ) + (cid:16)
12 + 12 (cid:17) · ( A (cid:123) k ; i ,i ,i ,i ,i ) ≤ (cid:113) Ξ ( n ) k + ( A (cid:123) k ; i ,i ,i ,i ,i ) , where A k ; i ,i ,i ,i ,i := (cid:110) Ξ ( n ) k ≤ , (cid:107) Y ( n ) ki − Y ( n ) ki (cid:107) ≥ (cid:113) Ξ ( n ) k , (cid:107) Y ( n ) ki − Y ( n ) ki (cid:107) ≥ (cid:113) Ξ ( n ) k , and (cid:113) Ξ ( n ) k ≤ Arc ( Y ( n ) ki − Y ( n ) ki , Y ( n ) ki − Y ( n ) ki ) ≤ − (cid:113) Ξ ( n ) k (cid:111) , and, accordingly, ( n ) − (cid:88) [ i ,...,i ] ∈ I n (cid:12)(cid:12)(cid:12) Arc ( Y ( n ) ki − Y ( n ) ki , Y ( n ) ki − Y ( n ) ki ) − Arc ( Y ki − Y ki , Y ki − Y ki ) (cid:12)(cid:12)(cid:12) ≤ (cid:16) (cid:113) Ξ ( n ) k + (cid:110) Ξ ( n ) k > (cid:111) + ( n ) − card (cid:110) [ i , i , i ] ∈ I n : (cid:107) Y ( n ) ki − Y ( n ) ki (cid:107) < (cid:113) Ξ ( n ) k , or (cid:107) Y ( n ) ki − Y ( n ) ki (cid:107) < (cid:113) Ξ ( n ) k , or Arc ( Y ( n ) ki − Y ( n ) ki , Y ( n ) i − Y ( n ) ki ) ∈ (cid:104) , (cid:113) Ξ ( n ) k (cid:17) ∪ (cid:16) − (cid:113) Ξ ( n ) k , (cid:105)(cid:111)(cid:17) = 12 (cid:16) (cid:113) Ξ ( n ) k + (cid:110) Ξ ( n ) k > (cid:111) + ( n ) − card (cid:110) [ i , i , i ] ∈ I n : (cid:107) y ( n ) ki − y ( n ) ki (cid:107) < (cid:113) Ξ ( n ) k , or (cid:107) y ( n ) ki − y ( n ) ki (cid:107) < (cid:113) Ξ ( n ) k , or Arc ( y ( n ) ki − y ( n ) ki , y ( n ) ki − y ( n ) ki ) ∈ (cid:104) , (cid:113) Ξ ( n ) k (cid:17) ∪ (cid:16) − (cid:113) Ξ ( n ) k , (cid:105)(cid:111)(cid:17) . (7.4)Since, for any sequence [ δ ( n ) ] ∞ n =1 tending to , it holds that ( n ) − card (cid:110) [ i , i , i ] ∈ I n : (cid:107) y ( n ) ki − y ( n ) ki (cid:107) < (cid:112) δ ( n ) , or (cid:107) y ( n ) ki − y ( n ) ki (cid:107) < (cid:112) δ ( n ) , or Arc ( y ( n ) ki − y ( n ) ki , y ( n ) ki − y ( n ) ki ) ∈ (cid:104) , (cid:112) δ ( n ) (cid:17) ∪ (cid:16) − (cid:112) δ ( n ) , (cid:105)(cid:111) → , we have shown that (7.4) converges to almost surely. This completes the proof. h = h R , h τ ∗ ) Proof of Theorem 4.1 ( h = h R , h τ ∗ ). The proof is similar to the proof of Theorem 4.1 ( h = h D )and hence omitted. Proof of Proposition 4.4.
We only illustrate how to efficiently compute U-statistic estimates of Hoeffding's multivariate projection-averaging $D$ and Blum–Kiefer–Rosenblatt's multivariate projection-averaging $R$. The other claims straightforwardly follow from the sources provided in the proposition.

Zhu et al. (2017) showed how to efficiently compute a V-statistic estimate of Hoeffding's multivariate projection-averaging $D$. Let us show how to efficiently compute the corresponding U-statistic. We define arrays $(a^k_{\ell r s})_{\ell, r, s \in \llbracket n \rrbracket}$ for $k = 1, 2$ as
$$
a^k_{\ell r s} := \begin{cases} \mathrm{Arc}(y_{k\ell} - y_{ks},\, y_{kr} - y_{ks}) & \text{if } [\ell, r, s] \in I^n_3, \\ 0 & \text{otherwise}. \end{cases}
$$
Their U-centered versions $(A^k_{\ell r s})_{\ell, r, s \in \llbracket n \rrbracket}$ for $k = 1, 2$ are
$$
A^k_{\ell r s} := \begin{cases} a^k_{\ell r s} - \dfrac{1}{n-2}\displaystyle\sum_{i=1}^n a^k_{i r s} - \dfrac{1}{n-2}\displaystyle\sum_{j=1}^n a^k_{\ell j s} + \dfrac{1}{(n-1)(n-2)}\displaystyle\sum_{i,j=1}^n a^k_{i j s} & \text{if } [\ell, r, s] \in I^n_3, \\ 0 & \text{otherwise}. \end{cases}
$$
Then, the U-statistic estimate can be expressed in terms of these U-centered arrays as
$$
\binom{n}{3}^{-1} \sum_{i_1 < i_2 < i_3} \cdots
$$
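A direct (non-optimized) sketch of the arrays $a^k$ and their U-centered versions $A^k$ is given below. The $\mathrm{Arc}$ kernel is implemented here as the angle between its two arguments divided by $\pi$, and the centering constants follow the display above as reconstructed; both should be read as assumptions rather than as the paper's exact implementation.

```python
import numpy as np

def arc(u, v):
    """Arc(u, v), taken here as the angle between u and v divided by pi
    (an assumed reading of the Arc notation); returns 0 if either vector is 0."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    c = np.clip(u @ v / (nu * nv), -1.0, 1.0)
    return np.arccos(c) / np.pi

def u_centered_arrays(y):
    """a[l, r, s] = Arc(y_l - y_s, y_r - y_s) on distinct triples (0 otherwise)
    and its U-centered version A, with centering constants 1/(n-2) and
    1/((n-1)(n-2)) assumed as in the display above."""
    n = y.shape[0]
    a = np.zeros((n, n, n))
    for l in range(n):
        for r in range(n):
            for s in range(n):
                if l != r and l != s and r != s:
                    a[l, r, s] = arc(y[l] - y[s], y[r] - y[s])
    A = (a
         - a.sum(axis=0, keepdims=True) / (n - 2)
         - a.sum(axis=1, keepdims=True) / (n - 2)
         + a.sum(axis=(0, 1), keepdims=True) / ((n - 1) * (n - 2)))
    idx = np.arange(n)
    distinct = ((idx[:, None, None] != idx[None, :, None])
                & (idx[:, None, None] != idx[None, None, :])
                & (idx[None, :, None] != idx[None, None, :]))
    return a, np.where(distinct, A, 0.0)

# Tiny usage example (O(n^3) loops; for illustration only).
rng = np.random.default_rng(4)
y = rng.standard_normal((20, 3))
a, A = u_centered_arrays(y)
```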
In view of Lemma 3 in Weihs et al. (2018), the claim readily follows fromthe theory of degenerate U-statistics (Serfling, 1980, Chap. 5.5.2).
Proof of Theorem 5.1.
For $k = 1, 2$, let $P^{(n)}_{J_k, d_k}$ and $P_{J_k, d_k}$ denote the distributions of $W^{(n)}_{k1}$ and $W_{k1}$, respectively, and let again $Y^{(n)}_{ki}$ and $Y_{ki}$ stand for $G^{(n)}_{k,\pm}(X_{ki})$ and $G_{k,\pm}(X_{ki})$, respectively. Consider the Hoeffding decomposition
$$
W^{\sim(n)}_{\mu} = \sum_{\ell=1}^{m} \binom{m}{\ell} \binom{n}{\ell}^{-1} \sum_{1 \le i_1 < \cdots < i_\ell \le n} \widetilde h_{\mu,\ell}\Big(\big(Y^{(n)}_{1 i_1}, Y^{(n)}_{2 i_1}\big), \ldots, \big(Y^{(n)}_{1 i_\ell}, Y^{(n)}_{2 i_\ell}\big);\, P^{(n)}_{J_1, d_1} \otimes P^{(n)}_{J_2, d_2}\Big) =: \sum_{\ell=1}^{m} \binom{m}{\ell} H^{\sim}_{n,\ell},
$$
with $H_{n,\ell}$ defined analogously in terms of the $Y_{ki}$ and $P_{J_1, d_1} \otimes P_{J_2, d_2}$.

Step I.
Lemma 3 in Weihs et al. (2018) confirms that $\widetilde h_{\mu,1}(\,\cdot\,; P^{(n)}_{J_1,d_1} \otimes P^{(n)}_{J_2,d_2}) = 0 = \widetilde h_{\mu,1}(\,\cdot\,; P_{J_1,d_1} \otimes P_{J_2,d_2})$, and thus $nH^{\sim}_{n,1} = nH_{n,1} = 0$.

Step II.
Lemma 3 in Weihs et al. (2018) shows that, (cid:18) m (cid:19) · (cid:101) h µ, (cid:16) ( y , y ) , ( y , y ); P ( n ) J ,d ⊗ P ( n ) J ,d (cid:17) = g ( n )1 ( y , y ) g ( n )2 ( y , y ) , and (cid:18) m (cid:19) · (cid:101) h µ, (cid:16) ( y , y ) , ( y , y ); P J ,d ⊗ P J ,d (cid:17) = g ( y , y ) g ( y , y ) , where g ( n ) k and g k are defined in (5.5) and (5.2). To prove that nH ∼ n, − nH n, = o P (1) , it sufficesto show that E (cid:2) ( nH ∼ n, − nH n, ) (cid:3) = E (cid:104)(cid:16) n − (cid:88) ( i,j ) ∈ I n g ( n )1 ( Y ( n )1 i , Y ( n )1 j ) g ( n )2 ( Y ( n )2 i , Y ( n )2 j ) − n − (cid:88) ( i,j ) ∈ I n g ( Y i , Y j ) g ( Y i , Y j ) (cid:17) (cid:105) = o (1) . (7.7)We proceed in three sub-steps. Step II-1.
The theory of degenerate U-statistics (cf. Equation (7) of Section 1.6 in Lee (1990))yields that E (cid:2) ( nH n, ) (cid:3) = 2 nn − (cid:2) g ( Y , Y ) (cid:3) E (cid:2) g ( Y , Y ) (cid:3) . (7.8)34 tep II-2. We next deduce that E (cid:2) ( nH ∼ n, )( nH n, ) (cid:3) = E (cid:104)(cid:16) n − (cid:88) ( i,j ) ∈ I n g ( n )1 ( Y ( n )1 i , Y ( n )1 j ) g ( n )2 ( Y ( n )2 i , Y ( n )2 j ) (cid:17)(cid:16) n − (cid:88) ( i,j ) ∈ I n g ( Y i , Y j ) g ( Y i , Y j ) (cid:17)(cid:105) → (cid:2) g ( Y , Y ) (cid:3) E (cid:2) g ( Y , Y ) (cid:3) . (7.9)By symmetry, we have E (cid:2) g ( n ) k ( Y ( n ) ki , Y ( n ) kj ) g k ( Y ki , Y kj ) (cid:3) = E (cid:2) g ( n ) k ( Y ( n ) k , Y ( n ) k ) g k ( Y k , Y k ) (cid:3) =: A ( n ) k , E (cid:2) g ( n ) k ( Y ( n ) k(cid:96) , Y ( n ) kj ) g k ( Y ki , Y kj ) (cid:3) = E (cid:2) g ( n ) k ( Y ( n ) ki , Y ( n ) kr ) g k ( Y ki , Y kj ) (cid:3) = E (cid:2) g ( n ) k ( Y ( n ) k , Y ( n ) k ) g k ( Y k , Y k ) (cid:3) =: B ( n ) k , E (cid:2) g ( n ) k ( Y ( n ) k(cid:96) , Y ( n ) kr ) g k ( Y ki , Y kj ) (cid:3) = E (cid:2) g ( n ) k ( Y ( n ) k , Y ( n ) k ) g k ( Y k , Y k ) (cid:3) =: C ( n ) k for all distinct i, j, (cid:96), r , and also A ( n ) k = E (cid:2) g ( n ) k ( Y ( n ) k , Y ( n ) k ) g k ( Y k , Y k ) (cid:3) , (7.10) A ( n ) k + ( n − B ( n ) k = E (cid:2) g ( n ) k ( Y ( n ) k , Y ( n ) k ) g k ( Y k , Y k ) (cid:3) + (cid:88) (cid:96) : (cid:96) (cid:54) =1 , E (cid:2) g ( n ) k ( Y ( n ) k(cid:96) , Y ( n ) k ) g k ( Y k , Y k ) (cid:3) = − E (cid:2) g ( n ) k ( Y ( n ) k , Y ( n ) k ) g k ( Y k , Y k ) (cid:3) , (7.11) B ( n ) k + ( n − C ( n ) k = E (cid:2) g ( n ) k ( Y ( n ) k , Y ( n ) k ) g k ( Y k , Y k ) (cid:3) + E (cid:2) g ( n ) k ( Y ( n ) k , Y ( n ) k ) g k ( Y k , Y k ) (cid:3) + (cid:88) (cid:96) : (cid:96) (cid:54) =1 , , E (cid:2) g ( n ) k ( Y ( n ) k , Y ( n ) k(cid:96) ) g k ( Y k , Y k ) (cid:3) = − E (cid:2) g ( n ) k ( Y ( n ) k , Y ( n ) k ) g k ( Y k , Y k ) (cid:3) . (7.12)We claim that A ( n ) k → E (cid:2) g k ( Y k , Y k ) (cid:3) , (7.13) A ( n ) k + ( n − B ( n ) k → − E (cid:2) g k ( Y k , Y k ) g k ( Y k , Y k ) (cid:3) = 0 , (7.14) B ( n ) k + ( n − C ( n ) k → − E (cid:2) g k ( Y k , Y k ) g k ( Y k , Y k ) (cid:3) = 0 . (7.15)We only prove (7.13), as (7.14) and (7.15) are quite similar.If Condition (5.6) holds, we obtain, since E (cid:2) f k ([ W ki (cid:96) ] m(cid:96) =1 ) (cid:3) < ∞ , that (cid:107) g k ( Y k , Y k ) (cid:107) L ≤ (cid:107) g k ( Y k , Y k ) (cid:107) L < ∞ . To prove (7.13), we still need to show that Y ( n ) ki L −→ Y ki for k = 1 , . Since the scores J k , k = 1 , are weakly regular (cf. Definition 4.2) and square-integrable, we obtain lim n →∞ n − n (cid:88) r =1 J (cid:16) rn + 1 (cid:17) = (cid:90) J ( u )d u, and thus E (cid:107) Y ( n ) ki (cid:107) → E (cid:107) Y ki (cid:107) . Notice also that Y ( n ) ki a . s . −→ Y ki . Using Vitali’s theorem (Shorack,2017, Theorem 5.5) yields E (cid:107) Y ( n ) ki − Y ki (cid:107) → . 35ecause g ( n ) k ( y k , y k ) ⇒ g k ( y k , y k ) , we have E (cid:2) | g ( n ) k ( Y ( n ) k , Y ( n ) k ) − g k ( Y ( n ) k , Y ( n ) k ) | · | g k ( Y k , Y k ) | (cid:3) ≤ (cid:107) g ( n ) k ( Y ( n ) k , Y ( n ) k ) − g k ( Y ( n ) k , Y ( n ) k ) (cid:107) L ∞ · (cid:107) g k ( Y k , Y k ) (cid:107) L → . 
(7.16)Next, since g k is Lipschitz-continuous, by the fact that Y ( n ) ki L −→ Y ki , E (cid:2) | g k ( Y ( n ) k , Y ( n ) k ) − g k ( Y k , Y k ) | · | g k ( Y k , Y k ) | (cid:3) ≤ (cid:107) g k ( Y ( n ) k , Y ( n ) k ) − g k ( Y k , Y k ) (cid:107) L · (cid:107) g k ( Y k , Y k ) (cid:107) L → (7.17)Combining (7.16) and (7.17) yields (7.13).Having established (7.13)–(7.15), we obtain that A ( n ) k → E (cid:2) g k ( Y k , Y k ) (cid:3) , B ( n ) k = O ( n − ) and C ( n ) k = o ( n − ) . (7.18)Plugging (7.18) into the left-hand side of (7.9) gives E (cid:104)(cid:16) n − (cid:88) ( i,j ) ∈ I n g ( n )1 ( Y ( n )1 i , Y ( n )1 j ) g ( n )2 ( Y ( n )2 i , Y ( n )2 j ) (cid:17)(cid:16) n − (cid:88) ( i,j ) ∈ I n g ( Y i , Y j ) g ( Y i , Y j ) (cid:17)(cid:105) = n ( n − n − (cid:110) A ( n )1 A ( n )2 + 4( n − B ( n )1 B ( n )2 + ( n − n − C ( n )1 C ( n )2 (cid:111) → (cid:2) g ( Y , Y ) (cid:3) E (cid:2) g ( Y , Y ) (cid:3) . This completes the proof of (7.9).
Step II-3.
In order to prove (7.7), it remains to show that E (cid:2) ( nH ∼ n, ) (cid:3) → (cid:2) g ( Y , Y ) (cid:3) E (cid:2) g ( Y , Y ) (cid:3) . (7.19)Notice that nH ∼ n, is a double-indexed permutation statistic. Applying Equations (2.2)–(2.3) inBarbour and Eagleson (1986) yields E (cid:2) nH ∼ n, (cid:3) = nµ ( n )1 µ ( n )2 , and Var( nH ∼ n, ) = 4 n ( n − ( n − (cid:16) (cid:80) ni =1 { ζ ( n )1 i } n (cid:17)(cid:16) (cid:80) ni =1 { ζ ( n )2 i } n (cid:17) + 2 nn − (cid:16) (cid:80) i (cid:54) = j { η ( n )1 ij } n ( n − (cid:17)(cid:16) (cid:80) i (cid:54) = j { η ( n )2 ij } n ( n − (cid:17) , where for k = 1 , , µ ( n ) k := 1 n ( n − (cid:88) i (cid:54) = j g ( n ) k ( y ( n ) ki , y ( n ) kj ) ,ζ ( n ) ki := (cid:88) j : j (cid:54) = i (cid:110) g ( n ) k ( y ( n ) ki , y ( n ) kj ) − µ ( n ) k (cid:111) ,η ( n ) kij := g ( n ) k ( y ( n ) i , y ( n ) j ) − ζ ( n ) ki n − − ζ ( n ) kj n − − µ ( n ) k . µ ( n ) k = − n ( n − n (cid:88) i =1 g ( n ) k ( y ( n ) ki , y ( n ) ki ) ,ζ ( n ) ki = − g ( n ) k ( y ( n ) ki , y ( n ) ki ) + 1 n n (cid:88) j =1 g ( n ) k ( y ( n ) kj , y ( n ) kj ) ,η ( n ) kij = g ( n ) k ( y ( n ) i , y ( n ) j ) + g ( n ) k ( y ( n ) ki , y ( n ) ki ) n − g ( n ) k ( y ( n ) kj , y ( n ) kj ) n − − n − n − n (cid:88) i =1 g ( n ) k ( y ( n ) ki , y ( n ) ki ) . Moreover, we can write E (cid:2) nH ∼ n, (cid:3) and Var( nH ∼ n, ) in terms of Y ( n ) k and Y ( n ) k : E (cid:2) nH ∼ n, (cid:3) = n ( n − E (cid:2) g ( n )1 ( Y ( n )11 , Y ( n )11 ) (cid:3) E (cid:2) g ( n )2 ( Y ( n )21 , Y ( n )21 ) (cid:3) , Var( nH ∼ n, ) = 4 n ( n − ( n − Var (cid:2) g ( n )1 ( Y ( n )11 , Y ( n )11 ) (cid:3) Var (cid:2) g ( n )2 ( Y ( n )21 , Y ( n )21 ) (cid:3) + 2 nn − (cid:104) g ( n )1 ( Y ( n )11 , Y ( n )12 ) + g ( n )1 ( Y ( n )11 , Y ( n )11 ) n − g ( n )1 ( Y ( n )12 , Y ( n )12 ) n − (cid:105) × Var (cid:104) g ( n )2 ( Y ( n )21 , Y ( n )22 ) + g ( n )2 ( Y ( n )21 , Y ( n )21 ) n − g ( n )2 ( Y ( n )22 , Y ( n )22 ) n − (cid:105) . Using once again Condition (5.6), and by a similar argument as in the proof of (7.13), we obtain E (cid:2) nH ∼ n, (cid:3) → n ( n − E (cid:2) g ( Y , Y ) (cid:3) E (cid:2) g ( Y , Y ) (cid:3) → , (7.20) Var( nH ∼ n, ) → n ( n − ( n − Var (cid:2) g ( Y , Y ) (cid:3) Var (cid:2) g ( Y , Y ) (cid:3) + 2 nn − (cid:104) g ( Y , Y ) + g ( Y , Y ) n − g ( Y , Y ) n − (cid:105) × Var (cid:104) g ( Y , Y ) + g ( Y , Y ) n − g ( Y , Y ) n − (cid:105) → (cid:2) g ( Y , Y ) (cid:3) E (cid:2) g ( Y , Y ) (cid:3) . (7.21)Combining (7.20) and (7.21), we deduce that (7.19) holds.Finally, Step II is completed by combining (7.8), (7.9), and (7.19) to deduce (7.7). Step III.
Notice that sup i ,...,i m ∈ (cid:74) m (cid:75) E (cid:2) f k ([ W ki (cid:96) ] m(cid:96) =1 ) (cid:3) < ∞ . Proving that E (cid:2) ( nH ∼ n,(cid:96) ) (cid:3) = o (1) for (cid:96) = 3 , , . . . , m goes along the same steps as the proof of Theorem 4.2 in the supplement of Shiet al. (2020); it is omitted here. The fact that E (cid:2) ( nH n,(cid:96) ) (cid:3) = o (1) , (cid:96) = 3 , , . . . , m follows directlyfrom the theory of degenerate U-statistics (cf. Equation (7) of Section 1.6 in Lee (1990)). The proofis thus complete. Proof of Theorem 5.2.
The proof is similar to that of Theorem 5.1. The only difference lies inproving (7.13)–(7.15) and (7.20)–(7.21). By the continuous mapping theorem (van der Vaart, 1998,Theorem 2.3) and the Skorokhod construction (Shorack, 2017, Chap. 3, Theorem 5.7(viii)), wecan assume, without loss of generality, that W ( n ) ki a . s . −→ W ki . If Condition (5.7) holds, then (7.13)37mmediately follows from the dominated convergence theorem and the definitions of g ( n ) k and g k in (5.5) and (5.2). The proofs for (7.14), (7.15), (7.20), and (7.21) are similar. h = h dCov ) Proof of Proposition 5.2 ( h = h dCov ). Condition (5.1) is obvious. Condition (5.4) is satisfied inview of Theorem 5 in Székely et al. (2007). We next verify that condition (5.6) is satisfied. To doso, let us first show that g ( n ) k ( y k , y k ) ⇒ g k ( y k , y k ) for k = 1 , . By definitions (5.2) and (5.5), g ( n ) k ( y k , y k ) := (cid:107) y k − y k (cid:107) − E (cid:107) y k − W ( n ) k (cid:107) − E (cid:107) W ( n ) k − y k (cid:107) + E (cid:107) W ( n ) k − W ( n ) k (cid:107) , and g k ( y k , y k ) := (cid:107) y k − y k (cid:107) − E (cid:107) y k − W k (cid:107) − E (cid:107) W k − y k (cid:107) + E (cid:107) W k − W k (cid:107) . Noting that J k , k = 1 , are continuous, we can assume, as in the proof of Theorem 5.2, that W ( n ) ki a . s . −→ W ki . Since the scores J k are square-integrable, we obtain that E (cid:107) W ( n ) ki (cid:107) → E (cid:107) W ki (cid:107) .Using Vitali’s theorem (Shorack, 2017, Chap. 3, Theorem 5.5) yields W ( n ) ki L −→ W ki . Therefore, weobtain (cid:12)(cid:12)(cid:12) E (cid:107) y k − W ( n ) k (cid:107) − E (cid:107) y k − W k (cid:107) (cid:12)(cid:12)(cid:12) ≤ E (cid:107) W ( n ) k − W k (cid:107) , (cid:12)(cid:12)(cid:12) E (cid:107) W ( n ) k − y k (cid:107) − E (cid:107) W k − y k (cid:107) (cid:12)(cid:12)(cid:12) ≤ E (cid:107) W ( n ) k − W k (cid:107) , (cid:12)(cid:12)(cid:12) E (cid:107) W ( n ) k − W ( n ) k (cid:107) − E (cid:107) W k − W k (cid:12)(cid:12)(cid:12) ≤ E (cid:107) W ( n ) k − W k (cid:107) + E (cid:107) W ( n ) k − W k (cid:107) , and, furthermore, (cid:12)(cid:12)(cid:12) g ( n ) k ( y k , y k ) − g k ( y k , y k ) (cid:12)(cid:12)(cid:12) ≤ (cid:16) E (cid:107) W ( n ) k − W k (cid:107) + E (cid:107) W ( n ) k − W k (cid:107) (cid:17) . The uniform convergence g ( n ) k ( y k , y k ) ⇒ g k ( y k , y k ) follows. It is obvious that g k ( y k , y k ) isLipschitz-continuous, and E (cid:2) f k ( W ki , . . . , W ki ) (cid:3) < ∞ for all i , . . . , i ∈ (cid:74) (cid:75) as long as J , J areweakly regular. h = h M , h D , h R , h τ ∗ ) Proof of Proposition 5.2 ( h = h M , h D , h R , h τ ∗ ). Condition (5.1) is obvious. Condition (5.4) is sat-isfied for h D by Theorem 3(i) in Zhu et al. (2017). For h = h M , h R , h τ ∗ , we can prove condition(5.4) holds as well in a similar way. It is clear that condition (5.7) is satisfied for all these fourkernel functions. Proof of Proposition 5.4.
Validity is a direct corollary of Proposition 5.3. Uniform validity thenfollows from validity and exact distribution-freeness. For any fixed alternative in P ac d ,d , ∞ , it holdsthat W ∼ ( n ) µ a . s . −→ µ ± ( X , X ) > as n → ∞ . Thus, n W ∼ ( n ) µ a . s . −→ ∞ and the result follows. roof of Proposition 5.5(i). We need to verify Assumption 5.1. Items (i) and (ii) are obvious.For (iii), following the proof of Lemma 3.2.1 in Gieser (1993), when X ∗ and X ∗ are ellipticallysymmetric with parameters d , Σ and d , Σ , respectively, we obtain L (cid:48) ( x ; 0) = − M x ) (cid:62) Σ − x · ρ (cid:16) x (cid:62) Σ − x (cid:17) − M x ) (cid:62) Σ − x · ρ (cid:16) x (cid:62) Σ − x (cid:17) . Consequently, (5.11) is sufficient for I X (0) = E (cid:2) L (cid:48) ( X ; 0) (cid:3) < ∞ . If I X (0) = 0 , then we must have ρ (cid:16) x (cid:62) Σ − x (cid:17) = ρ (cid:16) x (cid:62) Σ − x (cid:17) = C ρ for some constant C ρ (cid:54) = 0 and ( M x ) (cid:62) Σ − x + ( M x ) (cid:62) Σ − x = x (cid:62) Σ − ( M Σ + Σ M (cid:62) ) Σ − x = 0 for all x , x . This contradicts the assumption that Σ M (cid:62) + M Σ (cid:54) = and completes the proof. Proof of Proposition 5.5(ii).
For the multivariate normal, φ k ( t ) = exp( − t/ and ρ k ( t ) = − / , sothat (5.11) is satisfied. For a multivariate t -distribution with ν k degrees of freedom, φ k ( t ) = (1 + t/ν k ) − ( ν k + d k ) / and ρ k ( t ) = − − (1 + d k /ν k )(1 + t/ν k ) − . It is easily checked that (5.11) is satisfied when ν k > ; see Gieser (1993, p. 44–46). Proof of Theorem 5.3.
Let X ∗ ni and X ni , i ∈ (cid:74) n (cid:75) be independent copies of X ∗ and X with δ = δ ( n ) ,respectively. Let P ( n ) := ⊗ ni =1 P ( n ) i , Q ( n ) := ⊗ ni =1 Q ( n ) i , where P ( n ) i and Q ( n ) i are the distributionsof X ∗ ni and X ni , respectively. Define Λ ( n ) := log dQ ( n ) dP ( n ) = n (cid:88) i =1 log L ( X ∗ ni ; δ ) and T ( n ) := δ ( n ) n (cid:88) i =1 L (cid:48) ( X ∗ ni ; 0) . Following the proof of Lemma 3.2.1 in Gieser (1993), we get L (cid:48) ( x ; 0) = − M x ) (cid:62) (cid:16) ∇ q ( x ) (cid:14) q ( x ) (cid:17) − M x ) (cid:62) (cid:16) ∇ q ( x ) (cid:14) q ( x ) (cid:17) . (7.22)We proceed in three steps. First, we clarify that Q ( n ) is contiguous to P ( n ) in order for Le Cam’sthird lemma (van der Vaart, 1998, Theorem 6.6) to be applicable. Next, we derive the joint lim-iting null distribution of ( nW ∼ ( n ) µ , Λ ( n ) ) (cid:62) . Lastly, we employ Le Cam’s third lemma to obtain theasymptotic distribution of ( nW ∼ ( n ) µ , Λ ( n ) ) (cid:62) under contiguous alternatives. Step I.
In view of Gieser (1993, Sec. 3.2.1), Assumption 5.1 entails the contiguity Q ( n ) (cid:47) P ( n ) . Step II.
Next, we derive the limiting joint distribution of ( nW ∼ ( n ) µ , Λ ( n ) ) (cid:62) under the null hy-pothesis. To this end, we first obtain the limiting null distribution of ( nH n, , T ( n ) ) (cid:62) , where H n, isdefined in (7.6). By condition (5.1), we write H n, = 1 n ( n − (cid:88) i (cid:54) = j ∞ (cid:88) v =1 λ v ψ v ( Y i , Y i ) ψ v ( Y j , Y j ) , where ψ v is the normalized eigenfunction associated with λ v and Y ki = G ∗ k, ± ( X ∗ ki ) for k = 1 , . For39ach positive integer K , consider the “truncated” U-statistic H n, ,K := 1 n ( n − (cid:88) i (cid:54) = j K (cid:88) v =1 λ v ψ v ( Y i , Y i ) ψ v ( Y j , Y j ) . Note that nH n, and nH n, ,K can be written as nH n, = nn − (cid:110) ∞ (cid:88) v =1 λ v (cid:16) n (cid:88) i =1 ψ v ( Y i , Y i ) √ n (cid:17) − ∞ (cid:88) v =1 λ v (cid:16) (cid:80) ni =1 { ψ v ( Y i , Y i ) } n (cid:17)(cid:111) ,nH n, ,K = nn − (cid:110) K (cid:88) v =1 λ v (cid:16) n (cid:88) i =1 ψ v ( Y i , Y i ) √ n (cid:17) − K (cid:88) v =1 λ v (cid:16) (cid:80) ni =1 { ψ v ( Y i , Y i ) } n (cid:17)(cid:111) . To obtain the limiting null distribution of ( nH n, , T ( n ) ) (cid:62) , first consider the limiting null distri-bution, for fixed K , of ( nH n, ,K , T ( n ) ) (cid:62) . Let S n,v be a shorthand for n − / (cid:80) ni =1 ψ v ( Y i , Y i ) andobserve that E[ S n,v ] = 0 , Var[ S n,v ] = 1 , Cov[ S n,v , T ( n ) ] → γ v δ , E[ T ( n ) ] = 0 , and Var[ T ( n ) ] = I X (0) ,where γ v := Cov (cid:2) ψ v ( Y , Y ) , L (cid:48) (( G − , ± ( Y ) , G − , ± ( Y )); 0) (cid:3) . There exists at least one v ≥ suchthat γ v (cid:54) = 0 . Indeed, since E X ∗ k = E (cid:2) ∇ q k ( X ∗ k ) /q k ( X ∗ k ) (cid:3) = , [ ψ v ( x )] ∞ v =1 forms a complete orthogo-nal basis for the family of functions of the form (7.22): γ v = 0 for all v thus entails E (cid:2) L (cid:48) ( X ; 0) (cid:3) = E (cid:104)(cid:16) ∞ (cid:88) v =1 γ v ψ v ( Y , Y ) (cid:17) (cid:105) = ∞ (cid:88) v =1 γ v = 0 , which contradicts Assumption 5.1(iii). Therefore, γ v ∗ (cid:54) = 0 for some v ∗ . Applying the multivariatecentral limit theorem (Bhattacharya and Ranga Rao, 1986, Equation (18.24)), we deduce ( S n, , . . . , S n,K , T ( n ) ) (cid:62) P ( n ) (cid:32) ( ξ , . . . , ξ K , V K ) (cid:62) ∼ N K +1 (cid:18)(cid:18) K (cid:19) , (cid:18) I p δ v δ v (cid:62) δ I (cid:19)(cid:19) , where I := I X (0) and v = ( γ , . . . , γ K ) (cid:62) . Thus, V K can be expressed as (cid:16) δ I (cid:17) / (cid:110) K (cid:88) v =1 c v ξ v + (cid:16) − K (cid:88) v =1 c v (cid:17) / ξ (cid:111) where c v := I − / γ v , and ξ is standard Gaussian, independent of ξ , . . . , ξ K . Then, by the contin-uous mapping theorem (van der Vaart, 1998, Theorem 2.3) and Slutsky’s theorem (van der Vaart,1998, Theorem 2.8), ( nH n, ,K , T ( n ) ) (cid:62) P ( n ) (cid:32) (cid:18) K (cid:88) v =1 λ v ( ξ v − , (cid:16) δ I (cid:17) / (cid:110) K (cid:88) v =1 c v ξ v + (cid:16) − K (cid:88) v =1 c v (cid:17) / ξ (cid:111)(cid:19) (cid:62) (7.23)for any K . This entails ( nH n, , T ( n ) ) (cid:62) P ( n ) (cid:32) (cid:18) ∞ (cid:88) v =1 λ v ( ξ v − , (cid:16) δ I (cid:17) / (cid:110) ∞ (cid:88) v =1 c v ξ v + (cid:16) − ∞ (cid:88) v =1 c v (cid:17) / ξ (cid:111)(cid:19) (cid:62) . 
(7.24)40ndeed, putting M K := K (cid:88) v =1 λ v ( ξ v − , V K := (cid:16) δ I (cid:17) / (cid:110) K (cid:88) v =1 c v ξ v + (cid:16) − K (cid:88) v =1 c v (cid:17) / ξ (cid:111) ,M := ∞ (cid:88) v =1 λ v ( ξ v − , and V := (cid:16) δ I (cid:17) / (cid:110) ∞ (cid:88) v =1 c v ξ v + (cid:16) − ∞ (cid:88) v =1 c v (cid:17) / ξ (cid:111) , it suffices, in order to to prove (7.24), to show that, for any a, b ∈ R , (cid:12)(cid:12)(cid:12) E (cid:104) exp (cid:110) i anH n, + i bT ( n ) (cid:111)(cid:105) − E (cid:104) exp (cid:110) i aM + i bV (cid:111)(cid:105)(cid:12)(cid:12)(cid:12) → as n → ∞ . (7.25)We have (cid:12)(cid:12)(cid:12) E (cid:104) exp (cid:110) i anH n, + i bT ( n ) (cid:111)(cid:105) − E (cid:104) exp (cid:110) i aM + i bV (cid:111)(cid:105)(cid:12)(cid:12)(cid:12) ≤ (cid:12)(cid:12)(cid:12) E (cid:104) exp (cid:110) i anH n, + i bT ( n ) (cid:111)(cid:105) − E (cid:104) exp (cid:110) i anH n, ,K + i bT ( n ) (cid:111)(cid:105)(cid:12)(cid:12)(cid:12) + (cid:12)(cid:12)(cid:12) E (cid:104) exp (cid:110) i anH n, ,K + i bT ( n ) (cid:111)(cid:105) − E (cid:104) exp (cid:110) i aM K + i bV K (cid:111)(cid:105)(cid:12)(cid:12)(cid:12) + (cid:12)(cid:12)(cid:12) E (cid:104) exp (cid:110) i aM K + i bV K (cid:111)(cid:105) − E (cid:104) exp (cid:110) i aM + i bV (cid:111)(cid:105)(cid:12)(cid:12)(cid:12) =: I + II + III, say,where it follows from page 82 of Lee (1990) and Equation (4.3.10) in Koroljuk and Borovskich (1994)that I ≤ E (cid:12)(cid:12)(cid:12) exp (cid:110) i an ( H n, − H n, ,K ) (cid:111) − (cid:12)(cid:12)(cid:12) ≤ (cid:110) E (cid:12)(cid:12)(cid:12) an ( H n, − H n, ,K ) (cid:12)(cid:12)(cid:12) (cid:111) / = (cid:110) na n − ∞ (cid:88) v = K +1 λ v (cid:111) / and III ≤ E (cid:12)(cid:12)(cid:12) exp (cid:110) i a ( M K − M ) + i b ( V K − V ) (cid:111) − (cid:12)(cid:12)(cid:12) ≤ (cid:110) E (cid:12)(cid:12)(cid:12) a ( M K − M ) + b ( V K − V ) (cid:12)(cid:12)(cid:12) (cid:111) / ≤ (cid:110) (cid:16) a ∞ (cid:88) v = K +1 λ v + 2 b δ I ∞ (cid:88) v = K +1 c v (cid:17)(cid:111) / . Since by condition (5.1) ∞ (cid:88) v =1 λ v = Var( g ( W , W )) · Var( g ( W , W )) ∈ (0 , ∞ ) and ∞ (cid:88) v =1 c v = I − ∞ (cid:88) v =1 γ v ≤ , we conclude that, for any (cid:15) > , there exists K such that I < (cid:15)/ and III < (cid:15)/ for all n andall K ≥ K . For this K , we also have, by (7.23), that II < (cid:15)/ for all n sufficiently large; (7.25),hence (7.24), follow.Now, as in Hájek and Šidák (1967, p. 210–214), Λ ( n ) − T ( n ) + δ I / P ( n ) −→ (7.26)(see also Gieser, 1993, Appx. B). Combining (7.24) and (7.26) yields ( nH n, , Λ ( n ) ) (cid:62) P ( n ) (cid:32) (cid:18) ∞ (cid:88) v =1 λ v ( ξ v − , (cid:16) δ I (cid:17) / (cid:110) ∞ (cid:88) v =1 c v ξ v + (cid:16) − ∞ (cid:88) v =1 c v (cid:17) / ξ (cid:111) − δ I (cid:19) (cid:62) . (7.27)Equation (1.6.7) in Lee (1990, p. 30), along with the fact that H n, = 0 , implies that ( nW ∼ ( n ) µ , Λ ( n ) ) (cid:62) has the same limiting distribution as (7.27) under P ( n ) .41 tep III. 
Finally we employ the general form (van der Vaart, 1998, Theorem 6.6) of Le Cam’sthird lemma, which by condition (5.4) entails Q ( n ) ( nW ∼ ( n ) µ ≤ q − α ) → E (cid:104) (cid:16) ∞ (cid:88) v =1 λ v ( ξ v − ≤ q − α (cid:17) · exp (cid:110)(cid:16) δ I (cid:17) / (cid:16) ∞ (cid:88) v =1 c v ξ v + (cid:16) − ∞ (cid:88) v =1 c v (cid:17) / ξ (cid:17) − δ I (cid:111)(cid:105) ≤ E (cid:104) (cid:110)(cid:12)(cid:12)(cid:12) ξ v ∗ (cid:12)(cid:12)(cid:12) ≤ (cid:16) q − α + (cid:80) ∞ v =1 λ v λ v ∗ (cid:17) / (cid:111) · exp (cid:110)(cid:16) δ I (cid:17) / (cid:16) ∞ (cid:88) v =1 c v ξ v + (cid:16) − ∞ (cid:88) v =1 c v (cid:17) / ξ (cid:17) − δ I (cid:111)(cid:105) = E (cid:104) (cid:110)(cid:12)(cid:12)(cid:12) ξ v ∗ (cid:12)(cid:12)(cid:12) ≤ (cid:16) q − α + (cid:80) ∞ v =1 λ v λ v ∗ (cid:17) / (cid:111) · exp (cid:110)(cid:16) δ I (cid:17) / (cid:16) c v ∗ ξ v ∗ + (cid:16) − c v ∗ (cid:17) / ξ (cid:17) − δ I (cid:111)(cid:105) = Φ (cid:16)(cid:16) q − α + (cid:80) ∞ v =1 λ v λ v ∗ (cid:17) / − c v ∗ (cid:16) δ I (cid:17) / (cid:17) − Φ (cid:16) − (cid:16) q − α + (cid:80) ∞ v =1 λ v λ v ∗ (cid:17) / − c v ∗ (cid:16) δ I (cid:17) / (cid:17) ≤ (cid:16) q − α + (cid:80) ∞ v =1 λ v λ v ∗ (cid:17) / ϕ (cid:16)(cid:110) | c v ∗ | · (cid:16) δ I (cid:17) / − (cid:16) q − α + (cid:80) ∞ v =1 λ v λ v ∗ (cid:17) / (cid:111) + (cid:17) , a quantity which is arbitrarily small for large enough δ , irrespective of the sign of c v ∗ . Proof of Theorem 5.4.
This result is a standard result connecting the Fisher information to minimaxlower bound (Groeneboom and Jongbloed, 2014, Chap. 6). Recall that X ∗ ni and X ni , i ∈ (cid:74) n (cid:75) areindependent copies of X ∗ and X , respectively, with δ = δ ( n ) = n − / δ . Recall P ( n ) := ⊗ ni =1 P ( n ) i , Q ( n ) := ⊗ ni =1 Q ( n ) i , where P ( n ) i and Q ( n ) i are the distributions of X ∗ ni and X ni , respectively. It sufficesto prove that for any small < β < − α , there exists | δ | = c β such that TV(Q ( n ) , P ( n ) ) < β ,which is implied by HL(Q ( n ) , P ( n ) ) < β using the fact that total variation and Hellinger distancessatisfy TV(Q ( n ) , P ( n ) ) ≤ HL(Q ( n ) , P ( n ) ) (Tsybakov, 2009, Equation (2.20)). It is also known (Tsybakov, 2009, p. 83) that − HL (Q ( n ) , P ( n ) )2 = n (cid:89) i =1 (cid:16) − HL (Q ( n ) i , P ( n ) i )2 (cid:17) . Hence, let us evaluate HL (Q ( n ) , P ( n ) ) in terms of I X (0) and δ . By definition, HL (Q ( n ) i , P ( n ) i ) = E (cid:2) − L ( X ∗ ni ; δ ( n ) ) / ) (cid:3) . Gieser (1993, Appendix B, p. 105–107) shows that n · E (cid:104) − L ( X ∗ ni ; δ ( n ) ) / ) (cid:105) = E (cid:104) n (cid:88) i =1 − L ( X ∗ ni ; δ ( n ) ) / ) (cid:105) → δ I X (0)4 . Therefore, − HL (Q ( n ) , P ( n ) )2 −→ exp (cid:110) − δ I X (0) / (cid:111) , and the result follows. 42 eferences Agarwal, P. K. and Sharathkumar, R. (2014). Approximation algorithms for bipartite matching with metricand geometric costs. In
References

Agarwal, P. K. and Sharathkumar, R. (2014). Approximation algorithms for bipartite matching with metric and geometric costs. In STOC’14—Proceedings of the 2014 ACM Symposium on Theory of Computing, pages 555–564. ACM, New York, NY.
Ambrosio, L., Gigli, N., and Savaré, G. (2008). Gradient flows in metric spaces and in the space of probability measures (2nd ed.). Lectures in Mathematics ETH Zürich. Birkhäuser Verlag, Basel.
Azadkia, M. and Chatterjee, S. (2019). A simple measure of conditional dependence. Available at arXiv:1910.12327v3.
Barbour, A. D. and Eagleson, G. K. (1986). Random association of symmetric arrays. Stochastic Anal. Appl., 4(3):239–281.
Beirlant, J., Buitendag, S., del Barrio, E., and Hallin, M. (2019). Center-outward quantiles and the measurement of multivariate risk. Available at arXiv:1912.04924v1.
Belloni, A., Chernozhukov, V., and Hansen, C. (2014). Inference on treatment effects after selection among high-dimensional controls. Rev. Econ. Stud., 81(2):608–650.
Bergsma, W. (2006). A new correlation coefficient, its orthogonal decomposition and associated tests of independence. Available at arXiv:math/0604627v1.
Bergsma, W. (2011). Nonparametric testing of conditional independence by means of the partial copula. Available at arXiv:1101.4607v1.
Bergsma, W. and Dassios, A. (2014). A consistent test of independence based on a sign covariance related to Kendall’s tau. Bernoulli, 20(2):1006–1028.
Berrett, T. B., Kontoyiannis, I., and Samworth, R. J. (2020). Optimal rates for independence testing via U-statistic permutation tests. Available at arXiv:2001.05513v1.
Bhattacharya, R. N. and Ranga Rao, R. (1986). Normal approximation and asymptotic expansions (Rpt. ed.). Robert E. Krieger Publishing Co., Inc., Melbourne, FL.
Blum, J. R., Kiefer, J., and Rosenblatt, M. (1961). Distribution free tests of independence based on the sample distribution function. Ann. Math. Statist., 32:485–498.
Chaudhuri, A. and Hu, W. (2019). A fast algorithm for computing distance correlation. Comput. Statist. Data Anal., 135:15–24.
Chernozhukov, V., Galichon, A., Hallin, M., and Henry, M. (2017). Monge-Kantorovich depth, quantiles, ranks and signs. Ann. Statist., 45(1):223–256.
Deb, N. and Sen, B. (2019). Multivariate rank-based distribution-free nonparametric testing using measure transportation. Available at arXiv:1909.08733v2.
del Barrio, E., Cuesta-Albertos, J. A., Hallin, M., and Matrán, C. (2018). Smooth cyclically monotone interpolation and empirical center-outward distribution functions. Available at arXiv:1806.01238v1.
del Barrio, E., González-Sanz, A., and Hallin, M. (2019). A note on the regularity of center-outward distribution and quantile functions. Available at arXiv:1912.10719v1.
Dhar, S. S., Dassios, A., and Bergsma, W. (2016). A study of the power and robustness of a new test for independence against contiguous alternatives. Electron. J. Stat., 10(1):330–351.
Dinic, E. A. and Kronrod, M. A. (1969). An algorithm for solving the assignment problem. Dokl. Akad. Nauk SSSR, 189(1):23–25.
Drton, M., Han, F., and Shi, H. (2020+). High dimensional consistent independence testing with maxima of rank correlations. Ann. Statist. (in press).
Edmonds, J. and Karp, R. M. (1970). Theoretical improvements in algorithmic efficiency for network flow problems. In Combinatorial Structures and their Applications (Proc. Calgary Internat. Conf., Calgary, Alta., 1969), pages 93–96. Gordon and Breach, New York.
Edmonds, J. and Karp, R. M. (1972). Theoretical improvements in algorithmic efficiency for network flow problems. J. Assoc. Comput. Mach., 19(2):248–264.
Even-Zohar, C. and Leng, C. (2019). Counting small permutation patterns. Available at arXiv:1911.01414v2.
Gabow, H. N. and Tarjan, R. E. (1989). Faster scaling algorithms for network problems. SIAM J. Comput., 18(5):1013–1036.
Ghosal, P. and Sen, B. (2019). Multivariate ranks and quantiles using optimal transportation and applications to goodness-of-fit testing. Available at arXiv:1905.05340v2.
Gieser, P. W. (1993). A new nonparametric test for independence between two sets of variates. PhD thesis, University of Florida. Available at https://ufdc.ufl.edu/AA00003658/00001 and https://search.proquest.com/docview/304041219.
Gieser, P. W. and Randles, R. H. (1997). A nonparametric test of independence between two vectors. J. Amer. Statist. Assoc., 92(438):561–567.
Gretton, A., Smola, A., Bousquet, O., Herbrich, R., Belitski, A., Augath, M., Murayama, Y., Pauls, J., Schölkopf, B., and Logothetis, N. (2005). Kernel constrained covariance for dependence measurement. In Cowell, R. G. and Ghahramani, Z., editors, AISTATS05, pages 112–119. Society for Artificial Intelligence and Statistics.
Groeneboom, P. and Jongbloed, G. (2014). Nonparametric estimation under shape constraints: Estimators, algorithms and asymptotics, volume 38 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, New York.
Hájek, J. and Šidák, Z. (1967). Theory of rank tests. Academic Press, New York-London; Academia Publishing House of the Czechoslovak Academy of Sciences, Prague.
Hallin, M. (2017). On distribution and quantile functions, ranks and signs in R^d: a measure transportation approach. Available at https://ideas.repec.org/p/eca/wpaper/2013-258262.html.
Hallin, M., del Barrio, E., Cuesta-Albertos, J. A., and Matrán, C. (2020+a). Distribution and quantile functions, ranks, and signs in dimension d: a measure transportation approach. Ann. Statist. (in press).
Hallin, M., Hlubinka, D., and Hudecová, Š. (2020b). Efficient center-outward rank tests for multiple-output regression and MANOVA. Unpublished manuscript.
Hallin, M., La Vecchia, D., and Liu, H. (2019). Center-outward R-estimation for semiparametric VARMA models. Available at arXiv:1910.08442v1.
Hallin, M., Mordant, G., and Segers, J. (2020c). Multivariate goodness-of-fit tests based on Wasserstein distance. Available at arXiv:2003.06684v1.
Hallin, M. and Paindaveine, D. (2002a). Multivariate signed ranks: Randles’ interdirections or Tyler’s angles? In Statistical Data Analysis Based on the L1-Norm and Related Methods (Neuchâtel, 2002), Stat. Ind. Technol., pages 271–282. Birkhäuser, Basel.
Hallin, M. and Paindaveine, D. (2002b). Optimal procedures based on interdirections and pseudo-Mahalanobis ranks for testing multivariate elliptic white noise against ARMA dependence. Bernoulli, 8(6):787–815.
Hallin, M. and Paindaveine, D. (2002c). Optimal tests for multivariate location based on interdirections and pseudo-Mahalanobis ranks. Ann. Statist., 30(4):1103–1133.
Hallin, M. and Paindaveine, D. (2008). Chernoff-Savage and Hodges-Lehmann results for Wilks’ test of multivariate independence. In Beyond parametrics in interdisciplinary research: Festschrift in honor of Professor Pranab K. Sen, volume 1 of Inst. Math. Stat. (IMS) Collect., pages 184–196. Inst. Math. Statist., Beachwood, OH.
Han, F., Chen, S., and Liu, H. (2017). Distribution-free tests of independence in high dimensions. Biometrika, 104(4):813–828.
Han, F. and Liu, H. (2018). ECA: high-dimensional elliptical component analysis in non-Gaussian distributions. J. Amer. Statist. Assoc., 113(521):252–268.
Heller, R., Gorfine, M., and Heller, Y. (2012). A class of multivariate distribution-free tests of independence based on graphs. J. Statist. Plann. Inference, 142(12):3097–3106.
Heller, R. and Heller, Y. (2016). Multivariate tests of association based on univariate tests. In Lee, D. D., Sugiyama, M., Luxburg, U. V., Guyon, I., and Garnett, R., editors, Advances in Neural Information Processing Systems 29, pages 208–216. Curran Associates, Inc.
Heller, R., Heller, Y., and Gorfine, M. (2013). A consistent multivariate test of association based on ranks of distances. Biometrika, 100(2):503–510.
Hoeffding, W. (1948). A non-parametric test of independence. Ann. Math. Statist., 19(4):546–557.
Huang, C. and Huo, X. (2017). A statistically and numerically efficient independence test based on random projections and distance covariance. Available at arXiv:1701.06054v1.
Huo, X. and Székely, G. J. (2016). Fast computing for distance covariance. Technometrics, 58(4):435–447.
Kendall, M. G. (1938). A new measure of rank correlation. Biometrika, 30(1/2):81–93.
Kim, I., Balakrishnan, S., and Wasserman, L. (2019). Robust multivariate nonparametric tests via projection averaging. Available at arXiv:1803.00715v3.
Konijn, H. S. (1956). On the power of certain tests for independence in bivariate populations. Ann. Math. Statist., 27(2):300–323.
Koroljuk, V. S. and Borovskich, Y. V. (1994). Theory of U-statistics (P. V. Malyshev and D. V. Malyshev, Trans.), volume 273 of Mathematics and its Applications. Kluwer Academic Publishers Group, Dordrecht, Netherlands.
Kössler, W. and Rödel, E. (2007). The asymptotic efficacies and relative efficiencies of various linear rank tests for independence. Metrika, 65(1):3–28.
Le Cam, L. and Yang, G. L. (2000). Asymptotics in statistics: some basic concepts (2nd ed.). Springer Series in Statistics. Springer-Verlag, New York.
Lee, A. J. (1990). U-statistics: Theory and practice, volume 110 of Statistics: Textbooks and Monographs. Marcel Dekker, Inc., New York, NY.
Leeb, H. and Pötscher, B. M. (2008). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory, 24(2):338–376.
Lin, J. (2017). Copula versions of RKHS-based and distance-based criteria. PhD thesis, Pennsylvania State University. Available at https://etda.libraries.psu.edu/catalog/14485jul268.
Liu, R. Y. and Singh, K. (1993). A quality index based on data depth and multivariate rank tests. J. Amer. Statist. Assoc., 88(421):252–260.
Lyons, R. (2013). Distance covariance in metric spaces. Ann. Probab., 41(5):3284–3305.
Lyons, R. (2018). Errata to “Distance covariance in metric spaces”. Ann. Probab., 46(4):2400–2405.
Oja, H. (2010). Multivariate nonparametric methods with R: an approach based on spatial signs and ranks, volume 199 of Lecture Notes in Statistics. Springer, New York.
Oja, H., Paindaveine, D., and Taskinen, S. (2016). Affine-invariant rank tests for multivariate independence in independent component models. Electron. J. Stat., 10(2):2372–2419.
Pearson, K. (1895). Note on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London, 58:240–242.
Puri, M. L. and Sen, P. K. (1971). Nonparametric methods in multivariate analysis. John Wiley & Sons, Inc., New York-London-Sydney.
Puri, M. L., Sen, P. K., and Gokhale, D. V. (1970). On a class of rank order tests for independence in multivariate distributions. Sankhyā Ser. A, 32(3):271–298.
Serfling, R. J. (1980). Approximation theorems of mathematical statistics. Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons, Inc., New York, NY.
Shah, R. D. and Peters, J. (2020+). The hardness of conditional independence testing and the generalised covariance measure. Ann. Statist. (in press).
Sharathkumar, R. and Agarwal, P. K. (2012). Algorithms for the transportation problem in geometric settings. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, pages 306–317, New York, NY. ACM.
Shi, H., Drton, M., and Han, F. (2020+). Distribution-free consistent independence tests via center-outward ranks and signs. J. Amer. Statist. Assoc. (in press).
Shorack, G. R. (2017). Probability for statisticians (2nd ed.). Springer Texts in Statistics. Springer, Cham, Switzerland.
Spearman, C. (1904). The proof and measurement of association between two things. Amer. J. Psychol., 15(1):72–101.
Székely, G. J. and Rizzo, M. L. (2013). The distance correlation t-test of independence in high dimension. J. Multivariate Anal., 117:193–213.
Székely, G. J. and Rizzo, M. L. (2014). Partial distance correlation with methods for dissimilarities. Ann. Statist., 42(6):2382–2412.
Székely, G. J., Rizzo, M. L., and Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. Ann. Statist., 35(6):2769–2794.
Taskinen, S., Kankainen, A., and Oja, H. (2003). Sign test of independence between two random vectors. Statist. Probab. Lett., 62(1):9–21.
Taskinen, S., Kankainen, A., and Oja, H. (2004). Rank scores tests of multivariate independence. In Theory and applications of recent robust methods, Stat. Ind. Technol., pages 329–341. Birkhäuser, Basel.
Taskinen, S., Oja, H., and Randles, R. H. (2005). Multivariate nonparametric tests of independence. J. Amer. Statist. Assoc., 100(471):916–925.
Tomizawa, N. (1971). On some techniques useful for solution of transportation network problems. Networks, 1:173–194.
Tsybakov, A. B. (2009). Introduction to nonparametric estimation (V. Zaiats, Trans.). Springer Series in Statistics. Springer, New York.
van der Vaart, A. W. (1998). Asymptotic statistics, volume 3 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, United Kingdom.
Weihs, L., Drton, M., and Meinshausen, N. (2018). Symmetric rank covariances: a generalized framework for nonparametric measures of dependence. Biometrika, 105(3):547–562.
Wilks, S. S. (1935). On the independence of k sets of normally distributed statistical variables. Econometrica, 3(3):309–326.
Yanagimoto, T. (1970). On measures of association and a related problem. Ann. Inst. Statist. Math., 22(1):57–63.
Zhu, L., Xu, K., Li, R., and Zhong, W. (2017). Projection correlation between two random vectors. Biometrika, 104(4):829–843.
Zuo, Y. and He, X. (2006). On the limiting distributions of multivariate depth-based rank sum statistics and related tests. Ann. Statist., 34(6):2879–2896.