The complex behaviour of Galton rank order statistic

Abstract

Galton's rank order statistic is one of the oldest statistical tools for two-sample comparisons. It is also a very natural index to measure departures from stochastic dominance. Yet, its asymptotic behaviour has been investigated only partially, under restrictive assumptions. This work provides a comprehensive study of this behaviour, based on the analysis of the so-called contact set (a modification of the set in which the quantile functions coincide). We show that a.s. convergence to the population counterpart holds if and only if the contact set has zero Lebesgue measure. When this set is finite we show that the asymptotic behaviour is determined by the local behaviour of a suitable reparameterization of the quantile functions in a neighbourhood of the contact points. Regular crossings result in standard rates and Gaussian limiting distributions, but higher order contacts (in the sense introduced in this work) or contacts at the extremes of the supports may result in different rates and non-Gaussian limits.


arXiv [math.ST]

E. del Barrio, J.A. Cuesta-Albertos and C. Matrán

Departamento de Estadística e Investigación Operativa and IMUVA, Universidad de Valladolid
Departamento de Matemáticas, Estadística y Computación, Universidad de Cantabria

February 5, 2021


Keywords: Relaxed stochastic dominance, asymptotics, consistency, Galton rank order statistic, comparison of quantile functions, contact points, crossings, tangencies, contact intensity.

Research partially supported by FEDER, Spanish Ministerio de Economía y Competitividad, grants MTM2014-56235-C2-1-P and MTM2017-86061-C2-1-P and Junta de Castilla y León, grants VA005P17 and VA002G18.

1 Introduction and main results

The Introductory Remarks in Darwin's report on the benefits of cross-fertilization to the propagation of vegetal species [Darwin (1876)] include the following comment, by Galton: “The observations... have no primâ facie appearance of regularity. But as soon as we arrange them in order of their magnitudes,... We now see, with few exceptions, that... the largest plant on the crossed side... exceeds the largest plant on the self-fertilised side, that... the second exceeds the second,... and so on...”. With this argument, Galton opened a simple way of comparing distributions, just by comparing the values with the same ranks in their respective settings.

Given two samples of equal size, $X_1,\dots,X_n$ and $Y_1,\dots,Y_n$, respectively coming from the distribution functions (d.f.'s in the sequel) $F$ and $G$, let us denote by $F_n$ and $G_n$ the corresponding sample d.f.'s. Galton's solution consisted in reordering both data samples in increasing order: $X_{(1)},\dots,X_{(n)}$ (coming from the control) and $Y_{(1)},\dots,Y_{(n)}$ (from the treatment) and computing $\mathcal{G}(F_n,G_n) := \#\{i : X_{(i)} > Y_{(i)}\}$, concluding improvement under the treatment whenever $\mathcal{G}(F_n,G_n)$ is small enough. When $F=G$ is continuous, the distribution of this ‘Galton Rank Order’ statistic is uniform on $\{0,1,\dots,n\}$ (see [Chung and Feller (1949)]; see also [Sparre-Andersen (1953)], [Hodges(1955)] or [Feller(1968)] for alternative proofs). As explained in [Hodges(1955)], in Darwin's problem the sample sizes were 15 and $\mathcal{G}(F_{15},G_{15}) = 2$, thus the $p$-value associated to Galton's approach is 3/16, which is not as rare as he suspected.

Galton's strategy was related to the assessment of stochastic dominance of $G$ over $F$, $F <_{st} G$, being the alternative to the null hypothesis $F=G$. Recall that, by definition, $F \le_{st} G$ whenever $F(x) \ge G(x)$ for every $x\in\mathbb{R}$.
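The computation described above can be sketched in a few lines. This is only an illustration: the function names `galton_statistic` and `galton_p_value` and the toy data are assumptions made here, not taken from the paper. The p-value uses the uniform null law on $\{0,1,\dots,n\}$ quoted above, so that $P(\mathcal{G} \le g) = (g+1)/(n+1)$.

```python
from fractions import Fraction


def galton_statistic(control, treatment):
    """Count the ranks i at which X_(i) > Y_(i) for two samples of equal size."""
    if len(control) != len(treatment):
        raise ValueError("Galton's original statistic needs equal sample sizes")
    xs, ys = sorted(control), sorted(treatment)
    return sum(x > y for x, y in zip(xs, ys))


def galton_p_value(g, n):
    """P(G <= g) when F = G is continuous: G is uniform on {0, 1, ..., n}."""
    return Fraction(g + 1, n + 1)


if __name__ == "__main__":
    # Made-up toy data, only to exercise the functions.
    control = [3.0, 1.0, 2.0]
    treatment = [2.5, 0.5, 2.2]
    print(galton_statistic(control, treatment))
    # The value reported in the text for Darwin's data: n = 15, G = 2.
    print(galton_p_value(2, 15))
```

With $n = 15$ and an observed count of 2 the second call reproduces the value 3/16 mentioned in the text.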
As noted in [Lehmann(1955)], this relation is better understood when it is stated in terms of the quantile functions: if $F^{-1}$ is the quantile function associated to $F$, defined by
$$F^{-1}(t) := \inf\{x : t \le F(x)\}, \quad \text{for } t\in(0,1), \tag{1}$$
then $F \le_{st} G$ whenever $F^{-1}(t) \le G^{-1}(t)$ for every $t\in(0,1)$.

A useful feature of the quantile functions is that they provide a canonical representation of random variables (r.v.'s in what follows) with a given d.f.: if we consider the Lebesgue measure, $\ell$, on the unit interval $(0,1)$, $F^{-1}$ is a r.v. with d.f. $F$. With this in mind, we set
$$\gamma(F,G) := \ell\{t : F^{-1}(t) > G^{-1}(t)\}. \tag{2}$$
(We have tried to use standard or natural notation throughout. However, a complete enough notation guide is included at the end of this section.) With this notation,
$$\mathcal{G}(F_n,G_n) = n\,\gamma(F_n,G_n). \tag{3}$$

Early work on Galton's rank statistic focused on the case $F=G$ and equal sample sizes. Special mention should be given to [Csáki and Vincze (1961)], which analyzes the joint behaviour of the Kolmogorov-Smirnov and Galton statistics (under $F=G$). Also for equal sample sizes, later, [Gross and Holland(1968)] considered the intermediate case with $F\ne G$ possibly, but $\ell\{F^{-1}=G^{-1}\} > 0$. Focusing on the dominance model $F=G$ vs $F\le_{st}G$, [Behnen and Neuhaus (1983)] addressed the local asymptotic efficiency of $\gamma(F_n,G_m)$, noting that it is just a generalization of Galton's statistic (recall (3)) and using empirical processes techniques to obtain the asymptotic distribution of $\gamma(F_n,G_m)$ under the null $F=G$ for independent samples with different sizes. Independently, looking for a feasible statistical way of relaxing the idea of “treatment improvement” underlying stochastic dominance, [Álvarez-Esteban et al.(2017)] introduced (2) as a naïve index to measure deviation from stochastic dominance, $F\le_{st}G$, and provided some asymptotic theory for the empirical index, for the case of d.f.'s with a single crossing point (the typical case in a location-scale family setting). In the same line, [Zhuang et al (2019)] adapted the theory to cover a finite number of crossings between the d.f.'s, under the additional assumption of an exponential density ratio model and using semiparametric estimates of the quantile functions.

Here, in a wide setting, we provide a complete set of distributional limit results for Galton's rank order statistic, showing the complex panorama of the asymptotic behaviour of $\gamma(F_n,G_m)$. In particular, we pursue the goal of analyzing the scarcely treated case of a finite number of contact points between $F^{-1}$ and $G^{-1}$, leading to a sound study of the local behaviour at every isolated contact point between quantile functions. This focuses on the consideration of the “contact intensity” (to be properly defined), which exceeds the merely visual scope of crossing points of smooth enough curves and presents certain similarities with concepts lying in Stochastic Geometry. That contact intensity relies on the existence of a local Lipschitzian reparameterization of a curve in terms of the other.
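For samples of possibly different sizes, the empirical index $\gamma(F_n,G_m)$ in (2) can be evaluated exactly, because both empirical quantile functions are step functions that are constant between consecutive multiples of $1/n$ and $1/m$. The following is a minimal sketch under that observation; the helper name `gamma_hat` and the toy data are illustrative assumptions, and exact rational arithmetic is used only to avoid floating-point ties at the breakpoints.

```python
from fractions import Fraction
from math import ceil


def gamma_hat(xs, ys):
    """Lebesgue measure of {t in (0,1) : F_n^{-1}(t) > G_m^{-1}(t)}.

    F_n^{-1}(t) equals the order statistic X_(ceil(n t)), so both empirical
    quantile functions are constant on each cell of the merged grid
    {i/n} U {j/m}; it suffices to compare them at the cell midpoints.
    """
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    grid = sorted({Fraction(i, n) for i in range(n + 1)}
                  | {Fraction(j, m) for j in range(m + 1)})
    total = Fraction(0)
    for a, b in zip(grid, grid[1:]):
        mid = (a + b) / 2
        x = xs[ceil(mid * n) - 1]  # F_n^{-1}(mid)
        y = ys[ceil(mid * m) - 1]  # G_m^{-1}(mid)
        if x > y:
            total += b - a
    return total


if __name__ == "__main__":
    # Equal sizes: n * gamma_hat recovers Galton's count, as in relation (3).
    print(3 * gamma_hat([3.0, 1.0, 2.0], [2.5, 0.5, 2.2]))
    # Unequal sizes (n = 2, m = 3), made-up data.
    print(gamma_hat([1, 4], [2, 3, 5]))
```

For equal sample sizes the merged grid reduces to the multiples of $1/n$, which is why $n\,\gamma(F_n,G_n)$ coincides with the count of ranks at which the control exceeds the treatment.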
Next we introduce the basic concepts we handle and explain the main results, whose proofs are deferred to Sections 3 and 4.

Intuitively, the asymptotic behaviour of $\gamma(F_n,G_m)$ depends on the size of the contact set, namely, the set
$$\Gamma := \{t : F^{-1}(t) = G^{-1}(t)\}. \tag{4}$$
For equal sized samples this was already observed in [Gross and Holland(1968)]. We note that, since the index $\gamma$ is invariant with respect to strictly increasing transformations, the set $\Gamma$ could be equivalently expressed, in regular cases, as $\{t : \Lambda(F^{-1}(t)) = 0\}$, where $\Lambda(x) := G^{-1}(F(x)) - x$ is the shift function introduced in [Doksum (1974)] as a richer alternative to the difference of means for comparing two continuous d.f.'s. The analysis of the Q-Q process associated to $\Lambda$ was done in [Aly (1986)], under smoothness assumptions, through strong approximations. Yet, intuition may fail without some regularity conditions and, as we show in this work, $\Gamma$ is not really the right set to look at. In fact, the asymptotic analysis of Galton's rank statistic is better handled in terms of the alternative shift function $h(t) := F_G(t) - t$, underlying the associated P-P process considered in [Aly et al.(1987)]. Here, and throughout this work, we denote $F_G := F\circ G^{-1}$ (similarly, $G_F = G\circ F^{-1}$) and
$$\tilde\Gamma := \{t : F_G(t) = t\}. \tag{5}$$
We observe that if $F$ and $G$ are continuous, then $\tilde\Gamma = \Gamma$. However, these sets can be quite different: for $F=G$, a Bernoulli distribution with mean $p$, we have $\Gamma = [0,1]$, while $\tilde\Gamma = \{1-p, 1\}$. By focusing on the ‘right’ choice of contact set, our results go beyond the cases that could be treated from the analyses in [Aly (1986)] and [Aly et al.(1987)]. In fact, we provide necessary and sufficient conditions for the a.s. consistency of $\gamma(F_n,G_m)$ without any smoothness assumption:

Theorem 1.1

Let $F, G$ be arbitrary d.f.'s. Then $\gamma(F_n,G_m) \xrightarrow{a.s.} \gamma(F,G)$, as $n,m\to\infty$, if and only if $\ell(\tilde\Gamma) = 0$.

A similar result holds for the one-sample statistic, $\gamma(F_n,G)$. As we see from Theorem 1.1, if $\ell(\tilde\Gamma) > 0$, then $\gamma(F_n,G_m)$ (or $\gamma(F_n,G)$) are not consistent estimators of $\gamma(F,G)$. In this case, we provide a completely general result about the asymptotic behaviour of $\gamma(F_n,G)$:

Theorem 1.2 Let $F, G$ be arbitrary d.f.'s. Then
$$\gamma(F_n,G) - \gamma(F,G) \xrightarrow{w} \ell\{t\in\tilde\Gamma : B(t) > 0\}, \quad \text{as } n\to\infty,$$
where $B$ is a standard Brownian bridge on $[0,1]$.

Still in the case $\ell(\tilde\Gamma) > 0$, we prove weak convergence of the two-sample statistic, $\gamma(F_n,G_m)$, under mild assumptions. This problem was also treated in [Gross and Holland(1968)] for equal sample sizes ($n=m$), through combinatorial arguments and the method of moments. That combinatorial approach seems to be inappropriate to handle the case of unequal sample sizes. Additionally, our version yields a simple representation of the limit law.

Theorem 1.3 Let $F, G$ be d.f.'s such that $F_G$ is Lipschitz. If $B$ is a standard Brownian bridge on $[0,1]$ and $m,n\to\infty$ satisfy $0 < \liminf \frac{n}{m+n} \le \limsup \frac{n}{m+n} < 1$, then
$$\gamma(F_n,G_m) - \gamma(F,G) \xrightarrow{w} \ell\{t\in\tilde\Gamma : B(t) > 0\}.$$
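When $F=G$ is continuous and $n=m$, every assignment of the $2n$ pooled ranks to the two samples is equally likely, so the exact finite-sample law of Galton's count can be checked by brute force for small $n$. The sketch below does this enumeration; the helper name `galton_null_distribution` is an illustrative assumption, and the enumeration is only practical for small $n$ since it visits $\binom{2n}{n}$ configurations.

```python
from collections import Counter
from itertools import combinations


def galton_null_distribution(n):
    """Exact law of #{i : X_(i) > Y_(i)} under F = G continuous, n = m.

    Each subset of n ranks out of {0, ..., 2n - 1} is assigned to the X
    sample with equal probability; the remaining ranks go to the Y sample.
    Returns a Counter mapping each value of the statistic to the number of
    equally likely configurations that produce it.
    """
    counts = Counter()
    all_ranks = range(2 * n)
    for x_ranks in combinations(all_ranks, n):  # tuples come out sorted
        x_set = set(x_ranks)
        y_ranks = [r for r in all_ranks if r not in x_set]  # also sorted
        g = sum(x > y for x, y in zip(x_ranks, y_ranks))
        counts[g] += 1
    return counts


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(n, sorted(galton_null_distribution(n).items()))
```

Each value in $\{0,1,\dots,n\}$ is produced by the same number of configurations, which is the finite-sample uniformity that the limit in Theorem 1.3 recovers asymptotically when $\tilde\Gamma = (0,1)$.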

It should be noted that the limiting distribution in Theorems 1.2 or 1.3 is non-degenerate if and only if $\ell(\tilde\Gamma) > 0$. A classical result (see [Lévy(1939)] or p. 85-86 in [Billingsley (1968)]) is that if $B$ is a standard Brownian bridge on $[0,1]$, then
$$P\big(\ell\{t\in[0,1] : B(t) > 0\} \le x\big) = x, \quad \text{for every } x\in[0,1].$$
From this and Theorem 1.3 we recover, asymptotically, the classical result for the case $m=n$ and continuous $G=F$ (recall that in this case $\gamma(F_n,G_n)$ is uniformly distributed over $\{0, \frac1n, \dots, \frac{n-1}{n}, 1\}$; continuity of $F$ ensures that $F(F^{-1}(t)) = t$ for every $t\in(0,1)$ and Theorem 1.3 applies with $\tilde\Gamma = (0,1)$).

When $\ell(\tilde\Gamma) = 0$ the limiting distribution in Theorems 1.2 and 1.3 is degenerate at 0. In Section 4 we obtain non-degenerate limiting distributions, with different rates, when the contact set consists of a finite collection of contact points. The key is the local asymptotic behaviour of $F_n^{-1} - G_m^{-1}$ around these influential points. To avoid unnecessary smoothness assumptions here, we must consider contact points between nondecreasing functions in a generalized sense, including virtual contact points: those corresponding to contacts between the vertical segments joining lateral limits at discontinuity points. Since quantile functions are left continuous, the following definition includes all these contact points.

Definition 1.4

We say that $t_0\in(0,1)$ is a (generalized) contact point between $F^{-1}$ and $G^{-1}$ if either (i) $F^{-1}(t_0) = G^{-1}(t_0)$ or (ii) $F^{-1}(t_0) < G^{-1}(t_0) \le F^{-1}(t_0+)$ or (iii) $G^{-1}(t_0) < F^{-1}(t_0) \le G^{-1}(t_0+)$.

We write $\Gamma^*$ for the set of generalized contact points. The local behaviour around a contact point is described through the localized term
$$\ell^{t_0}_{n,m} := \ell\big(\{F_n^{-1} > G_m^{-1}\}\cap(t_0-\eta,t_0+\eta)\big) - \ell\big(\{F^{-1} > G^{-1}\}\cap(t_0-\eta,t_0+\eta)\big), \tag{6}$$
for $t_0\in\Gamma^*$, assuming that $\eta>0$ is such that $t_0$ is the only point in $\Gamma^*\cap(t_0-\eta,t_0+\eta)$ (as we will see, asymptotically, $\ell^{t_0}_{n,m}$ does not depend on $\eta$, hence, we do not include it in our notation). We relate this behaviour to the character, position and intensity of the contact, in the following sense. For $t_0$ such that $F_G(t_0) = t_0$, set $H := \{h : t_0+h\in[0,1]\}$ and consider the function $\Delta : H\to\mathbb{R}$:
$$\Delta(h) := F_G(t_0+h) - t_0 - h. \tag{7}$$
We will assume that $\Delta$ is locally Lipschitz at 0 (equivalently, that $F_G$ is locally Lipschitz at $t_0$) plus a higher order expansion, possibly on the positive and the negative sides (for extremal contact points only one expansion makes sense). More precisely, we will assume, in addition to the Lipschitz property, that there exist $\eta>0$, $r_L = r_L(t_0)$, $r_R = r_R(t_0) \ge 1$ and $C_L = C_L(t_0)\ne0$, $C_R = C_R(t_0)\ne0$ such that
$$\Delta(h) = \begin{cases} C_L|h|^{r_L} + o(|h|^{r_L}), & \text{if } h\in(-\eta,0),\\ C_R|h|^{r_R} + o(|h|^{r_R}), & \text{if } h\in(0,\eta). \end{cases} \tag{8}$$
In these cases, we will say that $r_L$ (resp. $r_R$) is the intensity or order of the contact on the left (resp. on the right) between $F^{-1}$ and $G^{-1}$ at $t_0$. We observe that the assumptions imply that, for small enough $\eta$, $\operatorname{sgn}(F_G(t)-t) = \operatorname{sgn}(C_L)$ on $(t_0-\eta,t_0)$ and $\operatorname{sgn}(F_G(t)-t) = \operatorname{sgn}(C_R)$ on $(t_0,t_0+\eta)$. A point $t_0$ satisfying these conditions will be called a regular contact point.

For integer $r_L$ and $r_R$, expression (8) is a kind of left- and right-Taylor expansion. However, $r_L$ and $r_R$ are not necessarily integers in the definition above. We can classify regular contact points as crossing points (the case $\operatorname{sgn}(C_L)\ne\operatorname{sgn}(C_R)$) or tangency points (if $\operatorname{sgn}(C_L)=\operatorname{sgn}(C_R)$).
Notice also that under a proper Taylor expansion, $t_0$ is a crossing or tangency point depending only on whether $r_L = r_R$ is odd or even, while decomposition (8) allows a crossing point with odd $r_L = r_R$ or a tangency point with even $r_L = r_R$.

We must stress that (8) does not necessarily imply smoothness conditions on $F^{-1}$ or $G^{-1}$. As an example, consider the case when $F^{-1}(t_0) < G^{-1}(t_0) \le G^{-1}(t_0+) < F^{-1}(t_0+)$. Then $t_0$ is a discontinuity point of $F^{-1}$ (maybe also of $G^{-1}$), but $F_G$ is then locally constant, namely, $F_G(t) = t_0$ for $t$ close enough to $t_0$, and (8) holds with $r_L = r_R = 1$, $C_R = -C_L = -1$. We should also note that, while (8) excludes discontinuity points for $F_G$, in particular, virtual contact points between $F_G$ and the identity, our approach allows us to handle these points in a rather straightforward way (see (42), (43) and Theorem 4.10). Finally, we note that while (8) requires the contact orders to be at least 1, lower orders can also be considered. If, for instance, $\Delta(h) = \operatorname{sgn}(h)|h|^r$, with $0<r<1$, then $\Delta$ is not Lipschitz around 0, but $G_F$ is and, under some additional assumptions, the local behaviour can be studied through $\tilde\ell^{t_0}_{m,n}$, the version of $\ell^{t_0}_{n,m}$ in which the roles of the $X$ and $Y$ samples are exchanged (see the comments before the proof of Theorem 1.5).

For a compact description of the limit distribution for the terms $\ell^{t_0}_{n,m}$ we consider independent random elements $B_1, B_2, W_1, W_2, \{\xi_{1,n}\}_{n\ge1}, \{\xi_{2,n}\}_{n\ge1}, \{\xi_{3,n}\}_{n\ge1}, \{\xi_{4,n}\}_{n\ge1}$, where $B_i$ are Brownian bridges on $[0,1]$, $W_i$ are Brownian motions on $[0,\infty)$ and $\{\xi_{i,n}\}_{n\ge1}$ are sequences of i.i.d. exponential r.v.'s with unit mean. We set $S^i_k := \xi_{i,1}+\cdots+\xi_{i,k}$, $k\ge1$, $i=1,\dots,4$. We fix $\lambda\in(0,1)$ and set $B_\lambda := \sqrt{\lambda}\,B_1 - \sqrt{1-\lambda}\,B_2$.

We consider $r_L, r_R\ge1$ and $r := \max(r_L,r_R)$. Also, for real numbers $a, b$, we will use the notation $a^{\operatorname{sgn}(b)}$ for either $a^+$ (the positive part of $a$) or $a^-$ (the negative part) depending on whether $b>0$ or $b<0$. For $t_0\in(0,1)$ we define
$$T_{r_L,r_R}(t_0;C_L,C_R) := \operatorname{sgn}(C_L)\left(\frac{(B_\lambda(t_0))^{\operatorname{sgn}(C_L)}}{|C_L|}\right)^{1/r} I(r_L=r) + \operatorname{sgn}(C_R)\left(\frac{(B_\lambda(t_0))^{\operatorname{sgn}(C_R)}}{|C_R|}\right)^{1/r} I(r_R=r), \tag{9}$$
when $r>1$, or $r=1$ and $C_R C_L >$

0, while T , ( t ; C L , C R ) := ( B λ ( t )) sgn( CL ) C L + ( B λ ( t )) sgn( CR ) C R + sgn( C L ) B ( t ) √ − λ , (10)when C L C R <

0. Additionally, for r > t = 0 , T r ,r ( t ; C, C ) := sgn( C ) ℓ (cid:8) y ∈ (0 , ∞ ) : sgn( C ) W t ( y ) > ( λ (1 − λ )) / | C | y r (cid:9) , (11)while in the case r = 1 we set T , (0; C, C ) := sgn( C ) λ (1 − λ ) R ∞ I (cid:8) sgn ( C ) λS ⌈ (1 − λ ) y ⌉ > sgn ( C )(1 − λ )(1+ C ) S ⌈ λy ⌉ (cid:9) dyT , (1; C, C ) := sgn( C ) λ (1 − λ ) R ∞ I (cid:8) sgn ( C ) λS ⌈ (1 − λ ) y ⌉ > sgn ( C )(1 − λ )(1+ C ) S ⌈ λy ⌉ (cid:9) dy. (12)The double subindex and the double C are redundant for these extremal contact points,but allows to keep a simple notation.We are ready to present the results describing the asymptotic behaviour of ℓ t n,m forregular contact points. Theorem 1.5 deals with innner contact points, while extremalcontact points are considered in Theorem 1.6 (in fact, Theorem 1.5 remains valid forextremal contact points, but the limit distribution is Dirac’s measure on 0 in that case). Theorem 1.5

Assume $t_0$ is a regular inner contact point with contact orders $r_L = r_L(t_0)$, $r_R = r_R(t_0) \ge 1$ and constants $C_L = C_L(t_0)$, $C_R = C_R(t_0)$. If $r = \max(r_L,r_R)$ and $n,m\to\infty$ with $\frac{n}{n+m}\to\lambda\in(0,1)$, then, for every small enough $\eta>0$,
$$(n+m)^{\frac{1}{2r}}\,\ell^{t_0}_{n,m} \xrightarrow{w} T_{r_L,r_R}(t_0;C_L,C_R).$$

Theorem 1.6 Assume $t_0\in\{0,1\}$ is regular with contact order $r\ge1$ and constant $C$. If $m,n\to\infty$ with $\frac{n}{n+m}\to\lambda\in(0,1)$, then, for every small enough $\eta>0$,
$$(n+m)^{\frac{1}{2r-1}}\,\ell^{t_0}_{n,m} \xrightarrow{w} T_{r,r}(t_0;C,C). \tag{13}$$

We see from Theorems 1.5 and 1.6 that, with the same contact intensities, $\ell^{t_0}_{n,m}$ vanishes faster for extremal contact points. In Subsection 4.1 we provide examples of extremal contact points for which $\ell^{t_0}_{n,m}$ converges at rate $(n+m)^{-c}$ for every $c\in(0,1]$ and of inner contact points for which the rate is $(n+m)^{-c}$, $c\in(0,\frac12]$. Another distinctive feature of the limiting distributions for inner contact points is that for crossing points (those with $C_L(t_0)C_R(t_0) < 0$) the limiting distribution takes positive and negative values with positive probabilities. If $t_0$ is a tangency point ($C_L(t_0)C_R(t_0) > 0$) then the limiting distribution is concentrated on $(0,\infty)$ or on $(-\infty,0)$. Combining these local results yields the asymptotic behaviour of $\gamma(F_n,G_m)$.

Theorem 1.7 Assume $\Gamma^* = \{t_1,\dots,t_k\}$ where $t_i$ is a regular contact point with intensities $r_L(t_i)$, $r_R(t_i)$ and constants $C_L(t_i)$, $C_R(t_i)$, $i=1,\dots,k$. Set $r_i = \max(r_L(t_i),r_R(t_i))$ if $t_i\in(0,1)$, and $r_i = \max(r_L(t_i),r_R(t_i)) - \frac12$ if $t_i\in\{0,1\}$. Then, if $r = \max_{1\le i\le k} r_i$,
$$(n+m)^{\frac{1}{2r}}\big(\gamma(F_n,G_m) - \gamma(F,G)\big) \xrightarrow{w} \sum_{i=1}^{k} I(r_i=r)\,T_{r_L(t_i),r_R(t_i)}(t_i;C_L(t_i),C_R(t_i)).$$

Theorem 1.7 shows that the rate of convergence of $\gamma(F_n,G_m)$ is determined by the maximal intensity of contact, and that only points with maximal intensity contribute to the limiting distribution, with adjustments to take into account the different role of inner and extremal contact points. If there are extremal contact points then the rate of convergence can be $(n+m)^c$ for any $c\in(0,1]$; otherwise it is $(n+m)^c$ with $c\in(0,\frac12]$. The only case in which $\gamma(F_n,G_m)$ is asymptotically normal is when the inner contact points have intensity one, all of them with constants $C_L = -C_R$, and there is no extremal contact point or its influence vanishes faster.

The remaining sections of this work are organized as follows. Section 2 includes some key results on quantile functions and analyzes the structure of the contact sets. We will explicitly formulate several results on quantile functions. Some are classical but, in fact, it is not an easy task to find a comprehensive reference on quantile functions, with the notable exception of Appendix A in [Bobkov and Ledoux(2016)] on ‘Inverse Distribution Functions’. We observe that [Bobkov and Ledoux(2016)] is devoted to the analysis of convergence rates of Kantorovich transport distances between probability measures on the real line, which can be expressed in terms of quantile functions as $\int_0^1 |F^{-1}(t) - G^{-1}(t)|^p\,dt$, thus our problem corresponds to the limiting case $p=0$.
Remarkably, this problem also encompasses a wide range of convergence rates.

In Section 3 we provide the proofs of Theorems 1.1, 1.2 and 1.3. Most of the limit theorems that we give for Galton's rank statistic are based on convenient representations of empirical quantile functions, combined with some type of strong approximation. Using representation (14) below, we can derive limit theorems for Galton's rank statistic relying on strong approximations for uniform quantile processes, rather than using strong approximations for general quantile processes (as, for instance, in Chapter 6 in [Csörgő and Horváth (1993)]). This results in a significant gain in generality, since approximations for general quantile processes typically require strong smoothness assumptions (existence of densities plus additional conditions on them) that we can circumvent with this approach.

Section 4 gives the proofs of Theorems 1.5, 1.6 and 1.7. The key ingredients for this will be, as in Section 3, a convenient representation of the quantile processes and some application of strong approximations. With some simple localization results (Lemma 4.1 and Corollary 4.2) we see that the asymptotic behaviour of $\gamma(F_n,G_m)$ can be studied through that of the localized terms $\ell^{t_0}_{n,m}$ with $t_0$ in the contact set. Some results on the asymptotic independence between lower, central and upper order statistics then allow us to complete the proof of Theorem 1.7. Subsection 4.1 in that section provides some examples of contact points with different positions and contact intensities. This subsection also includes a simplified version of Theorem 1.7 under conditions that guarantee that $F_G$ is smooth (see Theorem 4.9), and a further limit theorem (Theorem 4.10) for the case when $F$ and $G$ have finite supports. This is an interesting example which can be handled with our approach even though the contact points here are not regular contact points.

We include an Appendix with some additional material. The first part is devoted to some properties of the $F_G$ transform, including a technical discussion on conditions which guarantee that $F_G$ is Lipschitz or locally Lipschitz. Finally, we present a strong approximation result that we have used in several proofs.

We end this Introduction with some words on notation. Throughout the paper $\mathcal{L}(X)$ will denote the law of the random vector or r.v. $X$. We will consider a generic probability space $(\Omega,\sigma,P)$, where the involved random objects are defined. Given the (measurable) sets $A, B$, by $I_A$ we will denote the indicator function of $A$ and $A\setminus B$ will denote the set $\{x\in A : x\notin B\}$. As before, $\ell$ will denote the Lebesgue measure on the unit interval $(0,1)$. Almost sure convergence, convergence in probability and weak convergence will be denoted by $\xrightarrow{a.s.}$, $\xrightarrow{p}$ and $\xrightarrow{w}$, respectively. Given a real value, $x$, we will use $\lceil x\rceil$ to denote the smallest integer greater than or equal to $x$, and $x^+ := \sup\{x,0\}$ and $x^- = -\inf\{x,0\}$. Also we use the notation $f(x-) := \lim_{y\to x-} f(y)$ and $f(x+) := \lim_{y\to x+} f(y)$ for the lateral limits of a real function, $f$, whenever these limits exist, and $\operatorname{sgn}(x)$ (defined, for real $x$, as 0 if $x=0$ and $x/|x|$ otherwise). Also recall that for real numbers $a, b$, $a^{\operatorname{sgn}(b)}$ will denote either $a^+$ or $a^-$ depending on whether $b>0$ or $b<0$.

Throughout, $X_1,\dots,X_n$ and $Y_1,\dots,Y_m$ will be independent samples of i.i.d. r.v.'s such that $\mathcal{L}(X_i)$ and $\mathcal{L}(Y_i)$ have respective d.f.'s $F$ and $G$. As above, $F_n$ and $G_m$ will denote the respective sample d.f.'s based on the $X$'s and $Y$'s samples. Occasionally, we will use the superscript $\omega$ in functions computed from the sample values $X_i(\omega)$, $i=1,\dots,n$ or $Y_j(\omega)$, $j=1,\dots,m$ (for instance, the empirical d.f. $F_n^\omega$ or the empirical quantile function $(F_n^\omega)^{-1}$). Without loss of generality we can (and often do) assume that the samples have been obtained from independent $U(0,1)$ samples $U_1,\dots,U_n$ and $V_1,\dots,V_m$ through the transformations $X_i = F^{-1}(U_i)$, $Y_j = G^{-1}(V_j)$. From now on, we will denote the empirical quantile functions of these uniform samples by $\mathbb{U}_n$ and $\mathbb{V}_m$. We have the obvious relations $F_n^{-1} = F^{-1}(\mathbb{U}_n)$, $G_m^{-1} = G^{-1}(\mathbb{V}_m)$. Writing $u_n$ and $v_m$ for the quantile processes based on the $U_i$'s and the $V_j$'s, respectively ($u_n(t) = \sqrt{n}(\mathbb{U}_n(t) - t)$ and similarly for $v_m$), we note that
$$F_n^{-1}(t) = F^{-1}\Big(t + \frac{u_n(t)}{\sqrt{n}}\Big) \quad\text{and}\quad G_m^{-1}(t) = G^{-1}\Big(t + \frac{v_m(t)}{\sqrt{m}}\Big). \tag{14}$$
As already noted, the limiting behaviour of Galton's rank statistic is best described in terms of the contact set between the identity and the function
$$F_G(t) := F(G^{-1}(t)). \tag{15}$$
We note that, while the role of $F$ and $G$ is symmetric in the definitions of $\Gamma$ and $\Gamma^*$, this is not true in the case of $\tilde\Gamma$. For a clearer description of the relations among these sets we sometimes write $\tilde\Gamma_F = \tilde\Gamma$, $\tilde\Gamma_G = \{t\in(0,1) : G_F(t) = t\}$ and $\tilde\Gamma^*_F$ (resp. $\tilde\Gamma^*_G$) for the set of generalized contact points between $F_G$ (resp. $G_F$) and the identity (see (19)). Obviously, $\tilde\Gamma_F\subset\tilde\Gamma^*_F$.

Quantile functions defined as in (1) provide a useful description of probabilities on the real line in terms of nondecreasing, left-continuous functions on $(0,1)$. Every nondecreasing, left-continuous real function, $H$, defined on $(0,1)$ is the quantile function associated to a unique d.f.: as a dual relation to (1), such a function $H$ is the quantile function associated to the d.f.
$$F(x) = \sup\{t\in(0,1) : H(t)\le x\}. \tag{16}$$
As already noted, it will be convenient at some points to extend $F^{-1}$ to 0 and 1 in the obvious way (hence, $F^{-1}(0) := F^{-1}(0+)$ and $F^{-1}(1) := F^{-1}(1-)$).

In this section we present some relevant facts on the relation between quantile functions and the composite functions $F_G$ defined in (15) without any smoothness assumption on the d.f.'s. We must begin by stressing the fact that, in general, we cannot guarantee even lateral continuity of $F_G$ (that would be only guaranteed for $F(G^{-1}(t)-)$ on the left and for $F(G^{-1}(t+))$ on the right). On the other hand, from the well known relation, for $t\in(0,1)$, $t\le F(x) \iff F^{-1}(t)\le x$, it is easy to see the relations
$$F_G(t) = \max\{s\in[0,1] : F^{-1}(s)\le G^{-1}(t)\}, \quad\text{for } t\in(0,1), \tag{17}$$
$$F^{-1}(t) > G^{-1}(s) \iff t > F_G(s), \quad\text{for } t,s\in(0,1). \tag{18}$$
We note that $F_G(t)$ and $G_F(t)$ could be different even when $t\in\Gamma$. This possibility is naturally related to the behaviour of the composition $F(F^{-1}(t))$. Clearly, $F(F^{-1}(t)-)\le t\le F_F(t)$, thus $F_F(t) = t$ when $F$ is continuous at $F^{-1}(t)$, but this could fail otherwise. More precisely, for $t\in(0,1)$, $F_F(t) = t$ is equivalent to $t\in\mathrm{Im}(F)$, where $\mathrm{Im}(F) := \{F(x), x\in\mathbb{R}\}$ (see Lemma A.3 in [Bobkov and Ledoux(2016)]). Now let $t\in(0,1)\cap\Gamma$: if $t\in\mathrm{Im}(F)$, then $t = F_F(t) = F_G(t)$, while if $t\notin\mathrm{Im}(F)$, then $t < F_F(t) = F_G(t)$. We collect these facts and some easy consequences for further reference in the following lemma.

Lemma 2.1

Let $F, G$ be arbitrary d.f.'s. For $t\in(0,1)$, with the above notation, we have:
a) If $t\in\Gamma\cap\mathrm{Im}(F)$, then $F_G(t) = t$.
b) If $t\in\Gamma\setminus\mathrm{Im}(F)$, then $F_G(t) > t$.
c) If $F_G(t) = t$, then either $t\in\Gamma$ or $F^{-1}(t) < G^{-1}(t)$.
d) If $F_G(t) = t$ and $G_F(t) = t$, then $t\in\Gamma$.

The conclusions in Lemma 2.1 can be rewritten with the notation of (4) and (5). Item a), for instance, becomes $\Gamma\cap\mathrm{Im}(F)\subset\tilde\Gamma$. Generalized contact points in the sense of Definition 1.4 (that is, points in $\Gamma^*$) can also be characterized in terms of the composite functions $F_G$ and $G_F$.

As already noted in the Introduction, the consideration of virtual contact points associated to left-jump discontinuities of $F_G$ is not necessary because they are in fact contact points in the strict sense or they are associated to right-jump discontinuities of $G_F$ (see Proposition 2.5). In consequence, we consider a point $t_0\in(0,1)$ as a contact point of $F_G$ and the identity whenever
$$F_G(t_0) = t_0, \quad\text{or}\quad F_G(t_0) < t_0 \le F_G(t_0+). \tag{19}$$
Note that the virtual contact condition $F_G(t_0) < t_0 \le F_G(t_0+)$ is equivalent to $F_G(t_0) < t_0 \le F_G(s)$ for all $s > t_0$, hence, by (18), also to $G^{-1}(t_0) < F^{-1}(t_0) \le G^{-1}(t_0+)$, which is condition (iii) in Definition 1.4. Therefore, we have shown the following.

Proposition 2.2

The virtual contact points of $F^{-1}$ and $G^{-1}$ are exactly the virtual contact points of $F_G$ or $G_F$ with the identity.

We note that $\tilde\Gamma^*_F\setminus\tilde\Gamma_F$ is contained in the set of discontinuity points of the nondecreasing function $F_G$, which must be at most countable. Hence, $\ell(\tilde\Gamma^*_F\setminus\tilde\Gamma_F) = 0$ and $\ell(\tilde\Gamma^*_F) = \ell(\tilde\Gamma_F)$. Proposition 2.2 means that $\Gamma^*\setminus\Gamma = (\tilde\Gamma^*_F\setminus\tilde\Gamma_F)\cup(\tilde\Gamma^*_G\setminus\tilde\Gamma_G)$. We explore next the situation for contact points in the strict sense.

Proposition 2.3 If $t_0\in\Gamma\cap(0,1)$ then $F_G(t_0) = t_0$, or $G_F(t_0) = t_0$ (that is, $t_0\in\tilde\Gamma_F\cup\tilde\Gamma_G$), or the set $\{t\in(0,1) : F^{-1}(t) = G^{-1}(t) = F^{-1}(t_0)\}$ is a non-degenerate interval (hence, in the latter case, the point $x_0 = F^{-1}(t_0)$ is a common discontinuity point of $F$ and $G$ and $t_0$ cannot be an isolated element of $\Gamma^*$).

Proof. It is easy to see that if $t_0\in(0,1)\setminus(\mathrm{Im}(F)\cup\mathrm{Im}(G))$ satisfies $F^{-1}(t_0) = G^{-1}(t_0)$, then the point $x_0 = F^{-1}(t_0)$ would have positive mass under both distributions, hence the set $\{t\in(0,1) : F^{-1}(t) = G^{-1}(t) = F^{-1}(t_0)\}$ is a non-degenerate interval. Any other point in $\Gamma\cap(0,1)$ must belong to $\mathrm{Im}(F)\cup\mathrm{Im}(G)$ and, by Lemma 2.1, must satisfy either $F_G(t_0) = t_0$ or $G_F(t_0) = t_0$. •

Proposition 2.4

Let $t_0\in(0,1)$ be such that $t_0\in\tilde\Gamma_F\cup\tilde\Gamma_G$, that is, $F_G(t_0) = t_0$ or $G_F(t_0) = t_0$. Then $t_0\in\Gamma^*$ ($t_0$ is a contact point between $F^{-1}$ and $G^{-1}$).

Proof. For any $t_0\in(0,1)$ such that $F_G(t_0) = t_0$ (the case $G_F(t_0) = t_0$ is identical), we must have one of the following exclusive possibilities:
i) $G_F(t_0) < t_0$, and then $F^{-1}(t_0) < G^{-1}(t_0)$, and
ii) $F_G(t_0) = t_0 = G_F(t_0)$, or $F_G(t_0) = t_0 < G_F(t_0)$, which lead to $F^{-1}(t_0) = G^{-1}(t_0)$.
If i) holds, then we would have $G^{-1}(t_0)\le F^{-1}(t_0+)$ (this follows easily from the fact that the strict inequality $G^{-1}(t_0) > F^{-1}(t_0+)$ would imply $F_G(t_0) = F(G^{-1}(t_0)) > t_0$). Hence, i) implies $F^{-1}(t_0) < G^{-1}(t_0)\le F^{-1}(t_0+)$. •

The next proposition shows that it is not necessary to consider contact points associated to left-discontinuities.

Proposition 2.5 Let $t_0\in(0,1)$. If $F_G(t_0-)\le t_0 < F_G(t_0)$, then $G_F(t_0)\le t_0\le G_F(t_0+)$, or the point $x_0 = G^{-1}(t_0)$ is a common discontinuity point of $F$ and $G$ and $t_0$ cannot be an isolated element of $\Gamma^*$.

Proof. If we suppose that $G^{-1}(t) = G^{-1}(t_0)$ for some $t < t_0$, then for every sequence $\{t_n\}$ such that $t_n\to t_0-$, $F(G^{-1}(t_n)) = F(G^{-1}(t_0))$ will hold eventually, thus leading to the absurd $F_G(t_0-) = F_G(t_0)$. Therefore it must be $x_0 := G^{-1}(t_0) > G^{-1}(t)$ for every $t < t_0$, and $F_G(t_0-) = F(G^{-1}(t_0)-)$. Moreover, the discontinuity of $F$ and its link with $F^{-1}$ easily show that $F^{-1}(t) = x_0$ if $t\in(F(x_0-),F(x_0)]$. Now, on the one hand, from the hypothesis we obtain $F^{-1}(t_0)\le x_0$ and $F^{-1}(s) = x_0$ for every $s\in(t_0,F(x_0))$, hence
$$G_F(t_0+) = G(x_0) = G(G^{-1}(t_0)) \ge t_0. \tag{20}$$
On the other hand, from the relation $t_0 < F_G(t_0)$ we obtain $F^{-1}(t_0)\le G^{-1}(t_0)$, hence $F^{-1}(t_0) = G^{-1}(t_0)$ or, alternatively, $F^{-1}(t_0) < G^{-1}(t_0)$, which gives $G_F(t_0) < t_0$. This relation and (20) imply that $G_F(t_0) < t_0\le G_F(t_0+)$. Finally, if $F^{-1}(t_0) = G^{-1}(t_0)$ and $x_0 = F^{-1}(t_0)$ is a continuity point of $G$, from (20) we obtain that $G_F(t_0) = t_0$, which proves the result. •

We conclude this section with some easy consequences of the last results.

Corollary 2.6 Let $t_0\in(0,1)$ be such that $F_G(t_0) = t_0$ (resp. $G_F(t_0) = t_0$). Then $t_0$ is a contact point (possibly virtual) between $G_F$ (resp. $F_G$) and the identity.

Corollary 2.6 states that $\tilde\Gamma_F\subset\tilde\Gamma^*_G$. From the comments after Proposition 2.2 we see that $\ell(\tilde\Gamma_F)\le\ell(\tilde\Gamma^*_G) = \ell(\tilde\Gamma_G)$. The same argument shows that $\ell(\tilde\Gamma_G)\le\ell(\tilde\Gamma_F)$, hence $\ell(\tilde\Gamma_F) = \ell(\tilde\Gamma_G)$. This means, in particular, that the roles of $F$ and $G$ in the condition $\ell(\tilde\Gamma) = 0$ in Theorems 1.1, 1.2 and 1.3 are completely symmetric.

Proposition 2.7 If $\Gamma^*$ is finite then $\Gamma^* = \tilde\Gamma^*_F\cup\tilde\Gamma^*_G$. In particular, $\tilde\Gamma^*_F$ and $\tilde\Gamma^*_G$ are finite.

We remark that, while $\tilde\Gamma^*_F\cup\tilde\Gamma^*_G\subset\Gamma^*$ always holds (this follows from Proposition 2.4), the set $\Gamma^*$ can be much bigger than $\tilde\Gamma^*_F\cup\tilde\Gamma^*_G$ (recall the comments in the Introduction; the case $G=F$, with $F$ the d.f. of the Bernoulli law with mean $p$, gives a simple example of this).

In Section 4 we prove distributional limit theorems for $\gamma(F_n,G_m)$ under the assumption that $\Gamma^*$ is finite, say, $\Gamma^* = \{t_1 < \cdots < t_r\}$. The differences $F^{-1}(t) - G^{-1}(t)$ must have constant sign in the open intervals $(t_i,t_{i+1})$ (the same happens in $(0,t_1)$ or $(t_r,1)$ if 0 or 1 are not contact points). The next result will enable us to focus on neighbourhoods of isolated contact points to study $\gamma(F_n,G_m)$.

Lemma 2.8

Assume $0 < a \le b < 1$ are such that $[a,b] \cap \Gamma^* = \emptyset$, and also that $\mathrm{sgn}(F^{-1}(t) - G^{-1}(t)) > 0$ (resp. $\mathrm{sgn}(F^{-1}(t) - G^{-1}(t)) < 0$) for every $t \in [a,b]$. Then there exists $\delta > 0$ such that $F^{-1}(t) - G^{-1}(t+) > \delta$ (resp. $F^{-1}(t) - G^{-1}(t+) < -\delta$) for every $t \in [a,b]$.

Proof: Let us consider the case $\mathrm{sgn}(F^{-1}(t) - G^{-1}(t)) > 0$. Assume, on the contrary, that there exists a sequence $\{t_k\} \subset [a,b]$ such that $F^{-1}(t_k) - G^{-1}(t_k+) \to 0$. In this case it is possible to choose $\{t^*_k\} \subset (0,1)$ such that $t_k < t^*_k$, $t_k - t^*_k \to 0$, $F^{-1}(t_k+) - F^{-1}(t^*_k) \to 0$ and $G^{-1}(t_k+) - G^{-1}(t^*_k) \to 0$. Since $[a,b]$ is compact, we can assume that $\{t_k\}$ converges. Then $\{t^*_k\}$ also converges. We write $t_0 \in [a,b]$ for the common limit. By taking subsequences, if necessary, we can also assume that both sequences are monotone.

Now, we only need to consider four possible cases. If, for instance, $\{t_k\}$ and $\{t^*_k\}$ are increasing, then we would obtain that $F^{-1}(t_0) = G^{-1}(t_0)$, which is impossible by assumption. If the sequence $\{t_k\}$ is increasing and $\{t^*_k\}$ is decreasing, then $F^{-1}(t_k) \to F^{-1}(t_0)$ and $G^{-1}(t^*_k) \to G^{-1}(t_0+)$, and we would have that $F^{-1}(t_0) = G^{-1}(t_0+)$ and, consequently, $t_0$ would be a contact point, which is not possible either, because $[a,b] \cap \Gamma^* = \emptyset$. The two remaining cases lead to similar contradictions. •
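As a numerical illustration of Lemma 2.8 (not part of the original argument), the following Python sketch takes a hypothetical pair of quantile functions, $F^{-1}(t) = t^2$ and $G^{-1}(t) = t$, whose only contact points are 0 and 1. It checks on a grid that the two functions stay uniformly separated on $[a,b] = [0.1, 0.9]$, and then evaluates the localized empirical quantity $\ell(\{F_n^{-1} > G_m^{-1}\} \cap [a,b])$ exactly from simulated samples. The function name `localized_galton`, the choice of distributions, the sample sizes and the seed are all assumptions made for this example.

```python
import numpy as np


def localized_galton(x, y, a=0.0, b=1.0):
    """Lebesgue measure of {t in [a, b] : F_n^{-1}(t) > G_m^{-1}(t)}.

    F_n^{-1} and G_m^{-1} are the empirical quantile functions of x and y,
    i.e. F_n^{-1}(t) = x_(ceil(n t)) for t in (0, 1].
    """
    x, y = np.sort(x), np.sort(y)
    n, m = len(x), len(y)
    # Both step functions are constant between consecutive cut points.
    cuts = np.unique(np.concatenate([np.arange(n + 1) / n,
                                     np.arange(m + 1) / m, [a, b]]))
    cuts = cuts[(cuts >= a) & (cuts <= b)]
    mids = (cuts[:-1] + cuts[1:]) / 2
    widths = np.diff(cuts)
    # Evaluate each empirical quantile function at the interval midpoints.
    fx = x[np.clip(np.ceil(n * mids).astype(int) - 1, 0, n - 1)]
    gy = y[np.clip(np.ceil(m * mids).astype(int) - 1, 0, m - 1)]
    return float(np.sum(widths[fx > gy]))


# Hypothetical pair: F^{-1}(t) = t^2 and G^{-1}(t) = t (illustration only).
a, b = 0.1, 0.9
grid = np.linspace(a, b, 801)
margin = float(np.min(grid - grid**2))  # grid estimate of the separation on [a, b]

rng = np.random.default_rng(0)          # arbitrary seed
x = rng.uniform(size=5000) ** 2         # sample with quantile function t^2
y = rng.uniform(size=5000)              # sample with quantile function t
local_stat = localized_galton(x, y, a, b)
print(margin, local_stat)
```

Since $F^{-1} < G^{-1}$ on $[a,b]$ with a positive margin, the population value of the localized quantity is 0, and for samples of this size the empirical value is expected to agree with it.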

We conclude this section with two observations. First, we note that $\mathrm{sgn}(t - F_G(t)) = \mathrm{sgn}(F^{-1}(t) - G^{-1}(t))$ for every $t \notin \Gamma^*$. To check this recall relation (18), giving that $F^{-1}(t) > G^{-1}(t)$ if and only if $t > F_G(t)$. This also implies that $F^{-1}(t) \le G^{-1}(t)$ if and only if $t \le F_G(t)$, but $t = F_G(t)$ cannot happen if $t \notin \Gamma^*$ (Proposition 2.4). On the other hand, if $t < F_G(t)$ then $F^{-1}(t) \le G^{-1}(t)$ but, again, $F^{-1}(t) = G^{-1}(t)$ is not possible if $t \notin \Gamma^*$. This means that $\mathrm{sgn}(t - F_G(t))$ is constant in the intervals $(t_i, t_{i+1})$ as above.

Our second observation arises from the fact that every nondecreasing left-continuous real function, $H$, defined on $(0,1)$ is the quantile function associated to the d.f. given by (16). We can apply Lemma 2.8 to the quantile function $H(t) = F_G(t-)$ and the identity and conclude, for instance, that in a compact interval where $t - F_G(t) > 0$ we have, for some $\delta > 0$, $t - F_G(t) \ge \delta$. We will exploit these facts in later sections.

In this section we provide proofs of Theorems 1.1, 1.2 and 1.3. These results show that Galton's rank order statistic is a consistent estimator of the index $\gamma(F,G) = \ell\{t : F^{-1}(t) > G^{-1}(t)\} = \ell\{t : t > F_G(t)\}$ if and only if the contact set $\tilde\Gamma = \{t : t = F_G(t)\}$ has zero Lebesgue measure. The key to the proof of Theorem 1.1 is the following lemma. Here we denote Γ := Im( F ) ∩ Γ ∩ (0, 1).

Lemma 3.1

Let

F, G be arbitrary d.f.’s. With the notation above, we have: γ ( F n , G m ) − γ ( F, G ) − ℓ ( { F − n > G − m } ∩ Γ) a.s. → as n, m → ∞ , (21) γ ( F n , G m ) − γ ( F, G ) − ℓ ( { F − n > G − m } ∩ Γ ) a.s. → as n, m → ∞ , (22) and γ ( F n , G m ) − γ ( F, G ) − ℓ ( { F − n > G − m } ∩ ˜Γ) a.s. → as n, m → ∞ . (23) Proof.

By right continuity, if t / ∈ Im( F ), then there exists δ t > t, t + δ t ) ∩ Im( F ) = ∅ and F F ( s ) = t + δ t , for every s ∈ [ t, t + δ t ]. From this, it easy to see that thereexists an at most countable family of disjoint intervals I k = [ a k , b k ), with a k < b k whichis a partition of the complement of Im( F ) and F F ( s ) = b k , for every s ∈ [ a k , b k ).The Glivenko-Cantelli Theorem gives that for some Ω ∈ σ , with P (Ω ) = 1, if ω ∈ Ω ,then sup t | F ωn ( t ) − F ( t ) | → t | G ωm ( t ) − G ( t ) | → . Now, recalling the elementary Skorohod theorem (see e.g. Lemma A.5 in [Bobkov and Ledoux(2016)]),for every ω ∈ Ω , the set T ω := (cid:8) t ∈ (0 ,

1) : ( F ωn ) − ( t ) → F − ( t ) and ( G ωm ) − ( t ) → G − ( t ) (cid:9) \{ a , a , . . . } ω ∈ Ω , γ ( F ωn , G ωm ) − γ ( F, G ) − ℓ ( { ( F ωn ) − > ( G ωm ) − } ∩ Γ)= ℓ (cid:2) { ( F ωn ) − > ( G ωm ) − , F − < G − } ∩ T ω (cid:3) − ℓ (cid:2) { ( F ωn ) − ≤ ( G ωm ) − , F − > G − } ∩ T ω ) (cid:3) , which converges to 0 because both sets within brackets converge to the empty set. Thisproves (21). To prove (22) we show that if ω ∈ Ω , then d n := ℓ (cid:0)(cid:8) ( F ωn ) − > ( G ωm ) − (cid:9) ∩ Γ (cid:1) − ℓ (cid:0)(cid:8) ( F ωn ) − > ( G ωm ) − (cid:9) ∩ Γ (cid:1) → . (24)To check this, notice that d n = ℓ (cid:0)(cid:8) ( F ωn ) − > ( G ωm ) − (cid:9) ∩ T ω ∩ Γ ∩ (cid:0) ∪ k ( a k , b k ) (cid:1)(cid:1) = ℓ (cid:0)(cid:8) t > ( F ωn ) G ωm ( t ) (cid:9) ∩ T ω ∩ Γ ∩ (cid:0) ∪ k ( a k , b k ) (cid:1)(cid:1) . Now, Glivenko-Cantelli again, and the construction of T ω yield that if ω ∈ Ω and t ∈ T ω ∩ Γ, then0 = lim n (cid:12)(cid:12) F ωn (cid:2) ( G ωm ) − ( t ) (cid:3) − F (cid:2) ( G ωm ) − ( t ) (cid:3)(cid:12)(cid:12) and lim n ( G ωm ) − ( t ) = G − ( t ) = F − ( t ) . (25)From here, if t ∈ T ω ∩ Γ ∩ ( a k , b k ) for some k , then, eventually, F [( G ωm ) − ( t )] = b k > t which, combined with the ﬁrst statement in (25), makes eventually impossible that t > ( F ωn ) G ωm ( t ) and shows (24).The proof of (23) is now obvious taking into account that, from Lemma 2.1Γ ⊂ ˜Γ ⊂ Γ ∪ { F − < G − } . (26) • Proof of Theorem 1.1.

Sufficiency is a trivial consequence of Lemma 3.1. To prove necessity, if $\gamma(F_n, G_m) \overset{a.s.}{\to} \gamma(F,G)$ as $n, m \to \infty$, then, according to Lemma 3.1, we have that
$$D_n := \ell\big(\{F_n^{-1} > G_m^{-1}\} \cap \tilde\Gamma\big) \overset{a.s.}{\to} 0. \tag{27}$$
From (58), we have
$$D_n = \ell\big(\{U_n > F_G(V_m)\} \cap \tilde\Gamma\big) \ge \ell\big(\{t : U_n(t) > t\} \cap \{t : t \ge F_G(V_m(t))\} \cap \tilde\Gamma\big).$$
Now Fubini's theorem and independence between samples yield
$$E[D_n] \ge \int_{\tilde\Gamma} P[U_n(t) > t]\, P[t \ge F_G(V_m(t))]\, dt \ge \int_{\tilde\Gamma} P[U_n(t) > t]\, P[t > V_m(t)]\, dt, \tag{28}$$
where the last inequality follows from the fact that, since $F_G$ is nondecreasing, $F_G(t) = t$ and $V_m(t) < t$ imply that $F_G(V_m(t)) \le t$. On the other hand, for every $t \in (0,1)$, both $P[U_n(t) > t]$ and $P[t > V_m(t)]$ converge to $1/2$. But, since $|D_n| \le 1$, (27) implies $E[D_n] \to 0$, and therefore $\ell(\tilde\Gamma) = 0$. •

Next, we give a proof of Theorem 1.2. We remark that our approach allows us to handle this one-sample statistic without any smoothness assumption on $F$ or $G$.

Proof of Theorem 1.2.

Assuming, w.l.o.g., the construction in Theorem B.1 we have
$$\hat\gamma_n := \gamma(F_n, G) = \ell\Big\{t : F^{-1}\Big(t + \tfrac{u_n(t)}{\sqrt n}\Big) > G^{-1}(t)\Big\} = \ell\Big\{t : t + \tfrac{u_n(t)}{\sqrt n} > F_G(t)\Big\} = \ell\big\{t : u_n(t) > \sqrt n\,(F_G(t) - t)\big\},$$
and similarly $\gamma := \gamma(F,G) = \ell\{t : F_G(t) - t < 0\}$. Therefore, we see that
$$\hat\gamma_n - \gamma = \ell\big\{t : u_n(t) > \sqrt n\,(F_G(t) - t) \ge 0\big\} - \ell\big\{t : 0 > \sqrt n\,(F_G(t) - t) \ge u_n(t)\big\}.$$
Obviously, for the Brownian bridges $B^F_n(t)$,
$$\ell\big\{t : u_n(t) > \sqrt n\,(F_G(t) - t) > 0\big\} \le \ell\Big\{t : B^F_n(t) + K\tfrac{\log n}{\sqrt n} \ge \sqrt n\,(F_G(t) - t) > 0\Big\} + \ell\Big\{t : |B^F_n(t) - u_n(t)| > K\tfrac{\log n}{\sqrt n}\Big\}.$$
By Theorem B.1, the last summand eventually vanishes. For a fixed Brownian bridge $B(t)$ and $t \in (0,1)$ such that $F_G(t) - t > 0$, we have $B(t) + K\frac{\log n}{\sqrt n} < \sqrt n\,(F_G(t) - t)$ eventually. This and the bounded convergence theorem imply that
$$\ell\Big\{t : B(t) + K\tfrac{\log n}{\sqrt n} \ge \sqrt n\,(F_G(t) - t) > 0\Big\} \overset{a.s.}{\to} 0.$$
As a result we obtain that $\ell\{t : u_n(t) > \sqrt n\,(F_G(t) - t) > 0\} \overset{p}{\to} 0$. Similarly we see that $\ell\{t : 0 > \sqrt n\,(F_G(t) - t) \ge u_n(t)\} \overset{p}{\to} 0$, and therefore
$$\hat\gamma_n - \gamma = \ell\{t : u_n(t) \ge 0,\ F_G(t) = t\} + o_P(1). \tag{29}$$
Next, we observe that, eventually,
$$\ell\Big\{t \in \tilde\Gamma : B^F_n(t) - K\tfrac{\log n}{\sqrt n} \ge 0\Big\} \le \ell\{t \in \tilde\Gamma : u_n(t) \ge 0\} \le \ell\Big\{t \in \tilde\Gamma : B^F_n(t) + K\tfrac{\log n}{\sqrt n} \ge 0\Big\}.$$
Arguing as above, for a fixed Brownian bridge $B$,
$$\ell\Big\{t \in \tilde\Gamma : B(t) + K\tfrac{\log n}{\sqrt n} \ge 0\Big\} \to \ell\{t \in \tilde\Gamma : B(t) \ge 0\} \quad\text{and}\quad \ell\Big\{t \in \tilde\Gamma : B(t) - K\tfrac{\log n}{\sqrt n} \ge 0\Big\} \to \ell\{t \in \tilde\Gamma : B(t) \ge 0\}.$$
This and (29) show the announced result. •

We recall that the set involved in the limit law in the last result is $\tilde\Gamma$, which generally does not coincide with $\Gamma$ (see Lemma 2.1 and (26) for more details). For a better understanding of the links between Theorems 1.1 and 1.2, we note that degeneracy in the limit law is equivalent to $\ell(\tilde\Gamma) = 0$. This is an obvious consequence of the next, simple result.

Lemma 3.2 If $B(t)$ is a standard Brownian bridge on $[0,1]$, for any Borel set $A$ in $[0,1]$, the r.v. $\ell(\{B > 0\} \cap A)$ is a.s. constant if and only if $\ell(A) = 0$.

Proof. If $\ell(A) = 0$ then, obviously, $\ell(\{B > 0\} \cap A) = 0$. Assume now that $\ell(A) > 0$. We have $\ell\{t \in [0,1] : B(t) = 0\} = 0$ a.s. (this follows easily from Fubini's Theorem). Moreover, if $B$ is a Brownian bridge then $B \overset{d}{=} -B$. Hence, $\ell(\{B < 0\} \cap A) \overset{d}{=} \ell(\{B > 0\} \cap A)$, while $\ell(\{B < 0\} \cap A) + \ell(\{B > 0\} \cap A) = \ell(A)$ a.s. This implies that $E(\ell(\{B > 0\} \cap A)) = \ell(A)/2$. Thus, if $\ell(\{B > 0\} \cap A)$ were a.s. constant, that constant should equal $\ell(A)/2$. However, $\ell\{B > 0\}$ stochastically dominates $\ell(\{B > 0\} \cap A)$, and degeneracy on the value $\ell(A)/2$ would imply, since $\ell\{B > 0\}$ follows the $U(0,1)$ law, that the $U(0,1)$ law stochastically dominates Dirac's measure on $\ell(A)/2$, which cannot hold if $\ell(A) > 0$. •

To deal with Galton's rank statistic in the two-sample case we must adapt the argument in the proof of Theorem 1.2. This is done with Lemma 3.3, which will play an important role in our development. It relies on the strong approximation given in Theorem B.1 in the Appendix. Given two real functions $f$ and $g$ and versions of independent sequences of Brownian bridges $\{B^F_n\}$, $\{B^G_m\}$ and of uniform quantile processes, $u_n$ and $v_m$, as in Theorem B.1, we set
$$f_n(t) := f\Big(t + \tfrac{u_n(t)}{\sqrt n}\Big),\quad g_m(t) := g\Big(t + \tfrac{v_m(t)}{\sqrt m}\Big),\quad \tilde f_n(t) := f\Big(t + \tfrac{B^F_n(t)}{\sqrt n}\Big),\quad \tilde g_m(t) := g\Big(t + \tfrac{B^G_m(t)}{\sqrt m}\Big). \tag{30}$$

Lemma 3.3

Consider $A \subset (0,1)$ such that $\ell(A) > 0$. With the notation and construction of Theorem B.1, if we assume that $f, g$ are two real Lipschitz functions, then there exists $L > 0$ such that, if $C_{n,m} := L\big(\frac{\log n}{n} + \frac{\log m}{m}\big)$, then whenever $n, m \to \infty$, eventually,
$$\ell\{t \in A : \tilde f_n(t) > \tilde g_m(t) + C_{n,m}\} \le \ell\{t \in A : f_n(t) > g_m(t)\} \le \ell\{t \in A : \tilde f_n(t) > \tilde g_m(t) - C_{n,m}\}. \tag{31}$$

Proof: Since $f$ is Lipschitz, for $t \in A$ we have that
$$|f_n(t) - \tilde f_n(t)| = \Big|f\Big(t + \tfrac{u_n(t)}{\sqrt n}\Big) - f\Big(t + \tfrac{B^F_n(t)}{\sqrt n}\Big)\Big| \le \|f\|_{\mathrm{Lip}} \frac{\|u_n - B^F_n\|_\infty}{\sqrt n},$$
with a similar bound for $|g_m(t) - \tilde g_m(t)|$. These bounds and (62) imply that on a probability one set, eventually,
$$\sup_{t \in A} \big|(f_n - g_m) - (\tilde f_n - \tilde g_m)\big| \le L\Big(\frac{\log n}{n} + \frac{\log m}{m}\Big) = C_{n,m}$$
for some positive constant $L$ (depending only on $f$ and $g$). Observe that
$$\ell\{t \in A : f_n(t) > g_m(t)\} \le \ell\{t \in A : \tilde f_n(t) > \tilde g_m(t) - C_{n,m}\} + \ell\{t \in A : |(f_n(t) - \tilde f_n(t)) - (g_m(t) - \tilde g_m(t))| > C_{n,m}\},$$
$$\ell\{t \in A : \tilde f_n(t) > \tilde g_m(t) + C_{n,m}\} \le \ell\{t \in A : f_n(t) > g_m(t)\} + \ell\{t \in A : |(f_n(t) - \tilde f_n(t)) - (g_m(t) - \tilde g_m(t))| > C_{n,m}\}.$$
On a probability one set the second summands in the last two upper bounds eventually vanish. Hence, on that probability one set, (31) eventually holds. •

We will apply Lemma 3.3 to the cases in which $f = F^{-1}$ and $g = G^{-1}$ and when $f$ is the identity and $g = F_G$ (see Section A in the Appendix for the analysis of the Lipschitz condition on $F_G$).

We end the section with the proof of the two-sample analogue of Theorem 1.2.

Proof of Theorem 1.3.

By taking subsequences we can assume $\frac{n}{n+m} \to \lambda \in (0,1)$. We have to prove that
$$\ell\{t \in \tilde\Gamma : F_n^{-1}(t) > G_m^{-1}(t)\} \overset{w}{\to} \ell\{t \in \tilde\Gamma : B(t) > 0\},$$
which, using the approximation in Theorem B.1 and Lemma 3.3, will hold if
$$\ell\Big\{t \in \tilde\Gamma : t + \tfrac{B^F_n(t)}{\sqrt n} > F_G\Big(t + \tfrac{B^G_m(t)}{\sqrt m}\Big) + C_{n,m}\Big\} \overset{w}{\to} \ell\{t \in \tilde\Gamma : B(t) > 0\} \tag{32}$$
and
$$\ell\Big\{t \in \tilde\Gamma : t + \tfrac{B^F_n(t)}{\sqrt n} > F_G\Big(t + \tfrac{B^G_m(t)}{\sqrt m}\Big) - C_{n,m}\Big\} \overset{w}{\to} \ell\{t \in \tilde\Gamma : B(t) > 0\}. \tag{33}$$
Both terms can be handled similarly, hence we will address here only (32). First, we note that $\ell\{t \in \tilde\Gamma : F_G(t) \le x\} = \ell((0,x] \cap \tilde\Gamma)$, thus it defines a measure with density function $I_{\tilde\Gamma}(t)$, and, by the Lebesgue differentiation theorem,
$$\lim_{h \to 0} \frac{F_G(t+h) - t}{h} = 1 \quad \text{for almost every } t \in \tilde\Gamma. \tag{34}$$
Now, from
$$\ell\Big\{t \in \tilde\Gamma : t + \tfrac{B^F_n(t)}{\sqrt n} > F_G\Big(t + \tfrac{B^G_m(t)}{\sqrt m}\Big) + C_{n,m}\Big\} \tag{35}$$
$$= \ell\Big\{t \in \tilde\Gamma : \sqrt{\tfrac{m+n}{n}}\, B^F_n(t) > \sqrt{\tfrac{m+n}{m}}\, B^G_m(t)\, \frac{F_G\big(t + \frac{B^G_m(t)}{\sqrt m}\big) - t}{B^G_m(t)/\sqrt m} + \sqrt{m+n}\, C_{n,m}\Big\}$$
$$\overset{d}{=} \ell\Big\{t \in \tilde\Gamma : \sqrt{\tfrac{m+n}{n}}\, B_F(t) > \sqrt{\tfrac{m+n}{m}}\, B_G(t)\, \frac{F_G\big(t + \frac{B_G(t)}{\sqrt m}\big) - t}{B_G(t)/\sqrt m} + \sqrt{m+n}\, C_{n,m}\Big\},$$
where $B_F$ and $B_G$ are independent standard Brownian bridges, (34), the expression of $C_{n,m}$ and dominated convergence imply convergence to
$$\ell\big\{t \in \tilde\Gamma : \lambda^{-1/2} B_F(t) - (1-\lambda)^{-1/2} B_G(t) > 0\big\}.$$
Finally, independence between $B_F$ and $B_G$ gives that $\lambda^{-1/2} B_F(t) - (1-\lambda)^{-1/2} B_G(t)$ is a scaled Brownian bridge (it can be written as $(\lambda^{-1} + (1-\lambda)^{-1})^{1/2} B(t)$, where $B(t)$ is a standard Brownian bridge). Therefore the limit law in (35) is that of $\ell\{t \in \tilde\Gamma : B(t) > 0\}$. •

Remark 3.4

It is obvious that, for any Borel set A ⊂ [0 , B , the distribution of ℓ ( { B > } ∩ A ) is supported by [0 , ℓ ( A )]. One could conjucturethat this distribution should be also uniform on (0 , ℓ ( A )). However, a second thoughtshows that this distribution, in fact, depends on the set A and that it could even benon-continuous. It is well known (see e.g. pag. 42 in [Shorack and Wellner(1986)]) that P ( B ( t ) = 0 for a < t } ∩ A ) = ℓ ( A ) } is strictly positive. In fact, thisdistribution has two atoms: at ℓ ( A ) and at 0. • When the set ˜Γ is negligible, Theorems 1.2 and 1.3 yield convergence of Galton’s rankstatistic to the index γ ( F, G ). We investigate in this section the rate of convergence inthis result when the contact set Γ ∗ (recall Deﬁnition 1.4) is ﬁnite. The following simpleresult will be crucial in our analysis. Lemma 4.1

Assume that $[a,b] \subset [0,1] \setminus \Gamma^*$ is such that $t - F_G(t) > \delta > 0$ for every $t \in [a,b]$. If $\frac{n}{n+m} \to \lambda \in (0,1)$ then, for every $\varepsilon > 0$ such that $a + \varepsilon < b - \varepsilon$, a.s., eventually,
$$\{t \in [a+\varepsilon, b-\varepsilon] : F_n^{-1}(t) > G_m^{-1}(t)\} = \{t \in [a+\varepsilon, b-\varepsilon] : F^{-1}(t) > G^{-1}(t)\}.$$
The same conclusion holds if $t - F_G(t) < -\delta$ for every $t \in [a,b]$.

Proof: We have $F^{-1}(t) > G^{-1}(t)$ for every $t \in [a,b]$. Using the representation (14),
$$\{t \in [a+\varepsilon, b-\varepsilon] : F_n^{-1}(t) > G_m^{-1}(t)\} = \Big\{t \in [a+\varepsilon, b-\varepsilon] : t + \tfrac{u_n(t)}{\sqrt n} > F_G\Big(t + \tfrac{v_m(t)}{\sqrt m}\Big)\Big\}.$$
Without loss of generality we can assume that, for the chosen version of $u_n$, $\sup_{0 \le t \le 1} |u_n(t)|$ is a.s. bounded, and the same for $v_m$. Then, a.s., we have that for all $t \in [a+\varepsilon, b-\varepsilon]$, eventually $t + \frac{v_m(t)}{\sqrt m} \in [a,b]$ and therefore $F_G\big(t + \frac{v_m(t)}{\sqrt m}\big) < t + \frac{v_m(t)}{\sqrt m} - \delta < t + \frac{u_n(t)}{\sqrt n}$ for large enough $n$ and $m$, and the result follows. The same argument settles the case $t - F_G(t) < -\delta$. •

Now, recalling (see (6)) the notation
$$\ell^{t_0}_{n,m} := \ell\big(\{F_n^{-1} > G_m^{-1}\} \cap (t_0 - \eta, t_0 + \eta)\big) - \ell\big(\{F^{-1} > G^{-1}\} \cap (t_0 - \eta, t_0 + \eta)\big),$$
we obtain, as an immediate consequence of Lemma 4.1 and Lemma 2.8 and the subsequent comments, the following result.

Corollary 4.2 If $\Gamma^* = \{t_1, \ldots, t_k\}$, $k > 0$, $\frac{n}{n+m} \to \lambda \in (0,1)$ and $\eta > 0$ is such that $\{t_i\} = \Gamma^* \cap (t_i - \eta, t_i + \eta)$, $i = 1, \ldots, k$, then for $s > 0$
$$n^s(\gamma(F_n, G_m) - \gamma(F,G)) = n^s \sum_{i=1}^{k} \ell^{t_i}_{n,m} + o_P(1).$$

The main consequence of Lemma 4.1 and Corollary 4.2 is that when $\Gamma^*$ is finite the key to the asymptotic behaviour of $\gamma(F_n, G_m)$ is the (joint) asymptotic behaviour of $\ell^{t_i}_{n,m}$. We address this problem in this section when $\Gamma^*$ consists of regular contact points. We note that these regular contact points (recall (8)) are elements of $\tilde\Gamma^*_F$.
This, apparently,excludes contact points in ˜Γ ∗ G but not in ˜Γ ∗ F or points which would be regular if weexchange the roles of F and G but are not with the present deﬁnition. However, thesecases can often be handled with the same approach. To see this, observe that when Γ ∗ is ﬁnite (recall the concluding remarks in Section 2) we have that ℓ ( t ∈ A : F − ( t ) ≤ G − ( t )) = ℓ ( t ∈ A : F − ( t ) < G − ( t )) for every measurable A .If we assume further that F and G have no common discontinuity point (see Propo-sition A.2 and the more general Proposition A.3, involving just local conditions, in theAppendix), then ℓ ( t ∈ A : F − n ( t ) ≤ G − m ( t )) = ℓ ( t ∈ A : F − n ( t ) < G − m ( t )) a.s. and we seethat ℓ t n,m = − (cid:0) ℓ (cid:0) { F − n < G − m } ∩ ( t − η, t + η ) (cid:1) − ℓ (cid:0) { F − < G − } ∩ ( t − η, t + η ) (cid:1) (cid:1) = : − ˜ ℓ t m,n a.s. . Observe that ˜ ℓ t m,n is the same statistic as ℓ t n,m after exchanging the roles of the X and the Y samples. Hence, we restrict our analysis to points in Γ ∗ F . Our results hold for points in˜Γ ∗ G with obvious changes. 20e note that for every regular contact point, t , there exists η ∗ > F G ( t ) − t ) is non-null and constant on each of ( t − η ∗ , t ) and ( t , t + η ∗ ). (36)We recall from the ﬁnal comments in Section 2 that, by taking η ∗ small enough (toexclude other contact points from the interval), sgn( F G ( t ) − t ) = sgn( G − ( t ) − F − ( t )) forevery t ∈ ( t − η ∗ , t ) ∪ ( t , t + η ∗ ). Now, if (36) holds, the study of ℓ t n,m can be carriedout through the study, for η ∈ (0 , η ∗ ), of the pieces L >n,m := Z t t − η I { F − n ( s ) >G − m ( s ) } ds and R >n,m := Z t + ηt I { F − n ( s ) >G − m ( s ) } ds,L G − . For example, for a crossingpoint t such that F − < G − on ( t − η, t ) and F − > G − on ( t , t + η ), ℓ t n,m = L >n,m − R , C R ( t ) < Proof of Theorem 1.5.

We assume, for instance, that C L >

0, and r L ≥ r R , thus r = r L . The other cases can be handled similarly. We note that ℓ t n,m = L >n,m + R >n,m if C R >

0, while ℓ t n,m = L >n,m − R

0. We consider ﬁrst the case r L >

1. We set d n = ( n + m ) / r and prove next that d n L >n,m w → ℓ { y C L | y | r } . (37)To check this we note that, using (62), (30), Lemma 3.3 and (15), it is enough to provethat d n ℓ (cid:8) t ∈ I : t + B ( t ) √ n > F G ( t + B ( t ) √ m ) − C n,m (cid:9) w → ℓ { y C L | y | r } (38)and similarly with d n ℓ (cid:8) t ∈ I : t + B ( t ) √ n > F G ( t + B ( t ) √ m ) + C n,m (cid:9) , where I = [ t − η, t ].The proofs are similar, hence, we only prove (38). To ease notation we write C ( t ) for C L when t < C R when t > | t | r will mean | t | r L or | t | r R , whenever t < t >

0. Then I n t ∈I : t + B t ) √ n >F G (cid:16) t + B t ) √ m (cid:17) − C n,m o = I n t ∈I : t + B t ) √ n >t + ξ m + C ( ξ m ) | ξ m | r + o ( | ξ m | r ) − C n,m o = I { t ∈I : B , n ( t ) > √ n + m ( C ( ξ m ) | ξ m | r + o ( | ξ m | r ) − C n,m ) } , (39)where α n = p ( n + m ) /n , β m = p ( n + m ) /m , B , n ( t ) = α n B ( t ) − β m B ( t ), ξ m = (cid:0) t + B ( t ) √ m − t (cid:1) and we have used that t = F G ( t ). Denoting ξ ∗ m ( y ) = yd n + B G ( t + ydn ) √ m , the21hange of variable t = t + yd n , and (39) lead to d n ℓ (cid:8) t ∈ I : t + B ( t ) √ n > F G ( t + B ( t ) √ m ) − C n,m (cid:9) (40)= Z − d n η I (cid:8) B , n ( t + ydn ) >C (cid:0) ξ ∗ m ( y ) (cid:1)(cid:12)(cid:12) ( n + m )1 / r ( n + m )1 / rR y + ( n + m )1 / r √ m B ( t + ydn ) (cid:12)(cid:12) r + √ n + m ( o ( | ξ ∗ m ( y ) | r ) − C n,m ) (cid:9) dy. Since the Brownian bridges have continuous trajectories with probability one, theyare bounded and a.s.: sup y ∈ [ − ηd n , ξ ∗ m ( y ) ≤ sup x ∈ [0 , | B ( t ) |√ m → . inf y ∈ [ − ηd n , ξ ∗ m ( y ) ≥ − η − sup x ∈ [0 , | B ( t ) |√ m → − η. Thus, eventually, for every y < ξ ∗ m ( y ) ∈ [ − η ∗ , η ∗ ] and C (cid:0) ξ ∗ m ( y ) (cid:1) ≥ min( | C L | , | C R | ) > B and B are a.s. bounded, and also that ( n + m ) / r ( n + m ) / rR is eitherequal to one or, else, goes to inﬁnity, yield that, a.s., the order of √ n + m | ξ ∗ m ( y ) | r is | y | r or higher. Finally, the deﬁnition of C n,m allows us to conclude that there exists M > I (cid:26) y ∈ [ − ηd n , B , n ( t + ydn ) >C ( ξ ∗ m ( y )) (cid:12)(cid:12)(cid:12)(cid:12) y + ( n + m )1 / r √ m B ( t + ydn ) (cid:12)(cid:12)(cid:12)(cid:12) r + √ n + m ( o ( | ξ ∗ m ( y ) | r ) − C n,m ) (cid:27) ≤ I {− M ≤ y ≤ } . Now, if we ﬁx y C ( ξ ∗ m ( y )) (cid:12)(cid:12)(cid:12)(cid:12) y + ( n + m )1 / r √ m B ( t + ydn ) (cid:12)(cid:12)(cid:12)(cid:12) r + √ n + m ( o ( | ξ ∗ m ( y ) | r ) − C n,m ) (cid:9) → I (cid:8) B λ ( t ) >C L | y | r (cid:9) . 
From here, dominated convergence yields (38), hence, as noted above, (37). We note thatthe limit in (37) equals sgn( C L ) (cid:16) ( B λ ( t )) sgn( CL ) | C L | (cid:17) /r . (41)A completely similar analysis shows that d n R >n,m w → sgn( C R ) (cid:16) ( B λ ( t )) sgn( CR ) | C R | (cid:17) /r I ( r R = r )when C R > d n R >n,m vanishes in probability if r R < r L = r ). Furthermore, we are usingthe same strong approximation to handle d n L >n,m and d n R >n,m , which implies that thereis weak convergence of ( d n L >n,m , d n R >n,m ) and, consequently, of d n ℓ t n,m = d n ( L >n,m + R >n,m ).This completes the proof in the case r L ≥ r R , C L > , C R >

0. The other cases with r > r L = r R = 1 goes along the same lines, the only diﬀerence beingthat, a.s., if y C ( ξ ∗ m ( y )) (cid:12)(cid:12)(cid:12)(cid:12) y + ( n + m )1 / √ m B ( t + ydn ) (cid:12)(cid:12)(cid:12)(cid:12) + √ n + m ( o ( | ξ ∗ m ( y ) | ) − C n,m ) (cid:9) → I (cid:8) B λ ( t ) >C ∗ | y + B t √ − λ | (cid:9) and, by dominated convergence, d n L >n,m w → ℓ (cid:8) y C ∗ | y + B ( t ) √ − λ | (cid:9) = ℓ (cid:8) y − C L (cid:0) y + B ( t ) √ − λ (cid:1) , y + B ( t ) √ − λ < (cid:9) + ℓ (cid:8) y C R (cid:0) y + B ( t ) √ − λ (cid:1) , y + B ( t ) √ − λ > (cid:9) . The right side of the interval is dealt with in a similar way. In the case C L > , C R > d n R >n,m w → ℓ (cid:8) y > B λ ( t ) > − C L (cid:0) y + B ( t ) √ − λ (cid:1) , y + B ( t ) √ − λ < (cid:9) + ℓ (cid:8) y > B λ ( t ) > C R (cid:0) y + B ( t ) √ − λ (cid:1) , y + B ( t ) √ − λ > (cid:9) . Hence, d n ℓ t n,m = d n ( L >n,m + R >n,m ) w → ℓ (cid:0) y : B λ ( t ) > − C L ( y + B ( t ) √ − λ ) , y + B ( t ) √ − λ < (cid:1) + ℓ (cid:0) y : B λ ( t ) > C R ( y + B ( t ) √ − λ ) , y + B ( t ) √ − λ > (cid:1) = ( B λ ( t )) − C L + ( B λ ( t ))+ C R = T , ( t ; C L , C R ) . If C L > , C R < d n R B λ ( t ) < − C L (cid:0) y + B ( t ) √ − λ (cid:1) , y + B ( t ) √ − λ < (cid:9) + ℓ (cid:8) y > B λ ( t ) < C R (cid:0) y + B ( t ) √ − λ (cid:1) , y + B ( t ) √ − λ > (cid:9) . Therefore, d n ℓ t n,m = d n ( L >n,m − R >n,m ) w → ℓ (cid:0) y : B λ ( t ) > − C L ( y + B ( t ) √ − λ ) , y + B ( t ) √ − λ < (cid:1) − ℓ (cid:0) y > y + B ( t ) √ − λ < (cid:1) − ℓ (cid:0) y : B λ ( t ) < C R ( y + B ( t ) √ − λ ) , y + B ( t ) √ − λ > (cid:1) + ℓ (cid:0) y < y + B ( t ) √ − λ > (cid:1) = ( B λ ( t )) + C L − ( B ( t )) − √ − λ + ( B λ ( t )) − C R + ( B ( t )) + √ − λ = ( B λ ( t )) + C L + ( B λ ( t )) − C R + B ( t ) √ − λ = T , , ( t ; C L , C R ) . • Some comments are in order here. 
First note that, by focusing on the transform F G ,Theorem 1.5 is able to handle virtual contact points for F − and G − . As an illustrationof this claim, assume F − ( t ) < G − ( t ) ≤ G − ( t +) < F − ( t +) ( t is then a virtualcrossing point). As noted above, F G ( t ) = t in an interval ( t − η, t + η ) for η smallenough, and (7) holds with r R = r L = 1, C R ( t ) = − C L ( t ) = +1. Thus, Theorem1.5 applies and gives (42) below.The case F − ( t ) < G − ( t ) ≤ F − ( t +) < G − ( t +) (a virtual tangency point) canbe handled similarly, although it does not ﬁt exactly in the setup of Theorem 1.5. In thiscase we have that, for some small enough η, δ > F G ( t ) = t , t ∈ ( t − η, t ), F G ( t ) > t + δ , t ∈ ( t , t + η ). It is easy to see that, eventually, ℓ t n,m = ℓ { t ∈ ( t − η, t + η ) : t + u n ( t ) √ n >t , t + v m ( t ) √ m < t } . From this point one can argue as in the proof of Theorem 1.5 to obtain(43) below.We include in the following proposition these results for virtual contact points. Noticethat this proposition includes the possibility of non-continuous d.f.’s F or G . Proposition 4.3

Let t ∈ Γ ∗ ∩ (0 , , such that for some η > , ( t − η , t + η ) ∩ Γ ∗ = { t } . Then, for every small enough η > , if n/ ( n + m ) → λ ∈ (0 , as n, m → ∞ , wehave that:(i) ( virtual crossing points ) If F − ( t ) < G − ( t ) ≤ G − ( t +) < F − ( t +) , then, √ n + mℓ t n,m w → B ( t ) √ λ . (42) (ii) ( virtual tangency points ) If F − ( t ) < G − ( t ) ≤ F − ( t +) < G − ( t +) , then, √ n + mℓ t n,m w → ℓ { y : − B ( t ) √ − λ > y, − B ( t ) √ λ < y } = (cid:16) B ( t ) √ λ − B ( t ) √ − λ (cid:17) + . (43)We can easily adapt Proposition 4.3 to the case G − ( t ) < F − ( t ) ≤ F − ( t +)

Let us take t = 0 and C >

0. The cases with

C < t = 1 are similar. We handle ﬁrst the case r = 1. Then for small enough η we have ℓ ( { F − > G − } ∩ (0 , η )) = 0. We recall that ℓ n,m = ℓ (cid:0) { F − n > G − m } ∩ (0 , η ) (cid:1) . (47)We use the well-known fact that the joint law of S n +1 ( S , . . . , S n ) is the same as that theordered sample of size n of i.i.d. U (0 ,

1) r.v.’s. Thus,( X (1) , . . . , X ( n ) ) d = (cid:16) F − (cid:0) S S n +1 (cid:1) , . . . , F − ( S n S n +1 ) (cid:17) , (48)with a similar expression for the Y -sample. From (47) and (48) we see that ℓ n,m d = Z η I ( F − (cid:0) S ⌈ nt ⌉ S n +1 (cid:1) >G − (cid:0) S ⌈ mt ⌉ S m +1 (cid:1) ) dt = 1 n + m Z ( n + m ) η I (cid:8) F − ( ξ n ( y )) >G − ( ξ m ( y )) (cid:9) dy, where ξ n ( y ) := S ⌈ nn + m y ⌉ /S n +1 , and ξ m ( y ) := S ⌈ mn + m y ⌉ /S m +1 . Now (8) yields that, if y ∈ (0 , ( n + m ) η ), then I { F − ( ξ n ( y )) >G − ( ξ m ( y )) } = I { ξ n ( y ) >F G ( ξ m ( y )) } = I { ( n + m ) ξ n ( y ) > (1+ C )( n + m ) ξ m ( y )+( n + m ) o ( ξ m ( y )) } . (49)25he SLLN implies that there exists Ω , with P (Ω ) = 1, such that for every ω ∈ Ω , S n n → S m m → . Therefore, for any δ ∗ >

0, if ω ∈ Ω , eventually0 < ξ m ( y ) ≤ S ⌈ (1 − λ + δ ∗ ) y ⌉ +1 S m +1 → . (50)This and (49) show that if ω ∈ Ω , I (cid:8) F − (cid:0) ξ n ( y ) (cid:1) >G − (cid:0) ξ m ( y ) (cid:1)(cid:9) → I (cid:8) (1 − λ ) S ⌈ λy ⌉ >λ (1+ C ) S ⌈ (1 − λ ) y ⌉ (cid:9) , for every y not belonging to the countable set { j − λ : j = 0 , , . . . } ∪ { jλ : j = 0 , , . . . } .Clearly, in Ω we have lim y →∞ (1 − λ ) S ⌈ λy ⌉ λS ⌈ (1 − λ ) y ⌉ = 1 . Hence, the fact that

C > , I n (1 − λ ) S ⌈ λy ⌉ >λ (1+ C ) S ⌈ (1 − λ ) y ⌉ o = 0 for largeenough y . This shows that R ∞ I n (1 − λ ) S ⌈ λy ⌉ >λ (1+ C ) S ⌈ (1 − λ ) y ⌉ o dy is an a.s. ﬁnite r.v..We will conclude (13) as soon as we prove that for every ω ∈ Ω we can applydominated convergence. To check this, notice that (50) gives that, for m large enough, I { ( m + n ) ξ n ( y ) > ( m + n )(1+ C ) ξ m ( y )+( m + n ) o ( ξ m ( y )) } (51) ≤ I { ( m + n ) ξ n ( y ) > (1+ C/ m + n ) ξ m ( y ) } . Now, for every ω ∈ Ω , there exist a natural number and a positive real numberdepending on ω , N ( ω ) and Y ( ω ), such that, if n ≥ N ( ω ) then both S n +1 /n and S m +1 /m are close to one, and, if we take y ≥ Y ( ω ), then, both S ⌈ nn + m y ⌉ nn + m and S ⌈ mn + m y ⌉ mn + m are close to y .This completes the proof for the case r = 1, since (51) gives that, for all n ≥ N ( ω ), I { ξ n ( y ) > (1+ C ( ξ m ( y )))( m + n ) ξ m ( y )+( m + n ) o ( ξ m ( y )) } ≤ I [0 ,Y ( ω )] . For the case r >

C > T r,r (0; C, C ) isa.s. ﬁnite (this follows, for instance, from the fact that, a.s., W ( y ) /y → y → ∞ ).Now ℓ n,m has the same expression as in (47). We will use the same notation as in Lemma3.3. First, we have that ℓ n,m = ℓ n t ∈ (0 , η ) : F − (cid:0) t + u n ( t ) √ n (cid:1) > G − (cid:0) t + v m ( t ) √ m (cid:1)o = ℓ n t ∈ (0 , η ) : t + u n ( t ) √ n > F G (cid:0) t + v m ( t ) √ m (cid:1)o . Therefore, if we take f equal to the identity and g = F G in Lemma 3.3, we only need toshow that d n ℓ (cid:16)n t + B Fn ( t ) √ n > ˜ F G ( t ) − L n o ∩ (0 , η ) (cid:17) → w ℓ (cid:8) y ∈ (0 , ∞ ) : W ( y ) > ( λ (1 − λ )) / C (0) y r (cid:9) , (52)26nd similarly for d n ℓ (cid:16)n t + B Fn ( t ) √ n > ˜ F G ( t ) + L n o ∩ (0 , η ) (cid:17) , where, now, d n = ( n + m ) r − (since t + B Gm ( t ) √ m can take negative values, we take F G ( t ) = F G (0) for t <

0; notice that t + B Gm ( t ) √ m → t >

0, hence, this assumption has no eﬀect in the limit) . The proofs are thesame, thus we only consider (52).We can assume, without loss of generality that B Fn ( t ) = d − / n ( W F ( d n t ) − tW F ( d n )),and B Gm ( t ) = d − / n ( W G ( d n t ) − tW G ( d n )), 0 ≤ t ≤ W F , W G independent Brownianmotions. Thus, the change of variable t = y/d n and the fact that p ( n + m ) d n = d rn give d n ℓ n,m = d n Z η I (cid:16) α n B Fn ( t ) >β m B Gm ( t )+ √ n + mC (cid:0) t + B Gm ( t ) √ m (cid:1)(cid:12)(cid:12) t + B Gm ( t ) √ m (cid:12)(cid:12) r + √ n + m (cid:0) o (cid:0)(cid:12)(cid:12) t + B Gm ( t ) √ m (cid:12)(cid:12) rR (cid:1) − L n,m (cid:1)(cid:17) dt = Z d n η I (cid:16) α n (cid:0) W F ( y ) − y W F ( d n ) d n (cid:1) >β m (cid:0) W G ( y ) − y W G ( d n ) d n (cid:1) + C (( ξ n ( y )) d rn | ξ n ( y ) | r + d rn (cid:0) o ( | ξ n ( y ) | r ) − L n,m (cid:1)(cid:17) dy, where α n = (( m + n ) /n ) / , β m = (( m + n ) /m ) / and ξ n ( y ) = yd n + √ m B Gm ( yd n ).As it is well known, there exists Ω ∈ σ , with P (Ω ) = 1 such that, if ω ∈ Ω , then, W i is continuous, W i ( x ) /x →

0, as x → ∞ , i = F, G and the set (cid:8) y : λ − / W F ( y ) = (1 − λ ) − / W G ( y ) + C (0) y r (cid:9) has Lebesgue measure zero. If we ﬁx ω ∈ Ω , then, we have thatsup y ∈ [0 ,d n η ] | ξ n ( y ) | ≤ η + 1 √ md n sup y ∈ [0 ,d n η ] (cid:12)(cid:12)(cid:12)(cid:12) W F ( y ) − y W F ( d n ) d n (cid:12)(cid:12)(cid:12)(cid:12) → η, and we can conclude that, eventually, { ξ n ( y ) : y ∈ [0 , d n η ] } ⊂ [0 , η ∗ ], and, consequently,from an index onward, inf y ∈ [0 ,d n η ] C ( ξ n ( y )) ≥ inf h ∈ [0 ,η ∗ ] | C ( h ) | > . On the other hand, wehave d rn L n,m → d rn | ξ n ( y ) | r = (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) y + β m W G ( y ) − y W G ( d n ) d n d r − n (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) r = y (1 + o (1)) → ∞ , as y → ∞ . Therefore, there exists a constant M (which possibly depends on the chosen ω ) such that I (cid:18) α n (cid:0) W F ( y ) − y W F ( d n ) d n (cid:1) >β m (cid:0) W G ( y ) − y W G ( d n ) d n (cid:1) + C ( ξ n ( y )) d rn | ξ n ( y ) | r + d rn (cid:0) o ( | ξ n ( y ) | r ) − L n,m (cid:1) (cid:19) ≤ I (cid:8) ≤ y ≤ M (cid:9) , for every large enough n . Moreover, I (cid:26) α n (cid:18) W F ( y ) − y W F ( d n ) d n (cid:19) >β m (cid:18) W G ( y ) − y W G ( d n ) d n (cid:19) + C ( ξ n ( y )) d rn | ξ n ( y ) | r + d rn (cid:0) o ( | ξ n ( y ) | r ) − L n,m (cid:1) (cid:27) → I { λ − / W F ( y ) − (1 − λ ) − / W G ( y ) >C (0) y r } . ω , d n ℓ (cid:8) ( ˜ F − n > ˜ G − m − L n,m ) ∩ (0 , η ) (cid:9) → ℓ (cid:8) y ∈ (0 , ∞ ) : λ − / W F ( y ) − (1 − λ ) − / W G ( y ) > C (0) y r (cid:9) The fact that ( λ (1 − λ )) / ( λ − / W F ( y ) − (1 − λ ) − / W G ( y )) is a standard Brownianmotion yields (52). • We prove next Theorem 1.7, a global asymptotic result for Galton’s statistic underthe assumption of a ﬁnite contact set consisting of regular contact points. 
From a technical point of view the main issue here is to prove asymptotic independence between the localized statistics around central and extremal contact points.
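The asymptotic independence between extremal and central order statistics can be made tangible with uniform samples. The sketch below is only an illustration and not the total variation argument used in the proof: it compares a Monte Carlo estimate of the correlation between the sample minimum and the sample median of $n$ i.i.d. $U(0,1)$ variables with the classical closed form $\mathrm{Corr}(U_{(i)}, U_{(j)}) = \sqrt{i(n-j+1)/(j(n-i+1))}$, $i \le j$, which is of order $n^{-1/2}$ in this case. Vanishing correlation is of course much weaker than independence. The helper names, sample sizes, number of replications and seed are assumptions of the example.

```python
import numpy as np


def exact_corr(n, i, j):
    """Correlation of the i-th and j-th order statistics (i <= j) of n i.i.d. U(0,1)."""
    return np.sqrt(i * (n - j + 1) / (j * (n - i + 1)))


def simulated_corr(n, i, j, reps, rng):
    """Monte Carlo estimate of the same correlation from `reps` sorted samples."""
    u = np.sort(rng.uniform(size=(reps, n)), axis=1)
    return float(np.corrcoef(u[:, i - 1], u[:, j - 1])[0, 1])


rng = np.random.default_rng(1)  # arbitrary seed
results = {}
for n in (10, 400):
    i, j = 1, n // 2            # minimum versus (lower) median
    results[n] = (float(exact_corr(n, i, j)), simulated_corr(n, i, j, 5000, rng))
    print(n, results[n])
```

The printed pairs (exact, simulated) should shrink as $n$ grows, in line with the weakening dependence between the lower and central parts of the sample.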

Proof of Theorem 1.7:

From Corollary 4.2 it is enough to prove that $(n+m)^{\frac{1}{2r_0}}(\ell^{t_i}_{n,m})_{1\le i\le k}$ converges weakly. This follows trivially if $\Gamma^*\subset(0,1)$, after checking that the strong approximation used in the proof of Theorem 1.5 allows us to deal with all the $\ell^{t_i}_{n,m}$ simultaneously. Hence, it suffices to prove asymptotic independence among $\ell^{0}_{n,m}$, $(\ell^{t_i}_{n,m})_{i:\,t_i\in(0,1)}$ and $\ell^{1}_{n,m}$ when 0 or 1 (or both) are contact points. Let us assume, for instance, that $\Gamma^*=\{0<t_1<\cdots<t_s<1\}$ and set $A_n=(n+m)^{\frac{1}{2r_0}}\ell^{0}_{n,m}$, $B_n=(n+m)^{\frac{1}{2r_0}}(\ell^{t_i}_{n,m})_{1\le i\le s}$ and $C_n=(n+m)^{\frac{1}{2r_0}}\ell^{1}_{n,m}$. There exist $A,B,C$ such that $A_n\xrightarrow{w}A$, $B_n\xrightarrow{w}B$ and $C_n\xrightarrow{w}C$. Assume $(\tilde A,\tilde B,\tilde C)$ is a random vector with $\tilde A,\tilde B,\tilde C$ independent, $\tilde A\overset{d}{=}A$, $\tilde B\overset{d}{=}B$ and $\tilde C\overset{d}{=}C$, and consider $(\tilde A_n,\tilde B_n,\tilde C_n)$ with the same properties with respect to $(A_n,B_n,C_n)$. $A_n$ is a function of the smallest $\lceil\eta n\rceil$ elements in the $X$ sample and the smallest $\lceil\eta m\rceil$ elements in the $Y$ sample. Similarly, $B_n$ and $C_n$ are functions of the central and upper order statistics. If $d_{TV}$ denotes the distance in total variation, then there exists a universal constant $H>0$ such that

$$d_{TV}\big(\mathcal L(A_n,B_n,C_n),\mathcal L(\tilde A_n,\tilde B_n,\tilde C_n)\big)\le H\Big[\frac{\eta(1-t_1-\eta)}{t_1-\eta}+\frac{\eta(t_s+\eta)}{1-t_s-\eta}\Big]^{1/2}$$

for small enough $\eta$ (this follows from Theorem 4.2.9 and Lemma 3.3.7 in [1]). If $\rho$ denotes the Prokhorov metric, then the fact that $\rho(\mu_1,\mu_2)\le d_{TV}(\mu_1,\mu_2)$ implies

$$\rho\big(\mathcal L(A_n,B_n,C_n),\mathcal L(\tilde A_n,\tilde B_n,\tilde C_n)\big)\le H\Big[\frac{\eta(1-t_1-\eta)}{t_1-\eta}+\frac{\eta(t_s+\eta)}{1-t_s-\eta}\Big]^{1/2}.$$

We prove now that $(A_n,B_n,C_n)\xrightarrow{w}(\tilde A,\tilde B,\tilde C)$. Obviously $(\tilde A_n,\tilde B_n,\tilde C_n)\xrightarrow{w}(\tilde A,\tilde B,\tilde C)$. Having weakly convergent components, $(A_n,B_n,C_n)$ is tight. To complete the proof it suffices to show that for any weakly convergent subsequence $(A_{n'},B_{n'},C_{n'})\xrightarrow{w}\gamma$, necessarily $\gamma=\mathcal L(\tilde A,\tilde B,\tilde C)$. To check this, we observe that, since $\rho$ metrizes the weak convergence, we have

$$\rho\big(\gamma,\mathcal L(\tilde A,\tilde B,\tilde C)\big)\le H\Big[\frac{\eta(1-t_1-\eta)}{t_1-\eta}+\frac{\eta(t_s+\eta)}{1-t_s-\eta}\Big]^{1/2}.\qquad(53)$$

Now, using Corollary 4.2 we see that we can repeat the argument leading to (53) for every small enough $\eta$. Hence, $\rho(\gamma,\mathcal L(\tilde A,\tilde B,\tilde C))=0$. This completes the proof. •

We provide here some simple examples that illustrate the different limiting distributions for $\ell^{t_0}_{n,m}$ that result from Theorems 1.5 and 1.6. Later we give simple sufficient conditions under which extremes have no influence on the asymptotic behaviour of $\gamma(F_n,G_m)$ and give a simplified version of Theorem 1.7 under the assumption that $F$ and $G$ have regular densities (Theorem 4.9). Finally, we consider the case of finitely supported distributions (Theorem 4.10).

Example 4.4

In this example $G(t)=t$ (the uniform law on $(0,1)$), $r>0$ and $F^{-1}(t)=\frac12+\mathrm{sgn}(t-\frac12)|t-\frac12|^{r}$, $0\le t\le1$. Now we have $F(x)=\frac12+\mathrm{sgn}(x-\frac12)|x-\frac12|^{1/r}$, $\frac12-2^{-r}\le x\le\frac12+2^{-r}$, $F_G=F$ and $F_G(\frac12)=\frac12$. Thus, $\frac12$ is a contact point.

If $r<1$, $F_G'(t)=\frac1r|t-\frac12|^{\frac1r-1}$. In particular, $F_G$ is Lipschitz in a neighbourhood of $\frac12$. We easily check that $\Delta(h)=-h+\mathrm{sgn}(h)|h|^{1/r}=-h+o(h)$, that is, $\frac12$ is an isolated regular contact point (a crossing point) with intensities $r_L=r_R=1$ and constants $C_R=-C_L=-1$. We can apply Theorem 1.5 to conclude that

$$(n+m)^{1/2}\,\ell^{t_0}_{n,m}\xrightarrow{w}\frac{B_1(\frac12)}{\sqrt\lambda}.$$

If $r>1$, $F_G'(\frac12)=+\infty$ and $F_G$ is not Lipschitz around the contact point. However, following the reasoning after Corollary 4.2, we have that $\ell^{t_0}_{n,m}=-\tilde\ell^{t_0}_{m,n}$ and we can handle this case by exchanging the roles of the $F$ and $G$ samples and studying $G_F(t)=F^{-1}(t)$. Now $G_F'(t)=r|t-\frac12|^{r-1}$ and $G_F$ is Lipschitz in a neighbourhood of $\frac12$. Furthermore, $\Delta(h)=-h+\mathrm{sgn}(h)|h|^{r}=-h+o(h)$. Thus we can, again, apply Theorem 1.5 to $\tilde\ell^{t_0}_{m,n}$ (with $r_L=r_R=1$, $C_R=-C_L=-1$) and conclude that $(n+m)^{1/2}\,\tilde\ell^{t_0}_{m,n}\xrightarrow{w}\frac{B_2(\frac12)}{\sqrt{1-\lambda}}$. Hence, for $r>1$,

$$(n+m)^{1/2}\,\ell^{t_0}_{n,m}\xrightarrow{w}-\frac{B_2(\frac12)}{\sqrt{1-\lambda}}.$$ •

Example 4.5

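Now $F$ denotes the d.f. of the uniform law on $(0,1)$ and $G^{-1}(t)=t+\mathrm{sgn}(t-\frac12)|t-\frac12|^{r}$, $0\le t\le1$. As before, $\frac12$ is a contact point. For $r\ge1$, $F_G=G^{-1}$ is differentiable, with $F_G'(t)=1+r|t-\frac12|^{r-1}$. We have $\Delta(h)=\mathrm{sgn}(h)|h|^{r}$, that is, Theorem 1.5 can be applied here with $r_L=r_R=r$, $C_R=-C_L=1$. Thus, for $r=1$ we get

$$(n+m)^{1/2}\,\ell^{t_0}_{n,m}\xrightarrow{w}\frac{B_1(\frac12)}{\sqrt\lambda}+\frac{2B_2(\frac12)}{\sqrt{1-\lambda}},$$

where the factor 2 is $F_G'(\frac12)$, in agreement with Theorem 4.9 (i) for $f(\frac12)=1$ and $g(\frac12)=\frac12$, while for $r>1$,

$$(n+m)^{\frac{1}{2r}}\,\ell^{t_0}_{n,m}\xrightarrow{w}\big((B_\lambda(\tfrac12))^{+}\big)^{1/r}-\big((B_\lambda(\tfrac12))^{-}\big)^{1/r}.$$

The case $0<r<1$ […] •

Example 4.6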

Here we consider a Student's $t$ location model. Let $F=F_\nu$ be a $t$-distribution with $\nu>0$ and $G(x)=G_\nu(x)=F_\nu(x-\mu)$, for some $\mu>0$. Obviously, in this case $F^{-1}(0)=G^{-1}(0)=-\infty$ and $F_G(0)=0$. We write $f_\nu$ for the density of $F_\nu$. To ease notation, we set $s=(\nu+1)/2$, write $K$ for a non-null generic constant which can change from line to line (in particular, $f_\nu(t)=K(\nu+t^2)^{-s}$) and $f(x)\approx g(x)$ when $\frac{f(x)}{g(x)}\to1$ as $x\to x_0$.

Using l'Hôpital's rule we see that $F_\nu(t)\approx K|t|^{-2s+1}$ as $t\to-\infty$ and, as a consequence, $F_\nu^{-1}(h)\approx Kh^{1/(1-2s)}$ as $h\to0$. Moreover,

$$F_G'(h)=\frac{f_\nu(F_\nu^{-1}(h)+\mu)}{f_\nu(F_\nu^{-1}(h))}=\frac{\big(\nu+(F_\nu^{-1}(h)+\mu)^2\big)^{-s}}{\big(\nu+(F_\nu^{-1}(h))^2\big)^{-s}}\to1,\quad\text{as }h\to0.\qquad(54)$$

Some simple but tedious computations give that

$$F_G''(h)\approx K\,\frac{\mu\,(F_\nu^{-1}(h))^{2s}+O\big((F_\nu^{-1}(h))^{2s-1}\big)}{\nu+(F_\nu^{-1}(h))^{2}};$$

therefore, $F_G''(h)\approx K(F_\nu^{-1}(h))^{2s-2}$. Consequently, $F_G''(h)\approx Kh^{(2s-2)/(1-2s)}$. Now, applying l'Hôpital's rule twice we get that $\Delta(h)\approx Kh^{\frac{2s-2}{1-2s}+2}=Kh^{\frac{2s}{2s-1}}=Kh^{\frac{\nu+1}{\nu}}$, that is, $\Delta(h)=Kh^{(\nu+1)/\nu}+o(h^{(\nu+1)/\nu})$ for some $K\neq0$ as $h\to0$.

We see from (54) that $F_G'$ is bounded. Hence $F_G$ is Lipschitz and Theorem 1.6 can be applied here with $r_R=\frac{\nu+1}{\nu}$ to obtain

$$(n+m)^{\frac{\nu}{\nu+2}}\,\ell^{0}_{n,m}\xrightarrow{w}T_{\frac{\nu+1}{\nu},\frac{\nu+1}{\nu}}(0;K,K)$$

for any $\nu>0$. •

Example 4.7

Let F (resp. G ) be centered (resp. with mean µ >

0) normal distributionswith common variance σ . Let f denote the density function of F . Now, F G ( t ) = F ( F − ( t ) + µ ), t ∈ [0 ,

1] and F ′ G ( t ) = f ( F − ( t ) + µ ) f ( F − ( t )) = e − (2 µF − ( t )+ µ ) / σ → ∞ , as t → . F G is not Lipschitz in a neighbourhood of 0. However, we can use the factthat ℓ t n,m = − (cid:0) R η I ( F − n ( t ) ≤ G − m ( t )) dt − R η I ( F − ( t ) ≤ G − ( t )) dt (cid:17) = − (cid:0) R η I ( F − n ( t )

1. UsingTheorem 1.6 we conclude that( n + m ) ℓ t n,m w → λ (1 − λ ) R ∞ I (cid:8) − (1 − λ ) S ⌈ λy ⌉ > (cid:9) dy = λ (1 − λ ) R ∞ I (cid:8) (1 − λ ) S ⌈ λy ⌉ < (cid:9) dy = 0 , since, a.s., S i > i ≥

1. Thus the rate of convergence in this example is fasterthan ( n + m ) − . • We explore now some consequences of Theorem 1.7. If the extremal contact pointshave a non-null contribution to the limiting distribution, then this cannot be normal.We pay now attention to obtaining conditions under which √ n + mℓ n,m vanishes (andsimilarly for the upper extreme). The special attention to the rate √ n + m is due to thefact that it is the only one which can result in a normal limit. Of course, Theorem 1.6provides some answer to this problem, but we will give here simpler suﬃcient conditions.If the supports of F and G are bounded andlim inf | F − ( t ) − G − ( t ) | > t → t → − , (55)then ℓ n,m and ℓ n,m can be dealt with as in Lemma 4.1 to see that they eventually vanish.Note that, in the case of non-bounded support, (55) does not exclude that 0 or 1 couldbe contact points (recall Example 4.7). For this case the following criterion on the tailscan be useful to guarantee asymptotic negligibility of ℓ n,m and ℓ n,m in presence of innercontact points: Z (0 ,ε ) ∪ (1 − ε, (cid:16) p t (1 − t ) f ( F − ( t )) (cid:17) p dt < ∞ and Z (0 ,ε ) ∪ (1 − ε, (cid:16) p t (1 − t ) g ( G − ( t )) (cid:17) p dt < ∞ , (56)for some p > ε > F − ( t ) − G − ( t )) > δ > , η ) ⊂ (0 , ε ). We then focus on theintegral R η I { F − n ( t ) − G − m ( t ) ≤ } dt , noting that( F − − G − > δ ) ∩ ( F − n − G − m ≤ ⊂ ( | F − n − F − | > δ/ ∪ ( | G − m − G − | > δ/ n / Z η I { F − n ( t ) − G − m ( t ) ≤ } dt ≤ n / (cid:18)Z η I {| F − n ( t ) − F − ( t ) |≥ δ/ } dt + Z η I {| G − m ( t ) − G − ( t ) |≥ δ/ } dt (cid:19) ≤ n − p − ( δ/ p (cid:18)Z η |√ n ( F − n ( t ) − F − ( t )) | p dt + Z η |√ n ( G − m ( t ) − G − ( t )) | p dt (cid:19) p → , where the last convergence follows from the fact that by (56) and Theorem 5.3, p. 
46 in [Bobkov and Ledoux(2016)], the integrals in parentheses are stochastically bounded.

Now, we are ready for a general result for probabilities with smooth densities, $f$ and $g$. Assuming enough differentiability, we write $h(t)=F_G(t)-t$ (the function used to obtain (44)) and, for any $k\in\mathbb N$, define the sets

$$\Gamma_k:=\big\{t\in\Gamma:\ h^{(j)}(t)=0,\ j=0,\dots,k-1,\ h^{(k)}(t)\neq0\big\}.$$

Notice that $\Gamma_k$ is the set of contact points with intensity $k$ and let $k_0:=k_0^{F,G}=\sup\{k:\Gamma_k\neq\emptyset\}$. For points in $\Gamma_k$ the derivatives of $h$ can be easily related to the derivatives of $f$ and $g$, as follows.

Lemma 4.8 If $t\in\Gamma_k$ for some $k\ge1$, and we denote $x=F^{-1}(t)$, then

$$h^{(k)}(t)=\begin{cases}\dfrac{f(x)}{g(x)}-1, & \text{if } k=1,\\[2mm] \dfrac{f^{(k-1)}(x)-g^{(k-1)}(x)}{f^{k}(x)}, & \text{if } k>1,\end{cases}$$

with $f(x)\neq g(x)$ in the first case and $f^{(k-1)}(x)\neq g^{(k-1)}(x)$ in the second one.

Combining (44), (45) and (46) with the above considerations we obtain the following version of Theorem 1.7.

Theorem 4.9

Assume that $F$ and $G$ have positive densities $f$ and $g$ on possibly unbounded intervals which are $k_0$ times continuously differentiable. Assume further that the set of contact points is finite with maximal intensity $k_0$ and that condition (55) holds. Suppose in addition that either the supports are bounded or that condition (56) is satisfied. Then, if $B_1$ and $B_2$ are independent Brownian bridges, and $n,m\to\infty$ with $\frac{n}{n+m}\to\lambda\in(0,1)$,

(i) if $k_0=1$ and $x_i=F^{-1}(t_i)$,

$$(n+m)^{1/2}\big(\gamma(F_n,G_m)-\gamma(F,G)\big)\xrightarrow{w}\sum_{t_i\in\Gamma_1}\Big(\frac{g(x_i)}{|f(x_i)-g(x_i)|}\frac{B_1(t_i)}{\sqrt\lambda}+\frac{f(x_i)}{|f(x_i)-g(x_i)|}\frac{B_2(t_i)}{\sqrt{1-\lambda}}\Big),$$

(ii) if $k_0\ge3$ is odd,

$$(n+m)^{\frac{1}{2k_0}}\big(\gamma(F_n,G_m)-\gamma(F,G)\big)\xrightarrow{w}\sum_{t_i\in\Gamma_{k_0}}\Big(\frac{k_0!}{|h^{(k_0)}(t_i)|}\Big)^{1/k_0}\Big(\big((B_\lambda(t_i))^{1/k_0}\big)^{+}-\big((B_\lambda(t_i))^{1/k_0}\big)^{-}\Big),$$

(iii) if $k_0$ is even,

$$(n+m)^{\frac{1}{2k_0}}\big(\gamma(F_n,G_m)-\gamma(F,G)\big)\xrightarrow{w}\sum_{t_i\in\Gamma_{k_0}}\mathrm{sgn}\big(h^{(k_0)}(t_i)\big)\,2\Big(\frac{k_0!}{|h^{(k_0)}(t_i)|}\Big)^{1/k_0}\Big((B_\lambda(t_i))^{\mathrm{sgn}(h^{(k_0)}(t_i))}\Big)^{1/k_0}.$$

We see from Theorem 4.9 that asymptotic normality (arguably, the most useful case for statistical applications) holds, with the standard $\sqrt{n+m}$ rate, only when $F$ and $G$ have a finite number of 'simple' crossings. In all the other cases we get a slower rate and a nonnormal limit.

While Theorem 1.7 (hence, also Theorem 4.9) involves only the case when $\Gamma^*=\Gamma^*_F$ consists of regular contact points, the comments about virtual contact points between $F_G$ and the identity that led to (43) apply to the global analysis of $\gamma(F_n,G_m)$. As an important example, we consider the case when $F$ and $G$ are finitely supported. More precisely, let us assume $F$ and $G$ have a finite support $x_1<x_2<\cdots<x_k$, with probabilities $p_1,p_2,\dots,p_k$ and $q_1,q_2,\dots,q_k$, respectively, with $p_i+q_i>0$ ($p_i$ or $q_i$ could be null), $i=1,\dots,k$. We set $P_i:=\sum_{j=1}^{i}p_j$ and $Q_i:=\sum_{j=1}^{i}q_j$, $i=1,\dots,k-1$. Then, $F_G(t)=P_i$ for $t\in(Q_{i-1},Q_i]$. Hence, the only possible inner contact points are $P_i,Q_i$, $i=1,\dots,k-1$, which can be horizontal crossings ($P_i$ if $Q_{i-1}<P_i<Q_i$), vertical crossings ($Q_i$ if $Q_i<P_i<Q_{i+1}$), upper tangency points ($Q_i$ if $Q_{i-1}<Q_i=P_i<P_{i+1}$) or lower tangency points ($P_i$ if $P_{i-1}<P_i=Q_{i-1}<Q_i$), using the same terms as in the discussion following Proposition 4.3. Combining that discussion with Corollary 4.2 we obtain the following consequence.

Theorem 4.10

With the above notation, if $F$ and $G$ are finitely supported and $\mathcal H$, $\mathcal V$, $\mathcal U$ and $\mathcal L$ denote, respectively, the sets of horizontal crossing, vertical crossing, upper tangency and lower tangency points for $F$ and $G$, then, assuming that $\frac{n}{n+m}\to\lambda\in(0,1)$,

$$\sqrt{n+m}\,\big(\gamma(F_n,G_m)-\gamma(F,G)\big)\xrightarrow{w}\sum_{t\in\mathcal H}\frac{B_1(t)}{\sqrt\lambda}-\sum_{t\in\mathcal V}\frac{B_2(t)}{\sqrt{1-\lambda}}+\sum_{t\in\mathcal U}\Big(\frac{B_1(t)}{\sqrt\lambda}-\frac{B_2(t)}{\sqrt{1-\lambda}}\Big)^{+}-\sum_{t\in\mathcal L}\Big(\frac{B_1(t)}{\sqrt\lambda}-\frac{B_2(t)}{\sqrt{1-\lambda}}\Big)^{-},$$

where $B_1$ and $B_2$ are independent Brownian bridges.

Similar to Theorem 1.5, we get a Gaussian limiting distribution only when all the contact points are crossing points (which, necessarily, have orders $r_L=r_R=1$). In the case $F=G$ we have $Q_{i-1}<Q_i=P_i<P_{i+1}$ for all $i$, that is, every $P_i$ is an upper tangency point and Theorem 4.10 yields

$$\sqrt{n+m}\,\gamma(F_n,G_m)\xrightarrow{w}\sum_{i=1}^{k-1}\Big(\frac{B_1(P_i)}{\sqrt\lambda}-\frac{B_2(P_i)}{\sqrt{1-\lambda}}\Big)^{+}.\qquad(57)$$

Of course, using the fact that $\sqrt{1-\lambda}\,B_1-\sqrt\lambda\,B_2$ is a Brownian bridge, we can, equivalently, write (57) as $\sqrt{\frac{nm}{n+m}}\,\gamma(F_n,G_m)\xrightarrow{w}\sum_{i=1}^{k-1}(B(P_i))^{+}$.

References

[Álvarez-Esteban et al.(2017)] Álvarez-Esteban, P.C.; del Barrio, E.; Cuesta-Albertos, J.A. and Matrán, C. (2017). Models for the assessment of treatment improvement: the ideal and the feasible. Statist. Sci., 469–485.
[Aly (1986)] Aly, Emad-Eldin A.A. (1986). Strong Approximations of the Q-Q Process. J. Multiv. Analysis, 114–128.
[Aly et al.(1987)] Aly, Emad-Eldin A.A.; Csörgő, M. and Horváth, L. (1987). P-P Plots, Rank Processes and Chernoff-Savage Theorems. In New Perspectives in Theoretical and Applied Statistics (M.L. Puri, J.P. Vilaplana and W. Wertz Eds.), 135–156. Wiley, New York.
[Behnen and Neuhaus (1983)] Behnen, K. and Neuhaus, G. (1983). Galton tests as linear rank tests with estimated scores and its local asymptotic efficiency. Ann. Statist. (2), 588–599.
[Billingsley (1968)] Billingsley, P. (1968). Convergence of Probability Measures. Wiley.
[Bobkov and Ledoux(2016)] Bobkov, S. and Ledoux, M. (2016). One-dimensional empirical measures, order statistics and Kantorovich transport distances. Memoirs Am. Math. Soc. Vol. 261, Number 1259.
[Chung and Feller (1949)] Chung, K. L. and Feller, W. (1949). On fluctuations in coin-tossing. Proc. Nat. Acad. Sci. of USA, 605–608.
[Csáki and Vincze (1961)] Csáki, E. and Vincze, I. (1961). On some problems connected with the Galton-test. Publ. Math. Inst. Hungar. Acad. Sci., 97–109.
[Csörgő and Horváth (1993)] Csörgő, M. and Horváth, L. (1993). Weighted approximations in probability and statistics. Wiley.
[Darwin (1876)] Darwin, C. (1876). The effect of Cross- and Self-fertilization in the Vegetable Kingdom. John Murray.
[Doksum (1974)] Doksum, K. (1974). Empirical probability plots and statistical inference for nonlinear models in the two-sample case. Ann. Statist. (2), 267–277.
[Feller(1968)] Feller, W. (1968). An Introduction to Probability Theory and its Applications Vol. I (Third edition). Wiley.
[Gross and Holland(1968)] Gross, S. and Holland, P. W. (1968). The Distribution of Galton Statistic. Ann. Math. Statist. (6), 2114–2117.
[Hodges(1955)] Hodges, J.L. (1955). Galton rank-order test. Biometrika, 261–262.
[Lehmann(1955)] Lehmann, E.L. (1955). Ordered families of distributions. Ann. Math. Statist., 399–419.
[Lévy(1939)] Lévy, P. (1939). Sur certains processus stochastiques homogènes. Compositio Math., 283–339.
[1] Reiss, R.D. (1989). Approximate Distributions of Order Statistics With Applications to Nonparametric Statistics. Springer.
[Shorack and Wellner(1986)] Shorack, G.R. and Wellner, J.A. (1986). Empirical Processes with Applications to Statistics. John Wiley and Sons, New York.
[Sparre-Andersen (1953)] Sparre-Andersen, E. (1953). On the fluctuations of sums of random variables. Math. Scand., 263–285.
[Zhuang et al (2019)] Zhuang, W.W., Hu, B.Y. and Chen, J. (2019). Semiparametric inference for the dominance index under the density ratio model. Biometrika, 1, 229–241.

Appendix.

A On the composite map $F_G$

We collect here some useful facts about the transform $F_G$ (and $G_F$). At some points we have used the fact that, as a consequence of (18), for every measurable $A\subset[0,1]$,

$$\ell\{t\in A: t>F_G(t)\}=\ell\{t\in A: F^{-1}(t)>G^{-1}(t)\},$$
$$\ell\{t\in A: U_n(t)>F_G(V_m(t))\}=\ell\{t\in A: F_n^{-1}(t)>G_m^{-1}(t)\}.\qquad(58)$$

Looking at (58), the corresponding statement for $G_F$ would be

$$\ell\{t\in A: V_m(t)>G_F(U_n(t))\}=\ell\{t\in A: F_n^{-1}(t)<G_m^{-1}(t)\}.$$

This shows that we can base our analysis indistinctly on $G_F$ or $F_G$ and, in particular, study $\tilde\ell^{t_0}_{m,n}$ instead of $\ell^{t_0}_{n,m}$ (recall the discussion after Corollary 4.2) if

i) $\ell\{t\in(t_0-\eta,t_0+\eta): F^{-1}(t)=G^{-1}(t)\}=0$,
ii) $P\big(\{\ell\{t\in(t_0-\eta,t_0+\eta): F_n^{-1}(t)=G_m^{-1}(t)\}>0\}\ \text{infinitely often}\big)=0$,
iii) $P\big(\ell\{t\in(t_0-\eta,t_0+\eta): V_m(t)=G_F(U_n(t))\}>0\big)=0$, and
iv) $P\big(\ell\{t\in(t_0-\eta,t_0+\eta): U_n(t)=F_G(V_m(t))\}>0\big)=0$

hold. When $t_0$ is an isolated contact point then i) is satisfied. The other relations can be easily guaranteed taking into account the next lemma and its consequences.

Lemma A.1

Let $X,Y$ be independent r.v.'s with respective d.f.'s $F$ and $G$. Then $P(X=Y)=0$ if and only if $F$ and $G$ have no common discontinuity point.

Therefore, if $F$ and $G$ have no common discontinuity point, the samples $\{X_1,\dots,X_n\}$ and $\{Y_1,\dots,Y_m\}$ are a.s. disjoint. Since these samples are the images of $F_n^{-1}$ and $G_m^{-1}$ respectively, the set $\{F_n^{-1}=G_m^{-1}\}$ must be a.s. empty. On the contrary, if there exists a common discontinuity point, $x_0$, for $F$ and $G$, then

$$P\big(\ell\{F_n^{-1}=G_m^{-1}\}>0\big)\ge P\big(\ell\{F_n^{-1}=G_m^{-1}\}=1\big)\ge P(X=x_0)^{n}\,P(Y=x_0)^{m}>0.$$

This proves Proposition A.2.
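The dichotomy in Lemma A.1 can be checked numerically. The sketch below is only an illustration under our own assumptions: each law is an atomic part (given as a dictionary `{point: mass}`) plus a standard normal component carrying the remaining mass, and the helper names `prob_tie` and `sample_mixture` are hypothetical. For independent $X$ and $Y$, $P(X=Y)$ is the sum of $P(X=x)P(Y=x)$ over the common atoms $x$, so it vanishes exactly when there is no common discontinuity point.

```python
import numpy as np

def prob_tie(atoms_f, atoms_g):
    """P(X = Y) for independent X ~ F and Y ~ G.

    atoms_f, atoms_g: dicts {point: mass} with the jump points of F and G.
    Only common atoms contribute; the continuous parts do not.
    """
    return sum(p * atoms_g[x] for x, p in atoms_f.items() if x in atoms_g)

def sample_mixture(rng, atoms, size):
    """Draw from the law with the given atoms and the remaining mass on N(0,1)."""
    points = list(atoms)
    masses = [atoms[p] for p in points]
    rest = 1.0 - sum(masses)                      # mass of the continuous part
    choice = rng.choice(len(points) + 1, size=size, p=masses + [rest])
    out = rng.normal(size=size)                   # continuous component
    for k, p in enumerate(points):
        out[choice == k] = p                      # overwrite with the atoms
    return out

rng = np.random.default_rng(1)
N = 200_000
f_atoms, g_common, g_disjoint = {0.0: 0.3}, {0.0: 0.5, 1.0: 0.2}, {1.0: 0.2}

x = sample_mixture(rng, f_atoms, N)
# Common atom at 0: Monte Carlo frequency of ties versus the exact value.
print(np.mean(x == sample_mixture(rng, g_common, N)), prob_tie(f_atoms, g_common))
# No common atom: ties have probability zero.
print(np.mean(x == sample_mixture(rng, g_disjoint, N)), prob_tie(f_atoms, g_disjoint))
```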

Proposition A.2

Let $F_n^{-1}$ and $G_m^{-1}$ be the sample quantile functions based on independent samples of i.i.d. r.v.'s from the d.f.'s $F$ and $G$. Then $P\big(\ell\{F_n^{-1}=G_m^{-1}\}>0\big)>0$ for some $n,m$ if and only if $F$ and $G$ have a common discontinuity point.

Elaborating on the same ideas, the following summarizing proposition follows easily.

Proposition A.3

Relations iii) and iv) above always hold. Moreover, if $F$ and $G$ do not have common discontinuity points on the set $[F^{-1}(t_0-\eta_0),F^{-1}(t_0+\eta_0)]$, then ii) holds for every $\eta\in(0,\eta_0)$.

Proposition A.25 in [Bobkov and Ledoux(2016)] provides simple necessary and sufficient conditions under which a quantile function is Lipschitz. We exploit that characterization to give here necessary and sufficient conditions under which $F_G$ is Lipschitz.

Proposition A.4 The transform $F_G$ is Lipschitz if and only if

$$F^{-1}\ \text{is increasing on}\ [F_G(0),F_G(1)]\ \text{and}\ \mathrm{supp}(F)\cap(G^{-1}(0),G^{-1}(1))\subset\mathrm{supp}(G)\qquad(59)$$

and there exists some $\delta>0$ such that

$$\limsup_{y\to x,\,y>x}\frac{G(F^{-1}(y)-)-G(F^{-1}(x)-)}{y-x}>\delta\quad\text{a.e. on } \mathrm{supp}(F).$$

Condition (59) can be equivalently stated as

$$F\ \text{is continuous and}\ G\ \text{increasing on}\ \mathrm{supp}(F)\cap(G^{-1}(0),G^{-1}(1)).\qquad(60)$$

Proof.

We set H − ( t ) = F G ( t − ), t ∈ (0 ,

1) and note that H − is left-continuous, hencea quantile function, and also that H − is Lipschitz if and only if F G is Lipschitz. Wewrite H for the associated d.f., namely, H ( x ) = ℓ { t : F G ( t − ) ≤ x } = ℓ { t : F G ( t ) ≤ x } . By Proposition A.25 in [Bobkov and Ledoux(2016)] H − is Lipschitz if and only ifthe associated probability is supported in a ﬁnite interval and its absolutely continuouscomponent has a density separated from zero on that interval.Since the support of the law L ( H − ) is contained in [0 , H is strictly increasing on [ F G (0) , F G (1)] (see Proposition A.7 in [Bobkov and Ledoux(2016)]).This, in turn, is equivalent to (59) and to (60).In fact, to check the equivalence to (59), we note that if H is increasing on [ F G (0) , F G (1)]then for every a, b ∈ ( F G (0) , F G (1)), a

0, which holdsif and only if ℓ { t : F G ( t ) ∈ [ a, b ) } > a, b ∈ ( F G (0) , F G (1)), a

0. This implies that F − must be increasingon ( F G (0) , F G (1)) . Moreover, if x ∈ ( G − (0) , G − (1)) \ supp( G ) , then x ∈ ( G − ( t ∗ ) , G − ( t ∗ +)) for some t ∗ ∈ (0 , x ∈ supp( F ), taking δ > x − δ, x + δ ) ⊂ ( G − ( t ∗ ) , G − ( t ∗ +)), we would have t ∗ = G ( x − δ ) = G ( x + δ ). Thus, G − ( t ∗ ) < x − δ x H ( y ) − H ( x ) y − x > δ a.e. on supp( F )37or some δ >

0. This completes the proof. • Remark A.5

Our analysis of the local behaviour of Galton's statistic around a contact point, $t_0$, required $F_G$ to be Lipschitz on a neighbourhood $(t_0-\eta,t_0+\eta)$. This can be characterized in the same way as (59) and (60). Without trying to give the best possible result, this can be guaranteed, e.g., if $G^{-1}$ is increasing and continuous on $(t_0-\eta,t_0+\eta)$ except perhaps at $t_0$, $F$ is continuous on $G^{-1}((t_0-\eta,t_0+\eta))$, and the derivatives of $F$ and $G$ (that exist almost everywhere) satisfy

$$\operatorname{ess\,inf}\Big\{\frac{G'(x)}{F'(x)},\ x\in G^{-1}((t_0-\eta,t_0+\eta))\cap\mathrm{supp}(F)\Big\}>0.$$

B Approximation of uniform quantile processes

The following result has been used extensively in this paper. It is a consequence of a refined version of the Komlós-Major-Tusnády construction for the quantile process (see, e.g., Theorem 3.2.1, p. 152 in [Csörgő and Horváth (1993)]), from which we know that there exist a sequence of Brownian bridges on $[0,1]$, $\{B_n\}$, versions of $u_n$ and positive constants $C_1$, $C_2$ and $C_3$, such that

$$P\Big\{\sup_{0\le t\le1}|u_n(t)-B_n(t)|>\frac{x+C_1\log n}{\sqrt n}\Big\}\le C_2e^{-C_3x},\quad x>0.\qquad(61)$$

Making use of this construction for both quantile processes and taking $x=\frac{a}{C_3}\log n$ with $a>1$ and $K=\frac{a}{C_3}+C_1>0$, we obtain useful independent sequences of Brownian bridges $\{B^F_n\}$, $\{B^G_m\}$ and versions of $u_n$ and $v_m$.

Theorem B.1

With the previous notation, in a probability one set, the sequences $\{B^F_n\}$, $\{B^G_m\}$, $\{u_n\}$ and $\{v_m\}$ eventually satisfy

$$\sup_{0\le t\le1}|u_n(t)-B^F_n(t)|\le K\frac{\log n}{\sqrt n}\quad\text{and}\quad\sup_{0\le t\le1}|v_m(t)-B^G_m(t)|\le K\frac{\log m}{\sqrt m}.$$
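To get a feeling for the orders of magnitude in Theorem B.1, the following sketch computes $\sup_{0\le t\le1}|u_n(t)|$ exactly from a uniform sample and prints it next to $\log n/\sqrt n$. It assumes that $u_n(t)=\sqrt n\,(U_n^{-1}(t)-t)$, with $U_n^{-1}$ the left-continuous quantile function of $n$ i.i.d. uniform variables; the sign convention does not affect the supremum. It does not build the coupling with the Brownian bridges, which is the actual content of the theorem, and the function name is ours.

```python
import numpy as np

def sup_uniform_quantile_process(u):
    """sup over t in [0,1] of sqrt(n) * |U_n^{-1}(t) - t| for a sample u in [0,1].

    On ((i-1)/n, i/n] the quantile function equals the i-th order statistic,
    so the supremum over that interval is attained at one of its endpoints.
    """
    u = np.sort(np.asarray(u, dtype=float))
    n = u.size
    i = np.arange(1, n + 1)
    dev = np.maximum(np.abs(u - (i - 1) / n), np.abs(u - i / n))
    return float(np.sqrt(n) * dev.max())

rng = np.random.default_rng(2)
for n in (100, 1_000, 10_000, 100_000):
    sup_un = sup_uniform_quantile_process(rng.uniform(size=n))
    # sup|u_n| stays of order one, while the approximation error allowed
    # by the theorem, K log(n)/sqrt(n), shrinks (K is not computed here).
    print(n, round(sup_un, 3), round(np.log(n) / np.sqrt(n), 4))
```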