Empirical processes indexed by estimated functions

Abstract

We consider the convergence of empirical processes indexed by functions that depend on an estimated parameter $\eta$ and give several alternative conditions under which the “estimated parameter” $\eta_n$ can be replaced by its natural limit $\eta_0$ uniformly in some other indexing set $\Theta$. In particular we reconsider some examples treated by Ghoudi and Remillard [Asymptotic Methods in Probability and Statistics (1998) 171–197, Fields Inst. Commun. 44 (2004) 381–406]. We recast their examples in terms of empirical process theory, and provide an alternative general view which should be of wide applicability.


IMS Lecture Notes–Monograph Series
Asymptotics: Particles, Processes and Inverse Problems

Vol. 55 (2007) 234–252
© Institute of Mathematical Statistics, 2007


Aad W. van der Vaart¹ and Jon A. Wellner²,*
Vrije Universiteit Amsterdam and University of Washington


*Supported in part by NSF Grant DMS-05-03822, NIAID Grant 2R01 AI291968-04, and by grant B62-596 of the Netherlands Organisation of Scientific Research NWO.
¹Section Stochastics, Department of Mathematics, Faculty of Sciences, Vrije Universiteit, De Boelelaan 1081a, 1081 HV Amsterdam, e-mail: [email protected]
²University of Washington, Department of Statistics, Box 354322, Seattle, Washington 98195-4322, USA, e-mail: [email protected]

Keywords and phrases: delta-method, Donsker class, entropy integral, pseudo observation.

1. Introduction

Let $X_1,\dots,X_n$ be i.i.d. random elements in a measurable space $(\mathcal X,\mathcal A)$ with law $P$, and for a measurable function $f:\mathcal X\to\mathbb R$ let the expectation, empirical measure and empirical process at $f$ be denoted by
$$Pf=\int f\,dP,\qquad \mathbb P_nf=\frac1n\sum_{i=1}^nf(X_i),\qquad \mathbb G_nf=\sqrt n(\mathbb P_n-P)f.$$
Given a collection $\{f_{\theta,\eta}:\theta\in\Theta,\eta\in H\}$ of measurable functions $f_{\theta,\eta}:\mathcal X\to\mathbb R$ indexed by sets $\Theta$ and $H$ and “estimators” $\eta_n$, we wish to prove that, as $n\to\infty$,
$$\sup_{\theta\in\Theta}\bigl|\mathbb G_n(f_{\theta,\eta_n}-f_{\theta,\eta_0})\bigr|\xrightarrow{P}0. \tag{1}$$
Here an “estimator” $\eta_n$ is a random element with values in $H$ defined on the same probability space as $X_1,\dots,X_n$, and $\eta_0\in H$ is a fixed element, which is typically a limit in probability of the sequence $\eta_n$.

The result (1) is interesting for several applications. A direct application is to the estimation of the functional $\theta\mapsto Pf_{\theta,\eta_0}$. If the parameter $\eta_0$ is unknown, we may replace it by an estimator $\eta_n$ and use the empirical estimator $\mathbb P_nf_{\theta,\eta_n}$. The result (1) helps to derive the limit behaviour of this estimator, as we can decompose
$$\sqrt n(\mathbb P_nf_{\theta,\eta_n}-Pf_{\theta,\eta_0})=\mathbb G_n(f_{\theta,\eta_n}-f_{\theta,\eta_0})+\mathbb G_nf_{\theta,\eta_0}+\sqrt nP(f_{\theta,\eta_n}-f_{\theta,\eta_0}). \tag{2}$$
If (1) holds, then the first term on the right converges to zero in probability. Under appropriate conditions on the functions $f_{\theta,\eta_0}$, the second term on the right will converge to a Gaussian process by the (functional) central limit theorem. The behavior of the third term depends on the estimators $\eta_n$, and would typically follow from an application of the (functional) delta-method, applied to the map $\eta\mapsto(Pf_{\theta,\eta}:\theta\in\Theta)$.

In an interesting particular case of this situation, the functions $f_{\theta,\eta}$ take the form
$$f_{\theta,\eta}(x)=\theta\bigl(\eta(x)\bigr),$$
for maps $\theta:\mathbb R^d\to\mathbb R$ and each $\eta\in H$ being a map $\eta:\mathcal X\to\mathbb R^d$. The realizations of the estimators $\eta_n$ are then functions $x\mapsto\eta_n(x)=\eta_n(x;X_1,\dots,X_n)$ on the sample space $\mathcal X$ and can be evaluated at the observations to obtain the random vectors $\eta_n(X_1),\dots,\eta_n(X_n)$ in $\mathbb R^d$. The process $\{\mathbb P_nf_{\theta,\eta_n}:\theta\in\Theta\}$ is the empirical measure of these vectors indexed by the functions $\theta$. For instance, if $\Theta$ consists of the indicator functions $1_{(-\infty,\theta]}$ for $\theta\in\mathbb R^d$, then this measure is the empirical distribution function
$$\theta\mapsto\mathbb P_nf_{\theta,\eta_n}=\frac1n\sum_{i=1}^n1\{\eta_n(X_i)\le\theta\}$$
of the random vectors $\eta_n(X_1),\dots,\eta_n(X_n)$. The properties of such empirical processes were studied in some generality and for examples of particular interest in Ghoudi and Remillard [6, 7]. Ghoudi and Remillard [6] apparently coined the name “pseudo-observations” for the vectors $\eta_n(X_1),\dots,\eta_n(X_n)$. The examples include, for instance, regression residuals, Kendall’s dependence process, and copula processes; see the end of Section 2 for explicit formulation of these three particular examples.

One purpose of the present paper is to extend the results in these papers also to other index classes $\Theta$ besides the class of indicator functions. Another purpose is to recast their results in terms of empirical process theory, which leads to simplification and alternative conditions.

A different, indirect application of (1) is to the derivation of the asymptotic distribution of $Z$-estimators.
A $Z$-estimator for $\theta$ might be defined as the solution $\hat\theta_n$ of the equation $\mathbb P_nf_{\theta,\eta_n}=0$, where again an unknown “nuisance” parameter $\eta_0$ is replaced by an estimator $\eta_n$. In this case (1) shows that
$$\mathbb P_nf_{\hat\theta_n,\eta_n}-\mathbb P_nf_{\hat\theta_n,\eta_0}=P(f_{\hat\theta_n,\eta_n}-f_{\hat\theta_n,\eta_0})+o_P(1/\sqrt n),$$
so that the limit behavior of $\hat\theta_n$ can be derived by comparison with the estimating equation defined by $\mathbb P_nf_{\theta,\eta_0}$ (with $\eta_0$ substituted for $\eta_n$). The “drift” sequence $P(f_{\hat\theta_n,\eta_n}-f_{\hat\theta_n,\eta_0})$, which will typically be equivalent to $P(f_{\theta_0,\eta_n}-f_{\theta_0,\eta_0})$ up to order $o_P(1/\sqrt n)$, may give rise to an additional component in the limit distribution.

The paper is organized as follows. In Section 2 we derive general conditions for the validity of (1) and formulate several particular examples to be considered in more detail in the sequel. In Section 3 we specialize the general results to composition maps. In Section 4 we combine these results with results on Hadamard differentiability to obtain the asymptotic distribution of empirical processes indexed by pseudo observations. Finally, in Section 5 we formulate our results for several of the particular examples mentioned above and at the end of Section 2.
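The decomposition (2) is an algebraic identity, and (1) says that its first term is uniformly negligible. The Python sketch below computes the three terms on a grid of $\theta$ values and checks that they add up to the left side. It uses a hypothetical toy model that is not taken from the paper: standard normal observations, a location parameter $\eta$ with $\eta_0=0$ estimated by the sample mean, and $f_{\theta,\eta}(x)=1\{x-\eta\le\theta\}$, so that $Pf_{\theta,\eta}=\Phi(\theta+\eta)$ for fixed $\eta$.

```python
import math
import random


def norm_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def decomposition(n, thetas, seed=0):
    """Left side of (2) and its three right-hand terms, for each theta.

    Hypothetical toy model (not from the paper): X_i ~ N(0, 1),
    f_{theta,eta}(x) = 1{x - eta <= theta}, eta_0 = 0 and eta_n = sample mean,
    so that P f_{theta,eta} = Phi(theta + eta) for a fixed eta.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    eta0, eta_n = 0.0, sum(xs) / n
    rn = math.sqrt(n)

    def Pn(theta, eta):  # empirical measure P_n f_{theta,eta}
        return sum(x - eta <= theta for x in xs) / n

    def P(theta, eta):  # P f_{theta,eta}, with eta treated as fixed
        return norm_cdf(theta + eta)

    rows = []
    for th in thetas:
        lhs = rn * (Pn(th, eta_n) - P(th, eta0))
        # G_n(f_{theta,eta_n} - f_{theta,eta_0}): the term controlled by (1)
        t1 = rn * ((Pn(th, eta_n) - Pn(th, eta0)) - (P(th, eta_n) - P(th, eta0)))
        t2 = rn * (Pn(th, eta0) - P(th, eta0))  # G_n f_{theta,eta_0}
        t3 = rn * (P(th, eta_n) - P(th, eta0))  # drift sqrt(n) P(f_n - f_0)
        rows.append((th, lhs, t1, t2, t3))
    return rows


if __name__ == "__main__":
    grid = [k / 10 for k in range(-20, 21)]
    for n in (100, 1000, 10000):
        rows = decomposition(n, grid)
        # supremum over the grid of the first term, i.e. the quantity in (1)
        print(n, max(abs(r[2]) for r in rows))
```

The grid maximum of the first term is only a crude stand-in for the supremum in (1); the sketch makes no claim about rates.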

2. General result

In many situations we wish to establish (1) without knowing much about the nature of the estimators $\eta_n$, beyond possibly that they are consistent for some value $\eta_0$.

For instance, this is true if (1) is used as a step in the derivation of $M$- or $Z$-estimators (cf. van der Vaart and Wellner [12] and van der Vaart [11]). Then an appropriate method of establishing (1) is through a Donsker or entropy condition, as in the following theorems. Proofs of Theorems 2.1 and 2.2 can be found in the mentioned references. Both theorems assume that $\eta_n$ is “consistent for $\eta_0$” in the sense that
$$\sup_{\theta\in\Theta}P(f_{\theta,\eta_n}-f_{\theta,\eta_0})^2\xrightarrow{P}0. \tag{3}$$

Theorem 2.1.

Suppose that $H_0$ is a fixed subset of $H$ such that $\Pr(\eta_n\in H_0)\to1$ and suppose that the class of functions $\{f_{\theta,\eta}:\theta\in\Theta,\eta\in H_0\}$ is $P$-Donsker. If (3) holds, then (1) is valid.

For the second theorem, let $N(\epsilon,\mathcal F,L_2(P))$ and $N_{[\,]}(\epsilon,\mathcal F,L_2(P))$ be the $\epsilon$-covering and $\epsilon$-bracketing numbers of a class $\mathcal F$ of measurable functions (cf. Pollard [8] and van der Vaart and Wellner [12]) and define entropy integrals by
$$J(\delta,\mathcal F,L_2)=\int_0^\delta\sup_Q\sqrt{\log N\bigl(\epsilon\|F\|_{Q,2},\mathcal F,L_2(Q)\bigr)}\,d\epsilon, \tag{4}$$
$$J_{[\,]}(\delta,\mathcal F,L_2(P))=\int_0^\delta\sqrt{\log N_{[\,]}\bigl(\epsilon\|F\|_{P,2},\mathcal F,L_2(P)\bigr)}\,d\epsilon. \tag{5}$$
Here $F$ is an arbitrary, measurable envelope function for the class $\mathcal F$: a measurable function $F:\mathcal X\to\mathbb R$ such that $|f(x)|\le F(x)$ for every $f\in\mathcal F$ and $x\in\mathcal X$. We say that a sequence $F_n$ of envelope functions satisfies the Lindeberg condition if $PF_n^2=O(1)$ and $PF_n^2\,1\{F_n\ge\epsilon\sqrt n\}\to0$ for every $\epsilon>0$.

Theorem 2.2.

Suppose that $H_n$ are subsets of $H$ such that $\Pr(\eta_n\in H_n)\to1$ and such that the classes of functions $\mathcal F_n=\{f_{\theta,\eta}:\theta\in\Theta,\eta\in H_n\}$ satisfy either $J_{[\,]}(\delta_n,\mathcal F_n,L_2(P))\to0$ or $J(\delta_n,\mathcal F_n,L_2)\to0$ for every sequence $\delta_n\to0$, relative to envelope functions that satisfy the Lindeberg condition. In the second case also assume that the classes $\mathcal F_n$ are suitably measurable (e.g. countable). If (3) holds, then (1) is valid.

Because there are many techniques to verify that a given class of functions is Donsker, or to compute bounds on its entropy integrals, the preceding theorems give quick results, if they apply. Furthermore, they appear to be close to best possible unless more information about the estimators $\eta_n$ can be brought in, or explicit computations are possible for the functions $f_{\theta,\eta}$.

In some applications the estimators $\eta_n$ are known to converge at a certain rate and/or known to possess certain regularity properties (e.g. uniformly bounded derivatives). Such knowledge cannot be exploited in Theorem 2.1, but could be used for the choice of the sets $H_n$ in Theorem 2.2. We now discuss an alternative approach which can be used if the estimators $\eta_n$ are also known to converge in distribution, if properly rescaled.

Let $H$ be a Banach space, and suppose that the sequence $\sqrt n(\eta_n-\eta_0)$ converges in distribution to a tight, Borel-measurable random element in $H$. The “convergence in distribution” may be understood in the sense of Hoffmann-Jørgensen, so that $\eta_n$ need not be Borel-measurable itself.

The tight limit of the sequence $\sqrt n(\eta_n-\eta_0)$ takes its values in a $\sigma$-compact subset $H_0\subset H$. For $\theta\in\Theta$, $h_0\in H_0$, and $\delta>0$ define
$$\mathcal F_n(\theta,h_0,\delta)=\bigl\{f_{\theta,\eta_0+n^{-1/2}h}-f_{\theta,\eta_0+n^{-1/2}h_0}:h\in H,\ \|h-h_0\|<\delta\bigr\}. \tag{6}$$
Let $F_n(\theta,h_0,\delta)$ be arbitrary measurable envelope functions for these classes.

Theorem 2.3.

Suppose that the sequence $\sqrt n(\eta_n-\eta_0)$ converges in distribution to a tight random element with values in a given $\sigma$-compact subset $H_0$ of $H$. Suppose that
(i) $\sup_\theta|\mathbb G_n(f_{\theta,\eta_0+n^{-1/2}h}-f_{\theta,\eta_0})|\xrightarrow{P}0$ for every $h\in H_0$;
(ii) $\sup_\theta|\mathbb G_nF_n(\theta,h_0,\delta)|\xrightarrow{P}0$ for every $\delta>0$ and every $h_0\in H_0$;
(iii) $\sup_\theta\sup_{h_0\in K}\sqrt n\,PF_n(\theta,h_0,\delta_n)\to0$ for every $\delta_n\to0$ and every compact $K\subset H_0$.
Then (1) is valid.

Proof. Suppose that $\sqrt n(\eta_n-\eta_0)\rightsquigarrow Z$ and let $\epsilon>0$. There exists a compact set $K\subset H_0$ with $P(Z\in K)>1-\epsilon$, and hence for every $\delta>0$, with $K^\delta$ the set of all points at distance less than $\delta$ to $K$,
$$\liminf_{n\to\infty}\Pr\bigl(\sqrt n(\eta_n-\eta_0)\in K^{\delta/2}\bigr)>1-\epsilon.$$
In view of the compactness of $K$ there exist finitely many elements $h_1,\dots,h_p\in K\subset H_0$ (with $p=p(\delta)$ depending on $\delta$) such that the balls of radius $\delta/2$ around these points cover $K$. Then $K^{\delta/2}$ is contained in the union of the balls of radius $\delta$, by the triangle inequality. Thus, with $B(h,\delta)$ denoting the ball of radius $\delta$ around $h$ in the space $H$,
$$\bigl\{\sqrt n(\eta_n-\eta_0)\in K^{\delta/2}\bigr\}\subset\bigcup_{i=1}^{p(\delta)}\bigl\{\eta_n\in B(\eta_0+n^{-1/2}h_i,\,n^{-1/2}\delta)\bigr\}.$$
It follows that, with probability at least $1-\epsilon$ as $n\to\infty$,
$$\begin{aligned}
\sup_\theta\bigl|\mathbb G_n(f_{\theta,\eta_n}-f_{\theta,\eta_0})\bigr|
&\le\sup_\theta\max_i\sup_{\|h-h_i\|<\delta}\bigl|\mathbb G_n(f_{\theta,\eta_0+n^{-1/2}h}-f_{\theta,\eta_0})\bigr|\\
&\le\sup_\theta\max_i\sup_{\|h-h_i\|<\delta}\Bigl[\bigl|\mathbb G_n(f_{\theta,\eta_0+n^{-1/2}h}-f_{\theta,\eta_0+n^{-1/2}h_i})\bigr|+\bigl|\mathbb G_n(f_{\theta,\eta_0+n^{-1/2}h_i}-f_{\theta,\eta_0})\bigr|\Bigr]\\
&\le\sup_\theta\max_i\bigl|\mathbb G_nF_n(\theta,h_i,\delta)\bigr|+2\sup_\theta\sup_{h_0\in K}\sqrt n\,PF_n(\theta,h_0,\delta)\\
&\quad+\sup_\theta\max_i\bigl|\mathbb G_n(f_{\theta,\eta_0+n^{-1/2}h_i}-f_{\theta,\eta_0})\bigr|,
\end{aligned}$$
where in the last step we use the inequality $|\mathbb G_nf|\le|\mathbb G_nF|+2\sqrt n\,PF$, valid for any functions $f$ and $F$ with $|f|\le F$. The maxima in the display are over the finite set $i=1,\dots,p(\delta)$, and the elements $h_1,\dots,h_{p(\delta)}\in K$ depend on $\delta$. By assumptions (i) and (ii) the first and third terms converge to zero in probability as $n\to\infty$, for every fixed $\delta$. It follows that there exists a sequence $\delta_n\downarrow0$ such that these two terms with $\delta_n$ substituted for $\delta$ still converge to 0 in probability. For this $\delta_n$ the second term converges to zero by (iii), so all three terms converge to zero in probability as $n\to\infty$.

The rate of convergence $\sqrt n$ in the preceding theorem may be replaced by another rate, with appropriate changes in the conditions, but the rate $\sqrt n$ appears natural in the following context. For more general metrizable topological vector spaces, similar but less attractive results are possible.
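The inequality $|\mathbb G_nf|\le|\mathbb G_nF|+2\sqrt n\,PF$ used in the last step of the proof needs only $|f|\le F$, which gives $|\mathbb P_nf|\le\mathbb P_nF$ and $|Pf|\le PF$. Spelled out as a one-line derivation (added for the reader):

$$|\mathbb G_nf|=\sqrt n\,|\mathbb P_nf-Pf|\le\sqrt n\,(\mathbb P_nF+PF)=\sqrt n\,(\mathbb P_nF-PF)+2\sqrt n\,PF\le|\mathbb G_nF|+2\sqrt n\,PF.$$

This is why condition (iii) carries the factor $\sqrt n$: the mean of the envelope enters the bound multiplied by $2\sqrt n$.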

The two conditions (i), (ii) of Theorem 2.3 concern the empirical process indexed by the classes of functions
$$\{f_{\theta,\eta_0+n^{-1/2}h}-f_{\theta,\eta_0}:\theta\in\Theta\}, \tag{7}$$
$$\{F_n(\theta,h_0,\delta):\theta\in\Theta\}. \tag{8}$$
These classes are indexed by $\Theta$ only, and hence Theorem 2.3, if applicable, avoids conditions for (1) that involve measures of the complexity of the class $\{f_{\theta,\eta}:\theta\in\Theta,\eta\in H\}$ due to the parameter $\eta\in H$.

Condition (iii) of Theorem 2.3 involves the mean of the envelopes of the classes $\mathcal F_n(\theta,h_0,\delta)$. For the minimal envelopes this condition takes the form
$$\sup_\theta\sup_{h_0\in K}\sqrt n\,P\sup_{\|h-h_0\|<\delta_n}\bigl|f_{\theta,\eta_0+n^{-1/2}h}-f_{\theta,\eta_0+n^{-1/2}h_0}\bigr|\to0, \tag{9}$$
for every $\delta_n\downarrow0$. This is an “integrated uniform local Lipschitz assumption” on the dependence $\eta\mapsto f_{\theta,\eta}$. In some applications it may be useful not to use the minimal envelope functions. The theorem is valid for any envelope functions, as long as the same envelopes are used in both (ii) and (iii).

The set $K$ in (iii) or (9) is a compact set in the support of the limit distribution of the sequence $\sqrt n(\eta_n-\eta_0)$. In some cases condition (iii) may be valid for any compact $K\subset H$, whereas in other cases more precise information about the limit process must be exploited. For instance, if the sequence $\sqrt n(\eta_n-\eta_0)$ converges in distribution to a tight zero-mean Gaussian process $G$ in the space $H=\ell^\infty(T)$ of bounded functions on some set $T$, then $K$ may be taken to be a set of functions $z:T\to\mathbb R$ that is uniformly bounded and uniformly equicontinuous relative to the semimetric $d$ with square $d^2(s,t)=\mathrm E(G_s-G_t)^2$ (and $T$ will be totally bounded for $d$). Cf. e.g. van der Vaart and Wellner [12], page 39.

Condition (iii) is an analytical condition, whereas conditions (i) and (ii) are empirical process conditions. In many cases the latter pair of conditions can be verified by standard empirical process type arguments. For reference we quote two lemmas that allow handling the empirical process indexed by a sequence of classes, as in (7) or (8). (For proofs see e.g. van der Vaart [10, 11].) Both lemmas apply to classes $\mathcal F_n$ of measurable functions $f:\mathcal X\to\mathbb R$ such that
$$\sup_{f\in\mathcal F_n}Pf^2\to0. \tag{10}$$

Lemma 2.1.

Suppose that the class of functions $\bigcup_n\mathcal F_n$ is $P$-Donsker. If (10) holds, then $\sup_{f\in\mathcal F_n}|\mathbb G_n(f)|\xrightarrow{P}0$.

Lemma 2.2.

Suppose that either $J_{[\,]}(\delta_n,\mathcal F_n,L_2(P))\to0$ or $J(\delta_n,\mathcal F_n,L_2)\to0$ for all $\delta_n\downarrow0$, relative to envelope functions $F_n$ that satisfy the Lindeberg condition. In the second case also assume that each class $\mathcal F_n$ is suitably measurable. If (10) holds, then $\sup_{f\in\mathcal F_n}|\mathbb G_n(f)|\xrightarrow{P}0$.

Example 1 (Regression residual processes). Suppose that $(X_1,Y_1),\dots,(X_n,Y_n)$ are a random sample distributed according to the regression model $Y=g_{\eta_0}(X)+e$. For given estimators $\eta_n$ we can form the residuals $\hat e_i=Y_i-g_{\eta_n}(X_i)$ and may be interested in the empirical process corresponding to $\hat e_1,\dots,\hat e_n$, i.e. for a collection $\Theta$ of functions $\theta:\mathbb R\to\mathbb R$ we consider the process $\{n^{-1}\sum_{i=1}^n\theta(\hat e_i):\theta\in\Theta\}$. This fits the general set-up with the functions $f_{\theta,\eta}$ defined as
$$f_{\theta,\eta}(x,y)=\theta\bigl(y-g_\eta(x)\bigr).$$
In many cases it will be possible to apply Theorem 2.1. For instance, if $x\in\mathbb R^d$, $g_\eta$ is a polynomial in $x$, and $\Theta$ is the class of indicator functions $1_{(-\infty,\theta]}$ for $\theta\in\mathbb R$, then the functions $f_{\theta,\eta}$ are the indicator functions of the sets $\{(x,y):y-g_\eta(x)-\theta\le0\}$. Because the set of functions $(x,y)\mapsto y-g_\eta(x)-\theta$ is contained in a finite-dimensional vector space, it is a VC-class, and hence so are their negativity sets (e.g. van der Vaart and Wellner [12], Lemma 2.6.18). Thus the class of functions $f_{\theta,\eta}$ is Donsker, and Theorem 2.1 can be applied directly.

Example 2 (Kendall’s process). Let $\eta_n$ be the empirical distribution function of a random sample $X_1,\dots,X_n$ from a distribution $\eta_0$ on $\mathbb R^d$. Barbe, Genest, Ghoudi and Remillard [2] and Ghoudi and Remillard [6] study the behavior of the empirical distribution function $K_n$ of the pseudo-observations $\eta_n(X_i)$,
$$K_n(\theta)=\frac1n\sum_{i=1}^n1\{\eta_n(X_i)\le\theta\},\qquad\theta\in[0,1],$$
and the resulting Kendall’s process
$$\sqrt n\bigl(K_n(\theta)-K(\theta)\bigr),\qquad\theta\in[0,1], \tag{11}$$
where $K(\theta)=P\bigl(\eta_0(X)\le\theta\bigr)$.
This fits the general set-up with $f_{\theta,\eta}$ the composition function $f_{\theta,\eta}=\theta\circ\eta$, and $\theta$ the indicator function $1_{(-\infty,\theta]}$ (where we abuse notation by using the symbol $\theta$ in two different ways).

An attempt to apply Theorem 2.1 to this problem would lead to the consideration of the class of all indicator functions of sets of the form $\{x\in\mathbb R^d:\eta(x)\le\theta\}$ for $\eta$ ranging over the cumulative distribution functions on $\mathbb R^d$ and $\theta\in[0,1]$ … $\mathbb R^d$, and, unfortunately, fails to be Donsker for most distributions (cf. Dudley [3], pages 264, 373, or Dudley [4]). In this case it appears to be necessary to exploit the limit behaviour of the sequence $\sqrt n(\eta_n-\eta_0)$. Ghoudi and Remillard [6] have shown that (1) is valid in this case, under some strong smoothness assumptions on the underlying measure $\eta_0$. In Sections 4 and 5 we rederive some of their results by empirical process methods using Theorem 2.3.

We also consider the empirical process of the variables $\eta_n(X_1),\dots,\eta_n(X_n)$ indexed by classes of functions other than the indicators $1_{(-\infty,\theta]}$. If the indexing functions are smooth, then this empirical process will converge even without smoothness conditions on $\eta_0$. A proof can be based on Theorem 2.3.

Example 3 (Copula processes). Suppose that $X_1,\dots,X_n$ are a sample from a distribution $\eta_0$ on $\mathbb R^d$. Write $X_i=(X_{i,1},\dots,X_{i,d})$ and let $\eta_{0,1},\dots,\eta_{0,d}$ be the marginal distributions. The copula function $C$ associated with $\eta_0$ is the distribution function of the vector $\bigl(\eta_{0,1}(X_{1,1}),\dots,\eta_{0,d}(X_{1,d})\bigr)$, i.e. with $\eta_{0,j}^{-1}(u)=\inf\{x:\eta_{0,j}(x)\ge u\}$ for $u\in[0,1]$,
$$C(u_1,\dots,u_d)=\eta_0\bigl(\eta_{0,1}^{-1}(u_1),\dots,\eta_{0,d}^{-1}(u_d)\bigr)$$
for $(u_1,\dots,u_d)\in[0,1]^d$. For $j=1,\dots,d$ let $\eta_{n,j}$ be the empirical distribution function of $X_{1,j},\dots,X_{n,j}$ (on $\mathbb R$), and let $\eta_n$ be the empirical distribution function of $X_1,\dots,X_n$ (on $\mathbb R^d$). Then a natural estimator $C_n$ of $C$ is given by
$$C_n(u)=\frac1n\sum_{i=1}^n1\{\hat e_{n,i}\le u\},\qquad u\in[0,1]^d,$$
for the “pseudo-observations” $\hat e_{n,i}=\bigl(\eta_{n,1}(X_{i,1}),\dots,\eta_{n,d}(X_{i,d})\bigr)$. The resulting “copula processes”,
$$\sqrt n\bigl(C_n(u)-C(u)\bigr),\qquad u\in[0,1]^d, \tag{12}$$
have been considered by Stute [9], Gänssler and Stute [5], and Ghoudi and Remillard [7]. This example can be treated using Theorem 2.3, but also with the more straightforward Theorem 2.1, or even by employing the theory of Hadamard differentiability, as in Chapter 3.9 of van der Vaart and Wellner [12].
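To make the pseudo-observations of Examples 2 and 3 concrete, the Python sketch below computes $\eta_n(X_i)$ and $K_n(\theta)$ for Kendall’s process, and $\hat e_{n,i}$ and $C_n(u)$ for the copula process, directly from the definitions above. The function names are hypothetical, ties are not treated specially, and the quadratic-time loops are chosen for readability rather than speed.

```python
import random


def ecdf_at_sample(points):
    """Pseudo-observations eta_n(X_i) of Example 2: the multivariate
    empirical distribution function at each sample point,
    eta_n(X_i) = (1/n) #{j : X_j <= X_i componentwise}."""
    n = len(points)
    return [sum(all(a <= b for a, b in zip(xj, xi)) for xj in points) / n
            for xi in points]


def kendall_Kn(points, theta):
    """K_n(theta) = (1/n) #{i : eta_n(X_i) <= theta}, theta in [0, 1]."""
    pseudo = ecdf_at_sample(points)
    return sum(v <= theta for v in pseudo) / len(points)


def copula_pseudo_obs(points):
    """Pseudo-observations e_hat_{n,i} of Example 3: each coordinate is
    replaced by the value of its marginal empirical distribution function."""
    n, d = len(points), len(points[0])
    cols = [[p[j] for p in points] for j in range(d)]
    return [tuple(sum(c <= p[j] for c in cols[j]) / n for j in range(d))
            for p in points]


def copula_Cn(points, u):
    """C_n(u) = (1/n) #{i : e_hat_{n,i} <= u componentwise}, u in [0, 1]^d."""
    pseudo = copula_pseudo_obs(points)
    return sum(all(e <= v for e, v in zip(ei, u)) for ei in pseudo) / len(points)


if __name__ == "__main__":
    rng = random.Random(1)
    # hypothetical data: d = 2 with independent uniform coordinates
    sample = [(rng.random(), rng.random()) for _ in range(300)]
    print("K_n(0.5) =", kendall_Kn(sample, 0.5))
    print("C_n(0.5, 0.5) =", copula_Cn(sample, (0.5, 0.5)))
```

For independent coordinates in $d=2$ the standard limits (not stated in this section) are $K(\theta)=\theta-\theta\log\theta$, about $0.85$ at $\theta=0.5$, and $C(0.5,0.5)=0.25$, so the printed values should be close to these for moderate $n$.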

3. Composition

In this section we consider the case where the functions $f_{\theta,\eta}$ take the form
$$f_{\theta,\eta}(x)=\theta\bigl(\eta(x)\bigr), \tag{13}$$
for $\theta$ ranging over a class $\Theta$ of functions $\theta:\mathbb R^d\to\mathbb R$ and $\eta$ ranging over a class $H$ of measurable functions $\eta:\mathcal X\to\mathbb R^d$. We first give general conditions for the validity of condition (i) of Theorem 2.3, and next consider also the conditions (ii) and (iii) for the special cases of functions $\theta$ that are Lipschitz and monotone, respectively. We develop these results for the case that the sequence $\sqrt n(\eta_n-\eta_0)$ converges in distribution in the space $H=\ell^\infty(\mathcal X,\mathbb R^d)$ of uniformly bounded functions $z:\mathcal X\to\mathbb R^d$, equipped with the uniform norm $\|z\|=\sup_{x\in\mathcal X}\|z(x)\|$. (Variations of these results are possible. For instance, $\mathbb R^d$ could be replaced by a more general Banach space, and $H$ could be equipped with a weighted uniform norm.)

For $f_{\theta,\eta}$ taking the form (13), condition (i) of Theorem 2.3 takes the form
$$\sup_{f\in\mathcal F_n}|\mathbb G_{n,Q}f|\xrightarrow{P}0, \tag{14}$$
for $Q=P\circ(\eta_0,h)^{-1}$, $\mathbb G_{n,Q}$ the empirical process of a random sample from the measure $Q$, and $\mathcal F_n$ the class of functions
$$\mathcal F_n=\bigl\{(y,z)\mapsto\theta(y+n^{-1/2}z)-\theta(y):\theta\in\Theta\bigr\}. \tag{15}$$
Condition (i) requires that (14) is valid for every fixed choice of $h\in H_0$, i.e. for every measure $Q$ determined as the law of $(\eta_0(X),h(X))$ for some $h\in H_0$ and $X$ distributed according to $P$.

This situation is of the form considered in Lemmas 2.1 and 2.2, and both lemmas may be applicable in a given setting. It is not especially helpful to restate these lemmas for the present special situation. Instead, we give one easy-to-check set of sufficient conditions. This covers VC-classes $\Theta$, and much more.

If $\Theta_{\mathrm{env}}:\mathbb R^d\to\mathbb R$ is an envelope function for $\Theta$, then
$$F_n(y,z)=\Theta_{\mathrm{env}}(y+n^{-1/2}z)+\Theta_{\mathrm{env}}(y) \tag{16}$$
is an envelope function for $\mathcal F_n$. (A crude one, because we do not exploit that the functions in $\mathcal F_n$ are differences.)

Lemma 3.1.

Suppose that $J(1,\Theta,L_2)<\infty$, that $\Theta$ is suitably measurable, that $P(\Theta_{\mathrm{env}}\circ\eta_0)^2<\infty$, and that the functions $\Theta_{\mathrm{env}}\circ(\eta_0+n^{-1/2}h)$ satisfy the Lindeberg condition in $L_2(P)$, for every $h\in H_0$. If
$$\sup_{\theta\in\Theta}P\bigl(\theta\circ(\eta_0+n^{-1/2}h)-\theta\circ\eta_0\bigr)^2\to0$$
for every $h\in H_0$, then condition (i) of Theorem 2.3 is satisfied for the functions $f_{\theta,\eta}$ given by (13).

Proof. It suffices to prove (14) for the classes $\mathcal F_n$ given in (15). The class $\mathcal F_n$ is contained in the difference of the classes
$$\mathcal F_n'=\{(y,z)\mapsto\theta(y+n^{-1/2}z):\theta\in\Theta\},\qquad\mathcal F''=\{(y,z)\mapsto\theta(y):\theta\in\Theta\}.$$
These classes possess envelope functions $F_n'$ and $F''$ defined by
$$F_n'(y,z)=\Theta_{\mathrm{env}}(y+n^{-1/2}z),\qquad F''(y,z)=\Theta_{\mathrm{env}}(y).$$
The uniform entropy of $\mathcal F''$ relative to $F''$ is finite by assumption. The uniform entropy of $\mathcal F_n'$ relative to $F_n'$ is exactly the same, as the law of $Y+n^{-1/2}Z$ runs through all possible laws on $\mathbb R^d$ if the law of $(Y,Z)$ runs through all possible laws on $\mathbb R^d\times\mathbb R^d$. The uniform entropy of $\mathcal F_n$ relative to $F_n$ is bounded by the sum of the uniform entropies of $\mathcal F_n'$ and $\mathcal F''$. (Cf. e.g. Theorem 2.10.20 of van der Vaart and Wellner [12].) Now apply Lemma 2.2.

Assume that every function $\theta:\mathbb R^d\to\mathbb R$ in the class $\Theta$ is uniformly Lipschitz in that
$$|\theta(r_1)-\theta(r_2)|\le\|r_1-r_2\|. \tag{17}$$
Then, for every $x\in\mathcal X$,
$$\Bigl|\theta\bigl(\eta_0(x)+n^{-1/2}h(x)\bigr)-\theta\bigl(\eta_0(x)+n^{-1/2}h_0(x)\bigr)\Bigr|\le\frac{\|h(x)-h_0(x)\|}{\sqrt n}.$$
The norm in the right side is bounded by the supremum norm $\|h-h_0\|$ on $\ell^\infty(\mathcal X,\mathbb R^d)$. It follows that the classes $\mathcal F_n(\theta,h_0,\delta)$ as in (6) possess envelope functions
$$F_n(\theta,h_0,\delta)=\delta/\sqrt n. \tag{18}$$

Theorem 3.1.
If $\Theta$ is a suitably measurable collection of uniformly bounded, uniformly Lipschitz functions $\theta:\mathbb R^d\to\mathbb R$ such that $J(1,\Theta,L_2)<\infty$ (relative to a constant envelope function), $\eta_0\in\ell^\infty(\mathcal X,\mathbb R^d)$, and the sequence $\sqrt n(\eta_n-\eta_0)$ converges weakly in $\ell^\infty(\mathcal X,\mathbb R^d)$ to a tight random element, then
$$\sup_{\theta\in\Theta}\bigl|\mathbb G_n\bigl(\theta(\eta_n)-\theta(\eta_0)\bigr)\bigr|\xrightarrow{P}0.$$

Proof.

With the envelope functions $F_n(\theta,h_0,\delta)$ as defined in (18), condition (ii) of Theorem 2.3 is trivially satisfied because the envelopes are actually constants, and the validity of condition (iii) is immediate.

By assumption we can choose the envelope function of $\Theta$ equal to a constant and $J(1,\Theta,L_2)<\infty$. This suffices for the verification of most of the conditions of Lemma 3.1. Finally, it suffices to note that
$$P\bigl(\theta\circ(\eta_0+n^{-1/2}h)-\theta\circ\eta_0\bigr)^2\le\|h\|^2/n.$$
By Lemma 3.1 we conclude that condition (i) of Theorem 2.3 is also satisfied, whence the theorem follows from Theorem 2.3.

For the verification of condition (i) of Theorem 2.3 it suffices to consider the functions $\theta$ on the range of the functions $\eta_0+h/\sqrt n$ for a fixed $h$ in the support of the limit distribution of the sequence $\sqrt n(\eta_n-\eta_0)$. Thus we may restrict the functions $\theta$ to a subset of $\mathbb R^d$ that contains the ranges of these functions and interpret the condition $J(1,\Theta,L_2)<\infty$ in Lemma 3.1 accordingly. In particular, in Theorem 3.1 we may replace this condition by the condition that $J(1,\Theta_K,L_2)<\infty$ for every norm-bounded subset $K\subset\mathbb R^d$, where $\Theta_K$ is the collection of restrictions $\theta:K\to\mathbb R$ of the functions $\theta\in\Theta$.

Any collection of uniformly bounded, Lipschitz functions $\theta:K\to\mathbb R$ on a compact interval $K$ satisfies $J(1,\Theta,L_2)<\infty$. (Cf. e.g. van der Vaart and Wellner [12], page 157.) Thus in the case that $d=1$ the assertion of the preceding theorem is true for any collection of uniformly bounded Lipschitz functions.

For $d>1$ the class $C^\alpha(K)$ for a compact interval $K\subset\mathbb R^d$ possesses a finite uniform entropy integral provided $\alpha>d/2$. (Cf. e.g. van der Vaart and Wellner [12], page 157.) The assertion of the preceding theorem is also true for such a class.

There are many other examples of classes of Lipschitz functions with finite uniform entropy integrals, for instance VC-classes of Lipschitz functions.

Assume that every function $\theta:\mathbb R^d\to\mathbb R$ in $\Theta$ is the survival function $\theta(x)=\int_{[x,\infty)}d\theta$ of a subprobability measure on $\mathbb R^d$. Then each $\theta$ is nonincreasing in each of its arguments. If $H=\ell^\infty(\mathcal X,\mathbb R^d)$ is equipped with the uniform norm relative to the max-norm on $\mathbb R^d$, then
$$\bigl|\theta(\eta_0+n^{-1/2}h)-\theta(\eta_0+n^{-1/2}h_0)\bigr|\le\theta\bigl(\eta_0+n^{-1/2}h_0-n^{-1/2}\|h-h_0\|\bigr)-\theta\bigl(\eta_0+n^{-1/2}h_0+n^{-1/2}\|h-h_0\|\bigr).$$
It follows that the classes $\mathcal F_n(\theta,h_0,\delta)$ possess envelope functions, with $\delta$ the vector $(\delta,\dots,\delta)$,
$$F_n(\theta,h_0,\delta)=\theta(\eta_0+n^{-1/2}h_0-n^{-1/2}\delta)-\theta(\eta_0+n^{-1/2}h_0+n^{-1/2}\delta).$$
In order to verify condition (iii) of Theorem 2.3, we assume that for given (possibly infinite) $a<b$ in $\mathbb R^d$, every $\delta_n\downarrow0$ and every compact $K\subset H_0\cup\{0\}$,
$$\sup_{t\in\mathbb R^d,\,a\le t\le b}\ \sup_{h\in K}\sqrt n\,P\bigl(1\{\eta_0+n^{-1/2}h\le t+n^{-1/2}\delta_n\}-1\{\eta_0+n^{-1/2}h\le t\}\bigr)\to0. \tag{19}$$

Theorem 3.2.

Let $\Theta$ be a collection of survival functions $\theta:\mathbb R^d\to[0,1]$ of subprobability measures supported on an interval $(a,b)\subset\mathbb R^d$. If the sequence $\sqrt n(\eta_n-\eta_0)$ converges in distribution in $\ell^\infty(\mathcal X,\mathbb R^d)$ to a tight Borel measure concentrating on the $\sigma$-compact set $H_0$, and (19) holds for every $\delta_n\downarrow0$ and every compact $K\subset H_0\cup\{0\}$, then
$$\sup_\theta\bigl|\mathbb G_n\bigl(\theta(\eta_n)-\theta(\eta_0)\bigr)\bigr|\xrightarrow{P}0.$$

Proof.

The survival functions of subprobability measures are in the convex hull of the set of indicator functions $1_{[t,\infty)}$, which is a VC-class. Therefore the entropy integral $J(1,\Theta,L_2)$ of $\Theta$ relative to a constant envelope is finite. (Cf. e.g. van der Vaart and Wellner [12], page 145.)

Defining $F_n(\theta,h_0,\delta)$ as in the display preceding the theorem, we can write
$$\begin{aligned}
PF_n(\theta,h_0,\delta)&=\int P\bigl(1_{(-\infty,s]}(\eta_0+n^{-1/2}h_0-n^{-1/2}\delta)-1_{(-\infty,s]}(\eta_0+n^{-1/2}h_0+n^{-1/2}\delta)\bigr)\,d\theta(s)\\
&\le\|\theta\|\sup_{a\le s\le b}P\bigl(1_{(-\infty,s]}(\eta_0+n^{-1/2}h_0-n^{-1/2}\delta)-1_{(-\infty,s]}(\eta_0+n^{-1/2}h_0+n^{-1/2}\delta)\bigr).
\end{aligned}$$
By assumption (19) the right side converges to zero faster than $1/\sqrt n$, for every $\delta=\delta_n\downarrow0$, uniformly in $h_0\in K$, and uniformly in $\theta$ because the total variation norms $\|\theta\|$ are uniformly bounded. This verifies condition (iii) of Theorem 2.3.

Because $\theta$ is monotone with range contained in $[0,1]$,
$$P\bigl(\theta(\eta_0+n^{-1/2}\delta)-\theta(\eta_0)\bigr)^2\le\sup_{a\le s\le b}P\bigl(1_{(-\infty,s]}(\eta_0-n^{-1/2}\delta)-1_{(-\infty,s]}(\eta_0)\bigr).$$
By assumption (19) with $h=0$ this converges to zero faster than $1/\sqrt n$ for every sequence $\delta_n\downarrow0$. This can be seen to imply that the expression in the display (which does not have the leading $\sqrt n$) converges to zero also for fixed $\delta$. By monotonicity of $\theta$ we can bound $|\theta(\eta_0+h/\sqrt n)-\theta(\eta_0)|$ by $|\theta(\eta_0-\delta/\sqrt n)-\theta(\eta_0)|$ for $\delta=\|h\|$. By Lemma 3.1 we now conclude that condition (i) of Theorem 2.3 is satisfied.

In the present case the envelope functions $F_n(\theta,h_0,\delta)$ are equal to the functions $f_{\theta,\eta_0+n^{-1/2}(h_0-\delta)}-f_{\theta,\eta_0+n^{-1/2}(h_0+\delta)}$. Therefore, the validity of condition (ii) of Theorem 2.3 follows by the same arguments as used for the validity of condition (i).

Condition (19) is a uniform Lipschitz condition on the distribution functions of the variables $\eta_0(X)+h(X)/\sqrt n$. If the distribution of $\eta_0(X)$ is smooth, then we might expect that the distribution functions of the perturbed variables $\eta_0(X)+h(X)/\sqrt n$ will be smooth as well. However, this appears not to be true in general, and it will usually be necessary to exploit some information about the functions $h$. (We need to consider functions in the support of the limit measure of the sequence $\sqrt n(\eta_n-\eta_0)$.) In this respect the conditions of Theorem 3.2 for composition with monotone functions are much more stringent than the conditions of Theorem 3.1 for the composition with Lipschitz functions.

The condition (19) is in terms of the indicator functions $1_{(-\infty,s]}$, and would have exactly the same form if we considered only indicator functions $\theta=1_{(-\infty,\theta]}$, rather than general monotone functions. Thus the restrictive condition is connected to studying the classical empirical process.

The following lemma allows the verification of condition (19) in many cases. It will also be used in the next section to prove applicability of the delta-method. The lemma is similar to Lemma 5.1 of Ghoudi and Rémillard [6].

Lemma 3.2.

Suppose that

$X,Y,Y_t$ (with $t>0$) are real-valued random variables on a common probability space such that
(i) $X$ possesses a Lebesgue density $f$ that is continuous in a neighbourhood of $x$;
(ii) $\|Y_t-Y\|_\infty\to0$ and $\|Y\|_\infty<\infty$;
(iii) the conditional distribution of $Y$ given $X=s$ can be represented by a Markov kernel $K(s,\cdot)$ such that the map $s\mapsto K(s,\cdot)$ is continuous at $x$ for the weak topology.
Then for every continuous function $g:\mathbb R\to\mathbb R$ and all converging sequences $x_t\to x$, $a_t\to a$ and $0\le b_t\to b$, as $t\to0$, $\frac1t\,\mathrm E\,g(Y_t)1_{x_t}$ …

First consider the case that $Y_t=Y$ for every $t$. By the definitions of $K$ and $f$, we can write $\frac1t\,\mathrm E\,g(Y)1_{x_t}$ …

0. Therefore $\frac1t\,\mathrm E\,|g(Y_t)-g(Y)|\,1_{x_t}$ …

0. We conclude from this that $\frac1t\,|\mathrm E\,g(Y)(1_{x_t}$ …

$X,Y,Y_t$) equal to $\bigl(\eta_0(X),h(X),h_n(X)\bigr)$ and $t=1/\sqrt n$.

Example 2, continued.

Suppose that $\eta_n$ is the empirical distribution of a random sample from the cumulative distribution function $\eta_0$ on $\mathbb R^d$. Then the limit distribution of the sequence $\sqrt n(\eta_n-\eta_0)$ is the $d$-dimensional $\eta_0$-Brownian sheet on $\mathbb R^d$.

If $d=1$, then the Brownian sheet is a Brownian bridge and can be represented as $B\circ\eta_0$ for $B$ a standard Brownian bridge on the unit interval. A typical function $h$ in the support of the limit distribution of the sequence $\sqrt n(\eta_n-\eta_0)$ can be represented as $h=\bar h\circ\eta_0$ for some function $\bar h:[0,1]\to\mathbb R$. The conditional law of the variable $h(X)$ given $\eta_0(X)=z$ is the Dirac measure at $\bar h(z)$. Because the standard Brownian bridge is continuous, the function $\bar h$ can be taken continuous and hence the corresponding Markov kernels $K(z,\cdot)=\delta_{\bar h(z)}(\cdot)$ are weakly continuous in $z$, as required by the preceding lemma.

If $d>1$, then we can, without loss of generality, suppose that $\eta_0$ is a distribution function on $[0,1]^d$ with uniform marginal distributions (i.e. a copula function). Then the conditioning event $\eta_0(X)=z$ will typically restrict $X$ to a one-dimensional curve in $[0,1]^d$. Under sufficient smoothness of $\eta_0$, this curve will vary continuously with $z$, and under smoothness conditions on the law of $X$, the conditional distribution of $h(X)$ given $\eta_0(X)=z$ for a continuous function $h$ will vary continuously as well. Ghoudi and Remillard [6] give sufficient conditions for this continuity in a number of examples.

The preceding lemma can be extended to the case of multidimensional variables. For simplicity we only consider the two-dimensional case.

Lemma 3.3.

Suppose that

$X,Y,Y_t$ (with $t>0$) are random variables in $\mathbb R^2$ defined on a common probability space such that
(i) $X$ possesses a Lebesgue density $f$ that has continuous conditional densities;
(ii) $\|Y_t-Y\|_\infty\to0$ and $\|Y\|_\infty<\infty$;
(iii) the conditional distribution of $Y$ given $X=s$ can be represented by a Markov kernel $K(s,\cdot)$ such that the map $s\mapsto K(s,\cdot)$ is continuous at $x$ for the weak topology.
Then for every continuous function $g:\mathbb R^2\to\mathbb R$ and all converging sequences $x_t\to x$, $a_t\to a$ and $b_t\to b>0$, as $t\to0$,
$$\frac1t\,\mathrm E\,g(Y_t)\bigl(1\{X+ta_tY_t\le x_t+tb_t\}-1\{X+ta_tY_t\le x_t\}\bigr)\to b_1\int_{-\infty}^{x_2}\!\int g(y)\,K\bigl((x_1,s),dy\bigr)f(x_1,s)\,ds+b_2\int_{-\infty}^{x_1}\!\int g(y)\,K\bigl((s,x_2),dy\bigr)f(s,x_2)\,ds.$$

Proof.

The event $\{X+ta_tY_t\le x_t+tb_t\}\cap\{X+ta_tY_t\le x_t\}^c$ can be decomposed into the three events
$$I=\{x_{t,1}<X_1+ta_{t,1}Y_{t,1}\le x_{t,1}+tb_{t,1},\ X_2+ta_{t,2}Y_{t,2}\le x_{t,2}\},$$
$$II=\{x_{t,1}<X_1+ta_{t,1}Y_{t,1}\le x_{t,1}+tb_{t,1},\ x_{t,2}<X_2+ta_{t,2}Y_{t,2}\le x_{t,2}+tb_{t,2}\},$$
$$III=\{X_1+ta_{t,1}Y_{t,1}\le x_{t,1},\ x_{t,2}<X_2+ta_{t,2}Y_{t,2}\le x_{t,2}+tb_{t,2}\}.$$
In view of the boundedness of the $Y_t$, the event $II$ is contained in an event of the form $\{X\in B_t\}$ for rectangles $B_t$ of area $O(t^2)$. Therefore this event does not contribute to the limit.

The contribution of the event $I$ with $Y_t=Y$ can be written $\int\!\!\int\!\!\int g(y)\,1_{u\,\cdots}$

$\int\!\!\int\!\!\int g(y)\,1_{u\,\cdots}$
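As a sanity check on the limit in Lemma 3.3, consider the simplest case, chosen here only for illustration: $Y_t=Y\equiv0$, $g\equiv1$, $x_t=x\in(0,1)^2$, $b_t=b$, and $X$ uniformly distributed on $[0,1]^2$, so that $f=1$ on the unit square and $K(s,\cdot)=\delta_0$. For small $t$ the events $I$, $II$ and $III$ of the proof then have probabilities $tb_1x_2$, $t^2b_1b_2$ and $tb_2x_1$, and a direct computation gives

$$\frac1t\Bigl(P(X\le x+tb)-P(X\le x)\Bigr)=\frac{(x_1+tb_1)(x_2+tb_2)-x_1x_2}{t}=b_1x_2+b_2x_1+t\,b_1b_2\;\longrightarrow\;b_1x_2+b_2x_1,$$

which agrees with $b_1\int_{-\infty}^{x_2}f(x_1,s)\,ds+b_2\int_{-\infty}^{x_1}f(s,x_2)\,ds=b_1x_2+b_2x_1$. The vanishing remainder $t\,b_1b_2$ is exactly the contribution of the event $II$, which illustrates why that event drops out of the limit.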

4. Pseudo observations

In this section we consider the asymptotic behaviour of the process $\{\sqrt n(\mathbb{P}_n\theta\circ\eta_n-P\theta\circ\eta_0):\theta\in\Theta\}$ for a given class $\Theta$ of functions $\theta\colon\mathbb{R}^d\to\mathbb{R}$. The set-up is the same as in Section 3. As explained in the introduction, we can decompose this process as
$$\mathbb{G}_n(\theta\circ\eta_n-\theta\circ\eta_0)+\mathbb{G}_n\theta\circ\eta_0+\sqrt n\,P(\theta\circ\eta_n-\theta\circ\eta_0).$$
Under the conditions of Theorem 3.1 or Theorem 3.2, Theorem 2.3, or their extensions, the first term converges to zero in probability in $\ell^\infty(\Theta)$. The second term converges in distribution to a Gaussian process in this space if and only if the class of functions $\Theta$ is Donsker for the law $P\circ\eta_0^{-1}$. If the third term also converges in distribution, then the sum of the three processes is asymptotically tight, and it will usually be straightforward to deduce its limit distribution from consideration of the marginal distributions.

The behaviour of the third term follows from the (functional) delta-method if the sequence $\sqrt n(\eta_n-\eta_0)$ converges in distribution in the Banach space $H$ and the map $\eta\mapsto(P\theta\circ\eta:\theta\in\Theta)$ from $H$ to $\ell^\infty(\Theta)$ is suitably differentiable. If the limit distribution of the sequence $\sqrt n(\eta_n-\eta_0)$ concentrates on the subspace $H_0\subset H$, then it suffices that the map $\eta\mapsto(P\theta\circ\eta:\theta\in\Theta)$ be "Hadamard differentiable tangentially to $H_0$", i.e. that for every converging sequence $h_t\to h\in H_0\subset H$,
$$\frac1t\,P\bigl[\theta\circ(\eta_0+th_t)-\theta\circ\eta_0\bigr]\to L(h)(\theta),$$
uniformly in $\theta\in\Theta$, for a continuous linear map $L\colon\operatorname{lin}H_0\to\ell^\infty(\Theta)$. Under the additional condition that $L$ is defined on all of $H$, this implies
$$\sqrt n\,P(\theta\circ\eta_n-\theta\circ\eta_0)=L\bigl(\sqrt n(\eta_n-\eta_0)\bigr)(\theta)+o_P(1).$$
Cf. van der Vaart and Wellner [12], page 374.

As in the preceding section, we consider separately the cases that the functions $\theta$ are smooth or of bounded variation.
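To make the decomposition concrete, the following Python sketch computes its three terms exactly in a toy setting chosen for illustration and not taken from the paper: $X$ uniform on $[0,1]$, $\eta_0(x)=x$, $\eta_n$ the empirical distribution function, and $\theta(u)=u^2$. Because $\eta_n$ is a step function, every integral against $P$ has a closed form. The sketch also computes the linearization $\sqrt n\,P\dot\theta(\eta_0)(\eta_n-\eta_0)$ of the third term (here $\dot\theta(u)=2u$) and its exact remainder $\sqrt n\int_0^1(\eta_n(x)-x)^2\,dx$, which is of order $n^{-1/2}$. The function name `decomposition_terms` is hypothetical.

```python
import math
import random


def decomposition_terms(xs):
    """Terms of the decomposition for theta(u) = u**2 and eta_0(x) = x on [0, 1].

    Toy assumption: xs is an i.i.d. Uniform(0, 1) sample without ties, so the
    empirical d.f. eta_n equals k/n on [X_(k), X_(k+1)).
    """
    n = len(xs)
    rt = math.sqrt(n)
    grid = [0.0] + sorted(xs) + [1.0]  # X_(0) = 0, X_(1), ..., X_(n), X_(n+1) = 1

    pn_etan = sum((k / n) ** 2 for k in range(1, n + 1)) / n  # P_n theta(eta_n)
    pn_eta0 = sum(x * x for x in xs) / n                      # P_n theta(eta_0)
    p_eta0 = 1.0 / 3.0                                        # P theta(eta_0)
    # P theta(eta_n) = int_0^1 eta_n(x)^2 dx, integrated piece by piece.
    p_etan = sum((k / n) ** 2 * (grid[k + 1] - grid[k]) for k in range(n + 1))

    first = rt * (pn_etan - pn_eta0 - p_etan + p_eta0)  # G_n(theta o eta_n - theta o eta_0)
    second = rt * (pn_eta0 - p_eta0)                    # G_n theta o eta_0
    third = rt * (p_etan - p_eta0)                      # sqrt(n) P(theta o eta_n - theta o eta_0)
    total = rt * (pn_etan - p_eta0)                     # left-hand side

    # Linearization of the third term: sqrt(n) * int_0^1 2x (eta_n(x) - x) dx.
    linear = rt * (sum((k / n) * (grid[k + 1] ** 2 - grid[k] ** 2)
                       for k in range(n + 1)) - 2.0 / 3.0)
    # Exact remainder: sqrt(n) * int_0^1 (eta_n(x) - x)^2 dx.
    remainder = rt * sum(((grid[k + 1] - k / n) ** 3 - (grid[k] - k / n) ** 3) / 3.0
                         for k in range(n + 1))
    return {"total": total, "first": first, "second": second, "third": third,
            "linear": linear, "remainder": remainder}


if __name__ == "__main__":
    rng = random.Random(2007)
    terms = decomposition_terms([rng.random() for _ in range(2000)])
    for name, value in terms.items():
        print(f"{name:>9s}: {value: .5f}")
```

The three terms always add up to the left-hand side (this is an algebraic identity), while the first term and the linearization remainder shrink as $n$ grows, in line with the discussion above.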
In the former case the differentiability is relative to a weak norm on $H$ (and is easy to prove), but for discontinuous functions $\theta$, such as the indicator functions $1_{(-\infty,\theta]}$, the differentiability requires a strong norm on $H$ and some conditions on the underlying distribution.

If the functions $\theta$ are differentiable with bounded derivatives, then the Hadamard differentiability holds for $H$ equipped with the $L_1(P)$-norm.

Lemma 4.1.

Let the functions $\theta\colon\mathbb{R}^d\to\mathbb{R}$ in $\Theta$ be continuously differentiable with derivative $\dot\theta$ such that $\|\dot\theta(x)\|\le1$ for every $x\in\mathbb{R}^d$. Then the map $\eta\mapsto(P\theta\circ\eta:\theta\in\Theta)$ from $L_1(\mathcal{X},\mathcal{A},P)$ to $\ell^\infty(\Theta)$ is Hadamard differentiable at $\eta_0$ with derivative $h\mapsto\bigl(P\,\dot\theta(\eta_0)h:\theta\in\Theta\bigr)$.

Proof. Given a sequence $h_t$ with $P|h_t-h|\to0$,
$$\Bigl|\frac1t P\bigl(\theta(\eta_0+th_t)-\theta(\eta_0)\bigr)-P\dot\theta(\eta_0)h\Bigr|=\Bigl|\int_0^1P\bigl(\dot\theta(\eta_0+sth_t)h_t-\dot\theta(\eta_0)h\bigr)\,ds\Bigr|\le\int_0^1P\bigl\|\dot\theta(\eta_0+sth_t)-\dot\theta(\eta_0)\bigr\|\,\|h_t\|\,ds+P\|\dot\theta(\eta_0)\|\,\|h_t-h\|.$$
The second term on the right is bounded above by $P|h_t-h|$ and converges to zero by assumption. The first term on the right converges to zero by the dominated convergence theorem.

$\theta$ of bounded variation

In the second result we let $\Theta$ be a set of functions of bounded variation on a bounded interval in $\mathbb{R}$, and consider the Hadamard differentiability of the map $\eta\mapsto(P\theta\circ\eta:\theta\in\Theta)$ as a map from $\ell^\infty(\mathcal{X})$ to $\ell^\infty(\Theta)$. For simplicity of notation, let $X$ be a random variable with law $P$.

Lemma 4.2.

Let the functions $\theta\in\Theta$ be distribution functions of subprobability measures supported on a compact interval $I\subset\mathbb{R}$. Suppose that the variable $\eta_0(X)$ possesses a Lebesgue density $f$ that is continuous on a neighbourhood of $I$. Then the map $\eta\mapsto(P\theta\circ\eta:\theta\in\Theta)$ from $\ell^\infty(\mathcal{X})$ to $\ell^\infty(\Theta)$ is Hadamard differentiable at $\eta_0$ tangentially to the set of all $h$ such that there exists a version of the conditional distribution of $h(X)$ given $\eta_0(X)=s$ that is weakly continuous in $s\in I$. The derivative is given by $h\mapsto\bigl(\int\mathrm{E}\bigl(h(X)\mid\eta_0(X)=s\bigr)f(s)\,d\theta(s):\theta\in\Theta\bigr)$.

Proof. Let $h$ be as given and suppose $h_t\to h$ in $\ell^\infty(\mathcal{X})$.

For given $s\in\mathbb{R}$ and $u>0$ let $\chi_{s,u}$ be the continuous function that takes the value 0 on $(-\infty,s-u]$, takes the value 1 on $[s,\infty)$ and is linear on the interval $[s-u,s]$. Then $1_{[s,\infty)}\le\chi_{s,u}$, and hence
$$P(1_{s\le\eta_0+th_t}-1_{s\le\eta_0})\le P\bigl(\chi_{s,u}(\eta_0+th_t)-\chi_{s,u}(\eta_0)\bigr)+P\bigl(\chi_{s,u}(\eta_0)-1_{[s,\infty)}(\eta_0)\bigr).$$
Because $\eta_0(X)$ possesses a Lebesgue density that is bounded on a neighbourhood of $I$ and $\chi_{s,u}-1_{[s,\infty)}$ vanishes off the set $(s-u,s)$, the second term on the right is bounded in absolute value by a multiple of $u$, uniformly in $s$ ranging through $I$, for small $u$. By choosing $u=\delta t$, this term divided by $t$ can be made arbitrarily small by choice of $\delta$.

Because $\chi_{s,u}$ is absolutely continuous with derivative $1/u$ on $(s-u,s)$ and 0 elsewhere, the first term on the right divided by $t$ can be written in the form
$$\int_0^1\frac1u\,P\bigl(h_t1_{s-u<\eta_0+vth_t\le s}\bigr)\,dv.$$
For $u=\delta t$ this converges to $\mathrm{E}\bigl(h(X)\mid\eta_0(X)=s\bigr)f(s)$ by Lemma 3.2, uniformly in $s$ ranging over $I$. It follows that, uniformly in $s$ ranging over $I$,
$$\limsup_{t\downarrow0}\Bigl(\frac1t P(1_{s\le\eta_0+th_t}-1_{s\le\eta_0})-\mathrm{E}\bigl(h(X)\mid\eta_0(X)=s\bigr)f(s)\Bigr)\le0.$$
A similar argument using the functions $\chi_{s+u,u}$ instead of $\chi_{s,u}$ gives a corresponding lower bound, whence the expression in brackets converges to zero, uniformly in $s$ ranging over $I$. This concludes the proof of the lemma for $\Theta$ equal to the set of functions $1_{[s,\infty)}$ with $s$ in a compact interval.

For a general collection $\Theta$ of functions of bounded variation we can write
$$P\bigl(\theta(\eta_0+th_t)-\theta(\eta_0)\bigr)=\int P(1_{s\le\eta_0+th_t}-1_{s\le\eta_0})\,d\theta(s).$$
Because the functions $\theta\in\Theta$ are supported on the compact interval $I$ with total variation bounded by 1, the uniform convergence in $s\in I$ established above carries over to uniform convergence in $\theta\in\Theta$.

The applicability of the second lemma depends on whether the set $H_0$ of functions $h$ such that the conditional distribution of $h(X)$ given $\eta_0(X)=s$ is weakly continuous in $s$ is large enough to support the limit distribution of the sequence $\sqrt n(\eta_n-\eta_0)$. As noted in the preceding section, under some smoothness conditions on $\eta_0$ and on the distribution of $X$, the set $H_0$ typically contains all continuous functions. It then suffices that the sequence $\sqrt n(\eta_n-\eta_0)$ possesses a continuous weak limiting process.
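The derivative in Lemma 4.2 can be checked numerically in a toy case chosen for illustration: $X$ uniform on $[0,1]$, $\eta_0(x)=x$ (so $f=1$), $\theta=1_{[s,\infty)}$, and the deterministic direction $h(x)=1+x$, for which $\mathrm{E}(h(X)\mid\eta_0(X)=s)=h(s)$. Assuming $h>0$, $t\,h(0)<s$ and $x\mapsto x+th(x)$ strictly increasing, the difference of indicators equals $1\{x^*\le X<s\}$, where $x^*$ solves $x+th(x)=s$. The sketch below finds $x^*$ by bisection and returns $(s-x^*)/t$, which should approach $h(s)f(s)$ as $t\downarrow0$; for this $h$ the exact value is $(1+s)/(1+t)$. The name `difference_quotient` is hypothetical.

```python
def difference_quotient(s, t, h, iterations=200):
    """(1/t) * P(1{s <= X + t h(X)} - 1{s <= X}) for X ~ Uniform(0, 1).

    Toy assumptions (not from the paper): eta_0(x) = x, h > 0, t*h(0) < s <= 1,
    and x -> x + t*h(x) strictly increasing on [0, s].
    """
    lo, hi = 0.0, s  # lo + t*h(lo) < s and hi + t*h(hi) >= s bracket the root
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mid + t * h(mid) < s:
            lo = mid
        else:
            hi = mid
    x_star = hi  # the event {s <= X + t h(X)} \ {s <= X} is [x_star, s)
    return (s - x_star) / t


def h_example(x):
    """Hypothetical direction h(x) = 1 + x."""
    return 1.0 + x


if __name__ == "__main__":
    s = 0.5
    for t in (1e-1, 1e-2, 1e-3, 1e-4):
        print(t, difference_quotient(s, t, h_example), (1 + s) / (1 + t))
```

The printed quotients approach $h(s)f(s)=1.5$ as $t$ decreases, matching the derivative formula of the lemma for this $\theta$.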

5. Examples: completion

In this section we return to two of the three examples discussed at the end of Section 3, Example 2 and Example 3, and give the theorems (and proofs) resulting from our approach. The general theme is that the traditional results given in Corollaries 5.1 and 5.3 for indicator functions involve non-trivial restrictions on the underlying distribution $\eta_0$ of the data, while the results for indexing by Lipschitz functions given in Corollaries 5.2 and 5.4 involve almost no restrictions on $\eta_0$ (but require significantly smoother indexing functions $\theta$).

For the Kendall process, Example 2, it suffices to consider the case in which $\eta_0$ is concentrated on $[0,1]^d$ and has uniform marginal distributions (i.e. is a copula function), as noted by Ghoudi and Remillard [6]. We first give a corollary for indexing by indicator functions, and then a corollary for indexing by Lipschitz functions.

Corollary 5.1.

Suppose that for a given interval $[a,b]\subset(0,1)$:
(i) the variable $\eta_0(X)$ possesses a density $k$ with respect to Lebesgue measure that is continuous on a neighbourhood of $[a,b]$;
(ii) the conditional distribution of $X$ given $\eta_0(X)=s$ has a regular version representable as a Markov kernel $K(s,\cdot)$ such that $s\mapsto K(s,\cdot)$ is continuous on $[a,b]$ for the weak topology.
Then the sequence of processes $\sqrt n(K_n-K)$ as in (11) tends in distribution in $\ell^\infty[a,b]$ to the process $(\mathbb{G}_{\eta_0}f_\theta:\theta\in[a,b])$, for $\mathbb{G}_{\eta_0}$ an $\eta_0$-Brownian bridge process and $f_\theta\colon[0,1]^d\to\mathbb{R}$ defined as
$$f_\theta(x)=1_{\eta_0(x)\le\theta}-k(\theta)\,\mathrm{E}\bigl[1_{x\le X}\mid\eta_0(X)=\theta\bigr].$$

Corollary 5.2 (Kendall processes, Example 2, indexed by Lipschitz functions). Suppose that $\Theta$ is a suitably measurable collection of continuously differentiable functions $\theta\colon[0,1]\to[-1,1]$ with derivatives $\dot\theta$ satisfying $|\dot\theta(x)|\le1$ for every $x\in[0,1]$. Then the sequence of processes $n^{-1/2}\sum_{i=1}^n\bigl(\theta(\eta_n(X_i))-P\theta(\eta_0)\bigr)$ tends in distribution in $\ell^\infty(\Theta)$ to the process $(\mathbb{G}_{\eta_0}f_\theta:\theta\in\Theta)$, for $\mathbb{G}_{\eta_0}$ an $\eta_0$-Brownian bridge process in $\ell^\infty(\Theta)$ and $f_\theta\colon[0,1]^d\to\mathbb{R}$ defined as
$$f_\theta(x)=\theta\bigl(\eta_0(x)\bigr)+\mathrm{E}\bigl[\dot\theta\bigl(\eta_0(X)\bigr)1_{x\le X}\bigr].$$

Proof of Corollary 5.1.

We apply the decomposition (2) with $f_{\theta,\eta}(x)=1\{\eta(x)\le\theta\}$, for distribution functions $\eta$ on $[0,1]^d$, $\theta\in[0,1]$ and $x\in[0,1]^d$. As discussed following the proof of Lemma 3.2, hypotheses (i) and (ii) imply that condition (19) for Theorem 3.2 (with $d=1$) holds by way of Lemma 3.2, and hence the first term on the right side of (2) converges in probability to 0 uniformly in $\theta\in[a,b]$.

The second term is simply the usual empirical process for the i.i.d. one-dimensional random variables $\eta_0(X_1),\ldots,\eta_0(X_n)$, and hence it converges weakly as claimed by standard theory.

To handle the third term, note that (i) and (ii) imply that the hypotheses of Lemma 4.2 hold, and hence that the map $\eta\mapsto\{Pf_{\theta,\eta}:\theta\in\Theta\}$ from $\ell^\infty(\mathcal{X})$ to $\ell^\infty([a,b])$ is Hadamard differentiable tangentially to $C([0,1]^d)$ with derivative $L\colon C([0,1]^d)\to\ell^\infty([a,b])$ given by
$$L(h)(\theta)=-\mathrm{E}\bigl(h(X)\mid\eta_0(X)=\theta\bigr)k(\theta).$$
Weak convergence of the third term then follows from van der Vaart and Wellner [12], Theorem 3.9.5, page 375.

The joint limit law of the second and third terms can be determined from the marginals, and the limit of the sum of the two terms can be represented in the form given. An insightful way to derive this is from asymptotic linearity of the two terms, as follows. The second term is already linear, with influence functions $x\mapsto1_{\eta_0(x)\le\theta}$. The third term can be approximated by $L\bigl(\sqrt n(\eta_n-\eta_0)\bigr)$, where $\eta_n(x)-\eta_0(x)=n^{-1}\sum_{i=1}^n\bigl(1_{X_i\le x}-\eta_0(x)\bigr)$, so that $L\bigl(\sqrt n(\eta_n-\eta_0)\bigr)=n^{-1/2}\sum_{i=1}^nL(1_{[X_i,\mathbf 1]}-\eta_0)$. The terms in the latter sum should be understood as $L$ acting on the functions $x\mapsto1_{[X_i,\mathbf 1]}(x)-\eta_0(x)$ for fixed $X_i$. We thus obtain
$$L\bigl(\sqrt n(\eta_n-\eta_0)\bigr)=n^{-1/2}\sum_{i=1}^nL(1_{[X_i,\mathbf 1]})-\sqrt n\,L(\eta_0)=\mathbb{G}_nL(1_{[X,\mathbf 1]}).$$
The representation of the limit process as given in the corollary follows.

For many distribution functions $\eta_0$ the corresponding density $k$ of $K$ is unbounded at 0 and hence not continuous on $[0,1]$; this is the case, for instance, if $\eta_0$ is the uniform distribution on $[0,1]^d$. For such distributions the preceding corollary does not yield convergence of Kendall's process in the space $\ell^\infty([0,1])$, because $k$ is unbounded. Barbe et al. [2] show that under the growth condition
$$k(t)=o\bigl(t^{-1/2}(\log(1/t))^{-1/2-\epsilon}\bigr),\qquad t\downarrow0,\ \text{for some }\epsilon>0,$$
convergence in the full domain still holds. They achieve this by using results of Alexander [1] to show that the empirical process $\sqrt n(\eta_n-\eta_0)1\{\eta_0\ge a_n\}$ converges in the weighted metric $\|\cdot/q(\eta_0)\|_\infty$ if $q(t)=t^{1/2}(\log(1/t))^p$ for some $1/2<p<r/2$ and $a_n=n^{-1}(\log n)^r$. This strengthening of the convergence of $\sqrt n(\eta_n-\eta_0)$ then compensates for the growth of $k$ at 0.
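The remark on unbounded densities can be illustrated with the independence copula in dimension $d=2$, for which $K(t)=t-t\log t$ and hence $k(t)=-\log t\to\infty$ as $t\downarrow0$. The sketch below assumes that $K_n(\theta)=n^{-1}\sum_{i=1}^n1\{\eta_n(X_i)\le\theta\}$, the form suggested by $f_{\theta,\eta}(x)=1\{\eta(x)\le\theta\}$ in the proof of Corollary 5.1, and computes the pseudo-observations $\eta_n(X_i)$ from the bivariate empirical distribution function by brute force ($O(n^2)$ comparisons). All function names are hypothetical.

```python
import math
import random


def pseudo_observations(sample):
    """eta_n(X_i) = (1/n) #{j : X_j <= X_i componentwise}, eta_n the empirical d.f."""
    n = len(sample)
    return [sum(all(a <= b for a, b in zip(xj, xi)) for xj in sample) / n
            for xi in sample]


def empirical_kendall(pseudo, theta):
    """Assumed form K_n(theta) = (1/n) #{i : eta_n(X_i) <= theta}."""
    return sum(v <= theta for v in pseudo) / len(pseudo)


def kendall_independence(t):
    """Kendall distribution K(t) = t - t log t of the bivariate independence copula."""
    return t - t * math.log(t) if t > 0 else 0.0


if __name__ == "__main__":
    rng = random.Random(55)
    sample = [(rng.random(), rng.random()) for _ in range(400)]
    pseudo = pseudo_observations(sample)
    grid = [i / 100 for i in range(1, 100)]
    sup_dev = max(abs(empirical_kendall(pseudo, t) - kendall_independence(t)) for t in grid)
    print("sup |K_n - K| on grid:", sup_dev)
    # The density k(t) = -log t grows without bound near 0.
    print("k(1e-3), k(1e-6):", -math.log(1e-3), -math.log(1e-6))
```

The simulation only shows that $K_n$ tracks $K$ on a grid for a moderate sample; it says nothing by itself about the weighted-metric arguments needed near 0.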
Proof of Corollary 5.2.

This follows by combining Theorem 3.1 and Lemma 4.1 with the fact that $\mathcal{F}=\{\theta\circ\eta_0:\theta\in\Theta\}$ is Donsker.

For the copula processes (12) in Example 3 it again suffices to consider the case in which $\eta_0=C$, so that all the marginal distributions $\eta_{0,j}$, $j=1,\ldots,d$, are Uniform$(0,1)$.

Corollary 5.3.

Suppose that:
(i) $\eta_0=C$ is continuous;
(ii) the copula function $\eta_0=C$ is continuously differentiable on $[0,1]^d$ with gradient $\nabla C(u)$.
Then the sequence of copula processes $\sqrt n(C_n-C)$ given in (12) converges in distribution in $\ell^\infty([0,1]^d)$ to the process $(\mathbb{G}_{\eta_0}f_u:u\in[0,1]^d)$, for $\mathbb{G}_{\eta_0}$ an $\eta_0$-Brownian bridge process and $f_u\colon[0,1]^d\to\mathbb{R}$ defined as
$$f_u(x)=1_{x\le u}-\nabla C(u)'\bigl(1_{x_1\le u_1},\ldots,1_{x_d\le u_d}\bigr).$$

Corollary 5.4 (Copula processes, Example 3, indexed by Lipschitz functions). Suppose that $\Theta$ is a suitably measurable collection of continuously differentiable functions $\theta\colon[0,1]^d\to\mathbb{R}$ with derivatives satisfying $\|\dot\theta(x)\|\le1$ for every $x\in[0,1]^d$ and with $J(1,\Theta,L_2)<\infty$. Then the sequence of processes $n^{-1/2}\sum_{i=1}^n\bigl(\theta(\eta_n(X_i))-P\theta(\eta_0)\bigr)$ tends in distribution in $\ell^\infty(\Theta)$ to the process $(\mathbb{G}_{\eta_0}f_\theta:\theta\in\Theta)$, for $\mathbb{G}_{\eta_0}$ an $\eta_0$-Brownian bridge process in $\ell^\infty(\Theta)$ and $f_\theta\colon[0,1]^d\to\mathbb{R}$ defined as
$$f_\theta(x)=\theta(x)+\mathrm{E}\bigl[\dot\theta(X)'\bigl(1_{x_1\le X_1},\ldots,1_{x_d\le X_d}\bigr)\bigr].$$

Proof of Corollary 5.3.

We apply the decomposition (2) with $f_{\theta,\eta}(x)=1\{\eta_1(x_1)\le\theta_1,\ldots,\eta_d(x_d)\le\theta_d\}$ for $\theta=(\theta_1,\ldots,\theta_d)\in[0,1]^d$, $x=(x_1,\ldots,x_d)\in[0,1]^d$, and $\eta_j$ the $j$th one-dimensional marginal distribution function on $[0,1]$ of the distribution function $\eta$ (so $\eta_j(u_j)=\eta(1,\ldots,1,u_j,1,\ldots,1)$). For the first term in (2), uniformly in $\theta\in[0,1]^d$, we can apply Theorem 2.1. The class of functions $f_{\theta,\eta}(x)=1\{x_1\le\eta_1^{-1}(\theta_1),\ldots,x_d\le\eta_d^{-1}(\theta_d)\}$ is a class of indicators of a Vapnik–Chervonenkis class of sets. Thus Theorem 2.1 applies if we show that (3) holds. But this is easily verified by the assumed continuity of $\eta_0=C$ and the uniform consistency of the empirical quantile functions $\eta_{n,j}^{-1}$ for $j=1,\ldots,d$. Thus (1) holds.

The second term in (2) is simply the classical empirical process of the random vectors $X_1,\ldots,X_n$ in $[0,1]^d$, and converges weakly by classical theory.

Finally, the third term in (2) converges weakly to $-\nabla C(u)'\,\mathbb{G}_{\eta_0}\bigl(v(X,u)\bigr)$, for $v(x,u)=(1_{x_1\le u_1},\ldots,1_{x_d\le u_d})$, by the delta-method for the map $\eta\mapsto Pf_{u,\eta}$. This map can be decomposed as
$$\eta\mapsto\bigl(\eta_1^{-1}(u_1),\ldots,\eta_d^{-1}(u_d)\bigr)\mapsto C\bigl(\eta_1^{-1}(u_1),\ldots,\eta_d^{-1}(u_d)\bigr),\qquad u\in[0,1]^d,$$
and can be shown to be Hadamard differentiable from the domain of distribution functions in $\ell^\infty(\mathcal{X})=\ell^\infty([0,1]^d)$ to $\ell^\infty(\Theta)=\ell^\infty([0,1]^d)$ by the chain rule, using the continuity of $\nabla C$ and the fact that the quantile transformation is Hadamard differentiable.

It is possible to extend Corollary 5.3 to the case in which $\nabla C$ is continuous on $(0,1)^d$ but satisfies certain growth restrictions at 0 and/or 1. Weighted metrics are then involved in the proof.

Proof of Corollary 5.4.

This follows by combining Theorem 3.1 and Lemma 4.1 with the fact that $\mathcal{F}=\{\theta:\theta\in\Theta\}$ is Donsker and the delta-method; see, e.g., van der Vaart and Wellner [12], Theorem 3.9.5, page 375.
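A simple check of the Lipschitz-indexed results is available for the illustrative choice $\theta(u)=(u_1+\cdots+u_d)/d$, which satisfies $\|\dot\theta\|\le1$ and $P\theta(\eta_0)=1/2$ when the marginals are uniform. Without ties, the values $\eta_{n,j}(X_{ij})$ are the marginal ranks divided by $n$, so $\sum_i\theta(\eta_n(X_i))=(n+1)/2$ and the process equals $1/(2\sqrt n)$ for every sample, hence tends to 0. This is consistent with the linearization of Lemma 4.1, under which the influence function reduces to the constant $\frac1d\sum_j\bigl(x_j+(1-x_j)\bigr)=1$. The sketch below, with hypothetical function names, computes the process from marginal ranks.

```python
import math
import random


def marginal_pseudo_observations(sample):
    """Replace each coordinate X_ij by eta_{n,j}(X_ij), its marginal rank divided by n.

    Assumes no ties within a coordinate (true almost surely for continuous marginals).
    """
    n, d = len(sample), len(sample[0])
    pseudo = [[0.0] * d for _ in range(n)]
    for j in range(d):
        order = sorted(range(n), key=lambda i: sample[i][j])
        for rank, i in enumerate(order, start=1):
            pseudo[i][j] = rank / n
    return pseudo


def lipschitz_process(sample, theta, p_theta):
    """n^{-1/2} * sum_i (theta(eta_n(X_i)) - P theta(eta_0)) at a single theta."""
    n = len(sample)
    return sum(theta(u) - p_theta for u in marginal_pseudo_observations(sample)) / math.sqrt(n)


def coordinate_mean(u):
    """Illustrative Lipschitz index theta(u) = (u_1 + ... + u_d) / d."""
    return sum(u) / len(u)


if __name__ == "__main__":
    rng = random.Random(7)
    n = 1000
    sample = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        # Dependent coordinates; only their ranks enter the pseudo-observations.
        sample.append((z + rng.gauss(0.0, 1.0), z))
    print(lipschitz_process(sample, coordinate_mean, 0.5), 1 / (2 * math.sqrt(n)))
```

Because the pseudo-observations depend only on marginal ranks, the same value appears whatever the dependence structure, which is why the example needs no assumption beyond continuous marginals.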

References

[1] Alexander, K. (1987). The central limit theorem for weighted empirical processes indexed by sets. J. Mult. Anal.
[2] Barbe, P., Genest, C., Ghoudi, K. and Remillard, B. (1996). On Kendall's process. J. Mult. Anal.
[3] Dudley, R. M. (1999). Uniform Central Limit Theorems. Cambridge Univ. Press, Cambridge. MR1720712
[4] Dudley, R. M. (1984). A Course on Empirical Processes. École d'Été de St. Flour. Springer, New York. MR0876079
[5] Gänssler, P. and Stute, W. (1987). Seminar on Empirical Processes. DMV Seminar. Birkhäuser, Basel. MR0902803
[6] Ghoudi, K. and Remillard, B. (1998). Empirical processes based on pseudo-observations. In Asymptotic Methods in Probability and Statistics (Ottawa, ON, 1997) 171–197. North-Holland, Amsterdam. MR1661480
[7] Ghoudi, K. and Remillard, B. (2004). Empirical processes based on pseudo-observations. II. The multivariate case. In Asymptotic Methods in Stochastics. Fields Inst. Commun. 44 381–406. Amer. Math. Soc., Providence, RI. MR2106867
[8] Pollard, D. (1984). Convergence of Stochastic Processes. Springer, New York. MR0762984
[9] Stute, W. (1984). The oscillation behavior of empirical processes: The multivariate case. Ann. Statist.
[10] van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge Univ. Press, Cambridge. MR1652247
[11] van der Vaart, A. W. (2002). Semiparametric statistics. École d'Été de St. Flour 1999
[12] van der Vaart, A. W. and Wellner, J. A. (1996).