Empirical processes indexed by estimated functions
arXiv [math.ST]
IMS Lecture Notes–Monograph Series
Asymptotics: Particles, Processes and Inverse Problems
Vol. 55 (2007) 234–252
© Institute of Mathematical Statistics, 2007
Aad W. van der Vaart¹ and Jon A. Wellner²,*
Vrije Universiteit Amsterdam and University of Washington
Abstract: We consider the convergence of empirical processes indexed by functions that depend on an estimated parameter $\eta$ and give several alternative conditions under which the "estimated parameter" $\eta_n$ can be replaced by its natural limit $\eta_0$ uniformly in some other indexing set $\Theta$. In particular we reconsider some examples treated by Ghoudi and Remillard [Asymptotic Methods in Probability and Statistics (1998) 171–197,
Fields Inst. Commun.
* Supported in part by NSF Grant DMS-05-03822, NIAID Grant 2R01 AI291968-04, and by grant B62-596 of the Netherlands Organisation of Scientific Research NWO.
¹ Section Stochastics, Department of Mathematics, Faculty of Sciences, Vrije Universiteit, De Boelelaan 1081a, 1081 HV Amsterdam, e-mail: [email protected]
² University of Washington, Department of Statistics, Box 354322, Seattle, Washington 98195-4322, USA, e-mail: [email protected]
AMS 2000 subject classifications:
Keywords and phrases: delta-method, Donsker class, entropy integral, pseudo observation.

1. Introduction

Let $X_1, \ldots, X_n$ be i.i.d. random elements in a measurable space $(\mathcal{X}, \mathcal{A})$ with law $P$, and for a measurable function $f: \mathcal{X} \to \mathbb{R}$ let the expectation, empirical measure and empirical process at $f$ be denoted by

$$Pf = \int f\,dP, \qquad \mathbb{P}_n f = \frac{1}{n}\sum_{i=1}^n f(X_i), \qquad \mathbb{G}_n f = \sqrt{n}\,(\mathbb{P}_n - P)f.$$

Given a collection $\{f_{\theta,\eta}: \theta \in \Theta, \eta \in H\}$ of measurable functions $f_{\theta,\eta}: \mathcal{X} \to \mathbb{R}$ indexed by sets $\Theta$ and $H$ and "estimators" $\eta_n$, we wish to prove that, as $n \to \infty$,

$$\sup_{\theta\in\Theta} \bigl|\mathbb{G}_n(f_{\theta,\eta_n} - f_{\theta,\eta_0})\bigr| \to_p 0. \tag{1}$$

Here an "estimator" $\eta_n$ is a random element with values in $H$ defined on the same probability space as $X_1, \ldots, X_n$, and $\eta_0 \in H$ is a fixed element, which is typically a limit in probability of the sequence $\eta_n$.

The result (1) is interesting for several applications. A direct application is to the estimation of the functional $\theta \mapsto Pf_{\theta,\eta_0}$. If the parameter $\eta_0$ is unknown, we may replace it by an estimator $\eta_n$ and use the empirical estimator $\mathbb{P}_n f_{\theta,\eta_n}$. The result (1) helps to derive the limit behaviour of this estimator, as we can decompose

$$\sqrt{n}\,(\mathbb{P}_n f_{\theta,\eta_n} - Pf_{\theta,\eta_0}) = \mathbb{G}_n(f_{\theta,\eta_n} - f_{\theta,\eta_0}) + \mathbb{G}_n f_{\theta,\eta_0} + \sqrt{n}\,P(f_{\theta,\eta_n} - f_{\theta,\eta_0}). \tag{2}$$

If (1) holds, then the first term on the right converges to zero in probability. Under appropriate conditions on the functions $f_{\theta,\eta}$, the second term on the right will converge to a Gaussian process by the (functional) central limit theorem. The behavior of the third term depends on the estimators $\eta_n$, and would typically follow from an application of the (functional) delta-method, applied to the map $\eta \mapsto (Pf_{\theta,\eta}: \theta \in \Theta)$.

In an interesting particular case of this situation, the functions $f_{\theta,\eta}$ take the form

$$f_{\theta,\eta}(x) = \theta\bigl(\eta(x)\bigr),$$

for maps $\theta: \mathbb{R}^d \to \mathbb{R}$ and each $\eta \in H$ being a map $\eta: \mathcal{X} \to \mathbb{R}^d$. The realizations of the estimators $\eta_n$ are then functions $x \mapsto \eta_n(x) = \eta_n(x; X_1, \ldots, X_n)$ on the sample space $\mathcal{X}$ and can be evaluated at the observations to obtain the random vectors $\eta_n(X_1), \ldots, \eta_n(X_n)$ in $\mathbb{R}^d$. The process $\{\mathbb{P}_n f_{\theta,\eta_n}: \theta \in \Theta\}$ is the empirical measure of these vectors indexed by the functions $\theta$. For instance, if $\Theta$ consists of the indicator functions $1_{(-\infty,\theta]}$ for $\theta \in \mathbb{R}^d$, then this measure is the empirical distribution function

$$\theta \mapsto \mathbb{P}_n f_{\theta,\eta_n} = \frac{1}{n}\sum_{i=1}^n 1\{\eta_n(X_i) \le \theta\}$$

of the random vectors $\eta_n(X_1), \ldots, \eta_n(X_n)$. The properties of such empirical processes were studied in some generality and for examples of particular interest in Ghoudi and Remillard [6, 7]. Ghoudi and Remillard [6] apparently coined the name "pseudo-observations" for the vectors $\eta_n(X_1), \ldots, \eta_n(X_n)$. The examples include, for instance, regression residuals, Kendall's dependence process, and copula processes; see the end of Section 2 for explicit formulation of these three particular examples. One purpose of the present paper is to extend the results in these papers also to other index classes $\Theta$ besides the class of indicator functions.
Another purpose is to recast their results in terms of empirical process theory, which leads to simplification and alternative conditions.

A different, indirect application of (1) is to the derivation of the asymptotic distribution of $Z$-estimators. A $Z$-estimator for $\theta$ might be defined as the solution $\hat\theta_n$ of the equation $\mathbb{P}_n f_{\theta,\eta_n} = 0$, where again an unknown "nuisance" parameter $\eta_0$ is replaced by an estimator $\eta_n$. In this case (1) shows that

$$\mathbb{P}_n f_{\hat\theta_n,\eta_n} - \mathbb{P}_n f_{\hat\theta_n,\eta_0} = P(f_{\hat\theta_n,\eta_n} - f_{\hat\theta_n,\eta_0}) + o_P(1/\sqrt{n}),$$

so that the limit behavior of $\hat\theta_n$ can be derived by comparison with the estimating equation defined by $\mathbb{P}_n f_{\theta,\eta_0}$ (with $\eta_0$ substituted for $\eta_n$). The "drift" sequence $P(f_{\hat\theta_n,\eta_n} - f_{\hat\theta_n,\eta_0})$, which will typically be equivalent to $P(f_{\theta_0,\eta_n} - f_{\theta_0,\eta_0})$ up to order $o_P(1/\sqrt{n})$, may give rise to an additional component in the limit distribution.

The paper is organized as follows. In Section 2 we derive general conditions for the validity of (1) and formulate several particular examples to be considered in more detail in the sequel. In Section 3 we specialize the general results to composition maps. In Section 4 we combine these results with results on Hadamard differentiability to obtain the asymptotic distribution of empirical processes indexed by pseudo observations. Finally in Section 5 we formulate our results for several of the particular examples mentioned above and at the end of Section 2.
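As an informal illustration of the pseudo-observations introduced above, the following sketch computes $\eta_n(X_1), \ldots, \eta_n(X_n)$ for the case where $\eta_n$ is the empirical distribution function of the sample, and then evaluates the empirical distribution function $\theta \mapsto n^{-1}\sum_i 1\{\eta_n(X_i) \le \theta\}$ of these values. The function names (`empirical_cdf`, `pseudo_observations`, `pseudo_edf`), the Gaussian sample and the sample size are assumptions made for the example only; they do not come from the paper.

```python
import random


def empirical_cdf(sample):
    """Return eta_n: x -> fraction of sample points that are <= x componentwise."""
    n = len(sample)

    def eta_n(x):
        return sum(all(s_j <= x_j for s_j, x_j in zip(s, x)) for s in sample) / n

    return eta_n


def pseudo_observations(sample, eta_n):
    """Evaluate the estimated function eta_n at each observation."""
    return [eta_n(x) for x in sample]


def pseudo_edf(pseudo, theta):
    """Empirical distribution function of the pseudo-observations at theta."""
    return sum(v <= theta for v in pseudo) / len(pseudo)


random.seed(0)
n = 200

# Illustrative two-dimensional sample (hypothetical data, independent Gaussians).
sample = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]
eta_n = empirical_cdf(sample)
pseudo = pseudo_observations(sample, eta_n)

# The empirical distribution function of the pseudo-observations on a grid in [0, 1].
K_n = [pseudo_edf(pseudo, k / 10) for k in range(11)]

# In dimension one, without ties, the pseudo-observations are the ranks divided by n.
sample_1d = [(random.gauss(0, 1),) for _ in range(n)]
pseudo_1d = pseudo_observations(sample_1d, empirical_cdf(sample_1d))
```

In dimension one the pseudo-observations carry no randomness beyond the ordering of the sample, which is one way to see why the interesting behaviour of this process arises for $d > 1$.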
2. General result
In many situations we wish to establish (1) without knowing much about the nature of the estimators $\eta_n$, beyond possibly that they are consistent for some value $\eta_0$.
For instance, this is true if (1) is used as a step in the derivation of $M$- or $Z$-estimators. (Cf. Van der Vaart and Wellner [12] and Van der Vaart [11].) Then an appropriate method of establishing (1) is through a Donsker or entropy condition, as in the following theorems. Proofs of Theorems 2.1 and 2.2 can be found in the mentioned references.

Both theorems assume that $\eta_n$ is "consistent for $\eta_0$" in the sense that

$$\sup_{\theta\in\Theta} P(f_{\theta,\eta_n} - f_{\theta,\eta_0})^2 \to_p 0. \tag{3}$$

Theorem 2.1.
Suppose that $H_0$ is a fixed subset of $H$ such that $\Pr(\eta_n \in H_0) \to 1$ and suppose that the class of functions $\{f_{\theta,\eta}: \theta \in \Theta, \eta \in H_0\}$ is $P$-Donsker. If (3) holds, then (1) is valid.

For the second theorem, let $N(\epsilon, \mathcal{F}, L_2(P))$ and $N_{[\,]}(\epsilon, \mathcal{F}, L_2(P))$ be the $\epsilon$-covering and $\epsilon$-bracketing numbers of a class $\mathcal{F}$ of measurable functions (cf. Pollard [8] and van der Vaart and Wellner [12]) and define entropy integrals by

$$J(\delta, \mathcal{F}, L_2) = \int_0^\delta \sup_Q \sqrt{\log N\bigl(\epsilon\|F\|_{Q,2}, \mathcal{F}, L_2(Q)\bigr)}\,d\epsilon, \tag{4}$$

$$J_{[\,]}(\delta, \mathcal{F}, L_2(P)) = \int_0^\delta \sqrt{\log N_{[\,]}\bigl(\epsilon\|F\|_{P,2}, \mathcal{F}, L_2(P)\bigr)}\,d\epsilon. \tag{5}$$

Here $F$ is an arbitrary, measurable envelope function for the class $\mathcal{F}$: a measurable function $F: \mathcal{X} \to \mathbb{R}$ such that $|f(x)| \le F(x)$ for every $f \in \mathcal{F}$ and $x \in \mathcal{X}$. We say that a sequence $F_n$ of envelope functions satisfies the Lindeberg condition if $PF_n^2 = O(1)$ and $PF_n^2 1\{F_n \ge \epsilon\sqrt{n}\} \to 0$ for every $\epsilon > 0$.

Theorem 2.2.
Suppose that $H_n$ are subsets of $H$ such that $\Pr(\eta_n \in H_n) \to 1$ and such that the classes of functions $\mathcal{F}_n = \{f_{\theta,\eta}: \theta \in \Theta, \eta \in H_n\}$ satisfy either $J_{[\,]}(\delta_n, \mathcal{F}_n, L_2(P)) \to 0$, or $J(\delta_n, \mathcal{F}_n, L_2) \to 0$ for every sequence $\delta_n \to 0$, relative to envelope functions that satisfy the Lindeberg condition. In the second case also assume that the classes $\mathcal{F}_n$ are suitably measurable (e.g. countable). If (3) holds, then (1) is valid.

Because there are many techniques to verify that a given class of functions is Donsker, or to compute bounds on its entropy integrals, the preceding theorems give quick results, if they apply. Furthermore, they appear to be close to best possible unless more information about the estimators $\eta_n$ can be brought in, or explicit computations are possible for the functions $f_{\theta,\eta}$.

In some applications the estimators $\eta_n$ are known to converge at a certain rate and/or known to possess certain regularity properties (e.g. uniformly bounded derivatives). Such knowledge cannot be exploited in Theorem 2.1, but could be used for the choice of the sets $H_n$ in Theorem 2.2. We now discuss an alternative approach which can be used if the estimators $\eta_n$ are also known to converge in distribution, if properly rescaled.

Let $H$ be a Banach space, and suppose that the sequence $\sqrt{n}(\eta_n - \eta_0)$ converges in distribution to a tight, Borel-measurable random element in $H$. The "convergence in distribution" may be understood in the sense of Hoffmann-Jørgensen, so that $\eta_n$ need not be Borel-measurable itself.

The tight limit of the sequence $\sqrt{n}(\eta_n - \eta_0)$ takes its values in a $\sigma$-compact subset $H_0 \subset H$. For $\theta \in \Theta$, $h_0 \in H_0$, and $\delta > 0$ define a class of functions by

$$\mathcal{F}_n(\theta, h_0, \delta) = \bigl\{ f_{\theta,\eta_0 + n^{-1/2}h} - f_{\theta,\eta_0 + n^{-1/2}h_0}: h \in H, \|h - h_0\| < \delta \bigr\}. \tag{6}$$

Let $F_n(\theta, h_0, \delta)$ be arbitrary measurable envelope functions for these classes.

Theorem 2.3.
Suppose that the sequence $\sqrt{n}(\eta_n - \eta_0)$ converges in distribution to a tight, random element with values in a given $\sigma$-compact subset $H_0$ of $H$. Suppose that
(i) $\sup_\theta |\mathbb{G}_n(f_{\theta,\eta_0 + n^{-1/2}h_0} - f_{\theta,\eta_0})| \to_p 0$ for every $h_0 \in H_0$;
(ii) $\sup_\theta |\mathbb{G}_n F_n(\theta, h_0, \delta)| \to_p 0$ for every $\delta > 0$ and every $h_0 \in H_0$;
(iii) $\sup_\theta \sup_{h_0 \in K} \sqrt{n}\,PF_n(\theta, h_0, \delta_n) \to 0$ for every $\delta_n \to 0$ and every compact $K \subset H_0$.
Then (1) is valid.

Proof. Suppose that $\sqrt{n}(\eta_n - \eta_0) \Rightarrow Z$ and let $\epsilon > 0$ be fixed. There exists a compact set $K \subset H_0$ with $P(Z \in K) > 1 - \epsilon$ and hence for every $\delta > 0$, with $K^\delta$ the set of all points at distance less than $\delta$ to $K$,

$$\liminf_{n\to\infty} \Pr\bigl(\sqrt{n}(\eta_n - \eta_0) \in K^{\delta/2}\bigr) > 1 - \epsilon.$$

In view of the compactness of $K$ there exist finitely many elements $h_1, \ldots, h_p \in K \subset H_0$ (with $p = p(\delta)$ depending on $\delta$) such that the balls of radius $\delta/2$ around these points cover $K$. Then $K^{\delta/2}$ is contained in the union of the balls of radius $\delta$, by the triangle inequality. Thus, with $B(h, \delta)$ denoting the ball of radius $\delta$ around $h$ in the space $H$,

$$\bigl\{\sqrt{n}(\eta_n - \eta_0) \in K^{\delta/2}\bigr\} \subset \bigcup_{i=1}^{p(\delta)} \bigl\{\eta_n \in B(\eta_0 + n^{-1/2}h_i, n^{-1/2}\delta)\bigr\}.$$

It follows that with probability at least $1 - \epsilon$, as $n \to \infty$,

$$\begin{aligned}
\sup_\theta |\mathbb{G}_n(f_{\theta,\eta_n} - f_{\theta,\eta_0})|
&\le \sup_\theta \max_i \sup_{\|h - h_i\| < \delta} |\mathbb{G}_n(f_{\theta,\eta_0 + n^{-1/2}h} - f_{\theta,\eta_0})| \\
&\le \sup_\theta \max_i \sup_{\|h - h_i\| < \delta} \Bigl[ |\mathbb{G}_n(f_{\theta,\eta_0 + n^{-1/2}h} - f_{\theta,\eta_0 + n^{-1/2}h_i})| + |\mathbb{G}_n(f_{\theta,\eta_0 + n^{-1/2}h_i} - f_{\theta,\eta_0})| \Bigr] \\
&\le \sup_\theta \max_i |\mathbb{G}_n F_n(\theta, h_i, \delta)| + 2\sup_\theta \sup_{h_0 \in K} \sqrt{n}\,PF_n(\theta, h_0, \delta) + \sup_\theta \max_i |\mathbb{G}_n(f_{\theta,\eta_0 + n^{-1/2}h_i} - f_{\theta,\eta_0})|,
\end{aligned}$$

where in the last step we use the inequality $|\mathbb{G}_n f| \le |\mathbb{G}_n F| + 2\sqrt{n}\,PF$, valid for any functions $f$ and $F$ with $|f| \le F$. The maxima in the display are over the finite set $i = 1, \ldots, p(\delta)$, and the elements $h_1, \ldots, h_{p(\delta)} \in K$ depend on $\delta$. By assumptions (ii) and (i) the first and third terms converge to zero in probability as $n \to \infty$, for every fixed $\delta$. It follows that there exists $\delta_n \downarrow 0$ such that these terms with $\delta_n$ substituted for $\delta$ converge to 0. For this $\delta_n$ the second term converges to zero by assumption (iii), so that all three terms converge to zero in probability as $n \to \infty$.

The rate of convergence $\sqrt{n}$ in the preceding theorem may be replaced by another rate, with appropriate changes in the conditions, but the rate $\sqrt{n}$ appears natural in the following context. For more general metrizable topological vector spaces there are similar, but less attractive, results possible.
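The inequality $|\mathbb{G}_n f| \le |\mathbb{G}_n F| + 2\sqrt{n}\,PF$ used in the last step of the proof of Theorem 2.3 can be checked in a few lines. The following is a short derivation that uses only the pointwise bound $|f| \le F$ and the definitions of $\mathbb{P}_n$ and $\mathbb{G}_n$ from Section 1; it is a worked check added for the reader, not a display from the cited references.

```latex
\begin{align*}
|\mathbb{G}_n f|
  &= \sqrt{n}\,\bigl|\mathbb{P}_n f - P f\bigr|
   \le \sqrt{n}\,\bigl(\mathbb{P}_n |f| + P|f|\bigr)
   \le \sqrt{n}\,\bigl(\mathbb{P}_n F + P F\bigr) \\
  &= \sqrt{n}\,(\mathbb{P}_n - P)F + 2\sqrt{n}\,P F
   = \mathbb{G}_n F + 2\sqrt{n}\,P F
   \le |\mathbb{G}_n F| + 2\sqrt{n}\,P F .
\end{align*}
```

Applied with $f$ ranging over $\mathcal{F}_n(\theta, h_i, \delta)$ and $F = F_n(\theta, h_i, \delta)$, this gives the first two terms of the final bound in the proof.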
The two conditions (i), (ii) of Theorem 2.3 concern the empirical process indexed by the classes of functions

$$\{f_{\theta,\eta_0 + n^{-1/2}h_0} - f_{\theta,\eta_0}: \theta \in \Theta\}, \tag{7}$$

$$\{F_n(\theta, h_0, \delta): \theta \in \Theta\}. \tag{8}$$

These classes are indexed by $\Theta$ only, and hence Theorem 2.3, if applicable, avoids conditions for (1) that involve measures of the complexity of the class $\{f_{\theta,\eta}: \theta \in \Theta, \eta \in H\}$ due to the parameter $\eta \in H$.

Condition (iii) of Theorem 2.3 involves the mean of the envelopes of the classes $\mathcal{F}_n(\theta, h_0, \delta)$. For the minimal envelopes this condition takes the form

$$\sup_\theta \sup_{h_0 \in K} \sqrt{n}\,P \sup_{\|h - h_0\| < \delta_n} \bigl|f_{\theta,\eta_0 + n^{-1/2}h} - f_{\theta,\eta_0 + n^{-1/2}h_0}\bigr| \to 0 \tag{9}$$

for every $\delta_n \downarrow 0$. This is an "integrated uniform local Lipschitz assumption" on the dependence $\eta \mapsto f_{\theta,\eta}$. In some applications it may be useful not to use the minimal envelope functions. The theorem is valid for any envelope functions, as long as the same envelopes are used in both (ii) and (iii).

The set $K$ in (iii) or (9) is a compact set in the support of the limit distribution of the sequence $\sqrt{n}(\eta_n - \eta_0)$. In some cases condition (iii) may be valid for any compact $K \subset H$, whereas in other cases more precise information about the limit process must be exploited. For instance, if the sequence $\sqrt{n}(\eta_n - \eta_0)$ converges in distribution to a tight zero-mean Gaussian process $G$ in the space $H = \ell^\infty(T)$ of bounded functions on some set $T$, then $K$ may be taken to be a set of functions $z: T \to \mathbb{R}$ that is uniformly bounded and uniformly equicontinuous relative to the semimetric with square $d^2(s, t) = \mathrm{E}(G_s - G_t)^2$ (and $T$ will be totally bounded for $d$). Cf. e.g. van der Vaart and Wellner [12], page 39.

Condition (iii) is an analytical condition, whereas conditions (i) and (ii) are empirical process conditions. In many cases the latter pair of conditions can be verified by standard empirical process type arguments. For reference we quote two lemmas that allow handling the empirical process indexed by a sequence of classes, as in (8) or (7). (For proofs see e.g. van der Vaart [10, 11].) Both lemmas apply to classes $\mathcal{F}_n$ of measurable functions $f: \mathcal{X} \to \mathbb{R}$ such that

$$\sup_{f \in \mathcal{F}_n} Pf^2 \to 0. \tag{10}$$

Lemma 2.1.
Suppose that the class of functions $\bigcup_n \mathcal{F}_n$ is $P$-Donsker. If (10) holds, then $\sup_{f \in \mathcal{F}_n} |\mathbb{G}_n(f)| \to_p 0$.

Lemma 2.2.
Suppose that either $J_{[\,]}(\delta_n, \mathcal{F}_n, L_2(P)) \to 0$ or $J(\delta_n, \mathcal{F}_n, L_2) \to 0$ for all $\delta_n \downarrow 0$ relative to envelope functions $F_n$ that satisfy the Lindeberg condition. In the second case also assume that each class $\mathcal{F}_n$ is suitably measurable. If (10) holds, then $\sup_{f \in \mathcal{F}_n} |\mathbb{G}_n(f)| \to_p 0$.

Example 1 (Regression residual processes). Suppose that $(X_1, Y_1), \ldots, (X_n, Y_n)$ are a random sample distributed according to the regression model $Y = g_\eta(X) + e$. For given estimators $\eta_n$ we can form the residuals $\hat{e}_i = Y_i - g_{\eta_n}(X_i)$ and may be interested in the empirical process corresponding to $\hat{e}_1, \ldots, \hat{e}_n$, i.e. for a collection $\Theta$ of functions $\theta: \mathbb{R} \to \mathbb{R}$ we consider the process $\bigl\{n^{-1}\sum_{i=1}^n \theta(\hat{e}_i): \theta \in \Theta\bigr\}$. This fits the general set-up with the functions $f_{\theta,\eta}$ defined as

$$f_{\theta,\eta}(x, y) = \theta\bigl(y - g_\eta(x)\bigr).$$

In many cases it will be possible to apply Theorem 2.1. For instance, if $x \in \mathbb{R}^d$, $g_\eta$ is a polynomial in $x$, and $\Theta$ is the class of indicator functions $1_{(-\infty,\theta]}$ for $\theta \in \mathbb{R}$, then the functions $f_{\theta,\eta}$ are the indicator functions of the sets $\{(x, y): y - g_\eta(x) - \theta \le 0\}$. Because the set of functions $(x, y) \mapsto y - g_\eta(x) - \theta$ is contained in a finite-dimensional vector space, it is a VC-class, and hence so are their negativity sets (e.g. van der Vaart and Wellner [12], Lemma 2.6.18). Thus the class of functions $f_{\theta,\eta}$ is Donsker, and Theorem 2.1 can be applied directly.

Example 2 (Kendall's process). Let $\eta_n$ be the empirical distribution function of a random sample $X_1, \ldots, X_n$ from a distribution $\eta_0$ on $\mathbb{R}^d$. Barbe, Genest, Ghoudi and Remillard [2] and Ghoudi and Remillard [6] study the behavior of the empirical distribution function $K_n$ of the pseudo-observations $\eta_n(X_i)$,

$$K_n(\theta) = \frac{1}{n}\sum_{i=1}^n 1\{\eta_n(X_i) \le \theta\}, \qquad \theta \in [0, 1],$$

and the resulting Kendall's process

$$\sqrt{n}\,\bigl(K_n(\theta) - K(\theta)\bigr), \qquad \theta \in [0, 1], \tag{11}$$

where $K(\theta) = P(\eta_0(X) \le \theta)$.
This fits the general set-up with $f_{\theta,\eta}$ the composition function $f_{\theta,\eta} = \theta \circ \eta$, and $\theta$ the indicator function $1_{(-\infty,\theta]}$ (where we abuse notation by using the symbol $\theta$ in two different ways).

An attempt to apply Theorem 2.1 to this problem would lead to the consideration of the class of all indicator functions of sets of the form $\{x \in \mathbb{R}^d: \eta(x) \le \theta\}$ for $\eta$ ranging over the cumulative distribution functions on $\mathbb{R}^d$ and $\theta \in [0, 1]$. This is a class of lower layers in $\mathbb{R}^d$, and, unfortunately, fails to be Donsker for most distributions (cf. Dudley [3], page 264, 373 or Dudley [4]). In this case it appears to be necessary to exploit the limit behaviour of the sequence $\sqrt{n}(\eta_n - \eta_0)$. Ghoudi and Remillard [6] have shown that (1) is valid in this case, under some strong smoothness assumptions on the underlying measure $\eta_0$. In Sections 4 and 5 we rederive some of their results by empirical process methods using Theorem 2.3.

We also consider the empirical process of the variables $\eta_n(X_1), \ldots, \eta_n(X_n)$ indexed by classes of functions other than the indicators $1_{(-\infty,\theta]}$. If the indexing functions are smooth, then this empirical process will converge even without smoothness conditions on $\eta_0$. A proof can be based on Theorem 2.3.

Example 3 (Copula processes). Suppose that $X_1, \ldots, X_n$ are a sample from a distribution $\eta_0$ on $\mathbb{R}^d$. Write $X_i = (X_{i,1}, \ldots, X_{i,d})$ and let $\eta_{0,1}, \ldots, \eta_{0,d}$ be the marginal distributions. The copula function $C$ associated with $\eta_0$ is the distribution function of the vector $\bigl(\eta_{0,1}(X_{1,1}), \ldots, \eta_{0,d}(X_{1,d})\bigr)$, i.e. with $\eta_{0,j}^{-1}(u) = \inf\{x: \eta_{0,j}(x) \ge u\}$ for $u \in [0, 1]$,

$$C(u_1, \ldots, u_d) = \eta_0\bigl(\eta_{0,1}^{-1}(u_1), \ldots, \eta_{0,d}^{-1}(u_d)\bigr)$$

for $(u_1, \ldots, u_d) \in [0, 1]^d$. For $j = 1, \ldots, d$ let $\eta_{n,j}$ be the empirical distribution function of $X_{1,j}, \ldots, X_{n,j}$ (on $\mathbb{R}$), and let $\eta_n$ be the empirical distribution function of $X_1, \ldots, X_n$ (on $\mathbb{R}^d$). Then a natural estimator $C_n$ of $C$ is given by

$$C_n(u) = \frac{1}{n}\sum_{i=1}^n 1\{\hat{e}_{n,i} \le u\}, \qquad u \in [0, 1]^d,$$

for the "pseudo-observations" $\hat{e}_{n,i} = \bigl(\eta_{n,1}(X_{i,1}), \ldots, \eta_{n,d}(X_{i,d})\bigr)$. The resulting "copula processes",

$$\sqrt{n}\,\bigl(C_n(u) - C(u)\bigr), \qquad u \in [0, 1]^d, \tag{12}$$

have been considered by Stute [9], Gänssler and Stute [5], and Ghoudi and Remillard [7]. This example can be treated using Theorem 2.3, but also with the more straightforward Theorem 2.1, or even by employing the theory of Hadamard differentiability, as in Chapter 3.9 of Van der Vaart and Wellner [12].
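The estimator $C_n$ of Example 3 can be sketched in a few lines: compute the marginal empirical distribution functions, evaluate them at the observations to obtain the pseudo-observations, and take the empirical distribution function of the resulting vectors. The names below (`marginal_ecdf`, `copula_pseudo_observations`, `empirical_copula`) and the simulated bivariate sample are assumptions chosen for illustration; they are not taken from the paper.

```python
import random


def marginal_ecdf(values):
    """Empirical distribution function of a one-dimensional sample."""
    n = len(values)

    def ecdf(x):
        return sum(v <= x for v in values) / n

    return ecdf


def copula_pseudo_observations(sample):
    """Apply the j-th marginal empirical distribution function to the j-th coordinate."""
    d = len(sample[0])
    margins = [marginal_ecdf([x[j] for x in sample]) for j in range(d)]
    return [tuple(margins[j](x[j]) for j in range(d)) for x in sample]


def empirical_copula(pseudo, u):
    """C_n(u): fraction of pseudo-observations that are <= u componentwise."""
    return sum(all(e_j <= u_j for e_j, u_j in zip(e, u)) for e in pseudo) / len(pseudo)


random.seed(1)
n = 100

# Hypothetical dependent sample: both coordinates share a common Gaussian factor z.
sample = []
for _ in range(n):
    z = random.gauss(0, 1)
    sample.append((z + random.gauss(0, 1), z + random.gauss(0, 1)))

pseudo = copula_pseudo_observations(sample)
c_half = empirical_copula(pseudo, (0.5, 0.5))
```

Because each coordinate of a pseudo-observation is a rank divided by $n$ (when there are no ties), $C_n$ depends on the data only through the coordinatewise ranks, and its margins are exactly uniform on the grid $\{1/n, \ldots, 1\}$.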
3. Composition
In this section we consider the case where the functions $f_{\theta,\eta}$ take the form

$$f_{\theta,\eta}(x) = \theta(\eta(x)), \tag{13}$$

for $\theta$ ranging over a class $\Theta$ of functions $\theta: \mathbb{R}^d \to \mathbb{R}$ and $\eta$ ranging over a class $H$ of measurable functions $\eta: \mathcal{X} \to \mathbb{R}^d$. We first give general conditions for the validity of condition (i) of Theorem 2.3, and next consider also the conditions (ii) and (iii) for the special cases of functions $\theta$ that are Lipschitz and monotone, respectively.

We develop these results for the case that the sequence $\sqrt{n}(\eta_n - \eta_0)$ converges in distribution in the space $H = \ell^\infty(\mathcal{X}, \mathbb{R}^d)$ of uniformly bounded functions $z: \mathcal{X} \to \mathbb{R}^d$, equipped with the uniform norm $\|z\| = \sup_{x \in \mathcal{X}} \|z(x)\|$. (Variations of these results are possible. For instance, $\mathbb{R}^d$ could be replaced by a more general Banach space, and $H$ could be equipped with a weighted uniform norm.)

3.1. Condition (i)

For $f_{\theta,\eta}$ taking the form (13), condition (i) of Theorem 2.3 takes the form

$$\sup_{f \in \mathcal{F}_n} |\mathbb{G}_{n,Q} f| \xrightarrow{Q} 0, \tag{14}$$

for $Q = P \circ (\eta_0, h_0)^{-1}$, $\mathbb{G}_{n,Q}$ the empirical process of a random sample from the measure $Q$, and $\mathcal{F}_n$ the class of functions

$$\mathcal{F}_n = \bigl\{(y, z) \mapsto \theta(y + n^{-1/2}z) - \theta(y): \theta \in \Theta\bigr\}. \tag{15}$$

Condition (i) requires that (14) is valid for every fixed choice of $h_0 \in H_0$, i.e. for every measure $Q$ determined as the law of $(\eta_0(X), h_0(X))$ for some $h_0 \in H_0$ and $X$ distributed according to $P$.

This situation is of the form considered in Lemmas 2.1 and 2.2, and both lemmas may be applicable in a given setting. It is not especially helpful to restate these lemmas for the present special situation. Instead, we give one easy to check set of sufficient conditions. This covers VC-classes $\Theta$, and much more.

If $\Theta_{\mathrm{env}}: \mathbb{R}^d \to \mathbb{R}$ is an envelope function for $\Theta$, then

$$F_n(y, z) = \Theta_{\mathrm{env}}(y + n^{-1/2}z) + \Theta_{\mathrm{env}}(y) \tag{16}$$

is an envelope function for $\mathcal{F}_n$. (A crude one, because we do not exploit that the functions in $\mathcal{F}_n$ are differences.)

Lemma 3.1.
Suppose that $J(1, \Theta, L_2) < \infty$, that $\Theta$ is suitably measurable, that $P(\Theta_{\mathrm{env}}^2 \circ \eta_0) < \infty$, and that the functions $\Theta_{\mathrm{env}} \circ (\eta_0 + n^{-1/2}h_0)$ satisfy the Lindeberg condition in $L_2(P)$, for every $h_0 \in H_0$. If

$$\sup_{\theta \in \Theta} P\bigl(\theta \circ (\eta_0 + n^{-1/2}h_0) - \theta \circ \eta_0\bigr)^2 \to 0,$$

for every $h_0 \in H_0$, then condition (i) of Theorem 2.3 is satisfied for the functions $f_{\theta,\eta}$ given by (13).

Proof. It suffices to prove (14) for the classes $\mathcal{F}_n$ given in (15). The class $\mathcal{F}_n$ is contained in the difference of the classes

$$\mathcal{F}_n' = \{(y, z) \mapsto \theta(y + n^{-1/2}z): \theta \in \Theta\}, \qquad \mathcal{F}'' = \{(y, z) \mapsto \theta(y): \theta \in \Theta\}.$$

These classes possess envelope functions $F_n'$ and $F''$ defined by

$$F_n'(y, z) = \Theta_{\mathrm{env}}(y + n^{-1/2}z), \qquad F''(y, z) = \Theta_{\mathrm{env}}(y).$$

The uniform entropy of $\mathcal{F}''$ relative to $F''$ is finite by assumption. The uniform entropy of $\mathcal{F}_n'$ relative to $F_n'$ is exactly the same, as the law of $Y + n^{-1/2}Z$ runs through all possible laws on $\mathbb{R}^d$ if the law of $(Y, Z)$ runs through all possible laws on $\mathbb{R}^d \times \mathbb{R}^d$. The uniform entropy of $\mathcal{F}_n$ relative to $F_n$ is bounded by the sum of the uniform entropies of $\mathcal{F}_n'$ and $\mathcal{F}''$. (Cf. e.g. Theorem 2.10.20 of van der Vaart and Wellner [12].) Now apply Lemma 2.2.

3.2. Lipschitz functions $\theta$

Assume that every function $\theta: \mathbb{R}^d \to \mathbb{R}$ in the class $\Theta$ is uniformly Lipschitz in that

$$|\theta(r_1) - \theta(r_2)| \le \|r_1 - r_2\|. \tag{17}$$

Then, for every $x \in \mathcal{X}$,

$$\Bigl|\theta\bigl(\eta_0(x) + n^{-1/2}h(x)\bigr) - \theta\bigl(\eta_0(x) + n^{-1/2}h_0(x)\bigr)\Bigr| \le \frac{\|h(x) - h_0(x)\|}{\sqrt{n}}.$$

The norm in the right side is bounded by the supremum norm $\|h - h_0\|$ on $\ell^\infty(\mathcal{X}, \mathbb{R}^d)$. It follows that the classes $\mathcal{F}_n(\theta, h_0, \delta)$ as in (6) possess envelope functions

$$F_n(\theta, h_0, \delta) = \delta/\sqrt{n}. \tag{18}$$

Theorem 3.1.
If $\Theta$ is a suitably measurable collection of uniformly bounded, uniformly Lipschitz functions $\theta: \mathbb{R}^d \to \mathbb{R}$ such that $J(1, \Theta, L_2) < \infty$ (relative to a constant envelope function), $\eta_0 \in \ell^\infty(\mathcal{X}, \mathbb{R}^d)$, and the sequence $\sqrt{n}(\eta_n - \eta_0)$ converges weakly in $\ell^\infty(\mathcal{X}, \mathbb{R}^d)$ to a tight random element, then $\sup_{\theta \in \Theta} \bigl|\mathbb{G}_n\bigl(\theta(\eta_n) - \theta(\eta_0)\bigr)\bigr| \to_p 0$.
Proof.
With the envelope functions $F_n(\theta, h_0, \delta)$ as defined in (18), condition (ii) of Theorem 2.3 is trivially satisfied because the envelopes are actually constants, and the validity of condition (iii) is immediate.

By assumption we can choose the envelope function of $\Theta$ equal to a constant and $J(1, \Theta, L_2) < \infty$. This suffices for the verification of most of the conditions of Lemma 3.1. Finally, it suffices to note that

$$P\bigl(\theta \circ (\eta_0 + n^{-1/2}h_0) - \theta \circ \eta_0\bigr)^2 \le \|h_0\|^2/n.$$

By Lemma 3.1 we conclude that condition (i) of Theorem 2.3 is also satisfied, whence the theorem follows from Theorem 2.3.

For the verification of condition (i) of Theorem 2.3 it suffices to consider the functions $\theta$ on the range of the functions $\eta_0 + h_0/\sqrt{n}$ for a fixed $h_0$ in the support of the limit distribution of the sequence $\sqrt{n}(\eta_n - \eta_0)$. Thus we may restrict the functions $\theta$ to a subset of $\mathbb{R}^d$ that contains the ranges of these functions and interpret the condition $J(1, \Theta, L_2) < \infty$ in Lemma 3.1 accordingly. In particular, in Theorem 3.1 we may replace this condition by the condition that $J(1, \Theta_K, L_2) < \infty$ for every norm-bounded subset $K \subset \mathbb{R}^d$, where $\Theta_K$ is the collection of restrictions $\theta: K \to \mathbb{R}$ of the functions $\theta \in \Theta$.

Any collection of uniformly bounded, Lipschitz functions $\theta: K \to \mathbb{R}$ on a compact interval $K$ satisfies $J(1, \Theta, L_2) < \infty$. (Cf. e.g. van der Vaart and Wellner [12], page 157.) Thus in the case that $d = 1$ the assertion of the preceding theorem is true for any collection of uniformly bounded Lipschitz functions.

For $d > 1$ the unit ball of the Hölder space $C^\alpha(K)$ for a compact interval $K \subset \mathbb{R}^d$ possesses a finite uniform entropy integral provided $\alpha > d/2$. (Cf. e.g. van der Vaart and Wellner [12], page 157.) The assertion of the preceding theorem is also true for such a class.

There are many other examples of classes of Lipschitz functions with finite uniform entropy integrals, for instance VC-classes of Lipschitz functions.

3.3. Monotone functions $\theta$

Assume that every function $\theta: \mathbb{R}^d \to \mathbb{R}$ in $\Theta$ is the survival function $\theta(x) = \int_{[x,\infty)} d\theta$ of a subprobability measure on $\mathbb{R}^d$. Then each $\theta$ is nonincreasing in each of its arguments. If $H = \ell^\infty(\mathcal{X}, \mathbb{R}^d)$ equipped with the uniform norm relative to the max-norm on $\mathbb{R}^d$, then

$$\bigl|\theta(\eta_0 + n^{-1/2}h) - \theta(\eta_0 + n^{-1/2}h_0)\bigr| \le \theta\bigl(\eta_0 + n^{-1/2}h_0 - n^{-1/2}\|h - h_0\|\bigr) - \theta\bigl(\eta_0 + n^{-1/2}h_0 + n^{-1/2}\|h - h_0\|\bigr).$$

It follows that the classes $\mathcal{F}_n(\theta, h_0, \delta)$ possess envelope functions, with $\delta$ the vector $(\delta, \ldots, \delta)$,

$$F_n(\theta, h_0, \delta) = \theta(\eta_0 + n^{-1/2}h_0 - n^{-1/2}\delta) - \theta(\eta_0 + n^{-1/2}h_0 + n^{-1/2}\delta).$$

In order to verify condition (iii) of Theorem 2.3, we assume that for given (possibly infinite) $a < b$ in $\mathbb{R}^d$ and every $\delta_n \downarrow 0$ and every compact $K \subset H_0 \cup \{0\}$,

$$\sup_{t \in \mathbb{R}^d,\, a \le t \le b}\ \sup_{h_0 \in K} \sqrt{n}\,P\bigl(1_{\eta_0 + n^{-1/2}h_0 \le t + n^{-1/2}\delta_n} - 1_{\eta_0 + n^{-1/2}h_0 \le t}\bigr) \to 0. \tag{19}$$

Theorem 3.2.
Let $\Theta$ be a collection of survival functions $\theta: \mathbb{R}^d \to [0, 1]$ of subprobability measures supported on an interval $(a, b) \subset \mathbb{R}^d$. If the sequence $\sqrt{n}(\eta_n - \eta_0)$ converges in distribution in $\ell^\infty(\mathcal{X}, \mathbb{R}^d)$ to a tight Borel measure concentrating on the $\sigma$-compact set $H_0$, and (19) holds for every $\delta_n \downarrow 0$ and every compact $K \subset H_0 \cup \{0\}$, then $\sup_\theta \bigl|\mathbb{G}_n\bigl(\theta(\eta_n) - \theta(\eta_0)\bigr)\bigr| \to_p 0$.

Proof.
The survival functions of subprobability measures are in the convex hull of the set of indicator functions $1_{[t,\infty)}$, which is a VC-class. Therefore the entropy integral $J(1, \Theta, L_2)$ of $\Theta$ relative to a constant envelope is finite. (Cf. e.g. van der Vaart and Wellner [12], page 145.)

Defining $F_n(\theta, h_0, \delta)$ as in the display preceding the theorem, we can write

$$\begin{aligned}
PF_n(\theta, h_0, \delta)
&= \int P\bigl(1_{(-\infty,s]}(\eta_0 + n^{-1/2}h_0 - n^{-1/2}\delta) - 1_{(-\infty,s]}(\eta_0 + n^{-1/2}h_0)\bigr)\,d\theta(s) \\
&\le \|\theta\| \sup_{a \le s \le b} P\bigl(1_{(-\infty,s]}(\eta_0 + n^{-1/2}h_0 - n^{-1/2}\delta) - 1_{(-\infty,s]}(\eta_0 + n^{-1/2}h_0)\bigr).
\end{aligned}$$

By assumption (19) the right side converges to zero faster than $1/\sqrt{n}$, for every $\delta = \delta_n \downarrow 0$, uniformly in $h_0 \in K$, and uniformly in $\theta$ because the total variation norms $\|\theta\|$ are uniformly bounded. This verifies condition (iii) of Theorem 2.3.

Because $\theta$ is monotone with range contained in $[0, 1]$,

$$P\bigl(\theta(\eta_0 - n^{-1/2}\delta) - \theta(\eta_0)\bigr)^2 \le \sup_{a \le s \le b} P\bigl(1_{(-\infty,s]}(\eta_0 - n^{-1/2}\delta) - 1_{(-\infty,s]}(\eta_0)\bigr).$$

By assumption (19) with $h_0 = 0$ this converges to zero faster than $1/\sqrt{n}$ for every sequence $\delta_n \downarrow 0$. This can be seen to imply that the expression in the display (which does not have the leading $\sqrt{n}$) converges to zero also for fixed $\delta$. By monotonicity of $\theta$ we can bound $\bigl|\theta(\eta_0 + h_0/\sqrt{n}) - \theta(\eta_0)\bigr|$ by $\bigl|\theta(\eta_0 - \delta/\sqrt{n}) - \theta(\eta_0)\bigr|$ for $\delta = \|h_0\|$. By Lemma 3.1 we now conclude that condition (i) of Theorem 2.3 is satisfied.

In the present case the envelope functions $F_n(\theta, h_0, \delta)$ are equal to the functions $f_{\theta,\eta_0 + n^{-1/2}(h_0 - \delta)} - f_{\theta,\eta_0 + n^{-1/2}(h_0 + \delta)}$. Therefore, the validity of condition (ii) of Theorem 2.3 follows by the same arguments as used for the validity of condition (i).

Condition (19) is a uniform Lipschitz condition on the distribution functions of the variables $\eta_0(X) + h_0(X)/\sqrt{n}$. If the distribution of $\eta_0(X)$ is smooth, then we might expect that the distribution functions of the perturbed variables $\eta_0(X) + h_0(X)/\sqrt{n}$ will be smooth as well. However, this appears not to be true in general, and it will usually be necessary to exploit some information about the functions $h_0$. (We need to consider functions in the support of the limit measure of the sequence $\sqrt{n}(\eta_n - \eta_0)$.) In this respect the conditions of Theorem 3.2 for composition with monotone functions are much more stringent than the conditions of Theorem 3.1 for the composition with Lipschitz functions.

The condition (19) is in terms of the indicator functions $1_{(-\infty,s]}$, and would have exactly the same form if we considered only indicator functions $\theta = 1_{(-\infty,\theta]}$, rather than general monotone functions. Thus the restrictive condition is connected to studying the classical empirical process.

The following lemma allows the verification of condition (19) in many cases. It will also be used in the next section to prove applicability of the delta-method. The lemma is similar to Lemma 5.1 of Ghoudi and Rémillard [6].
Lemma 3.2. Suppose that $X, Y, Y_t$ (with $t > 0$) are real-valued random variables on a common probability space such that
(i) $X$ possesses a Lebesgue density $f$ that is continuous in a neighbourhood of $x$;
(ii) $\|Y_t - Y\|_\infty \to 0$ and $\|Y\|_\infty < \infty$;
(iii) the conditional distribution of $Y$ given $X = s$ can be represented by a Markov kernel $K(s, \cdot)$ such that the map $s \mapsto K(s, \cdot)$ is continuous at $x$ for the weak topology.
Then for every continuous function $g: \mathbb{R} \to \mathbb{R}$ and every converging sequences $x_t \to x$, $a_t \to a$ and $0 \le b_t \to b$, as $t \to 0$,

$$\frac{1}{t}\,\mathrm{E}\, g(Y_t)\, 1_{x_t < X + t a_t Y_t \le x_t + t b_t} \to b \int g(y)\, K(x, dy)\, f(x).$$

Proof. First consider the case that $Y_t = Y$ for every $t$. By the definitions of $K$ and $f$, we can write $\frac{1}{t}\,\mathrm{E}\, g(Y)\, 1_{x_t < \cdots}$ [...] $0$. Therefore $\frac{1}{t}\,\mathrm{E}\,\bigl|g(Y_t) - g(Y)\bigr|\, 1_{x_t < \cdots}$ [...] $0$. We conclude from this that $\frac{1}{t}\,\bigl|\mathrm{E}\, g(Y)\bigl(1_{x_t < \cdots} \cdots\bigr)\bigr|$ [...]

[...] $(X, Y, Y_t)$ equal to $\bigl(\eta_0(X), h_0(X), h_n(X)\bigr)$ and $t = 1/\sqrt{n}$.

Example 2, continued. Suppose that $\eta_n$ is the empirical distribution of a random sample from the cumulative distribution function $\eta_0$ on $\mathbb{R}^d$. Then the limit distribution of the sequence $\sqrt{n}(\eta_n - \eta_0)$ is the $d$-dimensional $\eta_0$-Brownian sheet on $\mathbb{R}^d$.

If $d = 1$, then the Brownian sheet is a Brownian bridge and can be represented as $B \circ \eta_0$ for $B$ a standard Brownian bridge on the unit interval. A typical function in the support of the limit distribution of the sequence $\sqrt{n}(\eta_n - \eta_0)$ can be represented as $h_0 = h \circ \eta_0$ for some function $h: [0, 1] \to \mathbb{R}$. The conditional law of the variable $h_0(X)$ given $\eta_0(X) = z$ is the Dirac measure at $h(z)$. Because the standard Brownian bridge is continuous, the function $h$ can be taken continuous and hence the corresponding Markov kernels $K(z, \cdot) = \delta_{h(z)}(\cdot)$ are weakly continuous in $z$, as required by the preceding lemma.

If $d > 1$, then we can, without loss of generality, suppose that $\eta_0$ is a distribution function on $[0, 1]^d$ with uniform marginal distributions (i.e. a copula function). Then the conditioning event $\eta_0(X) = z$ will typically restrict $X$ to a one-dimensional curve in $[0, 1]^d$.
Under sufficient smoothness of $\eta_0$, this curve will vary continuously with $z$, and under smoothness conditions on the law of $X$, the conditional distribution of $h_0(X)$ given $\eta_0(X) = z$ for a continuous function $h_0$ will vary continuously as well. Ghoudi and Remillard [6] give sufficient conditions for this continuity in a number of examples.

The preceding lemma can be extended to the case of multidimensional variables. For simplicity we only consider the two-dimensional case.

Lemma 3.3. Suppose that $X, Y, Y_t$ (with $t > 0$) are random variables in $\mathbb{R}^2$ defined on a common probability space such that
(i) $X$ possesses a Lebesgue density $f$ that has continuous conditional densities;
(ii) $\|Y_t - Y\|_\infty \to 0$ and $\|Y\|_\infty < \infty$;
(iii) the conditional distribution of $Y$ given $X = s$ can be represented by a Markov kernel $K(s, \cdot)$ such that the map $s \mapsto K(s, \cdot)$ is continuous at $x$ for the weak topology.
Then for every continuous function $g: \mathbb{R}^2 \to \mathbb{R}$ and every converging sequences $x_t \to x$, $a_t \to a$ and $b_t \to b > 0$, as $t \to 0$,

$$\begin{aligned}
\frac{1}{t}\,\mathrm{E}\, g(Y_t)\bigl(1_{X + t a_t Y_t \le x_t + t b_t} - 1_{X + t a_t Y_t \le x_t}\bigr)
\to{}& b_1 \int_{-\infty}^{x_2} \int g(y)\, K\bigl((x_1, s_2), dy\bigr)\, f(x_1, s_2)\, ds_2 \\
&+ b_2 \int_{-\infty}^{x_1} \int g(y)\, K\bigl((s_1, x_2), dy\bigr)\, f(s_1, x_2)\, ds_1.
\end{aligned}$$

Proof.