Uniform Approximation of Vapnik-Chervonenkis Classes

Abstract

For any family of measurable sets in a probability space, we show that either (i) the family has infinite Vapnik-Chervonenkis (VC) dimension or (ii) for every $\epsilon > 0$ there is a finite partition $\pi$ such that the $\pi$-boundary of each set has measure at most $\epsilon$. Immediate corollaries include the fact that a family with finite VC dimension has finite bracketing numbers, and satisfies uniform laws of large numbers for every ergodic process. From these corollaries, we derive analogous results for VC major and VC graph families of functions.


arXiv [math.PR]

Terrence M. Adams* and Andrew B. Nobel†

September 2010

* Terrence Adams is with the Department of Defense, 9800 Savage Rd. Suite 6513, Ft. Meade, MD 20755.

† Andrew Nobel is with the Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, NC 27599-3260. Email: [email protected]

Introduction

Let $(\mathcal{X}, \mathcal{S}, \mu)$ be a probability space and let $\mathcal{C} \subseteq \mathcal{S}$ be a given family of measurable sets. The Vapnik-Chervonenkis dimension of $\mathcal{C}$ is a measure of its combinatorial complexity, specifically, the ability of $\mathcal{C}$ to separate finite sets of points. Given a finite set $D \subseteq \mathcal{X}$, let $\{C \cap D : C \in \mathcal{C}\}$ be the collection of subsets of $D$ selected by the members of $\mathcal{C}$. The family $\mathcal{C}$ is said to shatter $D$ if its elements can select every subset of $D$, or equivalently, if $|\{C \cap D : C \in \mathcal{C}\}| = 2^{|D|}$. Here and in what follows, $|A|$ denotes the cardinality of a given set $A$. The Vapnik-Chervonenkis (VC) dimension [17] of $\mathcal{C}$, denoted $\dim(\mathcal{C})$, is the largest integer $k$ such that $\mathcal{C}$ is able to shatter some set of cardinality $k$. If $\mathcal{C}$ can shatter arbitrarily large finite sets, then $\dim(\mathcal{C}) = +\infty$. A family of sets $\mathcal{C}$ is said to be a VC class if $\dim(\mathcal{C})$ is finite.

Let $\pi$ be a finite, measurable partition of $\mathcal{X}$. For every set $C \in \mathcal{C}$, the $\pi$-boundary of $C$, denoted $\partial(C : \pi)$, is the union of all the cells in $\pi$ that intersect both $C$ and its complement with positive probability. Formally,

$$\partial(C : \pi) = \bigcup \{ A \in \pi : \mu(A \cap C) \cdot \mu(A \cap C^c) > 0 \}.$$

Note that $\partial(C : \pi)$ depends on $\mu$, though this dependence is suppressed in our notation. We will call a family $\mathcal{C}$ finitely approximable if for every $\epsilon > 0$ there exists a finite measurable partition $\pi$ of $\mathcal{X}$ such that $\mu(\partial(C : \pi)) \leq \epsilon$ for every $C \in \mathcal{C}$. Our principal result is the following.

Theorem 1. Let $(\mathcal{X}, \mathcal{S}, \mu)$ be a probability space and let $\mathcal{C} \subseteq \mathcal{S}$ be any family of sets. Then either (i) $\mathcal{C}$ is finitely approximable or (ii) $\mathcal{C}$ has infinite VC dimension.

Theorem 1 extends immediately to finite positive measures; we restrict attention to the case of probability measures for simplicity. Gaenssler and Stute [8] studied $\pi$-boundaries in work on uniform convergence of measures. In conjunction with Theorem 1, their results show that, if for some VC-class $\mathcal{C}$ and some sequence $\{\mu_n\}$ of finite measures, $\mu_n(A) \to \mu(A)$ for every $A \in \sigma(\mathcal{C})$, then this convergence is uniform over $\mathcal{C}$. One may establish the same conclusion using Corollary 1.

In general, alternatives (i) and (ii) of Theorem 1 are not mutually exclusive: there exist families $\mathcal{C}$ that are finitely approximable and have infinite VC dimension. Moreover, the finite approximability of $\mathcal{C}$ will generally depend on the measure $\mu$. To take a simple example, let $\mathcal{C}$ be the family of all Borel measurable subsets of the unit interval $[0, 1]$. Then $\mathcal{C}$ clearly has infinite VC dimension. An easy argument shows that $\mathcal{C}$ is finitely approximable if $\mu$ has countable support, but that $\mathcal{C}$ is not finitely approximable if $\mu$ is absolutely continuous with respect to Lebesgue measure. As the following, equivalent, version of Theorem 1 makes clear, families with finite VC dimension are finitely approximable for any probability measure $\mu$.

Theorem 2. Let $(\mathcal{X}, \mathcal{S})$ be a measurable space. If $\mathcal{C} \subseteq \mathcal{S}$ has finite VC dimension, then $\mathcal{C}$ is finitely approximable for any probability measure $\mu$.

Families of sets with finite VC-dimension figure prominently in machine learning, empirical process theory and combinatorial geometry (cf. [11, 15, 6, 7, 16, 10]) and have been widely studied in these fields. The majority of this work concerns the combinatorial properties of VC-classes, and related exponential probability inequalities for uniform laws of large numbers under independent sampling (see Section 3 below). The uniform approximation guaranteed by Theorem 2 provides new insights into the structure of VC-classes.

Some immediate corollaries of Theorem 2 are explored in Sections 2 and 3 below, including new results on the bracketing properties of VC major and VC graph classes of functions. Approximation properties analogous to those of Theorems 1 and 2 may be established for classes of functions with finite fat-shattering (gap) dimension [9] by extending the arguments in Section 4.

The proof of Theorem 1 makes use of an equivalent version of the VC dimension that we now describe. Recall that the join of $k$ sets $A_1, \ldots, A_k \subseteq \mathcal{X}$, denoted $J = \bigvee_{i=1}^k A_i$, is the finite partition of $\mathcal{X}$ consisting of all non-empty intersections $\tilde{A}_1 \cap \cdots \cap \tilde{A}_k$, where $\tilde{A}_i \in \{A_i, A_i^c\}$ for $i = 1, \ldots, k$. Equivalently, $J$ consists of the non-empty atoms of the field generated by $A_1, \ldots, A_k$. The collection $A_1, \ldots, A_k \subseteq \mathcal{X}$ is said to be Boolean independent if $J$ has (maximal) cardinality $2^k$. The dual VC dimension, denoted $\dim^*(\mathcal{C})$, is the largest $k$ such that $\mathcal{C}$ contains $k$ Boolean independent sets. If $\mathcal{C}$ contains Boolean independent families of every finite size, then $\dim^*(\mathcal{C}) = +\infty$. The dual VC-dimension was introduced by Assouad [4], and is so named because $\dim^*(\mathcal{C})$ is the VC-dimension of the dual family $\{D_x : x \in \mathcal{X}\} \subseteq 2^{\mathcal{C}}$, where $D_x = \{C \in \mathcal{C} : x \in C\}$.
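The definitions of shattering, the join, Boolean independence, and the two dimensions can be checked by brute force when the ground set is finite. The following is a minimal sketch under that assumption; the function names (`shatters`, `vc_dimension`, `join`, `boolean_independent`, `dual_vc_dimension`) and the two example families are illustrative and do not come from the paper. The family is assumed to be non-empty, and a dimension of 0 is returned when no single point (or single set) qualifies.

```python
from itertools import combinations


def shatters(family, points):
    """True if the sets in `family` select every subset of `points`."""
    points = frozenset(points)
    traces = {frozenset(c) & points for c in family}
    return len(traces) == 2 ** len(points)


def vc_dimension(family, ground_set):
    """Largest k such that some k-point subset of ground_set is shattered."""
    ground = sorted(ground_set)
    best = 0
    for k in range(1, len(ground) + 1):
        if any(shatters(family, d) for d in combinations(ground, k)):
            best = k
        else:
            break  # subsets of shattered sets are shattered, so we can stop
    return best


def join(sets, ground_set):
    """Non-empty atoms of the field generated by `sets` on ground_set."""
    atoms = {}
    for x in ground_set:
        signature = tuple(x in s for s in sets)
        atoms.setdefault(signature, set()).add(x)
    return [frozenset(a) for a in atoms.values()]


def boolean_independent(sets, ground_set):
    """True if the join of the k given sets has the maximal 2**k cells."""
    return len(join(sets, ground_set)) == 2 ** len(sets)


def dual_vc_dimension(family, ground_set):
    """Largest k such that the family contains k Boolean independent sets."""
    family = [frozenset(c) for c in family]
    best = 0
    for k in range(1, len(family) + 1):
        if any(boolean_independent(sub, ground_set)
               for sub in combinations(family, k)):
            best = k
        else:
            break  # sub-collections of independent collections are independent
    return best


# Illustrative families on the ground set {0, 1, 2, 3}.
half_lines = [set(range(t)) for t in range(5)]
intervals = [set(range(i, j)) for i in range(5) for j in range(i, 5)]
```

Both searches are exponential in the size of the ground set, so this is only meant for small examples such as the half-lines and intervals above.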
We will make use of the following elementary result, whose proof can be found in [4]; see also [10, 1].

Lemma A. Let $\mathcal{C}$ be any collection of subsets of $\mathcal{X}$. The VC-dimension $\dim(\mathcal{C})$ is finite if and only if the dual VC-dimension $\dim^*(\mathcal{C})$ is finite.

In proving Theorem 1 we begin with the assumption that $\mathcal{C}$ is not finitely approximable, and then deduce from this that $\dim^*(\mathcal{C}) = +\infty$. Specifically, we show that for every $L \geq 1$, $\mathcal{C}$ contains a sub-family of $L$ Boolean independent sets. We note that Boolean independence plays a related role in work of Rosenthal [12], who shows that if a sequence of sets $\{C_n : n \geq 1\}$ contains no pointwise convergent subsequence, then there is an infinite subsequence $\mathcal{C}_0 = \{C_{n_m} : m \geq 1\}$ such that each finite subfamily of $\mathcal{C}_0$ is Boolean independent.

The construction of Boolean independent sets in Theorem 1 proceeds in stages. At each stage a splitting set is produced by means of a weak limit, and is then incorporated in the construction of the splitting sets at subsequent stages. The resulting sequence of splitting sets is used to identify Boolean independent collections of arbitrary finite size. As noted by Ramon van Handel (private communication), the proof of Theorem 1 has points of intersection with the construction of a critical set for product measures in Theorem 11-1-1 of Talagrand [13], and with the notion of weakly dense sequences in Čech-complete spaces employed by Bourgain, Fremlin, and Talagrand [5]. Essential differences emerge from a number of factors, including our focus on finite approximation under a fixed (but arbitrary) distribution in the absence of topological structure, as well as the recursive construction of splitting sets that is employed in the theorem.

The next two sections are devoted to applications of Theorem 1 to families of sets and functions with bounded combinatorial complexity. In Section 2 we establish that VC classes of sets have finite bracketing numbers, and deduce similar results for VC major and VC graph families of functions. In Section 3 we show that VC classes satisfy uniform laws of large numbers for every ergodic process.
The proof of Theorem 1 is presented in Section 4.
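The $\pi$-boundary defined in the Introduction can be computed directly on a finite sample space, which gives a feel for what finite approximability asks of a partition. The sketch below is an illustration only: it assumes a finite ground set with a point-mass dictionary `mu`, and the helper names (`boundary_measure`, `worst_boundary`, `blocks`) and the half-line family are hypothetical choices, not constructions from the paper.

```python
from fractions import Fraction


def boundary_measure(c, partition, mu):
    """mu-measure of the pi-boundary of c: the union of all cells A
    with mu(A & c) > 0 and mu(A - c) > 0."""
    total = Fraction(0)
    for cell in partition:
        inside = sum(mu[x] for x in cell if x in c)
        outside = sum(mu[x] for x in cell if x not in c)
        if inside > 0 and outside > 0:
            total += inside + outside  # the whole cell belongs to the boundary
    return total


def worst_boundary(family, partition, mu):
    """sup over the family of the measure of the pi-boundary."""
    return max(boundary_measure(c, partition, mu) for c in family)


def blocks(n, width):
    """Partition {0, ..., n-1} into consecutive blocks of the given width."""
    return [set(range(start, min(start + width, n)))
            for start in range(0, n, width)]


# Uniform measure on {0, ..., 7} and the family of half-lines {0, ..., t-1}.
n = 8
mu = {x: Fraction(1, n) for x in range(n)}
half_lines = [set(range(t)) for t in range(n + 1)]
```

Each half-line cuts at most one block, so refining the blocks shrinks the worst-case boundary; this mirrors, in a finite setting, how a partition of the unit interval into short intervals approximates the VC class of half-lines uniformly.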

Let $\mathcal{F}$ be a family of measurable functions $f : \mathcal{X} \to \mathbb{R}$. We recall some basic definitions from the theory of empirical processes. A measurable function $F : \mathcal{X} \to [0, \infty)$ is said to be an envelope for $\mathcal{F}$ if $|f(x)| \leq F(x)$ for each $x \in \mathcal{X}$ and $f \in \mathcal{F}$. The family $\mathcal{F}$ is said to be separable if there is a countable sub-family $\mathcal{F}_0 \subseteq \mathcal{F}$ such that each function $f \in \mathcal{F}$ is a pointwise limit of a sequence of functions in $\mathcal{F}_0$. For each pair of measurable functions $g, h : \mathcal{X} \to \mathbb{R}$ with $g \leq h$, the bracket $[g, h]$ denotes the set of all measurable functions $f$ such that $g \leq f \leq h$ pointwise on $\mathcal{X}$. In particular, $[g, h]$ is said to be an $\epsilon$-bracket if $\int (h - g) \, d\mu \leq \epsilon$. For $\epsilon > 0$, the bracketing number $N_{[\,]}(\epsilon, \mathcal{F}, \mu)$ of $\mathcal{F}$ is the least number of $\epsilon$-brackets needed to cover $\mathcal{F}$. In general, the functions defining the minimal brackets need not be elements of $\mathcal{F}$.

Let a measure $\mu$ and family $\mathcal{C} \subseteq \mathcal{S}$ be fixed. The notions of separability and bracketing may be applied to $\mathcal{C}$ if we regard its elements as indicator functions. In this case we may assume, without loss of generality, that the lower and upper limits of each bracket are themselves indicator functions.

Corollary 1. If $\mathcal{C}$ is a separable VC-class, then $N_{[\,]}(\epsilon, \mathcal{C}, \mu)$ is finite for every $\epsilon > 0$.

Proof: By routine arguments, we may assume that $\mathcal{C}$ is countable. Fix $\epsilon > 0$. Let $\pi = \{A_1, \ldots, A_m\}$ be a finite measurable partition of $\mathcal{X}$ such that $\mu(\partial(C : \pi)) < \epsilon$ for every $C \in \mathcal{C}$, and assume without loss of generality that each cell of $\pi$ has positive $\mu$-measure. For each $C \in \mathcal{C}$, remove all points in $C$ from $A_j$ if $\mu(A_j \cap C) = 0$, and remove all points in $C^c$ from $A_j$ if $\mu(A_j \cap C^c) = 0$. Denote the resulting set by $B_j$. Clearly $B_j \subseteq A_j$ and $\mu(A_j \setminus B_j) = 0$ as $\mathcal{C}$ is countable. The definition of $B_j$ ensures that for each $C \in \mathcal{C}$ exactly one of the following relations holds: $B_j \subseteq C$, $B_j \subseteq C^c$, or $\mu(B_j \cap C) \cdot \mu(B_j \cap C^c) > 0$. Let $B_0 = \mathcal{X} \setminus \bigcup_{j=1}^m B_j$, and define the partition $\pi' = \{B_0, B_1, \ldots, B_m\}$. Given $C \in \mathcal{C}$ let $C_l = \bigcup \{B \in \pi' : B \subseteq C\}$ and $C_u = \bigcup \{B \in \pi' : B \cap C \neq \emptyset\}$. A straightforward argument shows that $C_l \subseteq C \subseteq C_u$, and that $\mu(C_u \setminus C_l) = \mu(\partial(C : \pi')) = \mu(\partial(C : \pi)) < \epsilon$. It follows that $\Theta = \{[C_l, C_u] : C \in \mathcal{C}\}$ is a collection of $\epsilon$-brackets covering $\mathcal{C}$. The cardinality of $\Theta$ is at most $2^{|\pi'|}$.

Let $\mathcal{F}$ be a family of measurable functions $f : \mathcal{X} \to \mathbb{R}$ with envelope $F$. For $f \in \mathcal{F}$ and $\alpha \in \mathbb{R}$ let $L_f(\alpha) = \{x : f(x) \leq \alpha\}$ be the $\alpha$-level set of $f$. Define $\mathcal{C}_\alpha = \{L_f(\alpha) : f \in \mathcal{F}\}$ to be the family of $\alpha$-level sets associated with functions in $\mathcal{F}$.

Proposition 1. Suppose that $\dim(\mathcal{C}_\alpha) < \infty$ for every $\alpha \in \mathbb{R}$. If $\mu$ is any probability measure on $(\mathcal{X}, \mathcal{S})$ such that $\int F \, d\mu < \infty$, then $N_{[\,]}(\epsilon, \mathcal{F}, \mu) < \infty$ for every $\epsilon > 0$.

Proof: Suppose first that $\mathcal{F}$ is bounded, with constant envelope $M < \infty$. Fix $\epsilon > 0$ and let $K$ be an integer such that $2M/K \leq \epsilon$. For each $f \in \mathcal{F}$ define the approximation

$$\tilde{f}(x) = M - \frac{2M}{K} \sum_{j=1}^K I(x \in L_f(\alpha_j)) \quad \text{with} \quad \alpha_j = M - \frac{2Mj}{K}.$$

The choice of $M$ and $K$ ensures that $\tilde{f}(x) - \epsilon \leq f(x) \leq \tilde{f}(x)$ for each $x \in \mathcal{X}$. The dimension of $\mathcal{C}_{\alpha_j}$ is finite by assumption, and it then follows from Corollary 1 that there is a finite collection $\Theta_j$ of $\epsilon/2M$-brackets that covers the level sets $\{L_f(\alpha_j) : f \in \mathcal{F}\}$. For each $f \in \mathcal{F}$ let $[g_f^j, h_f^j]$ be a bracket in $\Theta_j$ containing $L_f(\alpha_j)$. With this identification, define upper and lower approximations of $f$ as follows:

$$\tilde{f}_l = M - \frac{2M}{K} \sum_{j=1}^K h_f^j(x) - \epsilon \quad \text{and} \quad \tilde{f}_u = M - \frac{2M}{K} \sum_{j=1}^K g_f^j(x).$$

An easy argument shows that $\tilde{f}_l \leq f \leq \tilde{f}_u$, and the family of brackets $\Theta = \{[\tilde{f}_l, \tilde{f}_u] : f \in \mathcal{F}\}$ is finite, as $|\Theta| \leq \prod_{j=1}^K |\Theta_j|$. Moreover,

$$\tilde{f}_u - \tilde{f}_l \leq \frac{2M}{K} \sum_{j=1}^K \left( h_f^j(x) - g_f^j(x) \right) + \epsilon,$$

and therefore $\int (\tilde{f}_u - \tilde{f}_l) \, d\mu \leq 2\epsilon$. Thus $\Theta$ is a finite family of $2\epsilon$-brackets covering $\mathcal{F}$.

Suppose now that $\mathcal{F}$ has an envelope $F$ such that $\int F \, d\mu < \infty$. Given $\epsilon > 0$ let $M < \infty$ be such that $\int_{F > M} F \, d\mu < \epsilon$. For each $f \in \mathcal{F}$ define the truncation $f_M(x) = (f(x) \vee -M) \wedge M$, and let $\mathcal{F}_M = \{f_M : f \in \mathcal{F}\}$. By the preceding argument, there is a finite family $\Theta$ of $\epsilon$-brackets covering $\mathcal{F}_M$. Let $[g, h]$ be an element of $\Theta$; without loss of generality, we may assume that $|g|, |h| \leq M$. Define $g' = g \wedge (-F \, I(F > M))$ and $h' = h \vee (F \, I(F > M))$ and note that $g' \leq g \leq h \leq h'$. Moreover, $f_M \in [g, h]$ implies $f \in [g', h']$, so the finite family of brackets $\{[g', h'] : [g, h] \in \Theta\}$ covers $\mathcal{F}$. It is easy to see that

$$h' - g' = (h - g) \, I(F \leq M) + 2F \, I(F > M),$$

and therefore

$$\int (h' - g') \, d\mu \leq \int (h - g) \, d\mu + 2 \int_{F > M} F \, d\mu \leq 3\epsilon.$$

Let $\mathcal{F}$ be a family of measurable functions $f : \mathcal{X} \to \mathbb{R}$ with envelope $F(x)$. The graph of $f \in \mathcal{F}$ is defined by

$$G_f = \{(x, s) : x \in \mathcal{X} \text{ and } 0 \leq s \leq f(x) \text{ or } f(x) \leq s \leq 0\} \subseteq \mathcal{X} \times \mathbb{R}.$$

Let $\mathcal{G}(\mathcal{F}) = \{G_f : f \in \mathcal{F}\}$ be the family of graphs of functions in $\mathcal{F}$.

Proposition 2. Suppose that $\dim(\mathcal{G}(\mathcal{F})) < \infty$. If $\mu$ is any probability measure on $(\mathcal{X}, \mathcal{S})$ such that $\int F \, d\mu < \infty$, then $N_{[\,]}(\epsilon, \mathcal{F}, \mu) < \infty$ for each $\epsilon > 0$.

Proof: Suppose first that $\mathcal{F}$ is bounded, with constant envelope $M < \infty$. The finiteness of the bracketing numbers is not affected if we replace each function $f \in \mathcal{F}$ by $(f + M)/2M$, and we therefore assume that every $f \in \mathcal{F}$ takes values in $[0, 1]$. In this case,

$$G_f = \{(x, s) : x \in \mathcal{X} \text{ and } 0 \leq s \leq f(x) \leq 1\} \subseteq \mathcal{X} \times [0, 1].$$

Let $\lambda(\cdot)$ denote Lebesgue measure on the Borel subsets $\mathcal{B}$ of $[0, 1]$, and let $\nu = \mu \otimes \lambda$ denote the product measure on $(\mathcal{X} \times [0, 1], \mathcal{S} \otimes \mathcal{B})$.

Fix $\epsilon > 0$. As $\mathcal{G}(\mathcal{F})$ has finite VC dimension, Corollary 1 ensures that $\mathcal{G}(\mathcal{F})$ is covered by a finite collection $\Theta$ of $\epsilon$-brackets. Without loss of generality, we may represent the brackets in $\Theta$ in the form $[A, B]$, where $A, B \in \mathcal{S} \otimes \mathcal{B}$ and $A \subseteq B$. Let $[A, B]$ be a bracket in $\Theta$. For each $x \in \mathcal{X}$ define

$$g(x) = \text{ess-sup}(\{s : (x, s) \in A\}) \quad \text{and} \quad h(x) = \text{ess-inf}(\{s : (x, s) \in B^c\}),$$

where for $U \subseteq [0, 1]$ the essential supremum is $\text{ess-sup}(U) = \inf\{\alpha : \lambda(U \cap [0, \alpha]) = \lambda(U)\}$, and $\text{ess-inf}(U)$ is defined analogously. Routine arguments show that $g$ and $h$ are measurable, that $g \leq h$, and that $\nu(A \setminus G_g) = \nu(B^c \setminus G_h^c) = 0$. Moreover, for every function $f : \mathcal{X} \to [0, 1]$, $G_f \in [A, B]$ implies $G_f \in [G_g, G_h]$, which implies in turn that $g \leq f \leq h$.

It follows from the arguments above that the finite family of brackets $[g, h]$ derived from the elements of $\Theta$ covers $\mathcal{F}$. In order to assess the size of these brackets, note that

$$(G_h \setminus G_g)_x = \{s : (x, s) \in G_h \setminus G_g\} = \{s : g(x) < s \leq h(x)\}$$

and therefore by Fubini's theorem

$$\int (h(x) - g(x)) \, d\mu(x) = \int \lambda((G_h \setminus G_g)_x) \, d\mu(x) = \nu(G_h \setminus G_g) \leq \nu(B \setminus A) \leq \epsilon.$$

Thus every bracket $[g, h]$ derived in this way is an $\epsilon$-bracket under $\mu$.

The argument for an unbounded family $\mathcal{F}$ with an integrable envelope $F$ is similar to that for VC major families. Given $\epsilon > 0$ let $M < \infty$ be such that $\int_{F > M} F \, d\mu < \epsilon$. For each $f \in \mathcal{F}$ define the truncation $f_M(x) = (f(x) \vee -M) \wedge M$, and let $\mathcal{F}_M = \{f_M : f \in \mathcal{F}\}$. As $G_{f_M} = G_f \cap (\mathcal{X} \times [-M, M])$, it is easy to see that the dimension of $\mathcal{G}(\mathcal{F}_M)$ is no greater than that of $\mathcal{G}(\mathcal{F})$, and is therefore finite. The preceding argument shows that there is a finite collection of $\epsilon$-brackets covering $\mathcal{F}_M$, and these can be extended to $3\epsilon$-brackets covering $\mathcal{F}$ following the proof of Proposition 1.

Let $\mathbf{X} = X_1, X_2, \ldots$ be a stationary ergodic process taking values in $(\mathcal{X}, \mathcal{S})$. The ergodic theorem ensures that for every measurable set $C$ the sample averages $n^{-1} \sum_{i=1}^n I_C(X_i)$ converge almost surely to $P(X_1 \in C)$. A family $\mathcal{C} \subseteq \mathcal{S}$ satisfies a uniform law of large numbers with respect to $\mathbf{X}$ if the discrepancy

$$\Delta_n(\mathcal{C} : \mathbf{X}) = \sup_{C \in \mathcal{C}} \left| \frac{1}{n} \sum_{i=1}^n I_C(X_i) - P(X_1 \in C) \right|$$

tends to zero almost surely as $n$ tends to infinity, so that the relative frequencies of sets in $\mathcal{C}$ converge uniformly to their limiting probabilities.

For i.i.d. processes $\mathbf{X}$, Vapnik and Chervonenkis [17] gave necessary and sufficient conditions under which $\Delta_n(\mathcal{C} : \mathbf{X}) \to 0$. For VC-classes they established exponential inequalities of the form $P(\Delta_n(\mathcal{C} : \mathbf{X}) > t) \leq a \cdot n^{\dim(\mathcal{C})} \cdot \exp\{-b n t^2\}$, where $a, b$ are positive constants independent of $\mathbf{X}$ and $\mathcal{C}$. Consequently, VC classes have uniform laws of large numbers for any i.i.d. process. Talagrand [14] provided necessary and sufficient conditions for uniform laws of large numbers that strengthen those of [17]: for non-atomic distributions, $\Delta_n(\mathcal{C} : \mathbf{X})$ fails to converge to zero if and only if there is a set $A \in \mathcal{S}$ with $P(A) > 0$ such that $\mathcal{C}$ shatters every finite subset of $\{X_i : X_i \in A\}$.

Using the bracketing properties of VC classes established in the previous section one may immediately extend this result to the general ergodic case. The following theorem appears in Adams and Nobel [1] (under an additional Polish assumption), where there is also a discussion of related work on uniform laws of large numbers under dependent sampling.

Theorem 3. If $\mathcal{C}$ is a separable VC-class of sets and $\mathbf{X}$ is a stationary ergodic process, then $\Delta_n(\mathcal{C} : \mathbf{X}) \to 0$ almost surely as $n$ tends to infinity.

Proof: The stated convergence follows easily from Corollary 1 and standard arguments for the Blum-DeHardt law of large numbers (cf. [15, 7]).

One may establish uniform laws of large numbers for separable VC major and VC graph classes of functions in the general ergodic case using the bracketing results in Propositions 1 and 2, respectively. In [1] these results are derived directly from Theorem 3. Related work for families of functions, under a more general, scale specific, notion of dimension can be found in [2].
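The discrepancy $\Delta_n(\mathcal{C} : \mathbf{X})$ in Theorem 3 can be evaluated exactly for a simple VC class and a simple dependent process. The sketch below is an illustration under assumptions that are not part of the paper: the class is the family of half-lines $[0, t]$ in the unit interval, the process is an irrational rotation $X_i = (x_0 + i\alpha) \bmod 1$ (stationary and ergodic when $x_0$ is uniformly distributed), and the particular values of `alpha` and `x0` are arbitrary. For this class the supremum reduces to the familiar one-sample Kolmogorov-Smirnov statistic against the uniform distribution.

```python
import math


def rotation_path(n, alpha=math.sqrt(2) - 1, x0=0.123):
    """First n points X_i = (x0 + i * alpha) mod 1 of an irrational rotation.

    alpha and x0 are illustrative choices; a fixed x0 gives one sample path.
    """
    return [(x0 + i * alpha) % 1.0 for i in range(1, n + 1)]


def discrepancy_half_lines(sample):
    """sup over t in [0, 1] of |F_n(t) - t|, where F_n is the empirical
    distribution function of the sample and t is the uniform probability
    of the half-line [0, t]."""
    xs = sorted(sample)
    n = len(xs)
    # The supremum is attained at, or just below, an order statistic.
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


d_small = discrepancy_half_lines(rotation_path(10))
d_large = discrepancy_half_lines(rotation_path(10_000))
```

Comparing `d_small` and `d_large` shows the uniform convergence at work along a single path of a process that is far from independent: each point is a deterministic function of its predecessor.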

In the case where $\mathcal{X}$ is a complete separable metric space and $\mathcal{S}$ is the collection of Borel subsets of $\mathcal{X}$, one may prove Theorem 1 using arguments similar to those used in [1] to establish uniform laws of large numbers for VC classes under ergodic sampling. The details can be found in an earlier version [3] of the results presented here. Below we provide a simpler argument that does not require the Polish assumption. The new argument, which follows the outline of the proof in [1], employs several simplifications and improvements that were suggested by an anonymous referee of [1], in particular, the use of Hilbert space weak limits in the definition of splitting sets.

It follows from standard results on the $L_p$-covering numbers of VC classes (for example, Theorem 2.6.4 of [15]) that there exists a countable sub-family $\mathcal{C}_0$ of $\mathcal{C}$ such that $\inf_{C' \in \mathcal{C}_0} \mu(C' \triangle C) = 0$ for each $C \in \mathcal{C}$. An elementary argument then shows that

$$\sup_{C \in \mathcal{C}} \mu(\partial(C : \pi)) = \sup_{C \in \mathcal{C}_0} \mu(\partial(C : \pi))$$

for every finite partition $\pi$, and we may therefore assume that $\mathcal{C}$ is countable. Let $\mathcal{C} = \{C_1, C_2, \ldots\}$ and let $\mathcal{S}_0 = \sigma(\mathcal{C}) \subseteq \mathcal{S}$ be the sigma field generated by $\mathcal{C}$. Suppose that the uniform approximation property fails to hold for $\mathcal{C}$, that is, there exists a number $\eta > 0$ such that

$$\sup_{C \in \mathcal{C}} \mu(\partial(C : \pi)) > \eta \quad \text{for every finite measurable partition } \pi. \qquad (1)$$

Using the inequality (1) we construct a sequence of "splitting sets" $S_1, S_2, \ldots \subseteq \mathcal{X}$ from the sets in $\mathcal{C}$ in a stage-wise fashion. At the $k$th stage the splitting set $S_k$ is obtained from a sequential procedure that makes use of the splitting sets $S_1, \ldots, S_{k-1}$ produced at previous stages. The splitting sets are used to identify arbitrarily large finite collections of sets in $\mathcal{C}$ having full join. The existence of these collections implies that $\mathcal{C}$ has infinite VC dimension by Lemma A.

First stage. Define the refining sequence of joins $J_1(n) = C_1 \vee \cdots \vee C_n$ for $n \geq 1$. It follows from (1) that for each $n$ there is a set $C_1(n) \in \mathcal{C}$ whose boundary $G_1(n) = \partial(C_1(n) : J_1(n))$ has measure greater than $\eta$. Note that the sets $\{G_1(n) : n \geq 1\}$ are measurable $\mathcal{S}_0$. By standard results in functional analysis, there exists a subsequence $\{n_m\}$ and an $\mathcal{S}_0$-measurable function $h_1$ such that $\int g \, I_{G_1(n_m)} \, d\mu \to \int g \, h_1 \, d\mu$ as $m$ tends to infinity for every $g \in L_2(\mathcal{X}, \mathcal{S}_0, \mu)$. (The function $h_1$ is the weak limit of the indicator functions $I_{G_1(n_m)}$.) It follows that $0 \leq h_1 \leq 1$ and $\int h_1 \, d\mu \geq \eta$. Define the splitting set $S_1 = \{h_1 > 0\}$ and note that $\mu(S_1) \geq \eta$.

For simplicity, let $J_1(m)$, $C_1(m)$, and $G_1(m)$ denote, respectively, the quantities $J_1(n_m)$, $C_1(n_m)$, and $G_1(n_m)$ along the subsequence defining $h_1$. We adopt similar notation for subsequences encountered at subsequent stages.

Subsequent stages. Suppose now that we have constructed splitting sets $S_j$ at stages $j = 1, \ldots, k-1$, and wish to construct the splitting set $S_k$ at stage $k$. Begin by defining the refining sequence of joins $J_k(n) = S_1 \vee \cdots \vee S_{k-1} \vee C_1 \vee \cdots \vee C_n$ for $n \geq 1$. It follows from (1) that for each $n$ there is a set $C_k(n) \in \mathcal{C}$ whose boundary $G_k(n) = \partial(C_k(n) : J_k(n))$ has measure greater than $\eta$. Proceeding as in Stage 1, there is a subsequence $\{I_{G_k(m)}\}$ having a weak limit $h_k \in L_2(\mathcal{X}, \mathcal{S}_0, \mu)$ such that $0 \leq h_k \leq 1$ and $\int h_k \, d\mu \geq \eta$. Define the splitting set $S_k = \{h_k > 0\}$ and note that $\mu(S_k) \geq \eta$.

Construction of Full Joins. Fix an integer $L \geq 1$. As the measure of each splitting set $S_k$ is at least $\eta$, there exist positive integers $k_1 < k_2 < \ldots < k_{L+1}$ such that $\mu(\bigcap_{j=1}^{L+1} S_{k_j}) > 0$. To simplify notation, suppose that $k_j = j$. For $l = 1, \ldots, L+1$ define

$$Q_l = \bigcap_{j=1}^{l} S_j.$$

In what follows we will make repeated use of the elementary fact that $\int_B (h_1 \cdots h_l) \, d\mu > 0$ if and only if $\mu(B \cap Q_l) > 0$. We show that there exist sets $D_1, \ldots, D_L \in \mathcal{C}$ such that for each $l = 1, \ldots, L$,

$$\int_B (h_1 \cdots h_l) \, d\mu > 0 \quad \text{for every } B \in D_l \vee \cdots \vee D_L. \qquad (2)$$

The inequalities (2) are established by reverse induction, beginning with the case $l = L$. To this end, note that

$$0 < \int (h_1 \cdots h_{L+1}) \, d\mu = \lim_{m \to \infty} \int (h_1 \cdots h_L) \, I_{G_{L+1}(m)} \, d\mu,$$

and therefore $\mu(Q_L \cap G_{L+1}(m)) > 0$ for all $m$ sufficiently large. Fix such an $m$ and let $D = C_{L+1}(m)$. It follows from the definition of $G_{L+1}(m)$ that for some cell $A \in J_{L+1}(m)$,

$$\mu(Q_L \cap A) > 0 \quad \text{and} \quad \mu(A \cap D) \cdot \mu(A \cap D^c) > 0. \qquad (3)$$

The inclusion of the sets $S_1, \ldots, S_L$ in the definition of the joins $J_{L+1}(n)$ ensures that $Q_L$ is a finite union of cells of $J_{L+1}(m)$. The first relation in (3) then implies that $A$ is necessarily a subset of $Q_L$, and it follows from the second relation that $\mu(Q_L \cap D) \cdot \mu(Q_L \cap D^c) > 0$. Letting $D_L = D$, the last inequality implies (2) in the case $l = L$.

Suppose now that for some $1 < l \leq L$ we have identified sets $D_l, D_{l+1}, \ldots, D_L$ such that (2) holds. Then for each cell $B$ in the join $D_l \vee \cdots \vee D_L$,

$$0 < \int_B (h_1 \cdots h_l) \, d\mu = \lim_{m \to \infty} \int_B (h_1 \cdots h_{l-1}) \, I_{G_l(m)} \, d\mu.$$

Therefore, there exists an integer $m$ such that $\mu(B \cap Q_{l-1} \cap G_l(m)) > 0$ for every $B \in D_l \vee \cdots \vee D_L$. As the join $J_l(m)$ includes the first $n_m$ elements of $\mathcal{C}$, by enlarging $m$ if necessary we may assume that $J_l(m)$ includes $D_l, \ldots, D_L$. Let $D = C_l(m)$ and let $B$ be any cell of $D_l \vee \cdots \vee D_L$. The definition of $G_l(m)$ implies that for some cell $A \in J_l(m)$,

$$\mu(B \cap Q_{l-1} \cap A) > 0 \quad \text{and} \quad \mu(A \cap D) \cdot \mu(A \cap D^c) > 0. \qquad (4)$$

Both $Q_{l-1}$ and $B$ are equal to a union of cells of the partition $J_l(m)$, so the first relation in (4) implies that $A \subseteq B \cap Q_{l-1}$, and it then follows from the second relation that $\mu(B \cap Q_{l-1} \cap D)$ and $\mu(B \cap Q_{l-1} \cap D^c)$ are positive. As these inequalities hold for each $B \in D_l \vee \cdots \vee D_L$, we have $\int_{B'} (h_1 \cdots h_{l-1}) \, d\mu > 0$ for each $B' \in D \vee D_l \vee \cdots \vee D_L$. Letting $D_{l-1} = D$ completes the induction.

It follows from (2) that the sets $D_1, \ldots, D_L$ have full join, and as $L \geq 1$ was arbitrary, $\mathcal{C}$ has infinite VC dimension by Lemma A, which completes the proof of the theorem.

Remark: An inspection of the proof shows that the approximating partitions $\pi$ in the theorem can be taken to be measurable $\sigma(\mathcal{C})$. A simple counterexample shows that $\pi$ may not be chosen from the smaller family $\bigcup_{n=1}^{\infty} \sigma(C_1 \vee C_2 \vee \ldots \vee C_n)$. Let $\mathcal{X} = [0, 1]$ and let $\mu$ be Lebesgue measure. Let $a_1, a_2, \ldots$ be a sequence of positive real numbers such that $s = \sum_{n=1}^{\infty} a_n < 1$. Define $s_0 = 0$ and $s_n = \sum_{i=1}^n a_i$ for $n \geq 1$, and let $C_n = [s_{n-1}, s_n)$. Clearly, the VC-dimension of the class $\{C_1, C_2, \ldots\}$ equals 1, since the sets are disjoint. Define $J_n = C_1 \vee C_2 \vee \ldots \vee C_n$. Then the set $A_n = [s_n, 1]$ is a single element in $J_n$ with measure $1 - s_n > 1 - s > 0$. Moreover, both $A_n \cap C_{n+1}$ and $A_n \cap C_{n+1}^c$ have positive measure. Thus, for $n \geq 1$, $A_n \subseteq \partial(C_{n+1} : J_n)$ and $\mu(\partial(C_{n+1} : J_n)) > 1 - s$.

Acknowledgements

The authors are indebted to an anonymous referee of the earlier paper [1] who suggested the general form of Theorem 2, and whose detailed comments led to a simpler and more general proof. The authors would also like to acknowledge helpful discussions with Ramon van Handel, who provided feedback on an earlier version of this work [3], and who brought the papers [5, 12, 13] to our attention. The work presented in this paper was supported in part by NSF grant DMS-0907177.

References

[1] Adams, T.M. and Nobel, A.B. (2010) Uniform convergence of Vapnik-Chervonenkis classes under ergodic sampling. Annals of Probability (4) 1345-1367.

[2] Adams, T.M. and Nobel, A.B. (2010) The gap dimension and uniform laws of large numbers for ergodic processes. Preprint. arXiv:1007.2964v1

[3] Adams, T.M. and Nobel, A.B. (2010) Uniform approximation and bracketing properties of VC classes. Preprint. arXiv:1007.4037v1

[4] Assouad, P. (1983) Densité et dimension. Annales de l'Institut Fourier (3) 233-282. MR0723955 (86j:05022)

[5] Bourgain, J., Fremlin, D.H. and Talagrand, M. (1978) Pointwise compact sets of Baire measurable functions. American Journal of Mathematics.

[6] Devroye, L., Györfi, L. and Lugosi, G. (1996) A probabilistic theory of pattern recognition. Springer. MR1383093 (97d:68196)

[7] Dudley, R.M. (1999) Uniform Central Limit Theorems. Cambridge University Press, Cambridge. MR1720712 (2000k:60050)

[8] Gaenssler, P. and Stute, W. (1976) On uniform convergence of measures with applications to uniform convergence of empirical distributions. Empirical distributions and processes (Selected Papers, Meeting on Math. Stochastics, Oberwolfach, 1976). MR0433534

[9] Kearns, M.J. and Schapire, R.E. (1994) Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences.

[10] Matousek, J. (2002) Lectures on Discrete Geometry. Graduate Texts in Mathematics. Springer, New York. MR1899299 (2003f:52011)

[11] Pollard, D. (1984) Convergence of Stochastic Processes. Springer, New York. MR0762984 (86i:60074)

[12] Rosenthal, H.P. (1974) A characterization of Banach spaces containing l1. Proceedings of the National Academy of Sciences U.S.A.

[13] Talagrand, M. (1984) Pettis integral and measure theory. Memoirs of the American Mathematical Society (307). MR0756174 (86j:46042)

[14] Talagrand, M. (1987) The Glivenko-Cantelli problem. Annals of Probability.

[15] van der Vaart, A.W. and Wellner, J.A. (1996) Weak Convergence and Empirical Processes. Springer-Verlag, New York. MR1385671 (97g:60035)

[16] Vapnik, V.N. (2000) The nature of statistical learning theory. Second edition. Springer-Verlag, New York. MR1719582 (2001c:68110)

[17] Vapnik, V.N. and Chervonenkis, A.Ya. (1971) On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications 16