Uniform Approximation of Vapnik-Chervonenkis Classes
Terrence M. Adams∗ and Andrew B. Nobel†
September 2010
Abstract
For any family of measurable sets in a probability space, we show that either (i) the family has infinite Vapnik-Chervonenkis (VC) dimension or (ii) for every $\epsilon > 0$ there is a finite partition $\pi$ such that the $\pi$-boundary of each set has measure at most $\epsilon$. Immediate corollaries include the fact that a family with finite VC dimension has finite bracketing numbers, and satisfies uniform laws of large numbers for every ergodic process. From these corollaries, we derive analogous results for VC major and VC graph families of functions.

∗ Terrence Adams is with the Department of Defense, 9800 Savage Rd. Suite 6513, Ft. Meade, MD 20755.
† Andrew Nobel is with the Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, NC 27599-3260.

1 Introduction
Let $(\mathcal{X}, \mathcal{S}, \mu)$ be a probability space and let $\mathcal{C} \subseteq \mathcal{S}$ be a given family of measurable sets. The Vapnik-Chervonenkis dimension of $\mathcal{C}$ is a measure of its combinatorial complexity, specifically, the ability of $\mathcal{C}$ to separate finite sets of points. Given a finite set $D \subseteq \mathcal{X}$, let $\{C \cap D : C \in \mathcal{C}\}$ be the collection of subsets of $D$ selected by the members of $\mathcal{C}$. The family $\mathcal{C}$ is said to shatter $D$ if its elements can select every subset of $D$, or equivalently, if $|\{C \cap D : C \in \mathcal{C}\}| = 2^{|D|}$. Here and in what follows, $|A|$ denotes the cardinality of a given set $A$. The Vapnik-Chervonenkis (VC) dimension [17] of $\mathcal{C}$, denoted $\dim(\mathcal{C})$, is the largest integer $k$ such that $\mathcal{C}$ is able to shatter some set of cardinality $k$. If $\mathcal{C}$ can shatter arbitrarily large finite sets, then $\dim(\mathcal{C}) = +\infty$. A family of sets $\mathcal{C}$ is said to be a VC class if $\dim(\mathcal{C})$ is finite.

Let $\pi$ be a finite, measurable partition of $\mathcal{X}$. For every set $C \in \mathcal{C}$, the $\pi$-boundary of $C$, denoted $\partial(C : \pi)$, is the union of all the cells in $\pi$ that intersect both $C$ and its complement with positive probability. Formally,

$$\partial(C : \pi) = \bigcup \{ A \in \pi : \mu(A \cap C) \cdot \mu(A \cap C^c) > 0 \}.$$

Note that $\partial(C : \pi)$ depends on $\mu$, though this dependence is suppressed in our notation. We will call a family $\mathcal{C}$ finitely approximable if for every $\epsilon > 0$ there exists a finite, measurable partition $\pi$ of $\mathcal{X}$ such that $\mu(\partial(C : \pi)) \leq \epsilon$ for every $C \in \mathcal{C}$. Our principal result is the following.

Theorem 1.
Let $(\mathcal{X}, \mathcal{S}, \mu)$ be a probability space and let $\mathcal{C} \subseteq \mathcal{S}$ be any family of sets. Then either (i) $\mathcal{C}$ is finitely approximable or (ii) $\mathcal{C}$ has infinite VC dimension.

Theorem 1 extends immediately to finite positive measures; we restrict attention to the case of probability measures for simplicity. Gaenssler and Stute [8] studied $\pi$-boundaries in work on uniform convergence of measures. In conjunction with Theorem 1, their results show that if, for some VC class $\mathcal{C}$ and some sequence $\{\mu_n\}$ of finite measures, $\mu_n(A) \to \mu(A)$ for every $A \in \sigma(\mathcal{C})$, then this convergence is uniform over $\mathcal{C}$. One may establish the same conclusion using Corollary 1.

In general, alternatives (i) and (ii) of Theorem 1 are not mutually exclusive: there exist families $\mathcal{C}$ that are finitely approximable and have infinite VC dimension. Moreover, the finite approximability of $\mathcal{C}$ will generally depend on the measure $\mu$. To take a simple example, let $\mathcal{C}$ be the family of all Borel measurable subsets of the unit interval $[0,1]$. Then $\mathcal{C}$ clearly has infinite VC dimension. An easy argument shows that $\mathcal{C}$ is finitely approximable if $\mu$ has countable support, but that $\mathcal{C}$ is not finitely approximable if $\mu$ is absolutely continuous with respect to Lebesgue measure. As the following, equivalent, version of Theorem 1 makes clear, families with finite VC dimension are finitely approximable for any probability measure $\mu$.

Theorem 2.
Let $(\mathcal{X}, \mathcal{S})$ be a measurable space. If $\mathcal{C} \subseteq \mathcal{S}$ has finite VC dimension, then $\mathcal{C}$ is finitely approximable for any probability measure $\mu$.

Families of sets with finite VC dimension figure prominently in machine learning, empirical process theory and combinatorial geometry (cf. [11, 15, 6, 7, 16, 10]) and have been widely studied in these fields. The majority of this work concerns the combinatorial properties of VC classes, and related exponential probability inequalities for uniform laws of large numbers under independent sampling (see Section 3 below). The uniform approximation guaranteed by Theorem 2 provides new insights into the structure of VC classes.

Some immediate corollaries of Theorem 2 are explored in Sections 2 and 3 below, including new results on the bracketing properties of VC major and VC graph classes of functions. Approximation properties analogous to those of Theorems 1 and 2 may be established for classes of functions with finite fat-shattering (gap) dimension [9] by extending the arguments in Section 4.

The proof of Theorem 1 makes use of an equivalent version of the VC dimension that we now describe. Recall that the join of $k$ sets $A_1, \ldots, A_k \subseteq \mathcal{X}$, denoted $J = \bigvee_{i=1}^{k} A_i$, is the finite partition of $\mathcal{X}$ consisting of all non-empty intersections $\tilde{A}_1 \cap \cdots \cap \tilde{A}_k$, where $\tilde{A}_i \in \{A_i, A_i^c\}$ for $i = 1, \ldots, k$. Equivalently, $J$ consists of the non-empty atoms of the field generated by $A_1, \ldots, A_k$. The collection $A_1, \ldots, A_k \subseteq \mathcal{X}$ is said to be Boolean independent if $J$ has (maximal) cardinality $2^k$. The dual VC dimension, denoted $\dim^*(\mathcal{C})$, is the largest $k$ such that $\mathcal{C}$ contains $k$ Boolean independent sets. If $\mathcal{C}$ contains Boolean independent families of every finite size, then $\dim^*(\mathcal{C}) = +\infty$. The dual VC dimension was introduced by Assouad [4], and is so named because $\dim^*(\mathcal{C})$ is the VC dimension of the dual family $\{D_x : x \in \mathcal{X}\} \subseteq 2^{\mathcal{C}}$, where $D_x = \{C \in \mathcal{C} : x \in C\}$.
We will make use of the following elementary result, whose proof can be found in [4]; see also [10, 1].

Lemma A.
Let $\mathcal{C}$ be any collection of subsets of $\mathcal{X}$. The VC dimension $\dim(\mathcal{C})$ is finite if and only if the dual VC dimension $\dim^*(\mathcal{C})$ is finite.

In proving Theorem 1 we begin with the assumption that $\mathcal{C}$ is not finitely approximable, and then deduce from this that $\dim^*(\mathcal{C}) = +\infty$. Specifically, we show that for every $L \geq 1$ the family $\mathcal{C}$ contains a sub-family of $L$ Boolean independent sets. We note that Boolean independence plays a related role in work of Rosenthal [12], who shows that if a sequence of sets $\{C_n : n \geq 1\}$ contains no pointwise convergent subsequence, then there is an infinite subsequence $\{C_{n_m} : m \geq 1\}$, each finite subfamily of which is Boolean independent.

The construction of Boolean independent sets in Theorem 1 proceeds in stages. At each stage a splitting set is produced by means of a weak limit, and is then incorporated in the construction of the splitting sets at subsequent stages. The resulting sequence of splitting sets is used to identify Boolean independent collections of arbitrary finite size. As noted by Ramon van Handel (private communication), the proof of Theorem 1 has points of intersection with the construction of a critical set for product measures in Theorem 11-1-1 of Talagrand [13], and with the notion of weakly dense sequences in Čech-complete spaces employed by Bourgain, Fremlin, and Talagrand [5]. Essential differences emerge from a number of factors, including our focus on finite approximation under a fixed (but arbitrary) distribution in the absence of topological structure, as well as the recursive construction of splitting sets that is employed in the theorem.

The next two sections are devoted to corollaries of Theorem 1 for families of sets and functions with bounded combinatorial complexity. In Section 2 we establish that VC classes of sets have finite bracketing numbers, and deduce similar results for VC major and VC graph families of functions. In Section 3 we show that VC classes satisfy uniform laws of large numbers for every ergodic process.
The proof of Theorem 1 is presented in Section 4.
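The definitions of shattering, VC dimension and $\pi$-boundary given in the introduction can be checked by brute force on a small finite space. The following Python sketch is only an illustration under stated assumptions: the six-point ground set, the family of initial segments, the uniform measure and the two partitions are hypothetical choices made here, and none of the function names come from the paper.

```python
from fractions import Fraction
from itertools import combinations


def shatters(family, points):
    """Return True if the family selects every subset of `points`."""
    points = frozenset(points)
    traces = {frozenset(c) & points for c in family}
    return len(traces) == 2 ** len(points)


def vc_dimension(family, ground_set):
    """Largest k such that some k-point subset of ground_set is shattered."""
    ground = list(ground_set)
    dim = 0
    for k in range(1, len(ground) + 1):
        if any(shatters(family, d) for d in combinations(ground, k)):
            dim = k
        else:
            # Subsets of shattered sets are shattered, so no larger set works.
            break
    return dim


def boundary_measure(c, partition, mu):
    """Measure of the pi-boundary of c: the total mass of the cells that
    meet both c and its complement with positive mass."""
    c = frozenset(c)
    total = Fraction(0)
    for cell in partition:
        inside = sum(mu[x] for x in cell if x in c)
        outside = sum(mu[x] for x in cell if x not in c)
        if inside * outside > 0:
            total += inside + outside
    return total


# Hypothetical example: initial segments {0, ..., t-1} of a six-point space.
ground = range(6)
segments = [set(range(t)) for t in range(7)]
mu = {x: Fraction(1, 6) for x in ground}   # uniform probability measure
coarse = [{0, 1, 2}, {3, 4, 5}]            # two-cell partition
fine = [{x} for x in ground]               # partition into singletons

print("VC dimension:", vc_dimension(segments, ground))
print("worst boundary, coarse:", max(boundary_measure(c, coarse, mu) for c in segments))
print("worst boundary, fine:", max(boundary_measure(c, fine, mu) for c in segments))
```

In this toy setting the initial segments shatter single points but no pair, so the computed dimension is 1. The worst-case boundary measure is 1/2 under the two-cell partition and 0 under the partition into singletons, which mirrors how refining $\pi$ shrinks $\sup_{C} \mu(\partial(C : \pi))$.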
Let $\mathcal{F}$ be a family of measurable functions $f : \mathcal{X} \to \mathbb{R}$. We recall some basic definitions from the theory of empirical processes. A measurable function $F : \mathcal{X} \to [0, \infty)$ is said to be an envelope for $\mathcal{F}$ if $|f(x)| \leq F(x)$ for each $x \in \mathcal{X}$ and $f \in \mathcal{F}$. The family $\mathcal{F}$ is said to be separable if there is a countable sub-family $\mathcal{F}_0 \subseteq \mathcal{F}$ such that each function $f \in \mathcal{F}$ is a pointwise limit of a sequence of functions in $\mathcal{F}_0$. For each pair of measurable functions $g, h : \mathcal{X} \to \mathbb{R}$ with $g \leq h$, the bracket $[g, h]$ denotes the set of all measurable functions $f$ such that $g \leq f \leq h$ pointwise on $\mathcal{X}$. In particular, $[g, h]$ is said to be an $\epsilon$-bracket if $\int (h - g) \, d\mu \leq \epsilon$. For $\epsilon > 0$, the bracketing number $N_{[\,]}(\epsilon, \mathcal{F}, \mu)$ of $\mathcal{F}$ is the least number of $\epsilon$-brackets needed to cover $\mathcal{F}$. In general, the functions defining the minimal brackets need not be elements of $\mathcal{F}$.

Let a measure $\mu$ and family $\mathcal{C} \subseteq \mathcal{S}$ be fixed. The notions of separability and bracketing may be applied to $\mathcal{C}$ if we regard its elements as indicator functions. In this case we may assume, without loss of generality, that the lower and upper limits of each bracket are themselves indicator functions.

Corollary 1. If $\mathcal{C}$ is a separable VC class, then $N_{[\,]}(\epsilon, \mathcal{C}, \mu)$ is finite for every $\epsilon > 0$.

Proof:
By routine arguments, we may assume that $\mathcal{C}$ is countable. Fix $\epsilon > 0$. Let $\pi = \{A_1, \ldots, A_m\}$ be a finite measurable partition of $\mathcal{X}$ such that $\mu(\partial(C : \pi)) < \epsilon$ for every $C \in \mathcal{C}$, and assume without loss of generality that each cell of $\pi$ has positive $\mu$-measure. For each $C \in \mathcal{C}$, remove all points in $C$ from $A_j$ if $\mu(A_j \cap C) = 0$, and remove all points in $C^c$ from $A_j$ if $\mu(A_j \cap C^c) = 0$. Denote the resulting set by $B_j$. Clearly $B_j \subseteq A_j$ and $\mu(A_j \setminus B_j) = 0$, as $\mathcal{C}$ is countable. The definition of $B_j$ ensures that for each $C \in \mathcal{C}$ exactly one of the following relations holds: $B_j \subseteq C$, $B_j \subseteq C^c$, or $\mu(B_j \cap C) \cdot \mu(B_j \cap C^c) > 0$. Let $B_0 = \mathcal{X} \setminus \bigcup_{j=1}^{m} B_j$, and define the partition $\pi' = \{B_0, B_1, \ldots, B_m\}$. Given $C \in \mathcal{C}$ let $C_l = \bigcup \{B \in \pi' : B \subseteq C\}$ and $C_u = \bigcup \{B \in \pi' : B \cap C \neq \emptyset\}$. A straightforward argument shows that $C_l \subseteq C \subseteq C_u$, and that $\mu(C_u \setminus C_l) = \mu(\partial(C : \pi')) = \mu(\partial(C : \pi)) < \epsilon$. It follows that $\Theta = \{[C_l, C_u] : C \in \mathcal{C}\}$ is a collection of $\epsilon$-brackets covering $\mathcal{C}$. The cardinality of $\Theta$ is at most $2^{|\pi'|}$.

Let $\mathcal{F}$ be a family of measurable functions $f : \mathcal{X} \to \mathbb{R}$ with envelope $F$. For $f \in \mathcal{F}$ and $\alpha \in \mathbb{R}$ let $L_f(\alpha) = \{x : f(x) \leq \alpha\}$ be the $\alpha$-level set of $f$. Define $\mathcal{C}_\alpha = \{L_f(\alpha) : f \in \mathcal{F}\}$ to be the family of $\alpha$-level sets associated with functions in $\mathcal{F}$.

Proposition 1.
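Suppose that $\dim(\mathcal{C}_\alpha) < \infty$ for every $\alpha \in \mathbb{R}$. If $\mu$ is any probability measure on $(\mathcal{X}, \mathcal{S})$ such that $\int F \, d\mu < \infty$, then $N_{[\,]}(\epsilon, \mathcal{F}, \mu) < \infty$ for every $\epsilon > 0$.

Proof: Suppose first that $\mathcal{F}$ is bounded, with constant envelope $M < \infty$. Fix $\epsilon > 0$ and let $K$ be an integer such that $2M/K \leq \epsilon$. For each $f \in \mathcal{F}$ define the approximation

$$\tilde{f}(x) = M - \frac{2M}{K} \sum_{j=1}^{K} I(x \in L_f(\alpha_j)) \quad \text{with} \quad \alpha_j = M - \frac{2Mj}{K}.$$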
The choice of $M$ and $K$ ensures that $\tilde{f}(x) - \epsilon \leq f(x) \leq \tilde{f}(x)$ for each $x \in \mathcal{X}$. The dimension of $\mathcal{C}_{\alpha_j}$ is finite by assumption, and it then follows from Corollary 1 that there is a finite collection $\Theta_j$ of $\epsilon/2M$-brackets that covers the level sets $\{L_f(\alpha_j) : f \in \mathcal{F}\}$. For each $f \in \mathcal{F}$ let $[g_f^j, h_f^j]$ be a bracket in $\Theta_j$ containing $L_f(\alpha_j)$. With this identification, define upper and lower approximations of $f$ as follows:

$$\tilde{f}_l = M - \frac{2M}{K} \sum_{j=1}^{K} h_f^j(x) - \epsilon \quad \text{and} \quad \tilde{f}_u = M - \frac{2M}{K} \sum_{j=1}^{K} g_f^j(x).$$

An easy argument shows that $\tilde{f}_l \leq f \leq \tilde{f}_u$, and the family of brackets $\Theta = \{[\tilde{f}_l, \tilde{f}_u] : f \in \mathcal{F}\}$ is finite, as $|\Theta| \leq \prod_{j=1}^{K} |\Theta_j|$. Moreover,

$$\tilde{f}_u - \tilde{f}_l \leq \frac{2M}{K} \sum_{j=1}^{K} \left( h_f^j(x) - g_f^j(x) \right) + \epsilon,$$

and therefore $\int (\tilde{f}_u - \tilde{f}_l) \, d\mu \leq 2\epsilon$. Thus $\Theta$ is a finite family of $2\epsilon$-brackets covering $\mathcal{F}$.

Suppose now that $\mathcal{F}$ has an envelope $F$ such that $\int F \, d\mu < \infty$. Given $\epsilon > 0$ let $M < \infty$ be such that $\int_{F > M} F \, d\mu < \epsilon$. For each $f \in \mathcal{F}$ define the truncation $f_M(x) = (f(x) \vee -M) \wedge M$, and let $\mathcal{F}_M = \{f_M : f \in \mathcal{F}\}$. By the preceding argument, there is a finite family $\Theta$ of $\epsilon$-brackets covering $\mathcal{F}_M$. Let $[g, h]$ be an element of $\Theta$; without loss of generality, we may assume that $|g|, |h| \leq M$. Define $g' = g \wedge (-F \, I(F > M))$ and $h' = h \vee (F \, I(F > M))$ and note that $g' \leq g \leq h \leq h'$. Moreover, $f_M \in [g, h]$ implies $f \in [g', h']$, so the finite family of brackets $\{[g', h'] : [g, h] \in \Theta\}$ covers $\mathcal{F}$. It is easy to see that

$$h' - g' = (h - g) \, I(F \leq M) + 2F \, I(F > M),$$

and therefore $\int (h' - g') \, d\mu \leq \int (h - g) \, d\mu + 2 \int_{F > M} F \, d\mu \leq 3\epsilon$.

Let $\mathcal{F}$ be a family of measurable functions $f : \mathcal{X} \to \mathbb{R}$ with envelope $F(x)$. The graph of $f \in \mathcal{F}$ is defined by

$$G_f = \{(x, s) : x \in \mathcal{X} \text{ and } 0 \leq s \leq f(x) \text{ or } f(x) \leq s \leq 0\} \subseteq \mathcal{X} \times \mathbb{R}.$$

Let $\mathcal{G}(\mathcal{F}) = \{G_f : f \in \mathcal{F}\}$ be the family of graphs of functions in $\mathcal{F}$.

Proposition 2.
Suppose that $\dim(\mathcal{G}(\mathcal{F})) < \infty$. If $\mu$ is any probability measure on $(\mathcal{X}, \mathcal{S})$ such that $\int F \, d\mu < \infty$, then $N_{[\,]}(\epsilon, \mathcal{F}, \mu) < \infty$ for each $\epsilon > 0$.

Proof:
Suppose first that $\mathcal{F}$ is bounded, with constant envelope $M < \infty$. The finiteness of the bracketing numbers is not affected if we replace each function $f \in \mathcal{F}$ by $(f + M)/2M$, and we therefore assume that every $f \in \mathcal{F}$ takes values in $[0, 1]$. In this case

$$G_f = \{(x, s) : x \in \mathcal{X} \text{ and } 0 \leq s \leq f(x) \leq 1\} \subseteq \mathcal{X} \times [0, 1].$$

Let $\lambda(\cdot)$ denote Lebesgue measure on the Borel subsets $\mathcal{B}$ of $[0, 1]$, and let $\nu = \mu \otimes \lambda$ be the product measure on $(\mathcal{X} \times [0, 1], \mathcal{S} \otimes \mathcal{B})$.

Fix $\epsilon > 0$. As $\mathcal{G}(\mathcal{F})$ has finite VC dimension, Corollary 1 ensures that $\mathcal{G}(\mathcal{F})$ is covered by a finite collection $\Theta_0$ of $\epsilon$-brackets. Without loss of generality, we may represent the brackets in $\Theta_0$ in the form $[A, B]$, where $A, B \in \mathcal{S} \otimes \mathcal{B}$ and $A \subseteq B$. Let $[A, B]$ be a bracket in $\Theta_0$. For each $x \in \mathcal{X}$ define

$$g(x) = \operatorname{ess\text{-}sup}(\{s : (x, s) \in A\}) \quad \text{and} \quad h(x) = \operatorname{ess\text{-}inf}(\{s : (x, s) \in B^c\}),$$

where for $U \subseteq [0, 1]$ the essential supremum is $\operatorname{ess\text{-}sup}(U) = \inf\{\alpha : \lambda(U \cap [0, \alpha]) = \lambda(U)\}$, and $\operatorname{ess\text{-}inf}(U)$ is defined analogously. Routine arguments show that $g$ and $h$ are measurable, that $g \leq h$, and that $\nu(A \setminus G_g) = \nu(B^c \setminus G_h^c) = 0$. Moreover, for every function $f : \mathcal{X} \to [0, 1]$, $G_f \in [A, B]$ implies $G_f \in [G_g, G_h]$, which implies in turn that $g \leq f \leq h$.

It follows from the arguments above that the finite family $\Theta$ of brackets $[g, h]$ derived from the elements of $\Theta_0$ covers $\mathcal{F}$. In order to assess the size of these brackets, note that

$$(G_h \setminus G_g)_x = \{s : (x, s) \in G_h \setminus G_g\} = \{s : g(x) < s \leq h(x)\}$$

and therefore by Fubini's theorem

$$\int (h(x) - g(x)) \, d\mu(x) = \int \lambda((G_h \setminus G_g)_x) \, d\mu(x) = \nu(G_h \setminus G_g) \leq \nu(B \setminus A) \leq \epsilon.$$

Thus every element $[g, h]$ of $\Theta$ is an $\epsilon$-bracket under $\mu$.

The argument for an unbounded family $\mathcal{F}$ with an integrable envelope $F$ is similar to that for VC major families. Given $\epsilon > 0$ let $M < \infty$ be such that $\int_{F > M} F \, d\mu < \epsilon$. For each $f \in \mathcal{F}$ define the truncation $f_M(x) = (f(x) \vee -M) \wedge M$, and let $\mathcal{F}_M = \{f_M : f \in \mathcal{F}\}$. As $G_{f_M} = G_f \cap (\mathcal{X} \times [-M, M])$, it is easy to see that the dimension of $\mathcal{G}(\mathcal{F}_M)$ is no greater than that of $\mathcal{G}(\mathcal{F})$, and is therefore finite. The preceding argument shows that there is a finite collection of $\epsilon$-brackets covering $\mathcal{F}_M$, and these can be extended to $3\epsilon$-brackets covering $\mathcal{F}$ following the proof of Proposition 1.

Let $\mathbf{X} = X_1, X_2, \ldots$ be a stationary ergodic process taking values in $(\mathcal{X}, \mathcal{S})$. The ergodic theorem ensures that for every measurable set $C$ the sample averages $n^{-1} \sum_{i=1}^{n} I_C(X_i)$ converge almost surely to $P(X_1 \in C)$. A family $\mathcal{C} \subseteq \mathcal{S}$ satisfies a uniform law of large numbers with respect to $\mathbf{X}$ if the discrepancy

$$\Delta_n(\mathcal{C} : \mathbf{X}) = \sup_{C \in \mathcal{C}} \left| \frac{1}{n} \sum_{i=1}^{n} I_C(X_i) - P(X_1 \in C) \right|$$

tends to zero almost surely as $n$ tends to infinity, so that the relative frequencies of sets in $\mathcal{C}$ converge uniformly to their limiting probabilities.

For i.i.d. processes $\mathbf{X}$, Vapnik and Chervonenkis [17] gave necessary and sufficient conditions under which $\Delta_n(\mathcal{C} : \mathbf{X}) \to 0$. For VC classes they established exponential inequalities of the form $P(\Delta_n(\mathcal{C} : \mathbf{X}) > t) \leq a \cdot n^{\dim(\mathcal{C})} \cdot \exp\{-b n t^2\}$, where $a, b$ are positive constants independent of $\mathbf{X}$ and $\mathcal{C}$. Consequently, VC classes have uniform laws of large numbers for any i.i.d. process. Talagrand [14] provided necessary and sufficient conditions for uniform laws of large numbers that strengthen those of [17]: for non-atomic distributions, $\Delta_n(\mathcal{C} : \mathbf{X}) \not\to 0$ if and only if there is a set $A \in \mathcal{S}$ with $P(A) > 0$ such that $\mathcal{C}$ shatters every finite subset of $\{X_i : X_i \in A\}$.

Using the bracketing properties of VC classes established in the previous section one may immediately extend this result to the general ergodic case. The following theorem appears in Adams and Nobel [1] (under an additional Polish assumption), where there is also a discussion of related work on uniform laws of large numbers under dependent sampling.

Theorem 3. If $\mathcal{C}$ is a separable VC class of sets and $\mathbf{X}$ is a stationary ergodic process, then $\Delta_n(\mathcal{C} : \mathbf{X}) \to 0$ almost surely as $n$ tends to infinity.

Proof:
The stated convergence follows easily from Corollary 1 and standard arguments for the Blum-DeHardt law of large numbers (cf. [15, 7]).

One may establish uniform laws of large numbers for separable VC major and VC graph classes of functions in the general ergodic case using the bracketing results in Propositions 1 and 2, respectively. In [1] these results are derived directly from Theorem 3. Related work for families of functions, under a more general, scale-specific notion of dimension, can be found in [2].
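The discrepancy $\Delta_n(\mathcal{C} : \mathbf{X})$ can be computed exactly in simple cases. The Python sketch below is a numerical illustration only, under assumptions that are not taken from the paper: the process is an irrational rotation of the unit interval (stationary and ergodic when the starting point is uniform), and the family consists of the intervals $[0, t)$, which has VC dimension 1. For this family the supremum over $t$ reduces to a maximum over the order statistics of the sample, as in the Kolmogorov-Smirnov statistic. The function names are hypothetical.

```python
import math
import random

# Rotation angle: the golden ratio conjugate, an irrational number.
THETA = (math.sqrt(5.0) - 1.0) / 2.0


def rotation_process(n, theta, x0):
    """First n terms X_i = (x0 + i * theta) mod 1 of an irrational rotation."""
    return [(x0 + i * theta) % 1.0 for i in range(1, n + 1)]


def interval_discrepancy(sample):
    """sup over t in [0, 1] of |#{X_i < t} / n - t| for the family {[0, t)}.

    Under the uniform marginal P(X_1 in [0, t)) = t, so the supremum is
    determined by the order statistics x_(1) <= ... <= x_(n).
    """
    xs = sorted(sample)
    n = len(xs)
    return max(max(i / n - x, x - (i - 1) / n) for i, x in enumerate(xs, start=1))


rng = random.Random(0)
x0 = rng.random()  # uniform starting point makes the process stationary
for n in (10, 100, 1000, 10000):
    print(n, interval_discrepancy(rotation_process(n, THETA, x0)))
```

The printed values shrink as $n$ grows, in line with the uniform convergence asserted by Theorem 3 for this particular class and process. The example does not test the theorem in any generality; it only shows what the quantity $\Delta_n$ measures.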
In the case where $\mathcal{X}$ is a complete separable metric space and $\mathcal{S}$ is the Borel subsets of $\mathcal{X}$, one may prove Theorem 1 using arguments similar to those used in [1] to establish uniform laws of large numbers for VC classes under ergodic sampling. The details can be found in an earlier version [3] of the results presented here. Below we provide a simpler argument that does not require the Polish assumption. The new argument, which follows the outline of the proof in [1], employs several simplifications and improvements that were suggested by an anonymous referee of [1], in particular, the use of Hilbert space weak limits in the definition of splitting sets.

It follows from standard results on the $L_p$-covering numbers of VC classes (for example, Theorem 2.6.4 of [15]) that there exists a countable sub-family $\mathcal{C}_0$ of $\mathcal{C}$ such that $\inf_{C' \in \mathcal{C}_0} \mu(C' \triangle C) = 0$ for each $C \in \mathcal{C}$. An elementary argument then shows that

$$\sup_{C \in \mathcal{C}} \mu(\partial(C : \pi)) = \sup_{C \in \mathcal{C}_0} \mu(\partial(C : \pi))$$

for every finite partition $\pi$, and we may therefore assume that $\mathcal{C}$ is countable. Let $\mathcal{C} = \{C_1, C_2, \ldots\}$ and let $\mathcal{S}_0 = \sigma(\mathcal{C}) \subseteq \mathcal{S}$ be the sigma field generated by $\mathcal{C}$. Suppose that the uniform approximation property fails to hold for $\mathcal{C}$, that is, there exists a number $\eta > 0$ such that

$$\sup_{C \in \mathcal{C}} \mu(\partial(C : \pi)) > \eta \quad \text{for every finite measurable partition } \pi. \qquad (1)$$

Using the inequality (1) we construct a sequence of "splitting sets" $S_1, S_2, \ldots \subseteq \mathcal{X}$ from the sets in $\mathcal{C}$ in a stage-wise fashion. At the $k$th stage the splitting set $S_k$ is obtained from a sequential procedure that makes use of the splitting sets $S_1, \ldots, S_{k-1}$ produced at previous stages. The splitting sets are used to identify arbitrarily large finite collections of sets in $\mathcal{C}$ having full join. The existence of these collections implies that $\mathcal{C}$ has infinite VC dimension by Lemma A.

First stage.
Define the refining sequence of joins $J_1(n) = C_1 \vee \cdots \vee C_n$ for $n \geq 1$. It follows from (1) that for each $n$ there is a set $C_1(n) \in \mathcal{C}$ whose boundary $G_1(n) = \partial(C_1(n) : J_1(n))$ has measure greater than $\eta$. Note that the sets $\{G_1(n) : n \geq 1\}$ are $\mathcal{S}_0$-measurable. By standard results in functional analysis, there exists a subsequence $\{n_m\}$ and an $\mathcal{S}_0$-measurable function $h_1$ such that $\int g \, I_{G_1(n_m)} \, d\mu \to \int g \, h_1 \, d\mu$ as $m$ tends to infinity for every $g \in L_2(\mathcal{X}, \mathcal{S}_0, \mu)$. (The function $h_1$ is the weak limit of the indicator functions $I_{G_1(n_m)}$.) It follows that $0 \leq h_1 \leq 1$ and $\int h_1 \, d\mu \geq \eta$. Define the splitting set $S_1 = \{h_1 > 0\}$ and note that $\mu(S_1) \geq \eta$.

For simplicity, let $J_1(m)$, $C_1(m)$, and $G_1(m)$ denote, respectively, the quantities $J_1(n_m)$, $C_1(n_m)$, and $G_1(n_m)$ along the subsequence defining $h_1$. We adopt similar notation for subsequences encountered at subsequent stages.

Subsequent stages.
Suppose now that we have constructed splitting sets $S_j$ at stages $j = 1, \ldots, k-1$, and wish to construct the splitting set $S_k$ at stage $k$. Begin by defining the refining sequence of joins $J_k(n) = S_1 \vee \cdots \vee S_{k-1} \vee C_1 \vee \cdots \vee C_n$ for $n \geq 1$. It follows from (1) that for each $n$ there is a set $C_k(n) \in \mathcal{C}$ whose boundary $G_k(n) = \partial(C_k(n) : J_k(n))$ has measure greater than $\eta$. Proceeding as in Stage 1, there is a subsequence $\{I_{G_k(m)}\}$ having a weak limit $h_k \in L_2(\mathcal{X}, \mathcal{S}_0, \mu)$ such that $0 \leq h_k \leq 1$ and $\int h_k \, d\mu \geq \eta$. Define the splitting set $S_k = \{h_k > 0\}$ and note that $\mu(S_k) \geq \eta$.

Construction of Full Joins.
Fix an integer $L \geq 1$. As the measure of each splitting set $S_k$ is at least $\eta$, there exist positive integers $k_1 < k_2 < \cdots < k_{L+1}$ such that $\mu\left(\bigcap_{j=1}^{L+1} S_{k_j}\right) > 0$. Assume without loss of generality that $k_j = j$. For $l = 1, \ldots, L+1$ define

$$Q_l = \bigcap_{j=1}^{l} S_j.$$

In what follows we will make repeated use of the elementary fact that $\int_B (h_1 \cdots h_l) \, d\mu > 0$ if and only if $\mu(B \cap Q_l) > 0$. We show that there exist sets $D_1, \ldots, D_L \in \mathcal{C}$ such that for each $l = 1, \ldots, L$,

$$\int_B (h_1 \cdots h_l) \, d\mu > 0 \quad \text{for every } B \in D_l \vee \cdots \vee D_L. \qquad (2)$$

The inequalities (2) are established by reverse induction, beginning with the case $l = L$. To this end, note that

$$0 < \int (h_1 \cdots h_{L+1}) \, d\mu = \lim_{m \to \infty} \int (h_1 \cdots h_L) \, I_{G_{L+1}(m)} \, d\mu,$$

and therefore $\mu(Q_L \cap G_{L+1}(m)) > 0$ for $m$ sufficiently large. Fix such an $m$ and let $D = C_{L+1}(m)$. It follows from the definition of $G_{L+1}(m)$ that for some cell $A \in J_{L+1}(m)$,

$$\mu(Q_L \cap A) > 0 \quad \text{and} \quad \mu(A \cap D) \cdot \mu(A \cap D^c) > 0. \qquad (3)$$

The inclusion of the sets $S_1, \ldots, S_L$ in the definition of the joins $J_{L+1}(n)$ ensures that $Q_L$ is a finite union of cells of $J_{L+1}(m)$. The first relation in (3) then implies that $A$ is necessarily a subset of $Q_L$, and it follows from the second relation that $\mu(Q_L \cap D) \cdot \mu(Q_L \cap D^c) > 0$. Letting $D_L = D$, the last inequality implies (2) in the case $l = L$.

Suppose now that for some $1 < l \leq L$ we have identified sets $D_l, D_{l+1}, \ldots, D_L$ such that (2) holds. Then for each cell $B$ in the join $D_l \vee \cdots \vee D_L$,

$$0 < \int_B (h_1 \cdots h_l) \, d\mu = \lim_{m \to \infty} \int_B (h_1 \cdots h_{l-1}) \, I_{G_l(m)} \, d\mu.$$

Therefore, there exists an integer $m$ such that $\mu(B \cap Q_{l-1} \cap G_l(m)) > 0$ for every $B \in D_l \vee \cdots \vee D_L$. As the join $J_l(m)$ includes the first $n_m$ elements of $\mathcal{C}$, by enlarging $m$ if necessary we may assume that $J_l(m)$ includes $D_l, \ldots, D_L$. Let $D = C_l(m)$ and let $B$ be any cell of $D_l \vee \cdots \vee D_L$. The definition of $G_l(m)$ implies that for some cell $A \in J_l(m)$,

$$\mu(B \cap Q_{l-1} \cap A) > 0 \quad \text{and} \quad \mu(A \cap D) \cdot \mu(A \cap D^c) > 0. \qquad (4)$$

Both $Q_{l-1}$ and $B$ are equal to a union of cells of the partition $J_l(m)$, so the first relation in (4) implies that $A \subseteq B \cap Q_{l-1}$, and it then follows from the second relation that $\mu(B \cap Q_{l-1} \cap D)$ and $\mu(B \cap Q_{l-1} \cap D^c)$ are positive. As these inequalities hold for each $B \in D_l \vee \cdots \vee D_L$, we have $\int_{B'} (h_1 \cdots h_{l-1}) \, d\mu > 0$ for each $B' \in D \vee D_l \vee \cdots \vee D_L$. Letting $D_{l-1} = D$ completes the induction.

It follows from (2) that the sets $D_1, \ldots, D_L$ have full join, and as $L \geq 1$ was arbitrary, $\mathcal{C}$ has infinite VC dimension, which completes the proof of the theorem.

Remark:
An inspection of the proof shows that the approximating partitions $\pi$ in the theorem can be taken to be $\sigma(\mathcal{C})$-measurable. A simple counterexample shows that $\pi$ may not be chosen from the smaller family $\bigcup_{n=1}^{\infty} \sigma(C_1 \vee C_2 \vee \cdots \vee C_n)$. Let $\mathcal{X} = [0, 1]$ and let $\mu$ be Lebesgue measure. Let $a_1, a_2, \ldots$ be a sequence of positive real numbers such that $s = \sum_{n=1}^{\infty} a_n < 1$. Define $s_0 = 0$ and $s_n = \sum_{i=1}^{n} a_i$ for $n \geq 1$, and let $C_n = [s_{n-1}, s_n)$. Clearly, the VC dimension of the class $\{C_1, C_2, \ldots\}$ equals 1, since the sets are disjoint. Define $J_n = C_1 \vee C_2 \vee \cdots \vee C_n$. Then the set $A_n = [s_n, 1]$ is a single element in $J_n$ with measure $1 - s_n > 1 - s > 0$. Moreover, both $A_n \cap C_{n+1}$ and $A_n \cap C_{n+1}^c$ have positive measure. Thus, for $n \geq 1$, $A_n \subseteq \partial(C_{n+1} : J_n)$ and $\mu(\partial(C_{n+1} : J_n)) > 1 - s$.

Acknowledgements
The authors are indebted to an anonymous referee of the earlier paper [1] who suggested the general form of Theorem 2, and whose detailed comments led to a simpler and more general proof. The authors would also like to acknowledge helpful discussions with Ramon van Handel, who provided feedback on an earlier version of this work [3], and who brought the papers [5, 12, 13] to our attention. The work presented in this paper was supported in part by NSF grant DMS-0907177.
References

[1] Adams, T.M. and Nobel, A.B. (2010) Uniform convergence of Vapnik-Chervonenkis classes under ergodic sampling. Annals of Probability (4) 1345-1367.
[2] Adams, T.M. and Nobel, A.B. (2010) The gap dimension and uniform laws of large numbers for ergodic processes. Preprint. arXiv:1007.2964v1
[3] Adams, T.M. and Nobel, A.B. (2010) Uniform approximation and bracketing properties of VC classes. Preprint. arXiv:1007.4037v1
[4] Assouad, P. (1983) Densité et dimension. Annales de l'Institut Fourier (3) 233-282. MR0723955 (86j:05022)
[5] Bourgain, J., Fremlin, D.H. and Talagrand, M. (1978) Pointwise compact sets of Baire measurable functions. American Journal of Mathematics.
[6] Devroye, L., Györfi, L. and Lugosi, G. (1996) A Probabilistic Theory of Pattern Recognition. Springer. MR1383093 (97d:68196)
[7] Dudley, R.M. (1999) Uniform Central Limit Theorems. Cambridge University Press, Cambridge. MR1720712 (2000k:60050)
[8] Gaenssler, P. and Stute, W. (1976) On uniform convergence of measures with applications to uniform convergence of empirical distributions. Empirical Distributions and Processes (Selected Papers, Meeting on Math. Stochastics, Oberwolfach, 1976). MR0433534
[9] Kearns, M.J. and Schapire, R.E. (1994) Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences.
[10] Matousek, J. (2002) Lectures on Discrete Geometry. Graduate Texts in Mathematics. Springer, New York. MR1899299 (2003f:52011)
[11] Pollard, D. (1984) Convergence of Stochastic Processes. Springer, New York. MR0762984 (86i:60074)
[12] Rosenthal, H.P. (1974) A characterization of Banach spaces containing $\ell^1$. Proceedings of the National Academy of Sciences U.S.A.
[13] Talagrand, M. (1984) Pettis integral and measure theory. Memoirs of the American Mathematical Society (307). MR0756174 (86j:46042)
[14] Talagrand, M. (1987) The Glivenko-Cantelli problem. Annals of Probability.
[15] van der Vaart, A.W. and Wellner, J.A. (1996) Weak Convergence and Empirical Processes. Springer-Verlag, New York. MR1385671 (97g:60035)
[16] Vapnik, V.N. (2000) The Nature of Statistical Learning Theory. Second edition. Springer-Verlag, New York. MR1719582 (2001c:68110)
[17] Vapnik, V.N. and Chervonenkis, A.Ya. (1971) On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications 16.