On the stability of bootstrap estimators
Andreas Christmann, Matias Salibian-Barrera, Stefan Van Aelst
arXiv [math.ST] (Date: November 9, 2011)
A. Christmann, M. Salibián-Barrera, and S. Van Aelst
University of Bayreuth, Department of Mathematics, Bayreuth, GERMANY. e-mail: [email protected]
University of British Columbia, Department of Statistics, Vancouver, CANADA. e-mail: [email protected]
University of Ghent, Department of Applied Mathematics and Computer Science, Ghent, BELGIUM. e-mail:
Abstract:
It is shown that bootstrap approximations of an estimator which is based on a continuous operator from the set of Borel probability measures defined on a compact metric space into a complete separable metric space are stable in the sense of qualitative robustness. Support vector machines based on shifted loss functions are treated as special cases.
Keywords and phrases: bootstrap, statistical machine learning, stability, support vector machine, robustness.
1. Introduction
The finite sample distribution of many nonparametric methods from statistical learning theory is unknown, because the distribution P from which the data were generated is unknown and because often only asymptotic results on the behaviour of such methods are known.

The goal of this paper is to show that bootstrap approximations of an estimator which is based on a continuous operator from the set of Borel probability distributions defined on a compact metric space into a complete separable metric space are stable in the sense of qualitative robustness. As a special case it is shown that bootstrap approximations for the support vector machine (SVM) are stable, both for the risk functional and for the SVM operator itself. The results can be interpreted as generalizations of theorems derived by [4].

The rest of the paper has the following structure. Section 2 gives the general result and Section 3 contains the results for SVMs. All proofs are given in the appendix.
2. On Qualitative Robustness of Bootstrap Estimators
If not otherwise mentioned, we will use the Borel σ-algebra B(A) on a set A and denote the Borel σ-algebra on R by B.

Assumption 1.
Let (Ω, A, µ) be a probability space, where µ is unknown, let (Z, d_Z) be a compact metric space, and let B(Z) be the Borel σ-algebra on Z. Denote the set of all Borel probability measures on (Z, B(Z)) by M(Z, B(Z)). On M(Z, B(Z)) we use the Borel σ-algebra B(M(Z, B(Z))) and the bounded Lipschitz metric d_BL, see (4.11). Let S be a statistical operator defined on M(Z, B(Z)) with values in a complete, separable metric space (W, d_W) equipped with its Borel σ-algebra B(W). Let Z_1, …, Z_n : (Ω, A, µ) → (Z, B(Z)), n ∈ N, be independent and identically distributed random variables and denote the image measure by P := Z_1 ∘ µ. Let S_n(Z_1, …, Z_n) be a statistic with values in (W, B(W)). Denote the empirical measure of (Z_1, …, Z_n) by P_n := (1/n) Σ_{i=1}^n δ_{Z_i}. The statistic S_n is defined via the operator

S : (M(Z, B(Z)), B(M(Z, B(Z)))) → (W, B(W)), where S(P_n) = S_n(Z_1, …, Z_n).

Denote the distribution of S_n(Z_1, …, Z_n) when Z_i i.i.d. ∼ P by L_n(S; P) := L(S_n(Z_1, …, Z_n)). Accordingly, we denote the distribution of S_n(Z_1, …, Z_n) when Z_i i.i.d. ∼ P_n by L_n(S; P_n).

Efron [9, 10] proposed the bootstrap, whose main idea is to approximate the unknown distribution L_n(S; P) by L_n(S; P_n). Note that these bootstrap approximations L_n(S; P_n) are (probability measure-valued) random variables with values in M(W, B(W)).

Following [4], we call a sequence of bootstrap approximations L_n(S; P_n) qualitatively robust at P ∈ M(Z, B(Z)) if the sequence of transformations

g_n : M(Z, B(Z)) → M(W, B(W)), g_n(Q) = L(L_n(S; Q_n)), n ∈ N, (2.1)

is asymptotically equicontinuous at P ∈ M(Z, B(Z)), i.e. if

∀ ε > 0 ∃ δ > 0 ∃ n_0 ∈ N : d_BL(Q, P) < δ ⇒ sup_{n ≥ n_0} d_BL( L(L_n(S; Q_n)), L(L_n(S; P_n)) ) < ε. (2.2)

Following [4] again, we call a sequence of statistics (S_n)_{n ∈ N} uniformly qualitatively robust in a neighborhood U(P_0) of P_0 ∈ M(Z, B(Z)) if

∃ n_0 ∈ N ∀ ε > 0 ∀ n ≥ n_0 ∃ δ > 0 ∀ P ∈ U(P_0) : d_BL(Q, P) < δ ⇒ d_BL( L_n(S; Q), L_n(S; P) ) < ε. (2.3)

The following two results and Theorem 8 in the next section are the main results of this paper.

Theorem 2.
If Assumption 1 is valid and if S is uniformly continuous in a neighborhood U(P_0) of P_0 ∈ M(Z, B(Z)), then (S_n(Z_1, …, Z_n))_{n ∈ N} is uniformly qualitatively robust in U(P_0).

Theorem 3.
If Assumption 1 is valid and if (S_n(Z_1, …, Z_n))_{n ∈ N} is uniformly qualitatively robust in a neighborhood U(P_0) of P_0 ∈ M(Z, B(Z)), then the sequence L_n(S; P_n) of bootstrap approximations of L_n(S; P) is qualitatively robust for P_0.
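The bounded Lipschitz metric d_BL appearing in (2.2) and (2.3) can be computed exactly for finitely supported measures by a small linear program. The sketch below is our own illustration, not part of the paper: the discrete setting on the real line, the NumPy/SciPy tooling, and the function name `d_bl` are all assumptions. It maximizes ∫f dν_1 − ∫f dν_2 over functions with ‖f‖_BL ≤ 1:

```python
import numpy as np
from scipy.optimize import linprog

def d_bl(points, p, q):
    """Bounded Lipschitz distance between probability vectors p and q on
    the common support `points` (1-D array), via the linear program
        max_f  sum_i f_i (p_i - q_i)   s.t.  sup|f| + Lip(f) <= 1,
    encoded with auxiliary variables a >= sup|f|, c >= Lip(f), a + c <= 1."""
    m = len(points)
    d = np.abs(points[:, None] - points[None, :])
    cost = np.concatenate([-(p - q), [0.0, 0.0]])   # linprog minimizes
    A, b = [], []
    for i in range(m):                               # |f_i| <= a
        for s in (1.0, -1.0):
            row = np.zeros(m + 2)
            row[i], row[m] = s, -1.0
            A.append(row); b.append(0.0)
    for i in range(m):                               # |f_i - f_j| <= c * d_ij
        for j in range(i + 1, m):
            for s in (1.0, -1.0):
                row = np.zeros(m + 2)
                row[i], row[j], row[m + 1] = s, -s, -d[i, j]
                A.append(row); b.append(0.0)
    row = np.zeros(m + 2)                            # a + c <= 1
    row[m], row[m + 1] = 1.0, 1.0
    A.append(row); b.append(1.0)
    bounds = [(None, None)] * m + [(0.0, None), (0.0, None)]
    res = linprog(cost, A_ub=np.array(A), b_ub=np.array(b), bounds=bounds)
    return -res.fun

# Two point masses at 0 and 1: the optimal f has sup|f| = 1/3 and
# Lipschitz constant 2/3, giving d_BL(delta_0, delta_1) = 2/3.
print(d_bl(np.array([0.0, 1.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])))
```

On a compact support this is exactly the supremum over the unit ball of the ‖·‖_BL norm used later in (4.11).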
Corollary 4.
If Assumption 1 is valid and if S is a continuous operator, then the sequence L_n(S; P_n) of bootstrap approximations of L_n(S; P) is qualitatively robust for all P ∈ M(Z, B(Z)).

Remark 5.
Theorems 2 and 3 can be considered as a generalization of [4, Thm. 2, Thm. 3], who considered the case W := A ⊂ R a finite interval and real-valued random variables Z_1, …, Z_n. In our case, the statistics S_n(Z_1, …, Z_n) are W-valued statistics, where W is a complete separable metric space whose dimension can be infinite.
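To make the objects L_n(S; P_n) concrete, here is a small numerical sketch. It is entirely our illustration: the choice of the median and the mean as statistics, the sample size, and the contamination scheme are assumptions, not taken from the paper. It draws bootstrap resamples from the empirical measure of a sample from P and from a nearby contaminated Q; the bootstrap law of the median barely moves, while that of the (discontinuous-operator) mean shifts noticeably:

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_distribution(sample, statistic, n_boot=2000):
    """Monte Carlo approximation of L_n(S; P_n): the law of the statistic
    under i.i.d. resampling from the empirical measure of `sample`."""
    n = len(sample)
    idx = rng.integers(0, n, size=(n_boot, n))
    return np.array([statistic(sample[i]) for i in idx])

n = 200
x_p = rng.standard_normal(n)       # sample from P
x_q = x_p.copy()                   # sample from a nearby contaminated Q:
x_q[: n // 20] = 50.0              # 5% of the points replaced by gross outliers

boot_med_p = bootstrap_distribution(x_p, np.median)
boot_med_q = bootstrap_distribution(x_q, np.median)
boot_mean_p = bootstrap_distribution(x_p, np.mean)
boot_mean_q = bootstrap_distribution(x_q, np.mean)

# Crude proxy for the distance between the two bootstrap laws: the
# distance between their centers. The median's bootstrap law barely
# moves; the mean's is dragged along by the contamination.
print(abs(np.median(boot_med_p) - np.median(boot_med_q)))
print(abs(np.mean(boot_mean_p) - np.mean(boot_mean_q)))
```

This only visualizes the phenomenon behind Corollary 4; the theorems themselves quantify closeness via d_BL, not via the centers compared here.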
3. On Qualitative Robustness of Bootstrap SVMs
In this section we will apply the previous results to support vector machines, which belong to the modern class of statistical machine learning methods. That is, we will consider the special case that W is a reproducing kernel Hilbert space H used by a support vector machine (SVM). Note that H typically has an infinite dimension, which is true, e.g., if the popular Gaussian RBF kernel k : X × X → R, k(x, x′) := exp(−γ ‖x − x′‖²) for some γ > 0, is used. To state our result on the stability of bootstrap SVMs in Theorem 8 below, we need the following assumptions on the loss function and the kernel.

Assumption 6.
Let Z = X × Y be a compact metric space with metric d_Z, where Y ⊂ R is closed. Let L : X × Y × R → [0, ∞) be a loss function such that L is continuous and convex with respect to its third argument and such that L is uniformly Lipschitz continuous with respect to its third argument with uniform Lipschitz constant |L|_1 > 0, i.e. |L|_1 is the smallest constant c such that sup_{(x,y) ∈ X × Y} |L(x, y, t) − L(x, y, t′)| ≤ c |t − t′| for all t, t′ ∈ R. Denote the shifted loss function by L*(x, y, t) := L(x, y, t) − L(x, y, 0), (x, y, t) ∈ X × Y × R. Let k : X × X → R be a continuous kernel with reproducing kernel Hilbert space H and assume that k is bounded by ‖k‖_∞ := (sup_{x ∈ X} k(x, x))^{1/2} ∈ (0, ∞). Let λ ∈ (0, ∞).

These assumptions can be considered as standard assumptions for stable SVMs, see, e.g., [1] and [15, Chap. 10]. In this paper the RKHS H, the penalizing constant λ, and the loss function L, and thus the shifted loss function L*, are fixed. Therefore, in the next definition we write just S and R instead of S_{L*,H,λ} and R_{L*,H,λ} to shorten the notation.

Definition 7.
The SVM operator S : M(Z, B(Z)) → H is defined by

S(P) := f_{L*,P,λ} := arg min_{f ∈ H} E_P L*(X, Y, f(X)) + λ ‖f‖²_H. (3.4)

The SVM risk functional R : M(Z, B(Z)) → R is defined by

R(P) := E_P L*(X, Y, S(P)(X)) = E_P L*(X, Y, f_{L*,P,λ}(X)). (3.5)

If Assumption 6 is valid, then S is well-defined because S(P) ∈ H exists and is unique, R is well-defined because R(P) ∈ R exists and is unique, and it holds, for all P ∈ M(X × Y), that

‖S(P)‖_∞ ≤ λ^{-1} |L|_1 ‖k‖²_∞ < ∞ and |R(P)| ≤ λ^{-1} |L|²_1 ‖k‖²_∞ < ∞, (3.6)

see [2, Thm. 5, Thm. 6, (17), (18)].

Theorem 8.
If the general Assumption 1 and Assumption 6 are valid, then the SVM operator S and the SVM risk functional R fulfill:
(i) The sequence L_n(S; P_n) of bootstrap SVM estimators of L_n(S; P) is qualitatively robust for all P ∈ M(Z, B(Z)).
(ii) The sequence L_n(R; P_n) of bootstrap SVM risk estimators of L_n(R; P) is qualitatively robust for all P ∈ M(Z, B(Z)).
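For intuition, the SVM operator of Definition 7 can be evaluated on an empirical measure by finite-dimensional optimization via the representer theorem. The sketch below is our own illustration, not the paper's method: the logistic loss (Lipschitz in its third argument with |L|_1 = 1), the Gaussian RBF kernel, the synthetic data, and plain gradient descent are all assumptions made here for concreteness.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic classification sample (X_i, Y_i) defining an empirical measure P_n.
n = 60
X = rng.uniform(-1.0, 1.0, size=(n, 1))
y = np.where(X[:, 0] > 0, 1.0, -1.0)

gamma, lam = 1.0, 0.1

# Gaussian RBF kernel matrix; here sup_x k(x, x) = 1, so ||k||_inf = 1.
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-gamma * sq)

# By the representer theorem, S(P_n) = f = sum_i alpha_i k(x_i, .),
# so f(x_j) = (K alpha)_j, and the objective of (3.4) becomes
#   J(alpha) = (1/n) sum_i log(1 + exp(-y_i (K alpha)_i)) + lam * alpha' K alpha.
def objective(alpha):
    f = K @ alpha
    return np.mean(np.log1p(np.exp(-y * f))) + lam * alpha @ K @ alpha

alpha = np.zeros(n)
lr = 0.05
for _ in range(2000):
    f = K @ alpha
    # derivative of the logistic loss w.r.t. f(x_i) is -y_i / (1 + exp(y_i f(x_i)))
    dloss = -y / (1.0 + np.exp(y * f))
    alpha -= lr * (K @ dloss / n + 2.0 * lam * (K @ alpha))

f_hat = K @ alpha
# Sup-norm of the fitted decision function; it stays well below the
# a-priori bound |L|_1 ||k||_inf^2 / lambda (= 10 for these constants).
print(np.max(np.abs(f_hat)))
```

Bootstrapping this procedure, i.e. refitting `alpha` on resampled rows of the data, yields draws from L_n(S; P_n), the object Theorem 8(i) shows to be qualitatively robust.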
4. Proofs
For the proofs we need Theorem 9 and Theorem 10 below. To state Theorem 9 on uniform Glivenko-Cantelli classes, we need the following notation. For any metric space (S, d) and real-valued function f : S → R, we denote the bounded Lipschitz norm of f by

‖f‖_BL := sup_{x ∈ S} |f(x)| + sup_{x,y ∈ S, x ≠ y} |f(x) − f(y)| / d(x, y). (4.7)

Let F̃ be a set of measurable functions from (S, B(S)) to (R, B). For any function G : F̃ → R (such as a signed measure) define

‖G‖_F̃ := sup{ |G(f)| : f ∈ F̃ }. (4.8)

Theorem 9. [8, Prop. 12] For any separable metric space (S, d) and M ∈ (0, ∞),

F̃_M := { f : (S, B(S)) → (R, B); ‖f‖_BL ≤ M } (4.9)

is a universal Glivenko-Cantelli class. It is a uniform Glivenko-Cantelli class, i.e., for all ε > 0,

lim_{n → ∞} sup_{ν ∈ M(S, B(S))} Pr*( sup_{m ≥ n} ‖ν_m − ν‖_{F̃_M} > ε ) = 0, (4.10)

if and only if (S, d) is totally bounded. Here, Pr* denotes the outer probability.

Note that the term ‖ν_m − ν‖_{F̃_M} in (4.10) equals the bounded Lipschitz metric d_BL of the probability measures ν_m and ν if M = 1, i.e.

‖ν_m − ν‖_{F̃_1} = sup_{f ∈ F̃_1} |(ν_m − ν)(f)| = sup_{f; ‖f‖_BL ≤ 1} | ∫ f dν_m − ∫ f dν | =: d_BL(ν_m, ν), (4.11)

see [7, p. 394]. Hence, Theorem 9 can be interpreted as a generalization of [4, Lemma 1, p. 186], which says that if A ⊂ R is a finite interval, then d_BL(P_m, P) converges almost surely to 0 uniformly in P ∈ M(A, B(A)). For various characterizations of Glivenko-Cantelli classes, we refer to [16, Thm. 22] and [6].

We next list the other main result we need for the proof of Theorem 8. This result is an analogue of the famous Strassen theorem for the bounded Lipschitz metric d_BL instead of the Prohorov metric.

Theorem 10. [13, Thm. 4.2, p. 30] Let Z be a Polish space with topology τ_Z.
Let d_BL be the bounded Lipschitz metric defined on the set M(Z, B(Z)) of all Borel probability measures on Z. Then the following two statements are equivalent:
(i) There are random variables ξ_1 with distribution ν_1 and ξ_2 with distribution ν_2 such that E[d_BL(ξ_1, ξ_2)] ≤ ε.
(ii) d_BL(ν_1, ν_2) ≤ ε.

Proof of Theorem 2.
We closely follow the proof of [4, Thm. 2]. However, we use Theorem 9 instead of their Lemma 1 and we use [3, Lem. 1] instead of [12, Lem. 1]. Let 𝒫_n ⊂ M(Z, B(Z)) be the set of empirical distributions of order n ∈ N, i.e.

𝒫_n := { P_n ∈ M(Z, B(Z)); ∃ (z_1, …, z_n) ∈ Z^n such that P_n = (1/n) Σ_{i=1}^n δ_{z_i} }, (4.12)

and let E_n ⊂ 𝒫_n. If misunderstandings are unlikely, we identify E_n with the set {z_1, …, z_n} of atoms. It is enough to show that

∀ ε > 0 ∃ δ > 0 ∀ P ∈ U(P_0) ∃ sequence (E_n)_{n ∈ N} ⊂ 𝒫_n (4.13)

such that P^n(E_n) > 1 − ε and for all Q_n ∈ E_n and for all Q̃_n ∈ 𝒫_n we have

d_BL(Q_n, Q̃_n) < δ ⇒ d_W(S(Q_n), S(Q̃_n)) < ε. (4.14)

From this we obtain that (S_n)_{n ∈ N} is uniformly qualitatively robust by [3, Lem. 1].

Let ε > 0. Since the operator S is uniformly continuous in U(P_0) we obtain

∃ δ > 0 ∀ P ∈ U(P_0) : d_BL(P, Q) < δ ⇒ d_W(S(P), S(Q)) < ε/2. (4.15)

Hence by Theorem 9 for the special case M = 1 and by (4.11), we get

∃ n_0 ∈ N : sup_{P ∈ U(P_0)} Pr*( sup_{n ≥ n_0} d_BL(P_n, P) < δ/2 ) > 1 − ε. (4.16)

For n ≥ n_0 and P ∈ U(P_0), define

E_{n,P} := { Q_n ∈ 𝒫_n : d_BL(Q_n, P) < δ/2 }. (4.17)

It follows that P^n(E_{n,P}) > 1 − ε and that Q_n ∈ E_{n,P} together with d_BL(Q_n, Q̃_n) < δ/2 implies d_BL(Q_n, P) < δ/2 and d_BL(Q̃_n, P) < δ. The triangle inequality thus yields, due to (4.15),

d_W(S(Q_n), S(Q̃_n)) ≤ d_W(S(Q_n), S(P)) + d_W(S(P), S(Q̃_n)) < ε, (4.18)

from which the assertion follows.

Proof of Theorem 3.
The proof mimics the proof of [4, Thm. 3], but uses Theorem 9 instead of [4, Lem. 1]. Fix P_0 ∈ M(Z, B(Z)) and ε > 0. By the uniform qualitative robustness of (S_n)_{n ∈ N} in U(P_0), there exists n_1 ∈ N such that for all ε > 0 there is δ > 0 such that

d_BL(Q, P) < δ ⇒ sup_{m ≥ n_1} sup_{P ∈ U(P_0)} d_BL( L_m(S; Q), L_m(S; P) ) < ε. (4.19)

Define δ_1 := δ/2. Due to Theorem 9 for the special case M = 1 and by (4.11), we have, for all ε > 0,

lim_{n → ∞} sup_{P ∈ M(Z, B(Z))} Pr*( sup_{m ≥ n} d_BL(P_m, P) > ε ) = 0. (4.20)

Hence (4.19) and Varadarajan's theorem on the almost sure convergence of empirical measures to a Borel probability measure defined on a separable metric space, see e.g. [7, Thm. 11.4.1, p. 399], yield for the empirical distributions Q_n from Q and P_{0,n} from P_0 that

∃ n_2 > n_1 ∀ n ≥ n_2 : d_BL(Q, P_0) < δ_1 ⇒ d_BL(Q_n, P_{0,n}) < δ almost surely. (4.21)

It follows from the uniform qualitative robustness of (S_n)_{n ∈ N}, see (4.19), that

∃ n_0 ∈ N ∀ ε > 0 ∀ n ≥ n_0 ∃ δ > 0 ∀ P ∈ U(P_0) : d_BL(Q, P) < δ ⇒ d_BL( L_n(S; Q_n), L_n(S; P_{0,n}) ) < ε almost surely. (4.22)

For notational convenience, we write for the sequences of bootstrap estimators

ξ_{1,n} := L_n(S; Q_n), ξ_{2,n} := L_n(S; P_{0,n}), n ∈ N. (4.23)

Note that ξ_{1,n} and ξ_{2,n} are (measure-valued) random variables with values in the set M(W, B(W)). We denote the distribution of ξ_{j,n} by µ_{j,n} for j ∈ {1, 2} and n ∈ N. Hence (4.22) yields

d_BL(ξ_{1,n}, ξ_{2,n}) < ε almost surely for all n ≥ n_0 (4.24)

and it follows that

E[ d_BL(ξ_{1,n}, ξ_{2,n}) ] ≤ ε, ∀ n ≥ n_0. (4.25)

Now an application of the analogue of Strassen's theorem, see Theorem 10, yields

d_BL( L(ξ_{1,n}), L(ξ_{2,n}) ) ≤ ε ∀ n ≥ n_0, (4.26)

which completes the proof, because

L(ξ_{1,n}) = L(L_n(S; Q_n)) and L(ξ_{2,n}) = L(L_n(S; P_{0,n})). (4.27)

Proof of part (i).
By assumption, (Z, d_Z) is a compact metric space, where Z = X × Y. Let B(Z) be the Borel σ-algebra on Z. It is well-known that the bounded Lipschitz metric d_BL metrizes the weak topology on the space M(Z, B(Z)), see [7, Thm. 11.3.3], and that (M(Z, B(Z)), d_BL) is a compact metric space if and only if (Z, d_Z) is a compact metric space, see [14, p. 45, Thm. 6.4]. From the compactness of (M(Z, B(Z)), d_BL) it of course follows that this metric space is separable and totally bounded, see [5, Thm. 1.4.26].

Under the assumptions of the theorem we have, for all fixed λ ∈ (0, ∞), that the SVM operator S : M(Z, B(Z)) → H, S(P) = f_{L*,P,λ}, is well-defined because it exists and is unique, see [2, Thm. 5, Thm. 6], and is continuous with respect to the combination of the weak topology on M(Z, B(Z)) and the norm topology on H, see [11, Thm. 3.3, Cor. 3.4]. There it was also shown that the operator S̃ : M(Z, B(Z)) → C_b(Z), P ↦ f_{L*,P,λ}, is continuous with respect to the combination of the weak topology on M(Z, B(Z)) and the norm topology on C_b(Z). Because (M(Z, B(Z)), d_BL) is a compact metric space, the operators S and S̃ are therefore even uniformly continuous on the whole space M(Z, B(Z)) with respect to the mentioned topologies, see [5, Prop. 1.5.9].

Because the reproducing kernel Hilbert space W := H is a Hilbert space, H is complete. Furthermore, because the input space X is separable and the kernel k is continuous, the RKHS H is also separable, see [15, Lem. 4.33]. Therefore, Theorem 2 yields that the sequence of H-valued statistics

S_n((X_1, Y_1), …, (X_n, Y_n)) = arg min_{f ∈ H} (1/n) Σ_{i=1}^n L*(X_i, Y_i, f(X_i)) + λ ‖f‖²_H, n ∈ N, (4.28)

is uniformly qualitatively robust in a neighborhood U(P_0) for every probability measure P_0 ∈ M(Z). Now we apply Theorem 3, which yields that the sequence (L_n(S; P_n))_{n ∈ N} of bootstrap SVM estimators of L_n(S; P) is qualitatively robust for all P ∈ M(Z, B(Z)), which gives the first assertion of the theorem.

Proof of part (ii).
The proof consists of two steps. In Step 1 the continuity of the SVM risk functional R will be shown. In Step 2, Theorems 2 and 3 will be used to show that the sequence (L_n(R; P_n))_{n ∈ N} of bootstrap SVM risk estimators is qualitatively robust.

Step 1.
We will first show that the SVM risk functional R : M(Z, B(Z)) → R is continuous with respect to the combination of the weak topology on M(Z, B(Z)) and the standard topology on R. As mentioned in part (i), the assumption that (Z, d_Z) is a compact metric space implies that (M(Z, B(Z)), d_BL) is a compact metric space and hence this space is separable and totally bounded.

Under the assumptions of the theorem, the SVM operator S : M(Z, B(Z)) → H, S(P) = f_{L*,P,λ}, is well-defined because S(P) exists and is unique for all P ∈ M(Z, B(Z)) and for all λ ∈ (0, ∞), see [2, Thm. 5, Thm. 6]. Furthermore, S is continuous with respect to the combination of the weak topology on M(Z, B(Z)) and the norm topology on H, see [11, Thm. 3.3]. Hence the function

g_P : X × Y → R, g_P(x, y) := L*(x, y, S(P)(x)) = L*(x, y, f_{L*,P,λ}(x)) (4.29)

is well-defined. Because the kernel k is bounded and continuous, all functions f ∈ H, and hence in particular S(P) = f_{L*,P,λ} ∈ H, are continuous, see e.g. [15, Lem. 4.28, Lem. 4.29]. Hence the function g_P is continuous (with respect to (x, y)), because the loss function L and hence the shifted loss function L*(x, y, t) = L(x, y, t) − L(x, y, 0), (x, y, t) ∈ X × Y × R, are continuous. Furthermore, the function g_P is bounded, because (Z, d_Z) with Z := X × Y is by assumption a compact metric space, the Lipschitz continuous loss function L maps from X × Y × R to [0, ∞), and ‖S(P)‖_∞ ≤ λ^{-1} |L|_1 ‖k‖²_∞ < ∞, see [2, p. 314, (17)]. Hence g_P ∈ C_b(Z, R). Because the bounded Lipschitz metric d_BL metrizes the weak topology on M(Z, B(Z)), it follows that

∀ ε_1 > 0 ∃ δ_1 > 0 : d_BL(Q, P) < δ_1 ⇒ | ∫ g_P dQ − ∫ g_P dP | < ε_1. (4.30)

Recall that S : M(Z, B(Z)) → H is continuous with respect to the combination of the weak topology on M(Z, B(Z)) and the norm topology on H, see [11, Thm. 3.3]. Hence

∀ ε_2 > 0 ∃ δ_2 > 0 : d_BL(Q, P) < δ_2 ⇒ ‖S(Q) − S(P)‖_H < ε_2. (4.31)

Fix ε > 0. Define ε_1 := ε/3, ε_2 := ε / (3 |L|_1 ‖k‖_∞), and δ := min{δ_1, δ_2}. Using the triangle inequality in (4.33), the definition of the shifted loss function L* in (4.34), the definition of the function g_P in (4.35), the Lipschitz continuity of L in (4.36), and the well-known formula

‖f‖_∞ ≤ ‖k‖_∞ ‖f‖_H, f ∈ H, (4.32)

see e.g. [15, p. 124], we obtain that d_BL(Q, P) < δ implies

|R(Q) − R(P)|
= | ∫ L*(x, y, S(Q)(x)) dQ(x, y) − ∫ L*(x, y, S(P)(x)) dP(x, y) |
≤ | ∫ L*(x, y, S(Q)(x)) dQ(x, y) − ∫ L*(x, y, S(P)(x)) dQ(x, y) | (4.33)
+ | ∫ L*(x, y, S(P)(x)) dQ(x, y) − ∫ L*(x, y, S(P)(x)) dP(x, y) |
≤ ∫ | L(x, y, S(Q)(x)) − L(x, y, S(P)(x)) | dQ(x, y) (4.34)
+ | ∫ g_P dQ − ∫ g_P dP | (4.35)
≤ |L|_1 ‖S(Q) − S(P)‖_∞ + ε_1 (4.36)
≤ |L|_1 ‖k‖_∞ ‖S(Q) − S(P)‖_H + ε_1 (4.37)
≤ |L|_1 ‖k‖_∞ ε_2 + ε_1 = (2/3) ε, (4.38)

where (4.36) uses (4.30), (4.37) uses (4.32), and (4.38) uses (4.31). Hence, R is continuous with respect to the combination of the weak topology on M(Z, B(Z)) and the standard topology on (R, B).

Step 2.
Because (M(Z, B(Z)), d_BL) is a compact metric space and the risk functional R : M(Z, B(Z)) → R is continuous, R is even uniformly continuous with respect to the mentioned topologies, see [5, Prop. 1.5.9]. Obviously (W, d_W) := (R, |·|) is a complete separable metric space. Therefore, Theorem 2 yields that the sequence of R-valued statistics

R_n((X_1, Y_1), …, (X_n, Y_n)) = (1/n) Σ_{i=1}^n L*(X_i, Y_i, f_{L*,D,λ}(X_i)), n ∈ N,

where f_{L*,D,λ} := arg min_{f ∈ H} (1/n) Σ_{j=1}^n L*(X_j, Y_j, f(X_j)) + λ ‖f‖²_H, is uniformly qualitatively robust in a neighborhood U(P_0) for every probability measure P_0 ∈ M(Z). Now we apply Theorem 3, which yields that the sequence L_n(R; P_n) of bootstrap SVM risk estimators of L_n(R; P) is qualitatively robust for all P ∈ M(Z, B(Z)), which completes the proof.

References

[1] A. Christmann and I. Steinwart. Consistency and robustness of kernel based regression.
Bernoulli, 13:799–819, 2007.
[2] A. Christmann, A. Van Messem, and I. Steinwart. On consistency and robustness properties of support vector machines for heavy-tailed distributions. Statistics and Its Interface, 2:311–327, 2009.
[3] A. Cuevas. Qualitative robustness in abstract inference. J. Statist. Plann. Inference, 18:277–289, 1988.
[4] A. Cuevas and R. Romo. On robustness properties of bootstrap approximations. J. Statist. Plann. Inference, 1993.
[5] Z. Denkowski, S. Migórski, and N. Papageorgiou. An Introduction to Nonlinear Analysis: Theory. Kluwer Academic Publishers, Boston, 2003.
[6] R. M. Dudley. Uniform Central Limit Theorems. Cambridge University Press, Cambridge, 1999.
[7] R. M. Dudley. Real Analysis and Probability. Cambridge University Press, Cambridge, 2002.
[8] R. M. Dudley, E. Giné, and J. Zinn. Uniform and universal Glivenko-Cantelli classes. J. Theor. Prob., 4:485–510, 1991.
[9] B. Efron. Bootstrap methods: Another look at the jackknife. Annals of Statistics, 7:1–26, 1979.
[10] B. Efron. The Jackknife, the Bootstrap, and Other Resampling Plans, volume 38. CBMS Monograph, Society for Industrial and Applied Mathematics, Philadelphia, 1982.
[11] R. Hable and A. Christmann. Qualitative robustness of support vector machines. Journal of Multivariate Analysis, 102:993–1007, 2011.
[12] F. R. Hampel. A general qualitative definition of robustness. Ann. Math. Statist., 42:1887–1896, 1971.
[13] P. J. Huber. Robust Statistics. John Wiley & Sons, New York, 1981.
[14] K. R. Parthasarathy. Probability Measures on Metric Spaces. Academic Press, New York, 1967.
[15] I. Steinwart and A. Christmann. Support Vector Machines. Springer, New York, 2008.
[16] M. Talagrand. The Glivenko-Cantelli problem.