Qualitative robustness for bootstrap approximations
Katharina Strohriegl, University of [email protected]
January 16, 2018
Abstract
An important property of statistical estimators is qualitative robustness, that is, small changes in the distribution of the data result only in small changes in the distribution of the estimator. Moreover, in practice the distribution of the data is commonly unknown, so bootstrap approximations can be used to approximate the distribution of the estimator. Hence qualitative robustness of the statistical estimator under the bootstrap approximation is a desirable property. Currently most theoretical investigations on qualitative robustness assume independent and identically distributed pairs of random variables. However, in practice this assumption is often not fulfilled. Therefore, we examine the qualitative robustness of bootstrap approximations for non-i.i.d. random variables, for example α-mixing and weakly dependent processes. In the i.i.d. case qualitative robustness is ensured via the continuity of the statistical operator representing the estimator, see Hampel (1971) and Cuevas and Romo (1993). We show that qualitative robustness of the bootstrap approximation is still ensured under the assumption that the statistical operator is continuous and under an additional assumption on the stochastic process. In particular, we require a convergence condition on the empirical measure of the underlying process, the so-called Varadarajan property.

Keywords: stochastic processes, qualitative robustness, bootstrap, α-mixing, weakly dependent
AMS:
1 Introduction

The overwhelming part of theoretical publications in statistical machine learning was done under the assumption that the data is generated by independent and identically distributed (i.i.d.) random variables. However, this assumption is not fulfilled in many practical applications, so that non-i.i.d. cases increasingly attract attention in machine learning. An important property of an estimator is robustness. It is well known that many classical estimators are not robust, which means that small changes in the distribution of the data generating process may highly affect the results; see for example Huber (1981), Hampel (1968), Jurečková and Picek (2006), or Maronna et al. (2006) for some books on robust statistics. Qualitative robustness is a continuity property of the estimator and means, roughly speaking: small changes in the distribution of the data only lead to small changes in the distribution (i.e. the performance) of the estimator. In this way the following kinds of "small errors" are covered: small errors in all data points (rounding errors) and large errors in only a small fraction of the data points (gross errors, outliers). Qualitative robustness of estimators has been defined originally in Hampel (1968) and Hampel (1971) in the i.i.d. case and has been generalized to estimators for stochastic processes in various ways, for example, in Papantoni-Kazakos and Gray (1979), Bustos (1980), which will be the one used here, Cox (1981), Boente et al. (1987), Zähle (2015), and Zähle (2016); for a more local consideration of qualitative robustness see, for example, Krätschmer et al. (2017).

Often the finite sample distribution of the estimator, or of the stochastic process of interest, is unknown, hence an approximation of the distribution is needed.
Commonly, the bootstrap is used to obtain an approximation of the unknown finite sample distribution by resampling from the given sample. The classical bootstrap, also called the empirical bootstrap, has been introduced by Efron (1979) for i.i.d. random variables. This concept is based on drawing a bootstrap sample (Z*_1, ..., Z*_m) of size m ∈ N with replacement out of the original sample (Z_1, ..., Z_n), n ∈ N, and approximating the theoretical distribution P_n of (Z_1, ..., Z_n) using the bootstrap sample. For the empirical bootstrap the approximation of the distribution via the bootstrap is given by the empirical distribution of the bootstrap sample (Z*_1, ..., Z*_m), hence

P*_n = ⊗_{j=1}^{n} ( (1/m) ∑_{i=1}^{m} δ_{Z*_i} ),

where δ_{Z_i} denotes the Dirac measure. The bootstrap sample itself has distribution ⊗_{i=1}^{m} ( (1/n) ∑_{i=1}^{n} δ_{Z_i} ).

For an introduction to the bootstrap see for example Efron and Tibshirani (1993) and van der Vaart (1998, Chapter 3.6). Besides the empirical bootstrap many other bootstrap methods have been developed in order to find good approximations also for non-i.i.d. observations, see for example Singh (1981), Lahiri (2003), and the references therein. In Section 2.2 the moving block bootstrap introduced by Künsch (1989) and Liu and Singh (1992) is used to approximate the distribution of an α-mixing stochastic process.

It is, also in the non-i.i.d. case, still desirable that the estimator is qualitatively robust even for the bootstrap approximation. That is, the distribution of the estimator under the bootstrap approximation L_{P*_n}(S_n), n ∈ N, of the assumed, ideal distribution P_n should still be close to the distribution of the estimator under the bootstrap approximation L_{Q*_n}(S_n), n ∈ N, of the real contaminated distribution Q_n. Remember that these are random objects, as P*_n respectively Q*_n are random.
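As a minimal illustration of Efron's empirical bootstrap described above (a sketch, not from the paper; the estimator, sample, and number of replicates are arbitrary choices):

```python
import random

def empirical_bootstrap(sample, estimator, n_boot=1000, m=None, seed=0):
    """Approximate the distribution of `estimator` by Efron's empirical
    bootstrap: draw m points with replacement from the sample and
    re-evaluate the estimator on each bootstrap sample."""
    rng = random.Random(seed)
    m = len(sample) if m is None else m
    return [estimator([rng.choice(sample) for _ in range(m)])
            for _ in range(n_boot)]

def mean(xs):
    return sum(xs) / len(xs)

# Distribution of the sample mean under resampling from a fixed sample.
boot_means = empirical_bootstrap([1.0, 2.0, 3.0, 4.0, 5.0], mean)
```

The list `boot_means` then serves as an approximation of the distribution of the sample mean under the (unknown) data distribution.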
For notational convenience all bootstrap values are denoted as usual with an asterisk.

To show qualitative robustness, generalizations of Hampel's theorem are often used, as it is often hard to show qualitative robustness directly. For the i.i.d. case Hampel's theorem ensures qualitative robustness of a sequence of estimators if these estimators are continuous and can be represented by a statistical operator which is continuous in the distribution of the data generating stochastic process. Accordingly we try to find results similar to Hampel's theorem for the case of bootstrap approximations in non-i.i.d. cases.

Generalizations of Hampel's theorem to non-i.i.d. cases can be found in Zähle (2015) and Zähle (2016). For a slightly different generalization of qualitative robustness, Hampel's theorem has been formulated for strongly stationary and ergodic processes in Cox (1981) and Boente et al. (1982). In Strohriegl and Hable (2016) a generalization of Hampel's theorem to a broad class of non-i.i.d. stochastic processes is given. Cuevas and Romo (1993) describe a concept of qualitative robustness of bootstrap approximations for the i.i.d. case and for real valued estimators; a generalization of Hampel's theorem to this case is also given. In Christmann et al. (2013, 2011) qualitative robustness of Efron's bootstrap approximation is shown for the i.i.d. case for a class of regularized kernel based learning methods, i.e. not necessarily real valued estimators. Moreover, Beutner and Zähle (2016) describe consistency of the bootstrap for plug-in estimators.

The next chapter contains a definition of qualitative robustness of the bootstrap approximation of an estimator and the main results. In Chapter 2.1 Theorem 2.2 shows qualitative robustness of the bootstrap approximation of an estimator for independent but not necessarily identically distributed random variables; Chapter 2.2 contains Theorems 2.6 and 2.7, which generalize the result in Christmann et al.
(2013) to α-mixing sequences with values in R^d. All proofs are deferred to the appendix.

Throughout this paper, let (Z, d_Z) be a Polish space with some metric d_Z and Borel-σ-algebra B. Denote by M(Z^N) the set of all probability measures on (Z^N, B^⊗N). Let (Z^N, B^⊗N, M(Z^N)) be the underlying statistical model. If nothing else is stated, we always use Borel-σ-algebras for topological spaces. Let (Z_i)_{i∈N} be the coordinate process on Z^N, that is, Z_i : Z^N → Z, (z_j)_{j∈N} ↦ z_i, i ∈ N. Then the process has law P^N under P^N ∈ M(Z^N). Moreover, let P_n := (Z_1, ..., Z_n) ∘ P^N be the n-th order marginal distribution of P^N for every n ∈ N and P^N ∈ M(Z^N). We are concerned with a sequence of estimators (S_n)_{n∈N} on the stochastic process (Z_i)_{i∈N}. The estimator may take its values in any Polish space H with some metric d_H; that is, S_n : Z^n → H for every n ∈ N.

Our work applies to estimators which can be represented by a statistical operator S : M(Z) → H, that is,

S(P_{w_n}) = S_n(w_n) = S_n(z_1, ..., z_n)  ∀ w_n = (z_1, ..., z_n) ∈ Z^n, ∀ n ∈ N,  (1)

where P_{w_n} denotes the empirical measure defined by P_{w_n}(B) := (1/n) ∑_{i=1}^{n} I_B(z_i), B ∈ B, for the observations w_n = (z_1, ..., z_n) ∈ Z^n. Examples of such estimators are M-estimators, R-estimators, see Huber (1981, Theorem 2.6), or Support Vector Machines, see Hable and Christmann (2011).

Based on the generalization of Hampel's concept of Π-robustness from Bustos (1980), we define qualitative robustness for bootstrap approximations for non-i.i.d. sequences of random variables. The stronger concept of Π-robustness is needed here, as we do not assume to have i.i.d. random variables, which are used in Cuevas and Romo (1993). Therefore the definition of qualitative robustness stated below is stronger than the definition in Cuevas and Romo (1993), i.e. if we use this definition for the i.i.d.
case, the assumption d_BL(P_n, Q_n) = d_BL(⊗_{i=1}^{n} P, ⊗_{i=1}^{n} Q) < δ implies d_BL(P, Q) < δ, where d_BL denotes the bounded Lipschitz metric. This can be seen similarly to the proof of Lemma 3.1 in Section 2.1.

Now, let P*^N be the approximation of P^N with respect to the bootstrap. Define the bootstrap sample (Z*_1, ..., Z*_n) as the first n coordinate projections Z*_i : Z^N → Z, where the law of the stochastic process (Z*_i)_{i∈N} has to be chosen according to the bootstrap procedure. For the empirical bootstrap, for example, the bootstrap sample is chosen via drawing with replacement from the given observations z_1, ..., z_ℓ, ℓ ∈ N. Hence the distribution of the bootstrap sample is ⊗_{k∈N} ( (1/ℓ) ∑_{i=1}^{ℓ} δ_{z_i} ), with finite sample distributions

⊗_{j=1}^{n} ( (1/ℓ) ∑_{i=1}^{ℓ} δ_{z_i} ) = (Z*_1, ..., Z*_n) ∘ ( ⊗_{k∈N} ( (1/ℓ) ∑_{i=1}^{ℓ} δ_{z_i} ) ).

Contrarily to the classical case of qualitative robustness, the distribution of the estimator under P*_n, L_{P*_n}(S_n), is a random probability measure, as the distribution P*_n = ⊗_{i=1}^{n} (1/ℓ) ∑_{i=1}^{ℓ} δ_{Z*_i}, Z*_i : Z^N → Z, is random. Hence the mapping z^N
↦ L_{P*_n}(S_n), z^N ∈ Z^N, is itself a random variable with values in M(H), i.e. in the space of probability measures on H, equipped with the weak topology on M(H). The measurability of this mapping is ensured by Beutner and Zähle (2016, Lemma D1).

Contrarily to the original definition of qualitative robustness in Bustos (1980), the bounded Lipschitz metric d_BL is used instead of the Prohorov metric π for the definition of qualitative robustness of the bootstrap approximation below. This is equivalent to Cuevas and Romo (1993). Let X be a separable metric space; then the bounded Lipschitz metric on the space of probability measures M(X) on X is defined by

d_BL(P, Q) := sup { | ∫ f dP − ∫ f dQ | ; f ∈ BL(X), ‖f‖_BL ≤ 1 },

where ‖·‖_BL := |·| + ‖·‖_∞ denotes the bounded Lipschitz norm with |f| = sup_{x≠y} |f(x) − f(y)| / d(x, y), ‖f‖_∞ := sup_x |f(x)| the supremum norm, and the space of bounded Lipschitz functions is defined as BL(X) := { f : X → R | f Lipschitz and ‖f‖_BL < ∞ }. This is due to technical reasons only. Both metrics metrize the weak topology on the space of all probability measures M(X) for Polish spaces X, see, for example, Huber (1981, Chapter 2, Corollary 4.3) or Dudley (1989, Theorem 11.3.3), and therefore can be replaced while adapting δ on the left-hand side of implication (2). If X is a Polish space, so is M(X) with respect to the weak topology, see Huber (1981, Chapter 2, Theorem 3.9). Hence the bounded Lipschitz metric on the right-hand side of implication (2) operates on a space of probability measures on the Polish space M(X). Therefore the Prohorov metric and the bounded Lipschitz metric can again be replaced while adapting ε in (2).
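For intuition, a crude numerical lower bound on d_BL between two empirical measures on R can be computed by maximizing |∫f dP − ∫f dQ| over one parametric family of functions with ‖f‖_BL ≤ 1 (a sketch, not from the paper; the family and grid are ad-hoc choices and only bound the supremum from below):

```python
def bl_lower_bound(xs, ys, grid_size=401):
    """Lower bound on the bounded Lipschitz distance between the empirical
    measures of two real samples xs and ys.  Uses the test functions
    f_c(x) = (2/3) * clip(x - c, -1/2, 1/2), which satisfy
    |f_c| + sup|f_c| = 2/3 + 1/3 = 1, i.e. ||f_c||_BL <= 1."""
    def f(x, c):
        return (2.0 / 3.0) * max(-0.5, min(0.5, x - c))

    lo = min(xs + ys) - 1.0
    hi = max(xs + ys) + 1.0
    step = (hi - lo) / (grid_size - 1)
    best = 0.0
    for k in range(grid_size):
        c = lo + k * step
        diff = abs(sum(f(x, c) for x in xs) / len(xs)
                   - sum(f(y, c) for y in ys) / len(ys))
        best = max(best, diff)
    return best
```

For the point masses δ_0 and δ_1 the true distance is 2/3 (take the Lipschitz constant 2/3 and sup norm 1/3), and the family above attains it at c = 1/2.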
Similarly to Cuevas and Romo (1993), the proofs of the theorems below rely on the fact that the set of bounded Lipschitz functions BL is a uniform Glivenko-Cantelli class, which implies uniform convergence of the bounded Lipschitz metric of the empirical measure to a limiting distribution, see Dudley et al. (1991). Therefore the definition is given with respect to the bounded Lipschitz metric.

Definition 2.1 (Qualitative robustness for bootstrap approximations)
Let P^N ∈ M(Z^N) and let P*^N ∈ M(Z^N) be the bootstrap approximation of P^N. Let P ⊂ M(Z^N) with P^N ∈ P. Let S_n : Z^n → H, n ∈ N, be a sequence of estimators. Then the sequence of bootstrap approximations (L_{P*_n}(S_n))_{n∈N} is called qualitatively robust at P^N with respect to P if, for every ε > 0, there is δ > 0 such that there is n_0 ∈ N such that for every n ≥ n_0 and for every Q^N ∈ P,

d_BL(P_n, Q_n) < δ  ⇒  d_BL( L(L_{P*_n}(S_n)), L(L_{Q*_n}(S_n)) ) < ε.  (2)

Here L(L_{P*_n}(S_n)) (respectively L(L_{Q*_n}(S_n))) denotes the distribution of the bootstrap approximation of the estimator S_n under P*_n (respectively Q*_n).

This definition of qualitative robustness with respect to the subset P indicates that we do not show (2) for arbitrary probability measures Q^N ∈ M(Z^N). All of our results require the contaminated process to at least have the same structure as the ideal process. This is due to the use of the bootstrap procedure. The empirical bootstrap, which is used below, only works well for a few processes, see for example Lahiri (2003), hence the assumptions on the contaminated process are necessary. To our best knowledge there are no results concerning qualitative robustness of the bootstrap approximation for general stochastic processes without any assumptions on the second process, and it is probably very hard to show this for every Q^N ∈ M(Z^N), respectively P = M(Z^N). Another difference to the classical definition of qualitative robustness in Bustos (1980) is the restriction to n ≥ n_0. As the results for the bootstrap are asymptotic results, we cannot achieve the equicontinuity for every n ∈ N, but only asymptotically.

As the estimators can be represented by a statistical operator which depends on the empirical measure, it is crucial to consider stochastic processes which at least provide convergence of their empirical measure. Therefore, Strohriegl and Hable (2016) proposed to choose Varadarajan processes.
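For intuition only: convergence of the empirical measure can be visualized numerically. The following sketch (not from the paper) uses i.i.d. uniform draws and the Kolmogorov distance as an easily computable proxy for weak convergence of the empirical measure; the Prohorov metric itself is hard to evaluate, and all concrete choices here are illustrative:

```python
import random

def kolmogorov_distance_to_uniform(sample):
    """sup-distance between the empirical CDF of `sample` and the
    CDF of the uniform distribution on [0, 1]."""
    xs = sorted(sample)
    n = len(xs)
    return max(max(abs((i + 1) / n - x), abs(i / n - x))
               for i, x in enumerate(xs))

rng = random.Random(1)
d_small = kolmogorov_distance_to_uniform([rng.random() for _ in range(20)])
d_large = kolmogorov_distance_to_uniform([rng.random() for _ in range(2000)])
# The distance shrinks as the sample grows, reflecting the Glivenko-Cantelli
# theorem, the simplest instance of the convergence required below.
```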
Let (Ω, A, µ) be a probability space, let (Z_i)_{i∈N}, Z_i : Ω → Z, i ∈ N, be a stochastic process, and let W_n := (Z_1, ..., Z_n). Then the stochastic process (Z_i)_{i∈N} is called a (strong) Varadarajan process if there exists a probability measure P ∈ M(Z) such that

π(P_{W_n}, P) → 0 almost surely, n → ∞.

The stochastic process (Z_i)_{i∈N} is called a weak Varadarajan process if

π(P_{W_n}, P) → 0 in probability, n → ∞.

Examples of Varadarajan processes are certain Markov chains, some mixing processes, ergodic processes, and processes which satisfy a law of large numbers for events in the sense of Steinwart et al. (2009, Definition 2.1); see Strohriegl and Hable (2016) for details.

2.1 Qualitative robustness for independent not identically distributed processes

In this section we relax the i.i.d. assumption with regard to the identical distribution. We assume the random variables Z_i, i ∈ N, to be independent, but not necessarily identically distributed. The result below generalizes Christmann et al. (2013, Theorem 3) and Christmann et al. (2011), as the assumptions on the stochastic process are weaker, as well as those on the statistical operator. Compared to Theorem 3 in Cuevas and Romo (1993), which shows qualitative robustness of the sequence of bootstrap estimators with values in R, we have to strengthen the assumptions on the sample space, but do not need the estimator to be uniformly continuous. But keep in mind that the assumption d_BL(P_n, Q_n) < δ implies d_BL(P, Q) < δ, which is used for the i.i.d. case, in Christmann et al. (2013) and Cuevas and Romo (1993).

Theorem 2.2
Let the sequence of estimators (S_n)_{n∈N} be represented by a statistical operator S : M(Z) → H via (1) for a Polish space H and let (Z, d_Z) be a totally bounded metric space. Let P^N = ⊗_{i∈N} P_i, P_i ∈ M(Z), be an infinite product measure such that the coordinate process (Z_i)_{i∈N}, Z_i : Z^N → Z, i ∈ N, is a strong Varadarajan process with limiting distribution P. Moreover define

P := { Q^N ∈ M(Z^N); Q^N = ⊗_{i∈N} Q_i, Q_i ∈ M(Z) }.

Let S : M(Z) → H be continuous at P with respect to d_BL and let the estimators S_n : Z^n → H, n ∈ N, be continuous. Then the sequence of bootstrap approximations (L_{P*_n}(S_n))_{n∈N} is qualitatively robust at P^N with respect to P.

Remark 2.3
The required properties of the statistical operator S and of the sequence of estimators (S_n)_{n∈N} in Theorem 2.2 ensure the qualitative robustness of (S_n)_{n∈N}, as long as the assumptions on the underlying stochastic processes are fulfilled. The proof shows that the bootstrap approximation of every sequence of estimators (S_n)_{n∈N} which is qualitatively robust in the sense of the definitions in Bustos (1980) and Strohriegl and Hable (2016, Definition 1) is qualitatively robust in the sense of Theorem 2.2. Hence Hampel's theorem for the i.i.d. case can be generalized to bootstrap approximations and to the case of not necessarily identically distributed random variables if qualitative robustness is based on the definition of Π-robustness.

Unfortunately, the assumption that the space (Z, d_Z) be totally bounded seems to be necessary. In the proof of Theorem 2.2 we use a result of Dudley et al. (1991) to show uniformity on the space of probability measures M(Z). This result needs the bounded Lipschitz functions to be a uniform Glivenko-Cantelli class, which is equivalent to (Z, d_Z) being totally bounded, see Dudley et al. (1991, Proposition 12). In order to weaken the assumption on (Z, d_Z), probably another way to show uniformity on the space of probability measures M(Z) has to be found.

A short look at the metrics used on Z^n is advisable. We consider Z^n as the n-fold product space of the Polish space (Z, d_Z). The product space Z^n is again a Polish space (in the product topology) and it is tempting to use a p-product metric d_{n,p} on Z^n, that is,

d_{n,p}((z_1, ..., z_n), (z'_1, ..., z'_n)) = ‖( d_Z(z_1, z'_1), ..., d_Z(z_n, z'_n) )‖_p,  (3)

where ‖·‖_p is a p-norm on R^n for 1 ≤ p ≤ ∞. For example, d_{n,2} is the Euclidean metric on R^n and d_{n,∞}((z_1, ..., z_n), (z'_1, ..., z'_n)) = max_i d_Z(z_i, z'_i); all these metrics are strongly equivalent. However, these common metrics do not cover the intuitive meaning of qualitative robustness, as the distance between two points in Z^n (i.e., two data sets) is small only if all coordinates are close together (small rounding errors). So points where only a small fraction of the coordinates are far off (gross errors) are excluded. Using these metrics, the qualitative robustness of the sample mean at every P^N ∈ M(Z^N) can be shown, see e.g. Strohriegl and Hable (2016, Proposition 1). But the sample mean is a highly non-robust estimator, as gross errors have great impact on the estimate. Following Boente et al. (1987), we use the metric d_n on Z^n:

d_n((z_1, ..., z_n), (z'_1, ..., z'_n)) = inf{ ε > 0 : ♯{ i : d_Z(z_i, z'_i) ≥ ε } / n ≤ ε }.  (4)

This metric on Z^n covers both kinds of "small errors". Though d_n is in general not strongly equivalent to d_{n,p}, it is topologically equivalent to the p-product metrics d_{n,p}, see Strohriegl and Hable (2016, Lemma 1). Hence Z^n is metrizable also with the metric d_n. Moreover, the continuity of S_n on Z^n is with respect to the product topology on Z^n, which can, due to the topological equivalence of these two metrics, be seen with respect to the common metrics d_{n,p}.

The next part gives two examples of stochastic processes of independent, but not necessarily identically distributed random variables which are Varadarajan processes. In particular these stochastic processes even satisfy a strong law of large numbers for events (SLLNE) in the sense of Steinwart et al. (2009) and therefore are, due to Strohriegl and Hable (2016, Theorem 2), strong Varadarajan processes. The first example is rather simple and describes a sequence of univariate normal distributions.

Example 1
Let (a_i)_{i∈N} ⊂ R be a sequence with lim_{i→∞} a_i = a ∈ R and let |a_i| ≤ c for some constant c > 0 and all i ∈ N. Let (Z_i)_{i∈N}, Z_i : Ω → R, be a stochastic process where the Z_i, i ∈ N, are independent and Z_i ∼ N(a_i, 1), i ∈ N. Then the process (Z_i)_{i∈N} is a strong Varadarajan process.

The second example concerns stochastic processes where the distributions of the random variables Z_i, i ∈ N, lie in a so-called shrinking ε-neighbourhood of a probability measure P.

Example 2
Let (Z, B) be a measurable space and let (Z_i)_{i∈N} be a stochastic process with independent random variables Z_i : Ω → Z, Z_i ∼ P_i, where P_i = (1 − ε_i)P + ε_i P̃_i for a sequence ε_i → 0, i → ∞, ε_i > 0, and P̃_i, P ∈ M(Z), i ∈ N. Then the process (Z_i)_{i∈N} is a strong Varadarajan process.

The next corollary shows that Support Vector Machines are qualitatively robust. For a detailed introduction to Support Vector Machines see e.g. Schölkopf and Smola (2002) and Steinwart and Christmann (2008). Let D_n := (z_1, z_2, ..., z_n) = ((x_1, y_1), (x_2, y_2), ..., (x_n, y_n)) be a given dataset.

Corollary 2.4
Let Z = X × Y, Y ⊂ R closed, be a totally bounded metric space and let (Z_i)_{i∈N} be a stochastic process where the random variables Z_i, i ∈ N, are independent and Z_i ∼ P_i := (1 − ε_i)P + ε_i P̃_i, P, P̃_i ∈ M(Z). Moreover let (λ_n)_{n∈N} be a sequence of positive real valued numbers with λ_n → λ_0, n → ∞, for some λ_0 > 0. Let H be a reproducing kernel Hilbert space with continuous and bounded kernel k and let S_{λ_n} : (X × Y)^n → H be the SVM estimator, which maps D_n to f_{L*,D_n,λ_n} for a continuous and convex loss function L : X × Y × Y → [0, ∞[. It is assumed that L(x, y, y) = 0 for every (x, y) ∈ X × Y and that L is additionally Lipschitz continuous in its last argument. Then for every ε > 0 there is δ > 0 such that there is n_0 ∈ N such that for all n ≥ n_0 and for every process (Z̃_i)_{i∈N}, where the Z̃_i are independent and have distribution Q_i, i ∈ N:

d_BL(P_n, Q_n) < δ ⇒ d_BL( L(L_{P*_n}(S_n)), L(L_{Q*_n}(S_n)) ) < ε.

That is, the sequence of bootstrap approximations is qualitatively robust if the second (contaminated) process (Z̃_i)_{i∈N} is still of the same kind, i.e. still independent, as the original uncontaminated process (Z_i)_{i∈N}.

2.2 α-mixing processes

Dropping the independence assumption, we now focus on real valued mixing processes, in particular on strongly stationary α-mixing, or strong mixing, stochastic processes. The mixing notion is an often used and well-accepted dependence notion which quantifies the degree of dependence of a stochastic process. There exist several types of mixing coefficients, but all of them are based on differences between probabilities, µ(A_1 ∩ A_2) − µ(A_1)µ(A_2). There is a large literature on this dependence structure. For a detailed overview on mixing see Bradley (2005), Bradley (2007a,b,c), and Doukhan (1994) and the references therein. The α-mixing structure has been introduced in Rosenblatt (1956).
Examples of relations between dependence structures and mixing coefficients can also be found in the references above. Let Ω be a set equipped with two σ-algebras A_1 and A_2 and a probability measure µ. Then the α-mixing coefficient is defined by

α(A_1, A_2, µ) := sup{ |µ(A_1 ∩ A_2) − µ(A_1)µ(A_2)| | A_1 ∈ A_1, A_2 ∈ A_2 }.

By definition the coefficient equals zero if the σ-algebras are independent. Moreover, mixing can be defined for stochastic processes. We follow Steinwart et al. (2009, Definition 3.1):

Definition 2.5
Let (Z_i)_{i∈N} be a stochastic process, Z_i : Ω → Z, i ∈ N, and let σ(Z_i) be the σ-algebra generated by Z_i, i ∈ N. Then the α-bi-mixing and the α-mixing coefficients are defined by

α((Z_i)_{i∈N}, µ, i, j) = α(σ(Z_i), σ(Z_j), µ),
α((Z_i)_{i∈N}, µ, n) = sup_{i≥1} α(σ(Z_i), σ(Z_{i+n}), µ).

A stochastic process (Z_i)_{i∈N} is called α-mixing with respect to µ if lim_{n→∞} α((Z_i)_{i∈N}, µ, n) = 0. It is called weakly α-bi-mixing with respect to µ if

lim_{n→∞} (1/n²) ∑_{i=1}^{n} ∑_{j=1}^{n} α((Z_i)_{i∈N}, µ, i, j) = 0.

Instead of Efron's empirical bootstrap, another bootstrap approach is used in order to represent the dependence structure of an α-mixing process. Künsch (1989) and Liu and Singh (1992) introduced the moving block bootstrap (MBB). Resampling of single observations often cannot preserve the dependence structure of the process, therefore they decided to take blocks of observations of length b instead. The dependence structure of the process is preserved within these blocks. The block length b increases with the number of observations n for asymptotic considerations. A slight modification of the original moving block bootstrap, see for example Politis and Romano (1990) and Shao and Yu (1993), is used in the next two theorems in order to avoid edge effects.

The proofs are based on central limit theorems for empirical processes. There are several results concerning the moving block bootstrap of the empirical process in the case of mixing processes, see for example Bühlmann (1994), Naik-Nimbalkar and Rajarshi (1994), and Peligrad (1998, Theorem 2.2) for α-mixing sequences and Radulović (1996) and Bühlmann (1995) for β-mixing sequences. To our best knowledge there are so far no results concerning qualitative robustness for bootstrap approximations of estimators for α-mixing stochastic processes. Therefore, Theorem 2.6 shows qualitative robustness for a stochastic process with values in R.
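To make the α-mixing coefficients concrete, they can be computed in closed form for a stationary two-state Markov chain (a hypothetical example for illustration, not from the paper; for σ-algebras generated by a binary random variable the supremum is attained on the atoms, since complements give the same value):

```python
def alpha_coefficient(T, n):
    """alpha(sigma(Z_1), sigma(Z_{1+n}), mu) for a stationary two-state
    Markov chain with transition matrix T = [[p00, p01], [p10, p11]]:
    max over atoms a, b of |P(Z_1 = a, Z_{1+n} = b) - pi_a * pi_b|."""
    # Stationary distribution pi of T (solves pi T = pi for two states).
    p01, p10 = T[0][1], T[1][0]
    pi = (p10 / (p01 + p10), p01 / (p01 + p10))

    # n-step transition matrix T^n by repeated multiplication.
    Tn = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        Tn = [[sum(Tn[i][k] * T[k][j] for k in range(2)) for j in range(2)]
              for i in range(2)]

    return max(abs(pi[a] * Tn[a][b] - pi[a] * pi[b])
               for a in range(2) for b in range(2))

T = [[0.9, 0.1], [0.2, 0.8]]  # an arbitrary, clearly dependent chain
# alpha_coefficient(T, n) decays geometrically (second eigenvalue 0.7),
# so this chain is alpha-mixing; an i.i.d. chain gives coefficient 0.
```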
The proof is based on Peligrad (1998, Theorem 2.2), which provides a central limit theorem under assumptions on the process that are weaker than those in Bühlmann (1994) and Naik-Nimbalkar and Rajarshi (1994). In the case of R^d-valued, d > 1, stochastic processes, stronger assumptions on the stochastic process are needed, as the central limit theorem in Bühlmann (1994) requires stronger assumptions, see Theorem 2.7.

Let Z_1, ..., Z_n, n ∈ N, be the first n projections of a real valued stochastic process (Z_i)_{i∈N} and let b ∈ N, b < n, be the block length. Then, for fixed n ∈ N, the sample can be divided into blocks B_{i,b} := (Z_i, ..., Z_{i+b−1}). If i > n − b + 1, we define Z_{n+j} = Z_j for the missing elements of the blocks. To get the MBB bootstrap sample W*_n = (Z*_1, ..., Z*_n), ℓ numbers I_1, ..., I_ℓ from the set {1, ..., n} are randomly chosen with replacement. Without loss of generality it is assumed that n = ℓb; if n is not a multiple of b we simply cut the last block, which is usually done in the literature. Then the sample consists of the blocks B_{I_1,b}, B_{I_2,b}, ..., B_{I_ℓ,b}, that is, Z*_1 = Z_{I_1}, Z*_2 = Z_{I_1+1}, ..., Z*_b = Z_{I_1+b−1}, Z*_{b+1} = Z_{I_2}, ..., Z*_{ℓb} = Z_{I_ℓ+b−1}.

As we are interested in estimators S_n, n ∈ N, which can be represented by a statistical operator S : M(Z) → H via S(P_{w_n}) = S_n(z_1, ..., z_n), for a Polish space H, see (1), the empirical measure of the bootstrap sample P_{W*_n} = (1/n) ∑_{i=1}^{n} δ_{Z*_i} should approximate the empirical measure of the original sample P_{W_n} = (1/n) ∑_{i=1}^{n} δ_{Z_i}. Contrarily to qualitative robustness in the case of independent and not necessarily identically distributed random variables (Theorem 2.2), the assumptions on the statistical operator S are strengthened for the case of α-mixing sequences. In particular, the statistical operator S is assumed to be uniformly continuous on (M(Z), d_BL).
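The circular block construction described above can be sketched as follows (an illustrative Python sketch, not from the paper; the wrap-around follows the convention Z_{n+j} = Z_j):

```python
import random

def moving_block_bootstrap(sample, b, seed=0):
    """Circular moving block bootstrap: resample ell = n // b block start
    indices with replacement and concatenate the blocks
    (Z_i, ..., Z_{i+b-1}), wrapping around via Z_{n+j} = Z_j."""
    n = len(sample)
    ell = n // b                  # assume n = ell * b (cut the rest otherwise)
    rng = random.Random(seed)
    out = []
    for _ in range(ell):
        i = rng.randrange(n)      # block start I_k, uniform on {0, ..., n-1}
        out.extend(sample[(i + j) % n] for j in range(b))
    return out

w_star = moving_block_bootstrap(list(range(12)), b=3)
```

Each run of b consecutive entries of the output is a contiguous stretch of the original sample (modulo wrap-around), which is how the MBB preserves the dependence structure within blocks.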
For the first theorem we assume the random variables Z_i, i ∈ N, to be real valued and bounded; without loss of generality we assume 0 ≤ Z_i ≤ 1, otherwise a transformation leads to this assumption. For the bootstrap, for the true as well as for the contaminated process, we assume the block length b(n) and the number of blocks ℓ(n) to be sequences of integers satisfying

n^h ∈ O(b(n)), b(n) ∈ O(n^{1/2−a}) for some 0 < h < 1/2 − a, 0 < a < 1/2,
b(n) = b(2^q) for 2^q ≤ n < 2^{q+1}, q ∈ N, b(n) → ∞, n → ∞, and b(n)·ℓ(n) = n, n ∈ N.

Theorem 2.6
Let P^N ∈ M(R^N) be a probability measure on (R^N, B^⊗N) such that the coordinate process (Z_i)_{i∈N}, Z_i : R^N → R, is bounded, strongly stationary, and α-mixing with

∑_{m>n} α(σ(Z_1, ..., Z_i), σ(Z_{i+m}, ...), P^N) = O(n^{−γ}), i ∈ N, for some γ > 0.  (5)

Let
P ⊂ M(R^N) be the set of probability measures such that the coordinate process fulfils the properties above for the same γ > 0. Let H be a Polish space with some metric d_H, and let (S_n)_{n∈N} be a sequence of estimators which can be represented by a statistical operator S : M(R) → H via (1). Moreover, let S_n be continuous and let S additionally be uniformly continuous with respect to d_BL. Then the sequence of estimators (S_n)_{n∈N} is qualitatively robust at P^N with respect to P.

The assumptions on the stochastic process are, on the one hand, together with the assumptions on the block length, used to ensure the validity of the bootstrap approximation and, on the other hand, together with the assumptions on the statistical operator, respectively the sequence of estimators, to ensure the qualitative robustness.

The next theorem generalizes this result to stochastic processes with values in [0,1]^d, d > 1, instead of [0,1] ⊂ R. Therefore, for example, the bootstrap version of the SVM estimator is qualitatively robust under weak conditions. The proof of the next theorem follows the same lines as the proof of the theorem above, but another central limit theorem, shown in Bühlmann (1994), is used. Therefore the assumptions on the mixing property of the stochastic process are stronger and the random variables Z_i, i ∈ N, are assumed to have continuous marginal distributions. Again the bootstrap sample results from a moving block bootstrap where ℓ(n) blocks of length b(n) are chosen, again assuming ℓ(n)·b(n) = n. Moreover, let b(n) be a sequence of integers satisfying b(n) = O(n^{1/2−a}) for some a > 0.

Theorem 2.7
Assume Z = [0,1]^d, d > 1. Let P^N be a probability measure such that the coordinate process (Z_i)_{i∈N}, Z_i : Z^N → Z, is strongly stationary and α-mixing with

∑_{m=0}^{∞} (m + 1)^{d+7} α(σ(Z_1, ..., Z_i), σ(Z_{i+m}, ...), P^N) < ∞, i ∈ N.  (6)

Assume that Z_i has continuous marginal distributions for all i ∈ N. Define the set of probability measures P ⊂ M(Z^N) such that the coordinate process is strongly stationary and α-mixing as in (6). Let H be a Polish space with some metric d_H, let (S_n)_{n∈N} be a sequence of estimators such that S_n : Z^n → H is continuous, and assume that S_n can be represented by a statistical operator S : M(Z) → H via (1) which is additionally uniformly continuous with respect to d_BL. Then the sequence of estimators (S_n)_{n∈N} is qualitatively robust at P^N with respect to P.

Although the assumptions on the statistical operator S, compared to Theorem 2.2, were strengthened in order to generalize the qualitative robustness to α-mixing sequences in Theorems 2.6 and 2.7, M-estimators are still an example of qualitatively robust estimators if the sample space (Z, d_Z), Z ⊂ R, is compact. The compactness of (Z, d_Z) implies the compactness of the space (M(Z), d_BL), see Parthasarathy (1967, Theorem 6.4). As the statistical operator S is continuous, the compactness of M(Z) implies the uniform continuity of S. Another example of M-estimators which are uniformly continuous even if the input space is not compact is given in Cuevas and Romo (1993, Theorem 4).

Acknowledgements:
This research was partially supported by DFG Grant 291/2-1, "Support Vector Machines bei stochastischer Unabhängigkeit". Moreover, I would like to thank Andreas Christmann for helpful discussions on this topic.
This section contains the proofs of the main theorems and corollaries.

3.1 Proofs of Section 2.1
Before proving Theorem 2.1, we state a rather technical lemma connecting the product measure ⊗_{i=1}^n P_i ∈ M(Z^n) of independent random variables to their mixture measure (1/n) ∑_{i=1}^n P_i ∈ M(Z). Let (Z, d_Z) be a Polish space.

Lemma 3.1
Let P^n, Q^n ∈ M(Z^n) be such that P^n = ⊗_{i=1}^n P_i and Q^n = ⊗_{i=1}^n Q_i with P_i, Q_i ∈ M(Z), i ∈ N. Then, for all δ > 0:

d_BL(P^n, Q^n) ≤ δ  ⇒  d_BL( (1/n) ∑_{i=1}^n P_i , (1/n) ∑_{i=1}^n Q_i ) ≤ δ.

Proof:
Let BL(Z) be the set of bounded Lipschitz functions with ‖f‖_BL ≤ 1. By assumption we have d_BL(P^n, Q^n) ≤ δ. Moreover, for a function f: Z → R:

∫_Z f(z_i) dP_i(z_i) = ∫_{Z^{n−1}} ∫_Z f(z_i) dP_i(z_i) d( ⊗_{j≠i} P_j(z_j) ). (7)

Then,

sup_{f ∈ BL(Z)} | ∫_Z f(z_i) d[ (1/n) ∑_{i=1}^n P_i ](z_i) − ∫_Z f(z_i) d[ (1/n) ∑_{i=1}^n Q_i ](z_i) |
= sup_{f ∈ BL(Z)} | (1/n) ∑_{i=1}^n [ ∫_Z f(z_i) dP_i(z_i) − ∫_Z f(z_i) dQ_i(z_i) ] |
(7)= sup_{f ∈ BL(Z)} | (1/n) ∑_{i=1}^n [ ∫_{Z^{n−1}} ∫_Z f(z_i) dP_i(z_i) d( ⊗_{j≠i} P_j(z_j) ) − ∫_{Z^{n−1}} ∫_Z f(z_i) dQ_i(z_i) d( ⊗_{j≠i} Q_j(z_j) ) ] |
= sup_{f ∈ BL(Z)} | (1/n) ∑_{i=1}^n [ ∫_{Z^n} f(z_i) d( ⊗_{j=1}^n P_j(z_j) ) − ∫_{Z^n} f(z_i) d( ⊗_{j=1}^n Q_j(z_j) ) ] |
≤ (1/n) ∑_{i=1}^n sup_{f ∈ BL(Z)} | ∫_{Z^n} f(z_i) d( ⊗_{j=1}^n P_j(z_j) ) − ∫_{Z^n} f(z_i) d( ⊗_{j=1}^n Q_j(z_j) ) |.

Now every function f ∈ BL(Z) can be identified with a function f̃: Z^n → R, (z_1, ..., z_n) ↦ f̃(z_1, ..., z_n) := f(z_i). This function is also Lipschitz continuous on Z^n:

|f̃(z_1, ..., z_n) − f̃(z'_1, ..., z'_n)| = |f(z_i) − f(z'_i)| ≤ |f|_L d_Z(z_i, z'_i) ≤ |f|_L ( d_Z(z_1, z'_1) + ... + d_Z(z_i, z'_i) + ... + d_Z(z_n, z'_n) ),

where |f|_L denotes the Lipschitz constant of f and d_Z(z_1, z'_1) + ... + d_Z(z_n, z'_n) induces the product topology on Z^n. That is, f̃ ∈ BL(Z^n). Note that this is also true for every p-product metric d_{n,p} on Z^n, 1 ≤ p ≤ ∞, as they are strongly equivalent. Hence,

d_BL( (1/n) ∑_{i=1}^n P_i , (1/n) ∑_{i=1}^n Q_i ) ≤ (1/n) ∑_{i=1}^n sup_{g ∈ BL(Z^n)} | ∫_{Z^n} g dP^n − ∫_{Z^n} g dQ^n | ≤ (1/n) ∑_{i=1}^n d_BL(P^n, Q^n) ≤ δ,

which yields the assertion. □

Proof of Theorem 2.2:
To prove Theorem 2.2 we first use the triangle inequality to split the bounded Lipschitz distance between the distributions of the estimator S_n, n ∈ N, into two parts involving the distribution of the estimator under the joint distribution P^n of (Z_1, ..., Z_n):

d_BL( L_{P*_n}(S_n), L_{Q*_n}(S_n) ) ≤ d_BL( L_{P*_n}(S_n), L_{P^n}(S_n) ) [part I] + d_BL( L_{P^n}(S_n), L_{Q*_n}(S_n) ) [part II].

Then the representation of the estimator S_n by the statistical operator S and the continuity of this operator in P, together with the Varadarajan property and the independence assumption on the stochastic process, yield the assertion.

First we regard part I. Consider the distribution P^N ∈ M(Z^N) and let P*_N be the bootstrap approximation of P^N. Define, for n ∈ N, the random variables W_n: Z^N → Z^n, W_n = (Z_1, ..., Z_n), z^N ↦ W_n(z^N) = w_n = (z_1, ..., z_n), and W'_n: Z^N → Z^n, W'_n = (Z'_1, ..., Z'_n), z^N ↦ w'_n, such that W_n(P^N) = P^n and W'_n(P*_N) = P*_n. Denote the bootstrap sample by W*_n := (Z*_1, ..., Z*_n), W*_n: Z^N → Z^n, z^N ↦ w*_n. As Efron's empirical bootstrap is used, the bootstrap sample, which is chosen via resampling with replacement out of Z_1, ..., Z_ℓ, ℓ ∈ N, has distribution Z*_i ∼ P_{W_ℓ} = (1/ℓ) ∑_{j=1}^ℓ δ_{Z_j}, i ∈ N, respectively W*_n := (Z*_1, ..., Z*_n) ∼ ⊗_{i=1}^n P_{W_ℓ}.
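The resampling scheme just described, drawing each Z*_i i.i.d. from the empirical measure of the observed sample, is easy to state in code. A minimal sketch, not from the paper: the estimator S_n is illustrated by the sample median, and the function name is hypothetical.

```python
import random
import statistics

def efron_bootstrap(sample, estimator, n_boot=1000, seed=0):
    """Approximate the distribution of an estimator under Efron's
    empirical bootstrap: each bootstrap sample Z*_1, ..., Z*_n is drawn
    i.i.d. from the empirical measure of the observed sample (sampling
    with replacement), and the estimator is re-evaluated on it."""
    rng = random.Random(seed)
    n = len(sample)
    return [estimator([rng.choice(sample) for _ in range(n)])
            for _ in range(n_boot)]

# Illustration: the bootstrap distribution of the sample median.
data = [0.1, 0.4, 0.35, 0.8, 0.5, 0.2, 0.9, 0.6]
boot_medians = efron_bootstrap(data, statistics.median, n_boot=500)
```

The empirical distribution of `boot_medians` plays the role of L_{P*_n}(S_n) in the proof above.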
The bootstrap approximation of P ℓ , ℓ ∈ N ,is the empirical measure of the bootstrap sample P ∗ ℓ = ⊗ ℓi =1 1 n P nj =1 δ Z ∗ j .Further denote the joint distribution of W N , W ∗ N , and W ′ N by K N ∈ M ( Z N × Z N × Z N ) .Then, K N has marginal distributions K N ( B × Z N × Z N ) = P N ( B ) for all B ∈ B ⊗ N , K N ( Z N × B × Z N ) = ⊗ i ∈ N P W n ( B ) for all B ∈ B ⊗ N , and K N ( Z N × Z N × B ) = P ∗ N ( B ) for all B ∈ B ⊗ N .Then, L P n ( S n ) = S n ( P n ) = S n ◦ W n ( P N ) and L P ∗ n ( S n ) = S n ( P ∗ n ) = S n ◦ W ′ n ( P ∗ N ) d BL ( L P ∗ n ( S n ) , L P n ( S n )) = d BL ( L ( S n ◦ W ′ n ) , L ( S n ◦ W n )) . By assumption the coordinate process ( Z i ) i ∈ N consists of independent random variables,hence we have P n = ⊗ ni =1 P i , for P i = Z i ( P N ) , i ∈ N .Moreover ( Z , d Z ) is assumed to be a totally bounded metric space. Then, due to Dudley et al.(1991, Proposition 12), the set BL ( Z , d Z ) is a uniform Glivenko-Cantelli class. That is, if Z i ∼ P i.i.d. i ∈ N , we have for all η > : lim n →∞ sup P ∈M ( Z ) P N (cid:18)(cid:26) z N ∈ Z N | sup m ≥ n d BL ( P W m ( z N ) , P ) > η (cid:27)(cid:19) = 0 . Applying this to the bootstrap sample ( Z ∗ , . . . , Z ∗ m ) , m ∈ N , which is found by resamplingwith replacement out of the original sample ( Z , . . . , Z n ) , we have, for all w n ∈ Z n , lim n →∞ sup P w n ∈M ( Z ) ⊗ i ∈ N P w n (cid:18)(cid:26) z N ∈ Z N | sup m ≥ n d BL ( P W ∗ m ( z N ) , P w n ) > η (cid:27)(cid:19) = 0 . Let ε > be arbitrary but fixed. Then, for every δ > there is n ∈ N such that for all n ≥ n and all P w n ∈ M ( Z ) : ⊗ ni =1 P w n (cid:18)(cid:26) w ∗ n ∈ Z n | d BL ( P w ∗ n , P w n ) ≤ δ (cid:27)(cid:19) ≥ − ε . (8)And, using the same argumentation for the sequence of random variables Z ′ i , i ∈ N , whichare i.i.d. 
and have distribution n P ni =1 δ Z ∗ i = P W ∗ n : lim n →∞ sup P w ∗ n ∈M ( Z ) P ∗ N (cid:18)(cid:26) z N ∈ Z N | sup m ≥ n d BL ( P W ′ m ( z N ) , P w ∗ n ) > η (cid:27)(cid:19) = 0 . Respectively, for every δ > there is n ∈ N such that for all n ≥ n and all P w ∗ n ∈ M ( Z ) : P ∗ n (cid:18)(cid:26) w ′ n ∈ Z n | d BL ( P w ′ n , P w ∗ n ) ≤ δ (cid:27)(cid:19) ≥ − ε . (9)As the process ( Z i ) i ∈ N is a strong Varadarajan process by assumption, there exists a prob-ability measure P ∈ M ( Z ) such that d BL ( P W n , P ) −→ almost surely with respect to P N , n → ∞ . That is, for every δ > there is n ∈ N such that for all n ≥ n : P n (cid:18)(cid:26) w n ∈ Z n | d BL ( P w n , P ) ≤ δ (cid:27)(cid:19) ≥ − ε . (10)14he continuity of the statistical operator S : M ( Z ) → H in P ∈ M ( Z ) yields: for every ε > there exists δ > such that for all Q ∈ M ( Z ) : d BL ( P, Q ) ≤ δ ⇒ d H ( S ( P ) , S ( Q )) ≤ ε . (11)As the Prohorov metric π d H is bounded by the Ky Fan metric, see Dudley (1989, Theorem11.3.5) we conclude: π d H ( L P ∗ n ( S n ) , L P n ( S n )) = π d H ( S n ◦ W ′ n , S n ◦ W n ) ≤ inf (cid:8) ˜ ε > | K N (cid:0)(cid:8) d H ( S n ◦ W ′ n , S n ◦ W n ) > ˜ ε (cid:9)(cid:1) ≤ ˜ ε (cid:9) = inf (cid:8) ˜ ε > | ( W n , W ∗ n , W ′ n )( K N ) (cid:0)(cid:8) ( w n , w ∗ n , w ′ n ) ∈ Z n × Z n × Z n | d H ( S n ( w ′ n ) , S n ( w n )) > ˜ ε, w ∗ n ∈ Z n (cid:9)(cid:1) ≤ ˜ ε (cid:9) . (12)Due to the definition of the statistical operator S , this is equivalent to inf (cid:8) ˜ ε > | ( W n , W ∗ n , W ′ n )( K N ) (cid:0)(cid:8) ( w n , w ∗ n , w ′ n ) ∈ Z n × Z n × Z n | d H ( S ( P w ′ n ) , S ( P w n )) > ˜ ε, w ∗ n ∈ Z n (cid:9)(cid:1) ≤ ˜ ε (cid:9) . 
The triangle inequality d H ( S ( P w ′ n ) , S ( P w n )) ≤ d H ( S ( P w ′ n ) , S ( P )) + d H ( S ( P ) , S ( P w n )) , and the continuity of the statistical operator S , see (11), then yield, for all ε > , ( W n , W ∗ n , W ′ n )( K N ) (cid:16)n ( w n , w ∗ n , w ′ n ) ∈ Z n × Z n × Z n | d H ( S ( P w ′ n ) , S ( P w n )) > ε , w ∗ n ∈ Z n o(cid:17) ≤ ( W n , W ∗ n , W ′ n )( K N ) (cid:16)n ( w n , w ∗ n , w ′ n ) ∈ Z n × Z n × Z n | d H ( S ( P w ′ n ) , S ( P )) > ε or d H ( S ( P ) , S ( P w n )) > ε , w ∗ n ∈ Z n o(cid:17) (11) ≤ ( W n , W ∗ n , W ′ n )( K N ) (cid:0)(cid:8) ( w n , w ∗ n , w ′ n ) ∈ Z n × Z n × Z n | d BL ( P w ′ n , P ) > δ or d BL ( P, P w n ) > δ , w ∗ n ∈ Z n } ) . Using the triangle inequality, d BL ( P w ′ n , P ) ≤ d BL ( P w ′ n , P w ∗ n ) + d BL ( P w ∗ n , P ) (13)and d BL ( P w ∗ n , P ) ≤ d BL ( P w ∗ n , P w n ) + d BL ( P w n , P ) , (14)15ives for all n ≥ max { n , n , n } : ( W n , W ∗ n , W ′ n )( K N ) (cid:0)(cid:8) ( w n , w ∗ n , w ′ n ) ∈ Z n × Z n × Z n | d BL ( P w ′ n , P ) > δ or d BL ( P, P w n ) > δ , w ∗ n ∈ Z n } ) (13) ≤ ( W n , W ∗ n , W ′ n )( K N ) (cid:18)(cid:26) ( w n , w ∗ n , w ′ n ) ∈ Z n × Z n × Z n | d BL ( P w ′ n , P w ∗ n ) > δ or d BL ( P w ∗ n , P ) > δ or d BL ( P, P w n ) > δ (cid:27)(cid:19) (14) ≤ ( W n , W ∗ n , W ′ n )( K N ) (cid:18)(cid:26) ( w n , w ∗ n , w ′ n ) ∈ Z n × Z n × Z n | d BL ( P w ′ n , P w ∗ n ) > δ or d BL ( P w ∗ n , P w n ) > δ or d BL ( P, P w n ) > δ (cid:27)(cid:19) ≤ P ∗ n (cid:18)(cid:26) w ′ n ∈ Z n | d BL ( P w ′ n , P w ∗ n ) > δ (cid:27)(cid:19) + P n (cid:18)(cid:26) w n ∈ Z n | d BL ( P w n , P ) > δ (cid:27)(cid:19) + ⊗ ni =1 P w n (cid:18)(cid:26) w ∗ n ∈ Z n | d BL ( P w ∗ n , P w n ) > δ (cid:27)(cid:19) (8) , (9) , (10) < ε ε ε ε . Hence, for all ε > there are n , n , n ∈ N such that vor all n ≥ max { n , n , n } , theinfimum in (12) is bounded by ε . Therefore π d H ( L P ∗ n ( S n ) , L P n ( S n )) < ε . 
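The bounded Lipschitz distance between empirical measures, which drives all the bounds above, can be approximated from below by maximizing over a finite family of test functions with bounded Lipschitz norm. A rough numeric sketch under that assumption, using "tent" functions on R; this yields only a lower bound, not d_BL itself, and the scaling depends on the normalization convention used for the BL norm:

```python
def tent(t):
    # f_t(x) = max(0, 1 - |x - t|): bounded by 1 and 1-Lipschitz, hence a
    # valid test function up to the normalization convention of the BL norm.
    return lambda x: max(0.0, 1.0 - abs(x - t))

def bl_lower_bound(xs, ys, centers):
    """Lower-bound proxy for d_BL between the empirical measures of xs
    and ys: maximize |mean f(xs) - mean f(ys)| over tent functions."""
    best = 0.0
    for t in centers:
        f = tent(t)
        diff = abs(sum(map(f, xs)) / len(xs) - sum(map(f, ys)) / len(ys))
        best = max(best, diff)
    return best

# Identical samples give 0; samples shifted by 0.5 give a positive value
# that cannot exceed the shift, since the test functions are 1-Lipschitz.
a = [i / 10 for i in range(10)]
b = [x + 0.5 for x in a]
```

Calling `bl_lower_bound(a, a, a)` returns 0, while `bl_lower_bound(a, b, a + b)` is strictly positive and at most 0.5.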
The equivalence between the Prohorov metric and the bounded Lipschitz metric for Polishspaces, see Huber (1981, Chapter 2, Corollary 4.3), yields the existence of n , ∈ N suchthat for all n ≥ n , : d BL ( L P ∗ n ( S n ) , L P n ( S n )) < ε . (15)To prove the convergence of the term in part II, consider the distribution Q N ∈ M ( Z N ) andlet Q ∗ N be the bootstrap approximation of Q N . Define, for n ∈ N , the random variables ˜ W n : Z N → Z n , ˜ W n = ( ˜ Z , . . . , ˜ Z n ) , z N ˜ w n with distribution ˜ W n ( Q N ) = Q n , ˜ W ′ n : Z N → Z n , ˜ W ′ n = ( ˜ Z ′ , . . . , ˜ Z ′ n ) , z N ˜ w ′ n , with distribution ˜ W ′ n ( Q ∗ N ) = Q ∗ n , andthe bootstrap sample ˜ W ∗ n : Z N → Z n , ˜ W ∗ n = ( ˜ Z ∗ , . . . , ˜ Z ∗ n ) , z N ˜ w ∗ n , with distribution ⊗ ni =1 Q ˜ W ℓ = ⊗ ni =1 1 ℓ P ℓi =1 δ ˜ Z i . Moreover let ˜ K N ∈ M ( Z N × Z N × Z N × Z N ) denote the joint distribution of W N , ˜ W N , ˜ W ∗ N , and ˜ W ′ N . Then, ˜ K N ∈ M ( Z N × Z N × Z N × Z N ) has marginal distributions P N , Q N , ⊗ i ∈ N Q ˜ W n , and Q ∗ N . 16irst, similar to the argumentation for part I, Efron’s bootstrap and Dudley et al. (1991,Proposition 12) give for ˜ w n ∈ Z n : lim n →∞ sup Q ˜ w n ∈M ( Z ) ⊗ n ∈ N Q ˜ w n (cid:18)(cid:26) z N ∈ Z N | sup m ≥ n d BL ( Q ˜ W ∗ m ( z N ) , Q ˜ w n ) > η (cid:27)(cid:19) = 0 . Hence, for arbitrary, but fixed ε > , for every δ > there is n ∈ N such that for all n ≥ n and all Q ˜ w n ∈ M ( Z ) : ⊗ ni =1 Q ˜ w n (cid:18)(cid:26) ˜ w ∗ n ∈ Z n | d BL ( Q ˜ w ∗ n , Q ˜ w n ) ≤ δ (cid:27)(cid:19) ≥ − ε . (16)Further, lim n →∞ sup Q ˜ w ∗ n ∈M ( Z ) Q ∗ N (cid:18)(cid:26) z N ∈ Z N | sup m ≥ n d BL ( Q ˜ W ′ m ( z N ) , Q ˜ w ∗ n ) > η (cid:27)(cid:19) = 0 . Respectively, for every δ > there is n ∈ N such that for all n ≥ n and all Q ˜ w ∗ n = n P ni =1 δ ˜ z ∗ i ∈ M ( Z ) : Q ∗ n (cid:18)(cid:26) ˜ w ′ n ∈ Z n | d BL ( Q ˜ w ′ n , Q ˜ w ∗ n ) ≤ δ (cid:27)(cid:19) ≥ − ε . 
(17)Moreover, as the random variables Z i , Z i ∼ P i , i ∈ N , are independent, the boundedLipschitz distance between the empirical measure and n P ni =1 P i can be bounded, due toDudley et al. (1991, Theorem 7). As totally bounded spaces are particularly separable,see Denkowski et al. (2003, below Corollary 1.4.28), Dudley et al. (1991, Proposition 12)provides that BL ( Z , d Z ) is a uniform Glivenko-Cantelli class. The proof of this propositiondoes not depend on the distributions of the random variables Z i , i ∈ N , and is therefore alsovalid for independent and not necessarily identically distributed random variables. HenceDudley et al. (1991, Theorem 7) yields for all η > : lim n →∞ sup ( P i ) i ∈ N ∈ ( M ( Z )) N P N ( z N ∈ Z N | sup m ≥ n d BL P W m ( z N ) , n n X i =1 P i ! > η )! = 0 , as long as the assumptions of Proposition 12 in Dudley et al. (1991) apply. As BL ( Z , d Z ) is bounded, we have F = BL ( Z , d Z ) , see Dudley et al. (1991, page 499, before Proposition10), hence it is sufficient to show that BL ( Z , d Z ) is image admissible Suslin. By assump-tion ( Z , d Z ) is totally bounded, hence BL ( Z , d Z ) is separable with respect to k · k ∞ , seeStrohriegl and Hable (2016, Lemma 3). As f ∈ BL ( Z , d Z ) implies k f k ∞ ≤ , the spaceBL ( Z , d Z ) is a bounded subset of ( C b ( Z , d Z ) , k · k ∞ ) , which is due to Dudley (1989, Theo-rem 2.4.9) a complete space. Now, BL ( Z , d Z ) is a closed subset of ( C b ( Z , d Z ) , k · k ∞ ) with17espect to k · k ∞ . Hence BL ( Z , d Z ) is complete, due to Denkowski et al. (2003, Proposition1.4.17). Therefore BL ( Z , d Z ) is separable and complete with respect to k · k ∞ and partic-ularly a Suslin space, see Dudley (2014, p.229). As Lipschitz continuous functions are alsoequicontinuous, Dudley (2014, Theorem 5.28 (c)) gives that BL ( Z , d Z ) is image admissibleSuslin.Hence, Dudley et al. (1991, Theorem 7) yields sup ( P i ) i ∈ N ∈ ( M ( Z )) N d BL P W n , n n X i =1 P i ! 
−→ almost surely with respect to P N , n → ∞ , and sup ( Q i ) i ∈ N ∈ ( M ( Z )) N d BL Q ˜ W n , n n X i =1 Q i ! −→ almost surely with respect to Q N , n → ∞ . That is, there is n ∈ N such that for all n ≥ n P n ( w n ∈ Z n | d BL P w n , n n X i =1 P i ! ≤ δ )! ≥ − ε , (18)and Q n ( ˜ w n ∈ Z n | d BL Q ˜ w n , n n X i =1 Q i ! ≤ δ )! ≥ − ε . (19)Moreover, due to Lemma 3.1, we have d BL ( P n , Q n ) ≤ δ ⇒ d BL n n X i =1 P i , n n X i =1 Q i ! ≤ δ . (20)Then the strong Varadarajan property of ( Z i ) i ∈ N yields that there is n ∈ N such that forall n ≥ n : P n (cid:18)(cid:26) w n ∈ Z n | d BL ( P w n , P ) ≤ δ (cid:27)(cid:19) ≥ − ε . (21)Similar to the argumentation for part I we conclude, using again the boundedness of theProhorov metric π d H by the Ky Fan metric, see Dudley (1989, Theorem 11.3.5): π d H ( L P n ( S n ) , L Q ∗ n ( S n )) = π d H ( S n ◦ W n , S n ◦ ˜ W ′ n )= inf { ˜ ε > | ( W n , ˜ W n , ˜ W ∗ n , ˜ W ′ n )( ˜ K N ) (cid:0)(cid:8) ( w n , ˜ w n , ˜ w ∗ n , ˜ w ′ n ) ∈ Z n × Z n × Z n × Z n | d H ( S n ( w n ) , S n ( ˜ w ′ n )) > ˜ ε, ˜ w n , ˜ w ∗ n ∈ Z n (cid:9)(cid:1) ≤ ˜ ε } . Due to the definition of the statistical operator S , this is equivalent to inf { ˜ ε > | ( W n , ˜ W n , ˜ W ∗ n , ˜ W ′ n )( ˜ K N ) (cid:0)(cid:8) ( w n , ˜ w n , ˜ w ∗ n , ˜ w ′ n ) ∈ Z n × Z n × Z n × Z n | d H ( S ( P w n ) , S ( Q ˜ w ′ n )) > ˜ ε, ˜ w n , ˜ w ∗ n ∈ Z n (cid:9)(cid:1) ≤ ˜ ε } . d H ( S ( P w n ) , S ( Q ˜ w ′ n )) ≤ d H ( S ( P w n ) , S ( P )) + d H ( S ( P ) , S ( Q ˜ w ′ n )) . 
Hence, for all n ≥ max { n , n , n , n } , we obtain ( W n , ˜ W n , ˜ W ∗ n , ˜ W ′ n )( ˜ K N ) (cid:16)n ( w n , ˜ w n , ˜ w ∗ n , ˜ w ′ n ) ∈ Z n × Z n × Z n × Z n | d H ( S ( P w n ) , S ( Q ˜ w ′ n )) > ε , ˜ w n , ˜ w ∗ n ∈ Z n o(cid:17) ≤ ( W n , ˜ W n , ˜ W ∗ n , ˜ W ′ n )( ˜ K N ) (cid:16)n ( w n , ˜ w n , ˜ w ∗ n , ˜ w ′ n ) ∈ Z n × Z n × Z n × Z n | d H ( S ( P w n ) , S ( P )) > ε or d H ( S ( P ) , S ( Q ˜ w ′ n )) > ε , ˜ w n , ˜ w ∗ n ∈ Z n o(cid:17) . The continuity of the statistical operator S in P , see (11), gives d BL ( P, Q ˜ W ′ n ) ≤ δ ⇒ d H ( S ( P ) , S ( Q ˜ W ′ n )) ≤ ε , and d BL ( P, P W n ) ≤ δ ⇒ d H ( S ( P ) , S ( P W n )) ≤ ε . Further, the triangle inequality yields d BL ( P, Q ˜ w ′ n ) ≤ d BL ( P, P w n ) + d BL P w n , n n X i =1 P i ! + d BL n n X i =1 P i , n n X i =1 Q i ! + d BL n n X i =1 Q i , Q ˜ w n ! + d BL ( Q ˜ w n , Q ˜ w ∗ n ) + d BL ( Q ˜ w ∗ n , Q ˜ w ′ n ) . (22)Therefore we conclude, for all n ≥ max { n , n , n , n } , ( W n , ˜ W n , ˜ W ∗ n , ˜ W ′ n )( ˜ K N ) (cid:0)(cid:8) ( w n , ˜ w n , ˜ w ∗ n , ˜ w ′ n ) ∈ Z n × Z n × Z n × Z n | d H ( S ( P w n ) , S ( P )) > ε or d H ( S ( P ) , S ( Q ˜ w ′ n )) > ε , ˜ w n , ˜ w ∗ n ∈ Z n o(cid:17) (11) ≤ ( W n , ˜ W n , ˜ W ∗ n , ˜ W ′ n )( ˜ K N ) (cid:0)(cid:8) ( w n , ˜ w n , ˜ w ∗ n , ˜ w ′ n ) ∈ Z n × Z n × Z n × Z n | d BL ( P w n , P ) > δ or d BL ( P, Q ˜ w ′ n ) > δ , ˜ w n , ˜ w ∗ n ∈ Z n (cid:9)(cid:1) (22) ≤ ( W n , ˜ W n , ˜ W ∗ n , ˜ W ′ n )( ˜ K N ) (cid:0)(cid:8) ( w n , ˜ w n , ˜ w ∗ n , ˜ w ′ n ) ∈ Z n × Z n × Z n × Z n | d BL ( P w n , P ) > δ or d BL P w n , n n X i =1 P i ! > δ or d BL n n X i =1 P i , n n X i =1 Q i ! > δ or d BL n n X i =1 Q i , Q ˜ w n ! > δ or d BL ( Q ˜ w n , Q ˜ w ∗ n ) > δ or d BL ( Q ˜ w ∗ n , Q ˜ w ′ n ) > δ (cid:27)(cid:19) . d BL ( P n , Q n ) ≤ δ , then (20) yields d BL (cid:0) n P ni =1 P i , n P ni =1 Q i (cid:1) ≤ δ , there-fore this term can be omitted. 
Note that this is only proven for the p -product metrics on Z n and not for the metric d n from (4). For this metric we need a different argumentation,which is stated below the next calculation.Hence, for all n ≥ max { n , n , n , n } , ( W n , ˜ W n , ˜ W ∗ n , ˜ W ′ n )( ˜ K N ) (cid:0)(cid:8) ( w n , ˜ w n , ˜ w ∗ n , ˜ w ′ n ) ∈ Z n × Z n × Z n × Z n | d H ( S ( P w n ) , S ( Q ˜ w ′ n )) > ε, ˜ w n , ˜ w ∗ n ∈ Z n (cid:9)(cid:1) (20) ≤ ( W n , ˜ W n , ˜ W ∗ n , ˜ W ′ n )( ˜ K N ) (cid:0)(cid:8) ( w n , ˜ w n , ˜ w ∗ n , ˜ w ′ n ) ∈ Z n × Z n × Z n × Z n | d BL ( P w n , P ) > δ or d BL P w n , n n X i =1 P i ! > δ or d BL n n X i =1 Q i , Q ˜ w n ! > δ or d BL ( Q ˜ w n , Q ˜ w ∗ n ) > δ or d BL ( Q ˜ w ∗ n , Q ˜ w ′ n ) > δ (cid:27)(cid:19) ≤ P n (cid:18)(cid:26) w n ∈ Z n | d BL ( P w n , P ) > δ (cid:27)(cid:19) + P n ( w n ∈ Z n | d BL P w n , n n X i =1 P i ! > δ )! + Q n ( ˜ w n ∈ Z n | d BL n n X i =1 Q i , Q ˜ w n ! > δ )! + ⊗ ni =1 Q ˜ w n (cid:18)(cid:26) ˜ w ∗ n ∈ Z n | d BL (cid:0) Q ˜ w n , Q ˜ w ∗ n (cid:1) > δ (cid:27)(cid:19) + Q ∗ n (cid:18)(cid:26) ˜ w ′ n ∈ Z n | d BL (cid:0) Q ˜ w ∗ n , Q ˜ w ′ n (cid:1) > δ (cid:27)(cid:19) (16) , (17)(18) , (19) , (21) < ε
10 + ε
10 + ε
10 + ε
10 + ε
10 = ε . In order to show the above bound for the metric d n , see (4), on Z n , we use another variantof the triangle inequality in (22): d BL ( P, Q ˜ w ′ n ) ≤ d BL ( P, P w n ) + d BL ( P w n , Q ˜ w n ) + d BL ( Q ˜ w n , Q ˜ w ∗ n ) + d BL ( Q ˜ w ∗ n , Q ˜ w ′ n ) . (23)Assume d BL ( P n , Q n ) ≤ δ . Then, the strong equivalence between the Prohorov metricand the bounded Lipschitz metric on Polish spaces, see Huber (1981, Chapter 2, Corollary4.3), yields π d n ( P n , Q n ) ≤ p d BL ( P n , Q n ) ≤ δ . Due to Dudley (1989, Theorem 11.6.2), π d n ( P n , Q n ) ≤ δ implies the existence of a probability measure µ ∈ M ( Z n × Z n ) withmarginal distributions P n and Q n , such that µ (cid:16)n ( w n , ˜ w n ) ∈ Z n × Z n | d n ( w n , ˜ w n ) > δ o(cid:17) ≤ δ . By a simple calculation d n ( w n , ˜ w n ) ≤ δ implies π d n (cid:0) n P ni =1 δ z i , n P ni =1 δ ˜ z i (cid:1) ≤ δ andwe have: µ (cid:18)(cid:26) ( w n , ˜ w n ) ∈ Z n × Z n | π d n ( P w n , Q ˜ w n ) > δ (cid:27)(cid:19) ≤ δ . π and d BL yields: µ (cid:18)(cid:26) ( w n , ˜ w n ) ∈ Z n × Z n | d BL ( P w n , Q ˜ w n ) > δ (cid:27)(cid:19) ≤ δ . Now we choose the joint distribution ˜ K N of W N , ˜ W N , ˜ W ∗ N , and ˜ W ′ N such that the distri-bution of ( W n , ˜ W n ) : Z N × Z N → Z n × Z n is µ ∈ M ( Z n × Z n ) . Then we conclude: ( W n , ˜ W n , ˜ W ∗ n , ˜ W ′ n )( ˜ K N ) (cid:0)(cid:8) ( w n , ˜ w n , ˜ w ∗ n , ˜ w ′ n ) ∈ Z n × Z n × Z n × Z n | d H ( S ( P w n ) , S ( P )) > ε or d H ( S ( P ) , S ( Q ˜ w ′ n )) > ε , ˜ w n , ˜ w ∗ n ∈ Z n o(cid:17) (11) , (23) ≤ ( W n , ˜ W n , ˜ W ∗ n , ˜ W ′ n )( ˜ K N ) (cid:0)(cid:8) ( w n , ˜ w n , ˜ w ∗ n , ˜ w ′ n ) ∈ Z n × Z n × Z n × Z n | d BL ( P w n , P ) > δ or d BL ( P w n , Q ˜ w n ) > δ or d BL ( Q ˜ w n , Q ˜ w ∗ n ) > δ or d BL ( Q ˜ w ∗ n , Q ˜ w ′ n ) > δ (cid:27)(cid:19) . 
≤ P n (cid:18)(cid:26) w n ∈ Z n | d BL ( P w n , P ) > δ (cid:27)(cid:19) + µ (cid:18)(cid:26) ( w n , ˜ w n ) ∈ Z n × Z n | d BL ( P w n , Q ˜ w n ) > δ (cid:27)(cid:19) + ⊗ ni =1 Q ˜ w n (cid:18)(cid:26) ˜ w ∗ n ∈ Z n | d BL (cid:0) Q ˜ w n , Q ˜ w ∗ n (cid:1) > δ (cid:27)(cid:19) + Q ∗ n (cid:18)(cid:26) ˜ w ′ n ∈ Z n | d BL (cid:0) Q ˜ w ∗ n , Q ˜ w ′ n (cid:1) > δ (cid:27)(cid:19) . Now, adapting the inequalities in (16), (17), and (21) in ε respectively n yields the bound-edness of the above term by ε for d BL ( P n , Q n ) ≤ δ and for all n ≥ { n , n , n } .Now we can go on with the proof similar for both kinds of metrics on Z n .The equivalence between the Prohorov metric and the bounded Lipschitz metric on Polishspaces, see Huber (1981, Chapter 2, Corollary 4.3), yields the existence of n , ∈ N suchthat for all n ≥ n , , d BL ( P n , Q n ) ≤ δ (respectively d BL ( P n , Q n ) ≤ δ ) implies d BL ( L P n ( S n ) , L Q ∗ n ( S n )) < ε . (24)Now, (15) and (24) yield for all n ≥ max { n , , n , } : d BL ( L P ∗ n ( S n ) , L Q ∗ n ( S n )) < ε. (25)Recall that L P ∗ n ( S n ) =: ζ n and L Q ∗ n ( S n ) =: ξ n are random quantities with values in M ( H ) .Hence (25) is equivalent to E (cid:2) d BL ( L P ∗ n ( S n ) , L Q ∗ n ( S n )) (cid:3) < ε, for all n ≥ max { n , , n , } , E [ d BL ( ζ n , ξ n )] < ε, for all n ≥ max { n , , n , } . Therefore, for all f ∈ BL ( M ( Z )) and for all n ≥ max { n , , n , } : (cid:12)(cid:12)(cid:12)(cid:12)Z f d ( L ( ζ n )) − Z f d ( L ( ξ n )) (cid:12)(cid:12)(cid:12)(cid:12) = | E f ( ζ n ) − E f ( ξ n ) | ≤ E | f ( ζ n ) − f ( ξ n ) |≤ E ( | f | d BL ( ζ n , ξ n )) < ε, by a variant of Strassen’s Theorem, see Huber (1981, Chapter 2, Theorem 4.2, (2) ⇒ (1)).That is, d BL ( L ( L P ∗ n ( S n )) , L ( L Q ∗ n ( S n ))) < ε for all n ≥ max { n , , n , } . 
Hence for every ε > 0 we find δ and n_0 = max{n_{0,1}, n_{0,2}} such that for all n ≥ n_0:

d_BL(P^n, Q^n) < δ ⇒ d_BL( L(L_{P*_n}(S_n)), L(L_{Q*_n}(S_n)) ) < ε,

which yields the assertion. □

Proof of Example 1:
Without loss of generality we assume a = 0; otherwise regard the process Z_i − a, i ∈ N. By assumption, the random variables Z_i, i ∈ N, are independent. Hence I_B ∘ Z_i, i ∈ N, are independent for all measurable B ∈ B, as I_B is a measurable function, see for example Hoffmann-Jørgensen (1994, Theorem 2.10.6). According to Steinwart et al. (2009, Proposition 2.8), (Z_i)_{i∈N} satisfies the SLLNE if there is a probability measure P ∈ M(Z) such that lim_{n→∞} (1/n) ∑_{i=1}^n E_μ[I_B ∘ Z_i] = P(B) for all measurable B ∈ B. Hence:

(1/n) ∑_{i=1}^n E_μ[I_B ∘ Z_i] = (1/n) ∑_{i=1}^n ∫ I_B dZ_i(μ) = (1/n) ∑_{i=1}^n ∫ I_B f_i dλ,

where f_i(x) = (1/√(2π)) e^{−(x−a_i)²/2} denotes the density of the normal distribution N(a_i, 1) with respect to the Lebesgue measure λ. Moreover define g: R → R, for a constant c with |a_i| ≤ c for all i ∈ N (such c exists as a_i → 0), by

g(x) = (1/√(2π)) e^{−(x+c)²/2} for x < −c,  g(x) = 1/√(2π) for −c ≤ x ≤ c,  g(x) = (1/√(2π)) e^{−(x−c)²/2} for x > c.

Therefore |f_i| ≤ |g| for all i ∈ N, g is integrable, and due to Lebesgue's dominated convergence theorem, see for example Hoffmann-Jørgensen (1994, Theorem 3.6):

lim_{n→∞} (1/n) ∑_{i=1}^n ∫ I_B f_i dλ = lim_{n→∞} ∫ (1/n) ∑_{i=1}^n I_B f_i dλ = ∫ lim_{n→∞} (1/n) ∑_{i=1}^n I_B f_i dλ. (26)

We have f_i → f, where f(x) = (1/√(2π)) e^{−x²/2} for all x ∈ R, as a_i → 0, and therefore the Lemma of Kronecker, see for example Hoffmann-Jørgensen (1994, Theorem 4.9, Equation 4.9.1), yields lim_{n→∞} (1/n) ∑_{i=1}^n f_i(x) = f(x) for all x ∈ X. Now (26) yields the SLLNE:

lim_{n→∞} (1/n) ∑_{i=1}^n ∫ I_B f_i dλ = ∫ I_B f dλ = P(B), for all B ∈ B.

With Strohriegl and Hable (2016, Theorem 2) the Varadarajan property is given. □
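The convergence step of Example 1 is easy to check numerically: with a_i → 0, the averaged densities (1/n) ∑ f_i approach the N(0,1) density pointwise. A small sketch, assuming the hypothetical choice a_i = 1/i for the vanishing shifts:

```python
import math

def normal_pdf(x, mean=0.0):
    # density of N(mean, 1) with respect to Lebesgue measure
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2 * math.pi)

def averaged_density(x, n):
    # Cesaro average (1/n) * sum_i f_i(x) of the densities of N(a_i, 1),
    # with the illustrative choice a_i = 1/i -> 0
    return sum(normal_pdf(x, 1.0 / i) for i in range(1, n + 1)) / n

# The pointwise error at a fixed x shrinks as n grows.
x = 0.7
errors = [abs(averaged_density(x, n) - normal_pdf(x)) for n in (10, 100, 1000)]
```

The errors decrease roughly like the Cesàro average of the shifts, i.e. on the order of log(n)/n for a_i = 1/i.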
Proof of Example 2:
Similar to the proof of Example 1, we first show the SLLNE, that is, there exists a probability measure P ∈ M(Z) such that

lim_{n→∞} (1/n) ∑_{i=1}^n ∫ I_B ∘ Z_i dμ = P(B), for all measurable B ⊂ Ω.

Now let B ⊂ Ω be an arbitrary measurable set. Then:

lim_{n→∞} (1/n) ∑_{i=1}^n ∫ I_B ∘ Z_i dμ = lim_{n→∞} (1/n) ∑_{i=1}^n ∫_Z I_B dP_i = lim_{n→∞} (1/n) ∑_{i=1}^n ∫_Z I_B d[ (1−ε_i)P + ε_i P̃_i ]
= lim_{n→∞} (1/n) ∑_{i=1}^n ∫_Z I_B dP − lim_{n→∞} (1/n) ∑_{i=1}^n ε_i ∫_Z I_B dP + lim_{n→∞} (1/n) ∑_{i=1}^n ε_i ∫_Z I_B dP̃_i. (27)

As 0 ≤ (1/n) ∑_{i=1}^n ε_i ∫ I_B dP ≤ (1/n) ∑_{i=1}^n ε_i and ε_i → 0, we have

lim_{n→∞} (1/n) ∑_{i=1}^n ε_i ∫ I_B dP ≤ lim_{n→∞} (1/n) ∑_{i=1}^n ε_i = 0,

and similarly lim_{n→∞} (1/n) ∑_{i=1}^n ε_i ∫ I_B dP̃_i ≤ lim_{n→∞} (1/n) ∑_{i=1}^n ε_i = 0. Hence (27) yields

lim_{n→∞} (1/n) ∑_{i=1}^n ∫ I_B ∘ Z_i dμ = lim_{n→∞} (1/n) ∑_{i=1}^n ∫ I_B dP = P(B)

and therefore, due to Strohriegl and Hable (2016, Theorem 2), the assertion. □

Proof of Corollary 2.4:

Due to Example 2, the stochastic process is a Varadarajan process. Hable and Christmann (2011, Theorem 3.2) ensures the continuity of the statistical operator S: M(Z) → H, P ↦ f_{L*,P,λ}, for a fixed value λ ∈ (0, ∞). Moreover, Hable and Christmann (2011, Corollary 3.4) yields the continuity of the estimator S_n: Z^n → H, D_n ↦ f_{L*,D_n,λ}, for every fixed λ ∈ (0, ∞). Hence, for fixed λ > 0, the bootstrap approximation of the SVM estimator is qualitatively robust under the given assumptions. Moreover, the proof of Theorem 2.2, equation (25), and the equivalence between the bounded Lipschitz metric and the Prohorov metric yield: for every ε > 0 there are δ > 0 and n_0 ∈ N such that for all n ≥ n_0, if d_BL(P^n, Q^n) ≤ δ:

π( L_{P*_n}(S_n), L_{Q*_n}(S_n) ) < ε almost surely.

Similarly to the proof of the qualitative robustness in Strohriegl and Hable (2016, Theorem 4) we get: for every ε > 0 there is n_ε such that for all n ≥ n_ε: ‖f_{L*,D_n,λ_n} − f_{L*,D_n,λ}‖_H ≤ ε.
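The vanishing-contamination mechanism of Example 2 above, Z_i ∼ (1−ε_i)P + ε_i P̃_i with ε_i → 0, can also be illustrated by simulation. A sketch under hypothetical choices of P, P̃_i, and ε_i (none of which are specified in the paper):

```python
import random

def contaminated_sample(n, eps, seed=0):
    """Draw Z_1, ..., Z_n independently with Z_i ~ (1 - eps(i)) P + eps(i) Ptilde_i.

    Hypothetical choices for illustration only: P = Uniform(0, 1),
    contaminating distributions Ptilde_i = Uniform(5, 6), eps(i) = 1/i."""
    rng = random.Random(seed)
    draws = []
    for i in range(1, n + 1):
        if rng.random() < eps(i):
            draws.append(5.0 + rng.random())  # gross error from Ptilde_i
        else:
            draws.append(rng.random())        # clean observation from P
    return draws

# Cesaro averages of I_B(Z_i) still approach P(B): here B = [0, 0.5],
# so P(B) = 0.5, and the vanishing contamination does not matter.
zs = contaminated_sample(20000, lambda i: 1.0 / i)
freq = sum(1 for z in zs if z <= 0.5) / len(zs)
```

Because the Cesàro average of ε_i = 1/i behaves like log(n)/n, the fraction of contaminated observations is negligible for large n, which is exactly what the proof of Example 2 exploits.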
And the same argumentation as in the proof of the qualitative robustness of the SVM estimator for the non-i.i.d. case in Strohriegl and Hable (2016, Theorem 4), for the cases n ≤ n_ε and n > n_ε, yields the assertion. □

Proof of Theorem 2.6:
Let P ∗ N , Q ∗ N ∈ M ( Z N ) be the bootstrap approximations of thetrue distribution P N and the contaminated distribution Q N . First, the triangle inequalityyields: d BL ( L P ∗ n ( S n ) , L Q ∗ n ( S n )) ≤ d BL ( L P ∗ n ( S n ) , L P n ( S n )) | {z } I + d BL ( L P n ( S n ) , L Q n ( S n )) | {z } II + d BL ( L Q n ( S n ) , L Q ∗ n ( S n )) | {z } III . First, we regard the term in part II. Let σ ( Z i ) , i ∈ N , be the σ -algebra generated by Z i .Due to the assumptions on the mixing process P m>n α ( σ ( Z , . . . , Z i ) , σ ( Z i + m , . . . ) , P N ) = O ( n − γ ) , i ∈ N , γ > , the sequence ( α ( σ ( Z , . . . , Z i ) , σ ( Z i + m , . . . ) , µ )) m ∈ N is a null se-quence. Moreover it is bounded by the definition of the α -mixing coefficient which, due to24he strong stationarity, does not depend on i . Therefore n n X i =1 n X j =1 α (( Z i ) i ∈ N , P N , i, j ) = 1 n n X i =1 n X j =1 α ( σ ( Z i ) , σ ( Z j ) , P N ) ≤ n n X i =1 n X j ≥ i α ( σ ( Z i ) , σ ( Z j ) , P N ) ≤ n n X i =1 n X j ≥ i α ( σ ( Z , . . . , Z i ) , σ ( Z j , . . . ) , P N )= 2 n n X i =1 n − i X ℓ =0 α ( σ ( Z , . . . , Z i ) , σ ( Z i + ℓ , . . . ) , P N ) stationarity ≤ n n X ℓ =0 α ( σ ( Z , . . . , Z i ) , σ ( Z i + ℓ , . . . ) , P N ) , i ∈ N −→ , n → ∞ . Hence, the process is weakly α - bi -mixing with respect to P N , see Definition 2.5. Due to thestationarity assumption, the process ( Z i ) i ∈ N is additionally asymptotically mean stationary,that is lim n →∞ n P ni =1 E I B ◦ Z i = P ( B ) for all B ∈ A for a probability measure P . Thereforethe process satisfies the WLLNE, see Steinwart et al. (2009, Proposition 3.2), and thereforeis a weak Varadarajan process, see Strohriegl and Hable (2016, Theorem 2).As the process is assumed to be a Varadarajan process and due to the assumptions on the se-quence of estimators ( S n ) n ∈ N , qualitative robustness of ( S n ) n ∈ N is ensured by Strohriegl and Hable(2016, Theorem 1). 
Together with the equivalence between the Prohorov metric and thebounded Lipschitz metric for Polish spaces, see Huber (1981, Chapter 2, Corollary 4.3), itfollows:For every ε > there is δ > such that for all n ∈ N and for all Q n ∈ M ( Z n ) we have: d BL ( P n , Q n ) < δ ⇒ d BL ( L P n ( S n ) , L Q n ( S n )) < ε . This implies E [ d BL ( L P n ( S n ) , L Q n ( S n ))] < ε . (28)Hence the convergence of the term in part II is shown.To prove the convergence of the term in part I, consider the distribution P N ∈ M ( Z N ) and let P ∗ N be the bootstrap approximation of P N , via the blockwise bootstrap. Define, for n ∈ N , the random variables W n : Z N → Z n , W n = ( Z , . . . , Z n ) , z N w n = ( z , . . . , z n ) , and W ′ n : Z N → Z n , W ′ n = ( Z ′ , . . . , Z ′ n ) , z N w ′ n ,such that W n ( P N ) = P n and W ′ n ( P ∗ N ) = P ∗ n .Moreover denote the bootstrap sample by W ∗ n : Z N → Z n , W ∗ n := ( Z ∗ , . . . , Z ∗ n ) , z N w ∗ n ,25nd the distribution of W ∗ n by P n . The blockwise bootstrap approximation of P m , m ∈ N ,is P ∗ m = ⊗ mj =1 1 n P ni =1 δ Z ∗ i , m ∈ N . Note that the sample Z ∗ , . . . , Z ∗ n depends and on theblocklength b ( n ) and on the number of blocks ℓ ( n ) .Further denote the joint distribution of W N , W ∗ N , and W ′ N by K N ∈ M ( Z N × Z N × Z N ) .Then, K N has marginal distributions K N ( B × Z N × Z N ) = P N ( B ) for all B ∈ B ⊗ N , K N ( Z N × B × Z N ) = P N ( B ) for all B ∈ B ⊗ N , and K N ( Z N × Z N × B ) = P ∗ N ( B ) for all B ∈ B ⊗ N .Then, L P n ( S n ) = S n ( P n ) = S n ◦ W n ( P N ) and L P ∗ n ( S n ) = S n ( P ∗ n ) = S n ◦ W ′ n ( P ∗ N ) and therefore d BL ( L P ∗ n ( S n ) , L P n ( S n )) = d BL ( L ( S n ◦ W ′ n ) , L ( S n ◦ W n )) . By assumption we have ≤ z i ≤ , i ∈ N . Hence Z i ( z N ) = z i ∈ [0 , , i. e. Z = [0 , , whichis a totally bounded metric space. Therefore the set BL ([0 , is a uniform Glivenko-Cantelli class, due to Dudley et al. (1991, Proposition 12). 
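The blockwise (moving block) bootstrap that generates Z*_1, ..., Z*_n — ℓ(n) blocks of length b(n) drawn uniformly with replacement from the overlapping blocks of the sample — can be sketched as follows. A minimal illustration with a hypothetical function name, assuming ℓ(n) · b(n) = n as in the theorem:

```python
import random

def moving_block_bootstrap(sample, block_length, seed=0):
    """One blockwise bootstrap resample of the same length n: draw
    l(n) = n / b(n) blocks of b(n) consecutive observations uniformly
    with replacement from the n - b(n) + 1 overlapping blocks and
    concatenate them, preserving the dependence within each block."""
    rng = random.Random(seed)
    n = len(sample)
    if n % block_length != 0:
        raise ValueError("require l(n) * b(n) = n")
    num_blocks = n // block_length
    resample = []
    for _ in range(num_blocks):
        start = rng.randrange(n - block_length + 1)
        resample.extend(sample[start:start + block_length])
    return resample

# With sample = 0, 1, ..., 11 every resampled block is a consecutive run.
resample = moving_block_bootstrap(list(range(12)), 3)
```

In contrast to Efron's bootstrap, single observations are never drawn in isolation, which is what preserves the short-range dependence of the α-mixing process.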
Similar to part I of the proof of Theorem 2.2, the blockwise bootstrap structure and the Glivenko-Cantelli property yield:
$$ \lim_{n \to \infty} \sup_{P_{w^*_n} \in \mathcal{M}(Z)} P^{*\mathbb{N}}\Big(\Big\{ z_{\mathbb{N}} \in Z^{\mathbb{N}} \,\Big|\, \sup_{m \ge n} d_{BL}\big(P_{W'_m(z_{\mathbb{N}})}, P_{w^*_n}\big) > \eta \Big\}\Big) = 0. $$
Respectively, for fixed $\varepsilon > 0$: for every $\delta > 0$ there is $n_1 \in \mathbb{N}$ such that for all $n \ge n_1$ and all $P_{w^*_n} \in \mathcal{M}(Z)$,
$$ P^*_n\Big(\Big\{ w'_n \in Z^n \,\Big|\, d_{BL}\big(P_{w'_n}, P_{w^*_n}\big) \le \delta \Big\}\Big) \ge 1 - \tfrac{\varepsilon}{6}. \tag{29} $$
Regard the process $G_n(t) = \frac{1}{\sqrt{n}} \sum_{i=1}^n I_{\{Z^*_i \le t\}} - \frac{1}{\sqrt{n}} \sum_{i=1}^n I_{\{Z_i \le t\}}$, $t \in \mathbb{R}$. Due to the assumptions on the process and on the moving block bootstrap, Theorem 2.3 in Peligrad (1998) yields the almost sure convergence in distribution to a Brownian bridge $G$:
$$ \frac{1}{\sqrt{n}} \sum_{i=1}^n I_{\{Z^*_i \le t\}} - \frac{1}{\sqrt{n}} \sum_{i=1}^n I_{\{Z_i \le t\}} \xrightarrow{\ \mathcal{D}\ } G(t), \quad t \in \mathbb{R}, \tag{30} $$
almost surely with respect to $P^{\mathbb{N}}$, $n \to \infty$, in the Skorohod topology on $D[0,1]$. Here $\xrightarrow{\ \mathcal{D}\ }$ indicates convergence in distribution and $D[0,1]$ denotes the space of cadlag functions on $[0,1]$; for details see for example Billingsley (1999, p. 121). This is equivalent to
$$ \frac{1}{\sqrt{n}} \sum_{i=1}^n I_{\{Z^*_i \le t\}} - \frac{1}{\sqrt{n}} \sum_{i=1}^n I_{\{Z_i \le t\}} \xrightarrow{\ \mathcal{D}\ } G(t) $$
almost surely with respect to $P^{\mathbb{N}}$, $n \to \infty$, for all continuity points $t$ of $G$, see Billingsley (1999, (12.14), p. 124). Multiplying by $\frac{1}{\sqrt{n}}$ yields, for any fixed continuity point $t \in \mathbb{R}$:
$$ \frac{1}{n} \sum_{i=1}^n I_{\{Z^*_i \le t\}} - \frac{1}{n} \sum_{i=1}^n I_{\{Z_i \le t\}} - \frac{1}{\sqrt{n}} G(t) \xrightarrow{\ \mathcal{D}\ } 0 $$
almost surely with respect to $P^{\mathbb{N}}$, $n \to \infty$. As convergence in distribution to a finite constant implies convergence in probability, see for example van der Vaart (1998, Theorem 2.7(iii)), and as $\frac{1}{\sqrt{n}} G(t) \to 0$ in probability, for all continuity points $t \in \mathbb{R}$ of $G$:
$$ \frac{1}{n} \sum_{i=1}^n I_{\{Z^*_i \le t\}} - \frac{1}{n} \sum_{i=1}^n I_{\{Z_i \le t\}} \xrightarrow{\ P\ } 0 $$
almost surely with respect to $P^{\mathbb{N}}$, $n \to \infty$, where $\xrightarrow{\ P\ }$ denotes convergence in probability. Hence, Dudley (1989, Theorem 11.1.2) yields the convergence of the corresponding probability measures:
$$ d_{BL}\Big( \frac{1}{n} \sum_{i=1}^n \delta_{Z^*_i}, \frac{1}{n} \sum_{i=1}^n \delta_{Z_i} \Big) \xrightarrow{\ P\ } 0 $$
almost surely with respect to $P^{\mathbb{N}}$, $n \to \infty$. Respectively,
$$ d_{BL}\big(P_{W^*_n}, P_{W_n}\big) \xrightarrow{\ P\ } 0 $$
almost surely with respect to $P^{\mathbb{N}}$, $n \to \infty$. Define the set
$$ B_n = \big\{ w_n \in Z^n \mid d_{BL}\big(P_{W^*_n}, P_{w_n}\big) \xrightarrow{\ P\ } 0, \ n \to \infty \big\}. $$
Hence,
$$ P_n(B_n) = P^{\mathbb{N}}\big(\big\{ z_{\mathbb{N}} \in Z^{\mathbb{N}} \mid W_n(z_{\mathbb{N}}) \in B_n \big\}\big) = 1 \tag{31} $$
and, for all $w_n \in B_n$, there is $n_{2,w_n} \in \mathbb{N}$ such that for all $n \ge n_{2,w_n}$:
$$ \bar{P}_n\Big(\Big\{ w^*_n \in Z^n \,\Big|\, d_{BL}\big(P_{w^*_n}, P_{w_n}\big) > \delta \Big\}\Big) < \tfrac{\varepsilon}{6}. \tag{32} $$
By assumption we have $0 \le z_i \le 1$, $i \in \mathbb{N}$. Hence the set of probability measures $\{ P_{w_n} \mid w_n \in [0,1]^n \}$ is a subset of $\mathcal{M}([0,1])$ and therefore tight, as $[0,1]$ is a compact space, see e.g. Klenke (2013, Example 13.28). Then Prohorov's Theorem, see for example Billingsley (1999, Theorem 5.1), yields relative compactness of $(\mathcal{M}([0,1]), d_{BL})$ and in particular relative compactness of the set $\{ P_{w_n} \mid w_n \in [0,1]^n \}$. As $(\mathcal{M}([0,1]), d_{BL})$ is a complete space, see Dudley (1989, Theorem 11.5.5), relative compactness equals total boundedness. That is, for every $\rho > 0$ there exists a finite subset $\tilde{\mathcal{P}}$ of $\{ P_{w_n} \mid w_n \in [0,1]^n \}$ such that for every $P_{w_n} \in \{ P_{w_n} \mid w_n \in [0,1]^n \}$ there is $\tilde{P}_\rho \in \tilde{\mathcal{P}}$ with
$$ d_{BL}\big(\tilde{P}_\rho, P_{w_n}\big) \le \rho. \tag{33} $$
The triangle inequality yields:
$$ d_{BL}\big(P_{w^*_n}, P_{w_n}\big) \le d_{BL}\big(P_{w^*_n}, \tilde{P}_\rho\big) + d_{BL}\big(\tilde{P}_\rho, P_{w_n}\big). $$
Define $\rho = \frac{\delta}{2}$. Then (32) yields, for every $\tilde{P}_\rho \in \tilde{\mathcal{P}}$, the existence of an integer $n_{2,\tilde{P}_\rho} \in \mathbb{N}$ such that, for all $n \ge n_{2,\tilde{P}_\rho}$ and all $w_n \in B_n$:
$$ \begin{aligned} \bar{P}_n\Big(\Big\{ w^*_n \in Z^n \,\Big|\, d_{BL}\big(P_{w^*_n}, P_{w_n}\big) > \delta \Big\}\Big) &\le \bar{P}_n\Big(\Big\{ w^*_n \in Z^n \,\Big|\, d_{BL}\big(P_{w^*_n}, \tilde{P}_\rho\big) > \tfrac{\delta}{2} \ \text{or}\ d_{BL}\big(\tilde{P}_\rho, P_{w_n}\big) > \tfrac{\delta}{2} \Big\}\Big) \\ &\overset{(33)}{\le} \bar{P}_n\Big(\Big\{ w^*_n \in Z^n \,\Big|\, d_{BL}\big(P_{w^*_n}, \tilde{P}_\rho\big) > \tfrac{\delta}{2} \Big\}\Big) \overset{(32)}{<} \tfrac{\varepsilon}{6}. \end{aligned} $$
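To give a numerical feel for this convergence, note that every $f$ with $\|f\|_{BL} \le 1$ is in particular 1-Lipschitz, so the bounded Lipschitz distance between two empirical measures on $[0,1]$ is bounded above by their Wasserstein-1 distance, which for equal sample sizes is just the mean absolute difference of the order statistics. The sketch below tracks this upper bound between the empirical measure and one blockwise bootstrap empirical measure as $n$ grows; the AR(1)-type data, the block length choice $b(n) \approx n^{1/3}$, and all function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def w1_empirical(x, y):
    """Wasserstein-1 distance between two empirical measures on the real
    line with equally many atoms: the mean absolute difference of the
    order statistics. Since ||f||_BL <= 1 implies Lip(f) <= 1, this
    upper-bounds the bounded Lipschitz distance d_BL."""
    return float(np.mean(np.abs(np.sort(x) - np.sort(y))))

def moving_block_sample(z, b, rng):
    """One moving-block bootstrap sample (illustrative implementation)."""
    n = len(z)
    starts = rng.integers(0, n - b + 1, size=n // b)
    return np.concatenate([z[s:s + b] for s in starts])

rng = np.random.default_rng(0)
dists = []
for n in (100, 1000, 10000):
    # an AR(1)-type, hence alpha-mixing, sequence squashed into [0, 1]
    e = rng.normal(size=n)
    x = np.empty(n)
    x[0] = e[0]
    for i in range(1, n):
        x[i] = 0.5 * x[i - 1] + e[i]
    z = 1.0 / (1.0 + np.exp(-x))
    z_star = moving_block_sample(z, b=max(1, round(n ** (1 / 3))), rng=rng)
    m = len(z_star)  # compare measures with equally many atoms
    dists.append(w1_empirical(z[:m], z_star))
print([round(d, 3) for d in dists])  # typically shrinks as n grows
```

This mirrors the statement $d_{BL}(P_{W^*_n}, P_{W_n}) \to 0$ in probability: the distance between the two empirical measures becomes small for large $n$.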
Hence, for all $n \ge n_2 := \max_{\tilde{P}_\rho \in \tilde{\mathcal{P}}} n_{2,\tilde{P}_\rho}$ and for all $w_n \in B_n$, we have:
$$ \sup_{P_{w_n} \in \mathcal{M}(Z)} \bar{P}_n\Big(\Big\{ w^*_n \in Z^n \,\Big|\, d_{BL}\big(P_{w^*_n}, P_{w_n}\big) > \delta \Big\}\Big) < \tfrac{\varepsilon}{6}. \tag{34} $$
Due to the uniform continuity of the operator $S$, for every $\varepsilon > 0$ there is $\delta > 0$ such that for all $P, Q \in \mathcal{M}(Z)$:
$$ d_{BL}(P, Q) \le \delta \;\Rightarrow\; d_H\big(S(P), S(Q)\big) \le \varepsilon. \tag{35} $$
Moreover, the triangle inequality yields:
$$ d_{BL}\big(P_{w'_n}, P_{w_n}\big) \le d_{BL}\big(P_{w'_n}, P_{w^*_n}\big) + d_{BL}\big(P_{w^*_n}, P_{w_n}\big). \tag{36} $$
Again we use the relation between the Prohorov metric $\pi_{d_H}$ and the Ky Fan metric, Dudley (1989, Theorem 11.3.5):
$$ \begin{aligned} \pi_{d_H}\big(\mathcal{L}_{P^*_n}(S_n), \mathcal{L}_{P_n}(S_n)\big) &= \pi_{d_H}\big(S_n \circ W'_n, S_n \circ W_n\big) \\ &\le \inf\big\{ \tilde{\varepsilon} > 0 \mid K_{\mathbb{N}}\big(\big\{ d_H\big(S_n \circ W'_n, S_n \circ W_n\big) > \tilde{\varepsilon}, \ w^*_{\mathbb{N}} \in Z^{\mathbb{N}} \big\}\big) \le \tilde{\varepsilon} \big\} \\ &= \inf\big\{ \tilde{\varepsilon} > 0 \mid (W_n, W^*_n, W'_n)(K_{\mathbb{N}})\big(\big\{ (w_n, w^*_n, w'_n) \in Z^n \times Z^n \times Z^n \mid d_H\big(S_n(w'_n), S_n(w_n)\big) > \tilde{\varepsilon}, \ w^*_n \in Z^n \big\}\big) \le \tilde{\varepsilon} \big\}. \end{aligned} $$
As the estimator $S_n$ is represented by the statistical operator $S$, this is equivalent to
$$ \inf\big\{ \tilde{\varepsilon} > 0 \mid (W_n, W^*_n, W'_n)(K_{\mathbb{N}})\big(\big\{ (w_n, w^*_n, w'_n) \in Z^n \times Z^n \times Z^n \mid d_H\big(S(P_{w'_n}), S(P_{w_n})\big) > \tilde{\varepsilon}, \ w^*_n \in Z^n \big\}\big) \le \tilde{\varepsilon} \big\}. $$
Due to the uniform continuity of $S$, see (35), we obtain, for all $n \ge \max\{n_1, n_2\}$:
$$ \begin{aligned} & (W_n, W^*_n, W'_n)(K_{\mathbb{N}})\big(\big\{ (w_n, w^*_n, w'_n) \in Z^n \times Z^n \times Z^n \mid d_H\big(S(P_{w'_n}), S(P_{w_n})\big) > \varepsilon, \ w^*_n \in Z^n \big\}\big) \\ & \overset{(35)}{\le} (W_n, W^*_n, W'_n)(K_{\mathbb{N}})\big(\big\{ (w_n, w^*_n, w'_n) \in Z^n \times Z^n \times Z^n \mid d_{BL}\big(P_{w'_n}, P_{w_n}\big) > \delta, \ w^*_n \in Z^n \big\}\big) \\ & = (W_n, W^*_n, W'_n)(K_{\mathbb{N}})\big(\big\{ (w_n, w^*_n, w'_n) \mid \{ w_n \notin B_n, \ d_{BL}(P_{w'_n}, P_{w_n}) > \delta \} \ \text{or}\ \{ w_n \in B_n, \ d_{BL}(P_{w'_n}, P_{w_n}) > \delta \}, \ w^*_n \in Z^n \big\}\big) \\ & \le (W_n, W^*_n, W'_n)(K_{\mathbb{N}})\big(\big\{ (w_n, w^*_n, w'_n) \mid w_n \notin B_n, \ d_{BL}(P_{w'_n}, P_{w_n}) > \delta, \ w^*_n \in Z^n \big\}\big) \\ & \quad + (W_n, W^*_n, W'_n)(K_{\mathbb{N}})\big(\big\{ (w_n, w^*_n, w'_n) \mid w_n \in B_n, \ d_{BL}(P_{w'_n}, P_{w_n}) > \delta, \ w^*_n \in Z^n \big\}\big) \\ & \overset{(31)}{=} (W_n, W^*_n, W'_n)(K_{\mathbb{N}})\big(\big\{ (w_n, w^*_n, w'_n) \mid w_n \in B_n, \ d_{BL}(P_{w'_n}, P_{w_n}) > \delta, \ w^*_n \in Z^n \big\}\big). \end{aligned} $$
The triangle inequality, (36), then yields for all $n \ge \max\{n_1, n_2\}$:
$$ \begin{aligned} & (W_n, W^*_n, W'_n)(K_{\mathbb{N}})\big(\big\{ (w_n, w^*_n, w'_n) \mid w_n \in B_n, \ d_{BL}(P_{w'_n}, P_{w_n}) > \delta, \ w^*_n \in Z^n \big\}\big) \\ & \overset{(36)}{\le} (W_n, W^*_n, W'_n)(K_{\mathbb{N}})\big(\big\{ (w_n, w^*_n, w'_n) \mid \{ w_n \in B_n \ \text{and}\ d_{BL}(P_{w'_n}, P_{w^*_n}) > \tfrac{\delta}{2} \} \ \text{or}\ \{ w_n \in B_n \ \text{and}\ d_{BL}(P_{w^*_n}, P_{w_n}) > \tfrac{\delta}{2} \} \big\}\big) \\ & \le P^*_n\Big(\Big\{ w'_n \in Z^n \,\Big|\, w_n \in B_n, \ d_{BL}(P_{w'_n}, P_{w^*_n}) > \tfrac{\delta}{2} \Big\}\Big) + \bar{P}_n\Big(\Big\{ w^*_n \in Z^n \,\Big|\, w_n \in B_n, \ d_{BL}(P_{w^*_n}, P_{w_n}) > \tfrac{\delta}{2} \Big\}\Big) \\ & \overset{(29),(32)}{<} \tfrac{\varepsilon}{6} + \tfrac{\varepsilon}{6} = \tfrac{\varepsilon}{3}. \end{aligned} $$
The equivalence between the Prohorov metric and the bounded Lipschitz metric on Polish spaces, see Huber (1981, Chapter 2, Corollary 4.3), yields the existence of $\tilde{n}_1$ such that for every $n \ge \tilde{n}_1$:
$$ d_{BL}\big(\mathcal{L}_{P^*_n}(S_n), \mathcal{L}_{P_n}(S_n)\big) < \tfrac{\varepsilon}{3}. $$
And therefore
$$ \mathbb{E}\big[d_{BL}\big(\mathcal{L}_{P^*_n}(S_n), \mathcal{L}_{P_n}(S_n)\big)\big] < \tfrac{\varepsilon}{3}. \tag{37} $$
For the convergence of the term in part III, the same argumentation as for part I can be applied, as the assumptions on $Q^{\mathbb{N}}$ and $Q^{*\mathbb{N}}$ are the same as for $P^{\mathbb{N}}$ and $P^{*\mathbb{N}}$. In particular, for every $\varepsilon > 0$ there is $\tilde{n}_2 \in \mathbb{N}$ such that for all $n \ge \tilde{n}_2$:
$$ d_{BL}\big(\mathcal{L}_{Q^*_n}(S_n), \mathcal{L}_{Q_n}(S_n)\big) < \tfrac{\varepsilon}{3}, \quad \text{respectively} \quad \mathbb{E}\big[d_{BL}\big(\mathcal{L}_{Q^*_n}(S_n), \mathcal{L}_{Q_n}(S_n)\big)\big] < \tfrac{\varepsilon}{3}. \tag{38} $$
Hence, (28), (37), and (38) yield, for all $n \ge \max\{\tilde{n}_1, \tilde{n}_2\}$:
$$ \mathbb{E}\big[d_{BL}\big(\mathcal{L}_{P^*_n}(S_n), \mathcal{L}_{Q^*_n}(S_n)\big)\big] < \tfrac{\varepsilon}{3} + \tfrac{\varepsilon}{3} + \tfrac{\varepsilon}{3} = \varepsilon. $$
As $\mathcal{L}_{P^*_n}(S_n)$ and $\mathcal{L}_{Q^*_n}(S_n)$ are random variables themselves, we have, due to Huber (1981, Chapter 2, Theorem 4.2, (2) ⇒ (1)), for all $n \ge \max\{\tilde{n}_1, \tilde{n}_2\}$:
$$ d_{BL}\big(\mathcal{L}(\mathcal{L}_{P^*_n}(S_n)), \mathcal{L}(\mathcal{L}_{Q^*_n}(S_n))\big) < \varepsilon. $$
Hence, for every $\varepsilon > 0$ there are $\delta > 0$ and $n_0 = \max\{\tilde{n}_1, \tilde{n}_2\} \in \mathbb{N}$ such that, for all $n \ge n_0$:
$$ d_{BL}(P_n, Q_n) < \delta \;\Rightarrow\; d_{BL}\big(\mathcal{L}(\mathcal{L}_{P^*_n}(S_n)), \mathcal{L}(\mathcal{L}_{Q^*_n}(S_n))\big) < \varepsilon, $$
and therefore the assertion. $\Box$

Proof of Theorem 2.7:
The proof follows the same lines as the proof of Theorem 2.6, and therefore we only state the differing steps. Again we start with the triangle inequality:
$$ d_{BL}\big(\mathcal{L}_{P^*_n}(S_n), \mathcal{L}_{Q^*_n}(S_n)\big) \le \underbrace{d_{BL}\big(\mathcal{L}_{P^*_n}(S_n), \mathcal{L}_{P_n}(S_n)\big)}_{I} + \underbrace{d_{BL}\big(\mathcal{L}_{P_n}(S_n), \mathcal{L}_{Q_n}(S_n)\big)}_{II} + \underbrace{d_{BL}\big(\mathcal{L}_{Q_n}(S_n), \mathcal{L}_{Q^*_n}(S_n)\big)}_{III}. $$
To prove the convergence of the term in part II, we need the weak Varadarajan property of the stochastic process. Due to the definition of the $\alpha$-mixing coefficients, $\alpha(\sigma(Z_1, \ldots, Z_i), \sigma(Z_{i+\ell}, \ldots), \mu) \le \frac{1}{4}$ for all $\ell \in \mathbb{N}$, $i \in \mathbb{N}$, and obviously
$$ \big(\alpha(\sigma(Z_1, \ldots, Z_i), \sigma(Z_{i+\ell}, \ldots), P^{\mathbb{N}})\big)^{\frac{1}{2}} \le \tfrac{1}{2} \le \ell + 1, \quad \ell > 0. \tag{39} $$
Hence, due to the strong stationarity of the stochastic process, we have:
$$ \begin{aligned} \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n \alpha\big((Z_i)_{i \in \mathbb{N}}, P^{\mathbb{N}}, i, j\big) &= \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n \alpha\big(\sigma(Z_i), \sigma(Z_j), P^{\mathbb{N}}\big) \\ &\le \frac{2}{n^2} \sum_{i=1}^n \sum_{j \ge i}^n \alpha\big(\sigma(Z_i), \sigma(Z_j), P^{\mathbb{N}}\big) \\ &\le \frac{2}{n^2} \sum_{i=1}^n \sum_{j \ge i}^n \alpha\big(\sigma(Z_1, \ldots, Z_i), \sigma(Z_j, \ldots), P^{\mathbb{N}}\big) \\ &= \frac{2}{n^2} \sum_{i=1}^n \sum_{\ell=0}^{n-i} \alpha\big(\sigma(Z_1, \ldots, Z_i), \sigma(Z_{i+\ell}, \ldots), P^{\mathbb{N}}\big) \\ &\overset{\text{stationarity}}{\le} \frac{2}{n} \sum_{\ell=0}^{n} \alpha\big(\sigma(Z_1, \ldots, Z_i), \sigma(Z_{i+\ell}, \ldots), P^{\mathbb{N}}\big), \quad i \in \mathbb{N} \\ &= \frac{2}{n} \sum_{\ell=0}^{n} \big(\alpha(\sigma(Z_1, \ldots, Z_i), \sigma(Z_{i+\ell}, \ldots), P^{\mathbb{N}})\big)^{\frac{1}{2}} \big(\alpha(\sigma(Z_1, \ldots, Z_i), \sigma(Z_{i+\ell}, \ldots), P^{\mathbb{N}})\big)^{\frac{1}{2}} \\ &\overset{(39)}{\le} \frac{2}{n} \sum_{\ell=0}^{n} (\ell + 1) \big(\alpha(\sigma(Z_1, \ldots, Z_i), \sigma(Z_{i+\ell}, \ldots), P^{\mathbb{N}})\big)^{\frac{1}{2}} \overset{(6)}{\longrightarrow} 0, \quad n \to \infty. \end{aligned} $$
Now, the same argumentation as in the proof of Theorem 2.6 yields the weak Varadarajan property and therefore, for all $\varepsilon > 0$,
$$ \mathbb{E}\big[d_{BL}\big(\mathcal{L}_{P_n}(S_n), \mathcal{L}_{Q_n}(S_n)\big)\big] < \tfrac{\varepsilon}{3}. \tag{40} $$
Regarding the term in part I, we use a central limit theorem for the blockwise bootstrapped empirical process by Bühlmann (1994, Corollary 1 and remark) to show its convergence. Again, regard the distribution $P^{\mathbb{N}} \in \mathcal{M}(Z^{\mathbb{N}})$ and let $P^{*\mathbb{N}}$ be the bootstrap approximation of $P^{\mathbb{N}}$ via the blockwise bootstrap. Define, for all $n \in \mathbb{N}$, the random variables
$$ W_n : Z^{\mathbb{N}} \to Z^n, \quad W_n = (Z_1, \ldots, Z_n), \quad z_{\mathbb{N}} \mapsto w_n, $$
and
$$ W'_n : Z^{\mathbb{N}} \to Z^n, \quad W'_n = (Z'_1, \ldots, Z'_n), \quad z_{\mathbb{N}} \mapsto w'_n, $$
such that $W_n(P^{\mathbb{N}}) = P_n$ and $W'_n(P^{*\mathbb{N}}) = P^*_n$. Moreover, denote the bootstrap sample by
$$ W^*_n : Z^{\mathbb{N}} \to Z^n, \quad W^*_n := (Z^*_1, \ldots, Z^*_n), \quad z_{\mathbb{N}} \mapsto w^*_n, $$
and the distribution of $W^*_n$ by $\bar{P}_n$. The bootstrap approximation of $P_m$ is $P^*_m = \bigotimes_{j=1}^m \frac{1}{n} \sum_{i=1}^n \delta_{Z^*_i} = \bigotimes_{j=1}^m P_{W^*_n}$, $m \in \mathbb{N}$, by definition of the bootstrap procedure. Note that the sample $Z^*_1, \ldots, Z^*_n$ depends both on the blocklength $b(n)$ and on the number of blocks $\ell(n)$. Further, denote the joint distribution of $W_{\mathbb{N}}$, $W^*_{\mathbb{N}}$, and $W'_{\mathbb{N}}$ by $K_{\mathbb{N}} \in \mathcal{M}(Z^{\mathbb{N}} \times Z^{\mathbb{N}} \times Z^{\mathbb{N}})$. Then $K_{\mathbb{N}}$ has marginal distributions $K_{\mathbb{N}}(B \times Z^{\mathbb{N}} \times Z^{\mathbb{N}}) = P^{\mathbb{N}}(B)$, $K_{\mathbb{N}}(Z^{\mathbb{N}} \times B \times Z^{\mathbb{N}}) = \bar{P}^{\mathbb{N}}(B)$, and $K_{\mathbb{N}}(Z^{\mathbb{N}} \times Z^{\mathbb{N}} \times B) = P^{*\mathbb{N}}(B)$ for all $B \in \mathcal{B}^{\otimes\mathbb{N}}$. Then $\mathcal{L}_{P_n}(S_n) = S_n(P_n) = S_n \circ W_n(P^{\mathbb{N}})$ and $\mathcal{L}_{P^*_n}(S_n) = S_n(P^*_n) = S_n \circ W'_n(P^{*\mathbb{N}})$, and therefore
$$ d_{BL}\big(\mathcal{L}_{P^*_n}(S_n), \mathcal{L}_{P_n}(S_n)\big) = d_{BL}\big(\mathcal{L}(S_n \circ W'_n), \mathcal{L}(S_n \circ W_n)\big). $$
As $Z = [0,1]^d$ is compact, it is in particular totally bounded. Hence the set $BL(Z, d_Z)$ is a uniform Glivenko-Cantelli class, due to Dudley et al. (1991, Proposition 12). Similar to part I of the proof of Theorem 2.6, the bootstrap structure and the Glivenko-Cantelli property given above yield, for arbitrary but fixed $\varepsilon > 0$: for every $\delta > 0$ there is $n_1 \in \mathbb{N}$ such that, for all $n \ge n_1$ and all $P_{w^*_n} \in \mathcal{M}(Z)$,
$$ P^*_n\Big(\Big\{ w'_n \in Z^n \,\Big|\, d_{BL}\big(P_{w'_n}, P_{w^*_n}\big) \le \delta \Big\}\Big) \ge 1 - \tfrac{\varepsilon}{6}. $$
Now, regard the empirical process of $(Z_1, \ldots, Z_n)$. Set $t = (t_1, \ldots, t_d) \in \mathbb{R}^d$; moreover, $t < b$ means $t_i < b_i$ for all $i \in \{1, \ldots, d\}$. Hence we can define the empirical process and the blockwise bootstrapped empirical process via
$$ \frac{1}{n} \sum_{i=1}^n I_{\{Z_i \le t\}} \quad \text{and} \quad \frac{1}{n} \sum_{i=1}^n I_{\{Z^*_i \le t\}}. $$
Regard the process $G_n(t) = \frac{1}{\sqrt{n}} \sum_{i=1}^n I_{\{Z^*_i \le t\}} - \frac{1}{\sqrt{n}} \sum_{i=1}^n I_{\{Z_i \le t\}}$, $t \in [0,1]^d$.
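The process $G_n(t)$ just defined can be evaluated pointwise by counting componentwise-dominated observations. The following sketch does so at a single $t \in [0,1]^2$ for one moving block bootstrap sample; the i.i.d. uniform placeholder data and all names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def ecdf_at(z, t):
    """Multivariate empirical CDF of the sample z (shape (n, d)) at t:
    the fraction of rows with Z_i <= t componentwise."""
    return float(np.mean(np.all(z <= t, axis=1)))

rng = np.random.default_rng(0)
n, d, b = 2000, 2, 12
z = rng.uniform(size=(n, d))       # placeholder data in [0, 1]^d
# moving block bootstrap along the time axis
starts = rng.integers(0, n - b + 1, size=n // b)
z_star = np.concatenate([z[s:s + b] for s in starts])
m = len(z_star)

t = np.array([0.5, 0.5])
g_m = np.sqrt(m) * (ecdf_at(z_star, t) - ecdf_at(z[:m], t))  # G_m(t)
# dividing G_m(t) by sqrt(m) recovers the difference of the two
# empirical CDFs, the quantity shown below to vanish in probability
diff = abs(g_m) / np.sqrt(m)
print(round(diff, 4))
```

While $G_m(t)$ itself stays of order one (it converges in distribution to a Gaussian limit), the rescaled difference of the empirical CDFs is already small at moderate sample sizes.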
Now, due to the assumptions on the stochastic process and on the moving block bootstrap, Bühlmann (1994, Corollary 1 and remark) yields the almost sure convergence in distribution to a Gaussian process $G$:
$$ \frac{1}{\sqrt{n}} \sum_{i=1}^n I_{\{Z^*_i \le t\}} - \frac{1}{\sqrt{n}} \sum_{i=1}^n I_{\{Z_i \le t\}} \xrightarrow{\ \mathcal{D}\ } G(t), \quad t \in [0,1]^d, $$
almost surely with respect to $P^{\mathbb{N}}$, $n \to \infty$, in the (extended) Skorohod topology on $D_d([0,1])$. The space $D_d([0,1])$ is a generalization of the space of cadlag functions on $[0,1]$, see Billingsley (1999, Chapter 12), and consists of functions $f : [0,1]^d \to \mathbb{R}$. A detailed description of this space and the extended Skorohod topology can be found in Straf (1972, 1969) and Bickel and Wichura (1971); the definition of the space $D_d([0,1])$ can, for example, be found in Bickel and Wichura (1971, Chapter 3).
Straf (1972, Lemma 5.4) yields that the above convergence in the Skorohod topology is equivalent to the convergence for all continuity points $t$ of $G$. Hence,
$$ \frac{1}{\sqrt{n}} \sum_{i=1}^n I_{\{Z^*_i \le t\}} - \frac{1}{\sqrt{n}} \sum_{i=1}^n I_{\{Z_i \le t\}} \xrightarrow{\ \mathcal{D}\ } G(t) $$
almost surely with respect to $P^{\mathbb{N}}$, $n \to \infty$, for all continuity points $t$ of $G$. Multiplying by $\frac{1}{\sqrt{n}}$ yields, for every continuity point $t$ of $G$:
$$ \frac{1}{n} \sum_{i=1}^n I_{\{Z^*_i \le t\}} - \frac{1}{n} \sum_{i=1}^n I_{\{Z_i \le t\}} - \frac{1}{\sqrt{n}} G(t) \xrightarrow{\ \mathcal{D}\ } 0 $$
almost surely with respect to $P^{\mathbb{N}}$, $n \to \infty$. As convergence in distribution to a constant implies convergence in probability, see e.g. van der Vaart (1998, Theorem 2.7(iii)), and as $\frac{1}{\sqrt{n}} G(t)$ converges in probability to $0$, for all fixed continuity points $t \in [0,1]^d$ of $G$:
$$ \frac{1}{n} \sum_{i=1}^n I_{\{Z^*_i \le t\}} - \frac{1}{n} \sum_{i=1}^n I_{\{Z_i \le t\}} \xrightarrow{\ P\ } 0 $$
almost surely with respect to $P^{\mathbb{N}}$, $n \to \infty$. This yields the convergence of the corresponding probability measures, see for example Billingsley (1995, Chapter 29) for the theory on $\mathbb{R}^d$:
$$ d_{BL}\Big( \frac{1}{n} \sum_{i=1}^n \delta_{Z^*_i}, \frac{1}{n} \sum_{i=1}^n \delta_{Z_i} \Big) \xrightarrow{\ P\ } 0 $$
almost surely with respect to $P^{\mathbb{N}}$, $n \to \infty$, respectively
$$ d_{BL}\big(P_{W^*_n}, P_{W_n}\big) \xrightarrow{\ P\ } 0 $$
almost surely with respect to $P^{\mathbb{N}}$, $n \to \infty$. As the space $[0,1]^d$ is compact, we can use an argumentation similar to the proof of Theorem 2.6.
Then, for every $\varepsilon > 0$, there is $n_2 \in \mathbb{N}$ such that for all $n \ge n_2$:
$$ d_{BL}\big(\mathcal{L}_{P^*_n}(S_n), \mathcal{L}_{P_n}(S_n)\big) < \tfrac{\varepsilon}{3}, \quad \text{respectively} \quad \mathbb{E}\big[d_{BL}\big(\mathcal{L}_{P^*_n}(S_n), \mathcal{L}_{P_n}(S_n)\big)\big] < \tfrac{\varepsilon}{3}. \tag{41} $$
The convergence of the term in part III follows analogously to part I for the distributions $Q^{\mathbb{N}}$ and $Q^{*\mathbb{N}}$. Hence, for every $\varepsilon > 0$, there is $n_3 \in \mathbb{N}$ such that for all $n \ge n_3$:
$$ \mathbb{E}\big[d_{BL}\big(\mathcal{L}_{Q^*_n}(S_n), \mathcal{L}_{Q_n}(S_n)\big)\big] < \tfrac{\varepsilon}{3}. \tag{42} $$
The combination of (40), (41), and (42) yields, for all $n \ge \max\{n_2, n_3\}$:
$$ \mathbb{E}\big[d_{BL}\big(\mathcal{L}_{P^*_n}(S_n), \mathcal{L}_{Q^*_n}(S_n)\big)\big] < \tfrac{\varepsilon}{3} + \tfrac{\varepsilon}{3} + \tfrac{\varepsilon}{3} = \varepsilon. $$
As $\mathcal{L}_{P^*_n}(S_n)$ and $\mathcal{L}_{Q^*_n}(S_n)$ are random variables themselves, we have, due to Huber (1981, Chapter 2, Theorem 4.2, (2) ⇒ (1)), for all $n \ge \max\{n_2, n_3\}$:
$$ d_{BL}\big(\mathcal{L}(\mathcal{L}_{P^*_n}(S_n)), \mathcal{L}(\mathcal{L}_{Q^*_n}(S_n))\big) < \varepsilon. $$
Hence, for every $\varepsilon > 0$ there are $\delta > 0$ and $n_0 = \max\{n_2, n_3\} \in \mathbb{N}$ such that for all $n \ge n_0$:
$$ d_{BL}(P_n, Q_n) < \delta \;\Rightarrow\; d_{BL}\big(\mathcal{L}(\mathcal{L}_{P^*_n}(S_n)), \mathcal{L}(\mathcal{L}_{Q^*_n}(S_n))\big) < \varepsilon. $$
This yields the assertion. $\Box$
References
E. Beutner and H. Zähle. Functional delta-method for the bootstrap of quasi-Hadamard differentiable functionals. Electron. J. Stat., 10, 2016.

P. J. Bickel and M. J. Wichura. Convergence criteria for multiparameter stochastic processes and some applications. Ann. Math. Statist., 42:1656–1670, 1971.

P. Billingsley. Probability and measure. Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons, Inc., New York, third edition, 1995.

P. Billingsley. Convergence of probability measures. Wiley Series in Probability and Statistics. John Wiley & Sons, Inc., New York, second edition, 1999.

G. Boente, R. Fraiman, and V. J. Yohai. Qualitative robustness for general stochastic processes. Technical report, Department of Statistics, University of Washington, 1982.

G. Boente, R. Fraiman, and V. J. Yohai. Qualitative robustness for stochastic processes. The Annals of Statistics, 15(3):1293–1312, 1987.

R. C. Bradley. Basic properties of strong mixing conditions. A survey and some open questions. Probab. Surv., 2:107–144, 2005.

R. C. Bradley. Introduction to strong mixing conditions. Vol. 1. Kendrick Press, Heber City, UT, 2007a.

R. C. Bradley. Introduction to strong mixing conditions. Vol. 2. Kendrick Press, Heber City, UT, 2007b.

R. C. Bradley. Introduction to strong mixing conditions. Vol. 3. Kendrick Press, Heber City, UT, 2007c.

P. Bühlmann. Blockwise bootstrapped empirical process for stationary sequences. Ann. Statist., 22(2):995–1012, 1994.

P. Bühlmann. The blockwise bootstrap for general empirical processes of stationary sequences. Stochastic Process. Appl., 58(2):247–265, 1995.

O. Bustos. On qualitative robustness for general processes. Unpublished manuscript, 1980.

A. Christmann, M. Salibián-Barrera, and S. Van Aelst. On the stability of bootstrap estimators. arXiv preprint arXiv:1111.1876, 2011.

A. Christmann, M. Salibián-Barrera, and S. Van Aelst. Qualitative robustness of bootstrap approximations for kernel based methods. In Robustness and complex data structures, pages 263–278. Springer, Heidelberg, 2013.

D. D. Cox. Metrics on stochastic processes and qualitative robustness. Technical report, Department of Statistics, University of Washington, 1981.

A. Cuevas and J. Romo. On robustness properties of bootstrap approximations. J. Statist. Plann. Inference, 37(2):181–191, 1993.

Z. Denkowski, S. Migórski, and N. S. Papageorgiou. An introduction to nonlinear analysis: applications. Kluwer Academic Publishers, Boston, MA, 2003.

P. Doukhan. Mixing. Springer, New York, 1994.

R. M. Dudley. Real Analysis and Probability. Chapman & Hall, New York, 1989.

R. M. Dudley. Uniform central limit theorems, volume 63 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 2014.

R. M. Dudley, E. Giné, and J. Zinn. Uniform and universal Glivenko-Cantelli classes. J. Theoret. Probab., 4(3):485–510, 1991.

B. Efron. Bootstrap methods: another look at the jackknife. Ann. Statist., 7(1):1–26, 1979.

B. Efron and R. J. Tibshirani. An introduction to the bootstrap, volume 57 of Monographs on Statistics and Applied Probability. Chapman and Hall, New York, 1993.

R. Hable and A. Christmann. On qualitative robustness of support vector machines. Journal of Multivariate Analysis, 102:993–1007, 2011.

F. R. Hampel. Contributions to the theory of robust estimation. PhD thesis, Univ. California, Berkeley, 1968.

F. R. Hampel. A general qualitative definition of robustness. Annals of Mathematical Statistics, 42:1887–1896, 1971.

J. Hoffmann-Jørgensen. Probability with a view toward statistics. Vol. I. Chapman & Hall Probability Series. Chapman & Hall, New York, 1994.

P. J. Huber. Robust statistics. John Wiley & Sons Inc., New York, 1981.

J. Jurečková and J. Picek. Robust statistical methods with R. Chapman & Hall/CRC, Boca Raton, FL, 2006.

A. Klenke. Probability theory: a comprehensive course. Springer Science & Business Media, 2013.

V. Krätschmer, A. Schied, and H. Zähle. Domains of weak continuity of statistical functionals with a view toward robust statistics. J. Multivariate Anal., 158:1–19, 2017.

H. R. Künsch. The jackknife and the bootstrap for general stationary observations. Ann. Statist., 17(3):1217–1241, 1989.

S. N. Lahiri. Resampling methods for dependent data. Springer Series in Statistics. Springer, New York, 2003.

R. Y. Liu and K. Singh. Moving blocks jackknife and bootstrap capture weak dependence. In Exploring the limits of bootstrap (East Lansing, MI, 1990), Wiley Ser. Probab. Math. Statist., pages 225–248. Wiley, New York, 1992.

R. A. Maronna, R. D. Martin, and V. J. Yohai. Robust statistics. Wiley Series in Probability and Statistics. John Wiley & Sons Ltd., Chichester, 2006.

U. V. Naik-Nimbalkar and M. B. Rajarshi. Validity of blockwise bootstrap for empirical processes with stationary observations. Ann. Statist., 22(2):980–994, 1994.

P. Papantoni-Kazakos and R. M. Gray. Robustness of estimators on stationary observations. The Annals of Probability, 7(6):989–1002, 1979.

K. R. Parthasarathy. Probability measures on metric spaces, volume 352. American Mathematical Soc., 1967.

M. Peligrad. On the blockwise bootstrap for empirical processes for stationary sequences. Ann. Probab., 26(2):877–901, 1998.

D. N. Politis and J. P. Romano. A circular block-resampling procedure for stationary data. In Exploring the limits of bootstrap (East Lansing, MI, 1990), Wiley Ser. Probab. Math. Statist., pages 263–270. 1990.

D. Radulović. The bootstrap for empirical processes based on stationary observations. Stochastic Process. Appl., 65, 1996.

M. Rosenblatt. A central limit theorem and a strong mixing condition. Proc. Nat. Acad. Sci. U.S.A., 42:43–47, 1956.

B. Schölkopf and A. J. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.

Q. M. Shao and H. Yu. Bootstrapping the sample means for stationary mixing sequences. Stochastic Process. Appl., 48(1):175–190, 1993.

K. Singh. On the asymptotic accuracy of Efron's bootstrap. Ann. Statist., 9(6):1187–1195, 1981.

I. Steinwart and A. Christmann. Support vector machines. Information Science and Statistics. Springer, New York, 2008.

I. Steinwart, D. Hush, and C. Scovel. Learning from dependent observations. Journal of Multivariate Analysis, 100:175–194, 2009.

M. L. Straf. A general Skorohod space, 1969.

M. L. Straf. Weak convergence of stochastic processes with several parameters. Pages 187–221, 1972.

K. Strohriegl and R. Hable. On qualitative robustness for stochastic processes. Metrika, pages 895–917, 2016.

A. W. van der Vaart. Asymptotic statistics, volume 3 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, 1998.

H. Zähle. Qualitative robustness of statistical functionals under strong mixing. Bernoulli, 21(3):1412–1434, 2015.

H. Zähle. A definition of qualitative robustness for general point estimators, and examples.