Open-end nonparametric sequential change-point detection based on the retrospective CUSUM statistic
Mark Holmes and Ivan Kojadinovic. School of Mathematics & Statistics, The University of Melbourne, Parkville, VIC 3010, Australia, e-mail: [email protected]; CNRS / Université de Pau et des Pays de l'Adour / E2S UPPA, Laboratoire de mathématiques et applications, IPRA, UMR 5142, B.P. 1155, 64013 Pau Cedex, France, e-mail: [email protected]
Abstract:
The aim of online monitoring is to issue an alarm as soon as there is significant evidence in the collected observations to suggest that the underlying data generating mechanism has changed. This work is concerned with open-end, nonparametric procedures that can be interpreted as statistical tests. The proposed monitoring schemes consist of computing the so-called retrospective CUSUM statistic (or minor variations thereof) after the arrival of each new observation. After proposing suitable threshold functions for the chosen detectors, the asymptotic validity of the procedures is investigated in the special case of monitoring for changes in the mean, both under the null hypothesis of stationarity and under relevant alternatives. To carry out the sequential tests in practice, an approach based on an asymptotic regression model is used to estimate high quantiles of the relevant limiting distributions. Monte Carlo experiments demonstrate the good finite-sample behavior of the proposed monitoring schemes and suggest that they are superior to existing competitors as long as changes do not occur at the very beginning of the monitoring. Extensions to statistics exhibiting an asymptotic mean-like behavior are briefly discussed. Finally, the application of the derived sequential change-point detection tests is succinctly illustrated on temperature anomaly data.
MSC 2010 subject classifications:
Primary 62L99, 62E20; secondary 62G10.
Keywords and phrases: change-point detection, online monitoring, open-end procedures, sequential testing.
1. Introduction
In situations in which observations are continuously collected over time, the aim of sequential or online change-point detection is to issue an alarm as soon as possible if it is thought that the probabilistic properties of the underlying unobservable data generating mechanism have changed. While this problem has a long history in statistical process control (see, e.g., Lai, 2001; Montgomery, 2007, for an overview), we adopt herein the alternative perspective proposed in the seminal work of Chu, Stinchcombe and White (1996) and treat the issue from the point of view of statistical testing. To fix ideas, assume that we have at hand an initial stretch $X_1, \dots, X_m$ (frequently referred to as the learning sample) from a univariate stationary time series $(X_i)_{i \in \mathbb{Z}}$. As new observations arrive, the aim is to look for evidence against stationarity and issue an alarm if such evidence is deemed significant. Specifically, assuming that the $k$th observation has been collected, a positive statistic $D_m(k)$ measuring departure from stationarity is computed from the available observations $X_1, \dots, X_k$ and compared to a suitably chosen threshold $w(k/m)$. If $D_m(k) > w(k/m)$, the hypothesis that $X_1, \dots, X_k$ is a stretch from a stationary time series is rejected and the monitoring stops. Otherwise, a new observation $X_{k+1}$ is collected and the previous steps are repeated using $X_1, \dots, X_{k+1}$.

Among such procedures, one can distinguish between closed-end approaches, for which the monitoring eventually stops if stationarity is not rejected after the arrival of observation $X_n$, $n > m$, and open-end approaches which, in principle, can continue indefinitely. The sequential testing procedures studied in this work pertain to this second category and, for that reason, their null hypothesis is

\[
H_0: X_1, \dots, X_m, X_{m+1}, X_{m+2}, \dots \text{ is a stretch from a stationary time series.} \tag{1.1}
\]

Their interpretation as statistical tests implies that, given a significance level $\alpha \in (0, 1/2)$, the thresholds $w(k/m)$, $k \ge m+1$, need to be chosen such that, ideally, under $H_0$,

\[
P\{D_m(k) \le w(k/m) \text{ for all } k \ge m+1\} = P\bigg\{ \sup_{m+1 \le k < \infty} \frac{D_m(k)}{w(k/m)} \le 1 \bigg\} \ge 1 - \alpha. \tag{1.2}
\]

The previous display shows how closely the choice of the so-called detector $D_m$ and the threshold function $w$ are related. Intuitively, under $H_0$, the sample paths of the stochastic process $\{D_m(k)/w(k/m)\}_{k \ge m+1}$ should fluctuate but not exceed a constant threshold too often while, when $H_0$ is not true, they are expected to rapidly exceed it after a change in the data generating process has occurred.

The starting point of our investigations is the recent seminal work of Gösmann, Kley and Dette (2020), who study open-end monitoring schemes sensitive to potential changes in a parameter $\theta$ (such as the mean) of a time series. A first natural approach to tackle this problem, often referred to as the ordinary CUSUM (cumulative sum) in the sequential change-point detection literature, consists of comparing an estimator $\hat\theta_{1:m}$ of $\theta$ computed from the learning sample $X_1, \dots, X_m$ with an estimator $\hat\theta_{m+1:k}$ of $\theta$ computed from the observations $X_{m+1}, \dots, X_k$ collected after the monitoring has started; see, e.g., Horváth et al. (2004), Aue et al. (2006) as well as the references given in the recent review by Kirch and Weber (2018). The idea of Gösmann, Kley and Dette (2020) is to define detectors that take into account all of the differences $\hat\theta_{1:j} - \hat\theta_{j+1:k}$, $j \in \{m+1, \dots, k-1\}$. This approach, which can be regarded as adapted from retrospective (or offline, a posteriori) change-point detection (which assumes that data collection has been completed before testing is carried out), treats each $j \in \{m+1, \dots, k-1\}$ as a potential change-point.
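In pseudocode form, the generic monitoring scheme described above amounts to a simple loop. The following Python sketch is ours, purely for illustration (the names and the interface are hypothetical and are not those of an existing implementation):

```python
def monitor(learning_sample, stream, detector, threshold, critical_value):
    """Generic open-end monitoring loop: after the arrival of each new
    observation X_k, compute the detector D_m(k) from X_1, ..., X_k and
    reject stationarity as soon as D_m(k) > critical_value * w(k/m)."""
    m = len(learning_sample)
    data = list(learning_sample)
    for x in stream:                 # k = m + 1, m + 2, ... (open-end)
        data.append(x)
        k = len(data)
        if detector(data, m) > critical_value * threshold(k / m):
            return k                 # alarm: evidence against stationarity at time k
    return None                      # no alarm while the stream lasted
```

In the notation of (1.2), `critical_value * threshold(k / m)` plays the role of the threshold $w(k/m)$, the multiplicative constant having been separated out as is done later when critical values are estimated.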
In their experiments, Gösmann, Kley and Dette (2020) found that their approach is not only more powerful than the ordinary CUSUM but also more powerful than the so-called Page CUSUM procedure, which consists of defining a detector from the differences $\hat\theta_{1:m} - \hat\theta_{j+1:k}$, $j \in \{m+1, \dots, k-1\}$; see, e.g., Fremdt (2015) and Kirch and Weber (2018).

When computed after the $k$th observation has been collected, the detector proposed by Gösmann, Kley and Dette (2020) (thus involving all the differences $\hat\theta_{1:j} - \hat\theta_{j+1:k}$, $j \in \{m+1, \dots, k-1\}$) does not however coincide with the so-called retrospective CUSUM statistic computed from $X_1, \dots, X_k$ (which involves all the differences $\hat\theta_{1:j} - \hat\theta_{j+1:k}$, $j \in \{1, \dots, k-1\}$; see, e.g., Csörgő and Horváth, 1997; Aue and Horváth, 2013, and the references therein). Reasons that led Gösmann, Kley and Dette (2020) not to consider such an approach are discussed in their Remark 2.1. As shall be explained in the next section, we believe that an open-end monitoring scheme using the retrospective CUSUM statistic as detector could be even more powerful than the procedure of Gösmann, Kley and Dette (2020) as long as changes do not occur at the very beginning of the monitoring.

The aim of this work is to address the theoretical and practical issues associated with defining a nonparametric detector for open-end monitoring such that it coincides at each $k$ with the retrospective CUSUM statistic. The theoretical issues are mostly related to the choice of the threshold function, while the practical issues come from the fact that the quantiles of the underlying limiting distribution required to carry out the sequential test are harder to estimate.

This paper is organized as follows. In the second section, we propose three open-end nonparametric monitoring schemes related to the retrospective CUSUM statistic designed to be sensitive to changes in the mean of univariate time series.
Their asymptotic behavior as the size $m$ of the learning sample tends to infinity is studied in the third section, both under the null hypothesis of stationarity and under relevant alternatives. Section 4 is concerned with the estimation of high quantiles of the related limiting distributions, which is necessary in practice to carry out the sequential tests. The fifth section presents a summary of extensive numerical experiments demonstrating the good finite-sample properties of the resulting sequential testing procedures. An extension to time series parameters whose estimators admit an asymptotic mean-like linearization as considered in Gösmann, Kley and Dette (2020) is briefly discussed in Section 6. A short illustration involving temperature anomaly data concludes the work. All proofs are deferred to the appendices. Throughout the paper, all convergences are for $m \to \infty$ unless mentioned otherwise. A preliminary implementation of the studied tests is available in the package npcp (Kojadinovic, 2020) for the R statistical system (R Core Team, 2020).
2. The retrospective CUSUM for monitoring changes in the mean
Our aim is to derive open-end nonparametric sequential change-point detection procedures that are particularly sensitive to alternative hypotheses of the form

\[
H_1: \exists\, k^\star \ge m \text{ such that } E(X_1) = \dots = E(X_{k^\star}) \ne E(X_{k^\star+1}) = E(X_{k^\star+2}) = \dots. \tag{2.1}
\]

After the arrival of the $k$th observation with $k > m$, the data at hand consist of the stretch $X_1, \dots, X_k$. If we were in the context of retrospective change-point detection, a natural test statistic would be the so-called retrospective CUSUM statistic (see, e.g., Csörgő and Horváth, 1997; Aue and Horváth, 2013, and the references therein) defined by

\[
R_k = \max_{1 \le j \le k-1} \frac{j(k-j)}{k^{3/2}} \,|\bar X_{1:j} - \bar X_{j+1:k}|, \tag{2.2}
\]

where

\[
\bar X_{j:k} = \begin{cases} \dfrac{1}{k-j+1} \displaystyle\sum_{i=j}^{k} X_i, & \text{if } j \le k, \\ 0, & \text{otherwise}. \end{cases} \tag{2.3}
\]

In the definition of $R_k$, we see that every $j \in \{1, \dots, k-1\}$ is treated as a potential change-point in the sequence $X_1, \dots, X_k$. The maximum over $j$ then implies that $R_k$ will be large as soon as the difference between $\bar X_{1:j}$ and $\bar X_{j+1:k}$ is large for some $j$.

In the sequential context considered in this work, since $X_1, \dots, X_m$ is the learning sample known to be a stretch from a stationary time series, a first natural modification of (2.2) is to restrict the maximum over $j$ to $j \in \{m, \dots, k-1\}$. This is the idea considered by Dette and Gösmann (2019) in a closed-end setting who, additionally, replaced the normalizing factor $k^{3/2}$ by $m^{3/2}$ so that the asymptotics of the corresponding monitoring scheme could be studied as the size $m$ of the learning sample tends to infinity. The resulting detector is

\[
R_m(k) = \max_{m \le j \le k-1} \frac{j(k-j)}{m^{3/2}} \,|\bar X_{1:j} - \bar X_{j+1:k}|, \qquad k \ge m+1. \tag{2.4}
\]

In an open-end setting, Gösmann, Kley and Dette (2020) chose however not to consider the detector $R_m$ (see Remark 2.1 in the latter reference) but suggested instead the detector

\[
E_m(k) = \max_{m \le j \le k-1} \frac{k-j}{m^{1/2}} \,|\bar X_{1:j} - \bar X_{j+1:k}|, \qquad k \ge m+1. \tag{2.5}
\]

The difference between $R_m$ and $E_m$ evidently lies in the weighting of the absolute differences of means $|\bar X_{1:j} - \bar X_{j+1:k}|$, $j \in \{m, \dots, k-1\}$. Instead of weighting $|\bar X_{1:j} - \bar X_{j+1:k}|$ by $j(k-j)/m^{3/2}$, (2.5) replaces $j/m$ by 1. While this modification may be beneficial in terms of power when $k$ is close to $m$, it could have a negative impact when $k$ is substantially larger than $m$ because, then, $j(k-j)/m^{3/2}$ can be substantially larger than $(k-j)/m^{1/2}$. In other words, we suspect that, for changes not occurring at the beginning of the monitoring, a suitable detection scheme based on $R_m$ could be more powerful than the one proposed by Gösmann, Kley and Dette (2020) based on $E_m$.

Remark 2.1. As mentioned in the introduction, the simplest detector in an open-end setting is probably the so-called ordinary CUSUM initially considered by Horváth et al. (2004) for investigating changes in the parameters of linear models. With the aim of detecting changes in the mean, it can be defined by

\[
Q_m(k) = \frac{k-m}{m^{1/2}} \,|\bar X_{1:m} - \bar X_{m+1:k}|, \qquad k \ge m+1. \tag{2.6}
\]

Following Gösmann, Kley and Dette (2020), we will use it as a benchmark in our Monte Carlo experiments.

As explained in the introduction, the choice of a detector needs to be accompanied by the choice of a suitable threshold function. To heuristically justify our choice of a suitable threshold function for the detector $R_m$ in (2.4), we momentarily consider the closed-end setting in which monitoring stops at the latest after observation $X_n$ is collected. Following Dette and Gösmann (2019) among others, to be able to study the sequential testing scheme asymptotically, we set $n = \lfloor mT \rfloor$ for some real $T > 1$. Then, under $H_0$ in (1.1), assuming that the functional central limit theorem holds for $(X_i)_{i \in \mathbb{Z}}$, it can be verified using relatively simple arguments (see, e.g., Dette and Gösmann, 2019, Section 3, or Kojadinovic and Verdier, 2020, Section 2.3) that $\{R_m(\lfloor mt \rfloor)\}_{t \in [1,T]}$ converges in distribution to $\{L(t)\}_{t \in [1,T]}$, where

\[
L(t) = \sigma \sup_{1 \le s \le t} |tW(s) - sW(t)|, \qquad t \in [1, T],
\]

$W$ is a standard Brownian motion and $\sigma^2 = \sum_{i \in \mathbb{Z}} \mathrm{Cov}(X_0, X_i) > 0$ is the long-run variance of $(X_i)_{i \in \mathbb{Z}}$. For any fixed $t \in [1, T]$, by Brownian scaling and the substitution $u = s/t$, $t^{-3/2} \sigma^{-1} L(t)$ is equal in distribution to

\[
\sup_{1 \le s \le t} \Big| W\Big(\frac{s}{t}\Big) - \frac{s}{t}\, W(1) \Big| = \sup_{1/t \le u \le 1} |W(u) - u W(1)|.
\]

Hence, for large $t$, the distribution of $t^{-3/2} \sigma^{-1} L(t)$ is close to that of the supremum of a Brownian bridge on $[0,1]$. In other words, under $H_0$ in (1.1), the distribution of $t^{-3/2} \sigma^{-1} R_m(\lfloor mt \rfloor)$ stabilizes as $m$ and $t$ increase. The latter observation suggests the possibility of an open-end sequential testing scheme based on $R_m$ in (2.4) with a threshold function that is not too different from $t \mapsto t^{3/2}$.

As shall become clear from Theorem 3.3 below, in order to ensure that (1.2) is fully meaningful when $D_m = R_m$, the corresponding threshold function actually needs to diverge to $\infty$ (as $t \to \infty$) slightly faster than $t^{3/2}$. We propose to use as threshold function for $R_m$

\[
w_R(t) = t^{3/2+\eta}\, w_\gamma(t), \qquad t \in [1, \infty), \tag{2.7}
\]

where $\eta > 0$ and

\[
w_\gamma(t) = \max\bigg\{ \Big(\frac{t-1}{t}\Big)^\gamma, \varepsilon \bigg\}, \qquad t \in [1, \infty), \tag{2.8}
\]

with $\gamma \ge 0$ and $\varepsilon > 0$ a small constant ($\varepsilon = 10^{-10}$ in our implementation).

Let us first explain the role of the parameter $\eta$. Following the perspective adopted in the discussion below (1.2), the resulting monitoring can be seen as consisting of computing $\{w_R(k/m)\}^{-1} R_m(k)$ for $k \ge m+1$.
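To fix ideas, the detectors (2.4)–(2.6) and the threshold functions (2.7)–(2.8) can be evaluated from cumulative sums in $O(k)$ time per step. The following Python sketch is purely illustrative (the function names are ours; the paper's own preliminary implementation is in the R package npcp):

```python
import numpy as np

def detectors(x, m):
    """Compute R_m(k), E_m(k) and Q_m(k), cf. (2.4)-(2.6), from
    x = (X_1, ..., X_k) with k > m, using cumulative sums so that all the
    means Xbar_{1:j} and Xbar_{j+1:k} are obtained in O(k) time."""
    x = np.asarray(x, dtype=float)
    k = len(x)
    s = np.cumsum(x)                            # s[i - 1] = X_1 + ... + X_i
    j = np.arange(m, k)                         # candidate change-points m, ..., k - 1
    mean_left = s[j - 1] / j                    # Xbar_{1:j}
    mean_right = (s[-1] - s[j - 1]) / (k - j)   # Xbar_{j+1:k}
    diff = np.abs(mean_left - mean_right)
    R = np.max(j * (k - j) / m ** 1.5 * diff)                                  # (2.4)
    E = np.max((k - j) / m ** 0.5 * diff)                                      # (2.5)
    Q = (k - m) / m ** 0.5 * abs(s[m - 1] / m - (s[-1] - s[m - 1]) / (k - m))  # (2.6)
    return R, E, Q

def w_gamma(t, gamma, eps=1e-10):
    """w_gamma(t) = max{((t - 1)/t)^gamma, eps}, cf. (2.8)."""
    return max(((t - 1) / t) ** gamma, eps)

def w_R(t, eta, gamma, eps=1e-10):
    """Threshold function w_R(t) = t^(3/2 + eta) * w_gamma(t), cf. (2.7)."""
    return t ** (1.5 + eta) * w_gamma(t, gamma, eps)
```

On the toy stretch $(0, 0, 1, 1)$ with $m = 2$, all three detectors equal $\sqrt{2}$, which is easily checked by hand from the displays above.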
It is elementary that, for any fixed $k \ge m+1$, as we increase $\eta$, both the mean and the variance of $\{w_R(k/m)\}^{-1} R_m(k)$ decrease. The top-left plot of Figure 1 displays, for $m = 100$ and $\gamma = 0$, one sample path of $\{\{w_R(k/m)\}^{-1} R_m(k)\}_{m+1 \le k \le m+5000}$ computed from an independent sequence of standard normal random variables for a larger value of $\eta$ (solid line) and for $\eta = 0.001$ (dotted line). Unsurprisingly, because of the factor $(k/m)^{-\eta}$ in $\{w_R(k/m)\}^{-1} R_m(k)$, the sample path with $\eta = 0.001$ lies above the other one. The reduction in variability induced by a larger $\eta$ is confirmed by the top-right plot of Figure 1, which displays the corresponding empirical standard deviations at $k$ against $k - m$ computed from 1000 sample paths. As expected, increasing the parameter $\eta$ increases the rate of convergence (as $k \to \infty$) of $\{w_R(k/m)\}^{-1} R_m(k)$ (and of its mean and variance) to zero. Intuitively, in the context of open-end monitoring, one would therefore want $\eta$ to be very small so that there is little reduction in variability as time elapses. The practical choice of the parameter $\eta$ will be discussed in detail in Section 4.

[Figure 1. Left column: for $m = 100$, $\gamma = 0$ and $\varepsilon = 10^{-10}$, the solid lines represent one sample path of $\{\{w_R(k/m)\}^{-1} R_m(k)\}_{m+1 \le k \le m+5000}$ (first row), $\{\{w_S(k/m)\}^{-1} S_m(k)\}_{m+1 \le k \le m+5000}$ (second row) and $\{\{w_T(k/m)\}^{-1} T_m(k)\}_{m+1 \le k \le m+5000}$ (third row) computed from an independent sequence of standard normal random variables; the dotted lines represent the sample paths computed from the same sequence but with $\eta = 0.001$ instead. Right column: corresponding empirical standard deviations at $k$ against $k - m$ computed from 1000 sample paths.]

Let us now explain the role of the parameter $\gamma$. The multiplication by the function $w_\gamma$ in (2.7) aims at possibly improving the finite-sample performance of the sequential testing scheme at the beginning of the monitoring and has a negligible effect later.
The use of such a modification is common in the literature and can be found for instance in Horváth et al. (2004), Fremdt (2015), Kirch and Weber (2018) and Gösmann, Kley and Dette (2020), among many others. Notice that, unlike what is frequently done in the literature, we do not impose that $\gamma$ be strictly smaller than $1/2$. To provide some further insight, consider the top-left plot of Figure 2, which displays the empirical 95% quantile (solid line), empirical standard deviation (dashed line) and sample mean (dotted line) of $\{w_R(k/m)\}^{-1} R_m(k)$ for $m = 100$, $\eta = 0.001$ and $\gamma = 0$ against $k - m$, computed from 1000 sample paths based on independent standard normal sequences.

[Figure 2. The solid (resp. dashed, dotted) line represents the empirical 95% quantile (resp. empirical standard deviation, sample mean) of $\{w_R(k/m)\}^{-1} R_m(k)$ with $m = 100$, $\eta = 0.001$ and four values of $\gamma$ (among which $0$, $0.25$ and $0.45$) against $k - m$, computed from 1000 sample paths.]

As one can see, because of the small value of $\eta$, the distribution of $\{w_R(k/m)\}^{-1} R_m(k)$ appears to stabilize rather quickly as $k$ increases. The speed at which this occurs can be increased by increasing $\gamma$. For instance, by comparing the top-left and bottom-left plots of Figure 2, one can clearly see that the distribution of $\{w_R(k/m)\}^{-1} R_m(k)$ stabilizes more quickly as $k$ increases for $\gamma = 0.25$ than for $\gamma = 0$. However, as one can see from the plot for $\gamma = 0.45$, the value of $\gamma$ should not be taken too large.

In addition to the detector $R_m$ in (2.4), we shall also consider the detectors

\[
S_m(k) = \frac{1}{m} \sum_{j=m}^{k-1} \frac{j(k-j)}{m^{3/2}} \,|\bar X_{1:j} - \bar X_{j+1:k}|, \qquad k \ge m+1, \tag{2.9}
\]

\[
T_m(k) = \sqrt{ \frac{1}{m} \sum_{j=m}^{k-1} \bigg\{ \frac{j(k-j)}{m^{3/2}} \,(\bar X_{1:j} - \bar X_{j+1:k}) \bigg\}^2 }, \qquad k \ge m+1, \tag{2.10}
\]

with corresponding threshold functions

\[
w_S(t) = t^{5/2+\eta}\, w_\gamma(t), \tag{2.11}
\]
\[
w_T(t) = t^{2+\eta}\, w_\gamma(t). \tag{2.12}
\]

As one can see, $S_m$ and $T_m$ can be regarded as the $L^1$ and $L^2$ versions, respectively, of $R_m$ in (2.4).

The parameters $\eta$ and $\gamma$ in (2.11) and (2.12) play the same role as in (2.7). For $\eta$, this can be empirically verified from the second and third rows of graphs in Figure 1. For $\gamma$, plots similar to Figure 2 for $S_m$ and $T_m$ reveal that values of $\gamma$ larger than 0.25 seem meaningful for these two detectors. Specifically, it seems possible to improve the finite-sample performance of the corresponding schemes at the beginning of the monitoring by taking $\gamma$ larger than 0.25 for $S_m$ and as large as 0.45 for $T_m$.
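Like $R_m$, the aggregated detectors (2.9) and (2.10) can be evaluated from cumulative sums; the following Python sketch (again ours, for illustration only) mirrors the one given for $R_m$:

```python
import numpy as np

def detectors_ST(x, m):
    """Compute the L1- and L2-type detectors S_m(k) and T_m(k),
    cf. (2.9) and (2.10), from x = (X_1, ..., X_k) with k > m."""
    x = np.asarray(x, dtype=float)
    k = len(x)
    s = np.cumsum(x)                            # partial sums of x
    j = np.arange(m, k)                         # candidate change-points m, ..., k - 1
    # weighted differences j(k - j)/m^{3/2} * |Xbar_{1:j} - Xbar_{j+1:k}|
    weighted = j * (k - j) / m ** 1.5 * np.abs(s[j - 1] / j - (s[-1] - s[j - 1]) / (k - j))
    S = np.sum(weighted) / m                    # L1 aggregation, cf. (2.9)
    T = np.sqrt(np.sum(weighted ** 2) / m)      # L2 aggregation, cf. (2.10)
    return S, T
```

On the toy stretch $(0, 0, 1, 1)$ with $m = 2$, the weighted differences are $\sqrt{2}$ and $\sqrt{2}/2$, so that $S = 3\sqrt{2}/4$ and $T = \sqrt{5}/2$.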
3. Asymptotics of the procedures
To study the asymptotic behavior of the three considered monitoring schemes under $H_0$ in (1.1), we follow Horváth et al. (2004), Aue et al. (2006), Fremdt (2015) and Gösmann, Kley and Dette (2020), among others, and assume that the observations satisfy the following condition.

Condition 3.1.
The data are a stretch from a stationary time series $(X_i)_{i \in \mathbb{Z}}$ such that $\sigma^2 = \sum_{i \in \mathbb{Z}} \mathrm{Cov}(X_0, X_i)$, the long-run variance of $(X_i)_{i \in \mathbb{Z}}$, is strictly positive and finite. Furthermore, for every $m \in \mathbb{N}$, there exist two independent standard Brownian motions $W_{m,1}$ and $W_{m,2}$ such that, for some $0 < \xi < 1/2$,

\[
\sup_{m+1 \le k < \infty} (k-m)^{-\xi} \bigg| \sum_{i=m+1}^{k} \{X_i - E(X_1)\} - \sigma W_{m,1}(k-m) \bigg| = O_P(1) \tag{3.1}
\]

and

\[
m^{-\xi} \bigg| \sum_{i=1}^{m} \{X_i - E(X_1)\} - \sigma W_{m,2}(m) \bigg| = O_P(1). \tag{3.2}
\]

As mentioned in Remark 2.6 of Gösmann, Kley and Dette (2020), the validity of the previous conditions is discussed in Section 2 of Aue and Horváth (2004) for different classes of time series including GARCH and strongly mixing processes.

Remark 3.2. In the prototypical situation in which $(X_i)_{i \in \mathbb{Z}}$ is a sequence of independent normal random variables with variance $\sigma^2$, there exists a probability space on which the above conditions are trivially satisfied with $O_P(1)$ replaced with 0. This is essentially just the statement that the increments of a Brownian motion are standard normal random variables. To be more precise, let $(B_j)_{j \ge 0}$ be independent standard Brownian bridges that are independent of $(X_i)_{i \in \mathbb{Z}}$. Without loss of generality, we may assume that the $X_i$ are standard normal, and we can define $W_{m,1}$ by first specifying its values at integer times by $W_{m,1}(k-m) = \sum_{i=m+1}^{k} X_i$, $k \ge m$, and then interpolating between these values using the bridges $(B_{m+j})_{j \ge 0}$: for $t \in (k-m, k-m+1)$, set $W_{m,1}(t) = W_{m,1}(k-m) + B_k\{t-(k-m)\} + \{t-(k-m)\}\{W_{m,1}(k-m+1) - W_{m,1}(k-m)\}$ (it is an exercise to check that the resulting process $W_{m,1}$ is a standard Brownian motion). Then, the term in absolute values in (3.1) is exactly 0 for each $k$ and $m$.
Similarly, setting $W_{m,2}(j) = \sum_{i=1}^{j} X_i$ for $j \le m$ and interpolating with the bridges $(B_j)_{0 \le j \le m-1}$ makes (3.2) hold with 0 in the absolute value for each $m$. Furthermore, by construction, $W_{m,1}$ and $W_{m,2}$ are independent.

The following result is proven in Appendix A.

Theorem 3.3. Under $H_0$ in (1.1) and Condition 3.1, for any fixed $\eta > 0$, $\varepsilon > 0$ and $\gamma \ge 0$,

\[
\sigma^{-1} \sup_{m+1 \le k < \infty} \{w_R(k/m)\}^{-1} R_m(k) \rightsquigarrow \sup_{1 \le s \le t < \infty} \{w_R(t)\}^{-1} |tW(s) - sW(t)|,
\]
\[
\sigma^{-1} \sup_{m+1 \le k < \infty} \{w_S(k/m)\}^{-1} S_m(k) \rightsquigarrow \sup_{t \in [1,\infty)} \{w_S(t)\}^{-1} \int_1^t |tW(s) - sW(t)| \, \mathrm{d}s,
\]
\[
\sigma^{-1} \sup_{m+1 \le k < \infty} \{w_T(k/m)\}^{-1} T_m(k) \rightsquigarrow \sup_{t \in [1,\infty)} \{w_T(t)\}^{-1} \sqrt{\int_1^t \{tW(s) - sW(t)\}^2 \, \mathrm{d}s},
\]

where the arrow '$\rightsquigarrow$' denotes convergence in distribution, the detectors $R_m$, $S_m$ and $T_m$ are defined in (2.4), (2.9) and (2.10), respectively, the threshold functions $w_R$, $w_S$ and $w_T$ are defined in (2.7), (2.11) and (2.12), respectively, and $W$ is a standard Brownian motion. In addition, all the limiting random variables are almost surely finite.

Imposing that $\eta$ is strictly positive in the previous theorem is necessary for ensuring that the limiting random variables are almost surely finite. Recall the definition of the function $w_\gamma$ in (2.8). Since the function $t \mapsto 1/w_\gamma(t)$ is bounded from below by 1, the following result, proven in Appendix A, implies that, for the monitoring scheme based on $R_m$ and $w_R$, this condition is necessary and sufficient.

Proposition 3.4. For any fixed $M > 0$,

\[
P\bigg\{ \sup_{1 \le s \le t < \infty} t^{-3/2} |tW(s) - sW(t)| \ge M \bigg\} = 1,
\]

where $W$ is a standard Brownian motion.

Remark 3.5. We thus have that $\sup_{1 \le s \le t < \infty} \{w_R(t)\}^{-1} |tW(s) - sW(t)|$ is almost surely finite for $\eta > 0$ but not for $\eta = 0$. By the law of the iterated logarithm for Brownian motion, this supremum remains finite if $t^\eta$ in $w_R$ in (2.7) is replaced by $h(t)$, where $h(t) = \sqrt{\log \log t}$ when $t > e^e$ and $h(t) = 1$ when $t \le e^e$. We expect that Theorem 3.3 remains valid with such a modification, which could be considered optimal in the sense that, as $t \to \infty$, $h$ diverges to infinity more slowly than $t \mapsto t^\eta$ for any $\eta > 0$. The latter implies that the use of $h(t)$ instead of $t^\eta$ entails the lowest possible variability reduction for the monitoring scheme (in the sense of the discussion on the role of $\eta$ in the previous section) in the limit as $k \to \infty$.

The next corollary provides an operational version of Theorem 3.3 to carry out the three sequential change-point detection tests in practice.

Corollary 3.6.
The statement of Theorem 3.3 remains true if the long-run variance $\sigma$ is replaced by an estimator $\sigma_m$ of $\sigma$ computed from the learning sample $X_1, \dots, X_m$ such that $\sigma_m - \sigma = o_P(1)$.

To fix ideas, let us briefly explain how Corollary 3.6 can be used to carry out the sequential test based on $R_m$ in (2.4). Given a significance level $\alpha \in (0, 1/2)$, assume that we have at hand a good estimate of $q_{R,1-\alpha}$, the $(1-\alpha)$-quantile of the continuous random variable $\sup_{1 \le s \le t < \infty} \{w_R(t)\}^{-1} |tW(s) - sW(t)|$ (this aspect will be discussed in detail in Section 4). Then, under $H_0$ in (1.1), from the Portmanteau theorem,

\[
P\bigg\{ \sigma_m^{-1} \sup_{m+1 \le k < \infty} \{w_R(k/m)\}^{-1} R_m(k) \le q_{R,1-\alpha} \bigg\} \to P\bigg\{ \sup_{1 \le s \le t < \infty} \{w_R(t)\}^{-1} |tW(s) - sW(t)| \le q_{R,1-\alpha} \bigg\} = 1 - \alpha. \tag{3.3}
\]

Hence, for large $m$, we can expect that, under $H_0$,

\[
P\{ R_m(k) \le \sigma_m\, q_{R,1-\alpha}\, w_R(k/m) \text{ for all } k \ge m+1 \} \approx 1 - \alpha.
\]

In practice, after the arrival of observation $X_k$, $k > m$, $R_m(k)$ is computed from $X_1, \dots, X_k$ and compared to the threshold $\sigma_m q_{R,1-\alpha} w_R(k/m)$ (or, equivalently, $\sigma_m^{-1} \{w_R(k/m)\}^{-1} R_m(k)$ is computed and compared to the threshold $q_{R,1-\alpha}$). If $R_m(k) > \sigma_m q_{R,1-\alpha} w_R(k/m)$, the null hypothesis is rejected and the monitoring stops. Otherwise, the next observation is collected and the previous iteration is carried out with the $k+1$ available data points. Corollary 3.6 and (3.3) in particular guarantee that such a sequential testing procedure will have asymptotic level $\alpha$. Steps to carry out the tests based on $S_m$ in (2.9) and $T_m$ in (2.10) can be obtained mutatis mutandis.

We now turn to the asymptotic behavior of the monitoring schemes under sequences of alternatives related to $H_1$ in (2.1). We start with the procedure based on $R_m$ in (2.4), which we study under a condition similar to the one used by Gösmann, Kley and Dette (2020, Theorem 2.13).

Condition 3.7.
The data are a stretch, for some $m \in \mathbb{N}$, from the sequence of random variables $(X^{(m)}_i)_{i \in \mathbb{N}}$ defined by

\[
X^{(m)}_i = \begin{cases} Y^{(0)}_i, & \text{if } i \le k^\star_m, \\ Y^{(m)}_i, & \text{otherwise}, \end{cases}
\]

where $(k^\star_m)_{m \in \mathbb{N}}$ is a sequence of integers such that $k^\star_m \ge m$ and $(Y^{(0)}_i)_{i \in \mathbb{Z}}, (Y^{(1)}_i)_{i \in \mathbb{Z}}, \dots, (Y^{(m)}_i)_{i \in \mathbb{Z}}, \dots$ are sequences of random variables defined on the same probability space and satisfying

(i) for each $m \ge 0$, $\big(E(Y^{(m)}_i)\big)_{i \in \mathbb{Z}}$ is a constant sequence,

(ii) $\sqrt{m}\, |E(Y^{(0)}_1) - E(Y^{(m)}_1)| \to \infty$,

(iii)
\[
\frac{1}{\sqrt{k^\star_m}} \sum_{i=1}^{k^\star_m} \{Y^{(0)}_i - E(Y^{(0)}_1)\} = O_P(1), \tag{3.4}
\]

and either

(iv) there exist constants $C > c > 0$ such that $c \le k^\star_m/m \le C$ for all $m \in \mathbb{N}$ and
\[
\frac{1}{\sqrt{m}} \sum_{i=k^\star_m+1}^{k^\star_m+\lfloor cm \rfloor} \{Y^{(m)}_i - E(Y^{(m)}_1)\} = O_P(1), \tag{3.5}
\]

or

(v) $k^\star_m/m \to \infty$, and there exists a constant $c > 0$ such that
\[
\frac{1}{\sqrt{k^\star_m}} \sum_{i=k^\star_m+1}^{k^\star_m+\lfloor ck^\star_m \rfloor} \{Y^{(m)}_i - E(Y^{(m)}_1)\} = O_P(1). \tag{3.6}
\]

Remark 3.8. Statement (iv) (resp. (v)) in Condition 3.7 is related to what were called early (resp. late) changes in Gösmann, Kley and Dette (2020) because $k^\star_m/m = O(1)$ (resp. $k^\star_m/m \to \infty$).

Remark 3.9. Suppose that $(Y^{(0)}_i)_{i \in \mathbb{Z}}$ is a stationary centered sequence for which the central limit theorem holds, $(a_m)_{m \in \mathbb{N}}$ is a sequence such that $\sqrt{m}\, a_m \to \infty$ and, for any $m \in \mathbb{N}$, $Y^{(m)}_i = Y^{(0)}_i + a_m$, $i \in \mathbb{Z}$. It is then easy to verify that the sequences $(Y^{(m)}_i)_{i \in \mathbb{Z}}$, $m \ge 0$, satisfy (i) and (ii) in Condition 3.7. Let us verify that they also satisfy (iii), (iv) and (v). Let $\varepsilon > 0$. By the central limit theorem for $(Y^{(0)}_i)_{i \in \mathbb{Z}}$, there exists $M > 0$ such that, for all $n \in \mathbb{N}$,

\[
P\bigg( \bigg| n^{-1/2} \sum_{i=1}^{n} Y^{(0)}_i \bigg| > M \bigg) < \varepsilon. \tag{3.7}
\]

Hence, (3.4) holds for any sequence $(k^\star_m)_{m \in \mathbb{N}}$. For $c = 1$, the quantity on the left-hand side of (3.5) is equal to

\[
\frac{1}{\sqrt{m}} \sum_{i=k^\star_m+1}^{k^\star_m+m} Y^{(0)}_i.
\]

Since $P(|m^{-1/2} \sum_{i=1}^{m} Y^{(0)}_i| > M) = P(|m^{-1/2} \sum_{i=n+1}^{n+m} Y^{(0)}_i| > M)$ for all $m, n \in \mathbb{N}$, (3.7) implies that $\sup_{n \in \mathbb{N}} \sup_{m \in \mathbb{N}} P(|m^{-1/2} \sum_{i=n+1}^{n+m} Y^{(0)}_i| > M) < \varepsilon$, which in turn implies that (3.5) holds with $c = 1$ for any sequence $(k^\star_m)_{m \in \mathbb{N}}$. Similarly, for $c = 1$, the quantity on the left-hand side of (3.6) is equal to

\[
\frac{1}{\sqrt{k^\star_m}} \sum_{i=k^\star_m+1}^{2k^\star_m} Y^{(0)}_i,
\]

and since $\sup_{n \in \mathbb{N}} P(|n^{-1/2} \sum_{i=n+1}^{2n} Y^{(0)}_i| > M) < \varepsilon$, (3.6) holds with $c = 1$ for any sequence $(k^\star_m)_{m \in \mathbb{N}}$.

The following result, proven in Appendix B, establishes the consistency of the procedure based on $R_m$ under Condition 3.7.

Theorem 3.10. Under Condition 3.7, for any fixed $\eta \ge 0$, $\varepsilon \ge 0$ and $\gamma \ge 0$,

\[
\sup_{m+1 \le k < \infty} \{w_R(k/m)\}^{-1} R_m(k) \overset{P}{\to} \infty,
\]

where the arrow '$\overset{P}{\to}$' denotes convergence in probability.

Assume additionally that the estimator $\sigma_m$ appearing in Corollary 3.6 converges in probability to a strictly positive constant. An immediate corollary of the previous theorem is then that, under Condition 3.7, for any fixed $\eta \ge 0$, $\varepsilon \ge 0$ and $\gamma \ge 0$, $\sigma_m^{-1} \sup_{m+1 \le k < \infty} \{w_R(k/m)\}^{-1} R_m(k)$ diverges in probability to infinity. In other words, under the considered conditions, for any fixed $\eta \ge 0$, $\varepsilon \ge 0$, $\gamma \ge 0$ and any threshold $\kappa > 0$,

\[
P\bigg[ \sigma_m^{-1} \sup_{m+1 \le k < \infty} \{w_R(k/m)\}^{-1} R_m(k) > \kappa \bigg] \to 1.
\]

Proving the consistency of the procedures based on $S_m$ in (2.9) and $T_m$ in (2.10) for late changes turns out to be more difficult. The following result, proven in Appendix B, shows that they are consistent for early changes. Note that the considered condition is very similar to those considered for instance in Dette and Gösmann (2019, Theorem 3.8) or Kojadinovic and Verdier (2020, Proposition 2.7) in the context of closed-end monitoring.

Condition 3.11.
The data are a stretch, for some $m \in \mathbb{N}$, from the sequence of random variables $(X^{(m)}_i)_{i \in \mathbb{N}}$ defined by

\[
X^{(m)}_i = \begin{cases} Y_i, & \text{if } i \le k^\star_m, \\ Z_i, & \text{otherwise}, \end{cases}
\]

where $k^\star_m = \lfloor mc \rfloor$ for some constant $c \ge 1$, and $(Y_i)_{i \in \mathbb{Z}}$ and $(Z_i)_{i \in \mathbb{Z}}$ are stationary sequences defined on the same probability space such that $E(Y_1) \ne E(Z_1)$ and for which the functional central limit theorem holds.

Theorem 3.12. Under Condition 3.11, for any fixed $T > c$, $\{m^{-1/2} H_m(s,t)\}_{1 \le s \le t \le T}$ converges in probability to $\{K_c(s,t)\}_{1 \le s \le t \le T}$, where, for any $1 \le s \le t$,

\[
H_m(s,t) = m^{-3/2} \lfloor ms \rfloor (\lfloor mt \rfloor - \lfloor ms \rfloor) (\bar X^{(m)}_{1:\lfloor ms \rfloor} - \bar X^{(m)}_{\lfloor ms \rfloor+1:\lfloor mt \rfloor}),
\]
\[
K_c(s,t) = (s \wedge c)\{(t \vee c) - (s \vee c)\}\{E(X^{(m)}_{k^\star_m}) - E(X^{(m)}_{k^\star_m+1})\},
\]

and $\vee$ and $\wedge$ are the maximum and minimum operators, respectively. Consequently, for any fixed $\eta \ge 0$, $\varepsilon \ge 0$ and $\gamma \ge 0$,

\[
\sup_{m+1 \le k < \infty} \{w_S(k/m)\}^{-1} S_m(k) \overset{P}{\to} \infty \quad \text{and} \quad \sup_{m+1 \le k < \infty} \{w_T(k/m)\}^{-1} T_m(k) \overset{P}{\to} \infty. \tag{3.8}
\]

Since the estimator $\sigma_m$ appearing in Corollary 3.6 converges in probability to a strictly positive constant under Condition 3.11, an immediate corollary of Theorem 3.12 is that, under the same conditions,

\[
\sigma_m^{-1} \sup_{m+1 \le k < \infty} \{w_S(k/m)\}^{-1} S_m(k) \overset{P}{\to} \infty \quad \text{and} \quad \sigma_m^{-1} \sup_{m+1 \le k < \infty} \{w_T(k/m)\}^{-1} T_m(k) \overset{P}{\to} \infty.
\]
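Putting Corollary 3.6 and the consistency results together, the complete test based on $R_m$ can be sketched as follows in Python. The long-run variance estimator below (a Bartlett-kernel estimator with a crude bandwidth rule) is merely a placeholder for whichever consistent estimator $\sigma_m$ one favors, and the quantile `q` is assumed to have been estimated beforehand as discussed in Section 4; the whole sketch is an illustration by the editor, not the implementation of the npcp package.

```python
import numpy as np

def lrv_bartlett(x, bandwidth=None):
    """Bartlett-kernel estimator of the long-run variance
    sigma^2 = sum_i Cov(X_0, X_i); kernel and bandwidth are illustrative."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if bandwidth is None:
        bandwidth = int(n ** (1 / 3))           # crude rule of thumb
    xc = x - x.mean()
    s2 = np.mean(xc ** 2)
    for h in range(1, bandwidth + 1):
        s2 += 2 * (1 - h / (bandwidth + 1)) * np.mean(xc[h:] * xc[:-h])
    return s2

def sequential_test_R(learning, stream, q, eta, gamma, eps=1e-10):
    """Open-end test based on R_m: reject H_0 at the first k > m such that
    R_m(k) > sigma_m * q * w_R(k/m), where q estimates the (1 - alpha)-quantile
    of the limit in Theorem 3.3; return that k, or None if no alarm is raised."""
    m = len(learning)
    sigma_m = np.sqrt(lrv_bartlett(learning))
    s = list(np.cumsum(learning))               # running partial sums S_1, ..., S_k
    for x_new in stream:
        s.append(s[-1] + x_new)
        k = len(s)
        sj = np.asarray(s)
        j = np.arange(m, k)
        diff = np.abs(sj[j - 1] / j - (sj[-1] - sj[j - 1]) / (k - j))
        R = np.max(j * (k - j) / m ** 1.5 * diff)       # detector (2.4)
        t = k / m
        w = t ** (1.5 + eta) * max(((t - 1) / t) ** gamma, eps)  # threshold (2.7)
        if R > sigma_m * q * w:
            return k                            # alarm: H_0 rejected at time k
    return None
```

With a stationary learning sample and a pronounced upward mean shift occurring during the monitoring, the loop raises an alarm a few observations after the change, in line with the consistency results above.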
4. Estimation of high quantiles of the limiting distributions
From the discussion following Corollary 3.6, we know that accurate estimations of high quan-tiles of the limiting distributions appearing in Theorem 3.3 are necessary to carry out the12equential tests based on the detectors R m , S m and T m defined in (2.4), (2.9) and (2.10),respectively. Before we present the approach considered in this work, let us briefly explainhow Horv´ath et al. (2004) and G¨osmann, Kley and Dette (2020) proceeded for the proce-dure based on Q m in (2.6) and the one based on E m in (2.5), respectively. From the latterreferences, suitable threshold functions for these two detectors are w Q ( t ) = w E ( t ) = tw γ ( t ), t ∈ [1 , ∞ ), where the function w γ is defined in (2.7) but with γ restricted to the inter-val [0 , / H in (1.1) andCondition 3.1, one has σ − m sup m +1 ≤ k< ∞ { w Q ( k/m ) } − Q m ( k ) (cid:32) sup t ∈ [0 , W ( t )max( t γ , (cid:15) ) ,σ − m sup m +1 ≤ k< ∞ { w E ( k/m ) } − E m ( k ) (cid:32) sup ≤ s ≤ t ≤ t γ , (cid:15) ) | W ( t ) − W ( s ) | , where W is a standard Brownian motion. Notice that, using the law of the iterated loga-rithm/local modulus of continuity for Brownian motion, it can be verified, since γ < / (cid:15) is taken equal to zero.When γ = 0, it seems particularly natural to estimate high quantiles of these limiting dis-tributions by Monte Carlo simulation using an approximation of W on a fine grid of [0 , (cid:15) = 0 even when γ > R m in (2.4). Proposition 4.1.
For any fixed η > 0, ε > 0 and γ ≥ 0, the random variable

sup_{1≤s≤t<∞} {w_R(t)}^{−1} |tW(s) − sW(t)| = sup_{1≤s≤t<∞} |tW(s) − sW(t)| / ( t^{3/2+η} max[{(t − 1)/t}^γ, ε] )   (4.1)

is almost surely finite.

For α ∈ (0, 1/2), let q_{R,1−α} denote the (1 − α)-quantile of sup_{1≤s≤t<∞} {w_R(t)}^{−1} |tW(s) − sW(t)|. To estimate q_{R,1−α}, we propose to take m relatively large and simulate a large number N of sample paths of {{w_R(k/m)}^{−1} R_m(k)}_{m+1≤k≤m+2^p} for p large from sequences of independent standard normal random variables. At this point, it may be tempting to use the (1 − α)-empirical quantile of sup_{m+1≤k≤m+2^p} {w_R(k/m)}^{−1} R_m(k) as an estimate of q_{R,1−α}. Although the latter is expected to be a good estimate of the (1 − α)-quantile of sup_{1≤s≤t≤T} {w_R(t)}^{−1} |tW(s) − sW(t)| for T = 1 + 2^p/m, it may underestimate q_{R,1−α} since the supremum over 1 ≤ s ≤ t < ∞ obviously dominates the supremum over 1 ≤ s ≤ t ≤ T for any T > 1. The amount of underestimation will actually depend on the value of η. Indeed, recall from our discussion in Section 2 that, because of the factor (k/m)^{−η}, increasing η decreases the variance of {w_R(k/m)}^{−1} R_m(k) as k increases. This has two consequences as far as the above-mentioned empirical estimation of q_{R,1−α} is concerned: (i) for any fixed η > 0, since the variability of {w_R(k/m)}^{−1} R_m(k) decreases as k increases, if we could set p large enough, we would be able to obtain a good estimate of q_{R,1−α}; unfortunately, our margin of action in that respect is limited by computational resources; (ii) for any fixed p, as η decreases to 0, the probability that our estimate of the (1 − α)-quantile of sup_{1≤s≤t≤1+2^p/m} {w_R(t)}^{−1} |tW(s) − sW(t)| is also a good estimate of the (1 − α)-quantile of sup_{1≤s≤t<∞} {w_R(t)}^{−1} |tW(s) − sW(t)| decreases to zero; in other words, for any fixed p, as η decreases, it is less and less likely that the distribution of sup_{m+1≤k≤m+2^p} {w_R(k/m)}^{−1} R_m(k) is a good approximation of the distribution of sup_{1≤s≤t<∞} {w_R(t)}^{−1} |tW(s) − sW(t)|.

To try to solve this quantile estimation problem empirically, we propose, for any fixed η >
0, to model the relationship between p and the (1 − α)-empirical quantile of sup_{m+1≤k≤m+2^p} {w_R(k/m)}^{−1} R_m(k), and to use the fitted model to extrapolate the value of the quantile for larger p. To do so, we set m to 500, generated N = 15000 sample paths of {{w_R(k/m)}^{−1} R_m(k)}_{m+1≤k≤m+2^p} using a computer cluster and, for each considered p, estimated the (1 − α)-quantiles of sup_{m+1≤k≤m+2^p} {w_R(k/m)}^{−1} R_m(k). Let q^{(p)}_{R,1−α} denote the resulting estimates. Then, we fitted a so-called asymptotic regression model to the pairs (p, q^{(p)}_{R,1−α}). The considered model, often used for the analysis of dose–response curves, is a three-parameter model with mean function f(x) = c + (d − c){1 − exp(−x/e)}, where y = d is the equation of the upper horizontal asymptote of f. Its fitting was carried out using the R package drc (Ritz et al., 2015). A candidate estimate of q_{R,1−α}, the (1 − α)-quantile of sup_{1≤s≤t<∞} {w_R(t)}^{−1} |tW(s) − sW(t)|, is then the resulting estimate of the parameter d.

The first row of graphs in Figure 3 shows the scatter plots of the pairs (p, q^{(p)}_{R,1−α}) for γ = 0 and the six values of η under consideration. The corresponding fitted asymptotic regression models are represented by dashed curves. The estimated upper horizontal asymptotes are represented by dotted horizontal lines whose equations are given in the upper right corners of the plots.

As one can see from the first two graphs in the first row of Figure 3, the scatter plots for the two largest values of η under consideration reveal the presence of a plateau for larger values of p. The latter is an empirical indication of the fact that the distribution of the random variable sup_{m+1≤k≤m+2^p} {w_R(k/m)}^{−1} R_m(k), say for p = 18, seems to be a good approximation of the distribution of sup_{m+1≤k<∞} {w_R(k/m)}^{−1} R_m(k) and thus of sup_{1≤s≤t<∞} {w_R(t)}^{−1} |tW(s) − sW(t)|. In other words, because of the relatively large values of η leading to a relatively quick reduction of the variability of {w_R(k/m)}^{−1} R_m(k) as k increases, the supremum of

Fig 3. For a given α and γ = 0, scatter plots of (p, q^{(p)}_{R,1−α}) (first row), (p, q^{(p)}_{S,1−α}) (second row) and (p, q^{(p)}_{T,1−α}) (third row) for p up to 18, the corresponding fitted asymptotic regression models (dashed curves) and the estimates of the upper horizontal asymptotes (dotted lines), for the six considered values of η (decreasing from left to right). The estimated asymptotes are y = 1.563, 1.695, 1.889, 1.927, 1.956 and 1.963 for R_m, y = 0.743, 0.85, 0.971, 0.991, 1.007 and 1.001 for S_m, and y = 0.849, 0.96, 1.088, 1.107, 1.121 and 1.128 for T_m.

{{w_R(k/m)}^{−1} R_m(k)}_{m+1≤k<∞} tends to occur for k relatively close to m (and smaller than m + 2^p for the largest considered p). Specifically, under H_0 in (1.1) and for the largest value of η considered, the standard deviation of {w_R(k/m)}^{−1} R_m(k) at k = 2m can be very roughly approximated to be about
53% (resp. a somewhat larger percentage for the second-largest value of η) of what it is at the very beginning of the monitoring, which reflects the quick reduction of the variability of {w_R(k/m)}^{−1} R_m(k) in the early stages of the monitoring. The corresponding standard deviation reductions for the smaller values of η are much more modest. As a consequence, the smaller η, the larger the estimated quantiles q^{(p)}_{R,1−α}, and thus the larger the estimated horizontal asymptote (which is a candidate estimate of q_{R,1−α}). As one can see, the rate of increase of the estimated quantiles decreases as η decreases. For instance, there are hardly any differences between the plot for η = 0.001 and the plots for even smaller values of η, which are already very close to what would be obtained in the limiting case η = 0. The latter empirical observation practically implies that the proposed estimation technique will provide a finite estimate of q_{R,1−α} even when, according to Proposition 3.4, q_{R,1−α} is known to be infinite. It follows that it is not meaningful in practice to consider values of η that are "very small". Based on the previous approximate variance reduction calculations and the plots given in Figure 3, our intuitive recommendation is not to take η smaller than 0.001. Notice that, with m = 100, this value of η induces an approximate standard deviation reduction under H_0 for {w_R(k/m)}^{−1} R_m(k) after 10 monitoring steps of less than 2%.

The estimated quantiles for η = 0.001, three values of α and γ = 0, as well as for the largest meaningful value of γ (among those that were considered), are reported for the three monitoring schemes in Table 1 along with standard errors. Estimated quantiles for larger values of η are available in the R package npcp (Kojadinovic, 2020).

Table 1. For three values of α, estimated (1 − α)-quantiles of the limiting distributions appearing in Theorem 3.3 related to the monitoring schemes based on R_m, S_m and T_m, for η = 0.001 and different values of γ. The quantiles were estimated using asymptotic regression models. Standard errors of the estimates are provided between parentheses. Estimated quantiles for larger values of η are available in the R package npcp.

Remark. It is a research project of its own to investigate more thoroughly the estimation of the quantiles, both empirically and theoretically. For instance, one may wish to investigate bounds on the probability, for a given η > 0 and T > 1, that the supremum of {w_R(t)}^{−1} |tW(s) − sW(t)| over 1 ≤ s ≤ t < ∞ occurs at some (s, t) with t > T, and whether such occurrences tend to be associated with small suprema or large suprema.
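To make the extrapolation idea concrete, here is a scaled-down sketch (ours) of the whole pipeline, with m = 50, N = 200 paths and p ≤ 8 — far below the m = 500, N = 15000 and larger p used above — and with the asymptotic regression fitted via scipy rather than the R package drc. With γ = 0, the threshold function of (4.1) reduces to w_R(t) = t^{3/2+η}. The fitted parameter d is the candidate estimate of q_{R,1−α}.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(42)

def running_sup_path(m, kmax, eta=0.001):
    """Running supremum over k of {w_R(k/m)}^{-1} R_m(k) from i.i.d. N(0, 1)
    observations, with w_R(t) = t**(1.5 + eta) (gamma = 0, cf. (4.1))."""
    cs = np.cumsum(rng.standard_normal(kmax))
    sups, cur = np.empty(kmax - m), 0.0
    for i, k in enumerate(range(m + 1, kmax + 1)):
        j = np.arange(m, k)
        d = j * (k - j) / m**1.5 * np.abs(cs[j - 1] / j - (cs[k - 1] - cs[j - 1]) / (k - j))
        cur = max(cur, d.max() / (k / m) ** (1.5 + eta))
        sups[i] = cur
    return sups

m, pmax, N, alpha = 50, 8, 200, 0.05
paths = np.array([running_sup_path(m, m + 2**pmax) for _ in range(N)])
ps = np.arange(1, pmax + 1)
# empirical (1 - alpha)-quantiles of the supremum restricted to k <= m + 2**p
q = np.array([np.quantile(paths[:, 2**p - 1], 1 - alpha) for p in ps])

# three-parameter asymptotic regression; d estimates the upper horizontal
# asymptote, i.e. a candidate estimate of q_{R, 1 - alpha}
def f(x, c, d, e):
    return c + (d - c) * (1.0 - np.exp(-x / e))

(c_hat, d_hat, e_hat), _ = curve_fit(f, ps, q, p0=(q[0], q[-1], 2.0), maxfev=10000)
```

Because each path stores a running supremum, the empirical quantiles q are non-decreasing in p by construction, which is the monotonicity that the asymptotic regression exploits.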
5. Monte Carlo experiments
To investigate the finite-sample properties of the studied open-end sequential change-point detection procedures, we carried out extensive Monte Carlo experiments. One should however keep in mind that numerical experiments cannot provide full insight into the finite-sample behavior of open-end approaches, as finite computing resources impose that monitoring has to be stopped eventually.

To estimate the long-run variance σ² related to the learning sample, we used the approach of Andrews (1991) based on the quadratic spectral kernel with automatic bandwidth selection, as implemented in the function lrvar() of the R package sandwich (Zeileis, 2004). We considered 10 data generating models, denoted M1, . . . , M10. Models M1, . . . , M5 are simple AR(1) models with normal innovations whose autoregressive parameter is equal to 0, 0.1, 0.3, 0.5 and 0.7, respectively. These models were chosen, among others, to allow a comparison of our results with those of Gösmann, Kley and Dette (2020, Section 4.1). Model M6 generates independent observations from the Student t distribution with 5 degrees of freedom. Model M7 is a GARCH(1,1) model with normal innovations and parameters ω, β = 0.919 and α = 0.072 chosen to mimic S&P 500 daily log-returns following Jondeau, Poon and Rockinger (2007). Models M8 and M9 are, respectively, the nonlinear autoregressive model used in Paparoditis and Politis (2001, Section 3.3) and the exponential autoregressive model considered in Auestad and Tjøstheim (1990) and Paparoditis and Politis (2001, Section 3.3). The underlying generating equations are X_i = 0.6 sin(X_{i−1}) + ε_i and X_i = {0.8 − 1.1 exp(−50X²_{i−1})}X_{i−1} + 0.1ε_i, where the ε_i are independent standard normal innovations. Note that, for all time series models, a burn-in sample of 100 observations was used. Finally, in order to mimic count data, model M10 generates independent observations from a Poisson distribution with parameter 3.

In a first series of experiments, we attempted to assess how well the studied tests hold their level. To do so, we generated 5000 samples from models M1–M10 and used the first m observations of each sample as the learning sample. All the sequential tests were carried out at the 5% nominal level using the estimated quantiles available in the R package npcp (see also Table 1). Because monitoring cannot go on indefinitely, we stopped the sequential testing procedures after the arrival of observation n = m + 10000. The percentages of rejection of H_0 in (1.1) for the procedures based on R_m in (2.4), S_m in (2.9) and T_m in (2.10) with γ = 0 and the four considered values of η, as well as for the procedures based on E_m in (2.5) and Q_m in (2.6) with γ = 0, are reported in Table 2 (to carry out the procedures based on Q_m and E_m, we used the quantiles reported in Table 1 of Horváth et al. (2004) and in Gösmann, Kley and Dette (2020), respectively). Given the closed-end nature of the experiments, one should keep in mind that the empirical levels would have been higher had larger values of n been considered. As one can see from Table 2, for most models, the empirical levels drop below the 5% threshold rather quickly as m increases.
When the tests are too liberal, it is probably mostly a consequence of the difficulty of estimating the long-run variance σ². It is for instance unsurprising that a large value of m is necessary to obtain a reasonably accurate estimate of σ² for model M5 (the AR(1) model with the strongest serial dependence) or model M9 (an exponential autoregressive model). Notice that, overall, the tests based on E_m and Q_m are less liberal than the tests based on R_m, S_m and T_m when the serial dependence is strong and m is small. The opposite tends to occur as m increases. Among the three proposed procedures, the one based on S_m is the most conservative, followed by the procedure based on T_m. The fact that the empirical levels are higher for large values of η is due to the factor (k/m)^{−η}, which favors the occurrence of false alarms (threshold exceedances) at the beginning of the monitoring.

In a second series of experiments, we studied the finite-sample behavior of the tests under H_1 in (2.1) using simulation scenarios similar to those considered in Gösmann, Kley and Dette (2020, Section 4.1). We generated 1000 samples of size n = m + 7000 from models M1 with m = 100 and M4 with m = 200 and, for each sample, added a positive offset of δ to all observations after position k⋆ ∈ {m, m + 500, m + 1000, m + 5000}. We started by investigating the influence of η on the most conservative procedure according to Table 2. The rejection percentages for the test based on S_m with γ = 0 and several values of η, as well as for the procedures based on E_m and Q_m with γ = 0, are represented in Figure 4. As one can see, when the change occurs right at the beginning of the monitoring, the larger η, the more powerful the procedure based on S_m. As k⋆ increases, the influence of the factor (k/m)^{−η} comes into effect and the opposite tends to occur (although the difference in power does not seem of practical importance for the monitoring periods under consideration).
As far as the procedures based on E_m and Q_m are concerned, they are more powerful than the procedure based on S_m when the change occurs at the beginning of the monitoring but, as expected, become less powerful as k⋆ increases. The fourth column of plots in Figure 4 shows that the difference in terms of power can be substantial. One should however keep in mind

Table 2. Percentages of rejection of H_0 in (1.1) for the procedures based on R_m, S_m and T_m with γ = 0 and the four considered values of η (within each detector, columns are ordered from the largest to the smallest value of η), as well as for the procedures based on E_m and Q_m with γ = 0. The rejection percentages are computed from 5000 samples of size n = m + 10000 generated from the time series models M1–M10.

Model  m   | R_m                  | S_m                  | T_m                  | E_m  Q_m
M1    100  | 8.7  7.8  7.4  7.1   | 5.5  4.8  4.6  4.5   | 6.9  5.4  5.4  5.4   | 6.1  6.7
      200  | 6.7  5.2  4.7  4.5   | 3.7  2.8  2.8  2.8   | 4.6  3.7  3.6  3.5   | 4.9  5.1
      400  | 5.4  3.6  3.1  2.8   | 2.4  1.6  1.4  1.2   | 3.3  2.3  2.1  2.0   | 4.7  5.0
      800  | 4.3  2.5  2.1  1.7   | 1.7  0.9  0.7  0.7   | 2.6  1.4  1.2  1.1   | 3.1  3.9
M2    100  | 9.0  8.2  7.8  7.6   | 6.0  5.1  4.9  4.8   | 7.2  6.1  5.9  6.0   | 6.3  6.6
      200  | 6.9  5.3  5.0  4.7   | 3.8  3.0  2.9  2.8   | 4.9  3.8  3.7  3.6   | 4.8  5.2
      400  | 5.4  3.6  3.2  2.7   | 2.4  1.6  1.4  1.3   | 3.5  2.3  2.2  2.1   | 4.8  5.1
      800  | 4.2  2.5  2.1  1.9   | 1.7  0.9  0.7  0.7   | 2.6  1.4  1.2  1.1   | 3.2  3.9
M3    100  | 10.4 9.8  9.5  9.3   | 7.1  6.2  6.0  5.9   | 8.4  7.4  7.4  7.3   | 6.7  7.2
      200  | 7.5  5.9  5.5  5.3   | 4.1  3.4  3.2  3.1   | 5.4  4.2  4.1  4.0   | 4.9  5.6
      400  | 5.5  3.6  3.2  2.9   | 2.6  1.6  1.5  1.4   | 3.7  2.5  2.4  2.3   | 4.9  5.2
      800  | 4.2  2.5  2.1  1.9   | 1.8  0.9  0.9  0.7   | 2.7  1.5  1.2  1.2   | 3.3  4.0
M4    100  | 13.1 12.3 11.9 11.6  | 9.0  8.2  8.0  7.9   | 10.7 9.7  9.6  9.6   | 7.5  7.8
      200  | 8.9  7.3  6.8  6.4   | 4.8  4.0  3.8  3.6   | 6.3  5.0  4.8  4.7   | 5.3  6.0
      400  | 6.1  4.0  3.6  3.3   | 2.8  1.8  1.7  1.6   | 4.0  2.7  2.5  2.5   | 5.0  5.4
      800  | 4.4  2.4  2.1  1.9   | 1.9  1.0  0.9  0.8   | 2.8  1.7  1.5  1.4   | 3.3  4.0
M5    100  | 18.4 18.3 17.7 17.4  | 13.9 13.1 12.8 12.6  | 15.5 14.8 14.5 14.6  | 9.6  9.7
      200  | 11.5 10.1 9.6  9.4   | 7.3  6.3  5.9  5.6   | 8.9  7.2  7.0  6.9   | 6.6  7.2
      400  | 7.6  5.2  4.7  4.4   | 3.8  2.6  2.5  2.2   | 5.0  3.5  3.4  3.3   | 5.3  6.0
      800  | 4.9  2.9  2.5  2.2   | 2.1  1.3  1.1  1.1   | 3.1  2.0  1.8  1.7   | 3.7  4.4
M6    100  | 10.8 10.0 9.4  8.9   | 6.8  5.9  5.7  5.6   | 8.2  6.9  6.8  6.8   | 6.8  6.9
      200  | 8.3  6.4  5.8  5.4   | 4.3  3.3  3.1  2.9   | 5.7  4.4  4.2  4.1   | 5.2  5.6
      400  | 5.9  3.8  3.4  3.1   | 2.9  1.8  1.7  1.6   | 3.7  2.7  2.4  2.3   | 4.7  5.2
      800  | 4.5  2.4  2.1  1.9   | 1.9  1.0  1.0  0.9   | 2.7  1.6  1.4  1.4   | 3.4  3.8
M7    100  | 12.8 11.4 10.8 10.5  | 7.7  6.9  6.8  6.5   | 9.3  8.3  8.1  7.9   | 6.7  6.8
      200  | 9.6  7.6  7.1  6.7   | 4.8  4.1  3.9  3.7   | 6.1  5.0  4.8  4.8   | 5.3  5.5
      400  | 6.4  4.2  3.6  3.4   | 2.9  1.8  1.5  1.5   | 3.8  2.6  2.5  2.5   | 4.4  4.8
      800  | 5.0  3.0  2.4  2.1   | 1.8  1.2  1.1  1.0   | 2.9  1.6  1.5  1.4   | 3.4  3.9
M8    100  | 9.3  8.5  8.0  7.8   | 6.7  5.9  5.7  5.6   | 7.9  6.8  6.7  6.6   | 6.6  7.1
      200  | 6.9  5.6  5.2  4.9   | 4.1  3.4  3.2  3.1   | 5.1  3.9  3.8  3.7   | 5.1  5.5
      400  | 5.4  3.5  3.1  2.8   | 2.5  1.5  1.4  1.3   | 3.9  2.4  2.1  2.1   | 4.8  5.1
      800  | 4.2  2.4  2.2  1.9   | 1.8  1.0  0.9  0.8   | 2.8  1.5  1.4  1.3   | 3.7  4.1
M9    100  | 36.0 35.7 35.1 34.8  | 28.0 26.9 26.6 26.1  | 31.1 29.9 29.6 29.6  | 15.2 13.0
      200  | 28.8 26.3 25.3 24.7  | 18.6 15.7 15.2 15.0  | 22.6 19.6 19.1 18.8  | 11.7 10.4
      400  | 20.5 17.0 16.0 15.3  | 11.0 8.6  8.2  7.8   | 14.4 11.5 11.0 10.7  | 9.6  8.7
      800  | 14.8 10.3 9.4  8.8   | 5.5  3.7  3.4  3.1   | 9.0  5.8  5.4  5.1   | 7.5  7.2
M10   100  | 8.9  7.9  7.4  7.1   | 5.5  5.0  4.8  4.8   | 6.6  5.6  5.6  5.6   | 6.4  6.5
      200  | 7.8  5.6  5.1  4.9   | 4.2  3.2  3.1  3.0   | 5.1  4.3  4.2  4.0   | 5.0  5.4
      400  | 5.0  3.3  2.9  2.6   | 2.4  1.5  1.4  1.3   | 3.5  2.2  2.0  2.0   | 5.0  5.2
      800  | 4.1  2.3  2.1  1.9   | 1.5  0.9  0.8  0.7   | 2.4  1.3  1.2  1.1   | 3.5  3.6

that, because of the factor (k/m)^{−η}, the test based on E_m for instance may again become more powerful than the procedure based on S_m for k⋆ extremely large. Nevertheless, even if we consider the least favorable setting, such a reversal should only occur after an extremely long monitoring period, given the slow decay of
the factor t ↦ t^{−η} and the fact that the weighting used in the definition of E_m penalizes, in some sense, late changes, as explained in the discussion below (2.5).

From the second row of plots of Figure 4, we see that the previous conclusions seem to remain qualitatively the same when model M4 is used although, unsurprisingly, the stronger serial dependence gives the impression that the values of k⋆ are smaller.

Figure 5 reports the rejection percentages of H_0 in (1.1) for the same experiment but for
Fig 4. Rejection percentages of H_0 in (1.1) for the procedure based on S_m with γ = 0 and four values of η, as well as for the procedures based on E_m and Q_m with γ = 0, estimated from 1000 samples of size n = m + 7000 from model M1 with m = 100 or M4 with m = 200 such that, for each sample, a positive offset of δ was added to all observations after position k⋆.

the procedures based on R_m, S_m and T_m with γ = 0 and a common small value of η, as well as for the procedures based on E_m and Q_m with γ = 0. As one can see, among the three studied procedures, those based on R_m and T_m seem more powerful than the one based on S_m when the change occurs at the beginning of the monitoring (which is in accordance with the fact that the procedure based on S_m is the most conservative, as can be seen from Table 2), while there seems to be very little difference between the three tests when the change occurs later.

The increase in power resulting from taking γ equal to its largest meaningful value is illustrated in Figure 6. As expected, the improvement is visible only when changes occur at the beginning of the monitoring. In practice, we suggest increasing the value of γ only when it is believed that the size m of the learning sample permits a reasonably accurate estimation of the long-run variance σ².

Finally, we report the results of an experiment involving a longer monitoring period with late changes. Table 3 provides the percentages of rejection of H_0 in (1.1) for the procedures based on R_m, S_m and T_m with γ = 0 and five values of η, estimated from 2000 samples of size n = 20000 from model M1 with m = 100 such that, for each sample, a positive offset of δ was added to all observations after position k⋆ = 15000.
These results highlight again the role of η and its influence on power through the factor (k/m)^{−η}. Notice that the corresponding rejection percentages of the procedures based on E_m and Q_m are less than 1%.

Taking into account all the empirical results summarized in this section, we recommend setting η to 0.005 or 0.001 and using either the procedure based on T_m or the procedure based on
Fig 5. Rejection percentages of H_0 in (1.1) for the procedures based on R_m, S_m and T_m with γ = 0 and a common small value of η, as well as for the procedures based on E_m and Q_m with γ = 0, estimated from 1000 samples of size n = m + 7000 from model M1 with m = 100 or M4 with m = 200 such that, for each sample, a positive offset of δ was added to all observations after position k⋆.
Fig 6. For a fixed small value of η, rejection percentages of H_0 in (1.1) for the procedures based on R_m with γ = 0 and on R_m, S_m and T_m with γ equal to its largest meaningful value, estimated from 1000 samples of size n = m + 7000 from model M1 with m = 100 or M4 with m = 200 such that, for each sample, a positive offset of δ was added to all observations after position k⋆.

S_m, as they are more conservative than the monitoring scheme based on R_m while almost as powerful (except when changes occur at the beginning of the monitoring). If it is believed that the size m of the learning sample permits a reasonably accurate estimation of the long-run variance σ², one can additionally set γ to 0.45 for T_m and 0.85 for S_m to improve the behavior of the procedures at the beginning of the monitoring.

Table 3. Percentages of rejection of H_0 in (1.1) for the procedures based on R_m, S_m and T_m with γ = 0 and five values of η, estimated from 2000 samples of size n = 20000 from model M1 with m = 100 such that, for each sample, a positive offset of δ was added to all observations after position k⋆ = 15000. The corresponding rejection percentages of the procedures based on E_m and Q_m are 0.7 and 0.7, respectively.
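As an end-to-end illustration of the kind of experiment reported above, the sketch below (ours) monitors a simulated AR(1) stream with the detector T_m, assuming the threshold form w_T(t) = t^{2+η} with γ = 0, and standardizing by a simple Bartlett-kernel long-run variance estimate — a stand-in for the Andrews (1991) quadratic-spectral estimator used in the paper. The threshold value q below is a hypothetical placeholder, not one of the paper's estimated quantiles (those ship with npcp).

```python
import numpy as np

rng = np.random.default_rng(1)

def ar1(n, phi, burn=100):
    """AR(1) series with standard normal innovations and a burn-in period."""
    x = np.empty(n + burn)
    x[0] = rng.standard_normal()
    for i in range(1, n + burn):
        x[i] = phi * x[i - 1] + rng.standard_normal()
    return x[burn:]

def lrv_bartlett(x, bandwidth):
    """Bartlett-kernel long-run variance estimate from the learning sample."""
    x = x - x.mean()
    n = len(x)
    s = x @ x / n
    for l in range(1, bandwidth + 1):
        s += 2.0 * (1.0 - l / (bandwidth + 1)) * (x[:-l] @ x[l:]) / n
    return s

def monitor_T(x, m, q, eta=0.001):
    """Alarm at the first k with sigma_m^{-1} {w_T(k/m)}^{-1} T_m(k) > q."""
    sigma = np.sqrt(lrv_bartlett(x[:m], bandwidth=4))
    cs = np.cumsum(x)
    for k in range(m + 1, len(x) + 1):
        j = np.arange(m, k)
        d = j * (k - j) / m**1.5 * np.abs(cs[j - 1] / j - (cs[k - 1] - cs[j - 1]) / (k - j))
        T = np.sqrt((d**2).sum() / m)
        if T / (k / m) ** (2 + eta) / sigma > q:
            return k
    return None  # no alarm during the monitoring period

m, kstar, delta = 100, 300, 2.0
x = ar1(600, 0.1)
x[kstar:] += delta                 # mean change after position kstar
k_alarm = monitor_T(x, m, q=2.5)   # q: hypothetical threshold for illustration
```

With a sizeable shift (δ = 2 here), the alarm fires shortly after the change and well before the end of the stream.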
6. Extensions to parameters whose estimators exhibit a mean-like behavior
In their seminal work, Gösmann, Kley and Dette (2020) actually considered monitoring schemes sensitive to changes in time series parameters whose estimators exhibit an asymptotic mean-like behavior. The aim of this section is to briefly demonstrate that the same type of extension is possible for the sequential procedures studied in this work. For the sake of keeping the notation simple, we restrict our discussion to univariate time series and univariate parameters.

Let F be a univariate distribution function (d.f.) and let θ = θ(F) be a univariate parameter of F (such as the expectation, the variance, etc.). Let F_{j:k} be the empirical d.f. computed from the stretch X_j, . . . , X_k of available observations. More formally, for any integers j, k ≥ 1 and x ∈ ℝ, let

F_{j:k}(x) = (1/(k − j + 1)) Σ_{i=j}^{k} 1(X_i ≤ x) if j ≤ k, and 0 otherwise,

and let θ_{j:k} = θ(F_{j:k}) be the corresponding plug-in estimator of θ computed from the stretch X_j, . . . , X_k. Natural extensions of the detectors R_m, S_m and T_m defined in (2.4), (2.9) and (2.10), respectively, for monitoring changes in the parameter θ are then given, for any k ≥ m + 1, by

R^θ_m(k) = max_{m≤j≤k−1} [j(k − j)/m^{3/2}] |θ_{1:j} − θ_{j+1:k}|,   (6.1)
S^θ_m(k) = (1/m) Σ_{j=m}^{k−1} [j(k − j)/m^{3/2}] |θ_{1:j} − θ_{j+1:k}|,   (6.2)
T^θ_m(k) = sqrt( (1/m) Σ_{j=m}^{k−1} { [j(k − j)/m^{3/2}] (θ_{1:j} − θ_{j+1:k}) }² ).   (6.3)

Furthermore, assuming it exists, let

IF(x, F, θ) = lim_{ε↓0} [θ{(1 − ε)F + εδ_x} − θ(F)]/ε

be the influence function of θ at F and x ∈ ℝ, where δ_x(·) = 1(x ≤ ·) is the d.f. of the Dirac measure at x.
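For instance, with θ the variance, the plug-in detector R^θ_m in (6.1) can be sketched as follows (the brute-force recomputation of θ on each stretch, and the helper name, are ours and for clarity only):

```python
import numpy as np

def detector_R_theta(x, m, theta):
    """R_m^theta(k) from (6.1) with k = len(x): plug-in estimates of theta on
    X_1..X_j and X_{j+1}..X_k, weighted by j(k - j)/m**1.5."""
    k = len(x)
    return max(
        j * (k - j) / m**1.5 * abs(theta(x[:j]) - theta(x[j:]))
        for j in range(m, k)
    )

rng = np.random.default_rng(7)
m = 100
x_null = rng.standard_normal(400)                                        # no change
x_alt = np.r_[rng.standard_normal(200), 3.0 * rng.standard_normal(200)]  # variance 1 -> 9
r_null = detector_R_theta(x_null, m, np.var)
r_alt = detector_R_theta(x_alt, m, np.var)
```

Any scalar-valued plug-in (for example np.median for a quantile) can be substituted for np.var.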
To be able to study the asymptotic validity of monitoring schemes based on R^θ_m in (6.1), S^θ_m in (6.2) and T^θ_m in (6.3), we follow Gösmann, Kley and Dette (2020) and focus on parameters θ that admit an asymptotic linearization in terms of the influence function, that is, such that

θ_{j:k} − θ = θ(F_{j:k}) − θ(F) = (1/(k − j + 1)) Σ_{i=j}^{k} IF(X_i, F, θ) + R_{j,k},   (6.4)

where the remainders R_{j,k} are asymptotically negligible in the sense of the following condition.

Condition 6.1. The remainders in (6.4) satisfy k^{−1/2} max_{1≤i<…} … = o_P(1).

For any integers j, k ≥ 1, let

ĪF_{j:k} = (1/(k − j + 1)) Σ_{i=j}^{k} IF(X_i, F, θ) if j ≤ k, and 0 otherwise.

If the random variables IF(X_1, F, θ), . . . , IF(X_m, F, θ), IF(X_{m+1}, F, θ), . . . were observable, one could naturally consider analogues of the detectors R_m, S_m and T_m in (2.4), (2.9) and (2.10), respectively, defined, for k ≥ m + 1, by

R^IF_m(k) = max_{m≤j≤k−1} [j(k − j)/m^{3/2}] |ĪF_{1:j} − ĪF_{j+1:k}|,
S^IF_m(k) = (1/m) Σ_{j=m}^{k−1} [j(k − j)/m^{3/2}] |ĪF_{1:j} − ĪF_{j+1:k}|,
T^IF_m(k) = sqrt( (1/m) Σ_{j=m}^{k−1} { [j(k − j)/m^{3/2}] (ĪF_{1:j} − ĪF_{j+1:k}) }² ).

Upon assuming that Condition 3.1 holds for the sequence (IF(X_i, F, θ))_{i∈ℤ}, one immediately obtains an analogue of Theorem 3.3 for the detectors R^IF_m, S^IF_m and T^IF_m. The next result, proven in Appendix C, shows that, if Condition 6.1 is additionally assumed, an analogue of Theorem 3.3 also holds for the computable detectors R^θ_m in (6.1), S^θ_m in (6.2) and T^θ_m in (6.3).

Proposition 6.2. Under H_0 in (1.1) and Condition 6.1, for any η > 0, ε > 0 and γ ≥ 0,

sup_{m+1≤k<∞} {w_R(k/m)}^{−1} |R^θ_m(k) − R^IF_m(k)| = o_P(1),
sup_{m+1≤k<∞} {w_S(k/m)}^{−1} |S^θ_m(k) − S^IF_m(k)| = o_P(1),
sup_{m+1≤k<∞} {w_T(k/m)}^{−1} |T^θ_m(k) − T^IF_m(k)| = o_P(1),

where the threshold functions w_R, w_S and w_T are defined in (2.7), (2.11) and (2.12), respectively.

Remark. As mentioned in Gösmann, Kley and Dette (2020), the verification of Condition 6.1 is highly non-trivial. When θ is the variance or a quantile of F, it was shown to hold in probability in Section 4 of Dette and Gösmann (2019). In the multivariate parameter and time series case, it was verified in Gösmann, Kley and Dette (2020, Section 3.2) for a time-dependent linear model.
In a related way, note that an inspection of the proof of Proposition 6.2 reveals that Condition 6.1 could actually be replaced by the requirement that the remainders in (6.4) satisfy sup_{m+1≤k<∞} k^{−1/2} max_{1≤i<…} … = o_P(1).

7. Data example

As a small data example, we consider a fictitious scenario consisting of monitoring global temperature anomalies for changes in the mean. Specifically, we use the time series of monthly global (land and ocean) temperature anomalies covering the period January 1880 – May 2020. The time series, in degrees Celsius, is represented in the left panel of Figure 7. The solid vertical line marks the beginning of the fictitious monitoring and corresponds to September 1921 (and thus to a learning sample of size m = 500). Note that this monitoring scenario is indeed fully fictitious, among other things because temperature anomalies are computed with respect to the 20th century average (see, e.g., Smith et al., 2008) and, therefore, the corresponding time series would not have been available until the beginning of the current century.

The solid curve in the right panel of Figure 7 displays the evolution of the normalized detector σ_m^{−1} {w_T(k/m)}^{−1} T_m(k) with η = 0.001 and γ = 0.45 against k ≥ m + 1. The solid (resp. dashed) horizontal line represents the estimated 0.95-quantile (resp. 0.99-quantile) of the corresponding limiting distribution in Theorem 3.3. The solid vertical line represents the date at which the normalized detector exceeded the aforementioned 0.95-quantile and corresponds to November 1939. Note that, to estimate the point of change corresponding to an exceedance at position k, it seems natural to use

argmax_{m≤j≤k−1} [j(k − j)/m^{3/2}] |X̄_j − X̄_{j+1:k}| + 1.

The date of change corresponding to an exceedance in November 1939 is thus estimated to be April 1925 and is marked by a dashed vertical line in the right panel of Figure 7.
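The change-point estimator above (the argmax of the weighted CUSUM contrast, plus one) can be sketched as follows on synthetic data (our helper, for illustration):

```python
import numpy as np

def estimate_change_point(x, m):
    """argmax over m <= j <= k-1 of j(k-j)/m**1.5 |mean(X_1..X_j) - mean(X_{j+1}..X_k)|,
    plus 1, where k = len(x) is the position of the exceedance."""
    k = len(x)
    cs = np.cumsum(x)
    j = np.arange(m, k)
    d = j * (k - j) / m**1.5 * np.abs(cs[j - 1] / j - (cs[-1] - cs[j - 1]) / (k - j))
    return int(j[np.argmax(d)]) + 1

rng = np.random.default_rng(3)
x = rng.standard_normal(800)
x[500:] += 1.0   # true change: observation 501 is the first with shifted mean
est = estimate_change_point(x, m=200)
```

For a one-standard-deviation shift like this one, the estimate typically lands within a handful of observations of the true change point.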
Fig 7. Left: monthly global (land and ocean) temperature anomalies in degrees Celsius for the period January 1880 – May 2020; the solid vertical line corresponds to September 1921 and marks the beginning of the fictitious monitoring. Right: the solid curve displays the normalized detector σ_m^{−1} {w_T(k/m)}^{−1} T_m(k) with η = 0.001 and γ = 0.45 against k ≥ m + 1; the solid (resp. dashed) horizontal line represents the estimated 0.95-quantile (resp. 0.99-quantile) of the corresponding limiting distribution in Theorem 3.3, and the solid (resp. dashed) vertical line corresponds to November 1939, the date of exceedance (resp. April 1925, the estimated date of change).

8. Concluding remarks

This work has demonstrated that it is relevant to define open-end sequential change-point tests such that the underlying detectors coincide with (or are related to) the retrospective CUSUM statistic at each monitoring step. From a practical perspective, when focusing on changes in the mean, such an approach was observed to lead to an increase in power with respect to existing competitors except when changes occur at the very beginning of the monitoring. Given the potentially very long-term nature of open-end monitoring (by definition), it can be argued that having extra power for all but very early changes is strongly desirable. The price to pay for the additional power is a more complicated theoretical setting and the fact that the quantiles of the underlying limiting distributions required to carry out the sequential tests in practice are harder to estimate. As far as monitoring for changes in parameters other than the mean is concerned, extensions are possible as long as the underlying estimators exhibit a mean-like asymptotic behavior, as considered in Gösmann, Kley and Dette (2020).

Acknowledgments

The authors would like to thank Ed Perkins for helpful discussions. MH is supported by
MH is supported byFuture Fellowship FT160100166 from the Australian Research Council. Appendix A: Proofs of the results under H The three following lemmas are used in the proof of Theorem 3.3.24 emma A.1. Assume that Condition 3.1 holds and, for any k ≥ m + 1 , let ˜ R m ( k ) = σm − / max m ≤ j ≤ k − (cid:12)(cid:12)(cid:12)(cid:12) km { W m, ( m ) + W m, ( j − m ) } − jm { W m, ( m ) + W m, ( k − m ) } (cid:12)(cid:12)(cid:12)(cid:12) , (A.1)˜ S m ( k ) = σm − / m k − (cid:88) j = m (cid:12)(cid:12)(cid:12)(cid:12) km { W m, ( m ) + W m, ( j − m ) } − jm { W m, ( m ) + W m, ( k − m ) } (cid:12)(cid:12)(cid:12)(cid:12) , (A.2)˜ T m ( k ) = σm − / (cid:118)(cid:117)(cid:117)(cid:116) m k − (cid:88) j = m (cid:20) km { W m, ( m ) + W m, ( j − m ) } − jm { W m, ( m ) + W m, ( k − m ) } (cid:21) . (A.3) Then, for any fixed η > , (cid:15) > and γ ≥ , sup m +1 ≤ k ≤∞ { w R ( k/m ) } − | R m ( k ) − ˜ R m ( k ) | = o P (1) , (A.4)sup m +1 ≤ k ≤∞ { w S ( k/m ) } − | S m ( k ) − ˜ S m ( k ) | = o P (1) , (A.5)sup m +1 ≤ k ≤∞ { w T ( k/m ) } − | T m ( k ) − ˜ T m ( k ) | = o P (1) , (A.6) where w R , w S and w T are defined in (2.7) , (2.11) and (2.12) , respectively, and R m , S m and T m are defined in (2.4) , (2.9) and (2.10) , respectively.Proof. Let us first show (A.4). From (2.4) and (A.1), using the reverse triangle inequality forthe maximum norm, we have that, for any k ≥ m +1, { w R ( k/m ) } − | R m ( k ) − ˜ R m ( k ) | ≤ U m ( k ),where U m ( k ) = (cid:15) − m − / (cid:18) km (cid:19) − / − η max m ≤ j ≤ k − (cid:12)(cid:12)(cid:12)(cid:12) j ( k − j ) m { ¯ X j − ¯ X j +1: k }− km σ { W m, ( m ) + W m, ( j − m ) } + jm σ { W m, ( m ) + W m, ( k − m ) } (cid:12)(cid:12)(cid:12)(cid:12) , (A.7)using the fact that t (cid:55)→ /w γ ( t ) is bounded by (cid:15) − . From (2.3) and under Condition 3.1, we25btain that, for any k ≥ m + 1 and j ∈ { m, . . . 
, k − } , j ( k − j ) { ¯ X j − ¯ X j +1: k } =( k − j ) j (cid:88) i =1 { X i − E ( X ) } − j k (cid:88) i = j +1 { X i − E ( X ) } = k j (cid:88) i =1 { X i − E ( X ) } − j k (cid:88) i =1 { X i − E ( X ) } = k m (cid:88) i =1 { X i − E ( X ) } + k j (cid:88) i = m +1 { X i − E ( X ) }− j m (cid:88) i =1 { X i − E ( X ) } − j k (cid:88) i = m +1 { X i − E ( X ) } . Hence, by the triangle inequality, we have thatsup m +1 ≤ k ≤∞ { w R ( k/m ) } − | R m ( k ) − ˜ R m ( k ) | ≤ sup m +1 ≤ k ≤∞ U m ( k ) ≤ (cid:15) − ( I m + I (cid:48) m + J m + J (cid:48) m ) , (A.8)where I m = m − / sup m +1 ≤ k< ∞ (cid:18) km (cid:19) − / − η max m ≤ j ≤ k − (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) m (cid:88) i =1 { X i − E ( X ) } − σW m, ( m ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ,I (cid:48) m = m − / sup m +1 ≤ k< ∞ (cid:18) km (cid:19) − / − η max m ≤ j ≤ k − (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) j (cid:88) i = m +1 { X i − E ( X ) } − σW m, ( j − m ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ,J m = m − / sup m +1 ≤ k< ∞ (cid:18) km (cid:19) − / − η max m ≤ j ≤ k − jm (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) m (cid:88) i =1 { X i − E ( X ) } − σW m, ( m ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) , (A.9) J (cid:48) m = m − / sup m +1 ≤ k< ∞ (cid:18) km (cid:19) − / − η max m ≤ j ≤ k − jm (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) k (cid:88) i = m +1 { X i − E ( X ) } − σW m, ( k − m ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) . (A.10)To prove (A.4), it remains to show that I m , I (cid:48) m , J m and J (cid:48) m converge to zero in probability.For any ξ ∈ (0 , / I m = 1 m ξ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) m (cid:88) i =1 { X i − E ( X ) } − W m, ( m ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) m ξ + η sup m +1 ≤ k< ∞ k − / − η , which, under Condition 3.1, is smaller than a term bounded in probability times m ξ − / ,which converges to zero because ξ < / 2. 
Similarly,
\[
I'_m \le m^{\eta} \sup_{m+1 \le k < \infty} k^{-1/2-\eta} \max_{m \le j \le k-1} (j-m)^{\xi} \times \sup_{m+1 \le \ell < \infty} \frac{1}{(\ell - m)^{\xi}} \Bigl| \sum_{i=m+1}^{\ell}\{X_i - E(X_1)\} - \sigma W_{m,2}(\ell - m) \Bigr|,
\]
where, under Condition 3.1, the second supremum on the right is bounded in probability, while
\[
m^{\eta} \sup_{m+1 \le k < \infty} k^{-1/2-\eta} \max_{m \le j \le k-1} (j-m)^{\xi} \le m^{\eta} \sup_{m+1 \le k < \infty} k^{-1/2-\eta+\xi} \le m^{\xi-1/2} \to 0.
\]
The fact that $J_m$ in (A.9) and $J'_m$ in (A.10) converge to zero in probability can be checked by proceeding similarly. Hence, we have shown (A.4). It remains to prove (A.5) and (A.6). From (2.9) and (A.2), using the reverse triangle inequality for the $L^1$ norm, it can be verified that, for any $k \ge m+1$,
\[
\{w_S(k/m)\}^{-1} |S_m(k) - \tilde S_m(k)| \le \epsilon^{-1} m^{-1/2} \Bigl(\tfrac{k}{m}\Bigr)^{-5/2-\eta} \frac{1}{m}\sum_{j=m}^{k-1} \Bigl| \tfrac{j(k-j)}{m}\{\bar X_j - \bar X_{j+1:k}\} - \tfrac{k}{m}\sigma\{W_{m,1}(m) + W_{m,2}(j-m)\} + \tfrac{j}{m}\sigma\{W_{m,1}(m) + W_{m,2}(k-m)\} \Bigr| \le \frac{k-m}{k}\, U_m(k),
\]
where $U_m(k)$ is defined in (A.7). Similarly, from (2.10) and (A.3), using the reverse triangle inequality for the Euclidean norm,
\[
\{w_T(k/m)\}^{-1} |T_m(k) - \tilde T_m(k)| \le \epsilon^{-1} m^{-1/2} \Bigl(\tfrac{k}{m}\Bigr)^{-2-\eta} \Biggl( \frac{1}{m}\sum_{j=m}^{k-1} \Bigl[ \tfrac{j(k-j)}{m}\{\bar X_j - \bar X_{j+1:k}\} - \tfrac{k}{m}\sigma\{W_{m,1}(m) + W_{m,2}(j-m)\} + \tfrac{j}{m}\sigma\{W_{m,1}(m) + W_{m,2}(k-m)\} \Bigr]^2 \Biggr)^{1/2} \le \sqrt{\frac{k-m}{k}}\, U_m(k).
\]
The fact that (A.5) and (A.6) hold then immediately follows from the two previous displays, (A.8) and the fact that $I_m$, $I'_m$, $J_m$ and $J'_m$ converge to zero in probability. ∎

Lemma A.2. For any fixed $\eta > 0$, the random function $R_\eta$ defined by
\[
R_\eta(s,t) = \frac{1}{t^{3/2+\eta}} \bigl| (t-s)W_1(1) + tW_2(s-1) - sW_2(t-1) \bigr|, \qquad 1 \le s \le t < \infty, \qquad (A.11)
\]
where $W_1$ and $W_2$ are independent standard Brownian motions, is almost surely bounded and uniformly continuous.

Proof. Fix $\eta > 0$. Let us first verify that $R_\eta$ is almost surely bounded on $A = \{(s,t) \in [1,\infty)^2 : s \le t\}$.
By the triangle inequality, $\sup_{1 \le s \le t < \infty} R_\eta(s,t)$ is smaller than
\[
\sup_{1 \le s \le t < \infty} t^{-3/2-\eta}(t-s)|W_1(1)| + \sup_{1 \le s \le t < \infty} t^{-1/2-\eta}|W_2(s-1)| + \sup_{1 \le s \le t < \infty} t^{-3/2-\eta}\, s\, |W_2(t-1)|
\]
\[
\le |W_1(1)| \sup_{1 \le t < \infty} t^{-1/2-\eta} + \sup_{1 \le s < \infty} s^{-1/2-\eta}|W_2(s-1)| + \sup_{1 \le t < \infty} t^{-1/2-\eta}|W_2(t-1)| \le |W_1(1)| + 2\sup_{1 \le t < \infty} t^{-1/2-\eta}|W_2(t-1)|.
\]
It thus remains to verify that $\sup_{1 \le t < \infty} t^{-1/2-\eta}|W_2(t-1)|$ is almost surely bounded. From the law of the iterated logarithm for Brownian motion, we have that, almost surely,
\[
\limsup_{t \to \infty} t^{-1/2-\eta}|W_2(t-1)| = \lim_{T \to \infty} \sup_{T \le t < \infty} t^{-1/2-\eta}|W_2(t-1)| = \lim_{T \to \infty} \sup_{T \le t < \infty} (t+1)^{-1/2-\eta}|W_2(t)|
\]
\[
\le \lim_{T \to \infty} \sup_{T \le t < \infty} (2t\log\log t)^{-1/2}|W_2(t)| \times \lim_{T \to \infty} \sup_{T \le t < \infty} (2t\log\log t)^{1/2}(t+1)^{-1/2-\eta} = 1 \times \lim_{t \to \infty} (2t\log\log t)^{1/2}(t+1)^{-1/2-\eta} = 0.
\]
Hence, on one hand, there exists $T \in (1,\infty)$ such that $\sup_{T \le t < \infty} t^{-1/2-\eta}|W_2(t-1)|$ is almost surely finite. On the other hand, since $t \mapsto t^{-1/2-\eta}|W_2(t-1)|$ is a process whose sample paths are almost surely continuous, $\sup_{1 \le t \le T} t^{-1/2-\eta}|W_2(t-1)| < \infty$ with probability one.

It remains to prove that $R_\eta$ is almost surely uniformly continuous on $A = \{(s,t) \in [1,\infty)^2 : s \le t\}$. Let $\varepsilon > 0$. We must show that there exists some (random) $\delta > 0$ such that, for all $(s,t), (s',t') \in A$ such that $d((s,t),(s',t')) < \delta$, $|R_\eta(s,t) - R_\eta(s',t')| \le \varepsilon$. By the law of the iterated logarithm, there exists $T_\varepsilon \ge 1$ such that
\[
\sup_{u \ge T_\varepsilon} \frac{|W_2(u)|}{u^{1/2+\eta}} \le \frac{\varepsilon}{8}, \qquad (A.12)
\]
and since $W_1(1)$ is almost surely finite we can choose $T_\varepsilon$ to also satisfy
\[
\frac{|W_1(1)|}{(T_\varepsilon)^{1/2+\eta}} \le \frac{\varepsilon}{8}. \qquad (A.13)
\]
Since $[1, T_\varepsilon + 3]$ is compact, there exists $T'_\varepsilon > T_\varepsilon + 5$ (also random) such that
\[
\sup_{u \in [1, T_\varepsilon+3]} \frac{|W_2(u)|}{(T'_\varepsilon)^{1/2+\eta}} < \frac{\varepsilon}{8}. \qquad (A.14)
\]
We will consider the following three subsets of $A$ whose union is $A$:
\[
A_1 = \{(s,t) \in A : s \ge T_\varepsilon + 2\}, \quad A_2 = \{(s,t) \in A : t \le T'_\varepsilon + 1\}, \quad A_3 = \{(s,t) \in A : s \le T_\varepsilon + 2,\ t \ge T'_\varepsilon + 1\},
\]
and find a single $\delta \le 1$ that works regardless of which subset $(s,t)$ is in.

Let $(s,t) \in A_1$, and let $(s',t')$ be such that $d((s,t),(s',t')) \le 1$ (so that, in particular, $s' \ge T_\varepsilon + 1$). Then
\[
|R_\eta(s,t) - R_\eta(s',t')| \le \frac{1}{t^{3/2+\eta}} \bigl| (t-s)W_1(1) + tW_2(s-1) - sW_2(t-1) \bigr| \qquad (A.15)
\]
\[
+ \frac{1}{(t')^{3/2+\eta}} \bigl| (t'-s')W_1(1) + t'W_2(s'-1) - s'W_2(t'-1) \bigr| \qquad (A.16)
\]
\[
\le \frac{2|W_1(1)|}{(T_\varepsilon)^{1/2+\eta}} + \frac{|W_2(s-1)|}{(s-1)^{1/2+\eta}} + \frac{|W_2(s'-1)|}{(s'-1)^{1/2+\eta}} + \frac{|W_2(t-1)|}{(t-1)^{1/2+\eta}} + \frac{|W_2(t'-1)|}{(t'-1)^{1/2+\eta}}.
\]
By (A.12) and (A.13), this is at most $6\varepsilon/8$.

Let $(s,t) \in A_3$ and $(s',t')$ be such that $d((s,t),(s',t')) \le 1$ (so that, in particular, $s' \le T_\varepsilon + 3$). Then, using (A.15) and (A.16) again, we get that
\[
|R_\eta(s,t) - R_\eta(s',t')| \le \frac{2|W_1(1)|}{(T'_\varepsilon)^{1/2+\eta}} + \frac{|W_2(s-1)|}{(T'_\varepsilon)^{1/2+\eta}} + \frac{|W_2(s'-1)|}{(T'_\varepsilon)^{1/2+\eta}} + \frac{|W_2(t-1)|}{(t-1)^{1/2+\eta}} + \frac{|W_2(t'-1)|}{(t'-1)^{1/2+\eta}}.
\]
Using (A.12), (A.14) and (A.13), we get that this is at most $6\varepsilon/8$.

Finally, let $(s,t) \in A_2$. Since the set $A' = \{(s,t) \in A : t \le T'_\varepsilon + 2\}$ (note that $A' \supset A_2$) is compact and $R_\eta$ is continuous, it follows that $R_\eta$ is uniformly continuous on $A'$, so there exists some (random) $\delta_1 > 0$ such that if $(s,t), (s',t') \in A'$ and $d((s,t),(s',t')) \le \delta_1$, then we have $|R_\eta(s,t) - R_\eta(s',t')| \le \varepsilon/2$.
Note that if $(s,t) \in A_2$ and $d((s,t),(s',t')) \le 1$, then $(s,t)$ and $(s',t')$ are both in $A'$. Let $\delta = \min(\delta_1, 1)$. If $d((s,t),(s',t')) \le \delta$, we have that $(s,t)$ is in (at least) one of $A_1$, $A_2$, $A_3$, so by the above $|R_\eta(s,t) - R_\eta(s',t')| < \varepsilon$. ∎

Lemma A.3. For any fixed $\eta > 0$, $\epsilon > 0$ and $\gamma \ge 0$,
\[
\sup_{m+1 \le k < \infty} \{w_R(k/m)\}^{-1} \tilde R_m(k) \rightsquigarrow \sigma \sup_{1 \le s \le t < \infty} \{w_\gamma(t)\}^{-1} R_\eta(s,t), \qquad (A.17)
\]
\[
\sup_{m+1 \le k < \infty} \{w_S(k/m)\}^{-1} \tilde S_m(k) \rightsquigarrow \sigma \sup_{1 \le t < \infty} \{w_\gamma(t)\}^{-1}\, t^{-1} \int_1^t |R_\eta(s,t)|\,\mathrm{d}s, \qquad (A.18)
\]
\[
\sup_{m+1 \le k < \infty} \{w_T(k/m)\}^{-1} \tilde T_m(k) \rightsquigarrow \sigma \sup_{1 \le t < \infty} \{w_\gamma(t)\}^{-1}\, t^{-1/2} \sqrt{\int_1^t \{R_\eta(s,t)\}^2\,\mathrm{d}s}, \qquad (A.19)
\]
where $w_R$, $w_S$ and $w_T$ are defined in (2.7), (2.11) and (2.12), respectively, $\tilde R_m$, $\tilde S_m$ and $\tilde T_m$ are defined in (A.1), (A.2) and (A.3), respectively, $w_\gamma$ is defined in (2.8) and the random function $R_\eta$ is defined in (A.11). In addition, all the limiting random variables are almost surely finite.

Proof. Fix $\eta > 0$, $\epsilon > 0$ and $\gamma \ge 0$, and let $W_1$ and $W_2$ be independent standard Brownian motions. Then, from (A.1), for any $m \in \mathbb{N}$, $\sup_{m+1 \le k < \infty} \{w_R(k/m)\}^{-1} \tilde R_m(k)$ is equal in distribution to
\[
\sigma m^{-1/2} \sup_{m+1 \le k < \infty} \{w_R(k/m)\}^{-1} \max_{m \le j \le k-1} \Bigl| \tfrac{k}{m}\{W_1(m) + W_2(j-m)\} - \tfrac{j}{m}\{W_1(m) + W_2(k-m)\} \Bigr|.
\]
Next, notice that, for any $k \ge m+1$ and any $j \in \{m, \dots, k-1\}$, there exist $1 \le s \le t$ such that $k = \lfloor mt \rfloor$ and $j = \lfloor ms \rfloor$. Hence, the previous expression can be rewritten as
\[
\sigma m^{-1/2} \sup_{t \in [1,\infty)} \{w_R(\lfloor mt \rfloor/m)\}^{-1} \sup_{s \in [1,t]} \Bigl| \tfrac{\lfloor mt \rfloor}{m}\{W_1(m) + W_2(\lfloor ms \rfloor - m)\} - \tfrac{\lfloor ms \rfloor}{m}\{W_1(m) + W_2(\lfloor mt \rfloor - m)\} \Bigr|.
\]
By Brownian scaling, the latter is equal in distribution to
\[
\sigma \sup_{1 \le s \le t < \infty} \{w_R(\lfloor mt \rfloor/m)\}^{-1} \Bigl| \tfrac{\lfloor mt \rfloor}{m}\Bigl\{W_1(1) + W_2\Bigl(\tfrac{\lfloor ms \rfloor}{m} - 1\Bigr)\Bigr\} - \tfrac{\lfloor ms \rfloor}{m}\Bigl\{W_1(1) + W_2\Bigl(\tfrac{\lfloor mt \rfloor}{m} - 1\Bigr)\Bigr\} \Bigr|,
\]
which, using (2.7), (2.8) and the function $R_\eta$ defined in (A.11), can be expressed as
\[
\sigma \sup_{1 \le s \le t < \infty} \{w_\gamma(\lfloor mt \rfloor/m)\}^{-1} R_\eta(\lfloor ms \rfloor/m, \lfloor mt \rfloor/m).
\]
Using additionally the fact that, for any functions $f, g$,
\[
\Bigl| \sup_x |f(x)| - \sup_x |g(x)| \Bigr| \le \sup_x |f(x) - g(x)|, \qquad (A.20)
\]
we obtain that
\[
\Bigl| \sup_{1 \le s \le t < \infty} \{w_\gamma(\lfloor mt \rfloor/m)\}^{-1} R_\eta(\lfloor ms \rfloor/m, \lfloor mt \rfloor/m) - \sup_{1 \le s \le t < \infty} \{w_\gamma(t)\}^{-1} R_\eta(s,t) \Bigr|
\le \sup_{1 \le s \le t < \infty} \Bigl| \{w_\gamma(\lfloor mt \rfloor/m)\}^{-1} R_\eta(\lfloor ms \rfloor/m, \lfloor mt \rfloor/m) - \{w_\gamma(t)\}^{-1} R_\eta(s,t) \Bigr|
\]
\[
\le \sup_{\substack{1 \le s \le t < \infty,\ 1 \le s' \le t' < \infty \\ |s-s'| \le 1/m,\ |t-t'| \le 1/m}} \Bigl| \{w_\gamma(t')\}^{-1} R_\eta(s',t') - \{w_\gamma(t)\}^{-1} R_\eta(s,t) \Bigr|, \qquad (A.21)
\]
since $\sup_{t \in [1,\infty)} |\lfloor mt \rfloor/m - t| \le 1/m$.

From Lemma A.2, we know that $R_\eta$ is almost surely bounded and uniformly continuous on $\{(s,t) \in [1,\infty)^2 : s \le t\}$. Furthermore, the function $t \mapsto 1/w_\gamma(t)$ being bounded, continuous and convergent as $t \to \infty$, it is also uniformly continuous on $[1,\infty)$.
The latter facts imply that the function $(s,t) \mapsto \{w_\gamma(t)\}^{-1} R_\eta(s,t)$ is almost surely bounded and uniformly continuous on $\{(s,t) \in [1,\infty)^2 : s \le t\}$ and, therefore, that (A.21) converges almost surely to zero and, finally, that (A.17) holds with the limit being almost surely finite.

Let us now show (A.18). First, notice that the limiting random variable is almost surely finite as an immediate consequence of the inequality
\[
\sup_{1 \le t < \infty} t^{-1} \int_1^t |R_\eta(s,t)|\,\mathrm{d}s \le \sup_{1 \le s \le t < \infty} R_\eta(s,t).
\]
Then, from (A.2), for any $m \in \mathbb{N}$, $\sup_{m+1 \le k < \infty} \{w_S(k/m)\}^{-1} \tilde S_m(k)$ is equal in distribution to
\[
\sigma m^{-1/2} \sup_{m+1 \le k < \infty} \{w_S(k/m)\}^{-1} \frac{1}{m}\sum_{j=m}^{k-1} \Bigl| \tfrac{k}{m}\{W_1(m) + W_2(j-m)\} - \tfrac{j}{m}\{W_1(m) + W_2(k-m)\} \Bigr|,
\]
which, by Brownian scaling, is equal in distribution to
\[
\sigma \sup_{m+1 \le k < \infty} \{w_S(k/m)\}^{-1} \frac{1}{m}\sum_{j=m}^{k-1} \Bigl| \tfrac{k}{m}\{W_1(1) + W_2(j/m - 1)\} - \tfrac{j}{m}\{W_1(1) + W_2(k/m - 1)\} \Bigr|
\]
\[
= \sigma \sup_{m+1 \le \lfloor mt \rfloor < \infty} \Bigl\{ w_S\Bigl(\tfrac{\lfloor mt \rfloor}{m}\Bigr) \Bigr\}^{-1} \frac{1}{m}\sum_{j=m}^{\lfloor mt \rfloor - 1} \Bigl| \tfrac{\lfloor mt \rfloor}{m}\Bigl\{W_1(1) + W_2\Bigl(\tfrac{j}{m} - 1\Bigr)\Bigr\} - \tfrac{j}{m}\Bigl\{W_1(1) + W_2\Bigl(\tfrac{\lfloor mt \rfloor}{m} - 1\Bigr)\Bigr\} \Bigr|
\]
\[
= \sigma \sup_{1 \le t < \infty} \Bigl\{ w_\gamma\Bigl(\tfrac{\lfloor mt \rfloor}{m}\Bigr) \Bigr\}^{-1} \Bigl(\tfrac{\lfloor mt \rfloor}{m}\Bigr)^{-5/2-\eta} \int_1^t \Bigl| \tfrac{\lfloor mt \rfloor}{m}\Bigl\{W_1(1) + W_2\Bigl(\tfrac{\lfloor ms \rfloor}{m} - 1\Bigr)\Bigr\} - \tfrac{\lfloor ms \rfloor}{m}\Bigl\{W_1(1) + W_2\Bigl(\tfrac{\lfloor mt \rfloor}{m} - 1\Bigr)\Bigr\} \Bigr| \mathrm{d}s
\]
\[
= \sigma \sup_{1 \le t < \infty} \Bigl\{ w_\gamma\Bigl(\tfrac{\lfloor mt \rfloor}{m}\Bigr) \Bigr\}^{-1} \Bigl(\tfrac{\lfloor mt \rfloor}{m}\Bigr)^{-1} \int_1^t R_\eta\Bigl(\tfrac{\lfloor ms \rfloor}{m}, \tfrac{\lfloor mt \rfloor}{m}\Bigr) \mathrm{d}s,
\]
where the second equality follows from the fact that the integrand is zero on the interval $[\lfloor mt \rfloor/m, t]$ since $\lfloor ms \rfloor = \lfloor mt \rfloor$ for $s \in [\lfloor mt \rfloor/m, t]$. Then, from (A.20), we obtain that
\[
\Bigl| \sup_{1 \le t < \infty} \{w_\gamma(\lfloor mt \rfloor/m)\}^{-1} (\lfloor mt \rfloor/m)^{-1} \int_1^t R_\eta(\lfloor ms \rfloor/m, \lfloor mt \rfloor/m)\,\mathrm{d}s - \sup_{1 \le t < \infty} \{w_\gamma(t)\}^{-1}\, t^{-1} \int_1^t R_\eta(s,t)\,\mathrm{d}s \Bigr|
\]
\[
\le \sup_{1 \le t < \infty} \Bigl| \{w_\gamma(\lfloor mt \rfloor/m)\}^{-1} (\lfloor mt \rfloor/m)^{-1} \int_1^t R_\eta(\lfloor ms \rfloor/m, \lfloor mt \rfloor/m)\,\mathrm{d}s - \{w_\gamma(t)\}^{-1}\, t^{-1} \int_1^t R_\eta(s,t)\,\mathrm{d}s \Bigr|,
\]
which is smaller than $I_m + J_m$, where
\[
I_m = \sup_{1 \le t < \infty} \{w_\gamma(\lfloor mt \rfloor/m)\}^{-1} (\lfloor mt \rfloor/m)^{-1} \Bigl| \int_1^t R_\eta(\lfloor ms \rfloor/m, \lfloor mt \rfloor/m)\,\mathrm{d}s - \int_1^t R_\eta(s,t)\,\mathrm{d}s \Bigr|,
\]
\[
J_m = \sup_{1 \le t < \infty} \Bigl| \{w_\gamma(\lfloor mt \rfloor/m)\}^{-1} (\lfloor mt \rfloor/m)^{-1} - \{w_\gamma(t)\}^{-1}\, t^{-1} \Bigr| \int_1^t R_\eta(s,t)\,\mathrm{d}s.
\]
To show (A.18), we shall verify that both $I_m$ and $J_m$ converge to zero almost surely. For $I_m$, we have
\[
I_m \le \sup_{1 \le t < \infty} \{w_\gamma(\lfloor mt \rfloor/m)\}^{-1} (\lfloor mt \rfloor/m)^{-1}\, t \times \sup_{1 \le s \le t < \infty} \Bigl| R_\eta(\lfloor ms \rfloor/m, \lfloor mt \rfloor/m) - R_\eta(s,t) \Bigr|.
\]
Since $\sup_{1 \le t < \infty} mt/\lfloor mt \rfloor \le 2$ for $m \ge 2$, the first supremum on the right is bounded by $2\epsilon^{-1}$.
The second supremum converges to zero almost surely from the almost sure uniform continuity of $R_\eta$ on $\{(s,t) \in [1,\infty)^2 : s \le t\}$ shown in Lemma A.2. For $J_m$, we have
\[
J_m \le \sup_{1 \le t < \infty} \Bigl| \{w_\gamma(\lfloor mt \rfloor/m)\}^{-1}\, \tfrac{mt}{\lfloor mt \rfloor} - \{w_\gamma(t)\}^{-1} \Bigr| \times \sup_{1 \le s \le t < \infty} R_\eta(s,t).
\]
Since the second supremum on the right is almost surely bounded by Lemma A.2, it suffices to verify that the first supremum converges to zero. The latter is smaller than
\[
\sup_{1 \le t < \infty} \Bigl| \{w_\gamma(\lfloor mt \rfloor/m)\}^{-1}\, \tfrac{mt}{\lfloor mt \rfloor} - \{w_\gamma(t)\}^{-1}\, \tfrac{mt}{\lfloor mt \rfloor} + \{w_\gamma(t)\}^{-1}\, \tfrac{mt}{\lfloor mt \rfloor} - \{w_\gamma(t)\}^{-1} \Bigr| \le 2 \sup_{1 \le t < \infty} \Bigl| \{w_\gamma(\lfloor mt \rfloor/m)\}^{-1} - \{w_\gamma(t)\}^{-1} \Bigr| + \epsilon^{-1} \sup_{1 \le t < \infty} \frac{mt - \lfloor mt \rfloor}{\lfloor mt \rfloor}.
\]
The first term on the right-hand side converges to zero by the uniform continuity of the function $t \mapsto 1/w_\gamma(t)$ on $[1,\infty)$. The second term converges to zero since it is smaller than $\epsilon^{-1} m^{-1}$.

It remains to prove (A.19). Notice first that the limit in (A.19) is almost surely finite since
\[
\sup_{1 \le t < \infty} t^{-1/2} \sqrt{\int_1^t \{R_\eta(s,t)\}^2\,\mathrm{d}s} \le \sup_{1 \le s \le t < \infty} R_\eta(s,t).
\]
Then, starting from (A.3) and proceeding as previously, it can be verified that the random variable $\sup_{m+1 \le k < \infty} \{w_T(k/m)\}^{-1} \tilde T_m(k)$ is equal in distribution to
\[
\sigma \sup_{1 \le t < \infty} \Bigl\{ w_T\Bigl(\tfrac{\lfloor mt \rfloor}{m}\Bigr) \Bigr\}^{-1} \Biggl( \int_1^t \Bigl[ \tfrac{\lfloor mt \rfloor}{m}\Bigl\{W_1(1) + W_2\Bigl(\tfrac{\lfloor ms \rfloor}{m} - 1\Bigr)\Bigr\} - \tfrac{\lfloor ms \rfloor}{m}\Bigl\{W_1(1) + W_2\Bigl(\tfrac{\lfloor mt \rfloor}{m} - 1\Bigr)\Bigr\} \Bigr]^2 \mathrm{d}s \Biggr)^{1/2}
\]
\[
= \sigma \sup_{1 \le t < \infty} \Bigl\{ w_\gamma\Bigl(\tfrac{\lfloor mt \rfloor}{m}\Bigr) \Bigr\}^{-1} \Bigl(\tfrac{\lfloor mt \rfloor}{m}\Bigr)^{-1/2} \sqrt{ \int_1^t \Bigl\{ R_\eta\Bigl(\tfrac{\lfloor ms \rfloor}{m}, \tfrac{\lfloor mt \rfloor}{m}\Bigr) \Bigr\}^2 \mathrm{d}s }.
\]
From (A.20),
\[
\Bigl| \sup_{1 \le t < \infty} \{w_\gamma(\lfloor mt \rfloor/m)\}^{-1} (\lfloor mt \rfloor/m)^{-1/2} \sqrt{\int_1^t \{R_\eta(\lfloor ms \rfloor/m, \lfloor mt \rfloor/m)\}^2\,\mathrm{d}s} - \sup_{1 \le t < \infty} \{w_\gamma(t)\}^{-1}\, t^{-1/2} \sqrt{\int_1^t \{R_\eta(s,t)\}^2\,\mathrm{d}s} \Bigr|
\]
\[
\le \sup_{1 \le t < \infty} \Bigl| \{w_\gamma(\lfloor mt \rfloor/m)\}^{-1} (\lfloor mt \rfloor/m)^{-1/2} \sqrt{\int_1^t \{R_\eta(\lfloor ms \rfloor/m, \lfloor mt \rfloor/m)\}^2\,\mathrm{d}s} - \{w_\gamma(t)\}^{-1}\, t^{-1/2} \sqrt{\int_1^t \{R_\eta(s,t)\}^2\,\mathrm{d}s} \Bigr|,
\]
which is smaller than $I'_m + J'_m$, where
\[
I'_m = \sup_{1 \le t < \infty} \{w_\gamma(\lfloor mt \rfloor/m)\}^{-1} (\lfloor mt \rfloor/m)^{-1/2} \Bigl| \sqrt{\int_1^t \{R_\eta(\lfloor ms \rfloor/m, \lfloor mt \rfloor/m)\}^2\,\mathrm{d}s} - \sqrt{\int_1^t \{R_\eta(s,t)\}^2\,\mathrm{d}s} \Bigr|,
\]
\[
J'_m = \sup_{1 \le t < \infty} \Bigl| \{w_\gamma(\lfloor mt \rfloor/m)\}^{-1} (\lfloor mt \rfloor/m)^{-1/2} - \{w_\gamma(t)\}^{-1}\, t^{-1/2} \Bigr| \sqrt{\int_1^t \{R_\eta(s,t)\}^2\,\mathrm{d}s}.
\]
Using Minkowski's inequality in the form $|\,\|f\|_2 - \|g\|_2\,| \le \|f - g\|_2$, and using similar arguments as previously,
\[
I'_m \le \sup_{1 \le t < \infty} \{w_\gamma(\lfloor mt \rfloor/m)\}^{-1} (\lfloor mt \rfloor/m)^{-1/2} \sqrt{\int_1^t \Bigl\{ R_\eta(\lfloor ms \rfloor/m, \lfloor mt \rfloor/m) - R_\eta(s,t) \Bigr\}^2\,\mathrm{d}s}
\le \epsilon^{-1} \sup_{1 \le t < \infty} \Bigl(\tfrac{mt}{\lfloor mt \rfloor}\Bigr)^{1/2} \times \sup_{1 \le s \le t < \infty} \Bigl| R_\eta(\lfloor ms \rfloor/m, \lfloor mt \rfloor/m) - R_\eta(s,t) \Bigr| \to 0
\]
almost surely as $m \to \infty$. Proceeding as for $J_m$, we obtain that $J'_m \to 0$ almost surely as $m \to \infty$ from the inequality
\[
\sup_{1 \le t < \infty} \Bigl| \{w_\gamma(\lfloor mt \rfloor/m)\}^{-1} \Bigl(\tfrac{mt}{\lfloor mt \rfloor}\Bigr)^{1/2} - \{w_\gamma(t)\}^{-1} \Bigr| \le \sqrt{2}\, \sup_{1 \le t < \infty} \Bigl| \{w_\gamma(\lfloor mt \rfloor/m)\}^{-1} - \{w_\gamma(t)\}^{-1} \Bigr| + \epsilon^{-1} \sup_{1 \le t < \infty} \frac{\sqrt{mt} - \sqrt{\lfloor mt \rfloor}}{\sqrt{\lfloor mt \rfloor}},
\]
the uniform continuity of $t \mapsto 1/w_\gamma(t)$ and the fact that the argument of the second supremum on the right-hand side is at most $\lfloor mt \rfloor^{-1}$. ∎

Proof of Theorem 3.3.
From Lemmas A.1 and A.3, we have that, for any fixed $\eta > 0$, $\epsilon > 0$ and $\gamma \ge 0$,
\[
\sup_{m+1 \le k < \infty} \{w_R(k/m)\}^{-1} R_m(k) \rightsquigarrow \sigma \sup_{1 \le s \le t < \infty} \{w_\gamma(t)\}^{-1} R_\eta(s,t),
\]
\[
\sup_{m+1 \le k < \infty} \{w_S(k/m)\}^{-1} S_m(k) \rightsquigarrow \sigma \sup_{1 \le t < \infty} \{w_\gamma(t)\}^{-1}\, t^{-1} \int_1^t |R_\eta(s,t)|\,\mathrm{d}s,
\]
\[
\sup_{m+1 \le k < \infty} \{w_T(k/m)\}^{-1} T_m(k) \rightsquigarrow \sigma \sup_{1 \le t < \infty} \{w_\gamma(t)\}^{-1}\, t^{-1/2} \sqrt{\int_1^t \{R_\eta(s,t)\}^2\,\mathrm{d}s},
\]
where $w_\gamma$ is defined in (2.8) and the random function $R_\eta$ is defined in (A.11). It thus remains to verify that the expressions of the limiting random variables can be simplified to coincide in distribution with those given in the statement of the theorem. The latter is merely a consequence of the fact that the random functions
\[
U(s,t) = (t-s)W_1(1) + tW_2(s-1) - sW_2(t-1), \qquad 1 \le s \le t,
\]
and
\[
V(s,t) = tW_1(s) - sW_1(t), \qquad 1 \le s \le t,
\]
are equal in distribution. Since $U$ and $V$ are centered Gaussian processes whose sample paths are continuous almost surely, the equality in distribution is a direct consequence of the equality of their covariance functions. Indeed, for any $1 \le s \le t$ and $1 \le s' \le t'$, it is an exercise to verify by direct computation that $\mathrm{Cov}\{U(s,t), U(s',t')\} = \mathrm{Cov}\{V(s,t), V(s',t')\}$. ∎

Proof of Proposition 3.4. We have, for all $k \ge 1$,
\[
\sup_{1 \le s \le t < \infty} \frac{1}{t^{3/2}} |tW(s) - sW(t)| \ge \frac{1}{(2^k)^{3/2}} \bigl| 2^k W(2^{k-1}) - 2^{k-1} W(2^k) \bigr| = \frac{2^{k-1}}{2^{3k/2}} \bigl| 2W(2^{k-1}) - W(2^k) \bigr| = \frac{1}{2^{k/2+1}} \bigl| W(2^{k-1}) - \{W(2^k) - W(2^{k-1})\} \bigr|.
\]
Consider an arbitrary fixed $M > 0$ and the events $D_k = \{ |W(2^{k-1}) - \{W(2^k) - W(2^{k-1})\}| \ge M\, 2^{k/2+1} \}$. It is sufficient to show that $P(\bigcup_{k=1}^\infty D_k) = 1$ (since this shows that the supremum is at least $M$ with probability 1), or equivalently, $P(\bigcap_{k=1}^\infty D_k^C) = 0$.
Now,
\[
P\Bigl( \bigcap_{k=1}^\infty D_k^C \Bigr) = \lim_{r \to \infty} P\Bigl( \bigcap_{k=1}^r D_k^C \Bigr) = \lim_{r \to \infty} P(D_1^C) \prod_{k=2}^r P\Bigl( D_k^C \,\Big|\, \bigcap_{j=1}^{k-1} D_j^C \Bigr) = P(D_1^C) \lim_{r \to \infty} \prod_{k=2}^r \Bigl\{ 1 - P\Bigl( D_k \,\Big|\, \bigcap_{j=1}^{k-1} D_j^C \Bigr) \Bigr\},
\]
so that it is enough to show that there exists $\delta_M > 0$ such that $P( D_k \mid \bigcap_{j=1}^{k-1} D_j^C ) \ge \delta_M$ for all $k \ge 2$. The latter holds with $\delta_M = P(|Z| \ge 2\sqrt{2}\,M)/2 > 0$, where $Z$ is a standard normal random variable, since, for all $k \ge 2$, with $A_k = \{ W(2^k) - W(2^{k-1}) \text{ and } W(2^{k-1}) \text{ have opposite signs} \}$,
\[
P\Bigl( D_k \,\Big|\, \bigcap_{j=1}^{k-1} D_j^C \Bigr) \ge P\bigl( A_k,\ |W(2^k) - W(2^{k-1})| \ge M\, 2^{k/2+1} \bigr) = P\bigl( |W(2^k) - W(2^{k-1})| \ge M\, 2^{k/2+1} \bigr) P(A_k) = P(|Z| \ge 2\sqrt{2}\,M)/2.
\]
In the above, we have used the fact that the sign of the increment and its magnitude are independent of the past and of each other, and that the increment is Gaussian with mean 0 and variance $2^{k-1}$. ∎

Appendix B: Proofs of the results under alternatives

Proof of Theorem 3.10. Let us first prove the claim when (iv) in Condition 3.7 holds. Recall the definition of the function $w_\gamma$ in (2.8) and notice that $w_\gamma(t) \le 1$ for all $t \in [1,\infty)$, $\epsilon \in (0,1]$ and $\gamma \ge 0$.
Thus, for all $\eta \ge 0$, $\epsilon > 0$ and $\gamma \ge 0$,
\[
\sup_{m+1 \le k < \infty} \{w_R(k/m)\}^{-1} R_m(k) \ge \sup_{m+1 \le k < \infty} (k/m)^{-3/2-\eta} R_m(k)
\]
\[
\ge \Bigl( \frac{k^\star_m + \lfloor cm \rfloor}{m} \Bigr)^{-3/2-\eta} \max_{m \le j \le k^\star_m + \lfloor cm \rfloor - 1} \frac{j(k^\star_m + \lfloor cm \rfloor - j)}{m^{3/2}} \bigl| \bar X^{(m)}_j - \bar X^{(m)}_{j+1:k^\star_m + \lfloor cm \rfloor} \bigr| \qquad (\text{taking } k = k^\star_m + \lfloor cm \rfloor)
\]
\[
\ge \Bigl( \frac{k^\star_m}{m} + c \Bigr)^{-3/2-\eta} \frac{k^\star_m \lfloor cm \rfloor}{m^{3/2}} \bigl| \bar X^{(m)}_{k^\star_m} - \bar X^{(m)}_{k^\star_m+1:k^\star_m+\lfloor cm \rfloor} \bigr| \qquad (\text{taking } j = k^\star_m)
\]
\[
\ge \Bigl( \frac{k^\star_m}{m} + c \Bigr)^{-3/2-\eta} \frac{k^\star_m (cm - 1)}{m^{3/2}} \bigl| \bar X^{(m)}_{k^\star_m} - E(X^{(m)}_{k^\star_m}) + E(X^{(m)}_{k^\star_m}) - E(X^{(m)}_{k^\star_m+1}) + E(X^{(m)}_{k^\star_m+1}) - \bar X^{(m)}_{k^\star_m+1:k^\star_m+\lfloor cm \rfloor} \bigr|
\]
\[
\ge m^{1/2} \Bigl( \frac{k^\star_m}{m} + c \Bigr)^{-3/2-\eta} \Bigl( \frac{k^\star_m}{m} \Bigr) \Bigl( c - \frac{1}{m} \Bigr) \Bigl\{ \bigl| E(X^{(m)}_{k^\star_m+1}) - E(X^{(m)}_{k^\star_m}) \bigr| - \bigl| \bar X^{(m)}_{k^\star_m} - E(X^{(m)}_{k^\star_m}) - \bar X^{(m)}_{k^\star_m+1:k^\star_m+\lfloor cm \rfloor} + E(X^{(m)}_{k^\star_m+1}) \bigr| \Bigr\} \qquad (B.1)
\]
by the triangle inequality. Since (iv) in Condition 3.7 holds, for $m$ large enough, (B.1) is at least $(C + c)^{-3/2-\eta}(c/2)$ times
\[
m^{1/2} \Bigl\{ \bigl| E(Y^{(m)}_1) - E(Y^{(0)}_1) \bigr| - \bigl| \bar Y^{(0)}_{k^\star_m} - E(Y^{(0)}_1) - \bar Y^{(m)}_{k^\star_m+1:k^\star_m+\lfloor cm \rfloor} + E(Y^{(m)}_1) \bigr| \Bigr\}. \qquad (B.2)
\]
Notice that (3.4) implies that $\sqrt{m}\{\bar Y^{(0)}_{k^\star_m} - E(Y^{(0)}_1)\} = (m/k^\star_m)^{1/2} \times \sqrt{k^\star_m}\{\bar Y^{(0)}_{k^\star_m} - E(Y^{(0)}_1)\} = O_P(1)$ and that (3.5) implies that $\sqrt{m}\{\bar Y^{(m)}_{k^\star_m+1:k^\star_m+\lfloor cm \rfloor} - E(Y^{(m)}_1)\} = O_P(1)$.
Hence, (B.2) diverges in probability to infinity as a consequence of (ii) in Condition 3.7.

Assume now that (v) in Condition 3.7 holds. Then, we can use the fact that
\[
\sup_{m+1 \le k < \infty} \{w_R(k/m)\}^{-1} R_m(k) \ge \Bigl( \frac{k^\star_m + \lfloor ck^\star_m \rfloor}{m} \Bigr)^{-3/2-\eta} R_m(k^\star_m + \lfloor ck^\star_m \rfloor) \qquad (\text{taking } k = k^\star_m + \lfloor ck^\star_m \rfloor)
\]
\[
\ge \frac{m^{3/2+\eta}}{(k^\star_m + ck^\star_m)^{3/2+\eta}} \max_{m \le j \le k^\star_m + \lfloor ck^\star_m \rfloor - 1} \frac{j(k^\star_m + \lfloor ck^\star_m \rfloor - j)}{m^{3/2}} \bigl| \bar X^{(m)}_j - \bar X^{(m)}_{j+1:k^\star_m + \lfloor ck^\star_m \rfloor} \bigr|
\]
\[
\ge \frac{m^{\eta}\, k^\star_m (ck^\star_m - 1)}{(k^\star_m + ck^\star_m)^{3/2+\eta}} \bigl| \bar X^{(m)}_{k^\star_m} - \bar X^{(m)}_{k^\star_m+1:k^\star_m+\lfloor ck^\star_m \rfloor} \bigr| \qquad (\text{taking } j = k^\star_m)
\]
\[
\ge \frac{m^{\eta} (k^\star_m)^{1/2-\eta} (c - 1/k^\star_m)}{(1+c)^{3/2+\eta}} \bigl| \bar X^{(m)}_{k^\star_m} - E(X^{(m)}_{k^\star_m}) + E(X^{(m)}_{k^\star_m}) - E(X^{(m)}_{k^\star_m+1}) + E(X^{(m)}_{k^\star_m+1}) - \bar X^{(m)}_{k^\star_m+1:k^\star_m+\lfloor ck^\star_m \rfloor} \bigr|,
\]
which, for $m$ large enough, is larger than
\[
\frac{m^{1/2}(c/2)}{(1+c)^{3/2+\eta}} \Bigl\{ \bigl| E(Y^{(m)}_1) - E(Y^{(0)}_1) \bigr| - \bigl| \bar Y^{(0)}_{k^\star_m} - E(Y^{(0)}_1) - \bar Y^{(m)}_{k^\star_m+1:k^\star_m+\lfloor ck^\star_m \rfloor} + E(Y^{(m)}_1) \bigr| \Bigr\}. \qquad (B.3)
\]
This time, since $k^\star_m/m \to \infty$, (3.4) implies that $\sqrt{m}\{\bar Y^{(0)}_{k^\star_m} - E(Y^{(0)}_1)\} = (m/k^\star_m)^{1/2} \times \sqrt{k^\star_m}\{\bar Y^{(0)}_{k^\star_m} - E(Y^{(0)}_1)\} = o_P(1)$ while (3.6) implies that $(m/k^\star_m)^{1/2} \sqrt{k^\star_m} \{\bar Y^{(m)}_{k^\star_m+1:k^\star_m+\lfloor ck^\star_m \rfloor} - E(Y^{(m)}_1)\} = o_P(1)$. Therefore, (B.3) diverges in probability to infinity as a consequence of (ii) in Condition 3.7. ∎

Proof of Theorem 3.12. We adapt the proof of Proposition 2.7 of Kojadinovic and Verdier (2020) to the current setting.
Given a set $S$, let $\ell^\infty(S)$ denote the space of all bounded real-valued functions on $S$ equipped with the uniform metric. Fix $T > c$. For any $s \in [1, T]$, let
\[
W_{m,Y}(s) = m^{-1/2} \sum_{i=1}^{\lfloor ms \rfloor} \{Y_i - E(Y_1)\} \quad \text{and} \quad W_{m,Z}(s) = m^{-1/2} \sum_{i=1}^{\lfloor ms \rfloor} \{Z_i - E(Z_1)\}.
\]
From Condition 3.11, we have that $W_{m,Y}$ converges weakly to a standard Brownian motion $W_Y$ in $\ell^\infty([1,T])$ and $W_{m,Z}$ converges weakly to a standard Brownian motion $W_Z$ in $\ell^\infty([1,T])$.

Let $J_m(s,t) = m^{-1/2} H_m(s,t) - K_c(s,t)$, $(s,t) \in \Delta_T = \{(s,t) \in [1,T]^2 : s \le t\}$. The fact that $m^{-1/2} H_m \xrightarrow{P} K_c$ in $\ell^\infty(\Delta_T)$ is proven if we show that
\[
\sup_{(s,t) \in \Delta_T} |J_m(s,t)| \xrightarrow{P} 0. \qquad (B.4)
\]
The supremum on the left-hand side of (B.4) is equal to
\[
\max\Bigl\{ \sup_{1 \le s \le t \le c} |J_m(s,t)|, \ \sup_{1 \le s \le c \le t \le T} |J_m(s,t)|, \ \sup_{c \le s \le t \le T} |J_m(s,t)| \Bigr\}. \qquad (B.5)
\]
Notice first that
\[
K_c(s,t) =
\begin{cases}
0, & \text{if } 1 \le s \le t \le c, \\
s(t-c)\{E(Y_1) - E(Z_1)\}, & \text{if } 1 \le s \le c \le t \le T, \\
c(t-s)\{E(Y_1) - E(Z_1)\}, & \text{if } c \le s \le t \le T.
\end{cases} \qquad (B.6)
\]
Furthermore, for any $(s,t) \in \Delta_T \cap [1,c]^2$, let
\[
D_{m,Y}(s,t) = \sqrt{m}\,\lambda_m(s,t) \bigl\{ \bar X^{(m)}_{\lfloor ms \rfloor + 1:\lfloor mt \rfloor} - E(X^{(m)}_{k^\star_m}) \bigr\} = W_{m,Y}(t) - W_{m,Y}(s),
\]
where $\lambda_m(s,t) = (\lfloor mt \rfloor - \lfloor ms \rfloor)/m$, $(s,t) \in \Delta_T$, and, for any $(s,t) \in \Delta_T \cap [c,T]^2$, let
\[
D_{m,Z}(s,t) = \sqrt{m}\,\lambda_m(s,t) \bigl\{ \bar X^{(m)}_{\lfloor ms \rfloor + 1:\lfloor mt \rfloor} - E(X^{(m)}_{k^\star_m+1}) \bigr\} = W_{m,Z}(t) - W_{m,Z}(s).
\]
Under the assumptions of the theorem, from the continuous mapping theorem, $D_{m,Y} \rightsquigarrow D_Y$ in $\ell^\infty(\Delta_T \cap [1,c]^2)$ and $D_{m,Z} \rightsquigarrow D_Z$ in $\ell^\infty(\Delta_T \cap [c,T]^2)$, where $D_Y(s,t) = W_Y(t) - W_Y(s)$ and $D_Z(s,t) = W_Z(t) - W_Z(s)$.

From the expression of $K_c$ given in (B.6), for the first supremum in (B.5), we obtain that
\[
\sup_{1 \le s \le t \le c} |J_m(s,t)| = m^{-1/2} \sup_{1 \le s \le t \le c} |H_m(s,t)| = o(1) \times O_P(1) \xrightarrow{P} 0,
\]
since $H_m$ converges weakly to $(s,t) \mapsto (t-s)D_Y(0,s) - sD_Y(s,t)$ in $\ell^\infty(\Delta_T \cap [1,c]^2)$ as a consequence of the fact that, for any $1 \le s \le t \le c$, $H_m(s,t) = \lambda_m(s,t) D_{m,Y}(0,s) - \lambda_m(0,s) D_{m,Y}(s,t)$ and $\sup_{(s,t) \in \Delta_T} |\lambda_m(s,t) - (t-s)| \le 1/m$, and from the continuous mapping theorem.

Regarding the second supremum, for any $1 \le s \le c \le t \le T$, we have that
\[
\lambda_m(s,t)\,\bar X_{\lfloor ms \rfloor + 1:\lfloor mt \rfloor} = \lambda_m(s,c)\,\bar X_{\lfloor ms \rfloor + 1:\lfloor mc \rfloor} + \lambda_m(c,t)\,\bar X_{\lfloor mc \rfloor + 1:\lfloor mt \rfloor}.
\]
Thus, on one hand,
\[
m^{-1/2} H_m(s,t) = \lambda_m(0,s) \bigl\{ \lambda_m(s,t)\,\bar X_{\lfloor ms \rfloor} - \lambda_m(s,c)\,\bar X_{\lfloor ms \rfloor + 1:\lfloor mc \rfloor} - \lambda_m(c,t)\,\bar X_{\lfloor mc \rfloor + 1:\lfloor mt \rfloor} \bigr\}.
\]
On the other hand, from (B.6) and using again the fact that $\sup_{(s,t) \in \Delta_T} |\lambda_m(s,t) - (t-s)| \le 1/m$,
\[
K_c(s,t) = \lambda_m(0,s) \bigl\{ \lambda_m(s,t)\,E(Y_1) - \lambda_m(s,c)\,E(Y_1) - \lambda_m(c,t)\,E(Z_1) \bigr\} + O(1/m),
\]
where the term $O(1/m)$ is uniform in $s$, $t$, $c$. By the triangle inequality and using the fact that $\sup_{(s,t) \in \Delta_T} |\lambda_m(s,t)| \le T$, it then follows that
\[
\sup_{1 \le s \le c \le t \le T} |J_m(s,t)| \le m^{-1/2}\,T \Bigl[ \sup_{1 \le s \le c} |D_{m,Y}(0,s)| + \sup_{1 \le s \le c} |D_{m,Y}(s,c)| + \sup_{c \le t \le T} |D_{m,Z}(c,t)| \Bigr] = o(1) \times O_P(1).
\]
Similarly, for any $c \le s \le T$,
\[
\lambda_m(0,s)\,\bar X_{\lfloor ms \rfloor} = \lambda_m(0,c)\,\bar X_{\lfloor mc \rfloor} + \lambda_m(c,s)\,\bar X_{\lfloor mc \rfloor + 1:\lfloor ms \rfloor},
\]
and, hence, on one hand, for any $c \le s \le t \le T$,
\[
m^{-1/2} H_m(s,t) = \lambda_m(s,t) \bigl\{ \lambda_m(0,c)\,\bar X_{\lfloor mc \rfloor} + \lambda_m(c,s)\,\bar X_{\lfloor mc \rfloor + 1:\lfloor ms \rfloor} - \lambda_m(0,s)\,\bar X_{\lfloor ms \rfloor + 1:\lfloor mt \rfloor} \bigr\},
\]
while, on the other hand,
\[
K_c(s,t) = \lambda_m(s,t) \bigl\{ \lambda_m(0,c)\,E(Y_1) + \lambda_m(c,s)\,E(Z_1) - \lambda_m(0,s)\,E(Z_1) \bigr\} + O(1/m),
\]
with the term $O(1/m)$ again uniform in $s$, $t$, $c$. Finally, by the triangle inequality,
\[
\sup_{c \le s \le t \le T} |J_m(s,t)| \le m^{-1/2}\,T \Bigl[ |D_{m,Y}(0,c)| + \sup_{c \le s \le T} |D_{m,Z}(c,s)| + \sup_{c \le s \le t \le T} |D_{m,Z}(s,t)| \Bigr] = o(1) \times O_P(1),
\]
which completes the proof of (B.4).

It remains to prove (3.8). Recall that, from the definition of the function $w_\gamma$ in (2.8), $w_\gamma(t) \le 1$ for all $t \ge 1$, $\epsilon \in (0,1]$ and $\gamma \ge 0$. Hence,
\[
\sup_{m+1 \le k < \infty} \{w_S(k/m)\}^{-1} S_m(k) \ge \sup_{m+1 \le k \le \lfloor mT \rfloor} (k/m)^{-5/2-\eta}\, \frac{1}{m}\sum_{j=m}^{k-1} \frac{j(k-j)}{m^{3/2}} \bigl| \bar X^{(m)}_j - \bar X^{(m)}_{j+1:k} \bigr| = \sup_{t \in [1,T]} \Bigl(\frac{\lfloor mt \rfloor}{m}\Bigr)^{-5/2-\eta} \int_1^t |H_m(s,t)|\,\mathrm{d}s
\]
\[
\ge \sup_{t \in [1,T]} t^{-5/2-\eta} \int_1^t |H_m(s,t)|\,\mathrm{d}s \ge T^{-5/2-\eta}\, m^{1/2} \sup_{t \in [1,T]} \int_1^t |m^{-1/2} H_m(s,t)|\,\mathrm{d}s \xrightarrow{P} \infty,
\]
since, by the continuous mapping theorem,
\[
\sup_{t \in [1,T]} \int_1^t |m^{-1/2} H_m(s,t)|\,\mathrm{d}s \xrightarrow{P} \sup_{t \in [1,T]} \int_1^t |K_c(s,t)|\,\mathrm{d}s > 0.
\]
Similarly, $\sup_{m+1 \le k < \infty} \{ w_T(k/m) \}^{-1} T_m(k)$ is larger than
\begin{align*}
\sup_{m+1 \le k \le \lfloor mT \rfloor} (k/m)^{-2-\eta} \sqrt{ \frac{1}{m} \sum_{j=m}^{k-1} \Bigl\{ \frac{j(k-j)}{m^{3/2}} \bigl( \bar X_{1:j} - \bar X_{j+1:k} \bigr) \Bigr\}^2 }
&\ge \sup_{t \in [1,T]} t^{-2-\eta} \sqrt{ \int_1^t \{ H_m(s,t) \}^2 \, \mathrm{d}s } \\
&\ge T^{-2-\eta}\, m^{1/2} \sup_{t \in [1,T]} \sqrt{ \int_1^t \{ m^{-1/2} H_m(s,t) \}^2 \, \mathrm{d}s } \overset{P}{\to} \infty,
\end{align*}
since, by the continuous mapping theorem,
\[
\sup_{t \in [1,T]} \sqrt{ \int_1^t \{ m^{-1/2} H_m(s,t) \}^2 \, \mathrm{d}s } \overset{P}{\to} \sup_{t \in [1,T]} \sqrt{ \int_1^t \{ K_c(s,t) \}^2 \, \mathrm{d}s } > 0.
\]

Appendix C: Proofs of Propositions 4.1 and 6.2

Proof of Proposition 4.1. For any fixed $\eta > \varepsilon > \gamma \ge 0$, we have
\begin{align*}
\sup_{1 \le s \le t < \infty} \frac{|t W(s) - s W(t)|}{t^{3/2+\eta} \max\bigl\{ ((t-1)/t)^\gamma, \varepsilon \bigr\}}
&= \sup_{1 \le s \le t < \infty} \frac{1}{t^{3/2+\eta} \max\{ (1 - 1/t)^\gamma, \varepsilon \}} \Bigl| \frac{ts}{s} W(s) - \frac{st}{t} W(t) \Bigr| \\
&= \sup_{1 \le s \le t < \infty} \frac{ts}{t^{3/2+\eta} \max\{ (1 - 1/t)^\gamma, \varepsilon \}} \Bigl| \frac{1}{s} W(s) - \frac{1}{t} W(t) \Bigr| \\
&= \sup_{1 \le s \le t < \infty} \frac{s}{t^{1/2+\eta} \max\{ (1 - 1/t)^\gamma, \varepsilon \}} \Bigl| \frac{1}{s} W(s) - \frac{1}{t} W(t) \Bigr|.
\end{align*}
Let $u = 1/t$ and $v = 1/s$. Then, the last expression on the right is equal to
\[
\sup_{1 \le 1/v \le 1/u < \infty} \frac{1/v}{(1/u)^{1/2+\eta} \max\{ (1-u)^\gamma, \varepsilon \}} \, \bigl| v W(1/v) - u W(1/u) \bigr|
= \sup_{0 < u \le v \le 1} \frac{u^{1/2+\eta}}{v \max\{ (1-u)^\gamma, \varepsilon \}} \, \bigl| v W(1/v) - u W(1/u) \bigr|.
\]
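The substitution $u = 1/t$, $v = 1/s$ above is naturally combined with the time-inversion property of Brownian motion; a standard statement, recorded here for reference, is:

```latex
% Time inversion of Brownian motion (standard fact):
% if $\{W(t)\}_{t \ge 0}$ is a standard Brownian motion, then so is
\[
  \widetilde W(u) \;=\;
  \begin{cases}
    u \, W(1/u), & u > 0, \\
    0,           & u = 0.
  \end{cases}
\]
% In particular, $vW(1/v) - uW(1/u) = \widetilde W(v) - \widetilde W(u)$,
% so the supremum obtained after the substitution is a supremum of
% weighted Brownian increments over $0 < u \le v \le 1$.
```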
Proof of Proposition 6.2. Fix $\eta > \varepsilon > \gamma \ge 0$ and let
\begin{align}
V_m &= \varepsilon^{-1} \sup_{m+1 \le k < \infty} (k/m)^{-3/2-\eta} \max_{m \le j \le k-1} \frac{j(k-j)}{m^{3/2}} \bigl| \theta_{1:j} - \theta_{j+1:k} - \overline{IF}_{1:j} + \overline{IF}_{j+1:k} \bigr| \notag \\
&= \varepsilon^{-1} \sup_{m+1 \le k < \infty} (k/m)^{-3/2-\eta} \max_{m \le j \le k-1} \frac{j(k-j)}{m^{3/2}} \bigl| R_{1,j} - R_{j+1,k} \bigr|. \tag{C.1}
\end{align}
Then, using the reverse triangle inequality for the maximum norm,
\[
\sup_{m+1 \le k < \infty} \{ w_R(k/m) \}^{-1} \bigl| R_m^{\theta}(k) - R_m^{IF}(k) \bigr| \le V_m
\le \varepsilon^{-1} m^{\eta} \sup_{m+1 \le k < \infty} k^{-3/2-\eta} \Bigl\{ \max_{m \le j \le k-1} j(k-j)\, |R_{1,j}| + \max_{m \le j \le k-1} j(k-j)\, |R_{j+1,k}| \Bigr\} = o_P(1),
\]
since
\begin{align*}
m^{\eta} \sup_{m+1 \le k < \infty} k^{-3/2-\eta} \max_{m \le j \le k-1} j(k-j)\, |R_{1,j}|
&\le m^{\eta} \sup_{m+1 \le k < \infty} k^{-1/2-\eta} \max_{m \le j \le k-1} j\, |R_{1,j}| \\
&\le m^{\eta} \sup_{m+1 \le k < \infty} k^{-\eta} \times \sup_{m+1 \le k < \infty} k^{-1/2} \max_{m \le j \le k-1} j\, |R_{1,j}| \\
&\le \sup_{m+1 \le k < \infty} k^{-1/2} \max_{1 \le i \le k-1} i\, |R_{1,i}|,
\end{align*}
which is $o_P(1)$ under the assumptions of the proposition; the term involving the $|R_{j+1,k}|$ can be dealt with analogously.
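The first bound in the chain above rests on an elementary rearrangement of the weight, which, under the normalizations used here, reads:

```latex
% Rearranging the weight attached to the maximum (elementary algebra):
\[
  \Bigl(\frac{k}{m}\Bigr)^{-3/2-\eta} \frac{j(k-j)}{m^{3/2}}
  \;=\; k^{-3/2-\eta}\, m^{3/2+\eta}\, \frac{j(k-j)}{m^{3/2}}
  \;=\; m^{\eta}\, k^{-3/2-\eta}\, j(k-j).
\]
% Combined with $(k-j) \le k$ and
% $m^{\eta} \sup_{k \ge m+1} k^{-\eta} = m^{\eta} (m+1)^{-\eta} \le 1$,
% this yields the successive bounds displayed above.
```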