Backward CUSUM for Testing and Monitoring Structural Change

Abstract

It is well known that the conventional CUSUM test suffers from low power and large detection delay. We therefore propose two alternative detector statistics. The backward CUSUM detector sequentially cumulates the recursive residuals in reverse chronological order, whereas the stacked backward CUSUM detector considers a triangular array of backward cumulated residuals. While both the backward CUSUM detector and the stacked backward CUSUM detector are suitable for retrospective testing, only the stacked backward CUSUM detector can be monitored on-line. The limiting distributions of the maximum statistics under suitable sequences of alternatives are derived for retrospective testing and fixed endpoint monitoring. In the retrospective testing context, the local power of the tests is shown to be substantially higher than that for the conventional CUSUM test if a single break occurs after one third of the sample size. When applied to monitoring schemes, the detection delay of the stacked backward CUSUM is shown to be much shorter than that of the conventional monitoring CUSUM procedure. Moreover, an infinite horizon monitoring procedure and critical values are presented.


Sven Otto∗ (University of Bonn), Jörg Breitung (University of Cologne)

March 6, 2020


Keywords: structural breaks, recursive residuals, sequential tests, change-point detection, local power, local delay

∗ Corresponding author: Sven Otto, University of Bonn, Institute for Finance and Statistics, Adenauerallee 24-26, 53113 Bonn, Germany. Tel.: +49-228-73-9271. Mail: [email protected].

1 Introduction

Cumulative sums have become a standard statistical tool for testing and monitoring structural changes in time series models. The CUSUM test was introduced by Brown et al. (1975) as a structural break test for the coefficient vector in the linear regression model $y_t = x_t' \beta_t + u_t$ with time index $t$, where $\beta_t$ denotes the coefficient vector and $x_t$ is the vector of regressor variables. Under the null hypothesis, there is no structural change, such that $\beta_t = \beta_0$ for all $t = 1, \ldots, T$, while, under the alternative hypothesis, the coefficient vector changes at unknown time $T^*$, where $1 < T^* \le T$.

Sequential tests, such as the CUSUM test, consist of a detector statistic and a critical boundary function. The CUSUM detector sequentially cumulates standardized one-step ahead forecast errors, which are also referred to as recursive residuals. The detector is evaluated for each time point within the testing period, and, if its path crosses the boundary function at least once, the null hypothesis is rejected.

A variety of retrospective structural break tests have been proposed in the literature. Krämer et al. (1988) investigated the CUSUM test of Brown et al. (1975) under a more general setting. The MOSUM tests by Bauer and Hackl (1978) and Chu et al. (1995) are based on a moving time window of fixed length. A CUSUM test statistic that cumulates OLS residuals was proposed by Ploberger and Krämer (1992), and Ploberger et al. (1989) presented a fluctuation test based on a sequence of OLS estimates. Kuan and Hornik (1995) studied generalized fluctuation tests. Andrews (1993) proposed a sup-Wald test, and the tests by Nyblom (1989) and Hansen (1992) consider likelihood scores instead of residuals.

Since the seminal work of Chu et al. (1996), increasing interest has been focused on monitoring structural stability in real time. Sequential monitoring procedures consist of a detector statistic and a boundary function that are evaluated for periods beyond some historical time span $\{1, 2, \ldots, T\}$. It is assumed that there is no structural change within the historical time period. The monitoring time span with $t > T$ can either have a fixed endpoint $M < \infty$ or an infinite horizon (see Figure 1). In the fixed endpoint setting, the monitoring period starts at $T + 1$ and ends at $M$, while the boundary function depends on the ratio $m = M/T$. This setting is suitable if the length of the monitoring period is known in advance. In case of an infinite horizon, the monitoring time span does not need to be specified before the monitoring procedure starts. These two monitoring schemes are also referred to as closed-end and open-end procedures (see Kirch and Kamgaing 2015). The null hypothesis of no structural change is rejected whenever the path of the detector crosses some critical boundary function for the first time. CUSUM-based monitoring procedures for a fixed endpoint are proposed in Leisch et al. (2000), Zeileis et al. (2005), Wied and Galeano (2013), and Dette and Gösmann (2019), whereas Chu et al. (1996), Horváth et al. (2004), Aue et al. (2006), Fremdt (2015), and Gösmann et al. (2019) considered an infinite monitoring horizon.

Figure 1: Retrospective testing and monitoring

[Figure 1 labels: time axis with points $T$ ("You are here") and $M$; segments: retrospective, fixed endpoint monitoring, infinite horizon monitoring.]

A drawback of the conventional retrospective CUSUM test is its low power, whereas the conventional monitoring CUSUM procedure exhibits large detection delays. This is due to the fact that the pre-break recursive residuals are uninformative, as their expectation is equal to zero up to the break date, while the recursive residuals have a non-zero expectation after the break. Hence, the cumulative sums of the recursive residuals typically contain a large number of uninformative residuals that only add noise to the statistic. In contrast, if one cumulates the recursive residuals backwards from the end of the sample to the beginning, the cumulative sum collects the informative residuals first, and the likelihood of exceeding the critical boundary will typically be larger than when cumulating residuals from the beginning onwards. In this paper, we show that such backward CUSUM tests may indeed have a much higher power and lower detection delay than the conventional forward CUSUM tests.

Another way of motivating the backward CUSUM testing approach is to consider the simplest possible situation, where, under the null hypothesis, it is assumed that the process is generated as $y_t = \beta_0 + u_t$, with $\beta_0$ and $\sigma^2 = \mathrm{Var}(u_t)$ assumed to be known. We are interested in testing the hypothesis that, at some time period $T^*$, the mean changes to some unknown value $\beta^* > 0$. To test this hypothesis, we introduce the dummy variable $D_t^*$, which is unity for $t \ge T^*$ and zero elsewhere. For this one-sided testing problem, there exists a uniformly most powerful test statistic, which is the $t$-statistic of the hypothesis $\delta = 0$ in the regression $(y_t - \beta_0) = \delta D_t^* + u_t$:

$$\mathcal{T}_{T^*} = \frac{1}{\sigma \sqrt{T - T^* + 1}} \sum_{t=T^*}^{T} (y_t - \beta_0).$$

If $\beta_0$ is unknown, we may replace it by the full sample mean $\bar{y}$, resulting in the backward cumulative sum of the OLS residuals from period $T$ through $T^*$. Note that if $T^*$ is unknown, the test statistic is computed for all possible values of $T^*$, whereas the starting point $T$ of the backward cumulative sum remains constant. Since the sum of the OLS residuals is zero, it follows that the test is equivalent to a test based on the forward cumulative sum of the OLS residuals. In contrast, if we replace $\beta_0$ with the recursive mean $\mu_{t-1} = (t-1)^{-1} \sum_{i=1}^{t-1} y_i$, we obtain a test statistic based on the backward cumulative sum of the recursive residuals (henceforth, backward CUSUM). In this case, however, the test is different from a test based on the forward cumulative sum of the recursive residuals (henceforth, forward CUSUM). This is due to the fact that the sum of the recursive residuals is an unrestricted random variable. Accordingly, the two versions of the test may have quite different properties. In particular, it turns out that the backward CUSUM is much more powerful than the standard forward CUSUM at the end of the sample. Accordingly, this version of the CUSUM test procedure is better suited for the purpose of real-time monitoring, where it is crucial to be powerful at the end of the sample.

Furthermore, the conventional CUSUM test has no power against alternatives that do not affect the unconditional mean of $y_t$.
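As a numerical aside on the mean-only example above, the contrast between OLS residuals and recursive residuals can be checked directly. The following is a minimal sketch and not code from the paper: the helper names (`recursive_residuals_mean`, `backward_cusum`), the seed, and the simulated design ($T = 100$ with a unit mean shift from period 75) are illustrative assumptions. It checks that backward sums of OLS residuals mirror the forward sums, whereas backward sums of recursive residuals differ from the forward sums by a random total.

```python
import numpy as np

def recursive_residuals_mean(y):
    """Recursive residuals for y_t = beta + u_t (intercept-only regression).

    w_1 = 0 and, for t >= 2,
    w_t = (y_t - mean(y_1, ..., y_{t-1})) / sqrt(1 + 1/(t-1)).
    """
    y = np.asarray(y, dtype=float)
    T = y.size
    n_prev = np.arange(1, T)                  # t - 1 for t = 2, ..., T
    prev_mean = np.cumsum(y)[:-1] / n_prev    # recursive mean of y_1..y_{t-1}
    w = np.zeros(T)
    w[1:] = (y[1:] - prev_mean) / np.sqrt(1.0 + 1.0 / n_prev)
    return w

def backward_cusum(e):
    """Entry t (1-based) is the sum of e_t, ..., e_T."""
    return np.cumsum(e[::-1])[::-1]

# Illustrative design (an assumption): T = 100, unit mean shift from period 75.
rng = np.random.default_rng(0)
T, T_star = 100, 75
y = rng.standard_normal(T)
y[T_star - 1:] += 1.0

ols_resid = y - y.mean()
rec_resid = recursive_residuals_mean(y)

fwd_ols, bwd_ols = np.cumsum(ols_resid), backward_cusum(ols_resid)
fwd_rec, bwd_rec = np.cumsum(rec_resid), backward_cusum(rec_resid)

# OLS residuals sum to zero, so each backward sum is minus a forward sum.
ols_is_mirror = np.allclose(bwd_ols[1:], -fwd_ols[:-1])

# Recursive residuals: backward sum = (random total) - forward sum.
total_rec = rec_resid.sum()
rec_identity = np.allclose(bwd_rec[1:], total_rec - fwd_rec[:-1])

print(ols_is_mirror, rec_identity, total_rec)
```

Because `total_rec` is a random quantity rather than zero, the backward path of the recursive residuals is not a mirror image of the forward path, which is why the two tests can behave differently.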
In order to obtain tests that have power against breaks of this kind, we extend the existing invariance principle for recursive residuals to a multivariate version and consider a vector-valued CUSUM process instead of the univariate CUSUM detector. For both retrospective testing and monitoring, we propose a vector-valued sequential statistic in the fashion of the score-based cumulative sum statistic of Hansen (1992). The maximum vector entry of the multivariate statistic then yields a detector and a sequential test that has power against a much larger class of structural breaks.

In Section 2, the limiting distribution of the multivariate CUSUM process is derived under both the null hypothesis and local alternatives. Section 3 introduces the forward CUSUM, the backward CUSUM, and the stacked backward CUSUM tests for both retrospective testing and monitoring. While the backward CUSUM is only defined for $t \le T$ and can thus be implemented only for retrospective testing, the stacked backward CUSUM cumulates recursive residuals backwardly in a triangular scheme and is therefore suitable for real-time monitoring. Furthermore, the local powers of the tests are compared. In the retrospective setting, the powers of the backward CUSUM and the stacked backward CUSUM tests are substantially higher than that of the conventional forward CUSUM test if a single break occurs after one third of the sample size. In the case of monitoring, the detection delay of the stacked backward CUSUM under local alternatives is shown to be much lower than that of the monitoring CUSUM detector by Chu et al. (1996). Section 4 considers the estimation of the break date based on backward cumulated recursive residuals. We present an estimator that is more accurate than the conventional maximum likelihood estimator if the break is located at the end of the sample. In Section 5 we discuss testing against partial structural breaks.
Section 6 presents simulated critical values and Monte Carlo simulation results, and Section 7 concludes.

We consider the multiple linear regression model

$$y_t = x_t' \beta_t + u_t, \quad t \in \mathbb{N},$$

where $y_t$ is the dependent variable, and $x_t = (1, x_{t2}, \ldots, x_{tk})'$ is the vector of regressor variables including a constant. The $k \times 1$ coefficient vector $\beta_t$ depends on the time index $t$, and $u_t$ is an error term. Let $\{(y_t, x_t')', 1 \le t \le T\}$ be the set of historical observations, such that the time point $T$ divides the time horizon into the retrospective time period $1 \le t \le T$ and the monitoring period $t > T$. We impose the following assumptions on the regressors and the error term.

Assumption 1.
(a) $\{x_t\}_{t \in \mathbb{N}}$ is stationary and ergodic with $E[x_t x_t'] = C$, where $C$ is positive definite, and $E|x_{tj}|^\kappa < \infty$ for some $\kappa > 2$, for all $j = 2, \ldots, k$.
(b) $\{u_t\}_{t \in \mathbb{N}}$ is a stationary martingale difference sequence with respect to $\mathcal{F}_t$, the $\sigma$-algebra generated by $\{(x_{i+1}', u_i)', i \le t\}$, such that $E[u_t | \mathcal{F}_{t-1}] = 0$, $E[u_t^2 | \mathcal{F}_{t-1}] = \sigma^2 > 0$, and $E|u_t|^\kappa < \infty$ for some $\kappa > 2$.

Recursive residuals for linear regression models were introduced by Brown et al. (1975) as standardized one-step ahead forecast errors. Let $\hat\beta_{t-1} = \big(\sum_{i=1}^{t-1} x_i x_i'\big)^{-1} \big(\sum_{i=1}^{t-1} x_i y_i\big)$ be the OLS estimator at time $t - 1$. The recursive residuals are given by

$$w_t = \frac{y_t - x_t' \hat\beta_{t-1}}{\sqrt{1 + x_t' \big(\sum_{i=1}^{t-1} x_i x_i'\big)^{-1} x_t}}, \quad t \ge k + 1,$$

and $w_t = 0$ for $t = 1, \ldots, k$.

For testing against structural changes in the regression coefficient vector, Brown et al. (1975) introduced the sequential statistic $Q_{t,T} = (\hat\sigma^2 T)^{-1/2} \sum_{j=1}^{t} w_j$ for $t = 1, \ldots, T$, where $\hat\sigma^2$ is a consistent estimator for $\sigma^2$. In the monitoring context, Chu et al. (1996) considered the detector statistic $Q_{t,T} - Q_{T,T}$ for $t > T$. The limiting behavior of the underlying empirical process has been thoroughly analyzed in the literature. Under $H_0: \beta_t = \beta_0$ for all $t \in \mathbb{N}$, Sen (1982) showed that $Q_{\lfloor rT \rfloor, T} = (\hat\sigma^2 T)^{-1/2} \sum_{j=1}^{\lfloor rT \rfloor} w_j$ converges weakly and uniformly to a standard Brownian motion $W(r)$ for $r \in [0, 1]$. Consider local alternatives of the form $H_1: \beta_t = \beta_0 + T^{-1/2} g(t/T)$, where $g(r)$ is piecewise constant and bounded. Let $\mu = \operatorname{plim}_{T \to \infty} (\bar{x}_1, \ldots, \bar{x}_k)'$ be the mean regressor, where $\bar{x}_j$ is the sample mean of the $j$-th component of the regressors, and let

$$h(r) = \frac{1}{\sigma} \int_0^r g(z)\, dz - \frac{1}{\sigma} \int_0^r \int_0^z \frac{1}{z}\, g(v)\, dv\, dz. \quad (1)$$

Under such alternatives, $Q_{\lfloor rT \rfloor, T}$ has been shown to converge weakly and uniformly to $W(r) + \mu' h(r)$ for $r \in [0, 1]$. If $g(r)$ is orthogonal to $\mu$, the limiting distributions under $H_0$ and $H_1$ coincide. Hence, if the break in the coefficient vector does not affect the unconditional mean of $y_t$, then the CUSUM tests of Brown et al. (1975) and Chu et al. (1996) have no power against such an alternative.

To sidestep this difficulty, we consider a multivariate cumulative sum process of recursive residuals, which is defined as

$$Q_T(r) = \frac{1}{\hat\sigma \sqrt{T}}\, C_T^{-1/2} \sum_{t=1}^{\lfloor rT \rfloor} x_t w_t, \quad r \ge 0, \quad (2)$$

where $\hat\sigma^2 = (T - k - 1)^{-1} \sum_{j=1}^{T} (w_j - \bar{w})^2$ is a consistent estimator for $\sigma^2$ (see Krämer et al. 1988), and $C_T = T^{-1} \sum_{t=1}^{T} x_t x_t'$ denotes the sample covariance matrix. Note that $Q_T(r)$ is a vector of piecewise constant processes, where its domain can be divided into the retrospective time period $r \in [0, 1]$ and the monitoring period $r > 1$. On the domain $r \in [0, m]$, $m < \infty$, the multivariate CUSUM process is bounded in probability. Hence, each component of $Q_T(r)$ is in the space $D([0, m])$ of càdlàg functions on $[0, m]$, and $Q_T(r)$ is an element of the $k$-fold product space $D([0, m])^k = D([0, m]) \times \ldots \times D([0, m])$. The space is equipped with the Skorokhod metric (see Billingsley 1999, p.166 and p.244), and the symbol "$\Rightarrow$" denotes weak convergence with respect to this metric. The result presented below summarizes the limiting behavior of $Q_T(r)$ for both the retrospective and the fixed endpoint monitoring time period under both $H_0$ and $H_1$:

Theorem 1.

Let $\{(x_t, u_t)\}_{t \in \mathbb{N}}$ satisfy Assumption 1, let $g(r)$ be piecewise constant and bounded, and let $\beta_t = \beta_0 + T^{-1/2} g(t/T)$ for all $t \in \mathbb{N}$. Then, for any fixed and positive $m < \infty$,

$$Q_T(r) \Rightarrow W(r) + C^{1/2} h(r), \quad r \in [0, m], \quad (3)$$

as $T \to \infty$, where $W(r)$ is a $k$-dimensional standard Brownian motion and $h(r)$ is defined as in (1).

Note that the function $g(r)$ is constant if and only if $\beta_t$ is constant for all $t \in \mathbb{N}$. Under $H_0$, we then obtain $C^{1/2} h(r) = 0$, and thus $Q_T(r) \Rightarrow W(r)$. By contrast, under a local alternative with a non-constant break function $g(r)$, it follows that $h(r)$ is non-zero, and, consequently, $C^{1/2} h(r)$ is non-zero, since $C^{1/2}$ is positive definite. The limiting distributions of $Q_T(r)$ under $H_0$ and $H_1$ coincide only for the trivial case where $g(r)$ is constant. Therefore, tests that are based on $Q_T(r)$ have power against a larger class of alternatives than the tests of Brown et al. (1975) and Chu et al. (1996).

The functional central limit theorem given by equation (3) is not suitable for analyzing the asymptotic behavior of an infinite horizon monitoring statistic, since the variance of $Q_T(r)$ is unbounded as $r \to \infty$, and $\sup_{r \ge 1} \| Q_T(r) - W(r) \|$ might not converge in general. For an i.i.d. random process $\{v_t\}_{t \in \mathbb{N}}$ with $E[v_t] = 0$, $E[v_t^2] = \sigma^2$ and $E|v_t|^\kappa < \infty$, $\kappa > 2$, there exists a standard Brownian motion $W(r)$, such that $\sigma^{-1} \sum_{t=1}^{T} v_t = W(T) + o(T^{1/\kappa})$, a.s., as $T \to \infty$, where the approximation rate is optimal. This almost sure invariance principle is known as the KMT approximation, which was employed by Horváth (1995) to derive the limiting distribution of the infinite horizon statistic $\sup_{t > T} |Q_{t,T} - Q_{T,T}| / d(t/T)$ for an appropriate boundary function $d(r)$. Wu et al. (2007) and Berkes et al. (2014) extended the almost sure invariance principle to more general classes of dependent random processes, which can be used to formulate the following stochastic approximation result:

Theorem 2.

Let $\{(x_t, u_t)\}_{t \in \mathbb{N}}$ satisfy Assumption 1 and let $\beta_t = \beta_0$ for all $t \in \mathbb{N}$. Then, there exists a $k$-dimensional standard Brownian motion $W(r)$, such that, as $T \to \infty$,

$$\sup_{r \ge 1} \frac{\| Q_T(r) - W(r) \|}{\sqrt{r}} = o_P(1),$$

where $\| \cdot \|$ denotes the maximum norm, which is the largest vector entry in absolute value.

This result is the key tool to derive the limiting distribution of infinite horizon monitoring statistics that are based on the multivariate CUSUM process, which is done in the next section. It also indicates that $Q_T(r)$ should be scaled by a factor of at least order $\sqrt{r}$ to approximate the process by a Brownian motion.

3 CUSUM detectors

In this section, we consider sequential tests for both retrospective testing and monitoring that are based on the multivariate CUSUM process $Q_T(r)$. The null hypothesis of no structural change in the regression coefficient vector is formulated as $H_0: \beta_t = \beta_0$ for all $t \in I$, where the testing period is given by

$$I = \begin{cases} \{t \in \mathbb{N} : 1 \le t \le T\} & \text{in the retrospective context,} \\ \{t \in \mathbb{N} : T + 1 \le t \le mT\} & \text{in the fixed endpoint monitoring context,} \\ \{t \in \mathbb{N} : T + 1 \le t < \infty\} & \text{in the infinite horizon monitoring context.} \end{cases}$$

In the monitoring context, the non-contamination assumption $\beta_t = \beta_0$ for the historical time period $t = 1, \ldots, T$ is imposed. The monitoring time span could have either a fixed endpoint $M = \lfloor mT \rfloor$ with $m > 1$ or an infinite horizon with $m = \infty$.

The sequential tests consist of a detector statistic and a critical boundary function, in which the detector is evaluated for each time point within the testing period, and, if its path crosses the boundary function at least once, the null hypothesis is rejected. We make the following assumption on the boundary function:

Assumption 2.

The boundary function is of the form $b(r) = \lambda_\alpha \cdot d(r)$, where $\lambda_\alpha$ denotes the critical value, which depends on the significance level $\alpha$, and $d(r)$ is a continuous and strictly increasing function with $d(0) > 0$ and $\sup_{r \ge 0} \sqrt{r + 1} / d(r) < \infty$.

While the forward CUSUM detectors for retrospective testing and monitoring are discussed in Section 3.1, we introduce the backward CUSUM detector in Section 3.2 and the stacked backward CUSUM detectors in Section 3.3. In Section 5 we present modified detectors for testing and monitoring partial structural change.

3.1 Forward CUSUM

As an extension of the univariate CUSUM detector by Brown et al. (1975) we consider the multivariate retrospective CUSUM detector

$$Q_{t,T} = Q_T\big(\tfrac{t}{T}\big) = \frac{1}{\hat\sigma \sqrt{T}}\, C_T^{-1/2} \sum_{j=1}^{t} x_j w_j, \quad 1 \le t \le T.$$

The vector-valued detector is inspired by the score-based cumulative sum statistic of Hansen (1992). While Hansen (1992) considered OLS residuals and proposed averaging all entries of the vector-valued cumulative sum, we consider recursive residuals and formulate the multivariate detectors with respect to the maximum norm $\| \cdot \|$. The null hypothesis is rejected if the path of $\| Q_{t,T} \|$ exceeds the critical boundary function $b_t = \lambda_\alpha \cdot d(t/T)$ at least once within the retrospective testing period. The critical value $\lambda_\alpha$ determines the significance level $\alpha$ such that

$$\lim_{T \to \infty} P\Big( \| Q_{t,T} \| \ge \lambda_\alpha \cdot d\big(\tfrac{t}{T}\big) \text{ for at least one } t = 1, \ldots, T \,\Big|\, H_0 \Big) = \alpha.$$

Let $M^{ret}_Q = \max_{1 \le t \le T} \| Q_{t,T} \| / d(t/T)$ be the maximum statistic representation of the CUSUM detector. The above condition can be equivalently expressed as

$$\lim_{T \to \infty} P\big( M^{ret}_Q \ge \lambda_\alpha \,\big|\, H_0 \big) = \alpha.$$

Hence, $\lambda_\alpha$ is the $(1 - \alpha)$ quantile of the limiting null distribution of $M^{ret}_Q$. Note that $M^{ret}_Q$ together with the critical value $\lambda_\alpha$ defines a one-shot test that is equivalent to the sequential CUSUM test.

For real-time monitoring, we follow Chu et al. (1996) and define the multivariate monitoring CUSUM detector as

$$Q^{mon}_{t,T} = Q_T\big(\tfrac{t}{T}\big) - Q_T(1) = \frac{1}{\hat\sigma \sqrt{T}}\, C_T^{-1/2} \sum_{j=T+1}^{t} x_j w_j, \quad t > T,$$

and $H_0$ is rejected if its maximum norm $\| Q^{mon}_{t,T} \|$ exceeds the boundary $b_t = \lambda_\alpha \cdot d\big((t - T)/T\big)$ at least once for some $t > T$. For a fixed endpoint $M = \lfloor mT \rfloor$, where $1 < m < \infty$, let $M^{mon}_{Q,m} = \max_{T < t \le M} \| Q^{mon}_{t,T} \| / d\big((t - T)/T\big)$, and, for an infinite horizon, let $M^{mon}_{Q,\infty} = \sup_{t > T} \| Q^{mon}_{t,T} \| / d\big((t - T)/T\big)$.

Theorem 3. Let $\beta_t = \beta_0$ for all $t \in \mathbb{N}$ and let Assumptions 1 and 2 hold true. Then,

(a) $M^{ret}_Q \xrightarrow{d} \sup_{r \in (0,1)} \dfrac{\| W(r) \|}{d(r)}$,

(b) $M^{mon}_{Q,m} \xrightarrow{d} \sup_{r \in (0, m-1)} \dfrac{\| W(r) \|}{d(r)} \overset{d}{=} \sup_{r \in (0, \frac{m-1}{m})} \dfrac{\| B(r) \|}{(1 - r)\, d\big(\frac{r}{1-r}\big)}$, $\quad 1 < m < \infty$,

(c) $M^{mon}_{Q,\infty} \xrightarrow{d} \sup_{r \in (0, \infty)} \dfrac{\| W(r) \|}{d(r)} \overset{d}{=} \sup_{r \in (0,1)} \dfrac{\| B(r) \|}{(1 - r)\, d\big(\frac{r}{1-r}\big)}$,

as $T \to \infty$, where $W(r)$ is a $k$-dimensional standard Brownian motion and $B(r)$ is a $k$-dimensional standard Brownian bridge.

While, for one-shot tests, the critical value determines the type I error, sequential testing involves two degrees of freedom. Besides the test size, which is controlled asymptotically by an appropriately chosen value for $\lambda_\alpha$, the shape of the boundary determines the distribution of the first boundary crossing under the null hypothesis, which is also referred to as the "distribution of the size" (see Anatolyev and Kosenok 2018). Brown et al. (1975) suggested the linear boundary function

$$b(r) = \lambda_\alpha (1 + 2r), \quad (4)$$

which is our main benchmark. In this case, the retrospective maximum statistic satisfies

$$\max_{1 \le t \le T} \frac{\| Q_{t,T} \|}{1 + 2\big(\frac{t}{T}\big)} \xrightarrow{d} \sup_{r \in (0,1)} \frac{\| W(r) \|}{1 + 2r}$$

under $H_0$, as $T \to \infty$, whereas, for the infinite horizon monitoring maximum statistic, we obtain

$$\sup_{t > T} \frac{\| Q^{mon}_{t,T} \|}{1 + 2\big(\frac{t - T}{T}\big)} \xrightarrow{d} \sup_{r \in (0,1)} \frac{\| B(r) \|}{1 + r}.$$

The linear boundary is widely applied in practice, but, as already noted by Brown et al. (1975), the crossing probabilities cannot be constant for all potential relative crossing time points $r$. The authors argued that it is more natural to consider a boundary that is proportional to the standard deviation of the limiting process. Such a boundary is given by the radical function $b(r) = \lambda_\alpha \sqrt{r}$.
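The retrospective forward detector and the two boundary shapes discussed so far can be sketched in a few lines. This is a hedged illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the variance estimate is read as the sample variance of the $T - k$ non-trivial recursive residuals (divisor $T - k - 1$), and the simulated regression is arbitrary. The sketch only computes the maximum statistics; the critical value $\lambda_\alpha$ has to come from simulated tables and is not reproduced here.

```python
import numpy as np

def recursive_residuals(y, X):
    """Standardized one-step ahead forecast errors; w_t = 0 for t <= k."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    T, k = X.shape
    w = np.zeros(T)
    for i in range(k, T):                 # 0-based index i is period t = i + 1
        X_prev, y_prev = X[:i], y[:i]
        XtX = X_prev.T @ X_prev
        beta_prev = np.linalg.solve(XtX, X_prev.T @ y_prev)
        leverage = X[i] @ np.linalg.solve(XtX, X[i])
        w[i] = (y[i] - X[i] @ beta_prev) / np.sqrt(1.0 + leverage)
    return w

def inverse_sqrt(C):
    """Symmetric inverse square root of a positive definite matrix."""
    eigval, eigvec = np.linalg.eigh(C)
    return eigvec @ np.diag(eigval ** -0.5) @ eigvec.T

def forward_cusum_norm_path(y, X):
    """Maximum norm of the multivariate forward detector for t = 1, ..., T."""
    X = np.asarray(X, dtype=float)
    T, k = X.shape
    w = recursive_residuals(y, X)
    sigma_hat = np.std(w[k:], ddof=1)     # assumption: divisor T - k - 1
    C_T = X.T @ X / T
    partial_sums = np.cumsum(X * w[:, None], axis=0)
    Q = partial_sums @ inverse_sqrt(C_T) / (sigma_hat * np.sqrt(T))
    return np.max(np.abs(Q), axis=1)

def linear_shape(r):
    return 1.0 + 2.0 * r                  # d(r) of the linear boundary (4)

def radical_shape(r):
    return np.sqrt(r)                     # d(r) of the radical boundary

def max_statistic(norm_path, shape):
    """max over t of ||Q_{t,T}|| / d(t/T)."""
    T = norm_path.size
    r = np.arange(1, T + 1) / T
    return float(np.max(norm_path / shape(r)))

# Arbitrary stable regression with an intercept and one regressor.
rng = np.random.default_rng(1)
T = 200
X = np.column_stack([np.ones(T), rng.standard_normal(T)])
y = X @ np.array([1.0, 0.5]) + rng.standard_normal(T)

norm_path = forward_cusum_norm_path(y, X)
m_linear = max_statistic(norm_path, linear_shape)
m_radical = max_statistic(norm_path, radical_shape)
print(m_linear, m_radical)
```

The null hypothesis would be rejected when the chosen maximum statistic exceeds the critical value that belongs to the same boundary shape. Because $\sqrt{r}$ is small near $r = 0$, the radical statistic is dominated by the first few observations.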
As noted by Zeileis (2004), if there is a single break in the middle or at the end of the retrospective sample, there is no power gain using the radical boundary when compared to the linear boundary. Only in cases where a break occurs at the beginning of the sample may some increased power be observed. Another problem associated with the radical boundary is that it is not bounded away from zero. In order to obtain critical values and avoid size distortions, some trimming at the beginning of the sample in the fashion of the sup-Wald test by Andrews (1993) is necessary. For infinite horizon monitoring, Chu et al. (1996) also considered a boundary function of radical type, which is given by

$$b(r) = \sqrt{(r + 1) \ln\Big(\frac{r + 1}{\alpha^2}\Big)}. \quad (5)$$

The boundary is based on a result on boundary crossing probabilities for the path of Brownian motions. Robbins and Siegmund (1970) showed that

$$P\Big( |W(r)| \ge \sqrt{(r + 1) \ln\Big(\frac{r + 1}{\alpha^2}\Big)} \text{ for some } r \ge 0 \Big) = \alpha,$$

and the univariate monitoring CUSUM detector together with the radical boundary by Chu et al. (1996) thus yields a sequential test that has size $\alpha$, as $m \to \infty$. Anatolyev and Kosenok (2018) derived a theoretical boundary that yields a uniformly distributed size. However, their boundary has no closed form solution and is only valid for the univariate retrospective and fixed endpoint monitoring cases. Furthermore, simulations, which are omitted here, indicate that, on the one hand, their approximate boundary does indeed yield a uniform size distribution, but, on the other hand, their CUSUM test performs uniformly worse in terms of power compared to the test when using the linear boundary of Brown et al. (1975). Note that in the context of infinite horizon monitoring the size cannot be uniformly distributed.

An alternative approach is to cumulate the recursive residuals in reversed order. Suppose there is a single break in $\beta_t$ at time $t = T^*$.
Then, $\{w_t, t < T^*\}$ are the residuals from the pre-break period, and $\{w_t, t \ge T^*\}$ are those from the post-break period. The pre-break residuals do not contain any information about the break and have mean zero. The partial sum process $T^{-1/2} \sum_{j=1}^{t} w_j$ has a random walk behavior for the pre-break period $t < T^*$, and cumulating those residuals brings nothing but noise to the detector statistic. In contrast, the post-break residuals have nonzero mean and reveal relevant information about a possible break. In order to focus on the post-break residuals, we consider backwardly cumulated partial sums of the form $T^{-1/2} \sum_{j=0}^{t-1} w_{T-j}$. We define the retrospective backward CUSUM detector as

$$BQ_{t,T} = Q_T(1) - Q_T\big(\tfrac{t-1}{T}\big) = \frac{1}{\hat\sigma \sqrt{T}}\, C_T^{-1/2} \sum_{j=t}^{T} x_j w_j,$$

where $1 \le t \le T$. The null hypothesis is rejected if the path of $\| BQ_{t,T} \|$ exceeds the boundary $b_t = \lambda_\alpha \cdot d\big((T - t + 1)/T\big)$ for at least one time index $t$.

Theorem 4.

Let $\beta_t = \beta_0$ for all $t \in \mathbb{N}$ and let Assumptions 1 and 2 hold true. Then,

$$M^{ret}_{BQ} = \max_{1 \le t \le T} \frac{\| BQ_{t,T} \|}{d\big(\frac{T - t + 1}{T}\big)} \xrightarrow{d} \sup_{r \in (0,1)} \frac{\| W(r) \|}{d(r)}$$

as $T \to \infty$, where $W(r)$ is a $k$-dimensional standard Brownian motion.

Using the same boundary as for the retrospective forward CUSUM, the limiting null distributions of their maximum statistics coincide. Simulated critical values when using the linear boundary are presented in Table 1. A simple illustrative example of the detector paths together with the linear boundary of Brown et al. (1975) is depicted in Figure 2, in which a process with $k = 1$ and a single break in the mean at 3/4 of the sample is considered. The backward CUSUM detector is not measurable with respect to the information available at time $t$ and is therefore not suitable for a monitoring procedure. The path of $\| BQ_{t,T} \|$ is only defined for $t \le T$, as its endpoint $T$ is fixed.

Figure 2: Illustrative example for the backward CUSUM with a break in the mean. [Two panels, "Forward CUSUM" and "Backward CUSUM", plotted against time from 0 to 100. Legend: detector statistic, linear boundary (5%), recursive residuals.]

Note: The process $y_t = \mu_t + u_t$, $t = 1, \ldots, T$, is simulated for $T = 100$ with $\mu_t = 0$ for $t < 75$, $\mu_t = 1$ for $t \ge 75$, and i.i.d. standard normal innovations $u_t$. The bold solid line paths are the trajectories of $\| Q_{t,T} \|$ and $\| BQ_{t,T} \|$, where the detectors are univariate such that the norm is just the absolute value. In the background, the recursive residuals are plotted. The dashed lines correspond to the linear boundary (4) with significance level $\alpha = 5\%$ and critical value $\lambda_\alpha$ = 0 .

To combine the advantages of the backward CUSUM with the measurability properties of the forward CUSUM for monitoring, we resort to an inspection scheme, which goes back to Page (1954) and involves a triangular array of residuals together with an additional maximum. Let

$$M^{ret}_{BQ}(t) = \max_{1 \le s \le t} \frac{\big\| Q_T\big(\tfrac{t}{T}\big) - Q_T\big(\tfrac{s-1}{T}\big) \big\|}{d\big(\frac{t - s + 1}{T}\big)}$$

be the backward CUSUM statistic with endpoint $t$. The idea is to compute this statistic sequentially for each time point $t = 1, \ldots, T$, yielding $M^{ret}_{BQ}(1), M^{ret}_{BQ}(2), \ldots, M^{ret}_{BQ}(T)$. The stacked backward CUSUM statistic is the maximum among this sequence of backward CUSUM statistics. An important feature of this sequence is that it is measurable with respect to the filtration of information at time $t$, and $M^{ret}_{BQ}(t)$ can thus be adapted for real-time monitoring. The stacked backward CUSUM detector is defined as

$$SBQ_{s,t,T} = Q_T\big(\tfrac{t}{T}\big) - Q_T\big(\tfrac{s-1}{T}\big) = \frac{1}{\hat\sigma \sqrt{T}}\, C_T^{-1/2} \sum_{j=s}^{t} x_j w_j, \quad 1 \le s \le t < \infty.$$

Since the upper and the lower summation index of

$SBQ_{s,t,T}$ are both flexible with $s \le t$, this induces a triangular scheme. $H_0$ is rejected if $\| SBQ_{s,t,T} \|$ exceeds the two-dimensional boundary $b_{s,t} = \lambda_\alpha \cdot d\big((t - s + 1)/T\big)$ for some $s$ and $t$ with $1 \le s \le t \le T$, or, equivalently, if the double maximum statistic

$$M^{ret}_{SBQ} = \max_{1 \le t \le T} M^{ret}_{BQ}(t) = \max_{1 \le t \le T} \max_{1 \le s \le t} \frac{\| SBQ_{s,t,T} \|}{d\big(\frac{t - s + 1}{T}\big)}$$

exceeds $\lambda_\alpha$.

The backward CUSUM maximum statistic $M^{ret}_{BQ}(t)$ is itself a sequential statistic. Stacking all those maximum statistics on one another leads to an additional maximum and a double supremum in the limiting distribution. The stacked backward CUSUM uses the recursive residuals in a multiple way such that the set over which the maximum is taken has many more elements than the forward CUSUM and the backward CUSUM. For $t = 1$ only $w_1$ is cumulated, for $t = 2$ the residuals $w_1$ and $w_2$ are cumulated, for $t = 3$ we consider $w_1$, $w_2$, and $w_3$, and so forth.

The triangular detector can also be monitored on-line across all the time points $t > T$. The null hypothesis is rejected if $\| SBQ_{s,t,T} \|$ exceeds $b_{s,t} = \lambda_\alpha \cdot d\big((t - s + 1)/T\big)$ at least once for some $s$ and $t$ with $T < s \le t$. Analogously to the retrospective case, let

$$M^{mon}_{BQ}(t) = \max_{T < s \le t} \frac{\| SBQ_{s,t,T} \|}{d\big(\frac{t - s + 1}{T}\big)}$$

be the backward CUSUM statistic with endpoint $t > T$, and let $M^{mon}_{SBQ,m} = \max_{T < t \le \lfloor mT \rfloor} M^{mon}_{BQ}(t)$ for a fixed endpoint with $1 < m < \infty$, and $M^{mon}_{SBQ,\infty} = \sup_{t > T} M^{mon}_{BQ}(t)$ for an infinite horizon.

Theorem 5. Let $\beta_t = \beta_0$ for all $t \in \mathbb{N}$ and let Assumptions 1 and 2 hold true. Then,

(a) $M^{ret}_{SBQ} \xrightarrow{d} \sup_{r \in (0,1)} \sup_{s \in (0,r)} \dfrac{\| W(r) - W(s) \|}{d(r - s)}$,

(b) $M^{mon}_{SBQ,m} \xrightarrow{d} \sup_{r \in (0, m-1)} \sup_{s \in (0,r)} \dfrac{\| W(r) - W(s) \|}{d(r - s)} \overset{d}{=} \sup_{r \in (0, \frac{m-1}{m})} \sup_{s \in (0,r)} \dfrac{\| (1 - s) B(r) - (1 - r) B(s) \|}{(1 - r)(1 - s)\, d\big(\frac{r - s}{(1 - r)(1 - s)}\big)}$, $\quad 1 < m < \infty$,

(c) $M^{mon}_{SBQ,\infty} \xrightarrow{d} \sup_{r \in (0, \infty)} \sup_{s \in (0,r)} \dfrac{\| W(r) - W(s) \|}{d(r - s)} \overset{d}{=} \sup_{r \in (0,1)} \sup_{s \in (0,r)} \dfrac{\| (1 - s) B(r) - (1 - r) B(s) \|}{(1 - r)(1 - s)\, d\big(\frac{r - s}{(1 - r)(1 - s)}\big)}$,

as $T \to \infty$, where $W(r)$ is a $k$-dimensional standard Brownian motion and $B(r)$ is a $k$-dimensional standard Brownian bridge.

Analogously to the forward CUSUM, for the linear boundary of Brown et al. (1975), it follows that

$$\sup_{t > T} \max_{T < s \le t} \frac{\| SBQ_{s,t,T} \|}{1 + 2\big(\frac{t - s + 1}{T}\big)} \xrightarrow{d} \sup_{r \in (0,1)} \sup_{s \in (0,r)} \frac{\| (1 - s) B(r) - (1 - r) B(s) \|}{(1 - r)(1 - s) + 2(r - s)}, \quad (6)$$

under $H_0$, as $T \to \infty$. Simulated critical values are presented in Tables 1 and 2.

[Figure 3: simulated local power curves. Panels: $\tau^* = 0.1$, $0.3$, $0.5$, $0.7$, $0.9$ (x-axis: $c/\sigma$) and $c/\sigma = 10$ (x-axis: $\tau^*$); y-axis: rejection frequency. Legend: forward CUSUM, backward CUSUM, stacked backward CUSUM.]

Note: The plots show simulated local power curves. While, for the plots at the top and the first two plots at the bottom, the break location is fixed with $\tau^* \in \{0.1, 0.3, 0.5, 0.7, 0.9\}$ and local break sizes $c/\sigma$ are shown on the x-axis, for the last plot, the local break size is fixed with $c/\sigma = 10$, and the breakpoint locations $\tau^*$ are given on the x-axis. The linear boundary (4) is implemented for a significance level of $\alpha = 5\%$.

In order to illustrate the advantages of the backward CUSUM and the stacked backward CUSUM tests, we consider the simple model $y_t = \beta_t + u_t$ with a local break in the mean. Let the mean be given by

$$\beta_t = \beta_0 + T^{-1/2} g(t/T), \quad (7)$$

where $g(r)$ is a piecewise constant and bounded function. Note that in this case the multivariate CUSUM process coincides with the univariate CUSUM process $Q_{\lfloor rT \rfloor, T}$. Furthermore, note that the covariance matrix $C$ is equal to unity, and the maximum norm for $k = 1$ is simply the absolute value. Theorem 1 yields $Q_{\lfloor rT \rfloor, T} \Rightarrow W(r) + h(r)$, $r \in [0, m]$, where

$$h(r) = \frac{1}{\sigma} \int_0^r g(z)\, dz - \frac{1}{\sigma} \int_0^r \int_0^z \frac{1}{z}\, g(v)\, dv\, dz,$$
c/ s = 20 r* r e l a t i v e m ean de l a y stacked backward CUSUM forward CUSUM (linear boundary) forward CUSUM (radical boundary) Note: The plots show simulated local mean delay curves with relative mean delays given on the y-axis. While,for the ﬁrst two plots, the break locations are ﬁxed with τ ∗ ∈ { . , } and local break sizes c/σ are given on thex-axis, for the last plot, the local break size is ﬁxed with c/σ = 20, and the breakpoint locations τ ∗ are givenon the x-axis. The stacked backward CUSUM is implemented with the linear boundary (4). For the forwardCUSUM both the linear boundary (4) and the radical boundary (5) are considered. The size level is α = 5%. and together with the continuous mapping theorem, it follows that M ret Q d −→ sup r ∈ (0 , | W ( r ) + h ( r ) | d ( r ) , M ret BQ d −→ sup r ∈ (0 , | W ( r ) + h (1) − h (1 − r ) | d ( r ) , M ret SBQ d −→ sup r ∈ (0 , sup s ∈ (0 ,r ) | W ( r ) − W ( s ) + h ( r ) − h ( s ) | d ( r − s ) , as T → ∞ . While, under H , the limiting distributions for the retrospective forwardCUSUM and the retrospective backward CUSUM coincide, they diﬀer from each otherunder the alternative. The maximum statistics in the ﬁxed endpoint monitoring casesatisfy M mon Q,m d −→ sup r ∈ (0 ,m − | W ( r ) + h ( r + 1) − h (1) | d ( r ) , M mon SBQ,m d −→ sup r ∈ (0 ,m − sup s ∈ (0 ,r ) | W ( r ) − W ( s ) + h ( r + 1) − h ( s + 1) | d ( r − s ) , as T → ∞ .Generally, none of the tests can be shown to be uniformly more powerful in comparisonto the other tests. However, we can compare the tests under particular alternatives. Weconsider a single break in the mean, where the break function is given by g ( r ) = c · { r ≥ τ ∗ } τ ∗ denotes the break location. Then, h ( r ) = cσ (cid:90) rτ ∗ d z − cσ (cid:90) r (cid:90) zτ ∗ z d v d z = cτ ∗ σ (cid:90) rτ ∗ z d z = cτ ∗ (ln( r ) − ln( τ ∗ ))1 { r ≥ τ ∗ } σ . 
(8)

Simulated asymptotic local power curves under the limiting distribution at a 5% significance level are presented in Figure 3 for the retrospective case. The Brownian motions are approximated on a grid of 1,000 equidistant points, and the linear boundary $d(r) = 1 + 2r$ is implemented. The rejection rates are obtained from 100,000 Monte Carlo repetitions for different break locations. The plots show that for a single break that is located after 15% of the sample size, the backward CUSUM and the stacked backward CUSUM clearly outperform the forward CUSUM in terms of power. The backward CUSUM performs best for $\tau^* > 0.3$, while the stacked backward CUSUM outperforms the other two tests if the break is located at around 1/[…] […] $m = 2$, the local power curves of the forward CUSUM test and the stacked backward CUSUM test have exactly the same shape as their counterparts in the retrospective case. The monitoring local power curve for a break at $\tau^* \in (1,2)$ then coincides with the corresponding retrospective curve in Figure 3 with a single break at $\tau^* - 1$. Hence, the power of the stacked backward CUSUM is always higher than that of the forward CUSUM if $\tau^* \geq 1.15$ in the monitoring case.

The much more important performance measure for monitoring detectors is the delay between the actual break and the detection time point, since every fixed nontrivial alternative will be detected if the monitoring horizon is long enough. Let $T_d$ be the stopping time of the first boundary crossing, and let the mean local relative delay be given by $E\big[T_d/T \mid \tau^* \leq T_d/T \leq m\big] - \tau^*$. Figure 4 presents the simulated mean local relative delay curves for the fixed endpoint $m = 4$ for $M^{mon}_{SBQ,4}$ with the linear boundary, for $M^{mon}_{Q,4}$ with the linear boundary, and for $M^{mon}_{Q,4}$ with the radical boundary by Chu et al. (1996). The mean local relative delay of the stacked backward CUSUM is much lower than that of the forward CUSUM. Furthermore, the mean local relative delay is constant across different break locations, with the exception of breaks that are located at $\tau^* <$ […]. […] $H_0$.

The upper three pictures in Figure 5 present histograms of the asymptotic size distributions for retrospective tests using the linear boundary. For the forward CUSUM, the highest rejection rates under $H_0$ are obtained at relative locations between 0.15 and 0.[…] […] 0.85. The distribution for the forward CUSUM is right-skewed, whereas, for the backward CUSUM, it is left-skewed. For the stacked backward CUSUM, the distribution is much closer to a uniform distribution, although it is slightly left-skewed. Note that the size distributions provide information about the location of false rejections, but, when comparing Figure 3 with Figure 5, it is reasonable to assume that this is also related to the distribution of the power across different time points. There is no consensus on which distribution should be preferred, since the weight one wishes to put on particular regions of time points of rejection depends on the application. However, Zeileis et al. (2005) and Anatolyev and Kosenok (2018) argue that if no further information is available, one might prefer a uniform distribution to a skewed one. The lower three pictures in Figure 5 present the distributions of the size for the fixed monitoring horizon with $m = 10$. The distribution for the stacked backward CUSUM is much closer to a uniform distribution compared to those of the forward CUSUM variants.

Figure 5: Size distributions of the retrospective and monitoring detectors. Panels (density against time point of rejection): Forward CUSUM, Backward CUSUM, Stacked backward CUSUM (upper row); Stacked backward CUSUM, Forward CUSUM, Forward CUSUM (radical boundary) (lower row).
Note: The frequencies of the location of the first boundary exceedance under the null hypothesis are shown for a significance level of 5% for the model with $k = 1$. The frequencies are based on random draws under the limiting null distribution of the maximum statistics. The retrospective case is considered for the upper three histograms and the fixed endpoint monitoring case with $m = 10$ for the lower three. The linear boundary (4) is considered in the first five plots and the radical boundary by Chu et al. (1996) is used in the last plot.

As soon as the testing procedure has indicated a structural instability in the coefficient vector, the next step is to locate the break point. In the single break model with coefficient vector
$$\beta_t = \beta_0 + \delta\, 1_{\{t \geq T^*\}}, \quad \delta \neq 0, \tag{9}$$
Horváth (1995) suggested estimating the relative break date $\tau^* = T^*/T$ by the relative time index for which the likelihood ratio statistic is maximized. As an asymptotically equivalent estimator, Bai (1997) proposed the maximum likelihood estimator
$$\hat{\tau}^{ret}_{ML} = \frac{1}{T} \cdot \operatorname*{argmin}_{1 \leq t \leq T} \big(S_1(t) + S_2(t)\big), \tag{10}$$
where $S_1(t)$ is the OLS residual sum of squares using observations until time point $t$ and $S_2(t)$ is the OLS residual sum of squares using observations from time $t+1$ onwards. In case of monitoring, Chu et al. (1996) considered $\hat{\tau}^{mon}_{ML} = \frac{1}{T} \cdot \operatorname{argmin}_{T\,[\ldots]}$
5, the solid line shows the trajectory of the asymptotic mean of the scaled detector $h^*(r)/\sqrt{1-r}$ and the dashed line shows the trajectory of $h^*(r)$ given by equations (13) and (12).

To bypass this problem, we use backwardly cumulated recursive residuals to estimate the relative break location. As illustrated in Figure 2, the backward CUSUM detector is approximately constant in the pre-break period and decreases to zero in the post-break period, and the maximum is attained near the break location $t = T^*$ when dividing $\|BQ_{t,T}\|$ by its standard deviation $\sqrt{(T-t+1)/T}$. Accordingly, we consider the estimators
$$\hat{\tau}_{ret} = \frac{1}{T} \cdot \operatorname*{argmax}_{1 \leq t \leq T} \big\|BS_{t,T}\big\|, \qquad \hat{\tau}_{mon} = \frac{1}{T} \cdot \operatorname{argmax}_{T\,[\ldots]}$$
Let $\{(x_t, u_t)\}_{t \in \mathbb{N}}$ satisfy Assumption 1 and let $\beta_t$ be given by equation (9). Then, as $T \to \infty$,

(a) $\hat{\tau}_{ret} \xrightarrow{p} \tau^*$, if $\tau^* \in (0,1)$,
(b) $\hat{\tau}_{mon} \xrightarrow{p} \tau^*$, if $\tau^* \in (1, T_d/T]$.

It is not always a good idea to use all entries of the multivariate CUSUM process, especially if $k$ is large and if the focus is to test for breaks in only some regression coefficients. Following the discussion of Section 2, the univariate CUSUM tests of Brown et al. (1975) and Chu et al. (1996) are partial structural break tests in the sense that they only have power against a break in the intercept. However, since the critical values for the multivariate CUSUM test increase with the number of regressors $k$, the univariate CUSUM test has a higher power against a break in the intercept than the multivariate counterpart if $k \geq$ […] $l < k$ linear combinations of the regression coefficients, which can be expressed by some orthonormal $k \times l$ matrix $H$, such that the partial stability hypothesis $\tilde{H}_0: H'\beta_t = H'\beta_0$ is tested against $\tilde{H}_1: H'\beta_t \neq H'\beta_0$ for some $t$. The corresponding partial multivariate CUSUM statistic is given by $\tilde{\mathbf{Q}}_{t,T} = H'\mathbf{Q}_{t,T}$. In case of a test for a break in only the intercept, $\tilde{\mathbf{Q}}_{t,T}$ coincides with the univariate CUSUM detector $Q_{t,T}$, where $H = (1, 0, \ldots, 0)'$. Analogously, we define
$$\widetilde{\mathbf{BQ}}_{t,T} = \tilde{\mathbf{Q}}_{T,T} - \tilde{\mathbf{Q}}_{t-1,T}, \qquad \widetilde{\mathbf{SBQ}}_{s,t,T} = \tilde{\mathbf{Q}}_{t,T} - \tilde{\mathbf{Q}}_{s-1,T}.$$
Under $\tilde{H}_0$, Theorem 1 yields $\tilde{\mathbf{Q}}_{\lfloor rT \rfloor,T} \Rightarrow H'W(r)$, where $H'W(r)$ is an $l$-dimensional standard Brownian motion, since the columns of $H$ are orthonormal. Hence, the limiting distributions of the maximum statistics that are based on the modified detectors coincide with

Table 1: Asymptotic critical values for the retrospective tests

       M^ret_Q and M^ret_BQ                   M^ret_SBQ
ν      20%    10%    5%     2.5%   1%         20%    10%    5%     2.5%   1%
1      0.734  0.847  0.945  1.034  1.143      1.018  1.113  1.198  1.278  1.374
2      0.839  0.941  1.032  1.115  1.219      1.107  1.196  1.277  1.352  1.442
3      0.895  0.993  1.081  1.163  1.260      1.156  1.244  1.321  1.392  1.481
4      0.933  1.029  1.114  1.192  1.287      1.190  1.275  1.350  1.419  1.506
5      0.962  1.056  1.139  1.216  1.307      1.216  1.299  1.372  1.441  1.526
6      0.985  1.077  1.160  1.235  1.323      1.237  1.317  1.388  1.457  1.541
7      1.005  1.095  1.176  1.249  1.338      1.253  1.333  1.404  1.471  1.556
8      1.021  1.110  1.189  1.261  1.349      1.268  1.347  1.418  1.483  1.566

Note: Critical values λ α are reported for the linear boundary in (4). The ν -dimensional Gaussianprocesses in the limiting distributions are simulated on a grid of 10,000 equidistant points with100,000 Monte Carlo repetitions. In case of a global structural break test we have ν = k , and in caseof a partial structural break test we have ν = l . those presented in Theorems 3–5, except that the Brownian motions are l -dimensional in-stead of k -dimensional. Critical values are presented in Tables 1 and 2 in the subsequent sec-tion. Under the conditions of Theorem 1, it follows that (cid:101) Q (cid:98) rT (cid:99) ,T ⇒ H (cid:48) W ( r ) + H (cid:48) C / h ( r ),where H (cid:48) C / h ( r ) (cid:54) = if H (cid:48) g ( r ) is not constant, Hence, the modiﬁed tests have poweragainst all nontrivial alternatives of the form H (cid:48) β t = H (cid:48) β + T − / H (cid:48) g ( t/T ). Tables 1 and 2 present critical values for the retrospective and monitoring detectors usingthe linear boundary (4). Empirical sizes for the retrospective case are shown in Table 3.The tests have only minor size distortions in ﬁnite samples. The empirical powers of theretrospective tests are compared with that of the sup-Wald test of Andrews (1993). Thesup-Wald statistic is given by max r ∈ [ r , − r ] T · S − S ( r ) − S ( r ) r (1 − r ) , where S is the OLS residual sum of squares using observations { , . . . , T } , S ( r ) is theOLS residual sum of squares using observations { , . . . , (cid:98) rT (cid:99)} , and S ( r ) is the OLS residualsum of squares using observations {(cid:98) rT (cid:99) + 1 , . . . , T } . 
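To make the three residual sums of squares concrete, the following Python sketch computes them for an intercept-only regression and evaluates the quantity $T \cdot (S_0 - S_1(r) - S_2(r))$ over a trimmed range of split points. It is only an illustration under stated assumptions: the helper names `rss` and `sup_wald_numerator`, the toy series, and the trimming convention $\lfloor r_0 T \rfloor \leq t \leq T - \lfloor r_0 T \rfloor$ are hypothetical choices made here, and only this numerator term is computed, not the full sup-Wald statistic.

```python
def rss(values):
    """Residual sum of squares of an intercept-only OLS fit (deviations from the mean)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def sup_wald_numerator(y, r0=0.15):
    """Return (t, T * (S0 - S1 - S2)) at the split point t that maximises the term.

    S0 uses observations 1..T, S1 uses 1..t, and S2 uses t+1..T.
    The trimming rule int(r0*T) <= t <= T - int(r0*T) is an assumption of this sketch.
    """
    T = len(y)
    s0 = rss(y)
    lo = max(1, int(r0 * T))
    hi = T - lo
    best_t, best_val = None, float("-inf")
    for t in range(lo, hi + 1):
        val = T * (s0 - rss(y[:t]) - rss(y[t:]))
        if val > best_val:
            best_t, best_val = t, val
    return best_t, best_val


# Noise-free toy series (arbitrary values): the mean shifts after observation 60.
y = [2.0] * 60 + [2.8] * 40
t_hat, value = sup_wald_numerator(y)
print(t_hat, round(value, 6))
```

In this noise-free example the maximising split coincides with the true break point, because both sub-sample residual sums of squares vanish there. Every evaluation uses observations up to $T$, so the quantity cannot be computed from the data available at an intermediate time point alone.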
The parameter $r_0$ defines the lower

Table 2: Asymptotic critical values for $M^{mon}_{SBQ,m}$

       ν = 1                  ν = 2                  ν = 3                  ν = 4
m      10%    5%     1%       10%    5%     1%       10%    5%     1%       10%    5%     1%
1.2    0.782  0.859  1.024    0.859  0.935  1.092    0.902  0.975  1.129    0.932  1.003  1.152
1.4    0.941  1.030  1.208    1.028  1.111  1.277    1.076  1.156  1.320    1.108  1.185  1.345
1.6    1.026  1.113  1.292    1.111  1.192  1.365    1.158  1.238  1.406    1.189  1.269  1.432
1.8    1.077  1.162  1.344    1.161  1.244  1.411    1.208  1.286  1.452    1.240  1.317  1.476
2      1.113  1.198  1.374    1.196  1.277  1.442    1.244  1.321  1.481    1.275  1.350  1.506
3      1.211  1.293  1.462    1.291  1.366  1.524    1.334  1.407  1.558    1.363  1.436  1.582
4      1.262  1.339  1.500    1.336  1.410  1.564    1.378  1.450  1.599    1.407  1.478  1.621
6      1.316  1.390  1.544    1.387  1.460  1.606    1.428  1.496  1.638    1.456  1.522  1.660
8      1.346  1.419  1.569    1.417  1.486  1.629    1.456  1.522  1.661    1.483  1.548  1.686
10     1.367  1.440  1.588    1.437  1.503  1.644    1.475  1.540  1.677    1.500  1.565  1.703
∞

       ν = 5                  ν = 6                  ν = 7                  ν = 8
m      10%    5%     1%       10%    5%     1%       10%    5%     1%       10%    5%     1%
1.2    0.954  1.023  1.170    0.972  1.041  1.186    0.987  1.054  1.198    1.000  1.065  1.206
1.4    1.133  1.208  1.366    1.152  1.225  1.381    1.167  1.241  1.396    1.181  1.253  1.409
1.6    1.214  1.293  1.452    1.235  1.311  1.466    1.251  1.325  1.477    1.265  1.339  1.488
1.8    1.265  1.340  1.496    1.283  1.357  1.511    1.300  1.372  1.525    1.315  1.385  1.537
2      1.299  1.372  1.526    1.317  1.388  1.541    1.333  1.404  1.556    1.347  1.418  1.566
3      1.386  1.457  1.601    1.404  1.472  1.615    1.420  1.487  1.629    1.433  1.500  1.640
4      1.429  1.497  1.638    1.446  1.513  1.651    1.461  1.527  1.665    1.473  1.539  1.679
6      1.476  1.541  1.680    1.492  1.557  1.696    1.507  1.571  1.709    1.519  1.583  1.718
8      1.503  1.567  1.706    1.519  1.582  1.718    1.533  1.596  1.728    1.545  1.607  1.739
10     1.520  1.584  1.718    1.536  1.599  1.732    1.551  1.612  1.744    1.562  1.623  1.752
∞

Note: Critical values $\lambda_\alpha$ are reported using the linear boundary (4). The $\nu$-dimensional Gaussian processes in the limiting distributions are simulated on a grid of 10,000 equidistant points with 100,000 Monte Carlo repetitions.
In case of a global structural break test we have $\nu = k$, and in case of a partial structural break test we have $\nu = l$. The critical value for $m = \infty$ corresponds to the right-hand side process of equation (6).

Table 3

             k = 1            k = 2            k = 3            k = 4
T            100  200  500    100  200  500    100  200  500    100  200  500
M^ret_Q
M^ret_BQ
M^ret_SBQ

Note: Simulated rejection rates under $H_0$ are presented in percentage points. The values are obtained from 100,000 Monte Carlo repetitions using the critical values from Table 1 for the linear boundary with $\alpha = 5\%$. The cases $k = 1, \ldots, 4$ […] $y_t = \beta_1 + u_t$, $y_t = \beta_1 + \beta_2 x_{2t} + u_t$, $y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + u_t$, and $y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \beta_4 x_{4t} + u_t$, respectively, where $x_{2t}$, $x_{3t}$, $x_{4t}$, and $u_t$ are simulated independently as standard normal random variables for all $t = 1, \ldots, T$.

Table 4: Size-adjusted powers of the retrospective tests

             Model (14) (k = 1)                       Model (15) (k = 2)
             M^ret_Q  M^ret_BQ  M^ret_SBQ  supW       M^ret_Q  M^ret_BQ  M^ret_SBQ  supW
τ* = 0.[…]   (nine break locations)

Note: Simulated size-adjusted rejection rates under models (14) and (15) are presented in percentage points for a significance level of 5% and a sample size of $T = 100$, where supW denotes the sup-Wald test with $r_0 = 0.15$. The values are obtained from 100,000 Monte Carlo repetitions, while the linear boundary (4) is implemented.

and upper trimming parameters. In the subsequent simulations, we consider $r_0 = 0.15$. […] $r \in [r_0, 1 - r_0]$ $B(r)'B(r)/(r(1-r))$, and critical values for different values of $r_0$ and $k$ are tabulated in Andrews (1993), where it is also shown that the sup-Wald test has weak optimality properties. In the case of a single structural break, its local power curve approaches the power curve from the infeasible point optimal maximum likelihood test asymptotically, as the significance level tends to zero. Note that the sup-Wald statistic is not suitable for monitoring, since its numerator statistic $T(S_0 - S_1(t/T) - S_2(t/T))$ is not measurable with respect to the filtration of information at time $t$.

We illustrate the finite sample performance for a simple model with $k = 1$ and a break in the mean, which is given by
$$y_t = \mu_t + u_t, \quad \mu_t = 2 + 0.[\ldots] \cdot 1_{\{t \geq \tau^* T\}}, \quad u_t \overset{iid}{\sim} N(0,1), \tag{14}$$
and for a univariate linear regression model with a break in the slope coefficient, which is given by
$$y_t = \mu_t + \beta_t x_t + u_t, \quad \mu_t = 2, \quad \beta_t = 1 + 0.[\ldots] \cdot 1_{\{t \geq \tau^* T\}}, \quad x_t, u_t \overset{iid}{\sim} N(0,1), \tag{15}$$
where $t = 1, \ldots, T$. Table 4 presents the size-adjusted power results.

First, we observe that the backward CUSUM and the stacked backward CUSUM outperform the forward CUSUM, except for the case $\tau^* = 0.1$. Second, while the forward CUSUM test has much lower power than the sup-Wald test, the reversed order cumulation structure in the backward CUSUM seems to compensate for this weakness of the forward CUSUM test. The backward CUSUM performs as well as the sup-Wald test, which is remarkable since, as discussed previously, the latter test has weak optimality properties. Finally, while the sup-Wald statistic and the backward CUSUM detector are not suitable for monitoring, the stacked backward CUSUM test is much more powerful than the forward CUSUM test, and its detector statistic is therefore well suited for real-time monitoring.

In order to evaluate the finite sample performances of the monitoring detectors, we consider models (14) and (15) for the time points $t = T+1, \ldots, \lfloor mT \rfloor$. We simulate the series up to the fixed endpoints $m \in \{$[…]$\}$, while the critical values for the case $m = \infty$ are implemented (see Table 1). For $M^{mon}_{Q,\infty}$ with the linear boundary, the 5% critical values are given by 0.957 for $k = 1$ and 1.044 for $k = 2$. Table 5 presents the empirical sizes. Note that the tests are undersized by construction, as not all of the size is used up to the time point $mT$. For $k \geq 2$, we observe some size distortions for small sample sizes. The results in Table 6 show that the mean delay for the stacked backward CUSUM is much lower than that of the forward CUSUM and is almost constant across the breakpoint locations.

Table 5: Empirical sizes of the infinite horizon monitoring detectors

             k = 1                                  k = 2
             T = 100           T = 500             T = 100      T = 200      T = 500
horizon      SBQ   Q    CSW    SBQ   Q    CSW      SBQ   Q      SBQ   Q      SBQ   Q
m = 1.[…]
m = 2        0.2   4.2  0.1    0.2   4.4  0.1      1.4   6.6    0.7   5.5    0.4   4.8
m = 4        1.0   4.7  0.9    0.9   4.8  0.8      4.8   7.3    2.5   6.0    1.4   5.2
m = 6        1.7   4.7  1.6    1.4   4.8  1.4      7.7   7.4    4.1   6.0    2.3   5.2
m = 8        2.4   4.7  2.0    2.0   4.8  1.8      10.3  7.4    5.7   6.0    3.3   5.2
m = 10       3.1   4.7  2.3    2.7   4.8  2.0      12.7  7.4    7.2   6.0    4.3   5.2

Note: Simulated rejection rates under $H_0$ are presented in percentage points. The linear boundary (4) is implemented, while critical values for $\alpha = 5\%$ and $m = \infty$ are considered. The values are obtained from 100,000 random draws of the models $y_t = \beta_1 + u_t$ and $y_t = \beta_1 + \beta_2 x_t + u_t$ for $t = 1, \ldots, \lfloor mT \rfloor$, where $x_t$ and $u_t$ are i.i.d. and standard normal. While SBQ and Q correspond to the stacked backward CUSUM and the forward CUSUM with critical values for the case $m = \infty$, the univariate test by Chu et al. (1996) using the radical boundary (5) is denoted by CSW.

Table 6: Empirical mean detection delays of the monitoring detectors

             Model (14)               Model (15)
τ*           SBQ    Q      CSW        SBQ    Q
1.[…]
2            38.4   59.4   60.1       57.7   77.0
2.[…]
3            36.0   99.1   71.1       52.4   129.6
5            34.5   178.0  89.4       48.1   233.6
10           33.5   374.6  124.2      45.7   487.8

Note: The empirical mean detection delays are obtained from 100,000 Monte Carlo repetitions using size-adjusted critical values for a significance level of 5%, where models (14) and (15) are simulated for $t = 1, \ldots, \lfloor mT \rfloor$ with $T = 100$ and $m = 20$. While SBQ and Q correspond to the stacked backward CUSUM and the forward CUSUM with the linear boundary (4) and with critical values for the case $m = \infty$, the univariate test by Chu et al. (1996) with the radical boundary (5) is denoted by CSW.

To compare the breakpoint estimator (11) with its maximum likelihood benchmark (10), we present Monte Carlo simulation results for model (14) for the bias and the mean squared error (MSE) in Table 7. If the break $\tau^*$ is located after 85% of the sample, the estimator based on backwardly cumulated recursive residuals has a much lower bias and MSE than the maximum likelihood estimator, which is due to the fact that the post-break sample consists of too few observations for an accurate maximum likelihood estimation.

Table 7: Bias and MSE of breakpoint estimators

             T = 100                      T = 200
             Bias         MSE             Bias         MSE
τ*           ML    BQ     ML    BQ        ML    BQ     ML    BQ
0.5          0.000

Note: The bias and MSE results for the break date estimators (10) and (11) are obtained from 100,000 Monte Carlo repetitions, where model (14) is simulated for $t = 1, \ldots, T$. ML denotes the maximum likelihood estimator $\hat{\tau}^{ret}_{ML}$ and BQ denotes the estimator $\hat{\tau}_{ret}$, which is based on backwardly cumulated recursive residuals.

In this paper we propose two alternatives to the conventional CUSUM detectors by Brown et al. (1975) and Chu et al. (1996).
It has been demonstrated that cumulating the recursive residuals backwardly results in much higher power than using forwardly cumulated recursive residuals, in particular if the break is located at the end of the sample. Accordingly, the backward scheme is especially attractive for on-line monitoring. To this end, the stacked triangular array of backwardly cumulated recursive residuals is employed, and we find that this approach yields a much lower detection delay than the monitoring procedure by Chu et al. (1996). Due to the multivariate nature of our tests, they also have power against structural breaks that do not affect the unconditional mean of the dependent variable. We also suggest a new estimator for the break date based on backwardly cumulated recursive residuals. This estimator outperforms the conventional estimator constructed from the sum of squared residuals whenever the break occurs close to the end of the sample, which is the relevant scenario for on-line monitoring.

Acknowledgements

We are thankful to Holger Dette, Josua Gösmann and Dominik Wied for very helpful comments and suggestions. Further, we would like to thank the participants of the RMSE meeting 2018 in Vallendar, the econometrics research seminar at the UC3M in Madrid, and the DAGStat Conference 2019 in Munich.

Appendix: Proofs

We first present some auxiliary lemmas which we require for the proofs of Theorems 1 and 2.

Lemma 1.

Under Assumption 1, there exists a $k$-dimensional standard Brownian motion $W(r)$, such that the following statements hold true:

(a) For any fixed $m < \infty$, as $T \to \infty$,
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{\lfloor rT \rfloor} x_t u_t \Rightarrow \sigma C^{1/2} W(r), \quad r \in [0,m].$$

(b) $$\lim_{t \to \infty} \frac{\big\|\sum_{j=1}^{t} x_j u_j - \sigma C^{1/2} W(t)\big\|}{\sqrt{t}} = 0 \quad \text{(a.s.)}.$$

Proof.

For (a), note that a direct consequence of the functional central limit theorem for multiple time series on the space $D([0,1])^k$ given by Theorem 2.1 in Phillips and Durlauf (1986) is that $M^{-1/2} \sum_{t=1}^{\lfloor sM \rfloor} x_t u_t \Rightarrow \sigma C^{1/2} W(s)$, $s \in [0,1]$, as $M \to \infty$ (see also Lemma 3 in Krämer et al. 1988). Then, with $M = mT$, on the space $D([0,m])^k$,
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{\lfloor rT \rfloor} x_t u_t = \frac{\sqrt{m}}{\sqrt{M}} \sum_{t=1}^{\lfloor (r/m)M \rfloor} x_t u_t \Rightarrow \sqrt{m}\, \sigma C^{1/2} W(r/m) \overset{d}{=} \sigma C^{1/2} W(r), \quad r \in [0,m].$$

To show (b), note that $\{x_t u_t\}_{t \in \mathbb{N}}$ is a stationary and ergodic martingale difference sequence with $E[x_t u_t] = 0$ and $E[(x_t u_t)(x_t u_t)'] = \sigma^2 C$. We apply the strong invariance principle given by Theorem 3 in Wu et al. (2007). Then,
$$\lim_{t \to \infty} \frac{\big\|\sigma^{-1} C^{-1/2} \sum_{j=1}^{t} x_j u_j - W(t)\big\|}{t^{1/q} \sqrt{\ln(t)}\, (\ln(\ln(t)))^{1/4}} < \infty \quad \text{(a.s.)},$$
where $q = \min\{\kappa, 4\}$ (see also Strassen 1967), and the assertion follows from the fact that $\lim_{t \to \infty} t^{1/q} \sqrt{\ln(t)}\, (\ln(\ln(t)))^{1/4} / \sqrt{t} = 0$.

Lemma 2.

Let { ( x t , u t ) } t ∈ N satisfy Assumption 1, let β t = β for all t ∈ N , and let m ∈ (0 , ∞ ) . Let X t = (cid:80) tj =1 x j w j , Y t = (cid:80) tj =1 x j u j , and Z t = (cid:80) t − j =1 (cid:80) ji =1 j − x i u i . Then,as T → ∞ , sup ≤ t ≤ mT (cid:107) X t − ( Y t − Z t ) (cid:107)√ T = o P (1) , and sup T k let f t = (1 + ( t − − x (cid:48) t C − t − x t ) / bethe denominator of w t . Then, f t w t = y t − x (cid:48) t (cid:98) β t − = u t − x (cid:48) t (cid:16) t − (cid:88) j =1 x j x (cid:48) j (cid:17) − (cid:16) t − (cid:88) j =1 x j u j (cid:17) = u t − x (cid:48) t C − t − (cid:16) t − t − (cid:88) j =1 x j u j (cid:17) . Furthermore, let (cid:101) Y t = (cid:80) tj = k +1 f − j x j u j , and (cid:101) Z t = (cid:80) t − j = k (cid:80) ji =1 j − f − j − x j +1 x (cid:48) j +1 C − j x i u i .Then, X t = (cid:80) tj = k +1 f − j x j ( u j − ( j − − x (cid:48) j C − j − (cid:80) j − i =1 x i u i ) = (cid:101) Y t − (cid:101) Z t . Hence, it remainsto show, that sup ≤ t ≤ mT (cid:107) (cid:101) Y t − Y t (cid:107)√ T = o P (1) , and sup T k } − { T ≤ k } ), which is O P (1), since √ T ( f T −

1) = O P (1), as T → ∞ , and let a t = t − / (cid:80) tj =1 a j x j u j , where (cid:107) a T (cid:107) = O P (1). Furthermore, note that j − / − ( j + 1) − / < j − / . Then, (cid:101) Y t − Y t = t (cid:88) j =1 ( a j x j u j ) j − / = a t + t − (cid:88) j =1 (cid:16) a j j / (cid:2) j − / − ( j + 1) − / (cid:3)(cid:17) < a t + t − (cid:88) j =1 j a j , which implies thatsup ≤ t ≤ mT (cid:107) (cid:101) Y t − Y t (cid:107)√ T < sup ≤ t ≤ mT (cid:16) (cid:107) a t (cid:107)√ T + mT / t − (cid:88) j =1 (cid:107) a j (cid:107) j / (cid:17) = o P (1) , and sup T ξ , such that (cid:107) (cid:101) a j (cid:107) ≤ j − (cid:15) ξ . Thus,sup ≤ t ≤ mT (cid:107) (cid:101) Z t − Z ∗ t (cid:107)√ T ≤ mξT (cid:15) ∞ (cid:88) j =1 j (cid:15) = o P (1) , sup T ζ , such that (cid:107) B ∗ t b ∗ t (cid:107) ≤ t − / − γ ζ , (cid:107) B ∗ t b ∗ t +1 (cid:107) ≤ t − / − γ ζ ,and (cid:107) (cid:80) tj =1 B ∗ j x j +1 u j +1 (cid:107) ≤ t / − γ ζ , which yields (cid:107) Z ∗ t − Z t (cid:107) ≤ ζ (cid:104) ( t − t − / − γ + t − (cid:88) j =1 j / − γ j + 1 + ( t − / − γ (cid:105) ≤ ζ (cid:104) t / − γ + t / − γ/ t − (cid:88) j =1 j γ/ (cid:105) ≤ ζKt / − γ/ for some constant K < ∞ . Consequently,sup ≤ t ≤ mT (cid:107) Z ∗ t − Z t (cid:107)√ T = o P (1) , and sup T
Lemma 3.

Let $W(r)$ be a $k$-dimensional standard Brownian motion and let $B(r)$ be a $k$-dimensional standard Brownian bridge. Then,

(a) $W(r) - \int_0^r z^{-1} W(z)\,dz \overset{d}{=} W(r)$, for $r \geq 0$,
(b) $W(r/(1-r)) \overset{d}{=} B(r)/(1-r)$, for $r \in (0,1)$.

Proof. Let $W_j(r)$ and $B_j(r)$ be the $j$-th component of $W(r)$ and $B(r)$, respectively. We show the identities for each $j = 1, \ldots, k$, separately. Using the Cauchy-Schwarz and Jensen inequalities, we obtain $\int_0^r z^{-1} E[|W_j(z)|]\,dz < \infty$ as well as $\int_0^r z^{-1} E[|W_j(r) W_j(z)|]\,dz < \infty$, which justifies the application of Fubini's theorem in the subsequent steps. Since both $W_j(r)$ and $F(W_j(r)) = W_j(r) - \int_0^r z^{-1} W_j(z)\,dz$ are Gaussian with zero mean, it remains to show that their covariance functions coincide. Let w.l.o.g. $r \leq s$. Then,
$$E[F(W_j(r)) F(W_j(s))] - E[W_j(r) W_j(s)]$$
$$= \int_0^r \int_0^s \frac{E[W_j(z_1) W_j(z_2)]}{z_1 z_2}\,dz_2\,dz_1 - \int_0^s \frac{E[W_j(r) W_j(z)]}{z}\,dz - \int_0^r \frac{E[W_j(s) W_j(z)]}{z}\,dz$$
$$= \big(2r + r\ln(s) - r\ln(r)\big) - \big(r + r\ln(s) - r\ln(r)\big) - r = 0,$$
and (a) has been shown. The second result follows from the fact that both processes are Gaussian with zero mean and
$$E\left[\frac{B_j(r)}{1-r} \cdot \frac{B_j(s)}{1-s}\right] = \frac{\min\{r(1-s),\, s(1-r)\}}{(1-r)(1-s)} = \min\left\{\frac{r}{1-r}, \frac{s}{1-s}\right\} = E\left[W_j\Big(\frac{r}{1-r}\Big) W_j\Big(\frac{s}{1-s}\Big)\right].$$

Lemma 4.

Let $\{(x_t, u_t)\}_{t \in \mathbb{N}}$ satisfy Assumption 1, let $\beta_t = \beta_0$ for all $t \in \mathbb{N}$, and let $m \in (0, \infty)$. Then, as $T \to \infty$,
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{\lfloor rT \rfloor} x_t w_t \Rightarrow \sigma C^{1/2} W(r), \quad r \in [0,m],$$
where $W(r)$ is a $k$-dimensional standard Brownian motion.

Proof. From Lemma 2, we have $\sup_{r \in [0,m]} T^{-1/2} \|X_{\lfloor rT \rfloor} - (Y_{\lfloor rT \rfloor} - Z_{\lfloor rT \rfloor})\| = o_P(1)$. Let $F(Y_{\lfloor rT \rfloor}) = Y_{\lfloor rT \rfloor} - \int_0^r z^{-1} Y_{\lfloor zT \rfloor}\,dz$. Then, $\lim_{T \to \infty} \|(Y_{\lfloor rT \rfloor} - Z_{\lfloor rT \rfloor}) - F(Y_{\lfloor rT \rfloor})\| = 0$, and $\sup_{r \in [0,m]} \|T^{-1/2} X_{\lfloor rT \rfloor} - F(T^{-1/2} Y_{\lfloor rT \rfloor})\| = o_P(1)$. Lemma 1(a) and the continuous mapping theorem imply $F(T^{-1/2} Y_{\lfloor rT \rfloor}) \Rightarrow F(\sigma C^{1/2} W(r)) = \sigma C^{1/2} F(W(r))$. Furthermore, from Lemma 3, it follows that $F(W(r)) \overset{d}{=} W(r)$. Consequently, $T^{-1/2} X_{\lfloor rT \rfloor} \Rightarrow \sigma C^{1/2} W(r)$.

Lemma 5.

Let $\|\cdot\|_M$ be the induced matrix norm of $\|\cdot\|$. Let $h$ be an $\mathbb{R}^k$-valued function of bounded variation, and let $\{A_t\}_{t \in \mathbb{N}}$ be a sequence of random $(k \times k)$ matrices with $\sup_{r \in [0,m]} \|T^{-1} \sum_{t=1}^{\lfloor rT \rfloor} (A_t - A)\|_M = o_P(1)$, where $m \in (0, \infty)$. Then, as $T \to \infty$,
$$\sup_{r \in [0,m]} \Big\| \frac{1}{T} \sum_{t=1}^{\lfloor rT \rfloor} (A_t - A)\, h\big(\tfrac{t}{T}\big) \Big\| = o_P(1).$$

Proof. By the application of Abel's formula of summation by parts, which is given in (18), it follows that
$$\sum_{t=1}^{\lfloor rT \rfloor} (A_t - A)\, h\big(\tfrac{t}{T}\big) = \sum_{t=1}^{\lfloor rT \rfloor} (A_t - A)\, h\big(\tfrac{\lfloor rT \rfloor}{T}\big) + \sum_{t=1}^{\lfloor rT \rfloor - 1} \sum_{j=1}^{t} (A_j - A) \Big( h\big(\tfrac{t}{T}\big) - h\big(\tfrac{t+1}{T}\big) \Big).$$
The fact that $h(r)$ is of bounded variation yields
$$\sup_{r \in [0,m]} \|h(r)\| = O(1), \qquad \sup_{r \in [0,m]} \Big\| \sum_{t=1}^{\lfloor rT \rfloor - 1} \frac{t}{T} \Big( h\big(\tfrac{t}{T}\big) - h\big(\tfrac{t+1}{T}\big) \Big) \Big\| = O(1).$$
Consequently,
$$\sup_{r \in [0,m]} \Big\| \frac{1}{T} \sum_{t=1}^{\lfloor rT \rfloor} (A_t - A)\, h\big(\tfrac{\lfloor rT \rfloor}{T}\big) \Big\| \leq \sup_{r \in [0,m]} \Big\| \frac{1}{T} \sum_{t=1}^{\lfloor rT \rfloor} (A_t - A) \Big\|_M \Big\| h\big(\tfrac{\lfloor rT \rfloor}{T}\big) \Big\| = o_P(1)$$
and
$$\sup_{r \in [0,m]} \Big\| \frac{1}{T} \sum_{t=1}^{\lfloor rT \rfloor - 1} \sum_{j=1}^{t} (A_j - A) \Big( h\big(\tfrac{t}{T}\big) - h\big(\tfrac{t+1}{T}\big) \Big) \Big\| \leq \sup_{r \in [0,m]} \sum_{t=1}^{\lfloor rT \rfloor - 1} \frac{t}{T} \Big\| \frac{1}{t} \sum_{j=1}^{t} (A_j - A) \Big\|_M \Big\| h\big(\tfrac{t}{T}\big) - h\big(\tfrac{t+1}{T}\big) \Big\| = o_P(1).$$
Then, by the triangle inequality, the assertion follows.
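The summation-by-parts identity used in this proof can be checked numerically in the scalar case $k = 1$. The following Python sketch is only an illustration under stated assumptions: the function names `direct_sum` and `abel_sum` and the toy numbers are hypothetical, with `a[t]` playing the role of $A_t - A$ and `b[t]` the role of $h(t/T)$ for a step function $h$.

```python
def direct_sum(a, b):
    """Left-hand side: sum over t of a_t * b_t."""
    return sum(x * y for x, y in zip(a, b))


def abel_sum(a, b):
    """Right-hand side of Abel's summation by parts.

    With partial sums P_t = a_1 + ... + a_t, the identity reads
    sum_{t=1}^{n} a_t b_t = P_n b_n + sum_{t=1}^{n-1} P_t (b_t - b_{t+1}).
    """
    if not a:
        return 0.0
    partial = []
    running = 0.0
    for x in a:
        running += x
        partial.append(running)
    total = partial[-1] * b[-1]
    for t in range(len(a) - 1):
        total += partial[t] * (b[t] - b[t + 1])
    return total


# Toy values: deviations a_t and a piecewise constant weight sequence b_t.
a = [0.5, -1.0, 0.25, 2.0, -0.75]
b = [1.0, 1.0, 3.0, 3.0, 3.0]
lhs = direct_sum(a, b)
rhs = abel_sum(a, b)
print(lhs, rhs)
```

Because `b` is piecewise constant, only the index at which it jumps contributes to the second sum. This is the scalar analogue of why a bounded variation $h$ keeps the increments $h(t/T) - h((t+1)/T)$ summable in the bound above.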

Proof of Theorem 1

Let $w_t^* = f_t^{-1}(y_t^* - x_t'\widehat{\beta}_{t-1}^*)$, which are recursive residuals from a regression without any structural break, where $f_t = (1 + (t-1)^{-1} x_t' C_{t-1}^{-1} x_t)^{1/2}$, $y_t^* = x_t'\beta_0 + u_t$, and
\[ \widehat{\beta}_{t-1}^* = \Big( \sum_{j=1}^{t-1} x_j x_j' \Big)^{-1} \Big( \sum_{j=1}^{t-1} x_j y_j^* \Big). \]
Then, $y_t = x_t'\beta_t + u_t = y_t^* + T^{-1/2} x_t' g(t/T)$, and
\[ \widehat{\beta}_{t-1} = \widehat{\beta}_{t-1}^* + \frac{1}{\sqrt{T}(t-1)} C_{t-1}^{-1} \sum_{j=1}^{t-1} x_j x_j' g(j/T). \]
It follows that
\[ w_t = w_t^* + f_t^{-1} T^{-1/2} x_t' g(t/T) - f_t^{-1} T^{-1/2} (t-1)^{-1} x_t' C_{t-1}^{-1} \sum_{j=1}^{t-1} x_j x_j' g(j/T). \]
We can decompose the partial sum process as $T^{-1/2} \sum_{t=1}^{\lfloor rT \rfloor} x_t w_t = S_{1,T}(r) + S_{2,T}(r) + S_{3,T}(r)$, where
\[ S_{1,T}(r) = \frac{1}{\sqrt{T}} \sum_{t=1}^{\lfloor rT \rfloor} x_t w_t^*, \qquad S_{2,T}(r) = \frac{1}{T} \sum_{t=1}^{\lfloor rT \rfloor} f_t^{-1} x_t x_t' g\big(\tfrac{t}{T}\big), \tag{19} \]
\[ S_{3,T}(r) = -\frac{1}{T} \sum_{t=1}^{\lfloor rT \rfloor} \frac{1}{f_t (t-1)} x_t x_t' C_{t-1}^{-1} \sum_{j=1}^{t-1} x_j x_j' g\big(\tfrac{j}{T}\big). \tag{20} \]
Let $\|\cdot\|_M$ be the induced matrix norm of $\|\cdot\|$. Lemma 4 yields $S_{1,T}(r) \Rightarrow \sigma C^{1/2} W(r)$. For the second term, note that, from Assumption 1(a) and the fact that $\sqrt{T}(f_T^{-1} - 1) = O_P(1)$, it follows that
\[ \sup_{r \in [0,m]} \Big\| \frac{1}{T} \sum_{t=1}^{\lfloor rT \rfloor} (f_t^{-1} x_t x_t' - C) \Big\|_M = o_P(1). \tag{21} \]
Since $g(r)$ is piecewise constant and therefore of bounded variation, Lemma 5 yields
\[ \sup_{r \in [0,m]} \Big\| S_{2,T}(r) - \int_0^r C g(s) \, ds \Big\| = \sup_{r \in [0,m]} \Big\| \frac{1}{T} \sum_{t=1}^{\lfloor rT \rfloor} (f_t^{-1} x_t x_t' - C) g\big(\tfrac{t}{T}\big) \Big\| = o_P(1). \tag{22} \]
For the third term, let
\[ p_1(r) = \frac{1}{\lfloor rT \rfloor} C_{\lfloor rT \rfloor}^{-1} \sum_{j=1}^{\lfloor rT \rfloor} x_j x_j' g\big(\tfrac{j}{T}\big), \qquad p_2(r) = \frac{1}{\lfloor rT \rfloor} C_{\lfloor rT \rfloor}^{-1} \sum_{j=1}^{\lfloor rT \rfloor} C g\big(\tfrac{j}{T}\big), \qquad p_3(r) = \frac{1}{\lfloor rT \rfloor} \sum_{j=1}^{\lfloor rT \rfloor} g\big(\tfrac{j}{T}\big). \]
From Assumption 1(a), it follows that $\sup_{r \in [0,m]} \| p_2(r) - p_3(r) \|_M = o_P(1)$. Furthermore, from Lemma 5 and from the fact that $\sup_{r \in [0,m]} \| \lfloor rT \rfloor^{-1} \sum_{t=1}^{\lfloor rT \rfloor} (x_t x_t' - C) \|_M = o_P(1)$, it follows that $\sup_{r \in [0,m]} \| p_1(r) - p_2(r) \| = o_P(1)$. Thus, $\sup_{r \in [0,m]} \| p_1(r) - p_3(r) \| = o_P(1)$. Consequently,
\[ \sup_{r \in [0,m]} \Big\| S_{3,T}(r) + \frac{1}{T} \sum_{t=1}^{\lfloor rT \rfloor} f_t^{-1} x_t x_t' p_3\big(\tfrac{t-1}{T}\big) \Big\| \le \sup_{r \in [0,m]} \frac{1}{T} \sum_{t=1}^{\lfloor rT \rfloor} \| f_t^{-1} x_t x_t' \|_M \, \big\| p_1\big(\tfrac{t-1}{T}\big) - p_3\big(\tfrac{t-1}{T}\big) \big\|, \tag{23} \]
which is $o_P(1)$. Since $p_3$ is a partial sum of a piecewise constant function, it is of bounded variation, and, together with (21), we can apply Lemma 5. Then,
\[ \sup_{r \in [0,m]} \Big\| \frac{1}{T} \sum_{t=1}^{\lfloor rT \rfloor} (f_t^{-1} x_t x_t' - C) p_3\big(\tfrac{t-1}{T}\big) \Big\| = o_P(1), \]
which yields
\[ \sup_{r \in [0,m]} \Big\| S_{3,T}(r) + \int_0^r \int_0^s \frac{1}{s} C g(v) \, dv \, ds \Big\| = \sup_{r \in [0,m]} \Big\| S_{3,T}(r) + \frac{1}{T} C \sum_{t=1}^{\lfloor rT \rfloor} p_3\big(\tfrac{t-1}{T}\big) \Big\| + o_P(1) = o_P(1). \]
Finally, Slutsky's theorem implies that $S_{1,T}(r) + S_{2,T}(r) + S_{3,T}(r) \Rightarrow \sigma C^{1/2} W(r) + \sigma C h(r)$, which yields
\[ Q_T(r) = \widehat{\sigma}^{-1} C_T^{-1/2} \big( S_{1,T}(r) + S_{2,T}(r) + S_{3,T}(r) \big) \Rightarrow W(r) + C^{1/2} h(r), \]
since $\widehat{\sigma}$ is consistent for $\sigma$ (see Krämer et al. 1988).

Proof of Theorem 2

Lemma 2 yields
\[ \sup_{t \ge T} \frac{\big\| \sum_{j=1}^{t} x_j w_j - \sum_{j=1}^{t} \big( x_j u_j - j^{-1} \sum_{i=1}^{j} x_i u_i \big) \big\|}{\sqrt{t}} = o_P(1). \]
Let $W(r)$ be the $k$-dimensional standard Brownian motion given by Lemma 1(b). Then,
\[ A_T = \sup_{t \ge T} \frac{\big\| \sum_{j=1}^{t} x_j u_j - \sigma C^{1/2} W(t) \big\|}{\sqrt{t}} = o_P(1). \]
Furthermore, $\| \sum_{j=1}^{t} x_j u_j - \sigma C^{1/2} W(t) \| \le \xi t^{1/2 - \epsilon}$ for some $\epsilon > 0$ and some $\xi$, for all $t \in \mathbb{N}$. It follows that
\[ \sup_{t \ge T} \frac{\big\| \sum_{j=1}^{t} \big( x_j u_j - j^{-1} \sum_{i=1}^{j} x_i u_i \big) - \sigma C^{1/2} \big( W(t) - \sum_{j=1}^{t} j^{-1} W(j) \big) \big\|}{\sqrt{t}} \le A_T + \sup_{t \ge T} \sum_{j=1}^{t} \frac{\big\| \sum_{i=1}^{j} x_i u_i - \sigma C^{1/2} W(j) \big\|}{j \sqrt{t}} \le A_T + \xi \cdot \Big( \sup_{t \ge T} \sum_{j=1}^{t} \frac{j^{1/2 - \epsilon}}{j \sqrt{t}} \Big) = o_P(1), \]
since
\[ \sup_{t \ge T} \sum_{j=1}^{t} \frac{j^{1/2 - \epsilon}}{j \sqrt{t}} \le \sup_{t \ge T} \sum_{j=1}^{t} \frac{1}{j^{1 + \epsilon/2} T^{\epsilon/2}} \le \frac{1}{T^{\epsilon/2}} \sum_{j=1}^{\infty} \frac{1}{j^{1 + \epsilon/2}} = o_P(1). \]
Consequently,
\[ \sup_{t \ge T} \frac{\big\| \sum_{j=1}^{t} x_j w_j - \sigma C^{1/2} \big( W(t) - \sum_{j=1}^{t} j^{-1} W(j) \big) \big\|}{\sqrt{t}} = o_P(1). \]
From the fact that $T^{-1/2} W(t) \overset{d}{=} W(t/T)$ it follows that there exists some $k$-dimensional standard Brownian motion $W^*(t)$, such that
\[ \sup_{r \ge 1} \frac{\big\| T^{-1/2} \sum_{j=1}^{\lfloor rT \rfloor} x_j w_j - \sigma C^{1/2} \big( W^*(r) - \sum_{j=1}^{\lfloor rT \rfloor} j^{-1} W^*(j/T) \big) \big\|}{\sqrt{r}} = o_P(1). \]
Moreover, from Lemma 3 and the fact that $\lim_{T \to \infty} \sum_{j=1}^{\lfloor rT \rfloor} j^{-1} W^*(j/T) = \int_0^r z^{-1} W^*(z) \, dz$, there exists some $k$-dimensional standard Brownian motion $W^{**}(t)$, such that
\[ \sup_{r \ge 1} \frac{\big\| T^{-1/2} \sum_{j=1}^{\lfloor rT \rfloor} x_j w_j - \sigma C^{1/2} W^{**}(r) \big\|}{\sqrt{r}} = o_P(1), \]
and, therefore,
\[ \sup_{r \ge 1} \frac{\big\| \sigma^{-1} C^{-1/2} T^{-1/2} \sum_{j=1}^{\lfloor rT \rfloor} x_j w_j - W^{**}(r) \big\|}{\sqrt{r}} = o_P(1). \]
Since $\widehat{\sigma}$ is consistent for $\sigma$ (see Krämer et al. 1988) and $\{x_t\}_{t \in \mathbb{N}}$ is ergodic, we have $\| \widehat{\sigma}^{-1} C_T^{-1/2} - \sigma^{-1} C^{-1/2} \|_M = o_P(1)$, where $\|\cdot\|_M$ denotes the matrix norm induced by $\|\cdot\|$. Consequently,
\[ \sup_{r \ge 1} \frac{\| Q_T(r) - W^{**}(r) \|}{\sqrt{r}} = o_P(1). \]

Proof of Theorem 3

For any fixed $m \in (1, \infty)$, Theorem 1 yields $Q_T(r) \Rightarrow W(r)$, $r \in [0, m]$, under $H_0$. Then, (a) follows with the continuous mapping theorem. For (b), the continuous mapping theorem implies that
\[ M^{mon}_{Q,m} = \sup_{r \in (1,m)} \frac{\| Q_T(r) - Q_T(1) \|}{d(r-1)} \overset{d}{\longrightarrow} \sup_{r \in (1,m)} \frac{\| W(r) - W(1) \|}{d(r-1)} \overset{d}{=} \sup_{r \in (0,m-1)} \frac{\| W(r) \|}{d(r)}. \]
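The limiting functional in (b) can be approximated by simulating Brownian motion on a grid. The following is a minimal sketch under stated assumptions, not the procedure used to produce the paper's critical values: the linear boundary $d(r) = 1 + 2r$, the maximum norm for $\|\cdot\|$, the grid size, and the function name `simulate_sup_stat` are all illustrative choices that do not appear in this section.

```python
import numpy as np

def simulate_sup_stat(m=2.0, k=1, n_grid=2000, n_rep=500, seed=0):
    """Monte Carlo draws of sup_{r in (0, m-1)} ||W(r)|| / d(r).

    Assumptions (not taken from this section): d(r) = 1 + 2r and
    ||.|| is the maximum norm.
    """
    rng = np.random.default_rng(seed)
    horizon = m - 1.0
    dt = horizon / n_grid
    r = dt * np.arange(1, n_grid + 1)           # grid on (0, m-1]
    d = 1.0 + 2.0 * r                           # assumed linear boundary
    # Independent Gaussian increments, shape (n_rep, n_grid, k)
    incr = rng.normal(scale=np.sqrt(dt), size=(n_rep, n_grid, k))
    w = np.cumsum(incr, axis=1)                 # k-dimensional Brownian paths
    norm = np.max(np.abs(w), axis=2)            # assumed maximum norm
    return np.max(norm / d, axis=1)             # grid supremum per replication

draws = simulate_sup_stat()
print(np.quantile(draws, 0.95))                 # simulated 95% quantile
```

Because the supremum is taken over a finite grid, the simulated quantiles are biased slightly downward; a finer grid reduces this effect.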

We transform the supremum to a supremum over a subset of the unit interval. Consider the bijective function $g: (0, (m-1)/m) \to (0, m-1)$ that is given by $g(\eta) = \eta/(1-\eta)$. Furthermore, note that $W(g(\eta)) \overset{d}{=} B(\eta)/(1-\eta)$, which follows from Lemma 3. Consequently,
\[ \sup_{r \in (0,m-1)} \frac{\| W(r) \|}{d(r)} = \sup_{\eta \in (0, \frac{m-1}{m})} \frac{\| W(g(\eta)) \|}{d(g(\eta))} \overset{d}{=} \sup_{\eta \in (0, \frac{m-1}{m})} \frac{\| B(\eta) \|}{(1-\eta) \, d\big( \frac{\eta}{1-\eta} \big)}. \]
For the last result, Theorem 2 and Assumption 2 imply
\[ \begin{aligned} & \sup_{r > 1} \frac{\| Q_T(r) - Q_T(1) \|}{d(r-1)} - \sup_{r > 1} \frac{\| W(r) - W(1) \|}{d(r-1)} \\ & \quad \le \sup_{r > 1} \frac{\| Q_T(r) - Q_T(1) - (W(r) - W(1)) \|}{d(r-1)} \\ & \quad \le \sup_{r > 1} \frac{\| Q_T(r) - W(r) \|}{d(r-1)} + \sup_{r > 1} \frac{\| Q_T(1) - W(1) \|}{d(r-1)} \\ & \quad \le \sup_{r > 1} \Big( \frac{\| Q_T(r) - W(r) \|}{\sqrt{r}} \cdot \frac{\sqrt{r}}{d(r-1)} \Big) + \| Q_T(1) - W(1) \| \cdot \sup_{r > 1} \frac{1}{d(r-1)} \\ & \quad \le 2 \Big( \sup_{r > 1} \frac{\sqrt{r}}{d(r-1)} \Big) \cdot \Big( \sup_{r > 1} \frac{\| Q_T(r) - W(r) \|}{\sqrt{r}} \Big) = o_P(1) \end{aligned} \]
for some $k$-dimensional standard Brownian motion $W(r)$. Then,
\[ M^{mon}_{Q,\infty} = \sup_{r \in (1,\infty)} \frac{\| Q_T(r) - Q_T(1) \|}{d(r-1)} \overset{d}{\longrightarrow} \sup_{r \in (1,\infty)} \frac{\| W(r) - W(1) \|}{d(r-1)}. \]
Consider the bijective function $g: (0, 1) \to (0, \infty)$ that is given by $g(\eta) = \eta/(1-\eta)$, which yields
\[ \sup_{r \in (1,\infty)} \frac{\| W(r) - W(1) \|}{d(r-1)} \overset{d}{=} \sup_{r \in (0,\infty)} \frac{\| W(r) \|}{d(r)} = \sup_{\eta \in (0,1)} \frac{\| W(g(\eta)) \|}{d(g(\eta))} \overset{d}{=} \sup_{\eta \in (0,1)} \frac{\| B(\eta) \|}{(1-\eta) \, d\big( \frac{\eta}{1-\eta} \big)}. \]

Proof of Theorem 4

Theorem 1 and the continuous mapping theorem imply that
\[ M^{ret}_{BQ} = \sup_{r \in (0,1)} \frac{\| Q_T(1) - Q_T(r) \|}{d(1-r)} \overset{d}{\longrightarrow} \sup_{r \in (0,1)} \frac{\| W(1) - W(r) \|}{d(1-r)} \overset{d}{=} \sup_{r \in (0,1)} \frac{\| W(r) \|}{d(r)}. \]

Proof of Theorem 5

Analogously to the proof of Theorem 3,
\[ M^{ret}_{SBQ} \overset{d}{\longrightarrow} \sup_{r \in (0,1)} \sup_{s \in (0,r)} \frac{\| W(r) - W(s) \|}{d(r-s)}, \qquad M^{mon}_{SBQ,m} \overset{d}{\longrightarrow} \sup_{r \in (1,m)} \sup_{s \in (1,r)} \frac{\| W(r) - W(s) \|}{d(r-s)} \]
follow with Theorem 1 and the continuous mapping theorem. Furthermore, let the function $g: (0, (m-1)/m) \to (0, m-1)$ be given by $g(\eta) = \eta/(1-\eta)$. With Lemma 3(b), we have
\[ \begin{aligned} \sup_{r \in (1,m)} \sup_{s \in (1,r)} \frac{\| W(r) - W(s) \|}{d(r-s)} & \overset{d}{=} \sup_{r \in (0,m-1)} \sup_{s \in (0,r)} \frac{\| W(r) - W(s) \|}{d(r-s)} \\ & = \sup_{\eta \in (0, \frac{m-1}{m})} \sup_{s \in (0, g(\eta))} \frac{\| W(g(\eta)) - W(s) \|}{d(g(\eta) - s)} \\ & = \sup_{\eta \in (0, \frac{m-1}{m})} \sup_{\zeta \in (0,\eta)} \frac{\| W(g(\eta)) - W(g(\zeta)) \|}{d(g(\eta) - g(\zeta))} \\ & \overset{d}{=} \sup_{\eta \in (0, \frac{m-1}{m})} \sup_{\zeta \in (0,\eta)} \frac{\| B(\eta)/(1-\eta) - B(\zeta)/(1-\zeta) \|}{d\big( \frac{\eta}{1-\eta} - \frac{\zeta}{1-\zeta} \big)} \\ & = \sup_{\eta \in (0, \frac{m-1}{m})} \sup_{\zeta \in (0,\eta)} \frac{\| (1-\zeta) B(\eta) - (1-\eta) B(\zeta) \|}{(1-\eta)(1-\zeta) \, d\big( \frac{\eta - \zeta}{(1-\eta)(1-\zeta)} \big)}. \end{aligned} \]
Finally, for (c), Theorem 2 and Assumption 2 imply
\[ \begin{aligned} & \sup_{r \in (1,\infty)} \sup_{s \in (1,r)} \frac{\| Q_T(r) - Q_T(s) \|}{d(r-s)} - \sup_{r \in (1,\infty)} \sup_{s \in (1,r)} \frac{\| W(r) - W(s) \|}{d(r-s)} \\ & \quad \le \sup_{r \in (1,\infty)} \sup_{s \in (1,r)} \frac{\| Q_T(r) - Q_T(s) - (W(r) - W(s)) \|}{d(r-s)} \\ & \quad \le \sup_{r \in (1,\infty)} \sup_{s \in (1,r)} \frac{\| Q_T(r) - W(r) \|}{d(r-s)} + \sup_{r \in (1,\infty)} \sup_{s \in (1,r)} \frac{\| Q_T(s) - W(s) \|}{d(r-s)} \\ & \quad \le \sup_{r \in (1,\infty)} \frac{\| Q_T(r) - W(r) \|}{d(r-1)} + \sup_{r \in (1,\infty)} \sup_{s \in (1,r)} \frac{\| Q_T(s) - W(s) \|}{d(r-1)} \\ & \quad \le 2 \Big( \sup_{r \in (1,\infty)} \frac{\sqrt{r}}{d(r-1)} \Big) \cdot \Big( \sup_{r \in (1,\infty)} \frac{\| Q_T(r) - W(r) \|}{\sqrt{r}} \Big) = o_P(1) \end{aligned} \]
for some $k$-dimensional standard Brownian motion $W(r)$. Then,
\[ M^{mon}_{SBQ,\infty} = \sup_{r \in (1,\infty)} \sup_{s \in (1,r)} \frac{\| Q_T(r) - Q_T(s) \|}{d(r-s)} \overset{d}{\longrightarrow} \sup_{r \in (1,\infty)} \sup_{s \in (1,r)} \frac{\| W(r) - W(s) \|}{d(r-s)}. \]
Consider now the bijective function $g: (0, 1) \to (0, \infty)$ that is given by $g(\eta) = \eta/(1-\eta)$. Analogously to the derivations above, we obtain
\[ \begin{aligned} \sup_{r \in (1,\infty)} \sup_{s \in (1,r)} \frac{\| W(r) - W(s) \|}{d(r-s)} & \overset{d}{=} \sup_{r \in (0,\infty)} \sup_{s \in (0,r)} \frac{\| W(r) - W(s) \|}{d(r-s)} \\ & = \sup_{\eta \in (0,1)} \sup_{\zeta \in (0,\eta)} \frac{\| W(g(\eta)) - W(g(\zeta)) \|}{d(g(\eta) - g(\zeta))} \\ & \overset{d}{=} \sup_{\eta \in (0,1)} \sup_{\zeta \in (0,\eta)} \frac{\| (1-\zeta) B(\eta) - (1-\eta) B(\zeta) \|}{(1-\eta)(1-\zeta) \, d\big( \frac{\eta - \zeta}{(1-\eta)(1-\zeta)} \big)}. \end{aligned} \]

Proof of Theorem 6

Adopting the notation of the local break in Theorem 1, we have $\beta_t = \beta_0 + T^{-1/2} g(t/T)$ with $g(t/T) = T^{1/2} \delta 1_{\{t \ge T^*\}}$. Unlike in Theorem 1, the alternative does not converge to the null as the sample size grows. Following equations (19)–(23), we have
\[ \frac{1}{T} \sum_{t=1}^{\lfloor rT \rfloor} x_t w_t = \frac{1}{T^{1/2}} \big( S_{1,T}(r) + S_{2,T}(r) + S_{3,T}(r) \big), \]
where $\sup_{r \in [0,1]} \| T^{-1/2} S_{1,T}(r) \| = o_P(1)$, and
\[ \sup_{r \in [0,1]} \Big\| T^{-1/2} \big( S_{2,T}(r) + S_{3,T}(r) \big) - C \Big( \int_0^r g^*(z) \, dz - \int_0^r \int_0^z \frac{1}{z} g^*(v) \, dv \, dz \Big) \Big\| = o_P(1), \]
where $g^*(r) = \delta 1_{\{r \ge \tau^*\}}$.
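As a numerical sanity check of the bracketed drift term, the sketch below evaluates $\int_0^r g^*(z)\,dz - \int_0^r \int_0^z z^{-1} g^*(v)\,dv\,dz$ on a midpoint grid and compares it with $\tau^* \delta (\ln r - \ln \tau^*) 1_{\{r \ge \tau^*\}}$, the closed form obtained in the next step. It assumes a scalar $\delta$ (that is, $k = 1$); the function names and the grid size are illustrative and not part of the paper.

```python
import numpy as np

def drift_numeric(r, tau, delta, n=200_000):
    """Midpoint-rule value of int_0^r g*(z) dz - int_0^r (1/z) int_0^z g*(v) dv dz,
    with g*(z) = delta * 1{z >= tau} and scalar delta (an assumption)."""
    dz = r / n
    z = (np.arange(n) + 0.5) * dz               # cell midpoints on (0, r)
    g = delta * (z >= tau)                      # step function g*
    # Inner integral up to each midpoint: full cells before it plus half a cell
    inner = (np.cumsum(g) - 0.5 * g) * dz
    return float(np.sum(g - inner / z) * dz)

def drift_closed_form(r, tau, delta):
    """tau * delta * (ln r - ln tau) for r >= tau, and 0 otherwise."""
    if r < tau:
        return 0.0
    return float(tau * delta * (np.log(r) - np.log(tau)))

# Example with r = 1, tau* = 0.5, delta = 2
print(drift_numeric(1.0, 0.5, 2.0), drift_closed_form(1.0, 0.5, 2.0))
```

For these example values the closed form equals $\ln 2$, and the grid approximation agrees with it up to discretization error.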
Note that
\[ \int_0^r g^*(z) \, dz - \int_0^r \int_0^z \frac{1}{z} g^*(v) \, dv \, dz = \delta \int_0^r \Big( 1_{\{s \ge \tau^*\}} - \frac{1}{s} \int_0^s 1_{\{v \ge \tau^*\}} \, dv \Big) ds = \delta \int_{\tau^*}^r \Big( 1 - \frac{s - \tau^*}{s} \Big) ds = \delta \int_{\tau^*}^r \frac{\tau^*}{s} \, ds = \tau^* \delta \big( \ln(r) - \ln(\tau^*) \big) 1_{\{r \ge \tau^*\}}, \]
which implies that $\sigma T^{-1/2} Q_T(r) \Rightarrow \tau^* C^{1/2} \delta \big( \ln(r) - \ln(\tau^*) \big) 1_{\{r \ge \tau^*\}}$. Then,
\[ \widehat{\tau}_{ret} = \frac{1}{T} \cdot \operatorname*{argmax}_{1 \le t \le T} \Big\| \frac{\widehat{\sigma} \sqrt{T}}{\sqrt{T - t + 1}} \Big( Q_T(1) - Q_T\big( \tfrac{t+1}{T} \big) \Big) \Big\|, \]
$\widehat{\tau}_{mon} = \frac{1}{T} \cdot \operatorname{argmax}_{T \ldots}$

References

Journal of Time Series Econometrics, 10:1941–1928.
Andrews, D. W. (1993). Tests for parameter instability and structural change with unknown change point. Econometrica, 61:821–856.
Aue, A., Horváth, L., Hušková, M., and Kokoszka, P. (2006). Change-point monitoring in linear models. Econometrics Journal, 9:373–403.
Bai, J. (1997). Estimation of a change point in multiple regression models. Review of Economics and Statistics, 79:551–563.
Bauer, P. and Hackl, P. (1978). The use of MOSUMS for quality control. Technometrics, 20:431–436.
Berkes, I., Liu, W., and Wu, W. B. (2014). Komlós–Major–Tusnády approximation under dependence. The Annals of Probability, 42:794–817.
Billingsley, P. (1999). Convergence of probability measures, second edition. New York: Wiley.
Brown, R. L., Durbin, J., and Evans, J. M. (1975). Techniques for testing the constancy of regression relationships over time. Journal of the Royal Statistical Society. Series B, 37:149–192.
Chu, C.-S. J., Hornik, K., and Kuan, C.-M. (1995). Mosum tests for parameter constancy. Biometrika, 82:603–617.
Chu, C.-S. J., Stinchcombe, M., and White, H. (1996). Monitoring structural change. Econometrica, 64:1045–65.
Dette, H. and Gösmann, J. (2019). A likelihood ratio approach to sequential change point detection for a general class of parameters. Journal of the American Statistical Association, 0:1–17.
Fremdt, S. (2015). Page's sequential procedure for change-point detection in time series regression. Statistics, 49:128–155.
Gösmann, J., Kley, T., and Dette, H. (2019). A new approach for open-end sequential change point monitoring. https://arxiv.org/abs/1906.03225.
Hansen, B. E. (1992). Testing for parameter instability in linear models. Journal of Policy Modeling, 14:517–533.
Horváth, L. (1995). Detecting changes in linear regressions. Statistics: A Journal of Theoretical and Applied Statistics, 26:189–208.
Horváth, L., Hušková, M., Kokoszka, P., and Steinebach, J. (2004). Monitoring changes in linear models. Journal of Statistical Planning and Inference, 126:225–251.
Kirch, C. and Kamgaing, J. T. (2015). On the use of estimating functions in monitoring time series for change points. Journal of Statistical Planning and Inference, 161:25–49.
Komlós, J., Major, P., and Tusnády, G. (1975). An approximation of partial sums of independent rv'-s, and the sample df. i. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 32:111–131.
Krämer, W., Ploberger, W., and Alt, R. (1988). Testing for structural change in dynamic models. Econometrica, 56:1355–1369.
Kuan, C.-M. and Hornik, K. (1995). The generalized fluctuation test: A unifying view. Econometric Reviews, 14:135–161.
Leisch, F., Hornik, K., and Kuan, C.-M. (2000). Monitoring structural changes with the generalized fluctuation test. Econometric Theory, 16:835–854.
Nyblom, J. (1989). Testing for the constancy of parameters over time. Journal of the American Statistical Association, 84:223–230.
Page, E. S. (1954). Continuous inspection schemes. Biometrika, 41:100–115.
Perron, P. (2006). Dealing with structural breaks. Palgrave handbook of econometrics, 1:278–352.
Phillips, P. C. and Durlauf, S. N. (1986). Multiple time series regression with integrated processes. The Review of Economic Studies, 53:473–495.
Ploberger, W. and Krämer, W. (1990). The local power of the cusum and cusum of squares tests. Econometric Theory, 6:335–347.
Ploberger, W. and Krämer, W. (1992). The cusum test with ols residuals. Econometrica, 60:271–285.
Ploberger, W., Krämer, W., and Kontrus, K. (1989). A new test for structural stability in the linear regression model. Journal of Econometrics, 40:307–318.
Robbins, H. and Siegmund, D. (1970). Boundary crossing probabilities for the wiener process and sample sums. The Annals of Mathematical Statistics, 41:1410–1429.
Sen, P. K. (1982). Invariance principles for recursive residuals. The Annals of Statistics, 10:307–312.
Strassen, V. (1967). Almost sure behavior of sums of independent random variables and martingales. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 2:315–343.
Wied, D. and Galeano, P. (2013). Monitoring correlation change in a sequence of random variables. Journal of Statistical Planning and Inference, 143:186–196.
Wu, W. B. et al. (2007). Strong invariance principles for dependent random variables. The Annals of Probability, 35:2294–2320.
Zeileis, A. (2004). Alternative boundaries for cusum tests. Statistical Papers, 45:123–131.
Zeileis, A., Leisch, F., Kleiber, C., and Hornik, K. (2005). Monitoring structural change in dynamic econometric models.


Submitted on 5 Mar 2020 Updated
