Moving sum data segmentation for stochastics processes based on invariance

Abstract

The segmentation of data into stationary stretches also known as multiple change point problem is important for many applications in time series analysis as well as signal processing. Based on strong invariance principles, we analyse data segmentation methodology using moving sum (MOSUM) statistics for a class of regime-switching multivariate processes where each switch results in a change in the drift. In particular, this framework includes the data segmentation of multivariate partial sum, integrated diffusion and renewal processes even if the distance between change points is sublinear. We study the asymptotic behaviour of the corresponding change point estimators, show consistency and derive the corresponding localisation rates which are minimax optimal in a variety of situations including an unbounded number of changes in Wiener processes with drift. Furthermore, we derive the limit distribution of the change point estimators for local changes - a result that can in principle be used to derive confidence intervals for the change points.


Claudia Kirch ∗, Philipp Klein †

January 13, 2021


Keywords:

Data segmentation, Change point analysis, moving sum statistics, multivariate processes, invariance principle, regime-switching processes

MSC2020 classification:

∗ Institute for Mathematical Statistics, Department of Mathematics, Otto-von-Guericke University Magdeburg, Center for Behavioral Brain Sciences (CBBS); [email protected]

† Institute for Mathematical Statistics, Department of Mathematics, Otto-von-Guericke University Magdeburg, [email protected]

Introduction

The detection and localisation of structural breaks has a long tradition in statistics, dating back to Page 1954. Nevertheless, there is still a large, maybe even increasing, interest in this topic, surely also because change point analysis is broadly applicable in a number of fields such as neurophysiology (see Messer et al. 2014), genomics (compare Olshen et al. 2004, Niu and Zhang 2012, Li, Munk, and Sieling 2016, Chan and Chen 2017), finance (Aggarwal, Inclan, and Leal 1999, Cho and Fryzlewicz 2012), astrophysics (see Fisch, Eckley, and Fearnhead 2018) or oceanography (Killick et al. 2010).

A large amount of research deals with the detection of changes in univariate time series, in particular changes in the mean (compare Csörgő and Horváth 1997 for an overview), where recently also applications to continuous-time stochastic processes and to functional or high-dimensional panel data have been considered (see e.g. Horváth and Rice 2014). However, extensions to the multiple change point problem, which aims at segmenting the data into stationary stretches, beyond changes in the mean in time series data are much scarcer.

Generally, data segmentation methods can roughly be split into two approaches: The first approach, first introduced by Yao 1988 in the context of i.i.d. normally distributed data using the Schwarz criterion, aims at optimizing suitable objective functions. Kühn 2001 extended this approach to processes in a setting closely related to the one in this paper, albeit only allowing for univariate processes and a finite number of change points. Further approaches include e.g. least-squares (Yao and Au 1989) or the quasi-likelihood function (Braun, Braun, and Müller 2000). Generally, such approaches are computationally expensive, such that there is another body of work proposing fast algorithms e.g. using dynamic programming (Killick, Fearnhead, and Eckley 2012, Maidstone et al. 2017).

A second approach is based on hypothesis testing, where e.g. binary segmentation, introduced by Vostrikova 1981, recursively uses tests constructed for the at-most-one-change situation. This raises several problems, including the observation that detection power can be poor if the set of change points is unfavourable, such that several extensions have been proposed in the literature such as circular binary segmentation (Olshen et al. 2004) or wild binary segmentation (Fryzlewicz 2014).

Connection to existing work

Another class of test-based methods uses moving sum (MOSUM) statistics, which were first introduced by Bauer and Hackl 1980. They are particularly useful in the context of localising multiple change points and recently have broadly been used for the detection and the estimation of change points, see e.g. Yau and Zhao 2016 for changes in autoregressive time series, Eichinger and Kirch 2018 in a hidden Markov framework and Cho, Kirch, and Meier 2019 as well as Cho and Kirch 2019, who proposed a two-stage data segmentation procedure based on multiscale MOSUM statistics. This work extends the results of Eichinger and Kirch 2018 to a more general setting including multivariate mean changes, changes in diffusion as well as in renewal processes. Our results also lay the foundations for the analysis of a two-step procedure as in Cho and Kirch 2019.

Messer et al. 2014 propose a bottom-up approach combining several moving sum statistics to obtain change point estimators in univariate renewal processes. Our work extends these results in several ways: First, Messer et al. 2014 do not show consistency of the change point estimators, nor do they derive localisation rates, which is one of the main results of this work. Furthermore, in addition to results for MOSUM procedures with linear bandwidth (in the sample size) as in Messer et al. 2014, we obtain results for sublinear bandwidths, allowing in particular to obtain consistent estimators in situations where the distance between change points is sublinear. Additionally, we go beyond the univariate case, including some multivariate point processes based on renewal processes in our analysis. Sequential change point methodology for renewal processes has been proposed by Gut and Steinebach 2002 and Gut and Steinebach 2009, for diffusion processes by Mihalache 2011.

We analyse a more general model of regime-switching multivariate processes including multivariate partial sum, renewal as well as diffusion processes. We require the processes to fulfill a multivariate invariance principle, where processes switch (possibly with a number increasing to infinity with increasing sample size) between finitely many regimes with each switch resulting in a change in the drift. A univariate version of that model with at-most-one change point has been considered by Horváth and Steinebach 2000 and Kühn and Steinebach 2002. A univariate version for finitely many change points has been considered by Kühn 2001, where consistency for the number of change points has been shown. Those results are now extended to include MOSUM methodology for the estimation of a multiple (possibly unbounded) number of change points in a multivariate setting, where we achieve a minimax optimal separation rate in addition to a minimax optimal localisation rate (for the change point estimators) in case of a bounded number of change points as well as for Wiener processes with drift (see Remark 4.2 below).

Organization of the material

In Subsection 2.1, we introduce the multiple change point model we consider, followed by some examples of processes fulfilling the model in Subsection 2.2. In Section 3, we describe how to estimate change points based on MOSUM statistics: First, we introduce the MOSUM statistics in 3.1, before presenting the estimators for the structural breaks in 3.2. In 3.3 we derive some asymptotic results for the MOSUM statistics that are required for threshold selection and can also be used in a testing context. In Section 4 we show that the corresponding data segmentation procedure is consistent. Finally, we derive the localisation rates in addition to the corresponding asymptotic distribution of the change point estimators for local changes. In Section 5, we present some results from a small simulation study. The proofs can be found in Appendix A.

In this section we introduce the general multiple change model for which we derive the theoretical results. In particular, this model includes changes in multivariate renewal processes as a special case, which was the original motivation for this work.

Consider $P < \infty$ time-continuous $p$-dimensional stochastic processes $\{R^{(j)}_{t,T} : 0 \le t \le T\}$ with (unknown) drift $(\mu^{(j)}_T \cdot t)$ and (unknown) covariance $(\Sigma_{j,T} \cdot t)$ fulfilling the following joint invariance principle. The observed process is then assumed to switch between these $P$ processes (states).

Assumption 2.1.

Denote the joint process by $R_{t,T} = \big(R^{(1)\top}_{t,T}, \ldots, R^{(P)\top}_{t,T}\big)^\top$ as well as the joint drift by $\mu_T = \big(\mu^{(1)\top}_T, \ldots, \mu^{(P)\top}_T\big)^\top$, where $^\top$ indicates the matrix transpose. For every $T > 0$ there exist $(p \cdot P)$-dimensional Wiener processes $W_{t,T}$ with covariance matrix $\Sigma_T$ and $\Sigma^{(i)}_T = (\Sigma_T(l,k))_{l,k = p(i-1)+1, \ldots, pi}$ with
\[
\big\|\Sigma^{(i)}_T\big\| = O(1), \qquad \big\|\big(\Sigma^{(i)}_T\big)^{-1}\big\| = O(1),
\]
such that, possibly after a change of probability space, it holds that for some sequence $\nu_T \to 0$
\[
\sup_{0 \le t \le T} \big\|\tilde R_{t,T} - W_{t,T}\big\| = \sup_{0 \le t \le T} \big\|(R_{t,T} - \mu_T t) - W_{t,T}\big\| = O_P\big(T^{1/2} \nu_T\big),
\]
where $\tilde R_{t,T} = R_{t,T} - \mu_T t$ denotes the centered process.

If these $P$ processes are independent, which is a reasonable assumption in a switching context, the joint invariance principle reduces to the validity of an invariance principle for each process.

The assumption on the norm of the covariance matrices is equivalent to the largest eigenvalue of $\Sigma^{(i)}_T$ being bounded and the smallest eigenvalue being bounded away from zero (both uniformly in $T$). In many situations, the covariance matrices will not depend on $T$, in which case this assumption is automatically fulfilled under positive definiteness. The convergence rate $\nu_T$ in the invariance principle typically depends on the number of moments that exist. Roughly speaking, the more moments the original process has, the faster $\nu_T$ converges to zero.

We now observe a process $Z_{t,T}$ with increments switching between the above processes at some unknown change points $0 = c_0 < c_1 < \ldots < c_{q_T} < c_{q_T+1} = T$, where $q_T$ can be bounded or unbounded. More precisely, we observe for $c_\ell < t \le c_{\ell+1}$
\[
Z_{t,T} = \Big(R^{(c_{\ell+1})}_{t,T} - R^{(c_{\ell+1})}_{c_\ell,T}\Big) + \sum_{j=1}^{\ell} \Big(R^{(c_j)}_{c_j,T} - R^{(c_j)}_{c_{j-1},T}\Big). \tag{2.1}
\]
The upper index $(c_j)$ at the process $R_{\cdot,T}$ indicates (with a slight abuse of notation) the active regime between the $(j-1)$-th and $j$-th change point, from which the increments come in that stretch. Because we concentrate on the detection of changes in the drift, we need to assume that the drift changes between two neighboring regimes, i.e.
\[
d_{i,T} := \mu^{(c_{i+1})}_T - \mu^{(c_i)}_T \neq 0 \quad \text{for all } i = 1, \ldots, q_T,
\]
where $d_{i,T}$ is bounded but we allow for $d_{i,T} \to 0$. In the following, we drop the dependency on $T$ for all above quantities except $q_T$, except in situations where it helps clarify the argument. The aim of this paper is to estimate the number and location of the change points and prove consistency of the estimator for the number of change points in addition to deriving localisation rates for the change point estimators.

The corresponding univariate model with at most one change was first considered by Horváth and Steinebach 2000 and extended to a gradual change setting by Steinebach 2000. Kirch and Steinebach 2006 prove validity for corresponding permutation tests. Furthermore, Gut and Steinebach 2002 develop sequential change point tests and analyse the corresponding stopping time (Gut and Steinebach 2009). A related univariate multiple change situation with a bounded number of change points has been considered by Kühn and Steinebach 2002, who propose to use a Schwarz information criterion for change point estimation. However, this methodology is computationally expensive with quadratic computational complexity, which is one of the reasons why we propose an alternative methodology based on a single-bandwidth moving sum (MOSUM) statistic in order to estimate the change points. We will show that the rescaled change point estimators are consistent and derive the corresponding localisation rates.

In this section, we give three important examples fulfilling the above model assumptions, namely partial sum processes, renewal processes as well as integrals of diffusion processes including Ornstein-Uhlenbeck and Wiener processes with drift. A detailed analysis of MOSUM procedures for detecting changes in (univariate) renewal processes extending the work by Messer et al. 2014 was the original motivation for this work and is covered by this much broader framework.
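The switching construction in (2.1) can be illustrated with a minimal sketch. It rests on assumptions that are not part of the paper: the processes are univariate, time is restricted to the integer grid $0, 1, \ldots, T$, each regime is a discretised Wiener process with drift, and the function name `simulate_switching_process` and its parameters are hypothetical.

```python
import random


def simulate_switching_process(T, change_points, regimes, drifts, sigmas, seed=0):
    """Return Z_0, ..., Z_T whose increments come from the active regime, as in (2.1).

    change_points: increasing list c_1 < ... < c_q of integers in (0, T)
    regimes:       index of the active regime in each of the q + 1 segments
    drifts/sigmas: drift and standard deviation of each regime process
                   (assumed here to be Wiener processes with drift)
    """
    rng = random.Random(seed)
    # Simulate every regime process R^(j) on the whole grid, starting at 0.
    R = []
    for mu, sigma in zip(drifts, sigmas):
        path = [0.0]
        for _ in range(T):
            path.append(path[-1] + mu + sigma * rng.gauss(0.0, 1.0))
        R.append(path)
    # Glue the increments of the active regime together segment by segment.
    bounds = [0] + list(change_points) + [T]
    Z = [0.0]
    for segment, regime in enumerate(regimes):
        for t in range(bounds[segment] + 1, bounds[segment + 1] + 1):
            Z.append(Z[-1] + R[regime][t] - R[regime][t - 1])
    return Z


# Two regimes with drifts 0 and 1; the observed process switches at 100 and 200.
Z = simulate_switching_process(
    T=300, change_points=[100, 200], regimes=[0, 1, 0],
    drifts=[0.0, 1.0], sigmas=[1.0, 1.0],
)
# Noise-free version of the same design, showing only the piecewise linear drift.
Z_drift = simulate_switching_process(
    T=300, change_points=[100, 200], regimes=[0, 1, 0],
    drifts=[0.0, 1.0], sigmas=[0.0, 0.0],
)
print(len(Z), Z_drift[100], Z_drift[200], Z_drift[300])
```

Simulating all regime processes on the full time axis and then copying increments mirrors the formulation in (2.1), where the increment in a stretch is taken from the active regime process at the same time stamps.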

This first example extends the classical multiple changes in the mean model: Let $X^{(i)}_1, X^{(i)}_2, \ldots$ be a time series with $E\big[X^{(i)}_1\big] = 0$ and $\operatorname{Cov}\big[X^{(i)}_1\big] = I_p$ for all $i = 1, \ldots, P$. Let
\[
R^{(i)}_t = \sum_{j=1}^{\lfloor t \rfloor} \Big(\mu^{(i)} + \big(\Sigma^{(i)}_T\big)^{1/2} X^{(i)}_j\Big).
\]
The corresponding process fulfills Assumption 2.1 in a wide range of situations. For example, Einmahl 1987 shows the validity in the case that $X_1, X_2, \ldots$ with $X_j = \big(X^{(1)\top}_j, \ldots, X^{(P)\top}_j\big)^\top$ are i.i.d. with $E\big[\|X_1\|^{2+\delta}\big] < \infty$ for some $\delta > 0$. Additionally, Kuelbs and Philipp 1980 state an invariance principle for mixing random vectors in Theorem 4. Furthermore, there are many corresponding univariate results under many different weak-dependency formulations.

If $X^{(i)} = X^{(1)}$ (and $\Sigma^{(i)} = \Sigma^{(1)}$) for all $i$, then we are back to the classical multiple mean change problem that has been considered in many papers, in particular for the univariate situation, see e.g. the recent survey papers by Fearnhead and Rigaill 2020 or Cho and Kirch 2020.

The second example aims at finding structural breaks in the rates of renewal and some related point processes: We consider $P$ independent sequences of $p$-dimensional point processes that are related to renewal processes in the following way: For each $i = 1, \ldots, P$ we start with $\tilde p \ge p$ independent renewal processes $\tilde R^{(i)}_{t,j}$, $j = 1, \ldots, \tilde p$, from which we derive a $p$-dimensional point process $R^{(i)}_t = B^{(i)} \big(\tilde R^{(i)}_{t,1}, \ldots, \tilde R^{(i)}_{t,\tilde p}\big)^\top$, where $B^{(i)}$ is a $(p \times \tilde p)$-matrix with non-negative integer-valued entries. By Lemma 4.2 in Steinebach and Eastwood 1996, Assumption 2.1 is fulfilled for a block-diagonal $\Sigma_T$ with
\[
\Sigma^{(i)}_T = B^{(i)} D\!\left(\frac{\sigma^2(i)}{\mu^3(i)}\right) B^{(i)\top}, \qquad
D\!\left(\frac{\sigma^2(i)}{\mu^3(i)}\right) = \operatorname{diag}\!\left(\frac{\sigma_1^2(i)}{\mu_1^3(i)}, \ldots, \frac{\sigma_{\tilde p}^2(i)}{\mu_{\tilde p}^3(i)}\right),
\]
where $\mu_j(i)$ and $\sigma_j^2(i)$ are the mean and variance of the corresponding inter-event times. Steinebach and Eastwood 1996 and Csenki 1979 consider $\tilde p = p$ but use inter-event times that are dependent for $j = 1, \ldots, p$. In such a situation, the invariance principle in Assumption 2.1 still holds if the intensities are the same across components with $\Sigma^{(i)}_T = \Sigma^{(i)}_{\mathrm{IET}} / \mu^3(i)$, where $\Sigma^{(i)}_{\mathrm{IET}}$ is the covariance of the vector of inter-event times, a setting that we adopt in the simulation study. If the intensities differ, then by Steinebach and Eastwood 1996 an invariance principle towards a Gaussian process can still be obtained, but this is no longer a multivariate Wiener process. While each component is a Wiener process, the increments from one component may depend on the past of another. Many of the results below can still be derived in such a situation. However, such a model does not seem to be very realistic for most applications, as the stochastic behavior of the increments of one component depends on the lagged behavior of the other components, where the lag increases with time. While a lagged dependence is realistic in many situations, in most situations one would expect this lagged dependence to be constant across time.

Messer et al. 2014 consider this model for univariate renewal processes with varying variance. They propose a multiscale procedure based on MOSUM statistics related to those we will discuss in the next section using linear bandwidths. In Messer et al. 2017, they extend the procedure to processes with weak dependencies. They show convergence in distribution of the MOSUM statistics to functionals of Wiener processes similar to the results that we obtain and analyze the behaviour of the signal term in Messer and Schneider 2017. However, they have not derived any consistency results for the change point estimators. In this paper, we extend their results to sublinear bandwidths and prove the consistency of the corresponding estimators as well as their localisation rates.

Clearly, switching between independent Brownian motions with drift (or between components of a multivariate Brownian motion with drift) is included in this framework. Additionally, Heunis 2003 and Mihalache 2011 derive invariance principles in the context of diffusion processes including Ornstein-Uhlenbeck processes among others. Let $(X_t)_{t \ge 0}$ be a stochastic process in $\mathbb{R}^N$ satisfying a stochastic differential equation
\[
dX_t = \mu(X_t)\, dt + \Sigma(X_t)\, dB_t
\]
with respect to an $n$-dimensional standard Wiener process $(B_t)_{t \ge 0}$ and let $\mu$, $\Sigma$ be globally Lipschitz-continuous. Under some conditions on $f : \mathbb{R}^N \to \mathbb{R}^p$, as given by Heunis 2003, relating to $\mu$, $\Sigma$, which in particular guarantee that the function $f$ applied to the (invariant) diffusion results in a centered process, there exists a $p$-dimensional Wiener process $(W_t)_{t \ge 0}$ and some $\eta > 0$ with
\[
\left\| \int_0^T f(X_s)\, ds - W_T \right\| = O\big(T^{1/2 - \eta}\big),
\]
where $(X_t)_{t \ge 0}$ either is a solution to the SDE with fixed starting value $X_0 = y$ or a strictly stationary solution with respect to an invariant distribution. Furthermore, in the case of a one-dimensional stochastic diffusion process, Mihalache 2011 showed for some L-functions fulfilling constraints depending on $\mu$, $\Sigma$ that there exists a strong invariance principle for the integrals of diffusion processes with a rate of $O\big((T \log T)^{1/4} \sqrt{\log T}\big)$.

Now, we are ready to introduce a MOSUM-based data segmentation procedure for stochastic processes following the above model:

3.1 Moving sum statistics

At every change point, the drift of the process that is active to the left of that change point is different from the drift of the process that is active to the right of the change point. Consequently, the increment of the process $R$ to the left will systematically differ from the one to the right. On the other hand, in a stationary stretch away from any change point both increments will be approximately the same as they are estimating the same drift. It is this observation that gives rise to the following moving sum (MOSUM) statistic that is based on the moving difference of increments with bandwidth $h = h_T$
\[
M_t = M_{t,T,h_T}(Z) = \frac{1}{\sqrt{2h}} \big[(Z_{t+h} - Z_t) - (Z_t - Z_{t-h})\big] = \frac{1}{\sqrt{2h}} \big(Z_{t+h} - 2 Z_t + Z_{t-h}\big). \tag{3.1}
\]
If there is no change, then this difference will fluctuate around 0. On the other hand, close to a change point, this difference will be different from 0. Ideally, the bandwidth should be chosen to be as large as possible (to get a better estimate obtained from a larger 'effective sample size' of the order $h$). On the other hand, the increments should not be contaminated by a second change, as this can lead to situations where the change point can no longer be reliably localised by the signal. Indeed, we need the following assumptions on the bandwidth for a change to be detectable:

Assumption 3.1.

For $\nu_T$ as in Assumption 2.1 the bandwidth $h < T/2$ fulfills
\[
\nu_T^2\, \frac{T \log T}{h} \to 0.
\]
Furthermore, it isolates the $i$-th change point in the sense of
\[
h \le \tfrac{1}{2} \Delta_i, \quad \text{where } \Delta_i = \min(c_{i+1} - c_i,\; c_i - c_{i-1}). \tag{3.2}
\]
Additionally, the signal needs to be large enough to be detectable by this bandwidth, i.e.
\[
\frac{\|d_i\|^2\, h}{\log(T/h)} \to \infty. \tag{3.3}
\]

Combining (3.2) and (3.3) shows that, with an appropriate bandwidth $h$, changes are detectable as soon as
\[
\frac{\|d_i\|^2\, \Delta_i}{\log(T/\Delta_i)} \to \infty. \tag{3.4}
\]
In case of the classical mean change model as in Subsection 2.2.1 this is known to be the minimax-optimal separation rate that cannot be improved (see Proposition 1 of Arias-Castro, Candes, and Durand 2011).

The assumption on the distance of the first and last change point to the boundary of the process in (3.2) can be relaxed as no boundary effects can occur there.

The MOSUM statistic $M_t = m_t + \Lambda_t$ as in (3.1) decomposes into a piecewise linear signal term $m_t = m_{t,h,T}$ and a centered noise term $\Lambda_t = \Lambda_{t,h,T}$ with
\[
\sqrt{2h}\, m_t =
\begin{cases}
(h - t + c_i)\, d_i, & \text{for } c_i < t \le c_i + h, \\
0, & \text{for } c_i + h < t \le c_{i+1} - h, \\
(h + t - c_{i+1})\, d_{i+1}, & \text{for } c_{i+1} - h < t \le c_{i+1},
\end{cases} \tag{3.5}
\]
\[
\sqrt{2h}\, \Lambda_t = \sqrt{2h}\, \Lambda_t(\tilde R) =
\begin{cases}
\tilde R^{(c_{i+1})}_{t+h} - 2 \tilde R^{(c_{i+1})}_t + \tilde R^{(c_{i+1})}_{c_i} - \tilde R^{(c_i)}_{c_i} + \tilde R^{(c_i)}_{t-h}, & \text{for } c_i < t \le c_i + h, \\
\tilde R^{(c_{i+1})}_{t+h} - 2 \tilde R^{(c_{i+1})}_t + \tilde R^{(c_{i+1})}_{t-h}, & \text{for } c_i + h < t \le c_{i+1} - h, \\
\tilde R^{(c_{i+2})}_{t+h} - \tilde R^{(c_{i+2})}_{c_{i+1}} + \tilde R^{(c_{i+1})}_{c_{i+1}} - 2 \tilde R^{(c_{i+1})}_t + \tilde R^{(c_{i+1})}_{t-h}, & \text{for } c_{i+1} - h < t \le c_{i+1},
\end{cases} \tag{3.6}
\]
for $i = 0, \ldots, q_T$, where $\tilde R_t := R_t - t \mu$ and the upper index $c_j$ denotes the active regime between the $(j-1)$-th and $j$-th change point (with a slight abuse of notation). The signal term is a piecewise linear function that takes its extrema at the change points and is 0 outside $h$-intervals around the change points.
Additionally, the noise term is asymptotically negligible compared to the signal term (see Theorem 3.1 for the corresponding theoretical statement and Figure 3.1 for an illustrative example).

These observations motivate the following data segmentation procedure, which considers local extrema that are big enough (in absolute value) as change point estimators: To this end, a suitable threshold $\beta = \beta_{h,T}$ is needed to define significant time points, where a point $t^*$ is significant if
\[
M_{t^*}^\top \hat A_{t^*}^{-1} M_{t^*} \ge \beta, \tag{3.7}
\]
where $\hat A_{t^*}$ is some symmetric positive definite matrix that may depend on the data, fulfilling

Figure 3.1: Univariate MOSUM statistic with $T = 100$, $1000$, $10\,000$ (from left to right), where the noise term (fluctuating around the signal) becomes smaller and smaller relative to the signal term.
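The behaviour of the MOSUM statistic (3.1) and of its signal term (3.5) can be reproduced numerically. The following is a minimal sketch under assumptions that are not taken from the paper: a univariate path observed on the integer grid $0, \ldots, T$, a single change at $c = 100$ with jump $d = 1$ in the drift, and standard Gaussian increments for the noisy path. The helper name `mosum` is hypothetical.

```python
import math
import random


def mosum(Z, h):
    """Return M_t = (Z[t+h] - 2*Z[t] + Z[t-h]) / sqrt(2h) for h <= t <= T - h.

    Z holds the observed path on the integer grid 0..T; positions where the
    two windows of length h do not fit into [0, T] are left as None.
    """
    T = len(Z) - 1
    M = [None] * (T + 1)
    for t in range(h, T - h + 1):
        M[t] = (Z[t + h] - 2.0 * Z[t] + Z[t - h]) / math.sqrt(2.0 * h)
    return M


# Noise-free path: drift 0 up to c = 100 and drift d = 1 afterwards.
c, d, h, T = 100, 1.0, 20, 300
signal_path = [d * max(0, t - c) for t in range(T + 1)]
m = mosum(signal_path, h)

# Same drift plus Gaussian increments (a discretised Wiener process with drift).
rng = random.Random(1)
noisy_path = [0.0]
for t in range(1, T + 1):
    noisy_path.append(noisy_path[-1] + (d if t > c else 0.0) + rng.gauss(0.0, 1.0))
M = mosum(noisy_path, h)

# Peak of the signal term at the change point next to its closed form sqrt(h/2) * d.
print(m[c], math.sqrt(h / 2.0) * d)
# Location of the largest absolute value of the noisy statistic.
print(max(range(h, T - h + 1), key=lambda t: abs(M[t])))
```

In the noise-free case the statistic vanishes outside the $h$-interval around the change point and is piecewise linear inside it, in line with (3.5).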

Assumption 3.2.
\[
\sup_{h \le t \le T-h} \big\|\hat A^{-1}_{t,T}\big\| = O_P(1), \qquad
\sup_{i=1,\ldots,q_T}\; \sup_{|t - c_i| \le h} \big\|\hat A_{t,T}\big\| = O_P(1).
\]

A good (non data-driven) choice fulfilling this assumption is given by
\[
\Sigma_t = \Sigma_{t,T} = \Sigma^{(c_i)}_T \quad \text{for } c_{i-1} < t \le c_i, \tag{3.8}
\]
which guarantees scale-invariance of the procedure and allows for nicely interpretable thresholds (see Section 3.3). The latter remains true for estimators as long as they fulfill
\[
\sup_{i=1,\ldots,q_T}\; \sup_{|t - c_i| > h} \big\|\hat\Sigma^{-1/2}_{t,T} - \Sigma^{-1/2}_t\big\| = o_P\!\left(\Big(\log \frac{T}{h}\Big)^{-1}\right) \tag{3.9}
\]
in addition to the above boundedness assumptions. In particular, this permits local estimators that are consistent only away from change points but contaminated by the change in a local environment thereof. The latter is typically the case for covariance estimators, think e.g. of the sample variance contaminated by a change point. In order to not reduce detection power in small samples, it is beneficial if the estimator is additionally consistent directly at the change point, which is also achievable (see e.g. Eichinger and Kirch 2018).

Figure 3.2: In the upper panel, the observed event times of a univariate renewal process with 3 change points (i.e. 4 stationary segments) are displayed (where the plot needs to be read like a text: It starts in the upper row on the left, then continues in the first row and jumps to the second row and so on). The grey and white regions mark the estimated segmentation of the data while the red intervals mark the true segmentation. In the lower panel, the corresponding MOSUM statistic with (relative) bandwidth $h/T = 0.07$ is displayed. The grey areas are the regions where the statistic exceeds the threshold at level $\alpha$ in absolute value (significant). The true change points are indicated by the red dashed lines.

The choice of the threshold $\beta$ will be discussed in the next section, where we can make use of asymptotic considerations in the no-change situation.

Typically, there are intervals of significant points (due to the continuity of the signal) such that only local extrema of such intervals actually indicate a change point. To define what a local extremum is, we require a tuning parameter $0 < \eta < 1$. A point $t^*$ is a local extremum if it maximizes the absolute MOSUM statistic within its $\eta h$-environment, i.e. if
\[
t^* = \min\Big\{ \operatorname*{argmax}_{t^* - \eta h \le t \le t^* + \eta h} \|M_t\| \Big\}. \tag{3.10}
\]
The threshold $\beta$ distinguishes between significant local extrema and spurious local extrema that are purely associated with the noise term. The set of all significant local extrema is the set of change point estimators, with its cardinality an estimator for the number of the change points.

Figure 3.2 shows an example illustrating these ideas: Away from the change points the MOSUM statistic fluctuates around 0 (within the white area that is beneath the threshold in absolute value) while it falls within the grey area close to the change points, making corresponding local extrema significant. Furthermore, the statistic does not need to return to the white area in order to have all changes estimated, as can be seen between the first and second change point. This is one of the major advantages of the $\eta$-criterion based on significant local maxima as described here (in comparison to the $\epsilon$-criterion originally investigated by Eichinger and Kirch 2018 in the context of mean changes, see also the discussion in Cho, Kirch, and Meier 2019). Nevertheless, results for the $\epsilon$-criterion can be obtained along the lines of our proofs below.

As pointed out above we need to choose a threshold $\beta = \beta_{h,T}$ that can distinguish between significant and spurious local extrema.
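The selection of significant local extrema according to (3.7) and (3.10) can be sketched as follows. This is an illustration under simplifying assumptions that are not part of the paper: the statistic is evaluated on an integer grid, one and the same non-negative sequence (for example $M_t^2$ divided by a constant variance in the univariate case) is used both for the significance check and for the local maximisation, and the function name `significant_local_extrema` as well as the values of $\eta$ and $\beta$ are hypothetical.

```python
def significant_local_extrema(stat, h, eta, beta):
    """Return all grid points that are significant local maxima of `stat`.

    stat[t] is the (squared, standardised) MOSUM statistic at time t, or None
    where it is not defined. A point t is returned if stat[t] >= beta and t is
    the smallest maximiser of stat within [t - eta*h, t + eta*h].
    """
    w = int(eta * h)
    valid = [t for t, v in enumerate(stat) if v is not None]
    lo, hi = valid[0], valid[-1]
    estimates = []
    for t in valid:
        if stat[t] < beta:
            continue  # not significant
        window = range(max(lo, t - w), min(hi, t + w) + 1)
        best = max(stat[s] for s in window)
        first_argmax = next(s for s in window if stat[s] == best)
        if first_argmax == t:
            estimates.append(t)
    return estimates


# Noise-free squared signal for two changes of size d = 1 at 100 and 200,
# following the shape (h - |t - c|)^2 * d^2 / (2h) inside the h-intervals.
T, h, change_points = 300, 20, [100, 200]
stat = [None] * (T + 1)
for t in range(h, T - h + 1):
    dist = min(abs(t - c) for c in change_points)
    stat[t] = max(0, h - dist) ** 2 / (2.0 * h)

estimates = significant_local_extrema(stat, h=h, eta=0.4, beta=1.0)
print(estimates, len(estimates))
```

Because only local maximality within the $\eta h$-environment is required, the statistic does not have to drop below the threshold between two estimators.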
The following theorem gives the magnitudes of signal as well as noise terms:

Theorem 3.1.

Let the Assumptions 2.1, 3.1 and 3.2 hold.

(a) For the signal $m_t$ with $c_i - h < t < c_i + h$, it holds
\[
m_t^\top \hat A_t^{-1} m_t \ge \frac{1}{\|\hat A_t\|}\, \frac{(h - |t - c_i|)^2}{2h}\, \|d_i\|^2.
\]
At other time points the signal term is equal to zero.

(b) For the noise term it holds for $q_T = 0$, i.e. in the no-change situation

(i) for a linear bandwidth $h = \gamma T$ with $0 < \gamma < 1/2$
\[
\sup_{\gamma T \le t \le T - \gamma T} \Lambda_t^\top \Sigma_T^{-1} \Lambda_t \xrightarrow{D} \sup_{\gamma \le s \le 1 - \gamma} \frac{1}{2\gamma}\, (B_{s+\gamma} - 2 B_s + B_{s-\gamma})^\top (B_{s+\gamma} - 2 B_s + B_{s-\gamma}),
\]
where $B$ denotes a multivariate standard Wiener process. In particular, the squared noise term is of order $O_P(1)$ in this case.

(ii) for a sublinear bandwidth $h/T \to 0$ with Assumption 3.1 fulfilled, it holds that
\[
a\Big(\frac{T}{h}\Big) \sup_{h \le t \le T-h} \sqrt{\Lambda_t^\top \Sigma_T^{-1} \Lambda_t} - b\Big(\frac{T}{h}\Big) \xrightarrow{D} E,
\]
where $E$ follows a Gumbel distribution with $P(E \le x) = e^{-2 e^{-x}}$ and
\[
a(x) = \sqrt{2 \log x}, \qquad b(x) = 2 \log x + \frac{p}{2} \log\log x + \log\frac{3}{2} - \log \Gamma\Big(\frac{p}{2}\Big).
\]
In particular, the above squared noise term is of order $O_P(\log(T/h))$ in this case.

The assertions remain true if an estimator for the covariance is used fulfilling (3.9) uniformly over all $h \le t \le T - h$.

(c) In the situation of multiple change points, it holds that
\[
\sup_{h \le t \le T-h} \|\Lambda_t\| = O_P\big(\sqrt{\log(T/h)}\big).
\]

In order to obtain consistent estimators, on the one hand, the threshold needs to be asymptotically negligible compared to the squared signal term as in Theorem 3.1 (a). This guarantees that every change is detected with asymptotic probability 1. On the other hand, the threshold needs to grow faster than the squared noise term in Theorem 3.1 (c) so that false positives occur with asymptotic probability 0. Hence, both conditions are fulfilled under the following assumption:

Assumption 3.3.

The threshold fulfills:
\[
\frac{\beta_{h,T}}{h_T \min_{i=1,\ldots,q_T} \|d_i\|^2} \to 0, \qquad \frac{\log(T/h_T)}{\beta_{h,T}} \to 0 \qquad (T \to \infty).
\]

The following remark introduces a threshold that has a nice interpretation in connection with change point testing:

Remark 3.1.

A common choice for the threshold is obtained as the asymptotic $\alpha_T$-quantile obtained from Theorem 3.1 (b) for some sequence $\alpha_T \to 0$. In the simulation study in Section 5 we use this threshold with $\alpha_T = 0.05$. Using the asymptotic $\alpha$-quantile (from the no-change situation) guarantees that all change point estimators are significant at a global level $\alpha$, where significant is meant in the usual testing sense. This makes the choice easy to interpret. In fact, Theorem 3.1 shows that such a threshold with a constant sequence $\alpha$ yields an asymptotic test at level $\alpha$ which has asymptotic power one by Theorem 4.1. Nevertheless, tests designed for the at-most-one-change situation as in Hušková and Steinebach 2000, Hušková and Steinebach 2002 often have a better power, but are not as good at localising change points (see Figure 1 in Cho and Kirch 2020 for an illustration).

In this section, we will show consistency of the above segmentation procedure for both the estimators of the number and locations of the change points. Furthermore, we derive localisation rates for the estimators of the locations of the change points, showing for some special cases that they cannot be improved in general. This is complemented by the observation that these localisation rates are indeed minimax-optimal if the number of change points is bounded as well as when observing Wiener processes with drift. Otherwise the generic rates that are obtained based solely on the invariance principle will not be tight in the sense that the proposed procedure can provide better rates than suggested by the invariance principle.

The following theorem shows that the change point estimators defined in (3.10) are consistent for the number and locations of the change points.

Theorem 4.1.

Let Assumptions 2.1, 3.1 – 3.3 hold. Let < ˆ c < . . . < ˆ c ˆ q T be thechange point estimators of type (3.10) . Then for any τ > it holds lim T →∞ P (cid:18) max i =1 ,..., min(ˆ q T ,q T ) | ˆ c i − c i | ≤ τ h, ˆ q T = q T (cid:19) = 1 . The theorem shows in particular that the number of change points is estimatedconsistently. For the linear bandwidth we additionally get consistency of the changepoint locations in rescaled time, while for the sublinear bandwidths we already geta convergence rate of h/T towards the rescaled change points.Under the following stronger assumptions, the localisation rates can be furtherimproved: 15 ssumption 4.1. (a) It holds for any of the centered processes e R ( j ) as in (3.6) andany value θ i = θ i,T (which will be c i or c i ± h when the assumption is applied)for any sequence D T ≥ (bounded or unbounded) sup DT k d i k ≤ s ≤ h √ D T (cid:13)(cid:13)(cid:13) e R ( j ) θ i − e R ( j ) θ i ± s (cid:13)(cid:13)(cid:13) s k d i k = O P ( ω T ) . (b) Let now the upper index θ i denote the active stretch in the stationary segment ( θ i , θ i + s ) respectively ( θ i − s, θ i ) . Then, it holds for any sequence D T > i =1 ,...,q T sup DT k d i k ≤ s ≤ h √ D T (cid:13)(cid:13)(cid:13) e R ( θ i ) θ i − e R ( θ i ) θ i ± s (cid:13)(cid:13)(cid:13) s k d i k = O P (˜ ω T ) . The localisation rates of the MOSUM procedure are determined by the rates ω n , ˜ ω n which need to be derived for each example separately (at least for the tightones). In the context of partial sum processes these results are well known. Forexample, the suprema in (a) are stochastically bounded by the H´aj´ek-R´enyi inequalitywhich has been shown for partial sum processes even with weakly dependent errors.In that context, the assertion in (b) is fulﬁlled with a polynomial rate in q T (see Choand Kirch 2019, Proposition 2.1 (c)(ii)). Remark 4.1. 
(a) For Wiener processes with drift $\omega_T = 1$ and $\tilde\omega_T = \sqrt{\log(q_T)}$ (see Proposition A.1 below).

(b) By the invariance principle in Assumption 2.1, all rates are clearly dominated by $T^{1/2}\nu_T$. However, this is often far too liberal a bound (see Proposition 2.1 in Cho and Kirch 2019 for some tight bounds in case of partial sum processes).

(c) Often, for each regime there exist forward and backward invariance principles from some arbitrary starting value $\theta_i$. This is the case for partial sum processes and for (backward and forward) Markov processes due to the Markov property. For renewal processes, this can be shown along the lines of the original proof for the invariance principle (Csörgő, Horváth, and Steinebach 1987) because the time to the next (previous) event is asymptotically negligible (see also Example 1.2 in Kühn and Steinebach 2002). In this case, the Hájek-Rényi results for Wiener processes carry over (see Proposition A.1) to the different processes underlying each regime, resulting in $\omega_T = 1$. For the situation with a bounded number of change points this carries over to $\tilde\omega_T$.

Theorem 4.2.

Let Assumptions 2.1, 3.1 – 3.3 in addition to 4.1 hold. For $\hat q_T < q_T$ define $\hat c_i = T$ for $i = \hat q_T + 1, \ldots, q_T$.

(a) For a single change point estimator the following localisation rate holds

$$\|d_i\|^2\, |\hat c_i - c_i| = O_P\left(\omega_T^2\right).$$

(b) The following uniform rate holds true:

$$\max_{i=1,\ldots,q_T} \|d_i\|^2\, |\hat c_i - c_i| = O_P\left(\tilde\omega_T^2\right).$$

Remark 4.2 (Minimax optimality).
We have already mentioned beneath (3.4) that the separation rate given there is minimax optimal (see Proposition 1 of Arias-Castro, Candes, and Durand 2011).

Minimax optimal localisation rates (derived in the context of changes in the mean of univariate time series, which is covered by the partial sum processes in our framework) are known for a few special cases: First, the minimax optimal localisation rate for a single change point and in extension also for a bounded number of change points is given by $\omega_T = 1$ in the above notation (see e.g. Lemma 2 in Wang, Yu, Rinaldo, et al. 2020). In particular this shows that our procedure achieves the minimax optimality in case of a bounded number of change points under weak assumptions (as pointed out in Remark 4.1 (c)). Secondly, the optimal localisation rate for unbounded change points under sub-Gaussianity (attained for partial sum processes of i.i.d. errors) is given by $\tilde\omega_T = \sqrt{\log T}$ (see Proposition 6 in Verzelen et al. 2020 and Proposition 2.3 in Cho and Kirch 2019). Indeed, we match this rate for Wiener processes with drift.

The following theorem derives the limit distribution of the change point estimators for local changes, which shows in particular that the rates are tight. In principle, this result can be used to obtain asymptotically valid confidence intervals for the change point locations. In case of fixed changes, the limit distribution depends on the underlying distribution of the original process (see Antoch and Hušková 1999 for the case of partial sum processes), where the proof can be done along the same lines. We need the following assumption:

Assumption 4.2.

Let $d_i = d_{i,T} = \|d_i\|\, u_i + o(\|d_i\|)$ with $\|u_i\| = 1$ and $\|d_{i,T}\| \to 0$. Assume that $Y^{(j)}_s = Y^{(j)}_s(c_i, D)$ with

$$Y^{(1)}_s = \tilde R^{(c_i)}_{c_i - h + \frac{s - D}{\|d_i\|^2}} - \tilde R^{(c_i)}_{c_i - h - \frac{D}{\|d_i\|^2}}, \qquad Y^{(21)}_s = \tilde R^{(c_i)}_{c_i + \frac{s - D}{\|d_i\|^2}} - \tilde R^{(c_i)}_{c_i - \frac{D}{\|d_i\|^2}},$$

$$Y^{(22)}_s = \tilde R^{(c_{i+1})}_{c_i + \frac{s - D}{\|d_i\|^2}} - \tilde R^{(c_{i+1})}_{c_i - \frac{D}{\|d_i\|^2}}, \qquad Y^{(3)}_s = \tilde R^{(c_{i+1})}_{c_i + h + \frac{s - D}{\|d_i\|^2}} - \tilde R^{(c_{i+1})}_{c_i + h - \frac{D}{\|d_i\|^2}}$$

fulfill the following multivariate functional central limit theorem for any constant $D > 0$ in an appropriate space equipped with the supremum norm

$$\left\{ \|d_i\| \left( Y^{(1)}_s, Y^{(21)}_s, Y^{(22)}_s, Y^{(3)}_s \right) : 0 \le s \le 2D \right\} \xrightarrow{w} \left\{ \widetilde W_s : 0 \le s \le 2D \right\},$$

where $\widetilde W$ is a Wiener process with covariance matrix $\Xi$ (not depending on $D$). For $-D \le t \le D$ denote $W_t = (W^{(1)}_t, W^{(21)}_t, W^{(22)}_t, W^{(3)}_t) = \widetilde W_{D+t} - \widetilde W_D$.

By Assumption 3.1 it holds $h \|d_i\|^2 \to \infty$, such that the distance $h - 2D/\|d_i\|^2$ between $Y^{(1)}$ and $Y^{(2j)}$ (resp. between $Y^{(2j)}$ and $Y^{(3)}$) diverges to infinity. As such, for processes with independent increments the processes $Y^{(1)}$, $(Y^{(21)}, Y^{(22)})$, $Y^{(3)}$ are independent for $T$ large enough. Additionally, under weak assumptions such as mixing conditions this independence still holds asymptotically in the sense that $W^{(1)}$, $(W^{(21)}, W^{(22)})$, $W^{(3)}$ are independent.

Functional central limit theorems for these processes follow from invariance principles as in Assumption 2.1 with $\Sigma_T \to \Sigma$ as long as such invariance principles still hold with an arbitrary (moving) starting value, which is typically the case (see also Remark 4.1 (c)). As such, it typically holds that $\Xi^{(1)} = \Xi^{(21)} = \Sigma^{(c_i)}$ and $\Xi^{(3)} = \Xi^{(22)} = \Sigma^{(c_{i+1})}$, where $\Xi^{(j)} = \operatorname{Cov}(W^{(j)}_1)$ and $\Sigma^{(c_i)}$ is the covariance matrix associated with the regime between the $(i-1)$th and $i$th change point.

The following theorem gives the asymptotic distribution for the change point estimators in case of local change points.

Theorem 4.3.

Let Assumptions 2.1, 3.1 – 3.3, 4.1 (a) with $\omega_T = 1$ and 4.2 hold. For $\hat q_T < q_T$ define $\hat c_i = T$ for $i = \hat q_T + 1, \ldots, q_T$. Let

$$\Psi^{(i)}_t := -|t| + \begin{cases} u_i^T W^{(1)}_t - 2\, u_i^T W^{(21)}_t + u_i^T W^{(3)}_t, & t < 0, \\ u_i^T W^{(1)}_t - 2\, u_i^T W^{(22)}_t + u_i^T W^{(3)}_t, & t \ge 0. \end{cases}$$

Then, for all $i = 1, \ldots, q_T$, it holds that for $T \to \infty$

$$\|d_i\|^2\, (\hat c_i - c_i) \xrightarrow{D} \operatorname{argmax} \left\{ \Psi^{(i)}_t \,\middle|\, t \in \mathbb{R} \right\}.$$

If there is a fixed number of changes $q_T = q$ with $q$ fixed and a functional central limit theorem as in Assumption 4.2 holds jointly for all $q$ change points, then the result also holds jointly. $\{\Psi^{(i)}_t : t \ge 0\}$ is independent of $\{\Psi^{(i)}_t : t \le 0\}$.

Remark 4.3.

(a) If $W^{(1)}$, $(W^{(21)}, W^{(22)})$, $W^{(3)}$ are independent, which is typically the case (see discussion beneath Assumption 4.2), then $\Psi^{(i)}_t$ simplifies to

$$\Psi^{(i)}_t := -|t| + \begin{cases} \sqrt{\sigma^2_{(1)} + 4\sigma^2_{(21)} + \sigma^2_{(3)}}\; B_t, & t < 0, \\ \sqrt{\sigma^2_{(1)} + 4\sigma^2_{(22)} + \sigma^2_{(3)}}\; B_t, & t \ge 0, \end{cases}$$

where $B$ is a (univariate) standard Wiener process and $\sigma^2_{(j)} = u_i^T \Xi^{(j)} u_i$. Usually (see discussion beneath Assumption 4.2) $\sigma_{(21)} = \sigma_{(1)}$ and $\sigma_{(22)} = \sigma_{(3)}$, further simplifying the expression. For some examples such as partial sum processes it holds $\Sigma_t = \Sigma$ for all $t$, such that all $\sigma_{(j)}$ coincide. In this case this further simplifies to

$$\Psi^{(i)}_t := -|t| + \sqrt{6}\, \sigma_{(1)} B_t.$$

For univariate partial sum processes this result has already been obtained in Theorem 3.3 of Eichinger and Kirch 2018. However, the assumption of $\Sigma_t = \Sigma$ is typically not fulfilled for renewal processes because the covariance depends on the changing intensity of the process.

(b) If $W^{(1)}$, $(W^{(21)}, W^{(22)})$, $W^{(3)}$ are independent and $M_t$ in (3.10) is replaced by $\Sigma_t^{-1/2} M_t$, then the Wiener processes $W^{(j)}$ are standard Wiener processes, such that $\Psi^{(i)}_t$ simplifies to

$$\Psi^{(i)}_t := -|t| + \sqrt{6}\, B_t.$$
This shows that in this case the limit distribution of $\hat c_i - c_i$ depends only on the magnitude of the change $d_i$ but not on its direction $u_i$. Statistically, however, this is difficult to achieve as it requires a uniformly (in $t$) consistent estimator for the usually unknown covariance matrices $\Sigma_t$.

In this section, we illustrate the performance of our procedure for multivariate renewal processes by means of a simulation study. Related simulations in addition to a variety of data examples for partial sum processes have been conducted by Eichinger and Kirch 2018; Cho, Kirch, and Meier 2019 and for univariate renewal processes by Messer et al. 2014; Messer et al. 2017.

More precisely, we analyse three-dimensional renewal processes with $T = 1600$, where the inter-event times for each component are Γ-distributed with intensity changes at 250, 500, 900 and 1150, where the expected time µ between events is given by 1.3, 0.9, 0.6, 0.[…], […].3. We use a bandwidth of $h = 120$ and the parameter $\eta = 0.75$. Smaller values of η as suggested by Cho, Kirch, and Meier 2019 for partial sum processes tend to produce duplicate change point estimators by having two or more significant local maxima for each change point if the variance is too large. For a single-bandwidth MOSUM procedure as suggested here, this should be avoided but can be relaxed if a post-processing procedure is applied as e.g. by Cho and Kirch 2019 for partial sum processes.

In contrast to partial sum processes, it is natural for renewal processes that the variances change with the intensity, therefore we consider the following three scenarios: (i) standard deviations of constant value 0.7 (referred to as constvar), (ii) standard deviations being 5 / µ (referred to as smallvar) and (iii) multivariate Poisson processes (referred to as Poisson).

We consider both the case of independence and dependence between the three components. In the latter case, we generate for each regime $i$ an independent (in time) sequence of Γ-distributed inter-event times $Y_j = Y^{(i)}_j$, $j = 1, 2, 3$, with a correlation of 0.[…], constructed as $Y_j = X_j + X$, where $X_j \sim \Gamma(s, \lambda)$ for $j = 1, 2, 3$ and $X \sim \Gamma(s/[\ldots], \lambda)$ for appropriate values of $s$ and $\lambda$ (resulting in the above intensities and standard deviations for each regime).

In the simulations, we use a threshold as in Remark 3.1 with $\alpha_T = 0.05$. By Section 2.2.2 and (3.8) it holds that $\Sigma_t = \operatorname{Cov}[(Y_1, Y_2, Y_3)^T] / E[Y_1]^3$, while we use the following choices for the matrix $\hat A_t$ as in (3.7): (A) the diagonal matrix with locally estimated variances $\hat\Sigma_t(j,j)$ on the diagonal, $j = 1, 2, 3$, (B) the diagonal matrix with the true variances $\Sigma_t(j,j)$ on the diagonal and (C) in case of dependent components the (non-diagonal) true covariance matrix $\Sigma_t$. While only (A) is of relevance in applications, this allows us to understand the influence of estimating the variance on the procedure. For dependent data, the distinction between (B) and (C) is important for applications, because a good enough estimator (resulting in a reasonable estimator for the inverse) is often not available for the full covariance matrix as in (C) for moderately high or high dimensions, while it is much less problematic to estimate (B). In (A) the variances at location $t$ are estimated as

$$\hat\Sigma_t(j,j) = \min\left\{ \frac{\hat\sigma^2_{j,-}(t)}{\hat\mu^3_{j,-}(t)},\ \frac{\hat\sigma^2_{j,+}(t)}{\hat\mu^3_{j,+}(t)} \right\}, \tag{5.1}$$

where $\hat\sigma^2_{j,\pm}(t)$ and $\hat\mu_{j,\pm}(t)$ are the sample variance and sample mean respectively based on the inter-event times of the $j$th component within the windows $(t - h, t]$ for $-$ respectively $(t, t + h]$ for $+$. The first and last inter-event times that have been censored by the window are not included. Using the minimum of the left and right local estimators takes into account that the variance can (and typically will) change with the intensity, which has already been discussed by Cho, Kirch, and Meier 2019 in the context of partial sum processes.

(a) constvar: constant standard deviation of 0.7

Change point at               250     500     900     1150    spurious  duplicate
independent, estimator (A)    0.9992  0.9985  0.9253  0.9998  0.0243    0.0035
independent, estimator (B)    0.9962  0.9727  0.6149  0.9998  0.0033    0.0003
dependent, estimator (A)      0.9966  0.9959  0.9052  1       0.0326    0.0030
dependent, estimator (B)      0.9879  0.9551  0.6245  0.9993  0.0072    0.0004
dependent, estimator (C)      0.9439  0.8360  0.3534  0.9975  0.0049    0.0004

(b) smallvar: standard deviation of 5 / µ

Change point at               250     500     900     1150    spurious  duplicate
independent, estimator (A)    0.9790  1       0.9707  1       0.0273    0.0023
independent, estimator (B)    0.9354  1       0.9302  0.9999  0.0038    0.0002
dependent, estimator (A)      0.9657  0.9999  0.9527  0.9996  0.0347    0.0026
dependent, estimator (B)      0.9197  0.9982  0.9089  0.9986  0.0071    0.0013
dependent, estimator (C)      0.7421  0.9882  0.7137  0.9896  0.0043    0.0003

(c) Poisson-distributed inter-event times

Change point at               250     500     900     1150    spurious  duplicate
independent, estimator (A)    0.8913  0.9963  0.8615  0.9976  0.0390    0.0042
independent, estimator (B)    0.7338  0.9844  0.7174  0.9876  0.0029    0.0009
dependent, estimator (A)      0.8654  0.9910  0.8331  0.9923  0.0457    0.0047
dependent, estimator (B)      0.7138  0.9756  0.6961  0.9749  0.0056    0.0012
dependent, estimator (C)      0.4525  0.8883  0.4217  0.8963  0.0042    0.0001

Table 5.1: Detection rates for each change point as well as the average number of spurious and duplicate estimators for different distributions of the inter-event times.

The results of the simulation study can be found in Table 5.1, where we consider a change point to be detected if there was an estimator in the interval $[c_i - h, c_i + h]$.
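The local variance estimator (5.1) can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: it assumes the standard renewal-theory normalisation in which the sample variance of the inter-event times is divided by the cubed sample mean, and the function name `local_variance` as well as the toy event times are hypothetical.

```python
import numpy as np

def local_variance(event_times, t, h):
    """Sketch of (5.1) for one component: minimum of the left and right
    local estimates (sample variance / sample mean**3) at time t."""
    ev = np.sort(np.asarray(event_times, dtype=float))

    def one_sided(lo, hi):
        # Events inside (lo, hi]. Differences of consecutive events are the
        # inter-event times lying fully inside the window, so the censored
        # first and last inter-event times are dropped automatically.
        inside = ev[(ev > lo) & (ev <= hi)]
        gaps = np.diff(inside)
        if gaps.size < 2:
            return np.nan  # assumption: too few events for a sample variance
        return np.var(gaps, ddof=1) / np.mean(gaps) ** 3

    # np.min propagates nan if either window has too few events.
    return np.min([one_sided(t - h, t), one_sided(t, t + h)])

# Toy data: gaps alternate 1, 2 to the left of t = 10 and 3, 1 to the right.
toy_events = [1, 2, 4, 5, 7, 8, 10, 11, 14, 15, 18, 19]
sigma_hat = local_variance(toy_events, t=10, h=10)
```

In this toy example the left window has the smaller ratio and therefore determines the estimate, which is the behaviour the minimum in (5.1) is meant to produce when the variance differs on the two sides of a change.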

[Figure omitted; horizontal axis: Time]

Figure 5.1: MOSUM statistics with bandwidths of h = 30, […], […], 120 (top to bottom) for a three-dimensional renewal process with multiscale changes with increasing distance between change points in combination with decreasing magnitude of the changes in intensity. The dashed vertical lines indicate the location of the true changes, while the solid lines indicate the change point estimators. In this multiscale situation no single bandwidth can detect all changes: The changes to the left are well estimated by smaller bandwidth, the ones in the middle by medium-sized bandwidths and the one to the right by the largest bandwidth.

Additional significant local maxima in such an interval are called duplicate change point estimators, while additional significant local maxima outside any of these intervals are called spurious. The procedure performs well throughout all simulations with high detection rate, few spurious and very few duplicate estimators. The results improve further for smaller variance, in which case the signal-to-noise ratio is better.

When the diagonal matrix with the estimated variance is being used, the detection power is larger in all cases than when the true variance is being used. In case of the changes at location 900 this is a substantial improvement, such that the use of this local variance estimator can help boost the signal significantly. This comes at the cost of having an increased but still reasonable amount of spurious and duplicate change point estimators. This effect stems from using the minimum in (5.1), which was introduced to gain detection power if the variance changes with the intensity. Additionally, the use of the true (asymptotic) covariance matrix leads to worse results than only using the corresponding diagonal matrix, possibly because the asymptotic covariances do not reflect the small sample (with respect to h) covariances well enough.
From a statistical perspective this is advantageous because the local estimation of the inverse of a covariance matrix in moderately large or large dimensions is a very hard problem leading to a loss in precision, while the diagonal elements are far less difficult to estimate consistently.

In the above situation the changes are homogeneous in the sense that the smallest change in intensity is still large enough compared to the smallest distance to neighboring change points (for a detailed definition we refer to Cho and Kirch 2019, Definition 2.1, or Cho and Kirch 2020, Definition 2.1). In particular, this guarantees that all changes can be detected with a single bandwidth only.

In some applications with multiscale signals, where frequent large changes as well as small isolated changes are present, this is no longer the case as Figure 5.1 shows. In such cases, several bandwidths need to be used and the obtained candidates pruned down in a second step (see Cho and Kirch 2019 for an information criterion based approach for partial sum processes as well as Messer et al. 2014 for a bottom-up approach for renewal processes). Similarly, if the distance to the neighboring change points is unbalanced, MOSUM procedures with asymmetric bandwidths as suggested by Cho, Kirch, and Meier 2019 may be necessary.

In this paper, we considered a class of multivariate processes that, possibly after a change of probability space, fulfill a uniform strong invariance principle. We assumed that the process switches possibly infinitely many times between finitely many regimes, with each switch inducing a change in the drift. This setup includes several important examples, including multivariate partial sum processes, diffusion processes as well as renewal processes. In order to localise these changes, we extended the work of Eichinger and Kirch 2018 and Messer et al. 2014 and proposed a single-bandwidth procedure using MOSUM statistics in order to estimate changes, allowing for local changes. We were able to show consistency for the estimators. Further, we were able to derive (uniform) localisation rates in the form of exact convergence rates, which are indeed minimax-optimal.

One drawback of the procedure is the use of a single bandwidth. In practice, the identification of the optimal bandwidth turns out to be rather difficult as pointed out e.g. by Cho and Kirch 2019 and Messer et al. 2014: On the one hand, one wants to choose a large bandwidth in order to have maximum power, while on the other hand, choosing a too large bandwidth may lead to misspecification or nonidentification of changes. Furthermore, as can be seen in the simulation study, in a multiscale change point situation (see Definition 2.1 of Cho and Kirch 2019) no single bandwidth can detect all change points. Therefore, one future topic of interest would be to extend the proposed procedure to a true multiscale setup as in Cho and Kirch 2019.
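To make the single-bandwidth idea concrete, the following minimal sketch computes a MOSUM statistic for a univariate counting process. It assumes the common form $M_t = \big((N_{t+h} - N_t) - (N_t - N_{t-h})\big)/\sqrt{2h}$, where $N_t$ counts the events up to time $t$; the function name `mosum_counting` and the toy data are hypothetical, and the variance normalisation and thresholding steps of the procedure are deliberately left out.

```python
import numpy as np

def mosum_counting(event_times, grid, h):
    """MOSUM of a counting process N evaluated at the time points in `grid`.

    Assumed form (illustrative, not copied from the paper):
        M_t = ((N_{t+h} - N_t) - (N_t - N_{t-h})) / sqrt(2 h).
    """
    ev = np.sort(np.asarray(event_times, dtype=float))
    grid = np.asarray(grid, dtype=float)

    def count(t):
        # N(t) = number of events in (0, t]
        return np.searchsorted(ev, t, side="right")

    right = count(grid + h) - count(grid)   # events in (t, t + h]
    left = count(grid) - count(grid - h)    # events in (t - h, t]
    return (right - left) / np.sqrt(2.0 * h)

# Deterministic toy data: one event per time unit up to 400,
# two events per time unit afterwards (a single change at 400).
events = np.concatenate([np.arange(1, 401), 400 + 0.5 * np.arange(1, 801)])
h = 50
grid = np.arange(h, 800 - h + 1)
m = mosum_counting(events, grid, h)
c_hat = grid[np.argmax(np.abs(m))]  # location of the largest absolute MOSUM value
```

For this noise-free toy input the largest absolute value of the statistic sits exactly at the change at 400. With random inter-event times the same computation would be followed by a variance normalisation and a comparison of local maxima with a threshold.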

Acknowledgements

This work was supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - 314838170, GRK 2297 MathCoRe.

References

Aggarwal, R., Inclan, C., and Leal, R. (1999). “Volatility in emerging stock markets.” In: Journal of Financial and Quantitative Analysis […]

[…] The Annals of Statistics 39, pp. 278–304.

Bauer, P. and Hackl, P. (1980). “An extension of the MOSUM technique for quality control.” In: Technometrics […]

[…] Biometrika […]

[…] Multi-sequence segmentation via score and higher-criticism tests. eprint: arXiv:1706.07586.

Cho, H. and Fryzlewicz, P. (2012). “Multiscale and multilevel technique for consistent segmentation of nonstationary time series.” In: Statistica Sinica 22, pp. 207–229.

Cho, H. and Kirch, C. (2019). “Two-stage data segmentation permitting multiscale changepoints, heavy tails and dependence.” eprint: arXiv:1910.12486v3.

Cho, H., Kirch, C., and Meier, A. (2019). “mosum: A package for moving sums in change point analysis.” In: […]

Cho, H. and Kirch, C. (2020). “Data segmentation algorithms: Univariate mean change and beyond.” arXiv preprint arXiv:2012.12814.

Csenki, A. (1979). “An Invariance Principle in k-dimensional Extended Renewal Theory.” In: J. Appl. Prob. 16, pp. 567–574.

Csörgő, M. and Horváth, L. (1997). Limit theorems in change-point analysis. Vol. 18. John Wiley & Sons Inc.

Csörgő, M., Horváth, L., and Steinebach, J. (1987). “Invariance principles for renewal processes.” In: Ann. Probab. 14, pp. 1441–1460.

Eichinger, B. and Kirch, C. (2018). “A MOSUM procedure for the estimation of multiple random change points.” In: Bernoulli 24, pp. 526–564.

Einmahl, U. (1987). “Strong invariance principles for partial sums of independent random vectors.” In: Ann. Probab. 15, pp. 1419–1440.

Fearnhead, P. and Rigaill, G. (2020). “Relating and Comparing Methods for Detecting Changes in Mean.” In: Stat, e291.

Fisch, A. T. M., Eckley, I. A., and Fearnhead, P. (2018). A linear time method for the detection of point and collective anomalies. eprint: arXiv:1806.01947.

Fryzlewicz, P. (2014). “Wild Binary Segmentation for multiple change-point detection.” In: The Annals of Statistics 42, pp. 2243–2281.

Gut, A. and Steinebach, J. (2002). “Truncated Sequential Change-point Detection based on Renewal Counting Processes.” In: Scandinavian Journal of Statistics 29, pp. 693–719.

Gut, A. and Steinebach, J. (2009). “Truncated Sequential Change-point Detection based on Renewal Counting Processes II.” In: Journal of Statistical Planning and Inference […]

[…] Stochastic Processes and their Applications […]

[…] Test […]

[…] J. Statist. Plann. Infer. […]

Hušková, M. and Steinebach, J. (2000). […] In: Journal of Statistical Planning and Inference 89, pp. 57–77.

Hušková, M. and Steinebach, J. (2002). “Asymptotic tests for gradual changes.” In: Statistics & Decisions […]

[…] Statistics & Probability Letters 51, pp. 189–196.

Kühn, C. and Steinebach, J. (2002). “On the estimation of change parameters based on weak invariance principles.” In: Limit Theorems in Probability and Statistics II, I. Berkes, E. Csáki, M. Csörgő, eds., János Bolyai Math. Soc. Budapest, pp. 237–260.

Killick, R., Eckley, I. A., Ewans, K., and Jonathan, P. (2010). “Detection of changes in variance of oceanographic time-series using changepoint analysis.” In: Ocean Engineering 37, pp. 1120–1126.

Killick, R., Fearnhead, P., and Eckley, I. A. (2012). “Optimal detection of changepoints with a linear computational cost.” In: Journal of the American Statistical Association […]

[…] Journal of Computational and Applied Mathematics […]

[…] B-valued random variables.” In: Ann. Probab. 8, pp. 1003–1036.

Li, H., Munk, A., and Sieling, H. (2016). “FDR-control in multiscale change-point segmentation.” In: Electronic Journal of Statistics 10, pp. 918–959.

Maidstone, R., Hocking, T., Rigaill, G., and Fearnhead, P. (2017). “On optimal multiple changepoint algorithms for large data.” In: Statistics & Computing […]

[…] Journal of Computational Neuroscience 42, pp. 187–201.

Messer, M., Kirchner, M., Schiemann, J., Roeper, J., Neininger, R., and Schneider, G. (2014). “A multiple filter test for the detection of rate changes in renewal processes with varying variance.” In: The Annals of Applied Statistics […]

[…] Statistical Inference for Stochastic Processes 20, pp. 253–272.

Mihalache, S.-R. (2011). “Sequential Change-Point Detection for Diffusion Processes.” Dissertation. Universität zu Köln.

Niu, Y. S. and Zhang, H. (2012). “The screening and ranking algorithm to detect DNA copy number variations.” In: The Annals of Applied Statistics 6, pp. 1306–1326.

Olshen, A. B., Venkatraman, E., Lucito, R., and Wigler, M. (2004). “Circular binary segmentation for the analysis of array-based DNA copy number data.” In: Biostatistics 5, pp. 557–572.

Page, E. S. (1954). “Continuous inspection schemes.” In: Biometrika 41, pp. 100–115.

Steinebach, J. (2000). “Some remarks on the testing of smooth changes in the linear drift of a stochastic process.” In: Theory of Probability and Mathematical Statistics 61, pp. 173–185.

Steinebach, J. and Eastwood, V. R. (1996). “Extreme Value Asymptotics for Multivariate Renewal Processes.” In: Journal of Multivariate Analysis […]

[…] Optimal change point detection and localization. eprint: arXiv:2010.11470.

Vostrikova, L. (1981). “Detecting disorder in multidimensional random processes.” In: Soviet Mathematics Doklady 24, pp. 55–59.

Wang, D., Yu, Y., Rinaldo, A., et al. (2020). “Univariate mean change point detection: Penalization, cusum and optimality.” In: Electronic Journal of Statistics […]

[…] Statistics & Probability Letters […]

[…] Sankhya: The Indian Journal of Statistics […], pp. 370–381.

Yau, C. Y. and Zhao, Z. (2016). “Inference for multiple change points in time series via likelihood ratio scan statistics.” In: Journal of the Royal Statistical Society: Series B (Statistical Methodology) […]

A Appendix: Proofs

We first prove some bounds for the limiting Wiener process that will be used throughout the proofs (for (i)) or are related to the bounds in Assumption 4.1 (for (ii) and (iii)).

Proposition A.1.

Let Assumption 2.1 hold with a rate of convergence as in Assumption 3.1 with the notation of Assumption 4.1. Let $0 < \xi_T \le h_T$ and $D_T \ge 1$ be arbitrary sequences (bounded or unbounded).

(a) The following bounds hold for the Wiener processes as in Assumption 2.1:

$$\text{(i)} \quad \max_{i=1,\ldots,q_T}\ \sup_{0 \le t \le \xi_T} \frac{1}{\sqrt{\xi_T}} \left\| W^{(\theta_{i+1})}_{\theta_i} - W^{(\theta_{i+1})}_{\theta_i \pm t} \right\| = O_P\left( \sqrt{\log 2 q_T} \right),$$

$$\text{(ii)} \quad \sup_{D_T/\|d_i\|^2 \le s \le h_T} \sqrt{D_T}\, \frac{\left\| W^{(\theta_i)}_{\theta_i} - W^{(\theta_i)}_{\theta_i \pm s} \right\|}{s\, \|d_i\|} = O_P(1),$$

$$\text{(iii)} \quad \max_{i=1,\ldots,q_T}\ \sup_{D_T/\|d_i\|^2 \le s \le h_T} \sqrt{D_T}\, \frac{\left\| W^{(\theta_i)}_{\theta_i} - W^{(\theta_i)}_{\theta_i \pm s} \right\|}{s\, \|d_i\|} = O_P\left( \sqrt{\log 2 q_T} \right).$$

(b) The bound in (i) carries over to the centered increments of the original process:

$$\max_{i=1,\ldots,q_T}\ \sup_{0 \le t \le \xi_T} \frac{1}{\sqrt{\xi_T}} \left\| \tilde R^{(\theta_{i+1})}_{\theta_i} - \tilde R^{(\theta_{i+1})}_{\theta_i \pm t} \right\| = O_P\left( \sqrt{\log 2 q_T} \right).$$

The bound in (ii) carries over if a forward and backward invariance principle as above exists starting in an arbitrary point $\theta_i$; in this case (iii) carries over if $q_T = O(1)$.

For a single change point (instead of taking the maximum over all) the bound in (a)(i) and (b) is given by $O_P(1)$.

Proof. (a) Let $B^{(j)}_t = (\Sigma^{(j)}_T)^{-1/2} W^{(j)}_t$, then by the self-similarity of Wiener processes it holds

$$\max_{i=1,\ldots,q_T}\ \sup_{0 \le t \le \xi_T} \frac{1}{\sqrt{\xi_T}} \left\| W^{(c_{i+1})}_{c_i} - W^{(c_{i+1})}_{c_i + t} \right\| \le O(1) \max_{j=1,\ldots,P} \left\| (\Sigma^{(j)}_T)^{1/2} \right\| \max_{i=1,\ldots,q_T}\ \sup_{0 \le t \le 1} \left\| B^{(c_{i+1})}_{c_i} - B^{(c_{i+1})}_{c_i + t} \right\|.$$

By the uniform boundedness of the covariance matrices as in Assumption 2.1,

$$\max_{j=1,\ldots,P} \left\| (\Sigma^{(j)}_T)^{1/2} \right\| = \max_{j=1,\ldots,P} \sqrt{\left\| \Sigma^{(j)}_T \right\|} = O(1).$$
The reﬂection principle in combination with tail probabilities for Gaussian ran-dom variables shows that with appropriate constants D , D (not depending on i ) itholds for all D ≥ P (cid:18) sup ≤ t ≤ k B ( c i +1 ) c i − B ( c i +1 ) c i + t k ≥ D p D log(2 q T ) (cid:19) ≤ D D q DT , which in combination with subadditivity shows thatmax i =1 ,...,q T sup ≤ t ≤ k B ( c i +1 ) c i − B ( c i +1 ) c i + t k = O P ( p log 2 q T ) . The assertion without the maximum follows analogously.28learly, (ii) follows from (iii) so we will only prove the latter. As above it is suf-ﬁcient to prove the assertion for { B t } . Due to the self-similarity of Wiener processesand its stationary and independent increments, it holdsmax i =1 ,...,q T sup DT k d i k ≤ s ≤ h T √ D T (cid:13)(cid:13)(cid:13) B ( c i ) c i + s − B ( c i ) c i (cid:13)(cid:13)(cid:13) s k d i k D = max j =1 ,...,q T sup ≤ t ≤ h T k d j k /D T (cid:13)(cid:13)(cid:13) B ( j ) t (cid:13)(cid:13)(cid:13) t , where { B ( j ) t } , j = 1 , , . . . , are independent standard Wiener processes. Similar asser-tions hold for the other expressions. By the reﬂection principle and tail probabilitiesfor Gaussian random variables it holds for any C > P (cid:18) sup t ≥ k B t k t ≥ p C log 2 q T (cid:19) ≤ X l ≥ P sup l ≤ t< l +1 k B t k t ≥ p C log 2 q T ! ≤ X l ≥ P (cid:18) sup ≤ t ≤ k B t k ≥ l √ l +1 p C log 2 q T (cid:19) ≤ O (1) X l ≥ ( O (1)2 Cq T ) − l = O (cid:18) C q T (cid:19) , which shows the assertion in combination with the sub-additivity.(b) By the invariance principle and (a) (i) it holdsmax i =1 ,...,q T sup c i − h T

Proof of Theorem 3.1. (a) Because b A t is symmetric and positive deﬁnite such that the minimal eigenvalueof A − t is given by 1 / k b A t k it holds m t b A − t m t ≥ k b A t k k m t k = 1 k b A t k ( h − | c i − t | ) h k d i k . h ≤ t ≤ T k Λ t − Λ t ( W ) k = O P (cid:18) T / ν T √ h (cid:19) = o P (cid:16)p log( T /h ) − (cid:17) , (A.1)where Λ t ( W t ) is the MOSUM statistics deﬁned in (3.1) with { Z t } there replaced by { W t } . Assertion (b)(i) follows immediately by the 1 / B t = Σ − / t W t .For the sub-linear case as in (ii) we get by (A.1) a (cid:18) Th (cid:19) sup h ≤ t ≤ T − h (cid:13)(cid:13)(cid:13) Σ − / t Λ t (cid:13)(cid:13)(cid:13) = a (cid:18) Th (cid:19) sup h ≤ t ≤ T − h k Λ t ( B ) k + o P (1) D = 1 √ ≤ s ≤ Th − k B s +2 − B s +1 + B s k + o P (1) , where ( Λ t ) t ≥ is a stationary process. Assertion (b)(ii) follows by Steinebach andEastwood 1996, Lemma 3.1 in combination with Remark 3.1 with α = 1 and C = . . . = C p = .Replacing Σ t by b Σ t does not change any of the above assertions by standardarguments.(c) By splitting Λ t ( e R ) into increments of length at most 2 h anchored at thechange points c i we get by Proposition A.1(b)(i)max i =1 ,...,q T sup c i − h

We ﬁrst prove consistency of the segmentation procedure.

Proof of Theorem 4.1.

Define for $0 < \tau < 1$

$$S_T = S^{(1)}_T \cap S^{(2)}_T \cap \bigcap_{j=1}^{q_T} \left( S^{(3)}_T(j, \tau) \cap S^{(4)}_T(j, \tau) \right), \tag{A.2}$$

where

$$S^{(1)}_T = \left\{ \max_{j=1,\ldots,q_T}\ \sup_{|t - c_j| > h} M_t^T \hat A_t^{-1} M_t < \beta \right\}, \qquad S^{(2)}_T = \left\{ \min_{j=1,\ldots,q_T} M_{c_j}^T \hat A_{c_j}^{-1} M_{c_j} \ge \beta \right\},$$

$$S^{(3)}_T(j, \tau) = \bigcap_{k=1}^{\lceil 1/\tau \rceil - 1} \left\{ \sup_{c_j - h \le t \le c_j - k\tau h} \|M_t\| < \left\| M_{c_j - (k-1)\tau h} \right\| \right\},$$

$$S^{(4)}_T(j, \tau) = \bigcap_{k=1}^{\lceil 1/\tau \rceil - 1} \left\{ \sup_{c_j + k\tau h \le t \le c_j + h} \|M_t\| < \left\| M_{c_j + (k-1)\tau h} \right\| \right\}.$$

On $S^{(1)}_T$ there are asymptotically no significant points outside of $h$-environments of the change points. On $S^{(2)}_T$ there is at least one significant time point for each change point. On $S^{(3)}_T(j, \tau) \cap S^{(4)}_T(j, \tau)$ with $\tau < \eta/2$, there are no local extrema (within the $h$-environment of $c_j$) that are outside the interval $(c_j - \tau h, c_j + \tau h)$. Additionally, on $S^{(2)}_T \cap S^{(3)}_T(j, \tau) \cap S^{(4)}_T(j, \tau)$ the global extremum within that interval will be the only significant local extremum within the $h$-environment of $c_j$, such that

$$\left\{ \max_{i=1,\ldots,\min(\hat q_T, q_T)} |\hat c_i - c_i| \le \tau h,\ \hat q_T = q_T \right\} \supset S_T.$$

We will conclude the proof by showing that $S_T$ is an asymptotic one set.

Indeed, $P(S^{(1)}_T) \to$ […] $M_t^T \hat A_t^{-1} M_t \le \|\hat A_t^{-1}\|\, \|M_t\|^2$ and $P(S^{(2)}_T) \to$ […] $c_i - h \le t \le c_i$, we obtain that

$$\left\| M_{c_i - (k-1)\tau h} \right\| - \left\| M_{c_i - k\tau h} \right\| \ge \left\| m_{c_i - (k-1)\tau h} \right\| - \left\| m_{c_i - k\tau h} \right\| + O_P\left( \sqrt{\log(T/h)} \right) \ge \frac{\tau}{\sqrt{2}} \sqrt{h}\, \|d_i\|\, (1 + o_P(1)),$$

where the $o_P$-term is uniform in $i$. This shows that $P\left( \bigcap_{j=1}^{q_T} S^{(3)}_T(j, \tau) \right) \to$