Autocovariance Estimation in the Presence of Changepoints
Colin Gallagher
Department of Statistics and Operations Research, Clemson University

Rebecca Killick
Department of Statistics, Lancaster University

Robert Lund
Department of Statistics, University of California, Santa Cruz

Xueheng Shi
Department of Statistics, University of California, Santa Cruz
[email protected]
February 26, 2021

Abstract

This article studies estimation of a stationary autocovariance structure in the presence of an unknown number of mean shifts. Here, a Yule-Walker moment estimator for the autoregressive parameters in a dependent time series contaminated by mean shift changepoints is proposed and studied. The estimator is based on first order differences of the series and is proven consistent and asymptotically normal when the number of changepoints $m$ and the series length $N$ satisfy $m/\sqrt{N} \to 0$ as $N \to \infty$.

Keywords: Autoregressions, Differencing, Robustness, Rolling Windows, Segmentation, Yule-Walker Estimates
Time series dynamics often change due to external events or internal systematic fluctuations. Changepoint analyses have become commonplace in identifying whether and when abrupt changes take place. Evolving from the original treatise for a single location parameter shift in Page [1954], the majority (but not all) of changepoint analyses check for shifts in the series' mean. Since then, considerable work on changepoints has been conducted, including applications to climatology [Hewaarachchi et al., 2017], economics [Norwood and Killick, 2018], and disease modelling [Hall et al., 2000]. This paper studies approaches for estimating an unknown autocorrelation function when an unknown number of mean shift changepoints are present.

Many changepoint techniques assume independent and identically distributed (IID) model errors; however, time series data are typically correlated (e.g., daily temperatures, stock prices, and DNA sequences). Changepoint techniques may incorrectly segment the data (estimate too many or too few changepoints) should non-zero autocorrelation be ignored. Two primary ways of tackling this issue have been pursued: 1) including autocorrelation in the multiple changepoint model [Li and Lund, 2012], and 2) decorrelating the time series prior to any changepoint analysis [Chakar et al., 2017, Shi et al., 2021]. In either case, one needs the autocovariance function of the data. With a good estimate of the series' autocovariance, one-step-ahead prediction residuals can be estimated; these residuals are always uncorrelated (independent up to estimation for Gaussian series). Indeed, a principle of Shi et al. [2021] and Robbins et al. [2011] is that good multiple changepoint detection routines can be devised by applying IID methods to one-step-ahead prediction residuals.

Some multiple changepoint models for time series allow all model parameters, including those governing the correlation structure of the series, to change at each changepoint time.
These scenarios are easier to handle (computationally quicker), as dynamic programming techniques can quickly optimize penalized objective functions; see Killick et al. [2012] and Maidstone et al. [2017]. The key here is that the objective function that is optimized needs to be additive in its segments (regimes). A more parsimonious setup allows means to shift with each changepoint time, but keeps error autocovariances constant across all regimes. As this breaks the objective function additivity required, fast dynamic programming techniques cannot be directly applied. Thus, estimates of global autocovariances governing all regimes need to be developed. Standard estimation schemes degrade unless the changepoint locations are a priori known (known changepoint times allow the segment means to be better estimated and subtracted from the series prior to autocovariance calculation). Should process autocovariance estimators be off (due to bias or inconsistency), then decorrelation techniques will also suffer. As such, there is a need to estimate a stationary autocovariance function in the presence of mean shifts.

This paper studies autocovariance estimation in the presence of mean shifts in more detail. We devise a method based on first order differencing that beats rolling window methods. The scenario is asymptotically quantified when model errors are drawn from a causal autoregressive process. The rest of this paper proceeds as follows. The next section narrates our setup and discusses approaches to the problem. Section 3 then develops an estimation technique based on lag one differences of the series. Section 4 proves consistency and asymptotic normality of these estimators, and Section 5 assesses their performance in simulations. Section 6 applies the results to an annual precipitation series and Section 7 concludes the paper with comments.
Suppose that $\{X_t\}_{t=1}^N$ is a time series having an unknown number of mean shift changepoints, denoted by $m$, occurring at the unknown ordered times $0 < \tau_1 < \tau_2 < \cdots < \tau_m \le N$. These $m$ changepoints partition the series into $m + 1$ distinct segments, each segment having its own distinct mean. The model can be written as
$$X_t = \kappa_{s(t)} + \epsilon_t. \qquad (1)$$
If $s(t)$ denotes the series' segment at time $t$, which takes on values in $\{0, 1, \ldots, m\}$, then $\kappa_{s(t)} = \mu_i$ is constant for all times in the $i$th regime:
$$\kappa_{s(t)} = \begin{cases} \mu_0, & 1 \le t \le \tau_1, \\ \mu_1, & \tau_1 + 1 \le t \le \tau_2, \\ \ \vdots & \\ \mu_m, & \tau_m + 1 \le t \le N. \end{cases}$$
We assume that $\{\epsilon_t\}$ is a stationary causal AR($p$) time series that applies to all regimes. Then $\{\epsilon_t\}$ obeys
$$\epsilon_t = \phi_1 \epsilon_{t-1} + \cdots + \phi_p \epsilon_{t-p} + Z_t, \quad t \in \mathbb{Z}, \qquad (2)$$
where $\{Z_t\}$ is IID white noise with zero mean, variance $\sigma^2$, and a finite fourth moment (this enables consistent estimation of the autoregressive parameters $\phi_1, \ldots, \phi_p$). While more general ARMA($p,q$) $\{\epsilon_t\}$ could be considered, we work with AR($p$) errors because this model class is dense in all stationary short-memory series, and estimation, prediction, and forecasting are easily conducted. Adding a moving-average component $q \ge 1$ induces considerably more work and is less commonly found in changepoint applications.

Few authors have considered autocovariance estimation in this changepoint corrupted setting. Chakar et al. [2017], in studying the AR(1) case, develop an estimator that is robust to mean shifts:
$$\hat\phi = \left( \frac{\operatorname{median}_{1 \le t \le N-2} |X_{t+2} - X_t|}{\operatorname{median}_{1 \le t \le N-1} |X_{t+1} - X_t|} \right)^2 - 1. \qquad (3)$$
This is claimed to be consistent and to satisfy a central limit theorem. It is not clear how to extend this work to cases where $p > 1$; moreover, we show that this estimator does not work particularly well in AR(1) settings.

A second way to handle the problem jointly estimates the AR model and the number and locations of the changepoints [Li and Lund, 2012].
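A minimal sketch of the ratio-of-medians estimator in (3), assuming Gaussian AR(1) errors; the series length, AR(1) coefficient, changepoint times, and shift sizes below are hypothetical choices for illustration, not values from the paper:

```python
import numpy as np

def ar1seg_estimate(x):
    """Ratio-of-medians estimator of the AR(1) coefficient, in the spirit of (3):
    the squared ratio of lag-2 to lag-1 median absolute differences, minus one."""
    lag2 = np.median(np.abs(x[2:] - x[:-2]))
    lag1 = np.median(np.abs(x[1:] - x[:-1]))
    return (lag2 / lag1) ** 2 - 1.0

rng = np.random.default_rng(1)

# AR(1) errors with phi = 0.5 and unit-variance Gaussian noise
N, phi = 20000, 0.5
eps = np.empty(N)
eps[0] = rng.standard_normal() / np.sqrt(1 - phi**2)  # stationary start
for t in range(1, N):
    eps[t] = phi * eps[t - 1] + rng.standard_normal()

# add alternating mean shifts at hypothetical changepoint times
kappa = np.zeros(N)
for j, tau in enumerate((4000, 8000, 12000, 16000)):
    kappa[tau:] += 2.0 if j % 2 == 0 else -2.0

print(round(ar1seg_estimate(kappa + eps), 2))  # close to phi despite the shifts
```

Because medians are insensitive to the handful of difference outliers the shifts create, the estimate stays near $\phi$ even without knowing the changepoint configuration.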
For one way to do this, a penalized likelihood function can be optimized over all admissible changepoint configurations. If the changepoint configuration is specified (known), the sequence is easily centered to a zero mean by subtracting segment sample means. Any ARMA($p,q$) error model can be fitted to the centered series. A genetic algorithm (GA) can be used to estimate the optimal penalized likelihood over all changepoint configurations under a variety of penalization schemes [Lund and Shi, 2020]. Unfortunately, the GA may take significant computing time to find the optimum: there are $2^{N-1}$ different multiple changepoint configurations in a series of length $N$ to search over. Sometimes, the GA may fail to converge at all.

A third approach employs rolling windows. See Marriott and Pope [1954] and Orcutt and Winokur Jr. [1969] for estimation of autocorrelations via window methods, and Beaulieu and Killick [2018] for applications of these methods to changepoint problems in climatology. For the window length $w$, with $w \le N$, a moving window scheme generates $N - w + 1$ subsegments, the $i$th subsegment containing the points at the times $i, \ldots, i + w - 1$. Each subsegment is treated as a stationary time series (even though some may contain mean shifts and are thus truly nonstationary), and time series parameters are estimated in subsegment $i$ via the data in this subsegment only. The final estimates are taken as medians of the estimates over all subsegments. The hope is that most windows will be changepoint free, and medians over all subsegments will not be heavily influenced by the few windows that contain changepoints. Of course, such a scheme may not use all data efficiently in estimation. Moreover, Beaulieu and Killick [2018] demonstrated that the success of this estimation depends heavily on the choice of $w$.

This section derives a system of linear equations that relates the autocorrelations of the differenced series to the AR($p$) coefficients. First order differencing a series eliminates any piece-wise constant mean except at times where shifts occur; Figure 1 illustrates the simple idea.

[Figure 1: An AR(1) series $\{X_t\}$ with three changepoints, at the times $t = 51, \ldots$ (top panel). Its first order difference $X_t - X_{t-1}$ (bottom panel) contains only three points with a nonzero mean.]

Authors have previously used differencing to estimate global parameters in the changepoint literature.
For example, Fryzlewicz [2014] uses differencing to get an estimate of $\mathrm{Var}(X_t)$, although IID errors are assumed in this work; (3) from Chakar et al. [2017] is also based on differencing. However, there seems to be no previous literature using differencing to estimate AR($p$) parameters in a setting corrupted by mean shifts. As an aside, differencing also detrends a time series; the estimators below perform well if a time series contains both changepoints and a linear trend.

A system of Yule-Walker moment equations can be used to estimate the AR($p$) parameters. Our equations have the form
$$\boldsymbol{\rho}_d = M \boldsymbol{\phi}, \qquad (4)$$
where $\boldsymbol{\rho}_d = [\rho_d(1) + 1/2,\ \rho_d(2),\ \rho_d(3),\ \ldots,\ \rho_d(p)]^T$, $\rho_d(h)$ is the autocorrelation of the differenced series at lag $h$, $\boldsymbol{\phi} = [\phi_1, \phi_2, \phi_3, \ldots, \phi_p]^T$, and
$$M = \begin{bmatrix}
\frac{1}{2} & -\frac{1}{2} & -\left(\frac{1}{2} + \rho_d(1)\right) & \cdots & -\left(\frac{1}{2} + \sum_{j=1}^{p-2} \rho_d(j)\right) \\
\rho_d(1) & \rho_d(0) & \rho_d(1) & \cdots & \rho_d(p-2) \\
\rho_d(2) & \rho_d(1) & \rho_d(0) & \cdots & \rho_d(p-3) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\rho_d(p-1) & \rho_d(p-2) & \rho_d(p-3) & \cdots & \rho_d(0)
\end{bmatrix}. \qquad (5)$$
This system is shown below to have a unique solution.

To derive the system of equations in (4), make the notations $\gamma_X(h) = \mathrm{Cov}(X_t, X_{t-h})$, $d_t = \epsilon_t - \epsilon_{t-1}$, and $\gamma_d(h) = \mathrm{Cov}(d_t, d_{t-h})$. The covariances of $\{d_t\}$ and $\{X_t\}$ are related via
$$\gamma_d(h) = 2\gamma_X(h) - \gamma_X(h+1) - \gamma_X(h-1).$$
Define $r(k) = \gamma_X(k) - \gamma_X(k-1)$. Then $r(1) = -\gamma_d(0)/2$ and $r(k) = r(k-1) - \gamma_d(k-1)$ for $k \ge 2$. Solving this difference equation recursively yields
$$r(k) = -\frac{\gamma_d(0)}{2} - \sum_{\ell=1}^{k-1} \gamma_d(\ell),$$
which implies that
$$\frac{r(k)}{\gamma_d(0)} = -\left[\frac{1}{2} + \sum_{\ell=1}^{k-1} \rho_d(\ell)\right]. \qquad (6)$$
Combining with the AR($p$) definition in (2), we extract

(a) $\gamma_X(h) = \phi_1 \gamma_X(h-1) + \phi_2 \gamma_X(h-2) + \cdots + \phi_p \gamma_X(h-p)$, $h = 1, \ldots, p$;

(b) $d_t = \phi_1 d_{t-1} + \cdots + \phi_p d_{t-p} + Z_t - Z_{t-1}$, $t = 0, \pm 1, \ldots$;

(c) $\gamma_d(h) = \phi_1 \gamma_d(h-1) + \phi_2 \gamma_d(h-2) + \cdots + \phi_p \gamma_d(h-p)$, $h = 2, \ldots, p$;

where (c) follows by multiplying both sides of (b) by $d_{t-h}$ and taking expectations; (b) establishes $\{d_t\}$ as a non-invertible ARMA($p$, 1) sequence.

Subtracting (a) with $h = 2$ from (a) with $h = 1$ produces
$$\gamma_X(1) - \gamma_X(2) = \phi_1[\gamma_X(0) - \gamma_X(1)] + \phi_2[\gamma_X(1) - \gamma_X(0)] + \cdots + \phi_p[\gamma_X(p-1) - \gamma_X(p-2)].$$
Dividing this relation by $\gamma_d(0)$ and using (6) gives
$$\rho_d(1) + \frac{1}{2} = \frac{\phi_1}{2} - \frac{\phi_2}{2} - \phi_3\left[\frac{1}{2} + \rho_d(1)\right] - \cdots - \phi_p\left[\frac{1}{2} + \rho_d(1) + \cdots + \rho_d(p-2)\right], \qquad (7)$$
where $\rho_d(h) = \gamma_d(h)/\gamma_d(0)$ denotes the lag $h$ autocorrelation of $\{d_t\}$. To get another $p - 1$ moment equations, divide (c) by $\gamma_d(0)$:
$$\begin{bmatrix} \rho_d(2) \\ \vdots \\ \rho_d(p) \end{bmatrix} = \phi_1 \begin{bmatrix} \rho_d(1) \\ \vdots \\ \rho_d(p-1) \end{bmatrix} + A \begin{bmatrix} \phi_2 \\ \vdots \\ \phi_p \end{bmatrix}, \qquad (8)$$
where
$$A = \begin{bmatrix}
\rho_d(0) & \rho_d(1) & \cdots & \rho_d(p-2) \\
\rho_d(1) & \rho_d(0) & \cdots & \rho_d(p-3) \\
\vdots & \vdots & \ddots & \vdots \\
\rho_d(p-2) & \rho_d(p-3) & \cdots & \rho_d(0)
\end{bmatrix}.$$
Equation (4) simply writes (7) and (8) into a single linear system.

We now show that (7) and (8) provide a unique solution for the AR coefficients in terms of the autocorrelations of the differenced series. To do this, note that $A$ is invertible, implying that
$$\begin{bmatrix} \phi_2 \\ \vdots \\ \phi_p \end{bmatrix} = A^{-1}\left( \begin{bmatrix} \rho_d(2) \\ \vdots \\ \rho_d(p) \end{bmatrix} - \phi_1 \begin{bmatrix} \rho_d(1) \\ \vdots \\ \rho_d(p-1) \end{bmatrix} \right). \qquad (9)$$
To combine this with (7), define the vector
$$\mathbf{c}^T = \left[ -\frac{1}{2},\ -\left(\frac{1}{2} + \rho_d(1)\right),\ \ldots,\ -\left(\frac{1}{2} + \sum_{j=1}^{p-2} \rho_d(j)\right) \right],$$
so that (7) reads $\rho_d(1) + 1/2 = \phi_1/2 + \mathbf{c}^T [\phi_2, \ldots, \phi_p]^T$, and observe that
$$\phi_1 \left( \frac{1}{2} - \mathbf{c}^T A^{-1} \begin{bmatrix} \rho_d(1) \\ \vdots \\ \rho_d(p-1) \end{bmatrix} \right) = \rho_d(1) + \frac{1}{2} - \mathbf{c}^T A^{-1} \begin{bmatrix} \rho_d(2) \\ \vdots \\ \rho_d(p) \end{bmatrix}. \qquad (10)$$
We see that given the first $p$ autocorrelations of the differenced series, there is a unique set of parameters $\phi_1, \ldots, \phi_p$ satisfying (9) and (10), and thus satisfying (4).

To estimate the AR($p$) parameters in practice, the sample autocorrelations of $\{d_t\}$ are calculated for the lags $1, 2, \ldots, p$. Our estimator has the form
$$\hat{\boldsymbol{\phi}} = \hat{M}^{-1} \hat{\boldsymbol{\rho}}_d, \qquad (11)$$
where the elements of $\hat{M}$ and $\hat{\boldsymbol{\rho}}_d$ simply replace $\rho_d(h)$ with its estimator
$$\hat\rho_d(h) = \frac{\hat\gamma_d(h)}{\hat\gamma_d(0)} = \frac{\sum_{t=2}^{N-h} (d_t - \bar{d})(d_{t+h} - \bar{d})}{\sum_{t=2}^{N} (d_t - \bar{d})^2}.$$
Here, $\bar{d} = (N-1)^{-1} \sum_{t=2}^{N} d_t$ is the sample mean of the differences, which is included here to allow for the possibility of a linear trend in the original series.

Invertibility of $\hat{M}$ follows from an argument nearly identical to that for $M$, and is based on the fact that the matrix of sample autocorrelations $\hat{A}$ is invertible with probability one (see Chapter 8 of Brockwell and Davis [1991]). If the number of changepoints $m$ is small relative to $N$, then the mean shifts will have a negligible impact on the estimated covariance of the differences, since $X_t - X_{t-1} = d_t$ except at the changepoint times $\tau_1, \ldots, \tau_m$.

We end this section by estimating $\sigma^2$. Multiplying both sides of (b) by $d_{t-1}$, taking expectations, and solving for $\mathrm{Var}(Z_t) = \sigma^2$, yields
$$\sigma^2 = \sum_{j=1}^{p} \phi_j \gamma_d(j-1) - \gamma_d(1).$$
A moment based estimator of the variance is hence
$$\hat\sigma^2 = \sum_{j=1}^{p} \hat\phi_j \hat\gamma_d(j-1) - \hat\gamma_d(1). \qquad (12)$$
The next section shows that $\hat\sigma^2$ is a consistent estimator of $\sigma^2$.

This section shows that if $m = m(N)$ grows slowly enough in $N$, the estimators in the last section will be consistent and asymptotically normal.
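For concreteness, the recipe of the previous section (difference the series, compute the sample autocorrelations of $\{d_t\}$, build $\hat M$ as in (5), solve (11), then apply (12)) can be sketched in Python. The simulation settings below (series length, AR(2) coefficients, nine equally spaced alternating shifts) are illustrative assumptions, not values from the paper:

```python
import numpy as np

def yw_difference_fit(x, p):
    """Yule-Walker difference estimator (11) and noise variance estimator (12)."""
    d = np.diff(x)
    d = d - d.mean()                      # mean-correct (allows a linear trend)
    n = d.size
    gamma = np.array([d[: n - h] @ d[h:] / n for h in range(p + 1)])
    rho = gamma / gamma[0]                # rho[h] = sample lag-h autocorrelation

    # build M from (5): first row 1/2, -1/2, -(1/2 + rho(1)), ...;
    # remaining rows are Toeplitz in rho(|h - j|)
    M = np.empty((p, p))
    M[0, 0] = 0.5
    for j in range(2, p + 1):
        M[0, j - 1] = -(0.5 + rho[1 : j - 1].sum())
    for h in range(2, p + 1):
        for j in range(1, p + 1):
            M[h - 1, j - 1] = rho[abs(h - j)]
    rhs = np.concatenate(([rho[1] + 0.5], rho[2 : p + 1]))

    phi = np.linalg.solve(M, rhs)                 # estimator (11)
    sigma2 = phi @ gamma[:p] - gamma[1]           # estimator (12)
    return phi, sigma2

# illustrative check: AR(2) errors plus nine alternating mean shifts
rng = np.random.default_rng(2)
N, phi_true = 50000, np.array([0.5, -0.3])
eps = np.zeros(N)
z = rng.standard_normal(N)
for t in range(2, N):
    eps[t] = phi_true @ eps[t - 2 : t][::-1] + z[t]
shift = np.zeros(N)
for j in range(1, 10):
    shift[j * N // 10 :] += 2.0 * (-1) ** (j + 1)

phi_hat, sigma2_hat = yw_difference_fit(shift + eps, p=2)
print(np.round(phi_hat, 2), round(sigma2_hat, 2))  # near (0.5, -0.3) and 1
```

Despite the nine mean shifts, the estimates land close to the true coefficients and noise variance, since the shifts contaminate only nine of the roughly 50,000 differences.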
In particular, to obtain asymptotic normality, we will assume that as $n \to \infty$:

A.1 $\max_{1 \le k \le m(n)} |\mu_{k+1} - \mu_k| \le B_0$ (the maximum change size is bounded);

A.2 $m(n) = o(\sqrt{n})$.

We begin with asymptotic normality of the autocorrelations of first differences in the general ARMA($p,q$) case, which may be of independent interest. The asymptotic normality of the AR($p$) estimators is a corollary to Theorem 1.

Theorem 1. If $\{X_t\}_{t=1}^N$ follows (1) with $\{\epsilon_t\}$ a causal ARMA($p,q$) process, then for each fixed positive integer $k$, as $N \to \infty$,
$$\sqrt{N} \begin{bmatrix} \hat\rho_d(1) - \rho_d(1) \\ \vdots \\ \hat\rho_d(k) - \rho_d(k) \end{bmatrix} \xrightarrow{D} N_k(\mathbf{0},\, BWB^T).$$

Here, the elements of $W$ are given by Bartlett's formula (see Chapter 8 of Brockwell and Davis [1991]) applied to the autocorrelations $\rho(1), \ldots, \rho(k+1)$ of $\{\epsilon_t\}$, and $B$ is the $k \times (k+1)$ Jacobian of the mapping, displayed in the proof below, taking $(\rho(1), \ldots, \rho(k+1))$ to $(\rho_d(1), \ldots, \rho_d(k))$; its entries are
$$B_{hj} = \frac{2 \cdot \mathbf{1}_{\{j=h\}} - \mathbf{1}_{\{j=h-1\}} - \mathbf{1}_{\{j=h+1\}}}{2(1 - \rho(1))} + \mathbf{1}_{\{j=1\}}\, \frac{\rho_d(h)}{1 - \rho(1)}.$$

Proof.
We first show that the changepoints have negligible impact on estimated autocorrelations in the limit. To do this, write $d_t = X_t - X_{t-1} = (\epsilon_t - \epsilon_{t-1}) + \delta_t$, with $\delta_t = (\mu_k - \mu_{k-1}) \mathbf{1}_{\{t = \tau_k + 1\}}$. If we let
$$\tilde\gamma_d(h) = \frac{1}{N} \sum_{t=2}^{N-h} (\epsilon_t - \epsilon_{t-1})(\epsilon_{t+h} - \epsilon_{t+h-1}),$$
then, writing $t_j = \tau_j + 1$,
$$\sqrt{N}\, \big| \hat\gamma_d(h) - \tilde\gamma_d(h) \big| \le \frac{m}{\sqrt{N}} \left[ \frac{1}{m} \sum_{j=1}^{m} |\delta_{t_j}| \big( |\epsilon_{t_j+h} - \epsilon_{t_j+h-1}| + |\epsilon_{t_j-h} - \epsilon_{t_j-h-1}| + |\delta_{t_j}| \big) \right].$$
The term on the right hand side converges to zero if $N^{-1/2} m \to 0$ as $N \to \infty$, since the bracketed average is bounded in probability. The assumptions at the start of this section are sufficient to ensure this convergence.

Now we find the asymptotic distribution of the autocorrelations of the differences of $\{\epsilon_t\}$. Note that
$$\rho_d(h) = \frac{-\rho(h-1) + 2\rho(h) - \rho(h+1)}{2(1 - \rho(1))} \quad \text{and} \quad \hat\rho_d(h) = \frac{-\hat\rho(h-1) + 2\hat\rho(h) - \hat\rho(h+1)}{2(1 - \hat\rho(1))},$$
where $\rho(h)$ and $\hat\rho(h)$ denote the lag $h$ autocorrelation of $\{\epsilon_t\}$ and its sample counterpart. Theorem 1 now follows from the mapping theorem (delta method) and well known results on asymptotic normality of sample autocovariances of ARMA processes (see Chapter 8 of Brockwell and Davis [1991]).
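For AR(1) errors, $\rho(h) = \phi^{|h|}$, and the displayed mapping collapses to the closed form $\rho_d(h) = -\phi^{h-1}(1-\phi)/2$. A quick numerical check of this identity (the value of $\phi$ below is an arbitrary choice):

```python
phi = 0.6
rho = lambda h: phi ** abs(h)   # AR(1) autocorrelation function

for h in range(1, 5):
    # the mapping used in the proof ...
    mapped = (-rho(h - 1) + 2 * rho(h) - rho(h + 1)) / (2 * (1 - rho(1)))
    # ... versus the closed form it implies for AR(1) errors
    closed = -phi ** (h - 1) * (1 - phi) / 2
    assert abs(mapped - closed) < 1e-12

print("identity verified for h = 1, ..., 4")
```

In particular, $\rho_d(1) = -(1-\phi)/2$: differencing a positively correlated AR(1) series always produces negative lag one autocorrelation.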
Corollary 1. Suppose that $\{X_t\}$ follows (1) with $\{\epsilon_t\}$ satisfying (2). For the estimator given in (11), as $N \to \infty$,
$$\sqrt{N} \begin{bmatrix} \hat\phi_1 - \phi_1 \\ \vdots \\ \hat\phi_p - \phi_p \end{bmatrix} \xrightarrow{D} N_p(\mathbf{0}, \Sigma).$$
Here, $\Sigma = M^{-1} B W B^T (M^{-1})^T$, with $B$ and $W$ as in Theorem 1 (taken with $k = p$).

Proof of Corollary 1. Since $\{d_t\}$ is stationary and ergodic, the elements of $\hat{M}$ converge to those of $M$ in the almost sure sense. The conclusion of Corollary 1 now follows by application of the continuous mapping theorem.

A simulation study with AR($p$) errors is now conducted. In each simulation, $N = 1000$ and $m$ is randomly generated from a discrete uniform distribution over a small set of values, which roughly corresponds to the level of changepoint frequency in our ensuing data example. Any changepoint times are generated randomly within $\{1, \ldots, N\}$ with equal probability; we do not impose any minimal spacing between successive changepoint times. The segment means $\mu_i$ are randomly generated from a symmetric uniform distribution. Ten thousand independent runs are conducted in all cases.

We start with AR(1) errors, simulating $\phi$ randomly from a symmetric uniform distribution on an interval inside $(-1, 1)$; $\{Z_t\}$ is Gaussian white noise with unit variance. The estimator in (3) is denoted by AR1seg and the Yule-Walker difference estimator in (11) is denoted by Diff. Raw rolling window estimators, using different window lengths, are denoted by their lengths: N, N/2, N/5, N/10, N/20, and N/50.

Our simulation results are summarized in Figure 2. The obvious winner is the Yule-Walker estimator based on first order differencing, which is unbiased and has the smallest variance. The AR1seg estimator is unbiased as claimed; however, it has larger variability than the Yule-Walker difference estimator. The performance of the raw rolling-window estimators depends on the choice of the window length, but appears inferior to the difference based estimator, even with the best-performing window size among those considered (which is likely one of the intermediate window lengths in this simulation).
It is hard to determine the optimal window length in practice, and smaller window lengths considerably increase computation time.
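For $p = 1$, the linear system (4) reduces to $\hat\phi = 2\hat\rho_d(1) + 1$. The sketch below (with hypothetical shift times and an arbitrary $\phi$, not the randomized settings of the study above) contrasts this with a naive Yule-Walker fit that ignores the mean shifts:

```python
import numpy as np

rng = np.random.default_rng(3)
N, phi = 10000, 0.2

# AR(1) errors plus nine alternating mean shifts of size 2
eps = np.zeros(N)
for t in range(1, N):
    eps[t] = phi * eps[t - 1] + rng.standard_normal()
shift = np.zeros(N)
for j in range(1, 10):
    shift[j * N // 10 :] += 2.0 * (-1) ** (j + 1)
x = shift + eps

def lag1_acf(v):
    v = v - v.mean()
    return (v[:-1] @ v[1:]) / (v @ v)

naive = lag1_acf(x)                        # Yule-Walker ignoring the shifts
diff = 2.0 * lag1_acf(np.diff(x)) + 1.0    # difference estimator, p = 1 case

print(f"naive: {naive:.2f}, difference: {diff:.2f}")  # naive is badly inflated
```

The unmodeled mean shifts masquerade as persistence, so the naive estimate is far above $\phi$, while the difference estimator is essentially unaffected.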
[Figure 2: Box plots of estimates of the AR(1) parameter $\phi$.]

We now move to AR(2) errors. In each AR(2) run, the AR coefficients were randomly generated (uniformly) from the triangular region that guarantees causality: $\phi_1 + \phi_2 < 1$, $\phi_2 - \phi_1 < 1$, and $|\phi_2| < 1$. In this setting, the changepoint total is fixed at $m = 9$ and all segments are set to have equal length. All mean shifts alternate in sign with an absolute magnitude of 2, the first shift moving the series upwards. The series length varies from $N = 1000$ to $N = 20000$. Since $p > 1$, the AR1seg estimator is not applicable. The rolling-window estimator is dropped from consideration due to its poor AR(1) performance and computational time. The simulation results show that estimator bias and variance decrease towards zero as the length of the series increases, reinforcing the consistency result proven in the last section.

[Figure 3: Box plots of AR(2) coefficient estimates for series lengths from 1000 to 20000.]

Moving to AR(4) simulations, to meet causality requirements, the AR(4) characteristic polynomial is factored in terms of its four roots $1/r_1$, $1/r_2$, $1/r_3$, and $1/r_4$:
$$\phi(z) = (1 - r_1 z)(1 - r_2 z)(1 - r_3 z)(1 - r_4 z).$$
Causality requires all roots $1/r_i$ to lie outside the complex unit circle; equivalently, $|r_i| < 1$ for each $i$. To meet this, $r_1$ and $r_2$ are randomly generated from a symmetric uniform distribution on an interval inside $(-1, 1)$, and $r_3$ is a randomly generated complex number with modulus $\|r_3\| < 1$; $r_4$ is taken as the complex conjugate of $r_3$. This mixes real and complex roots in the AR polynomial. Other simulation settings are identical to those in the AR(2) case. Figure 4 shows our results, which exhibit the same pattern as in the AR(2) case, with decreasing bias and variance as $N$ increases.

For our final simulation, we return to the AR(1) setting and conduct a sensitivity analysis with respect to the mean shift size.
When $m/\sqrt{N} \to 0$ as $N \to \infty$, the accuracy of this estimator is not influenced by the changepoint locations, but rather by the magnitude of the mean shifts through the signal-to-noise ratio, defined as the absolute mean shift magnitude over the process standard deviation. For simplicity, $\sigma$ is set to unity. The number of changepoints is fixed at $m = 9$ and their locations are randomly generated over $\{1, \ldots, N\}$ with $N = 1000$. In each run, the true $\phi$ is simulated from a symmetric uniform distribution. The nine mean shifts alternate in sign, with fixed shift sizes between zero and five considered across runs. The difference between the estimated $\hat\phi$ and the true $\phi$ is presented in Figure 5, aggregated by mean shift size.

[Figure 4: Box plots of AR(4) coefficient estimates for series lengths from 1000 to 20000.]

[Figure 5: An AR(1) mean shift size sensitivity plot, for shift sizes from 0 to 5.]

The horizontal line in Figure 5 demarcates zero bias in $\hat\phi$; the red curve depicts the average difference between $\hat\phi$ and $\phi$. Obviously, the larger the mean shift magnitude, the more our estimator degrades. However, larger mean shift sizes can easily be identified as outliers in the differenced series [Chen and Liu, 1993, McQuarrie and Tsai, 2003]. As such, the essential challenge lies with estimating the AR($p$) parameters in the presence of the smaller mean shifts.
Some techniques can mistakenly flag changepoints when underlying positive dependence is ignored. For example, Lund and Shi [2020] argue that the London house price series shifts identified may be more attributable to the positive correlations in the series than to actual mean shifts. CUSUM based techniques are severely affected by positive correlation [Shi et al., 2021]. To remedy this, authors recommend detecting changepoints from estimated versions of the one-step-ahead prediction residuals of the series [Bai, 1993, Robbins et al., 2011]. However, this requires estimation of the autocovariance structure of the series in the presence of the unknown changepoints. A major application of our methods serves to decorrelate series without requiring any knowledge about the changepoint configuration. IID-based changepoint techniques, applied to the estimated one-step-ahead prediction residuals, can then locate mean shifts in the series. The Yule-Walker difference estimator proposed here is extensively used in Shi et al. [2021] to do just this.

Table 1 illustrates the improved performance of two popular multiple changepoint methods, Wild Binary Segmentation (WBS) [Fryzlewicz, 2014] and Pruned Exact Linear Time (PELT) [Killick et al., 2012]. In each run, an AR(1) series of length $N = 500$ is simulated, with $\phi$ fixed at one of three increasing positive values and white noise variance $\sigma^2 = 1$. The series has either zero or three equally spaced changepoints; the mean shift sizes $\Delta$ are chosen to satisfy the constant signal-to-noise ratio requirement
$$\mathrm{SNR} = \frac{|\Delta|}{\sqrt{\sigma^2/(1 - \phi^2)}} = 2.$$

  $m$       WBS $\hat m/(SE_{\hat m})$   WBS $\hat m_D/(SE_{\hat m_D})$   PELT $\hat m/(SE_{\hat m})$   PELT $\hat m_D/(SE_{\hat m_D})$
Smallest $\phi$:
  Zero      3.85/(2.43)                  0.17/(0.54)                      0.02/(0.17)                   0.00/(0.02)
  Three     5.46/(2.01)                  3.03/(0.18)                      3.07/(0.34)                   3.00/(0.03)
Middle $\phi$:
  Zero      16.03/(3.92)                 0.24/(0.70)                      1.36/(1.62)                   0.00/(0.04)
  Three     16.90/(3.81)                 3.09/(0.40)                      4.28/(1.43)                   2.95/(0.37)
Largest $\phi$:
  Zero      30.77/(3.98)                 0.40/(0.96)                      12.65/(3.31)                  0.01/(0.13)
  Three     31.45/(3.86)                 2.48/(1.38)                      14.33/(3.25)                  1.59/(1.44)
Table 1: AR(1) series with/without changepoints; the three fixed $\phi$ values increase down the table. The thresholding constant used in WBS is $C = 1.\ldots$

All simulations employ independent runs. In Table 1, $\hat{m}$ and $SE_{\hat{m}}$ denote the average and standard error of the estimated number of changepoints when WBS and PELT are directly applied to the series.
The quantities $\hat{m}_D$ and $SE_{\hat{m}_D}$ denote the average and standard error of the estimated number of changepoints obtained from the one-step-ahead prediction residuals. It is apparent that IID based WBS and PELT methods overestimate the number of changepoints in a dependent series when correlation is ignored; PELT seems more resistant to dependence than WBS. In contrast, with the help of the proposed Yule-Walker difference estimator and decorrelation techniques, both WBS and PELT become much more accurate.

Li and Lund [2012] studied annual precipitations from New Bedford and Boston, Massachusetts. The data are available from the National Oceanic and Atmospheric Administration (NOAA). The ratio of these series (New Bedford to Boston) is displayed in Figure 6, along with the fitted mean of a model that allows for both multiple changepoints and AR(1) errors. Three changepoints, occurring at the years 1886, 1917, and 1967, are indicated.

[Figure 6: New Bedford to Boston annual precipitation ratios, with the three changepoints demarcated.]

What is interesting here is the AR(1) parameter estimate: it fluctuates wildly over distinct methods. Specifically, our difference estimator and the AR1seg method produce antipodal estimates, as can be seen in Table 2. Our estimate agrees closely with an estimate computed by assuming the three changepoint times are known, but the level of autocorrelation is significantly less than that estimated in a Yule-Walker scheme that ignores all three changepoint times. Overall, the results show that one needs to be careful with changepoint problems with correlated data: mean shifts and correlation can inject similar features into time series.

Methods                                           Estimate ($\hat\phi$)
Yule-Walker estimator (ignoring changepoints)     $0.\ldots$
Robust estimator (AR1seg)                         $-0.\ldots$
Difference estimator                              $0.\ldots$
Yule-Walker estimator (given changepoint times)   $0.\ldots$
Table 2: AR(1) Coefficient Estimation
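The decorrelation route underlying Table 1 can be sketched as follows: estimate $\phi$ from the differences, form one-step-ahead prediction residuals, and hand those residuals to any IID-based detector such as WBS or PELT. The settings below (series length, $\phi$, a single midpoint shift) are illustrative assumptions, not those of Table 1:

```python
import numpy as np

rng = np.random.default_rng(4)
N, phi, delta = 10000, 0.5, 2.0

# AR(1) series with one mean shift at the midpoint (hypothetical setting)
eps = np.zeros(N)
for t in range(1, N):
    eps[t] = phi * eps[t - 1] + rng.standard_normal()
x = eps + np.where(np.arange(N) >= N // 2, delta, 0.0)

# step 1: estimate phi from the differenced series (the p = 1 case of (11))
d = np.diff(x)
d = d - d.mean()
phi_hat = 2.0 * (d[:-1] @ d[1:]) / (d @ d) + 1.0

# step 2: one-step-ahead prediction residuals; the AR(1) dependence is removed,
# while the mean shift survives as a level change of size (1 - phi) * delta
resid = x[1:] - phi_hat * x[:-1]

print(round(phi_hat, 2), round(resid[N // 2 :].mean(), 2))
```

The residual series is then approximately the IID noise $\{Z_t\}$ plus a level shift of size $(1 - \phi)\Delta$ (and an isolated spike at the changepoint itself), which IID-based segmentation methods can handle.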
Differencing can help estimate the autocovariance structure of an AR($p$) series corrupted by mean shifts. Our Yule-Walker estimator for autoregressive models is easy to implement, computationally fast, consistent, and asymptotically normal. The proposed estimator is adversely impacted by large mean shifts; however, these large shifts appear as outliers in the differenced series and can easily be removed. With changepoints present, the difference estimator significantly improves changepoint techniques developed for models with IID errors. In addition, the techniques are applicable if the series has a linear trend (constant across all regimes) with intercept shifts.

References
Bai, J. (1993). On the partial sums of residuals in autoregressive and moving average models. Journal of Time Series Analysis, 14(3):247–260.

Beaulieu, C. and Killick, R. (2018). Distinguishing trends and shifts from memory in climate data. Journal of Climate, 31:9519–9543.

Brockwell, P. J. and Davis, R. A. (1991). Time Series: Theory and Methods. Springer Series in Statistics. Springer-Verlag, New York, second edition.

Chakar, S., Lebarbier, E., Lévy-Leduc, C., and Robin, S. (2017). A robust approach for estimating change-points in the mean of an AR(1) process. Bernoulli, 23(2):1408–1447.

Chen, C. and Liu, L.-M. (1993). Joint estimation of model parameters and outlier effects in time series. Journal of the American Statistical Association, 88(421):284–297.

Fryzlewicz, P. (2014). Wild binary segmentation for multiple change-point detection. The Annals of Statistics, 42(6):2243–2281.

Hall, C. B., Lipton, R. B., Sliwinski, M., and Stewart, W. F. (2000). A change point model for estimating the onset of cognitive decline in preclinical Alzheimer's disease. Statistics in Medicine, 19(11-12):1555–1566.

Hewaarachchi, A. P., Li, Y., Lund, R., and Rennie, J. (2017). Homogenization of daily temperature data. Journal of Climate, 30(3):985–999.

Killick, R., Fearnhead, P., and Eckley, I. A. (2012). Optimal detection of changepoints with a linear computational cost. Journal of the American Statistical Association, 107(500):1590–1598.

Li, S. and Lund, R. (2012). Multiple changepoint detection via genetic algorithms. Journal of Climate, 25(2):674–686.

Lund, R. and Shi, X. (2020). Commentary on: Detecting possibly frequent change-points: Wild binary segmentation 2 and steepest-drop model selection. Journal of the Korean Statistical Society, 49:1090–1095.

Maidstone, R., Hocking, T., Rigaill, G., and Fearnhead, P. (2017). On optimal multiple changepoint algorithms for large data. Statistics and Computing, 27(2):519–533.

Marriott, F. and Pope, J. (1954). Bias in the estimation of autocorrelations. Biometrika, 41(3/4):390–402.

McQuarrie, A. D. and Tsai, C.-L. (2003). Outlier detections in autoregressive models. Journal of Computational and Graphical Statistics, 12(2):450–471.

Norwood, B. and Killick, R. (2018). Long memory and changepoint models: A spectral classification procedure. Statistics & Computing, 28(2):291–302.

Orcutt, G. H. and Winokur Jr., H. S. (1969). First order autoregression: inference, estimation, and prediction. Econometrica: Journal of the Econometric Society, pages 1–14.

Page, E. S. (1954). Continuous inspection schemes. Biometrika, 41(1-2):100–115.

Robbins, M., Gallagher, C., Lund, R., and Aue, A. (2011). Mean shift testing in correlated data. Journal of Time Series Analysis, 32(5):498–511.

Shi, X., Gallagher, C., Lund, R., and Killick, R. (2021). A comparison of single and multiple changepoint techniques for time series data. arXiv preprint arXiv:2101.01960.