On predictability of ultra short AR(1) sequences
Nikolai Dokuchaev† and Lin-Yee Hin

Department of Mathematics & Statistics, Curtin University, GPO Box U1987, Perth, 6845 Western Australia

October 17, 2018
Abstract
This paper addresses short-term forecasting of ultra short AR(1) sequences (4 to 6 terms only) with a single structural break at an unknown time and of unknown sign and magnitude. As prediction of autoregressive processes requires estimated coefficients, whose efficiency relies on the large-sample properties of the estimator, it is a common perception that prediction is practically impossible for such short series with a structural break. However, we obtain a heuristic result that some universal predictors represented in the frequency domain allow certain predictability based on these ultra short sequences. The predictors that we use are universal in the sense that they are not oriented on particular types of autoregressions and do not require explicit modelling of the structural break. The shorter the sequence, the better the one-step-ahead forecast performance of the smoothed predicting kernel. If the structural break entails a model parameter switch from a negative to a positive value, the forecast performance of the smoothed predicting kernel is better than that of the linear predictor that utilizes the AR(1) coefficient estimated from the ultra short sequence without taking the structural break into account, regardless of whether the innovation terms in the learning sequences are constructed from independent and identically distributed random Gaussian or Gamma variables, scaled pseudo-uniform variables, or a first-order auto-correlated Gaussian process.

Keywords: predicting, non-stationarity, structural break, autoregressive process.

∗ We acknowledge provision of ICT support and computing resources by Curtin IT Services (http://cits.curtin.edu.au). Curtin Information Technology Services (CITS) provides Information and Communication Technology systems and services in support of Curtin's teaching, learning, research and administrative activities. We acknowledge use of computing resources from the NeCTAR Research Cloud.

† Corresponding author. Department of Mathematics and Statistics, Curtin University, GPO Box U1987, Perth, 6845 Western Australia; email: [email protected]
Introduction
In this paper, we readdress the problem of one-step-ahead forecasting of a first-order autoregressive process, AR(1), with one structural break, i.e., a permanent change, in the AR(1) model parameter. Specifically, we consider the scenario where the learning sequence, i.e., the segment of the time series used for model estimation and forecasting, is very short, and where the structural break occurs at a random time point in the learning sequence.

Forecasting autoregressive processes is a well developed area with well known results. One strand of literature addresses this problem via time series models that are primarily specified from the time-domain modelling perspective (see, among many others, Box and Jenkins, 1976; Abraham and Ledolter, 1986; Stine, 1987; Cryer et al., 1990; Cortez et al., 2004; Hamilton, 1994; Xia and Zheng, 2015, and the references therein), or via exponential smoothing and filtering techniques constructed from the state-space approach, where the smoothers and filters are primarily characterized in the time domain (see, e.g., Roberts, 1982; Williams, 1987; Paige and Saunders, 1977; Chatfield and Yar, 1988; Ord et al., 1997; Chatfield et al., 2001; Hyndman et al., 2002; Bermúdez et al., 2006; Hyndman et al., 2008, and references therein). A separate yet related strand of literature addresses this problem via smoothing and filtering techniques where the smoothers and filters are primarily characterized in the frequency domain (see, e.g., Cambanis and Soltani, 1984; Ledolter and Kahl, 1984; Lyman and Edmonson, 2001; Dokuchaev, 2012; 2014; 2016, and references therein).
In this paper, we address this problem via the convolution of a near-ideal causal smoothing filter and a predicting kernel that are primarily characterized in the frequency domain (Dokuchaev, 2012; 2014; 2016).

Many strategies have been proposed to address the practical concern that a structural break in the model parameters within the learning sequence may compromise the modelling efficiency and forecast performance of a time series model (see, among many others, Bagshaw and Johnson, 1977; Sastri, 1986; Andrews, 1993; Bai and Perron, 1998; 2003; Pesaran and Timmermann, 2004; Clements and Hendry, 2006; Davis et al., 2006; Lin and Wei, 2006; Kim et al., 2009; Rossi, 2013; Pesaran et al., 2013, and the references therein). Implementation of these strategies requires the availability of learning sequences that are considerably longer than those considered in this paper. We cite a few examples. The method proposed by Bai and Perron (1998; 2003) to estimate the timing of a structural break requires at least 10 observations on either side of the break. Pesaran and Timmermann (2007) simulated random processes that each contain 100 to 200 observations to mimic learning sequences with a structural break in the AR(1) model parameter in order to assess the performance of their proposed set of cross-validation and forecast combination procedures that use pre-break and post-break data to perform time series forecasts. Giraitis et al. (2013) simulated time series processes that each contain 200 observations to mimic learning sequences with a structural break in the mean of the simulated random processes in order to assess the performance of their proposed one-step-ahead forecast algorithms based on adaptive linear filtering. In this paper, we consider the scenario where the learning sequence contains only 4 to 6 data points and, as such, is too short to effectively apply structural break timing estimation strategies, and to efficiently estimate pre-break and post-break AR(1) model parameters.

We consider a family of linear filters proposed by Dokuchaev (2016) where the impulse response function is obtained by inverse Z-transform of the product of the transfer function of a family of near-ideal causal smoothers (Dokuchaev, 2016) and the transfer function of a family of predicting kernels (Dokuchaev, 2014). The Monte Carlo experiments reported in Dokuchaev (2016) demonstrated a clear advantage, in terms of one-step-ahead forecast performance for AR(2) processes without structural break, of generating the impulse response function of the linear predictor from the convolution of the near-ideal causal smoother and the predicting kernel rather than from the predicting kernel alone. However, their relative forecast performance has never been assessed in the context of AR(1) processes with a single structural break at an unknown, random time point in a very short learning sequence. Additionally, the numerical experiments reported in Dokuchaev (2016) utilize 100 observations in the learning sequence. This begs the question whether the proposed linear predictor will perform well in the context of a very short learning sequence with a structural break. This paper seeks to close this gap in the literature.

Following the choice of benchmark used in, among others, Pesaran and Timmermann (2007) and Giraitis et al. (2013), we use as our benchmark the one-step-ahead forecasts from an AR(1) model that ignores the structural break and utilizes all observations, pre-break and post-break, since this is an appropriate model to use in situations with no breaks. The main contribution of this paper is our demonstration, via simulation experiments, that the one-step-ahead forecast performance of this family of linear filters is better than that of our chosen benchmark.
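The data-generating process and the break-ignoring benchmark just described can be sketched in a few lines of Python (an illustrative re-implementation with hypothetical helper names, not the authors' code; the AR(1) fit is a simplified least-squares slope through the origin rather than R's ar.ols(), which also estimates an intercept):

```python
import numpy as np

def simulate_ar1_with_break(d, beta_pre, beta_post, theta, sigma=1.0, seed=None):
    """Simulate x(0), ..., x(d) from an AR(1) whose parameter switches from
    beta_pre to beta_post at break time theta; initial condition x(-1) = 0."""
    rng = np.random.default_rng(seed)
    x, prev = np.zeros(d + 1), 0.0
    for t in range(d + 1):
        prev = (beta_pre if t < theta else beta_post) * prev \
               + sigma * rng.standard_normal()
        x[t] = prev
    return x

def ols_ar1_forecast(x_learn):
    """Benchmark: fit one AR(1) slope by least squares on the whole learning
    sequence (ignoring any break) and forecast one step ahead."""
    y, z = x_learn[1:], x_learn[:-1]
    beta_hat = float(z @ y / (z @ z))   # least-squares slope through the origin
    return float(beta_hat * x_learn[-1])

x = simulate_ar1_with_break(d=6, beta_pre=-0.6, beta_post=0.7, theta=3, seed=0)
forecast = ols_ar1_forecast(x[:-1])     # one-step-ahead forecast of x(6)
```

With only 4 to 6 usable observations, beta_hat mixes pre-break and post-break dynamics; this is exactly the handicap the frequency-domain predictors avoid by not estimating the parameter at all.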
Additionally, its performance is comparable to that of the one-step-ahead forecasts from an AR(1) model with the same model parameter as the synthetic AR(1) model parameter used to simulate the post-break random process.

The rest of the paper is organized as follows. Section 2 details the problem formulation. Section 3 presents the Monte Carlo simulation results. Section 4 concludes.

Problem formulation

Consider a stochastic discrete time process described by the AR(1) autoregression

x(t) = β(t) x(t−1) + σ η(t),  t = 0, 1, 2, ...,  x(−1) = 0,   (1)

β(t) = β₁ if t < θ,  β(t) = β₂ if t ≥ θ,

where β₁ ∈ (β_min, β_max), β₂ ∈ (β_min, β_max), β_min < β_max, |β_min| < 1, |β_max| < 1, σ ∈ (0, ∞), and η(t) is the innovation term of the time series. This model features a single structural break that takes place at a random time θ with values in the set {1, ..., d−1}. We assume that the η(t) are mutually independent for all t and independent of θ.

We consider the prediction problem for this process in the case where only an ultra short sequence with no more than six data points is available. We investigate the performance of linear time-invariant predictors with the output

y(t) = Σ_{τ=0}^{t} h(t−τ) x(τ),  t ≤ d−1,   (2)

where d ≤ 6. The process y(t) is supposed to approximate the process x(t+1). Here h: Z → R, where Z is the set of integers and R is the set of real numbers.

In our experiments, we calculate the predicting kernels h via their Z-transforms, which are represented explicitly, such that

h(t) = (1/2π) ∫_{−π}^{π} H(e^{iω}) e^{iωt} dω,  t ∈ Z.   (3)

Here the complex-valued functions H: C → C are the transfer functions of the corresponding predictors. In our experiments, we used two different transfer functions,

H(z) = K(z),   (4)

and

H(z) = K(z) F(z).   (5)

Here z ∈ C, and

K(z) = z (1 − exp[−γ/(z + 1 − γ)])^{−r}.   (6)

The function K(z) is the transfer function of a one-step predictor from Dokuchaev (2016); r > 0 and γ > 0 are its parameters. Further,

F(z) = exp( (1 − a) p / (z + a) + G(z) )^m,   (7)

G(z) = −ξ(a, p) + γ(a, p) N^{−1} (1 − N z^{−N−1}),
ξ(a, p) = exp[−(1 − a) p − 1],
γ(a, p) = |1 − a| p − ξ(a, p).

Here a ∈ (0, 1), p ∈ (1/2, 1), m ≥ 1, and N ≥ 1 are the parameters, with m, N ∈ Z. The function F(z) is the transfer function of a smoothing filter introduced in Dokuchaev (2016).

It can be noted that these linear predictors were constructed for semi-infinite one-sided sequences, since the corresponding kernels h(t) have infinite support on Z. In theory, the performance of these predictors is robust with respect to truncation; see the discussion on robustness in Dokuchaev (2012) and Dokuchaev (2016). However, we found, as a heuristic result of this paper, that their application to ultra short series also brings a positive result, meaning that these sequences feature some predictability. Worthy of note again is that implementation of these predictors does not involve explicit modelling of, or adjustment for, the structural break, including the break time and magnitude. Moreover, these predictors do not even require that the underlying process is an autoregressive process or any other particular kind of process.

We compare the performance of our predictors with an "ideal" linear predictor

y_ideal(d−1) = β(d) x(d−1),   (8)

where β(d) = β₂ is the post-break AR(1) model parameter that generates the post-break observations x(d−1) and x(d). This predictor is not feasible unless β(d) is known; in our setting, β(d) is unknown and would have to be estimated from the observations. We use the performance of this predictor as a theoretical benchmark.

Additionally, we compare the performance of our predictors with the performance of the predictor

y_AR(1)(d−1) = β̂(d) x(d−1),   (9)

where β̂(d) is estimated by fitting an AR(1) model to the sequence {x(τ)}_{τ=0}^{d−1}, which involves pre-break and post-break observations, using the built-in function ar.ols() in the R computing environment (R Core Team, 2016), implementing the ordinary least squares model parameter estimation strategy (pp. 368–370, Luetkepohl, 2008). This is an appropriate model estimation procedure to use if the sequence {x(τ)}_{τ=0}^{d−1} does not contain a structural break. By choosing this predictor as the benchmark for our numerical experiment, we seek to address the question of how costly it is to ignore breaks when performing one-step-ahead forecasts with the prediction algorithms considered, i.e., (4), (5), and (8), relative to (9).

Simulation experiment
We perform simulation experiments to investigate the one-step-ahead forecast performance of (4), (5), and (8), relative to (9), in predicting x(d) given {x(τ)}_{τ=0}^{d−1} simulated from (1), using four different specifications of (β₁, β₂):

1. β₁ ∈ (0, 1), β₂ ∈ (0, 1);
2. β₁ ∈ (−1, 0), β₂ ∈ (−1, 0);
3. β₁ ∈ (−1, 0), β₂ ∈ (0, 1);
4. β₁ ∈ (0, 1), β₂ ∈ (−1, 0);

and four different specifications of η(t):

1. Independent and identically distributed (IID) Gaussian innovations: in this setting, we specify

η(t) ∼ N(0, 1)   (10)

as IID random numbers drawn from the standard Gaussian distribution.

2. IID shifted Gamma innovations: in this setting, we specify

η(t) = γ(t) − √2,   (11)

where {γ(t)}_{t=0}^{d−1} are random numbers drawn from the Gamma distribution with shape parameter 2 and scale parameter 2^{−1/2}, i.e., Γ(2, 2^{−1/2}).

3. IID scaled pseudo-uniform innovations: in this setting, we specify

η(t) = √12 (exp(t + s) − ⌊exp(t + s)⌋ − 1/2),   (12)

where t = 0, ..., d−1, s = 1, ..., N_sim, and where N_sim is the total number of simulations to be performed.

4. Auto-correlated Gaussian innovations: in this setting, we specify

η(t) = 2^{−1/2} (η̃(t) + η̃(t−1)),   (13)

where the η̃(t) are IID random numbers drawn from N(0, 1), so that the lag-one auto-correlation is E[η(t) η(t−1)] = 0.5.

The forecasts are formed as

y(d−1) = Σ_{τ=0}^{d−1} h(d−1−τ) x(τ) ≈ x(d),   (14)

where y(d−1) = y_KH(d−1) and y(d−1) = y_K(d−1) are the one-step-ahead forecasts, and where h = h_KH and h = h_K are the impulse response functions for (5) and (4), respectively. Following the choice of parameters used in Dokuchaev (2016), we set a, p, and N as in that paper and m = 2 for the smoothing filter (7), and set γ = 0.1 for the predicting kernel (6). We investigate the sensitivity of the predicting kernel for four different values of r, and we consider three different lengths of the ultra short sequence, d ∈ {4, 5, 6}.

The ideal linear predictor (8) is implemented as

y_ideal(d−1) = β(d) x(d−1) ≈ x(d).   (15)

For this predictor, one needs to know β(d), i.e., the post-break AR(1) model parameter used to simulate x(d). In practice, it is impossible to know β(d). We include (15) as it represents a theoretical ideal benchmark.

The linear predictor (9) is implemented as

y_AR(1)(d−1) = β̂(d) x(d−1) ≈ x(d),   (16)

where β̂(d) is estimated by fitting an AR(1) model to the learning sequence {x(τ)}_{τ=0}^{d−1}. This is a commonly used approach in AR(1) time series forecasting, and one that depends on the large-sample properties of the available time series for efficient model parameter estimation. We are interested in investigating the finite-sample forecast performance of (14) relative to (16) in the context of the ultra short learning sequences considered in this paper.

For each combination of (β₁, β₂), η(t), r, and d, we perform N_sim simulations, where N_sim takes three different values. For each simulation, we simulate an AR(1) process with a single structural break at a random, unknown time point following the data generation process (1), each containing d + 1 data points; the random draws of θ, β₁, β₂, and η(t) are mutually independent. The first d observations are used as the sequence based on which we forecast the (d + 1)-th observation.

Let E[·] denote the sample mean across the N_sim Monte Carlo trials performed for each scenario indexed by s, where s = 1, ..., N_sim.
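In implementation terms, the sample mean E[·] and the resulting root-mean-squared errors can be sketched as follows (an illustrative Python re-implementation, not the authors' R code: Gaussian innovations only, a simplified through-origin least-squares AR(1) fit in place of ar.ols(), and a modest number of trials):

```python
import numpy as np

def rmse(errors):
    """Sample root-mean-squared error across Monte Carlo trials."""
    e = np.asarray(errors, dtype=float)
    return float(np.sqrt(np.mean(e ** 2)))

rng = np.random.default_rng(42)
n_sim, d = 2000, 6
err_ideal, err_ar1 = [], []
for _ in range(n_sim):
    theta = int(rng.integers(1, d))          # break time in {1, ..., d-1}
    b1, b2 = rng.uniform(0.0, 1.0, size=2)   # pre- and post-break parameters
    x, prev = np.zeros(d + 1), 0.0           # initial condition x(-1) = 0
    for t in range(d + 1):
        prev = (b1 if t < theta else b2) * prev + rng.standard_normal()
        x[t] = prev
    # "ideal" predictor (15): uses the true post-break parameter
    err_ideal.append(x[d] - b2 * x[d - 1])
    # benchmark (16): least-squares AR(1) fit on x(0..d-1), ignoring the break
    y, z = x[1:d], x[:d - 1]
    err_ar1.append(x[d] - (z @ y / (z @ z)) * x[d - 1])

e_ideal, e_ar1 = rmse(err_ideal), rmse(err_ar1)
```

Since the ideal predictor's error is pure innovation noise, e_ideal sits near the innovation standard deviation, while the break-ignoring benchmark adds estimation error on top.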
Specifically, we let

e_KH = (E[(x(d) − y_KH(d−1))²])^{1/2},
e_K = (E[(x(d) − y_K(d−1))²])^{1/2},
e_ideal = (E[(x(d) − y_ideal(d−1))²])^{1/2},
e_AR(1) = (E[(x(d) − y_AR(1)(d−1))²])^{1/2},

be the sample root-mean-squared errors (RMSE) for (14) implementing (4) and (5), for (15), and for (16), respectively.

We carry out the simulation experiments in the R computing environment (R Core Team, 2016). Simulation of the learning sequence is carried out by iterative application of (1). The estimation of the AR(1) parameter β̂ for the implementation of (16) is performed using the ar.ols() script in R. The numerical integrations carried out to map (4) and (5) to their respective impulse response functions to be used in (14) are implemented via the myintegrate() script in the R add-on package elliptic proposed in Hankin (2006).

Table 1 depicts the simulation experiment results for the setting with β₁, β₂ ∈ (0, 1) and η(t) ∼ N(0, 1), for d = 4, 5, 6, and for the four different values of the predicting kernel parameter r. The RMSE of the smoothed predicting kernel linear predictor is smaller than the RMSE of the linear predictor (16) that utilizes the AR(1) model parameter estimated from the learning sequence ignoring the presence of the structural break. The shorter the learning sequence, the better the performance of the smoothed predicting kernel linear predictor. This trend is consistent across the three different sizes of Monte Carlo simulation N_sim.

Worthy of note is that this smoothed predictor does not require explicit modelling of the structural break. In practice, when the available learning sequence is short, and the model parameter structural break time and magnitude are uncertain, it is not possible to efficiently apply structural break estimation and adjustment procedures for parameter estimation and time series forecasting due to the series length constraint. In this context, the smoothed predicting kernel (5) appears to be an alternative approach that may offer satisfactory forecast performance, circumventing the need to resort to model parameter estimation that ignores the structural break.

The fact that the RMSE of the predicting kernel linear predictor without smoothing (4) is larger than the RMSE of the linear predictor (16) highlights the role of the near-ideal causal smoother (7) in improving the forecast performance of (4). By dampening the high-frequency noise, the smoothed predicting kernel is better able to capture the salient features of the simulated AR(1) process with a random structural break based on a short learning sequence, and hence delivers better one-step-ahead forecast performance than the linear predictor (16). Without the aid of the smoothing kernel, the performance of the predicting kernel (4) is, in general, even poorer than that of the linear predictor (16) that relies on the model parameter estimate from an AR(1) model that ignores the structural break.

It is not surprising that the performance of the linear predictor (16) is poorer than that of the ideal predictor (15).
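A toy calculation makes the size of this handicap concrete. The sketch below (illustrative Python with a hypothetical sign-switching break; the AR(1) fit is a simplified least-squares slope through the origin, not ar.ols()) pools pre-break and post-break observations, as (16) does, and measures how far the pooled estimate lands from the true post-break parameter:

```python
import numpy as np

rng = np.random.default_rng(1)
b_pre, b_post, theta, d = -0.8, 0.8, 3, 6   # hypothetical sign-switching break

gaps = []
for _ in range(5000):
    x, prev = np.zeros(d), 0.0              # learning sequence x(0..d-1)
    for t in range(d):
        prev = (b_pre if t < theta else b_post) * prev + rng.standard_normal()
        x[t] = prev
    y, z = x[1:], x[:-1]
    beta_hat = z @ y / (z @ z)              # pooled least-squares AR(1) estimate
    gaps.append(abs(beta_hat - b_post))

mean_gap = float(np.mean(gaps))             # average distance from b_post
```

With only five transition pairs, two of which are governed by the opposite-signed pre-break parameter, the pooled estimate is pulled well away from the post-break value on average, consistent with the poor showing of (16) in the sign-switch scenarios.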
Utilizing pre-break and post-break data to estimate the post-break model parameter when the break time and magnitude are unknown inevitably leads to parameter estimation error. Although cross-validation methods have been proposed that use pre-break and post-break data to select the estimation window for the model used to compute out-of-sample forecasts (see, e.g., Pesaran and Timmermann, 2007), the number of observations required to implement these methodologies is considerably larger than those we consider in this paper.

Table 2 depicts the results of simulations with η(t) ∼ N(0, 1) for the three other model parameter settings, i.e., β₁, β₂ ∈ (−1, 0); β₁ ∈ (−1, 0), β₂ ∈ (0, 1); and β₁ ∈ (0, 1), β₂ ∈ (−1, 0). Table 3 depicts the results pertaining to the setting where η(t) is as defined in (11), while Table 4 depicts those pertaining to the setting where η(t) is as defined in (12). They show similar trends to those demonstrated for (10), as depicted in Table 1 and Table 2 above.

However, the numerical results pertaining to the simulation scenarios with IID innovation terms are in some ways different from those with correlated innovation terms. Table 5 depicts a subset of the simulation results pertaining to the setting where η(t) is as defined in (13), where E[η(t) η(t−1)] = 0.5. While the forecast performance of the smoothed predicting kernel (5) is still better than that of the linear predictor (16) for β₁ ∈ (−1, 0) and β₂ ∈ (0, 1) in this context where the innovation terms are auto-correlated, it is not so for the remaining three simulation scenarios.

Table 1: One-step-ahead forecast performance with β₁, β₂ ∈ (0, 1), θ ∈ {1, ..., d−1}, and η(t) ∼ N(0, 1). Columns: e_ideal, e_AR(1), e_K, e_KH, e_ideal/e_AR(1), e_K/e_AR(1), e_KH/e_AR(1); rows indexed by r and d ∈ {4, 5, 6}; panels by N_sim. [Numerical entries not recoverable from this extraction.]

Table 2: One-step-ahead forecast performance with θ ∈ {1, ..., d−1} and η(t) ∼ N(0, 1). Panel (a): β₁, β₂ ∈ (−1, 0); panel (b): β₁ ∈ (−1, 0), β₂ ∈ (0, 1); panel (c): β₁ ∈ (0, 1), β₂ ∈ (−1, 0). [Numerical entries not recoverable from this extraction.]

Table 3: One-step-ahead forecast performance with θ ∈ {1, ..., d−1} and η(t) = γ(t) − √2, where γ(t) ∼ Γ(2, 2^{−1/2}). [Numerical entries not recoverable from this extraction.]

Table 4: One-step-ahead forecast performance with θ ∈ {1, ..., d−1} and η(t) = √12 (exp(t + s) − ⌊exp(t + s)⌋ − 1/2). [Numerical entries not recoverable from this extraction.]

Table 5: One-step-ahead forecast performance with θ ∈ {1, ..., d−1} and η(t) = 2^{−1/2} (η̃(t) + η̃(t−1)), where η̃(t) ∼ N(0, 1). [Numerical entries not recoverable from this extraction.]

Conclusions
This paper addresses the problem of one-step-ahead forecasting of an AR(1) process with a single structural break at an unknown time and of unknown sign and magnitude within a very short learning sequence. We analysed, via simulation experiments, the forecast performance of a smoothed predicting kernel algorithm relative to that of a linear predictor that utilizes the AR(1) model parameter estimated from the learning sequence without taking into account the presence of the structural break.

It appears that the shorter the learning sequence, the better the forecast performance of the smoothed predicting kernel relative to the linear predictor. Regardless of whether the innovation terms in the learning sequences are constructed from IID random Gaussian variables, IID random Gamma variables, IID scaled pseudo-uniform variables, or a first-order auto-correlated Gaussian process, the forecast performance of the smoothed predicting kernel is better than that of the linear predictor if the AR(1) model parameter switches from a negative to a positive value in the learning sequence, i.e., β₁ ∈ (−1, 0), β₂ ∈ (0, 1), but not necessarily for the other parameter settings, i.e., β₁, β₂ ∈ (0, 1); β₁, β₂ ∈ (−1, 0); or β₁ ∈ (0, 1), β₂ ∈ (−1, 0). A possible direction for future work is the random-coefficient AR(1) process (see, among others, Leipus et al., 2006), where the AR(1) model parameters between any two sequential observations are independent and identically distributed random variables from the uniform distribution U[0, 1].

References

Abraham, B. and Ledolter, J. (1986). Forecast Functions Implied by Autoregressive Integrated Moving Average Models and Other Related Forecast Procedures,
International Statistical Review (1): 51–66.

Andrews, D. (1993). Tests for Parameter Instability and Structural Change With Unknown Change Point, Econometrica (4): 821–856.

Bagshaw, M. and Johnson, R. (1977). Sequential procedures for detecting parameter changes in a time-series model, Journal of the American Statistical Association (359): 593–597.

Bai, J. and Perron, P. (1998). Estimating and Testing Linear Models with Multiple Structural Changes, Econometrica (1): 47–78.

Bai, J. and Perron, P. (2003). Computation and Analysis of Multiple Structural Change Models, Journal of Applied Econometrics (1): 1–22.

Bermúdez, J., Segura, J. and Vercher, E. (2006). Improving Demand Forecasting Accuracy Using Nonlinear Programming Software, The Journal of the Operational Research Society (1): 94–100.

Box, G. and Jenkins, G. (1976). Time Series Analysis: Forecasting and Control, Holden Day, San Francisco, USA.

Cambanis, S. and Soltani, A. (1984). Prediction of Stable Processes: Spectral and Moving Average Representations, Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete (4): 593–612.

Chatfield, C., Koehler, A., Ord, K. and Snyder, R. (2001). A New Look at Models for Exponential Smoothing, Journal of the Royal Statistical Society. Series D (The Statistician) (2): 147–159.

Chatfield, C. and Yar, M. (1988). Holt-Winters Forecasting: Some Practical Issues, Journal of the Royal Statistical Society. Series D (The Statistician) (2): 129–140.

Clements, M. P. and Hendry, D. (2006). Forecasting with breaks, in G. Elliott, C. Granger and A. Timmermann (eds), Handbook of Economic Forecasting, Volume 1, Elsevier B.V., Amsterdam, pp. 606–651.

Cortez, P., Rocha, M. and Neves, J. (2004). Evolving time series forecasting ARMA models, Journal of Heuristics (4): 415–429.

Cryer, J., Nankervis, J. and Savin, N. (1990). Forecast error symmetry in ARIMA models, Journal of the American Statistical Association (411): 724–728.

Davis, R., Lee, T. and Rodriguez-Yam, G. (2006). Structural Break Estimation for Nonstationary Time Series Models, Journal of the American Statistical Association (473): 223–239.

Dokuchaev, N. (2012). Predictors for discrete time processes with energy decay on higher frequencies, IEEE Transactions on Signal Processing (11): 6027–6030.

Dokuchaev, N. (2014). Volatility Estimation from Short Time Series of Stock Prices, Journal of Nonparametric Statistics (2): 373–384.

Dokuchaev, N. (2016). Near-ideal causal smoothing filters for the real sequences, Signal Processing: 285–293.

Giraitis, L., Kapetanios, G. and Price, S. (2013). Adaptive Forecasting in the Presence of Recent and Ongoing Structural Change, Journal of Econometrics (2): 153–170.

Hamilton, J. D. (1994). Time Series Analysis, Princeton University Press, New Jersey.

Hankin, R. (2006). Introducing elliptic, an R package for elliptic and modular functions, Journal of Statistical Software (7).

Hyndman, R., Koehler, A., Snyder, R. and Grose, S. (2002). A state space framework for automatic forecasting using exponential smoothing methods, International Journal of Forecasting (3): 439–454.

Hyndman, R., Koehler, A., Snyder, R. and Ralph, D. (2008). Forecasting with Exponential Smoothing: The State Space Approach, Springer Verlag, Berlin.

Kim, S.-J., Koh, K., Boyd, S. and Gorinevsky, D. (2009). ℓ₁ Trend Filtering, SIAM Review (2): 339–360.

Ledolter, J. and Kahl, D. (1984). Adaptive Filtering: An Empirical Evaluation, The Journal of the Operational Research Society (4): 337–345.

Leipus, R., Paulauskas, V. and Surgailis, D. (2006). On a Random-coefficient AR(1) Process with Heavy-tailed Renewal Switching Coefficient and Heavy-tailed Noise, Journal of Applied Probability (2): 421–440.

Lin, J. and Wei, C.-Z. (2006). Forecasting Unstable Processes, in Time Series and Related Topics: In Memory of Ching-Zong Wei, Lecture Notes-Monograph Series, Institute of Mathematical Statistics, pp. 72–92.

Luetkepohl, H. (2008). Introduction to Multiple Time Series Analysis, Springer Verlag, Berlin.

Lyman, R. J. and Edmonson, W. (2001). Linear Prediction of Bandlimited Processes with Flat Spectral Densities, IEEE Transactions on Signal Processing (7): 1564–1569.

Ord, J., Koehler, A. and Snyder, R. (1997). Estimation and prediction for a class of dynamic nonlinear statistical models, Journal of the American Statistical Association (440): 1621–1629.

Paige, C. and Saunders, M. (1977). Least squares estimation of discrete linear dynamic systems using orthogonal transformations, SIAM Journal on Numerical Analysis (2): 180–193.

Pesaran, M., Pick, A. and Pranovich, M. (2013). Optimal forecasts in the presence of structural breaks, Journal of Econometrics (2): 134–152.

Pesaran, M. and Timmermann, A. (2004). How costly is it to ignore breaks when forecasting the direction of a time series?, International Journal of Forecasting (3): 411–425.

Pesaran, M. and Timmermann, A. (2007). Selection of estimation window in the presence of breaks, Journal of Econometrics (1): 134–161.

R Core Team (2016). R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. URL: https://www.R-project.org/

Roberts, S. (1982). A General Class of Holt-Winters Type Forecasting Models, Management Science (7): 808–820.

Rossi, B. (2013). Advances in Forecasting under Instability, in G. Elliott and A. Timmermann (eds), Handbook of Economic Forecasting, Vol. 2, Part B, Elsevier, pp. 1203–1324.

Sastri, T. (1986). A Recursive Algorithm for Adaptive Estimation and Parameter Change Detection of Time Series Models, The Journal of the Operational Research Society (10): 987–999.

Stine, R. (1987). Estimating Properties of Autoregressive Forecasts, Journal of the American Statistical Association: 1072–1078.

Williams, T. (1987). Holt-Winters Forecasting, The Journal of the Operational Research Society (6): 553–560.

Xia, Y. and Zheng, W. (2015). Novel Parameter Estimation of Autoregressive Signals in the Presence of Noise, Automatica 62.