Improving forecasting by subsampling seasonal time series
Xixi Li a, Fotios Petropoulos b, Yanfei Kang c,∗
a School of Mathematics, University of Manchester, UK. b School of Management, University of Bath, UK. c School of Economics and Management, Beihang University, China.
Abstract
Time series forecasting plays an increasingly important role in modern business decisions. In today's data-rich environment, people often aim to choose the optimal forecasting model for their data. However, identifying the optimal model often requires professional knowledge and experience, making accurate forecasting a challenging task. To mitigate the importance of model selection, we propose a simple and reliable algorithm and successfully improve the forecasting performance. Specifically, we construct multiple time series with different sub-seasons from the original time series. These derived series highlight different sub-seasonal patterns of the original series, making it possible for the forecasting methods to capture diverse patterns and components of the data. Subsequently, we make forecasts for these multiple series separately with classical statistical models (ETS or ARIMA). Finally, the forecasts of these multiple series are combined with equal weights. We evaluate our approach on the widely-used forecasting competition datasets (M1, M3, and M4), in terms of both point forecasts and prediction intervals. We observe improvements in performance compared with the benchmarks. Our approach is particularly suitable and robust for datasets with higher frequencies. To demonstrate the practical value of our proposition, we showcase the performance improvements from our approach on hourly load data.
Keywords:
Forecasting, Forecast combination, Sub-seasonal patterns, Load forecasting, Empirical study
∗ Corresponding author
Email addresses: [email protected] (Xixi Li), [email protected] (Fotios Petropoulos), [email protected] (Yanfei Kang)
URL: https://orcid.org/0000-0001-5846-3460 (Xixi Li), https://orcid.org/0000-0003-3039-4955 (Fotios Petropoulos), https://orcid.org/0000-0001-8769-6650 (Yanfei Kang)
Preprint submitted to arXiv.

1. Introduction

Time series forecasting is an indispensable part of modern businesses, as it is a valuable input in the decision process for production, finance, planning, scheduling, and other activities (Petropoulos et al., 2020). We are living in a big data era where large collections of time series are constantly emerging. For example, large retailers need to forecast tens of thousands of sales series in each forecast cycle. There is a need for robust solutions to process and forecast these large volumes of series in a batch, automatic manner. However, identifying appropriate and reliable models for each series independently is not always a straightforward task, despite the abundance of available candidates.

One way to deal with modelling time series in an automatic way would be to resort to statistical model selection approaches, such as information criteria or cross-validation. Such approaches are able to select the most "appropriate" forecasting model, based on the in-sample fit (penalised for model complexity) or based on past performance on a hold-out set of observations. However, real-life time series data do not abide by particular data generation processes, and established patterns may change over time. In a famous quote, Box et al. (1987) point out that all models are wrong, and selecting just one of these may not be enough; however, "some are useful". Wolpert (1996) also argued that there is never a best method that fits all situations. Instead of relying on a single model to perform extrapolations, combining the forecasts across multiple models has been shown not only to perform well but also to reduce the variance of the forecasts (Bates and Granger, 1969; Hibon and Evgeniou, 2005).

The good performance of forecast combinations is also linked with the three inherent uncertainties that a forecaster needs to tackle (Hyndman et al., 2008; Petropoulos et al., 2018).
First, model uncertainty refers to being able to identify the correct model form. Within the exponential smoothing (ExponenTial Smoothing, or Error, Trend, Seasonality; ETS) family of models, this means identifying whether trend and seasonal components need to be included, but also their interactions (additive or multiplicative). In the autoregressive integrated moving average (ARIMA) family, tackling model uncertainty entails correctly identifying the appropriate (non-seasonal and seasonal) autoregressive and moving average terms. Second, even if the model form has been accurately selected, there is uncertainty related to the estimates of the model's parameters, such as the smoothing parameters of exponential smoothing models. Finally, data uncertainty refers to the variation of the inherent random component of the series itself.

While fitting many different models on the same data and averaging their forecasts is sometimes sufficient to achieve performance improvements (Kolassa, 2011; Petropoulos and Svetunkov, 2020), a better approach would be to manipulate the data and extract additional information from them, if there is any. For example, Assimakopoulos and Nikolopoulos (2000) and Fiorucci et al. (2016) change the local curvatures of the seasonally-adjusted data to be able to predict short- and long-term behaviours. The resulting method, Theta, performs really well in a variety of real-life settings, including a first place in the M3 competition (Makridakis and Hibon, 2000). Kourentzes et al. (2014) and Athanasopoulos et al. (2017) work with multiple temporal aggregation levels of the same signal and perform forecast combinations across different sampling frequencies. Bergmeir et al. (2016) and Petropoulos et al.
(2018) perform bootstrapping on the remainder of the time series decomposition to create several instances of the same series, which are then modelled independently and their forecasts are aggregated.

Our work builds on previous studies in an attempt to extract as much information from the data as possible. We particularly focus on the challenge of correctly modelling seasonal patterns. A seasonal pattern exists when a time series is influenced by seasonal factors (e.g., the quarter or the month of the year; Hyndman, 2011). Seasonality is observed over a fixed and known period, unlike other cyclical patterns. Most of the existing time series models are designed to adapt to simple seasonal patterns with small integer seasonal periods. Simple such models include the Seasonal Naïve method, Holt-Winters' Exponential Smoothing (Winters, 1960), the Damped Trend Seasonal method (Gardner Jr, 1985), and Seasonal ARIMA (SARIMA) models. In the case of multiple, complex seasonal patterns, more complicated forecasting methods need to be considered (De Livera et al., 2011).

For series that may display seasonal patterns, it is reasonable and feasible to use the observations from one season of the historical data to forecast the corresponding season in the future. Intuitively and similarly, not only can the value of a single season in the future be predicted from the same season in the historical data, but also the values of several adjacent seasons can be predicted from the respective past observations. Starting with the conjecture that we can use the observations of some adjacent seasons in the past to predict the values of the corresponding seasons in the future, we construct multiple time series which consist only of the observations of one or some adjacent seasons. We refer to these new series as sub-seasonal series, as they contain part but not all of the possible seasonal information.
These derived series highlight different sub-seasonal patterns of the original time series, making it possible for the forecasting models to amplify these diverse patterns by simplifying their modelling.

After constructing all possible sub-seasonal series, we extrapolate these sub-seasonal series, as well as the original series, using standard forecasting workhorses. In this paper, we focus our empirical investigation on two widely-used forecasting families, ETS and ARIMA (Hyndman and Athanasopoulos, 2018), but our framework is model-free and could be expanded to any other forecasting model suitable for dealing with simple seasonal patterns. In modelling each sub-seasonal series, we select the 'optimal' model form and set of parameters separately and independently from the other sub-seasonal series. The forecasts of the different series are then combined, effectively tackling issues surrounding model and parameter uncertainty.

The key innovations of our approach are as follows:

• We zoom in on the sub-seasonal patterns of the original series, which are simpler to model.
• We do not rely on a single model and its forecasts based on the original series. Instead, we mitigate the importance of model selection by combining forecasts across many sub-seasonal series.
• Our proposed approach is simple, transparent, and model-free. With regard to the latter, we do not propose a new forecasting model per se, but a framework that can be plugged into any existing model.

We conduct extensive empirical evaluations of our proposed framework using 75 thousand real time series from the Makridakis forecasting competitions (Makridakis et al., 1982; Makridakis and Hibon, 2000; Makridakis et al., 2020). Our framework works as a self-improving mechanism for the ETS and ARIMA families of models, resulting in better point-forecast accuracy and uncertainty estimation (captured through prediction intervals) than simply applying these families to the original series.
We observe that improvements in performance are greater for longer forecasting horizons, where uncertainty is also higher. More importantly, our proposed approach applied to load forecasting produces reliable forecasts, showcasing its efficacy on time series data with multiple and complex seasonal patterns.

The rest of the article is organized as follows: Section 2 offers a short literature review of relevant studies. Section 3 describes the methodology of the proposed forecasting approach. Section 4 presents the experimental results and their statistical significance. Section 5 shows the application to electric load forecasting. Section 6 offers our discussion and insights. Finally, Section 7 provides our conclusions.
2. Background research
Given the plethora of available models to choose from, several selection strategies have been developed over the years. Such strategies compare the performance of different candidate models in order to choose the optimal one, based on some criteria. Some approaches, namely information criteria, select the model with maximum likelihood, adding penalties to compensate for the overfitting of more complex models (Bishop, 2006). Popular information criteria include Akaike's Information Criterion (AIC) and the Bayesian Information Criterion (BIC). However, Bishop (2006) pointed out that these information criteria cannot properly account for the uncertainty of the models' parameters and tend to prefer simple models over more complex ones.

An alternative to selecting with information criteria is judging the models based on their past performance over multiple lead times. Such approaches are known as validation and cross-validation for time series, and are closely related to the concepts of fixed and rolling origin evaluation (Tashman, 2000). While these approaches are generally better than information criteria (Fildes and Petropoulos, 2015), they require longer series and more computational resources.

Other approaches to model selection take a more data-driven approach, essentially attempting to answer the question: which is the best forecasting method for my data? Many attempts have been made at tackling the task of model selection using time series features. For example, Reid (1972) pointed out that the nature of the data has an influence on the performance of the forecasting methods. Collopy and Armstrong (1992) developed a rule-based system containing 99 rules to produce forecasts based on the features of the data. Adya et al. (2001) identified time series features automatically for rule-based forecasting. Petropoulos et al. (2014) explored the factors that affect forecasting accuracy in the field of demand forecasting, and proposed related selection protocols. Kang et al.
(2017) used Principal Component Analysis (PCA) to visualize forecasting algorithm performance in time series instance spaces, obtaining a better understanding of the methods' relative performance. Finally, Talagala et al. (2018) used a decision tree to select the best forecasting method based on 42 manually curated features.

Regardless of the available options for model selection between forecasting models, the
No Free Lunch theorem suggests that no single method will always perform best across series, nor across time, due to the dynamic nature of the problem. As such, instead of selecting a single model (per series or across series), combining forecasts is an alternative way forward. Forecast combinations tend to yield better results when one is averaging forecasts from robust models, but also when there is significant diversity among the forecasts (Wang and Petropoulos, 2016; Thomson et al., 2019; Lichtendahl and Winkler, 2020). Combinations of forecasts address two of the sources of uncertainty in forecasting (Petropoulos et al., 2018), the uncertainty about the model's form and about its parameters, as we no longer rely on a single model.

Apart from obtaining diverse and robust forecasts, another critical factor in the efficiency of forecast combinations is the estimation of the weights. Recently, scholars have tried to use advanced machine learning techniques to estimate optimal weights with a focus on the features of the targeted time series. Montero-Manso et al. (2020) trained XGBoost (Chen and Guestrin, 2016) to obtain optimal weights for each method based on 42 manual time series features, achieving the second-best performance in the M4 competition. Li et al. (2020) used a similar approach to estimate optimal combination weights based on automatic image features of the series and produced performance comparable with the top performers of the M4 competition.
Regardless of the good performance of some optimal-weighting combination strategies, equal-weight combinations also perform remarkably well, and often are not statistically different from other, more complex approaches (Jose and Winkler, 2008; Genre et al., 2013; Petropoulos and Svetunkov, 2020).

A special case in forecast combinations refers to approaches that, instead of fitting the same data to different models, manipulate the original data and create other series in an attempt to amplify some particular series characteristics while compressing others. One such approach can be found at the core of the theta method (Assimakopoulos and Nikolopoulos, 2000). Instead of applying forecasting models to the original data, the theta method works on the seasonally-adjusted data. These are then further decomposed into theta lines, aiming to better capture the short- and long-term dynamics. The fitted values of a simple regression of the seasonally-adjusted data on trend give the theta line with θ = 0; this is a straight line able to capture long-term behaviours, assuming deterministic linear trends. The theta line with θ = 1 maintains the curvatures of the seasonally-adjusted series. As the θ value increases, so do the local curvatures, allowing for amplification of the local, short-term behaviours. The theta method essentially changes the data by changing their values through manipulation of the residuals of the regression model. The original implementation of the theta method (Assimakopoulos and Nikolopoulos, 2000) involved two theta lines, θ = 0 and θ = 2, extrapolated with linear regression and simple exponential smoothing, respectively, with the final deseasonalised forecast being the simple average of the two. Further extensions considered more theta lines (Petropoulos and Nikolopoulos, 2013) or optimised theta lines (Fiorucci et al., 2016).
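The theta-line construction described above can be illustrated with a short sketch (ours, not the original implementation; the function name and toy series are our own), using the standard formulation of a theta line as a weighted combination of the series and its linear trend fit:

```python
import numpy as np

def theta_line(y, theta):
    """Theta line of series y: a weighted combination of the linear
    regression fit of y on time (theta = 0) and the series itself, i.e.
    L(theta) = theta * y + (1 - theta) * trend, so theta = 0 gives the
    fitted straight line and theta = 1 returns the series unchanged."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, 1)
    trend = intercept + slope * t
    return theta * y + (1 - theta) * trend

# Toy (already seasonally adjusted) series.
y = np.array([2.0, 3.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0])
line0 = theta_line(y, 0.0)  # long-term component (straight line)
line2 = theta_line(y, 2.0)  # short-term component (doubled curvatures)
# Averaging the two theta lines recovers the original series.
print(np.allclose((line0 + line2) / 2, y))  # True
```

Note that the simple average of the θ = 0 and θ = 2 lines reproduces the original (seasonally-adjusted) series, which is why combining their extrapolations yields a forecast of the series itself.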
The theta method is a robust benchmark in forecasting and was the top-performing method at the M3 competition (Makridakis and Hibon, 2000).

Another approach that also uses data manipulation to extract additional information is (non-overlapping) temporal aggregation (Nikolopoulos et al., 2011). This technique down-samples the original series to obtain more series of lower frequencies. Higher temporal aggregation levels offer smoother, less intermittent series, allowing for better extrapolation of trend patterns, while lower levels of aggregation allow for better estimation of seasonal patterns. Producing forecasts for many levels independently and then combining them has shown sizable improvements in accuracy both for fast- and slow-moving series (Kourentzes et al., 2014; Petropoulos and Kourentzes, 2015). The most recent advances in combining forecasts from several temporal aggregation levels have been expressed through hierarchical structures (Athanasopoulos et al., 2017; Kourentzes and Athanasopoulos, 2020).

A third approach that follows the concept of extracting more information from the data is bootstrapping (e.g., Bergmeir et al., 2016; Hasni et al., 2019). In particular, for time series forecasting, Bergmeir et al. (2016) proposed decomposing the original signal into trend, seasonal, and remainder components, bootstrapping the latter to create several series of remainders, and reconstructing the series by putting together trend, seasonality, and each series of remainders. Following the above process, one can construct several series of the same structure (trend and seasonality) but with different noise. Instead of simply forecasting the original series, one can forecast all reconstructed series and then aggregate (average) the forecasts.
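The decompose, bootstrap, and recombine process can be sketched as follows. This is a simplified illustration of our own: we use a linear trend, seasonal means, and an i.i.d. shuffle of the remainder, whereas Bergmeir et al. (2016) use STL on Box-Cox-transformed data with a moving-block bootstrap; all names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_series(y, m, n_boot):
    """Create bootstrapped copies of y (seasonal frequency m): remove a
    linear trend, estimate seasonal means, shuffle the remainder, and
    reassemble trend + seasonality + shuffled remainder."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, 1)
    trend = intercept + slope * t
    detrended = y - trend
    seasonal = np.array([detrended[s::m].mean() for s in range(m)])[t % m]
    remainder = y - trend - seasonal
    return [trend + seasonal + rng.permutation(remainder) for _ in range(n_boot)]

# A quarterly toy series: linear trend plus a repeating seasonal pattern.
y = 10 + 0.5 * np.arange(24) + np.tile([2.0, -1.0, -2.0, 1.0], 6)
copies = bootstrap_series(y, m=4, n_boot=5)
print(len(copies), len(copies[0]))  # 5 24
```

Each copy shares the trend and seasonal structure of the original series but carries a resampled noise component, so forecasting all copies and averaging reduces the effect of data uncertainty.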
'Bagging', when applied in conjunction with an algorithm that considers a large number of models and automatically selects the best one (such as automatic exponential smoothing or ARIMA), can tackle the three sources of uncertainty: model form, model parameters, and data (Petropoulos et al., 2018).
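The non-overlapping temporal aggregation approach discussed above is also straightforward to sketch (an illustrative Python fragment of our own; the cited works build full forecasting and combination pipelines on top of this step):

```python
import numpy as np

def temporal_aggregate(y, k):
    """Non-overlapping temporal aggregation: sum consecutive blocks of k
    observations, dropping any incomplete block at the start."""
    y = np.asarray(y, dtype=float)
    y = y[len(y) % k:]            # keep the most recent complete blocks
    return y.reshape(-1, k).sum(axis=1)

# A monthly series of two years, aggregated to quarterly and annual levels.
monthly = np.arange(1.0, 25.0)               # 24 monthly observations
quarterly = temporal_aggregate(monthly, 3)   # 8 quarterly totals
annual = temporal_aggregate(monthly, 12)     # 2 annual totals
print(len(quarterly), len(annual))  # 8 2
```

Incomplete blocks are dropped from the start rather than the end so that the most recent observations, which matter most for forecasting, are always retained.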
3. Forecasting through integrating the forecasts of multiple time series with different sub-seasonal patterns
Suppose we observe a quarterly series of three years, and our target is to forecast the next year (four quarters). The standard approach would be to take all 12 data points and forecast using some time series method (such as ETS or ARIMA). In our approach, we construct from the target series various sub-seasonal series (e.g., a series consisting only of the first quarters of each year). The final forecasts are obtained by combining the forecasts produced from the series with different sub-seasonal patterns. Figure 1 shows a graphical explanation of our method. For this particular series, the proposed forecasting procedure is as follows.

1. Create a new series that consists only of the first quarters (Q1) of each year. Repeat for the second (Q2), third (Q3) and fourth (Q4) quarters of the year. This is the first level of information, where each sub-seasonal series contains exactly one quarter.
2. Create a new series that consists of the first and second quarters (Q1 + Q2) of each year. Repeat for Q2 + Q3, Q3 + Q4 and Q4 + Q1. This is the second level of information, where each sub-seasonal series contains exactly two adjacent quarters.
3. Create a new series that consists of the first, second and third quarters (Q1 + Q2 + Q3) of each year. Repeat for Q2 + Q3 + Q4, Q3 + Q4 + Q1 and Q4 + Q1 + Q2. This is the third level of information, where each sub-seasonal series contains exactly three adjacent quarters.
4. Forecast the constructed sub-seasonal series, and the original series (which is the fourth level of information), using standard time series forecasting methods (e.g., ARIMA and ETS). That is, we use the historical observations of some adjacent seasons to forecast the corresponding seasons in the future.

Figure 1. A graphic depiction of the proposed approach applied to a quarterly series with three years of historical observations. The target is to forecast the next year (four quarters).
The forecasting procedure consists of three main steps: (1) create new time series containing only one or several adjacent seasons, i.e., Q1, Q2, Q3, Q4, Q1 + Q2, Q2 + Q3, Q3 + Q4, Q4 + Q1, Q1 + Q2 + Q3, Q2 + Q3 + Q4, Q3 + Q4 + Q1, Q4 + Q1 + Q2, and the original series Q1 + Q2 + Q3 + Q4; (2) forecast separately each sub-seasonal series and the original series; and (3) combine the produced forecasts with equal weights. Note that when combining, Q1 + Q2 + Q3 + Q4 is repeated four times so that equal importance is given to all information levels.
5. Combine the obtained forecasts with equal weights. Note that the original series, Q1 + Q2 + Q3 + Q4, is repeated four times so that equal importance is given to all information levels.

More generally, assume we are interested in forecasting a seasonal time series {y_t, t = 1, 2, ..., T} with sampling frequency m. The forecasting procedure through integrating the forecasts of multiple time series with different sub-seasonal patterns is as follows.

1. Construct multiple series with sub-seasonal patterns.
Create new time series containing observations of only one or a few adjacent seasons; the corresponding frequency of each new series is equal to the number of seasons it contains. The total number of constructed series is equal to m² (for instance, 16 series for quarterly data with m = 4).

2. Forecasting.
Extrapolate each of the newly derived series separately using standard time series techniques.

3.
Forecast combination. Combine the forecasts produced from the sub-seasonal series with equal weights. Since the 'optimal' model form and set of parameters are estimated separately and independently when modelling each sub-seasonal series, forecast combination makes it possible to tackle issues around model and parameter uncertainty (Petropoulos et al., 2018).

In the following sections, we will introduce each step of our proposed method in detail.
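For concreteness, the construction step can be sketched in Python (the paper's own implementation relies on the R forecast package; the function below is a hypothetical, self-contained illustration of the sub-sampling logic only):

```python
import numpy as np

def subseasonal_series(y, m):
    """Construct the m*m sub-seasonal series of a series y with frequency m.

    At information level k (k = 1, ..., m), each of the m derived series
    keeps only the observations that belong to k adjacent seasons, starting
    at season s and wrapping around the seasonal cycle; level m reproduces
    the original series m times, giving every level equal weight."""
    y = np.asarray(y, dtype=float)
    season = np.arange(len(y)) % m   # season index of every observation
    series = []
    for k in range(1, m + 1):
        for s in range(m):
            kept = [(s + j) % m for j in range(k)]
            series.append(y[np.isin(season, kept)])
    return series

# Quarterly toy series of three years (12 observations).
y = np.arange(1.0, 13.0)
subs = subseasonal_series(y, m=4)
print(len(subs))        # 16 series in total (m squared)
print(subs[0])          # Q1 only: [1. 5. 9.]
```

Because the boolean selection preserves chronological order, wrapped series such as Q4 + Q1 keep their observations in the order in which they occurred.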
3.1. Construction of multiple series with sub-seasonal patterns

In this section, we use a real example from the widely-used M3 competition dataset to demonstrate the construction of sub-seasonal patterns from the original time series and illustrate the importance of forecasting with sub-seasonal patterns. We use the quarterly time series Q520 from the M3 dataset as an example. Figure 2 shows the constructed series with sub-seasons. For this particular time series, we can construct 16 series with various sub-seasonal patterns containing the observations of only one or a few adjacent quarters. For each sub-seasonal series, the corresponding frequency equals the number of adjacent quarters. Note that among the 16 series, the last four series, shown in the last row of Figure 2, correspond to the original target series. In other words, we repeat the original series four times, regarded as Q1 + Q2 + Q3 + Q4, Q2 + Q3 + Q4 + Q1, Q3 + Q4 + Q1 + Q2, and Q4 + Q1 + Q2 + Q3, respectively. The series in each row of Figure 2 contain the same number of quarters, i.e., the same level of information. In particular, the four series in the first row contain observations of one single quarter of the original series. The series in the second row contain observations of two adjacent quarters, and the third row shows the sub-seasonal patterns containing observations of three adjacent quarters.

We observe (Figure 2) that each newly created series highlights different sub-seasonal patterns and components of the original series. The series in the first row, each containing only one quarter, show an upward trend, and for two of the quarters the trend is almost linear. Moreover, the ranges of the observations of the different sub-seasonal patterns are slightly different. From the second row of Figure 2, it is obvious that three of the sub-series containing two adjacent quarters exhibit strong seasonality, with their frequency being the number of seasons, which is equal to two. Besides, all the sub-series demonstrate an increasing trend.
The series in the third row demonstrate strong seasonality with their frequency equal to three. As for the original series in the fourth row, we can observe a strong seasonal pattern. These constructed series highlight the diverse sub-seasonal patterns and components of the target series, which can be amplified and used to improve forecasting performance.

Figure 2. An example visualizing the time series constructed by our proposed method. The quarterly time series Q520 from the M3 competition is used as the target series. 16 sub-seasonal series that contain the observations of only one or a few adjacent quarters are constructed. The series in the first row contain observations of only one quarter. The series in the second row contain observations of two adjacent quarters. The series in the third row contain observations of three adjacent quarters. The last four series are the original time series, containing the observations of all four quarters.
3.2. Forecasting

After constructing multiple sub-seasonal series, we fit two families of models, namely ETS and ARIMA, to each of the constructed series to forecast the corresponding seasons in the future. Table 1 provides details on the implementations used.

Table 1. The standard methods used to forecast each sub-seasonal series.
Forecasting method   Description                                                          R implementation
ETS                  The exponential smoothing state space model                          forecast::ets()
                     (Hyndman et al., 2002).
ARIMA                The autoregressive integrated moving average model, automatically   forecast::auto.arima()
                     estimated in the R package forecast (Hyndman and Khandakar, 2008).

ETS is well known for its ability to capture the components of a time series: error, trend and seasonality (Hyndman et al., 2002). The model form can be abbreviated as ETS(error, trend, seasonality). The error component can be "A" or "M". The trend and seasonality terms can be "N", "A" or "M", while the trend can additionally be damped or not. In all cases, "N", "A" and "M" denote "None", "Additive", and "Multiplicative", respectively. For ETS, an optimal model form and set of parameters can be automatically selected based on the corrected Akaike's Information Criterion (AICc) for each sub-seasonal series using the ets() function in the R package forecast (Hyndman and Khandakar, 2008). We fit ETS models to each constructed series separately and automatically select the optimal model forms and parameters with the ets() function. As a result, the optimal model and parameter selection can ensure locally optimal predictions.

In addition to ETS, we also fit ARIMA models using the auto.arima() function in the R package forecast (Hyndman and Khandakar, 2008). The function automatically searches for the optimal model form and estimates the corresponding parameters. By default, it combines unit root tests, minimisation of the AICc, and Maximum Likelihood Estimation (MLE) to obtain the optimal ARIMA model among models with various AutoRegressive (AR) and Moving Average (MA) orders, up to a maximum of 5. In order to validate the effectiveness and robustness of our proposed method, we also use ARIMA to extrapolate each constructed sub-seasonal series.

It is worth emphasizing that the constructed sub-seasonal series can also be forecast with other standard methods.
Moreover, one may choose to apply different families of methods to each constructed series or each information level. In any case, a different model form and set of parameters are selected for each sub-seasonal series, and a different set of forecasts is produced accordingly. The produced sets of forecasts for the sub-seasonal series are used in the next step: forecast combination.

3.3. Forecast combination

After extrapolating each series separately, the forecasts of the multiple sub-seasonal series are averaged with equal weights. In this section, in order to illustrate how our method works, we use the same example as in Section 3.1 to compare the combined forecasts with the standard forecasts of the target time series, based on the ETS model.

Figure 3 visualizes the historical data and testing data of the quarterly time series Q520 from the M3 competition, together with their standard forecasts and multiple forecasts. "Standard" denotes the ETS method applied to the original data, while "Multiple" denotes our proposed method. The right panel of Figure 3 visualizes the forecasts of each sub-seasonal series individually. We can see that, compared with the standard ETS method, our method efficiently forecasts the variation as the information level of the series decreases (fewer adjacent quarters are considered). This shows that even if the modelling of the original series "fails" and a seasonal model is not selected, the availability of further sub-seasonal patterns renders the forecasts more robust, due to the capacity to capture the diverse patterns.
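The equal-weight combination step can be sketched as follows (an illustrative fragment with names of our own choosing; in practice, each sub-series' forecasts must first be aligned to the future periods whose seasons that sub-series covers):

```python
import numpy as np

def combine_forecasts(forecasts, m, h):
    """Equal-weight combination of sub-seasonal forecasts.

    `forecasts` is a list of (kept_seasons, values) pairs: `values` are the
    forecasts a model fitted on one sub-seasonal series produced for the
    future periods whose season index (period mod m, taking the first
    forecast period as season 0) lies in `kept_seasons`. Every future
    period is averaged over all sub-series that cover it."""
    total, count = np.zeros(h), np.zeros(h)
    for kept, values in forecasts:
        periods = [t for t in range(h) if t % m in kept]
        for t, v in zip(periods, values):
            total[t] += v
            count[t] += 1
    return total / count

# Two single-season forecasts plus one full-frequency forecast (m = 2).
fcs = [([0], [10.0]), ([1], [20.0]), ([0, 1], [12.0, 22.0])]
print(combine_forecasts(fcs, m=2, h=2))  # [11. 21.]
```

Because every sub-series of every information level contributes once to each future period it covers (with the original series repeated m times), the average gives equal importance to all information levels, as described above.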
Figure 3. Left: standard and multiple forecasts of Q520 from the M3 competition. Right: individual ETS forecasts for each sub-seasonal series.
The diverse patterns captured by the sub-seasonal series are presented in Figure 4. Despite the fact that no sub-seasonal series is identified as seasonal, the combined forecast from our approach displays seasonality, due to the proper alignment of the sub-seasonal forecasts to the corresponding periods and the different degrees of trend identified (see, for example, the right panel of Figure 3).
Figure 4. Optimal ETS components for each sub-seasonal series of Q520 from the M3 competition data.
4. Empirical evaluation on M competitions
Table 2 presents the data used in our empirical experiments. We consider the quarterly and monthly subsets of the M1 (Makridakis et al., 1982), M3 (Makridakis and Hibon, 2000) and M4 (Makridakis et al., 2020) competitions, which cover various categories such as demographics, industry, finance, economics, and others. The M1 and M3 datasets are publicly available in the
Mcomp R package (Hyndman, 2018), and the M4 dataset in the M4comp2018 R package (Montero-Manso et al., 2018). The forecasting horizons for quarterly and monthly data are 8 and 18, respectively. The total number of quarterly series is 24,959, while the monthly subset includes 50,045 series. We also consider the 414 hourly series from the M4 competition, for which the frequency is equal to 24 and the forecast horizon is 48 periods ahead. The hourly data are included so that we can test the approach on data with high periodicity.

Table 2. M competition data used for the empirical evaluation.

Data        Source            Number of series   Frequency   h
Quarterly   M1 competition    203                4           8
            M3 competition    756                4           8
            M4 competition    24,000             4           8
            Total             24,959
Monthly     M1 competition    617                12          18
            M3 competition    1,428              12          18
            M4 competition    48,000             12          18
            Total             50,045
Hourly      M4 competition    414                24          48
To evaluate the point forecasts, we use two commonly employed metrics, namely the Mean Absolute Scaled Error (MASE, Hyndman and Koehler, 2006) and the symmetric Mean Absolute Percentage Error (sMAPE, Makridakis and Hibon, 2000). These accuracy metrics were also used in the M4 competition. They are calculated as:

\[
\text{sMAPE} = \frac{200}{h} \sum_{t=1}^{h} \frac{|y_t - \hat{y}_t|}{|y_t| + |\hat{y}_t|}, \qquad
\text{MASE} = \frac{\frac{1}{h} \sum_{t=1}^{h} |y_t - \hat{y}_t|}{\frac{1}{n-m} \sum_{t=m+1}^{n} |y_t - y_{t-m}|},
\]

where y_t is the real value of the time series at time point t, \hat{y}_t is the point forecast, n is the length of the historical data, h is the forecasting horizon, and m is the frequency of the data (e.g., 4, 12, and 24 for quarterly, monthly, and hourly series, respectively).

In order to quantify the performance on forecasting uncertainty, we use the Mean Scaled Interval Score (MSIS) for the (1 − α) × 100% prediction intervals, calculated as:

\[
\text{MSIS} = \frac{\frac{1}{h} \sum_{t=1}^{h} \left[ (f_t^u - f_t^l) + \frac{2}{\alpha}(f_t^l - y_t)\mathbb{1}\{y_t < f_t^l\} + \frac{2}{\alpha}(y_t - f_t^u)\mathbb{1}\{y_t > f_t^u\} \right]}{\frac{1}{n-m} \sum_{t=m+1}^{n} |y_t - y_{t-m}|},
\]

where f_t^l and f_t^u are the lower and upper bounds of the generated prediction intervals, α is the significance level, and \mathbb{1} is the indicator function, which equals 1 when the condition is true and 0 otherwise.

4.3. Standard versus multiple forecasts

In order to evaluate our proposed method, we compare it with the standard benchmarks (i.e., ETS and ARIMA applied to the original series) with regard to the MASE, sMAPE, and MSIS values for 95% confidence levels over the quarterly, monthly, and hourly datasets. Their implementation has been described in Table 1. We report in Tables 3 and 4 the forecasting results for short-term, medium-term, long-term, and overall horizons. Table 3 presents the forecasting performance regarding the (arithmetic) mean MASE, sMAPE, and MSIS values over different horizons for the M1, M3, and M4 quarterly, monthly, and hourly subsets.
Table 4 provides the corresponding median values. "Standard" represents the benchmarks, while "Multiple" represents our proposed method. Entries in bold highlight that our method outperforms the corresponding benchmarks. We observe that:

• For quarterly data, focusing on the mean results in Table 3, the proposed method performs better than the standard benchmarks in most mid- and long-term horizons (when h >

• For monthly data, "Multiple" performs very competitively against "Standard" for almost all forecasting horizons in terms of the mean results. For the median aggregation, the performance of "Multiple" is significantly better than that of "Standard" for mid- and long-term horizons.

• For hourly data, "Multiple" performs better than "Standard" for almost all forecasting horizons for both the mean and median results (Tables 3 and 4). The differences are very large in the case of ETS, where the drops in the mean MASE and mean MSIS values are more than 40%.

• From the discussion above, the proposed approach is more suitable for time series with higher frequencies (higher values of m), which yield a larger number of sub-seasonal series. The diversity in the sub-seasonal patterns is beneficial in improving the forecasting performance through forecast combination.

Table 3. Benchmarking the performance of our proposed method against all the benchmark models with regard to the mean of MASE, sMAPE and MSIS values for 95% confidence levels over the M1, M3 and M4 quarterly, monthly and hourly datasets.
                  Quarterly                           Monthly                               Hourly
MASE              h=1    1-3    4-6    7-8    1-8     h=1    1-6    7-12   13-18  1-18      h=1    1-16   17-32  33-48  1-48
ETS     Standard  0.599  0.774  1.261  1.607  1.165   0.454  0.651  0.965  1.225  0.947     0.390  1.410  1.611  2.450  1.824
        Multiple  0.644  0.797  …      …      …       …      …      …      …      …         …      …      …      …      …
ARIMA   Standard  0.596  0.779  1.273  1.605  1.171   0.441  0.627  0.953  1.213  0.931     0.366  0.708  0.865  1.269  0.947
        Multiple  0.641  0.807  1.274  …      …       …      …      …      …      …         …      …      …      …      …

sMAPE             h=1    1-3    4-6    7-8    1-8     h=1    1-6    7-12   13-18  1-18      h=1    1-16   17-32  33-48  1-48
ETS     Standard  5.985  7.469  11.070 13.515 10.331  6.872  10.001 13.650 17.031 13.560    8.821  15.301 17.138 19.481 17.307
        Multiple  6.265  7.558  …      …      …       …      …      …      …      …         …      …      …      …      …
ARIMA   Standard  5.987  7.602  11.250 13.578 10.464  6.790  9.703  13.627 17.364 13.565    15.718 12.351 15.194 14.653 14.066
        Multiple  6.326  7.768  …      …      …       …      …      …      …      …         …      …      …      …      …

MSIS              h=1    1-3    4-6    7-8    1-8     h=1    1-6    7-12   13-18  1-18      h=1    1-16   17-32  33-48  1-48
ETS     Standard  4.729  6.154  10.354 13.585 9.587   3.698  5.137  8.493  11.143 8.258     3.127  11.685 15.107 25.669 17.487
        Multiple  5.046  6.253  …      …      …       …      …      …      …      …         …      …      …      …      …
ARIMA   Standard  5.543  7.221  12.230 15.787 11.241  4.045  5.470  9.132  11.640 8.747     3.304  5.812  7.437  9.255  7.501
        Multiple  …      …      …      …      …       …      …      …      …      …         …      …      …      …      …

Table 4. Benchmarking the performance of our proposed method against all the benchmark models with regard to the median of MASE, sMAPE and MSIS values for 95% confidence levels over the M1, M3 and M4 quarterly, monthly and hourly datasets.

                  Quarterly                           Monthly                               Hourly
MASE              h=1    1-3    4-6    7-8    1-8     h=1    1-6    7-12   13-18  1-18      h=1    1-16   17-32  33-48  1-48
ETS     Standard  0.360  0.572  0.912  1.129  0.887   0.245  0.501  0.709  0.877  0.736     0.231  0.760  0.865  1.360  1.065
        Multiple  0.421  0.598  …      …      …       …      …      …      …      …         …      …      …      …      …
ARIMA   Standard  0.363  0.578  0.920  1.126  0.895   0.235  0.481  0.699  0.877  0.728     0.216  0.543  0.711  1.077  0.823
        Multiple  0.421  0.609  0.927  …      …       …      …      …      …      …         …      …      …      …      …

sMAPE             h=1    1-3    4-6    7-8    1-8     h=1    1-6    7-12   13-18  1-18      h=1    1-16   17-32  33-48  1-48
ETS     Standard  2.177  3.516  5.614  6.866  5.627   1.945  4.337  6.777  8.467  7.131     1.151  4.597  5.408  6.850  5.734
        Multiple  2.525  3.715  5.634  …      …       …      …      …      …      …         …      …      …      …      …
ARIMA   Standard  …      …      …      …      …       …      …      …      …      …         …      …      …      …      …
        Multiple  …      …      …      …      …       …      …      …      …      …         …      …      …      …      …

MSIS              h=1    1-3    4-6    7-8    1-8     h=1    1-6    7-12   13-18  1-18      h=1    1-16   17-32  33-48  1-48
ETS     Standard  3.044  3.975  6.180  7.644  5.955   1.953  3.234  4.914  6.313  5.027     1.568  4.994  7.044  11.428 8.370
        Multiple  3.556  4.209  6.188  …      …       …      …      …      …      …         …      …      …      …      …
ARIMA   Standard  2.701  3.664  5.452  6.596  5.311   1.933  3.220  4.592  5.842  4.727     2.065  4.277  5.695  6.969  5.802
        Multiple  3.131  3.848  5.552  …      …       …      …      …      …      …         …      …      …      …      …
To investigate the statistical significance of the proposed method, we first carry out the Multiple Comparisons with the Best (MCB) test (Koning et al., 2005), which tests whether the average ranking of our proposed method is significantly different from that of the benchmarks. With the MCB test, if the intervals of "Multiple" and "Standard" overlap, their ranking performances are not significantly different. We implement the MCB test using tsutils::nemenyi() in R.

Figure 5 shows the MCB test for "Standard" and "Multiple" ETS (first column) or ARIMA (second column) on the quarterly series with regard to MASE (first row), sMAPE (second row), and MSIS (third row). For the quarterly data, the ranking performance of our proposed method is statistically worse than that of the benchmarks.

Mean ranks in Figure 5 — ETS: MASE Standard 1.48, Multiple 1.52; sMAPE Standard 1.48, Multiple 1.52; MSIS Standard 1.39, Multiple 1.61. ARIMA: MASE Standard 1.48, Multiple 1.52; sMAPE Standard 1.48, Multiple 1.52; MSIS Standard 1.38, Multiple 1.62.

Figure 5. MCB significance tests for standard and multiple ETS and ARIMA over quarterly data with regard to MASE, sMAPE, and MSIS values.

Figure 6 and Figure 7 depict the MCB tests for "Standard" and "Multiple" ETS and ARIMA models with regard to MASE, sMAPE, and MSIS values for the monthly and hourly data, respectively. We can observe that the ranking performance of our method is almost always better than that of the benchmarks, except for the monthly MSIS results using ARIMA models (bottom-right panel of Figure 6). The differences are also statistically significant, especially for the hourly frequency. These results further verify our argument that the "Multiple" approach is more suitable for higher-frequency time series, where more sub-seasonal patterns may be observed.

We also carry out Diebold–Mariano (DM) tests (Harvey et al., 1997) to compare the performance of "Multiple" with that of "Standard". In the DM test, the null hypothesis is that the two approaches have the same forecast accuracy. Given a significance level (5% is used in this study), if the DM test statistic falls within the lower or upper 2.5% tail of the standard Normal distribution, we reject the null hypothesis. The DM test is implemented using forecast::dm.test() in R.

The entries in Table 5 show the percentage of times the "Multiple" approach is significantly better or worse than the standard forecasts over different horizons. We observe that the percentage of cases in which our approach significantly outperforms the benchmark ("Standard") increases as the horizon and frequency increase. For instance, our approach performs significantly better than standard ETS in 51% of the cases for the longer horizons of the hourly data, and significantly worse in only 21% of the cases.
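The DM statistic can be sketched as follows. This is a simplified Python version for the one-step-ahead case with absolute-error loss (the paper relies on forecast::dm.test() in R, which additionally handles multi-step forecasts and small-sample adjustments; the function name here is ours):

```python
import math
import numpy as np

def dm_test(e1, e2):
    """Diebold-Mariano test for equal predictive accuracy of two forecast
    error series (one-step-ahead case, absolute-error loss).
    Returns the DM statistic and a two-sided p-value from its N(0,1) limit."""
    d = np.abs(np.asarray(e1, float)) - np.abs(np.asarray(e2, float))
    dbar = d.mean()
    # For h-step-ahead forecasts the long-run variance of d would also need
    # autocovariance terms up to lag h - 1; for h = 1 it is the plain variance.
    se = math.sqrt(d.var(ddof=0) / d.size)
    dm = dbar / se
    p = math.erfc(abs(dm) / math.sqrt(2))  # two-sided Normal p-value
    return dm, p
```

A negative statistic favours the first forecast (smaller losses), a positive one the second; at the 5% level the null of equal accuracy is rejected when the p-value falls below 0.05.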
Mean ranks in Figure 6 — ETS: MASE Multiple 1.44, Standard 1.56; sMAPE Multiple 1.44, Standard 1.56; MSIS Multiple 1.48, Standard 1.52. ARIMA: MASE Multiple 1.46, Standard 1.54; sMAPE Multiple 1.46, Standard 1.54; MSIS Standard 1.45, Multiple 1.55.
Figure 6. MCB significance tests for standard and multiple ETS and ARIMA over monthly data with regard to MASE, sMAPE, and MSIS values.
Mean ranks in Figure 7 — ETS: MASE Multiple 1.26, Standard 1.74; sMAPE Multiple 1.22, Standard 1.78; MSIS Multiple 1.21, Standard 1.79. ARIMA: MASE Multiple 1.29, Standard 1.71; sMAPE Multiple 1.29, Standard 1.71; MSIS Multiple 1.20, Standard 1.80.
Figure 7. MCB significance tests for standard and multiple ETS and ARIMA over hourly data with regard to MASE, sMAPE, and MSIS values.

Table 5. Diebold–Mariano (DM) tests for comparing the predictive accuracy of the standard method with the multiple method. The entries show the percentage of times the "Multiple" forecasts are significantly better or worse than the "Standard" forecasts for different horizons.

                         Quarterly                      Monthly                          Hourly
                         h=1-3  4-6    7-8    1-8       1-6     7-12    13-18   1-18     1-16    17-32   33-48   1-48
Multiple ETS     better  2.893  …      …      …         …       …       …       …        …       …       …       …
                 worse   6.431  2.821  3.362  18.611    16.040  14.655  23.039  24.464   12.802  9.903   20.773  16.425
Multiple ARIMA   better  2.757  …      …      …         …       …       …       …        …       …       …       …
                 worse   5.862  3.337  3.277  17.749    16.759  15.374  22.476  24.886   10.870  13.285  13.043  13.768
5. Applications to load forecasting
In this section, we aim to further validate the good performance of our approach when complex seasonal patterns exist. As a case study, we apply our approach to two datasets on electricity load. To examine whether common features of the series, such as series length and trend changes, affect the stability of our approach, we select two series with different patterns that have been widely used in previous research (Taylor, 2003; Rendon-Sanchez and de Menezes, 2019). The first dataset is the hourly electricity demand in England and Wales from Monday 5 June 2000 to Sunday 27 August 2000 (Taylor, 2003), as depicted in the top panel of Figure 8. The second one is the hourly electricity demand in England and Wales from 1 January 2016 to 31 December 2016 (Rendon-Sanchez and de Menezes, 2019), as shown in the bottom panel of Figure 8. The first series only shows the cyclical demand pattern over several months of the year, with no obvious trend changes, while the second series reflects the changes in demand over a full year, showing a strong seasonal pattern with downward and upward trends.
The top panel of Figure 8 shows the 12 weeks of hourly electricity demand in England and Wales from Monday 5 June 2000 to Sunday 27 August 2000. In line with Taylor (2003), who previously used these data, the first 8 weeks of data are used for training and the remaining 4 weeks are used to evaluate the post-sample performance of forecasts up to 24 hours ahead. In all, 1344 hourly observations are used for training and 672 for rolling-origin evaluation.

Figure 8. Top: Hourly electricity demand in England and Wales from 5 June 2000 to 27 August 2000. Source: Taylor (2003). Bottom: Hourly electricity demand in England and Wales, 1 January 2016 to 31 December 2016. Source: National Grid (Rendon-Sanchez and de Menezes, 2019).

Figure 9 shows the forecasting accuracy for each forecasting horizon in terms of MASE, sMAPE, and MSIS using the "Standard" and "Multiple" methods. When comparing the "Standard" and "Multiple" approaches for point and interval forecasting over all horizons, we can observe significant benefits from using the information in the sub-seasonal patterns. The performance of multiple ETS is on average 5.47%, 4.70%, and 4.26% better than that of standard ETS in MASE, sMAPE, and MSIS, respectively, over all horizons. Notably, the improvement of multiple ARIMA over standard ARIMA in MASE, sMAPE, and MSIS is on average 9.22%, 9.28%, and 11.83%, respectively. In all, for prediction intervals, multiple ARIMA outperforms standard ARIMA, and multiple ETS produces better forecasts for short-term and mid-term forecasting, while "Standard" and "Multiple" perform on par for longer horizons. As can be seen from Figure 9, as the horizon increases, the forecast errors first become larger and then smaller, which indicates that the predictability of the data is enhanced at certain horizons. Where the standard method is already able to predict well, our method is less likely to improve the accuracy, even when the sub-seasonal patterns are used.
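The rolling-origin evaluation used here can be sketched in Python as follows; forecast_fn is a hypothetical stand-in for a routine that refits ETS or ARIMA on the history available at each origin and returns a multi-step-ahead forecast (the names and the step logic are ours, not the paper's):

```python
import numpy as np

def rolling_origin(series, n_train, horizon, step, forecast_fn):
    """Roll the forecast origin through the evaluation period: at each origin,
    fit on all data observed so far and forecast `horizon` steps ahead."""
    forecasts, actuals = [], []
    for origin in range(n_train, len(series) - horizon + 1, step):
        history = series[:origin]
        forecasts.append(forecast_fn(history, horizon))
        actuals.append(series[origin:origin + horizon])
    return np.array(forecasts), np.array(actuals)

def last_value(history, h):
    # Trivial stand-in forecaster; the paper refits ETS or ARIMA here.
    return np.repeat(history[-1], h)
```

The step controls how far the origin advances between refits; errors are then aggregated per horizon step, as in Figures 9 and 10.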
Figure 9. Benchmarking the performance of our proposed method against all the benchmark models with regard to the mean of MASE, sMAPE and MSIS values for 95% confidence levels for all horizons over the hourly electricity demand in England and Wales from Monday 5 June 2000 to Sunday 27 August 2000.
The bottom panel of Figure 8 visualises the hourly electricity demand in England and Wales for the year 2016 (from 1 January 2016 to 31 December 2016). What differs from the series in the top panel of Figure 8 is that the series here is much longer, and we want to validate whether our approach can adapt to longer sequences. To be consistent with the literature (Rendon-Sanchez and de Menezes, 2019), the series is split into a training period of 35 weeks and a testing period of 17.3 weeks, with rolling forecasts up to 24 hours ahead being considered. The dashed and solid lines in Figure 10 represent the accuracy of the "Standard" and "Multiple" methods, respectively. The performance of multiple ETS is on average 9.04%, 7.71%, and 13.18% better than that of standard ETS in MASE, sMAPE, and MSIS, respectively, over all horizons. Similarly, the improvement of multiple ARIMA over standard ARIMA in MASE, sMAPE, and MSIS is on average 7.57%, 7.14%, and 12.53%, respectively.
Figure 10. Benchmarking the performance of our proposed method against all the benchmark models with regard to the mean of MASE, sMAPE and MSIS values for 95% confidence levels for all horizons over the hourly electricity demand in England and Wales for the year 2016 (from Friday, 1 January 2016, to Saturday, 31 December 2016).
6. Discussion
We are living in a big data era where large collections of time series are constantly generated. Forecasters often aim to choose the best model for their data automatically. However, accurate model selection requires professional knowledge and rich experience, making forecasting a difficult task. In order to mitigate the importance of model selection, we propose a novel forecasting method that constructs multiple series with diverse sub-seasonal patterns and subsequently extrapolates each new series separately. Finally, the forecasts of these series are averaged with equal weights.

In this paper, we choose some real cases to directly visualise the different sub-seasonal patterns and some complex components of the targeted series. Our method makes it possible for forecasting methods to amplify these potential patterns and hence improve the forecasting performance. In order to make accurate and automatic extrapolations for these series individually, we use two widely-used statistical time series forecasting benchmarks, ETS and ARIMA, that are available in the R package forecast (Hyndman and Khandakar, 2008). Automatic and optimal model and parameter selection for these sub-seasonal series can ensure locally optimal predictions.

The application to a large number of real-life series has shown that our proposed method can produce better forecasts of the targeted series compared with the standard methods. The better performance can be attributed to the fact that our proposed method zooms in on the diverse and potential sub-seasonal patterns and components of the original time series.
Moreover, we mitigate the importance of single-model selection through model combination, and it is therefore possible to produce diverse and reliable forecasts.

The widely used datasets M1, M3, and M4 are first used for the empirical evaluation. It is shown that our proposed method performs better than the benchmarks in most of the horizons, for both point and interval forecasts, over the monthly and hourly datasets, which indicates that our method is more suitable for time series with higher frequencies. On the other hand, our method does not improve as much over the statistical benchmarks for the quarterly datasets. We carry out MCB tests to investigate the ranking performance of our method and the benchmarks. The results show that the ranked performance of our method for the monthly and hourly datasets is statistically better than that of the benchmarks. In order to compare the accuracy of standard forecasts with multiple forecasts, we also carry out DM significance tests. These further support the insight that our method is significantly superior to the benchmarks over the monthly and hourly datasets. The tests for statistical significance further verify and strengthen the conclusion that our proposed method is more suitable for time series with higher frequencies.

We apply our proposed method to electricity demand forecasting to further verify the effectiveness and stability of our approach. The empirical results show that our method indeed produces better and more robust forecasts in terms of both point forecasts and prediction intervals.

Why does our proposed method work? We construct multiple time series consisting of one or several adjacent seasons, amplifying the sub-seasonal patterns and complex components of the original series when forecasting. Furthermore, extrapolating each new series separately can effectively mitigate the importance of model selection on a single (the original) series, making it possible to offer diverse and reliable forecasts.
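A minimal Python sketch can illustrate this construction-and-combination idea. We stress that this is not the foss implementation: it uses a last-value stand-in forecaster instead of ETS/ARIMA, builds only the single-season sub-series, and combines them with a forecast from the full series using equal weights purely for illustration:

```python
import numpy as np

def per_season_series(x, m):
    """One sub-series per season: the observations of that season only.
    (An illustrative special case; blocks of several adjacent seasons can
    be formed analogously.)"""
    x = np.asarray(x, float)
    return [x[s::m] for s in range(m)]

def naive(sub, steps=1):
    # Trivial stand-in forecaster; ETS or ARIMA would be fitted here.
    return np.repeat(np.asarray(sub, float)[-1], steps)

def foss_like_forecast(x, m):
    """Forecast one seasonal cycle (m steps) ahead by combining, with equal
    weights, (a) one-step forecasts from the m single-season sub-series,
    interleaved back onto the original time axis, and (b) the forecast from
    the full series."""
    x = np.asarray(x, float)
    sub_fc = np.array([naive(s)[0] for s in per_season_series(x, m)])
    full_fc = naive(x, m)          # same stand-in applied to the original series
    return (sub_fc + full_fc) / 2  # equal-weight combination per horizon step
```

For example, foss_like_forecast([1, 2, 3, 4, 5, 6, 7, 8], m=4) averages the per-season forecasts [5, 6, 7, 8] with the full-series forecasts [8, 8, 8, 8], giving [6.5, 7.0, 7.5, 8.0].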
Our approach can significantly improve the performance of ETS and ARIMA for high-frequency data through constructing multiple sub-seasonal series. More importantly, due to its simplicity and transparency, our method can be transferred and applied in different contexts and to other families of models.

Our approach, forecasting with sub-seasonal series (foss), is implemented as an R package. Our code is open-source and publicly available at https://github.com/lixixibj/foss.
7. Conclusions
To mitigate the importance of model selection, we propose a novel forecasting method that constructs new series containing the observations of only one or a few adjacent seasons of the original time series. Each sub-seasonal time series is used to make extrapolation forecasts for the corresponding seasons in the future. Finally, the forecasts produced for each horizon are combined with equal weights. The main advantage of our proposed method is that it makes full use of the sub-seasonal patterns.

In our empirical experiments, our proposed approach yields better forecasting performance than the benchmark methods over the widely used competition datasets M1, M3, and M4. In particular, our method is more suitable and stable for datasets with higher frequencies. We also apply our forecasting approach to load forecasting, where it yields significantly improved forecasting accuracy.

A limitation of our approach is that it is designed to accommodate only seasonal data. Moreover, our approach focuses only on simple seasonal patterns. For time series that exhibit complex multiple seasonal patterns (e.g., the multiple seasonal periods that can be observed in the electric load data shown in Figure 8), how to effectively use such complex multiple sub-seasonal patterns to improve forecasting remains to be studied in the future (De Livera et al., 2011). On the other hand, in this work we simply set equal weights for the multiple forecasts; calculating optimal unequal combination weights could be explored in the future to further improve the forecasting performance.
Acknowledgements
Yanfei Kang is supported by the National Natural Science Foundation of China (No. 11701022) and the National Key Research and Development Program (No. 2019YFB1404600).

References
Adya, M., Collopy, F., Armstrong, J. S. and Kennedy, M. (2001), 'Automatic identification of time series features for rule-based forecasting', International Journal of Forecasting (2), 143–157.
Assimakopoulos, V. and Nikolopoulos, K. (2000), 'The Theta model: a decomposition approach to forecasting', International Journal of Forecasting (4), 521–530.
Athanasopoulos, G., Hyndman, R. J., Kourentzes, N. and Petropoulos, F. (2017), 'Forecasting with temporal hierarchies', European Journal of Operational Research (1), 60–74.
Bates, J. M. and Granger, C. W. J. (1969), 'The combination of forecasts', Journal of the Operational Research Society (4), 451–468.
Bergmeir, C., Hyndman, R. J. and Benítez, J. M. (2016), 'Bagging exponential smoothing methods using STL decomposition and Box–Cox transformation', International Journal of Forecasting (2), 303–312.
Bishop, C. M. (2006), Pattern Recognition and Machine Learning, Springer.
Box, G. E., Draper, N. R. et al. (1987), Empirical Model-Building and Response Surfaces, Vol. 424, Wiley, New York.
Chen, T. and Guestrin, C. (2016), 'XGBoost: a scalable tree boosting system', in 'ACM SIGKDD International Conference on Knowledge Discovery and Data Mining', pp. 785–794.
Collopy, F. and Armstrong, J. S. (1992), 'Rule-based forecasting: Development and validation of an expert systems approach to combining time series extrapolations', Management Science (10), 1394–1414.
De Livera, A. M., Hyndman, R. J. and Snyder, R. D. (2011), 'Forecasting time series with complex seasonal patterns using exponential smoothing', Journal of the American Statistical Association (496), 1513–1527.
Fildes, R. and Petropoulos, F. (2015), 'Simple versus complex selection rules for forecasting many time series', Journal of Business Research (8), 1692–1701.
Fiorucci, J. A., Pellegrini, T. R., Louzada, F., Petropoulos, F. and Koehler, A. B. (2016), 'Models for optimising the theta method and their relationship to state space models', International Journal of Forecasting (4), 1151–1161.
Gardner Jr, E. S. (1985), 'Exponential smoothing: The state of the art', Journal of Forecasting (1), 1–28.
Genre, V., Kenny, G., Meyler, A. and Timmermann, A. (2013), 'Combining expert forecasts: Can anything beat the simple average?', International Journal of Forecasting (1), 108–121.
Gneiting, T. and Raftery, A. E. (2007), 'Strictly proper scoring rules, prediction, and estimation', Journal of the American Statistical Association (477), 359–378.
Harvey, D., Leybourne, S. and Newbold, P. (1997), 'Testing the equality of prediction mean squared errors', International Journal of Forecasting (2), 281–291.
Hasni, M., Aguir, M., Babai, M. and Jemai, Z. (2019), 'Spare parts demand forecasting: a review on bootstrapping methods', International Journal of Production Research (15-16), 4791–4804.
Hibon, M. and Evgeniou, T. (2005), 'To combine or not to combine: selecting among forecasts and their combinations', International Journal of Forecasting (1), 15–24.
Hyndman, R. (2018), Mcomp: Data from the M-Competitions. R package version 2.8. URL: https://CRAN.R-project.org/package=Mcomp
Hyndman, R. J. (2011), Cyclic and seasonal time series. URL: https://robjhyndman.com/hyndsight/cyclicts/
Hyndman, R. J. and Athanasopoulos, G. (2018), Forecasting: Principles and Practice, OTexts.
Hyndman, R. J. and Khandakar, Y. (2008), 'Automatic time series forecasting: the forecast package for R', Journal of Statistical Software (3), 1–22.
Hyndman, R. J. and Koehler, A. B. (2006), 'Another look at measures of forecast accuracy', International Journal of Forecasting (4), 679–688.
Hyndman, R. J., Koehler, A. B., Snyder, R. D. and Grose, S. (2002), 'A state space framework for automatic forecasting using exponential smoothing methods', International Journal of Forecasting (3), 439–454.
Hyndman, R., Koehler, A. B., Ord, J. K. and Snyder, R. D. (2008), Forecasting with Exponential Smoothing: The State Space Approach, Springer Science & Business Media.
Jose, V. R. R. and Winkler, R. L. (2008), 'Simple robust averages of forecasts: Some empirical results', International Journal of Forecasting (1), 163–169.
Kang, Y., Hyndman, R. J. and Smith-Miles, K. (2017), 'Visualising forecasting algorithm performance using time series instance spaces', International Journal of Forecasting (2), 345–358.
Kolassa, S. (2011), 'Combining exponential smoothing forecasts using Akaike weights', International Journal of Forecasting (2), 238–251.
Koning, A. J., Franses, P. H., Hibon, M. and Stekler, H. O. (2005), 'The M3 competition: Statistical tests of the results', International Journal of Forecasting (3), 397–409.
Kourentzes, N. and Athanasopoulos, G. (2020), 'Elucidate structure in intermittent demand series', European Journal of Operational Research (1), 141–152.
Kourentzes, N., Petropoulos, F. and Trapero, J. R. (2014), 'Improving forecasting by estimating time series structural components across multiple frequencies', International Journal of Forecasting (2), 291–302.
Li, X., Kang, Y. and Li, F. (2020), 'Forecasting with time series imaging', Expert Systems with Applications, 113680.
Lichtendahl, K. C. and Winkler, R. L. (2020), 'Why do some combinations perform better than others?', International Journal of Forecasting (1), 142–149.
Makridakis, S., Andersen, A., Carbone, R., Fildes, R., Hibon, M., Lewandowski, R., Newton, J., Parzen, E. and Winkler, R. (1982), 'The accuracy of extrapolation (time series) methods: Results of a forecasting competition', Journal of Forecasting (2), 111–153.
Makridakis, S. and Hibon, M. (2000), 'The M3-Competition: results, conclusions and implications', International Journal of Forecasting (4), 451–476.
Makridakis, S., Spiliotis, E. and Assimakopoulos, V. (2020), 'The M4 competition: 100,000 time series and 61 forecasting methods', International Journal of Forecasting (1), 54–74.
Montero-Manso, P., Athanasopoulos, G., Hyndman, R. J. and Talagala, T. S. (2020), 'FFORMA: Feature-based forecast model averaging', International Journal of Forecasting (1), 86–92.
Montero-Manso, P., Netto, C. and Talagala, T. S. (2018), 'M4comp2018: Data from the M4-Competition'. R package version 0.1.0.
Nikolopoulos, K., Syntetos, A. A., Boylan, J. E., Petropoulos, F. and Assimakopoulos, V. (2011), 'An aggregate–disaggregate intermittent demand approach (ADIDA) to forecasting: An empirical proposition and analysis', The Journal of the Operational Research Society (3), 544–554.
Petropoulos, F., Apiletti, D., Assimakopoulos, V., Babai, M. Z., Barrow, D. K., Bergmeir, C., Bessa, R. J., Boylan, J. E., Browell, J., Carnevale, C., Castle, J. L., Cirillo, P., Clements, M. P., Cordeiro, C., Oliveira, F. L. C., De Baets, S., Dokumentov, A., Fiszeder, P., Franses, P. H., Gilliland, M., Sinan Gönül, M., Goodwin, P., Grossi, L., Grushka-Cockayne, Y., Guidolin, M., Guidolin, M., Gunter, U., Guo, X., Guseo, R., Harvey, N., Hendry, D. F., Hollyman, R., Januschowski, T., Jeon, J., Jose, V. R. R., Kang, Y., Koehler, A. B., Kolassa, S., Kourentzes, N., Leva, S., Li, F., Litsiou, K., Makridakis, S., Martinez, A. B., Meeran, S., Modis, T., Nikolopoulos, K., Önkal, D., Paccagnini, A., Panapakidis, I., Pavía, J. M., Pedio, M., Pedregal, D. J., Pinson, P., Ramos, P., Rapach, D. E., James Reade, J., Rostami-Tabar, B., Rubaszek, M., Sermpinis, G., Shang, H. L., Spiliotis, E., Syntetos, A. A., Talagala, P. D., Talagala, T. S., Tashman, L., Thomakos, D., Thorarinsdottir, T., Todini, E., Arenas, J. R. T., Wang, X., Winkler, R. L., Yusupova, A. and Ziel, F. (2020), 'Forecasting: theory and practice', arXiv.
Petropoulos, F., Hyndman, R. J. and Bergmeir, C. (2018), 'Exploring the sources of uncertainty: Why does bagging for time series forecasting work?', European Journal of Operational Research (2), 545–554.
Petropoulos, F. and Kourentzes, N. (2015), 'Forecast combinations for intermittent demand', The Journal of the Operational Research Society (6), 914–924.
Petropoulos, F., Makridakis, S., Assimakopoulos, V. and Nikolopoulos, K. (2014), ''Horses for courses' in demand forecasting', European Journal of Operational Research (1), 152–163.
Petropoulos, F. and Nikolopoulos, K. (2013), 'Optimizing theta model for monthly data', in 'ICAART (1)', pp. 190–195.
Petropoulos, F. and Svetunkov, I. (2020), 'A simple combination of univariate models', International Journal of Forecasting (1), 110–115.
Reid, D. (1972), 'A comparison of forecasting techniques on economic time series', Forecasting in Action. Operational Research Society and the Society for Long Range Planning.
Rendon-Sanchez, J. F. and de Menezes, L. M. (2019), 'Structural combination of seasonal exponential smoothing forecasts applied to load forecasting', European Journal of Operational Research (3), 916–924.
Talagala, T. S., Hyndman, R. J., Athanasopoulos, G. et al. (2018), 'Meta-learning how to forecast time series', Technical report, Monash University, Department of Econometrics and Business Statistics.
Tashman, L. J. (2000), 'Out-of-sample tests of forecasting accuracy: an analysis and review', International Journal of Forecasting (4), 437–450.
Taylor, J. W. (2003), 'Short-term electricity demand forecasting using double seasonal exponential smoothing', Journal of the Operational Research Society (8), 799–805.
Thomson, M. E., Pollock, A. C., Önkal, D. and Gönül, M. S. (2019), 'Combining forecasts: Performance and coherence', International Journal of Forecasting (2), 474–484.
Wang, X. and Petropoulos, F. (2016), 'To select or to combine? The inventory performance of model and expert forecasts', International Journal of Production Research (17), 5271–5282.
Winters, P. R. (1960), 'Forecasting sales by exponentially weighted moving averages', Management Science (3), 324–342.
Wolpert, D. H. (1996), 'The lack of a priori distinctions between learning algorithms', Neural Computation.