Profit-oriented sales forecasting: a comparison of forecasting techniques from a business perspective
Tine Van Calster (a), Filip Van den Bossche (a), Bart Baesens (a, b) and Wilfried Lemahieu (a)
(a) Faculty of Economics and Business, KU Leuven, Naamsestraat 69, 3000 Leuven, Belgium; (b) University of Southampton, University Road, Southampton SO17 1BJ, United Kingdom
ARTICLE HISTORY
Compiled February 5, 2020
ABSTRACT
Choosing the technique that best forecasts your data is a problem that arises in any forecasting application. Decades of research have resulted in an enormous number of forecasting methods that stem from statistics, econometrics and machine learning (ML), which makes the choice in any forecasting exercise difficult and elaborate. This paper aims to facilitate this process for high-level tactical sales forecasts by comparing a large array of techniques on 35 time series that consist of both industry data from The Coca-Cola Company and publicly available datasets. However, instead of solely focusing on the accuracy of the resulting forecasts, this paper introduces a novel and completely automated profit-driven approach that takes into account the expected profit that a technique can create during both the model building and evaluation process. The expected profit function that is used for this purpose is easy to understand and adaptable to any situation by combining forecasting accuracy with business expertise. Furthermore, we examine the added value of ML techniques, the inclusion of external factors and the use of seasonal models in order to ascertain which type of model works best in tactical sales forecasting. Our findings show that simple seasonal time series models consistently outperform other methodologies and that the profit-driven approach can lead to the selection of a different forecasting model.
KEYWORDS
Tactical sales forecasting; Benchmarking; External factors; Forecast evaluation; Forecasting practice
1. Introduction
This paper focuses on one of the most frequently asked questions in forecasting theory and practice: which technique(s) should I choose to forecast this time series? In the literature, this question has been posed many times and has indeed been answered by benchmarks and competitions (Armstrong and Fildes, 2006; Crone, Hibon, and Nikolopoulos, 2011; Petropoulos, Makridakis, Assimakopoulos, and Nikolopoulos, 2014), as forecasting has been an integral part of the business decision-making process for decades and is used for this purpose in many industries (Armstrong and Fildes, 2006; Cang and Yu, 2014; Lessmann and Voß, 2017). However, most studies only take one evaluation criterion into account, i.e. the performance of the techniques on a test set, while the final choice of a model in a business context depends on more considerations.
Corresponding author: [email protected]

Undoubtedly, the costs that are associated with inaccurate forecasts ensure that accuracy will always remain an important evaluation standard (Kahn, 2003). However, from a decision-making perspective, other questions immediately arise in the mind of the business expert as well, such as the potential impact of the forecast on the revenue of the company or the maintenance cost of the model. This paper therefore proposes an expected profit function that can be integrated into several steps of the forecasting process, while also taking a closer look at which types of models perform best in a sales forecast on a tactical or strategic level.

Recent publications have shown a large offering of forecasting techniques, ranging from statistical methods to machine learning techniques. Given all of these theoretical and technological developments, it is becoming increasingly difficult to select the right type of technique for a given use case. Especially the group of Machine Learning (ML) techniques has received a lot of attention recently, as it constitutes one of the most popular topics in the forecasting literature (Fildes, 2006). Most articles on ML techniques report favourable results when compared to more traditional methodologies, both for single use cases and more extensive comparisons (Crone et al., 2011), although publications generally have a tendency to only report positive outcomes (Armstrong, 2006). However, several authors have expressed their reservations concerning these complex techniques (Makridakis and Hibon, 2000). In contrast, Crone et al. (2011) have shown that machine learning has caught up with statistical modelling and should not be dismissed lightly for forecasting exercises.
This paper therefore also aims to investigate whether these more complex ML techniques truly outperform the classical models for a tactical sales forecast.

In this paper, we focus on the field of sales forecasting, as successful sales forecasts are vital in both short- and long-term strategic and financial planning (Ramos, Santos, and Rebelo, 2015). This research specifically deals with high-level forecasts, which are primarily meant for decision-making purposes, as opposed to inventory planning for specific products. In practice, this typically entails a monthly time series, which is non-intermittent and prone to display a trend, a seasonal pattern or a combination of these characteristics. This type of time series is common in other fields as well, and has therefore frequently been used for benchmarking purposes (Armstrong and Fildes, 2006; Crone et al., 2011; Petropoulos et al., 2014). We therefore take a look at the performance of techniques that model seasonality versus methodologies that do not have this ability, as seasonality is a typical characteristic of sales time series. While trying out non-seasonal models might seem counterintuitive for this data, many of the more recently developed techniques do not have a seasonal component and still seem to perform very well in many applications. This paper therefore also investigates whether this type of model can perform well on these seasonal time series, given the necessary pre-processing of the data. Furthermore, this high-level data also raises the question of the usefulness of incorporating external factors into the forecast. While the addition of variables has obvious benefits, such as their explanatory value, it frequently leads to higher model maintenance costs.
Thus, we also compare univariate techniques with and without the ability to add external drivers in this paper.

Our contributions are twofold, as we aim to both benchmark a large set of forecasting techniques and integrate a practical construct, i.e. profit, into the model building and evaluation process. Firstly, we propose a new strategy to inject a profit-oriented view into the entire forecasting process without explicitly forecasting profit itself. In practice, this constitutes a different way of performing feature selection, tuning hyper parameters and evaluating the forecasting techniques, with the goal of achieving the models that yield the highest expected profit. The expected profit function that is used for this purpose is easy to understand and adaptable to any situation by combining forecasting accuracy with business expertise (Van Calster, Baesens, and Lemahieu, 2017). Furthermore, our methodology ensures a completely automated and data-driven model building process. Secondly, we benchmark a large range of forecasting techniques according to three different categorizations. Firstly, as mentioned above, we contrast a range of complex techniques and traditional techniques, in order to assess whether the ML techniques are truly able to perform equally well with regard to tactical sales forecasts. Secondly, we take the seasonal characteristics of sales time series into consideration by distinguishing techniques that model seasonality themselves from methods that require seasonal dummy variables to achieve the same goal. Finally, we contrast techniques with and without variables, as we investigate the value of external factors in a high-level sales forecast. In terms of evaluating the techniques, we take accuracy, expected profit, model complexity and model interpretability into consideration in order to integrate the business aspect of forecasting into the benchmark.
In the end, we aim to quantitatively select the techniques that forecast accurately, lead to the highest expected profit for any business case, and make the most sense from a business perspective. We address these research questions by means of a total of 35 monthly sales datasets. The datasets were collected from both The Coca-Cola Company and publicly available resources in order to add to the generalizability of the study.

The paper is organized as follows. Section 2 deals with the related work that provides a necessary background to the research questions. Section 3 describes the datasets, the forecasting techniques and the general methodology of the experiments. Next, Section 4 focuses on the results of the research, while Section 5 includes the conclusion.
2. Related work
This section focuses on the necessary background literature for the research questions. We take a closer look at the forecasting literature on benchmarking, while also considering recent literature on profit-oriented analytics.
Forecasts are typically performed by three categories of techniques (Cang and Yu, 2014): traditional time series analysis (Aboagye-Sarfo, Mai, Sanfilippo, Preen, Stewart, and Fatovich, 2015; Akın, 2015; Arunraj and Ahrens, 2015; Athanasopoulos, Hyndman, Song, and Wu, 2011; Franses and Van Dijk, 2005; Gil-Alana, Cunado, and Perez de Gracia, 2008; Gunter and Önder, 2015; Petropoulos et al., 2014; Ramos et al., 2015; Santos, Nogales, and Ruiz, 2012), causal regression techniques (Akın, 2015; Arunraj and Ahrens, 2015; Bozos and Nikolopoulos, 2011; Cang and Yu, 2014; Lessmann, Baesens, Seow, and Thomas, 2015; Ma, Fildes, and Huang, 2016; Nikolopoulos, Goodwin, Patelis, and Assimakopoulos, 2007), and more complex artificial intelligence techniques (Akın, 2015; Bozos and Nikolopoulos, 2011; Cang and Yu, 2014; Crone et al., 2011; Fagiani, Squartini, Gabrielli, Spinsante, and Piazza, 2015; Lessmann et al., 2015; Taylor, De Menezes, and McSharry, 2006). The emergence of new techniques often requires a comparison with former methods, which has led to an extensive literature on benchmarking, both for individual use cases (Aboagye-Sarfo et al., 2015; Arunraj and Ahrens, 2015; Bozos and Nikolopoulos, 2011; Gil-Alana et al., 2008; Gunter and Önder, 2015; Lessmann et al., 2015) and for larger sets of time series (Athanasopoulos et al., 2011; Cang and Yu, 2014; Crone et al., 2011; Franses and Van Dijk, 2005; Ma et al., 2016; Makridakis and Hibon, 2000; Petropoulos et al., 2014; Weller and Crone, 2012). This research consists of both field-specific benchmarks (Aboagye-Sarfo et al., 2015; Akın, 2015; Athanasopoulos et al., 2011; Bozos and Nikolopoulos, 2011; Cang and Yu, 2014; Fagiani et al., 2015; Lessmann et al., 2015; Ma et al., 2016; Weller and Crone, 2012) and industry-neutral benchmarks, which are oriented towards general conclusions (Crone et al., 2011; Makridakis and Hibon, 2000; Petropoulos et al., 2014).
While some studies use a combination of generated data and industry data (Petropoulos et al., 2014), most use real-life datasets to answer their research questions (Bozos and Nikolopoulos, 2011; Cang and Yu, 2014; Lessmann et al., 2015; Weller and Crone, 2012).

In terms of the conclusions that have come out of the larger studies, some discrepancies arise. While several studies point out that the newer ML techniques do not perform as well as the more traditional methods for classical time series (Makridakis and Hibon, 2000), others claim that these complex techniques have caught up in recent years (Crone et al., 2011). In this paper, we therefore take a look at a wider range of techniques from all three categories that were mentioned above. Furthermore, we also contrast techniques with and without external factors, which adds another factor that has not been part of many larger benchmarking studies, except for Athanasopoulos et al. (2011). Our paper combines these elements in an extensive benchmark that is based on publicly available data and recent sales time series.
Profit-driven analytics has recently become a hot topic in analytics, as businesses are interested in the actual value that predictive models generate or the influence that they have on their eventual net profits. Integrating this value-centric view into analytics has led to a growing number of profit-driven methodologies, techniques and metrics (Verbeke, Baesens, and Bravo, 2017). These profit functions can be used in different steps of the model building and model selection process. For example, profit has been used as an evaluation metric for benchmarks in different fields (Óskarsdóttir, Bravo, Verbeke, Sarraute, Baesens, and Vanthienen, 2017; Verbraken, Verbeke, and Baesens, 2013), while it has inspired entire profit-driven algorithms as well (Stripling, vanden Broucke, Antonio, Baesens, and Snoeck, 2015; Verbeke, Dejaeger, Martens, Hur, and Baesens, 2012). In this paper, we aim to integrate this profit-oriented view into multiple steps of the forecasting process instead of only using it as an additional evaluation criterion.

In forecasting, research on the profit aspect is scarcer than in other fields. While the monetary value of classification models has been extensively reviewed, the same cannot be said for regression models. However, the impact of forecasting accuracy on net profit is an interesting subject, as under- and over-forecasting lead to completely different costs. The former might lead to a loss in sales and out-of-stock products, while the latter can lead to overstock and storage costs. While both directions of the error inevitably bring about a loss of profit, they are often not equal. Completely symmetric profit loss functions that are solely based on accuracy measures are therefore not representative of the real world. The ultimate goal of profit-oriented analytics is to find the model with the best balance between costs and accuracy. While these two concepts are inevitably linked in a forecasting exercise, we cannot state that they are exactly the same.
Therefore, profit-oriented benchmarking should take into account both traditional accuracy metrics and metrics that point to the costs of the forecast, such as expected profit functions or model complexity. So far, two different views on the integration of profit into forecasting exercises have been proposed in recent literature. The first perspective optimizes an asymmetric loss function during model training to model the imbalance between over- and under-forecasting. Crone, Lessmann, and Stahlbock (2005) apply this methodology to neural networks, while Yang, King, and Chan (2002) take a closer look at support vector regression models. The second way of integrating profit into a forecasting exercise takes place after the training process. Bansal, Sinha, and Zhao (2008) propose a tuning procedure that modifies the predictions so that they are cost-optimal, while Zhao, Sinha, and Bansal (2011) further fine-tune this procedure. Bozos and Nikolopoulos (2011) also take a monetary value into account when evaluating their forecasts, but do not modify the models in any way.

In this paper, we take profit into account in all of the steps that are mentioned above. We optimize the parameters of our models, select features when necessary and evaluate our forecasts based on an asymmetric expected profit function that can easily be adjusted to any business case.
3. Methodology
This methodology section is divided into five parts. We begin by describing the datasets and by explaining the profit function that was used for both optimization and evaluation purposes in this paper. Next, the general experimental set-up is introduced, which also includes the description of the feature selection procedure. The fourth subsection is dedicated to an overview of the forecasting techniques, while the last subsection focuses on the evaluation metrics.
3.1. Data

The datasets in this paper stem from two sources. Firstly, The Coca-Cola Company has given us a total of 20 time series, which represent two of their product categories in ten different countries. These monthly time series all range from January 2004 until September 2016. The external variables that correspond with these datasets were collected by means of in-company data sources and are all based on information about the location of the data. Concretely, they consist of 20 variables that contain information on weather, macro-economic indicators, holidays and pricing. Four weather variables were included, such as temperature and precipitation, while nine variables allude to macro-economic information, such as GDP and CPI. Additionally, three factors refer to calendar effects of public holidays, while the final four variables relate to both in-company and competitor pricing. An overview of these variables can be found in Table 1. These external factors were selected according to data availability, but also take into account the literature on the types of variables that are interesting for sales forecasting. Several types of information have proven to be useful in this field, although this generally depends on the aggregation level of the time series (Syntetos, Babai, Boylan, Kolassa, and Nikolopoulos, 2016) and the volatility of the time series (Currie and Rowley, 2010). Research has shown that factors such as weather (Bertrand, Brusset, and Fortin, 2015), macro-economic influences (Sagaert, Aghezzaf, Kourentzes, and Desmet, 2017) and pricing and promotional information (Huang, Fildes, and Soopramanien, 2014; Ma et al., 2016) all have an impact on sales.

Weather
- Maximum temperature: average daily maximum temperature weighted by population
- Maximum temperature squared: square of the average daily maximum temperature weighted by population
- Precipitation: average daily precipitation volume
- Sunshine hours: average daily number of sunshine hours

Macro-economic indicators
- Consumer Price Index: seasonally adjusted percentage change of CPI with regard to the previous month
- Unemployment rate: percentage of unemployment for the entire population
- Exchange rate: exchange rate with the US dollar
- Short-term interest rate: short-term interest rate in percentage per annum
- Industrial production: seasonally adjusted percentage change of industrial production with regard to the previous month
- Merchandise import: seasonally adjusted percentage change of merchandise import with regard to the previous month
- Merchandise export: seasonally adjusted percentage change of merchandise export with regard to the previous month
- Gross Domestic Product: seasonally adjusted annual rate, percentage change of GDP with regard to the previous month
- Private Consumption: seasonally adjusted annual rate, percentage change of private consumption with regard to the previous month

Holidays
- Public holiday: number of public holidays per month
- Weekend: number of public holidays in the weekend per month (possibility of a long weekend)
- Tuesday/Thursday: number of public holidays on a Tuesday or Thursday per month (possibility of a long weekend)

Pricing
- Company price: average product category price in US dollars
- Company price deflated: average product category price in US dollars, deflated by CPI
- Competitor price: average product category price of the main competitor in US dollars
- Competitor price deflated: average product category price of the main competitor in US dollars, deflated by CPI

Table 1. Summary of external variables
Secondly, we include a total of 15 publicly available datasets with similar characteristics in the analyses, in order to increase the generalizability of our findings; most of these can be found in the Time Series Data Library (https://datamarket.com/data/list/?q=provider:tsdl). The general features of these monthly time series are summarized in Table 2. As all of these datasets also include information on location, we collected twelve external variables that contain information on weather, macro-economic indicators and holidays as well. Concretely, we include four weather variables, seven macro-economic indicators and one holiday variable. The weather variables consist of the same features as defined in Table 1, while the macro-economic information includes all features in Table 1 except Merchandise Import and Merchandise Export. Finally, the models with external factors also contain the number of public holidays for each month. Pricing information was not available for these datasets. The sources for these three categories are publicly available:
- https://opendata.socrata.com/Business/Car-Sales-Data/da8m-smts
- https://crudata.uea.ac.uk/cru/data/hrg/cru_ts_3.23/crucy.1506241137.v3.23/
- https://data.oecd.org/
- https://pypi.python.org/pypi/holidays

Table 2. Public data summary: for each of the public data sources (Beer, Car sales 1, Car sales 2, Champagne, Paper, Petrol and Wine), the table reports the number of product categories, the range, the number of data points and the location.

3.2. Expected profit function

The evaluation of any predictive model is generally focused on the accuracy that it achieves on a test set. In this paper, however, we take both accuracy and a more business-oriented profit measure into account. The profit measure is represented by Equation 2, which depends on our definition of the Percentage Error (PE) in Equation 1. This profit measure was first defined in (Van Calster et al., 2017) and represents an estimation of the expected profit of the target variable.
The formula is very easy to interpret and can easily be adjusted to any business usecase. The two fundamental components of the expected profit measure are the volumeof the sales, as more sales lead to more profit, and the accuracy of the forecast, asbad forecasts inevitably lead to a loss of profit. Next to these two core elements, weintroduce several parameters that integrate expert knowledge into the profit function.
PE_i = (Actuals_i − Forecast_i) / Actuals_i × 100    (1)
Profit_i = (1 − α·|PE_i|) · (β_cat · Volume_cat)    if PE_i > γ or PE_i < δ
Profit_i = β_cat · Volume_cat                       otherwise    (2)

Firstly, the business user can influence the impact of the forecasting error on the expected profit by setting two parameters. The first one deals with how the size of the error is used as a penalization, as both over- and under-forecasting have proven to lead to various costs (Kahn, 2003). This penalization factor α can be modified according to a specific circumstance in a data-driven manner by executing a sensitivity analysis on a validation set. In this instance, α is set at 1. The second element consists of the boundaries γ and δ, which indicate that any forecast that has a PE within these boundaries does not lead to a significant impact on the final profit. Note that γ should always be larger than δ. For example, we set boundaries of 1% error in both directions for The Coca-Cola Company use case (γ = 1 and δ = −1). The γ and δ parameters can also be set unequally, if the forecasting error has a larger impact on profit in one particular direction, or even be completely omitted, if every inaccuracy while forecasting leads to a loss of profit.

Secondly, the β_cat weight refers to the profit margin for the product or product category at hand. This weight can be expressed both relatively between different products and in absolute numbers, such as currencies. For The Coca-Cola Company use case, these β weights were determined by the profit that the product actually generated in the last year of the original training set. It is important to note that these weights remain constant throughout the analyses once they are set on the training set of the first prediction. The actual profit of a product will fluctuate over time and is driven by many external factors that are not captured in the function. We have chosen to keep this parameter constant for two reasons: ease of use and availability of profit data. While the first reason is self-explanatory, the second one is tied to the particular use case of this paper. If data about the actual profit of a product is more readily available, this parameter can be used dynamically by updating it during the testing process. The profit in the analyses of this paper can therefore be viewed as the profit that the product will generate if business stays the same, and must truly be interpreted as the expected profit. The β weights for the publicly available datasets were chosen randomly with values between 0 and 3, and are displayed in Table 3.

Table 3. β weights of the public datasets (Beer, Car sales 1, Car sales 2, Champagne, Paper, Petrol and Wine)

3.3. Experimental set-up

The general experimental set-up consists of hold-out sample forecasts for all datasets. Concretely, the time series are split up into training, validation and test sets. The test set includes the final two years of the data, which leads to 24 data points to forecast. The validation set then consists of the year before the date that will be forecast, and is only used for feature selection and parameter tuning when necessary for the given technique. Parameter tuning is performed once, on the first validation set, in order to avoid computational issues during the testing procedure. However, the feature selection procedure is repeated every three months, in order to keep the model up-to-date. Once the necessary variables and hyper parameters have been selected, the training and validation sets are merged together in order to forecast the test set. Both the training and validation sets change with every forecast, as the set-up consists of an expanding window.
In the end, we therefore collect 24 one-month-ahead forecasts for each technique and for each dataset. The complete experimental set-up is visualized in Figure 1.

Figure 1. Experimental set-up

The feature selection procedure consists of a hybrid method, which combines the Minimum Redundancy Maximum Relevance criterion (mRMR) that was created by Peng, Long, and Ding (2005) as a filtering technique with a simple incremental wrapper method. The mRMR method is a mutual-information-based algorithm that ranks the external factors according to their shared information with the target variable, while also taking into account their dependency on the other external factors. This is achieved by finding the feature set S with m features x_i that maximizes the relevance with the target class c and minimizes the dependency between the independent variables. In short, this filter finds the features that maximize Equation 3, which combines Equation 4 (Relevance) and Equation 5 (Redundancy).

max φ(D, R),  φ = D − R    (3)

D = (1/|S|) Σ_{x_i ∈ S} I(x_i; c)    (4)

R = (1/|S|²) Σ_{x_i, x_j ∈ S} I(x_i; x_j)    (5)

In this paper, the first step selects either the top 15 or the top 10 ranked features, for The Coca-Cola Company datasets and the public datasets respectively, and passes this ranking on to the next step. Next, a simple forward incremental wrapper method starts with the top feature of the ranking and forecasts the validation set. One feature at a time is then added to the feature set until the entire top 15 or top 10 ranking is used in the forecasting model. This methodology therefore takes advantage of the initial ranking that was made by the mRMR filter. The feature set that will be used to forecast the test set is selected out of these 15 or 10 options by maximizing the profit function, which is defined in Section 3.2. This entire procedure is summarized in Algorithm 1.
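To make the filter concrete, the following Python sketch greedily builds an mRMR ranking, approximating relevance (Equation 4) and redundancy (Equation 5) with a simple histogram-based mutual information estimate. The function names, the binning choice and the greedy selection order are our own assumptions, not part of the original mRMR implementation:

```python
import numpy as np


def mutual_info(x, y, bins=5):
    """Histogram-based estimate of the mutual information I(x; y)."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y
    nz = pxy > 0                          # avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())


def mrmr_rank(X, y, k, bins=5):
    """Greedily rank k features: maximize relevance I(x_i; c) minus the
    mean redundancy with the features selected so far (Equations 3-5)."""
    selected, remaining = [], list(range(X.shape[1]))
    relevance = [mutual_info(X[:, j], y, bins) for j in remaining]
    for _ in range(k):
        scores = {j: relevance[j]
                     - (np.mean([mutual_info(X[:, j], X[:, s], bins)
                                 for s in selected]) if selected else 0.0)
                  for j in remaining}
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected
```

On synthetic data where one column is a noisy copy of the target, the ranking places that informative column first, while a redundant duplicate of it is pushed down by the redundancy term.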
In our benchmark, k is either 15 or 10, depending on the dataset at hand, and m is equal to 12 months.

Algorithm 1: Pseudo code for the feature selection procedure
  choose the size of the validation set m
  split the time series into training set S_tr and validation set S_val,m
  choose the initial number of features k
  rank the features according to the mRMR criterion into ranking R_k
  for i = 1 to k do
      select the top i features R_i from R_k
      for j = 1 to m do
          train the model with the R_i features on training set S_tr
          forecast S_val,j
          calculate profit P_i,j
          add S_val,j to training set S_tr
      end for
      calculate profit P_i by summing over all P_i,j
      reset training set S_tr and validation set S_val,m to the original split
  end for
  select the R_i features with the highest profit P_i

Feature selection is generally important for two entirely different reasons. Firstly, some of the variables might be correlated or influenced by the same underlying information, which can lead to less accurate forecasts (Boivin and Ng, 2006). A feature selection procedure is therefore used to determine which set of variables has the highest predictive power, while also eliminating any possible multicollinearity. Secondly, feature selection is equally important from a business perspective, as transparent models also have an explanatory advantage. Business analysts are interested in gaining knowledge on which external factors might influence their target variable, which can be useful for strategic decisions (Athanasopoulos et al., 2011). This knowledge also relates to the maintenance of the model, as the variables that never survive the feature selection procedure during testing are no longer needed.
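The loop of Algorithm 1 can be sketched in Python as follows. The `fit`, `forecast` and `profit` callables stand in for whatever model interface is used; they are placeholders of our own design, not an API from the paper:

```python
def incremental_wrapper(ranking, train, val, fit, forecast, profit):
    """Forward incremental wrapper over an mRMR ranking (Algorithm 1).

    ranking  -- features ordered by the mRMR filter (best first)
    train    -- initial training set (list of observations)
    val      -- validation set of m monthly observations
    fit      -- fit(features, window) -> model          (placeholder)
    forecast -- forecast(model, observation) -> float   (placeholder)
    profit   -- profit(actual, prediction) -> float     (placeholder)
    """
    best_features, best_profit = None, float("-inf")
    for i in range(1, len(ranking) + 1):   # top-1, top-2, ..., top-k subsets
        features = ranking[:i]
        window = list(train)               # reset to the original split
        total = 0.0
        for obs in val:                    # one-month-ahead, expanding window
            model = fit(features, window)
            total += profit(obs["y"], forecast(model, obs))
            window.append(obs)             # add the forecast month to training
        if total > best_profit:
            best_features, best_profit = features, total
    return best_features
```

With a toy "model" that simply sums its selected input columns and a profit function equal to the negative absolute error, the wrapper correctly retains only the informative features of the ranking.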
3.4. Forecasting techniques

In order to conduct the necessary experiments, a total of 17 forecasting techniques were selected, which are summarized in Table 4. These techniques are categorized according to three different types of attributes in order to answer our research questions. Firstly, we organize the methods according to whether they can be used with and/or without external drivers. We define a univariate technique without variables as a technique that only makes use of the sales time series itself to predict the next month. Techniques that are able to include variables, however, also integrate the external drivers, such as the weather, to generate a prediction. Nine techniques can be used in both ways, such as regression models, when past sales values are encoded as independent variables next to the aforementioned external factors. We therefore benchmark a total of 26 techniques in our final analysis. Secondly, Table 4 displays the ability of a technique to explicitly model the seasonality of a time series, as seasonality is a typical characteristic of the sales time series that we are considering in this paper. Thirdly, the forecasting techniques are classified into Machine Learning (ML) techniques and non-ML techniques. Recently, a lot of the forecasting literature has focused on these ML techniques and often reports them to be more accurate than traditional techniques. In order to simplify the issue of what is considered an ML technique and what is not, we chose to consider methods ML if they belong to one of the four following categories: decision tree learning, neural networks, support vector machines and k-nearest neighbours. These three categorizations will underpin the answer to which type of technique is best used to achieve an accurate sales forecast. Finally, Table 4 also contains the hyper parameters that were selected beforehand, and their possible values. Tuning hyper parameters has proven to be essential for truly assessing how well a certain technique can perform (Carrizosa, Martín-Barragán, and Morales, 2014), and is therefore an essential part of benchmarking in general. In this paper, the parameter selection was conducted by evaluating model performance on the validation set and by applying an exhaustive grid search methodology. The evaluation metric that was optimized is again the expected profit function that was defined in Section 3.2. Note that only the parameters that are mentioned in Table 4 are set in this way.

Model | Variables? | Seasonal? | ML? | Hyper parameters | Possible values
Holt-Winters exponential smoothing | No | Yes | No | / | /
Seasonal ARIMA | No | Yes | No | AR, MA, SAR and SMA terms | [0, 5]
Seasonal decomposition by Loess model | No | Yes | No | / | /
Seasonal random walk | No | Yes | No | / | /
ARMA-GARCH | No | No | No | AR and MA terms | [0, 5]
Random walk | No | No | No | / | /
Seasonal ARIMAX | Yes | Yes | No | AR, MA, SAR and SMA terms | [0, 5]
Vector Autoregression | Yes | No | No | AR term | [0, 5]
Conditional Inference Regression Tree | Both | No | Yes | / | /
Multiple Linear Regression | Both | No | No | / | /
Multivariate Adaptive Regression Splines | Both | No | No | Maximum degree of interaction | [1, 2]
Recursive Partitioning Regression Tree | Both | No | Yes | / | /
K Nearest Neighbors Regression | Both | No | Yes | Number of neighbors; weights for neighboring response values | [2, 5]; uniform, by distance
Long Short-Term Memory RNN | Both | No | Yes | Number of hidden neurons | [1, 10]
Random Forests | Both | No | Yes | / | /
Simple Multilayer Perceptron | Both | No | Yes | Number of hidden neurons | [1, 10]
Support Vector Regression | Both | No | Yes | Kernel; penalty parameter of error term; gamma (for rbf kernel only) | radial basis function, linear; 1e0, 1e1, 1e2, 1e3; [1e-2, 1e2]

Table 4. Overview of forecasting techniques

It is important to comment on the influence of the type of technique on the data preprocessing aspect of the analyses. Firstly, we normalized all variables to a range between 0 and 1 for all of the analyses in this paper. This step was especially necessary for techniques such as neural networks, as the literature reports this as a general practice from which they benefit greatly (Sola and Sevilla, 1997). Furthermore, business users can derive insights on the relative importance of variables if the forecasting technique is transparent, in order to identify the most important drivers of their sales.
Secondly, the time series that are part of the analyses all display a certain trend and seasonality, which should be incorporated into the forecasting model if possible. The time series analysis techniques that we consider in this paper explicitly include this seasonality in their model building by, for example, defining seasonal parameters. However, other types of techniques, such as regression models or neural networks, do not have this ability, which can lead to worse forecasts if the trend and season have a strong influence on the sales (Zhang and Qi, 2005). We therefore add two additional data preprocessing steps for this type of model: trend/seasonal differencing and seasonal dummy variables. In the first step, we check whether the time series actually contains either a trend or a season by means of appropriate unit root tests, i.e. the Augmented Dickey-Fuller test (Dickey and Fuller, 1979) and the Osborn-Chui-Smith-Birchenhall test respectively (Osborn, Chui, Smith, and Birchenhall, 1988). If the results thereof show signs of either characteristic, we apply the corresponding differencing. Secondly, if the time series is seasonal, we also include a set of seasonal dummy variables to further model the possible seasonal effects. These variables are not included in the feature selection procedure, but are always included if there is a seasonal component in the time series. Thirdly, when techniques can be used both with and without variables, past sales values need to be encoded as independent variables. We therefore need to determine how many past values will be included in the model. This hyperparameter is selected on the same validation set as the other hyperparameters, and has possible values ranging from one month to seven months. Furthermore, we define methods with external factors as techniques that use both past sales data and external parameters as independent variables.
In this case, the number of past months to use as input to the model is therefore again a hyperparameter. Finally, we note that the list of forecasting techniques is not exhaustive. Two types of methods are notably under-represented: ensemble methodologies and deep learning methods. We opted to include only one technique of each category in order to keep the scope of the paper manageable, i.e. Random Forests and Long Short-Term Memory Neural Networks respectively. However, the obvious next step of this research is to take a closer look at these types of methodologies.
Evaluation for forecasting benchmarks is often entirely based on accuracy metrics. There has been a lot of discussion in the past about which metric gives the best overview of performance when comparing techniques, as many commonly used measures can exhibit strange behavior (Hyndman and Koehler, 2006; Kolassa, 2016; Tashman, 2000). In this paper, we therefore propose a combination of frequently used accuracy metrics and the expected profit function that was defined above, to select the best-performing models. In the first category, we take into account the Mean Absolute Percentage Error (MAPE) and the Root Mean Squared Error (RMSE), as defined in Equations 6 and 7. Furthermore, we include the seasonal version of the Mean Absolute Scaled Error, which was first defined in (Hyndman and Koehler, 2006), based on the seasonality of the time series data. The formula for this metric can be found in Equation 8, with m as the seasonality of the time series. This metric compares a technique's performance to the in-sample error of a seasonal naïve model, which makes it perfect for truly benchmarking techniques. Next to the expected profit function, we also consider the computation time of each forecast as an approximation of the model complexity. We therefore include a total of five quantitative performance metrics in our analysis.

\[
\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{Actual_t - Forecast_t}{Actual_t}\right| \times 100 \tag{6}
\]
\[
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(Actual_t - Forecast_t\right)^2} \tag{7}
\]

\[
\mathrm{MASE} = \frac{\frac{1}{T}\sum_{t=1}^{T}\left|Actual_t - Forecast_t\right|}{\frac{1}{T-m}\sum_{t=m+1}^{T}\left|Actual_t - Actual_{t-m}\right|} \tag{8}
\]
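The three accuracy measures of Equations 6-8 can be implemented directly; a minimal sketch, where the seasonal MASE scales the out-of-sample error by the in-sample error of a seasonal naïve forecast with period m:

```python
# MAPE, RMSE and seasonal MASE as defined in Equations 6-8.
import numpy as np

def mape(actual, forecast):
    # Eq. (6): mean absolute percentage error.
    return np.mean(np.abs((actual - forecast) / actual)) * 100

def rmse(actual, forecast):
    # Eq. (7): root mean squared error.
    return np.sqrt(np.mean((actual - forecast) ** 2))

def mase(actual, forecast, insample, m=12):
    # Eq. (8): scale by the in-sample seasonal naive error with period m.
    scale = np.mean(np.abs(insample[m:] - insample[:-m]))
    return np.mean(np.abs(actual - forecast)) / scale

actual = np.array([100.0, 110.0, 120.0])
forecast = np.array([98.0, 113.0, 118.0])
insample = np.linspace(80, 130, 36)  # three years of monthly history
print(round(mape(actual, forecast), 2), round(rmse(actual, forecast), 2))
```

Because the MASE denominator is computed on the training data only, a value below one means the technique beats the in-sample seasonal naïve benchmark.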
4. Results
The results section of this paper will firstly take a look at the experimental results, which are based on forecasting the 35 datasets with 17 different forecasting techniques. Secondly, we will discuss the implications of these results, and we also comment on the limitations of this study.
The results of the experiments are based on a total of 21840 forecasts, as we performed 24 one-month-ahead forecasts on 35 time series with 26 different models. We only take into account the results for models that have completed both the parameter tuning and feature selection procedures that were explained in the methodology section of this paper, see Section 3. Other model set-ups were disregarded for the final analyses.

In order to compare all of these models to one another, we apply two ranking tests for the 26 forecasting techniques according to five evaluation measures: MAPE, RMSE, MASE, Profit and computation time. Concretely, we rank all of the methods for each of the 840 unique forecasts and then display the average over these forecasts. This methodology ensures a fairer comparison between the techniques than, e.g., simply taking an average MAPE over the 840 forecasts. Furthermore, we can verify if the differences in rank are significantly separate from one another. The Friedman test (Friedman, 1940) is a non-parametric statistical test that verifies whether the difference between two treatments is significant or not. In this benchmark, the 26 forecasting techniques constitute the 'treatments', defined as k in Equation 9, while the 35 time series datasets are the 'blocks', N in Equation 9, which form groups of similar units. The Friedman test will rank the treatments according to a given evaluation criterion and will compare this ranking for each block. Therefore, five different average rankings are made for these experiments, according to the five evaluation measures.

Model MAPE RMSE MASE Profit Time
Without external factors
ARMA-GARCH (GARCH)
Conditional Inference Regression Tree (CtreeUni)
Holt-Winters exponential smoothing (HW)
K Nearest Neighbors Regression (KNNUni)
Long Short Term Memory RNN (LSTMUni)
Multiple Linear Regression (LRUni)
Multivariate Adaptive Regression Splines (MARSUni)
Random Forests (RFUni)
Random walk (RW)
Seasonal ARIMA (SARIMA) 10.45 (/) 10.47 (/) 10.48 (/) 10.70 (/)
Seasonal decomposition by Loess model (DM)
Seasonal random walk (SRW)
Simple Multilayer Perceptron (MLPUni)
Support Vector Regression (SVRUni)
With external factors
Conditional Inference Regression Tree (CtreeMulti)
K Nearest Neighbors Regression (KNNMulti)
Long Short Term Memory RNN (LSTMMulti)
Multiple Linear Regression (LRMulti)
Multivariate Adaptive Regression Splines (MARSMulti)
Random Forests (RFMulti)
Recursive Partitioning Regression Tree (RpartMulti)
Seasonal ARIMAX (SARIMAX)
Simple Multilayer Perceptron (MLPMulti)
Support Vector Regression (SVRMulti)
Vector Autoregression (VAR)
Friedman test
Chi-Squared
P-value < < < < <
Table 5.
Overview of benchmarking results. Columns contain the forecasting techniques and their average ranks according to MAPE, RMSE, MASE, expected profit and computation time. The numbers between brackets are the p-values from the pairwise Nemenyi test that compares the given method to the best technique according to the evaluation metric at hand.

\[
\chi^2_F = \frac{12N}{k(k+1)}\left[\sum_{j} R_j^2 - \frac{k(k+1)^2}{4}\right] \tag{9}
\]

\[
CD = q_\alpha \sqrt{\frac{k(k+1)}{6N}} \tag{10}
\]

with \(q_\alpha\) as critical values, which consist of the Studentized range statistic divided by \(\sqrt{2}\).
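The ranking procedure of Equations 9 and 10 can be illustrated as follows: techniques are ranked per time series, the Friedman test checks whether the average ranks differ, and the Nemenyi critical difference decides which pairs differ significantly. The error values below are simulated for illustration; the q_alpha constant is the standard 5% Nemenyi critical value for four treatments.

```python
# Average ranks, Friedman test and Nemenyi critical difference on a
# simulated block design (N time series x k techniques).
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

rng = np.random.default_rng(2)
N, k = 35, 4  # blocks (time series) x treatments (techniques)
# Simulated errors: technique 0 is systematically the most accurate.
errors = rng.random((N, k)) + np.array([0.0, 0.1, 0.3, 0.6])

# Rank techniques within each time series (rank 1 = lowest error),
# then average the ranks over all series.
ranks = np.apply_along_axis(rankdata, 1, errors)
avg_ranks = ranks.mean(axis=0)

# Friedman test (Eq. 9): are the average ranks significantly different?
stat, p = friedmanchisquare(*errors.T)

# Nemenyi critical difference (Eq. 10); q_alpha = 2.569 is the 5%
# critical value of the Studentized range statistic (divided by sqrt(2))
# for k = 4 treatments.
q_alpha = 2.569
cd = q_alpha * np.sqrt(k * (k + 1) / (6 * N))
print(avg_ranks, cd)
```

Two techniques are then declared significantly different whenever their average ranks differ by more than the critical difference `cd`.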
Figure 2.
Rank comparison

The first research question focused on finding the best-performing technique for tactical sales forecasts, which all display a trend and seasonality. In order to select the top performing techniques, we take Table 5 and Figure 3 into consideration again, which show a clear top four models in terms of both accuracy and profit. The best-performing forecasting techniques are SARIMA, SARIMAX, Holt-Winters and the seasonal decomposition model (DM). The only exception to the seasonal, univariate and non-ML rule is SARIMAX, which also incorporates external drivers into the model.

In Figure 3, we take a closer look at these best-performing models in terms of both accuracy and the expected profit. These figures contain the distributions of the pairwise differences of SARIMA, SARIMAX, Holt-Winters and the seasonal decomposition model (DM). Grey boxplots indicate a significant difference between the two models that are mentioned on the Y-axis. DM seems to consistently perform worse than the other three models, while the remaining three time series models all perform equally well in terms of both MAPE and expected profit.

The last performance metric that can still make a difference in the selection of the best-performing technique is computation time. This measure is indicative of the complexity of the model, but can also have an effect on the final costs of the model, as computation efforts also lead to additional expenses. The average computation time of the top four models is summarized in Table 6 below. Clearly, the training of Holt-Winters and DM takes the least amount of time by far. However, the average computation times of both SARIMA and SARIMAX are still below 10 seconds per

Figure 3.
Pairwise differences of four best-performing models
Model | Average computation time (seconds)
Holt-Winters
Seasonal decomposition model
SARIMA
SARIMAX
Table 6.
Average computation time for best-performing models

forecast. Furthermore, it is logical that these last two techniques require more time to train, given the feature and hyperparameter optimization according to profit for both of them. In conclusion, a top three of equally performing time series models remains: Holt-Winters exponential smoothing, Seasonal ARIMA and Seasonal ARIMAX, but Holt-Winters will save significantly on computation time if there is a large number of time series to forecast.

Finally, we also take a closer look at the interpretability of these top three techniques. As time series models, they are all transparent methodologies that attribute weights to the autoregressive, trend and seasonal components of the time series. Additionally, SARIMAX displays the weights of the added external factors, indicating their impact on the sales, which greatly adds to the explanatory power of the model. This therefore entails a large advantage for the SARIMAX technique in terms of business value. On the other hand, the feature selection procedure leads to a higher computation time and effort, so these two aspects need to be weighed against one another. In the end, the univariate time series models perform equally to SARIMAX, but additional information on the external influences on the sales might be preferable in a business context. Note that this refers to two completely different objectives, i.e. predicting versus explaining. Before the final selection of the best technique, businesses need to clearly outline the objective of a forecasting model.

In terms of variable selection in this paper, Figure 4 shows the average percentage of selected variables for each of the variable types, illustrated for each of the two data sources. From these charts, we can conclude that weather and macro-economic variables are selected the most for all datasets.
On average, 2 weather variables and 2.5 macro-economic variables were selected for the Coca-Cola Company datasets, while 1.78 weather variables and 3.89 macro-economic variables were chosen for the public datasets.
Figure 4.
Average percentage of selected variables
The second research question focused on the integration of the expected profit function into the model selection process. We can clearly see from Table 5 and especially Figure 2 that the rankings of the techniques according to MAPE, RMSE and MASE are virtually the same, while the ranking according to the expected profit function looks somewhat different. Although the top methods perform well according to all of these evaluation measures, the changes in the ranking already indicate that it is valuable to compare models according to profit as well, as it might lead to a different ranking of the possible techniques. For example, the p-values in Table 5 of the DM technique are not significantly different from the top three time series models in terms of the accuracy measures, but they are significantly different from them when we look at the expected profit function. In Figure 5, we look at some pairwise differences of other models according to MAPE and Profit as well. In this figure, we can clearly see that techniques can significantly differ in terms of Profit and not in terms of MAPE, or vice versa. Specifically, we compare the univariate cases of Multiple Linear Regression (LRUni) and Support Vector Regression (SVRUni), and the variant with external factors of Simple Multilayer Perceptron (MLPMulti). The pairwise differences between SVR and LR, and SVR and MLP show that there is a clear difference between the two evaluation measures. It is also important to note that these changes do not exist in pairwise differences when we only compare the accuracy metrics.

Figure 5.
Domination plots of pairwise Nemenyi differences in p-values
The results of this study have three interesting implications for model selection in sales forecasting from a business perspective. Firstly, we proposed a profit-driven approach that provides a completely automated framework for model building and selection. The expected profit function that we implement is completely adaptable to any sales forecasting situation by combining business expertise with traditional accuracy-based evaluation. Furthermore, this profit function can be used as an evaluation criterion that gives a different view on which technique is truly the best one in a benchmarking exercise. While the results in this paper are consistently in line with the accuracy measures, the overall ranking according to profit is still significantly different from the accuracy-based ones. This indicates that a ranking according to profit might yield a different result in model selection. In this paper, however, the top three models' performance was consistently very close to one another, while the same models outperformed others by a significant margin. In other cases, when model performance between techniques is closer, the expected profit function can provide an additional perspective on final model selection. Furthermore, this paper adds to the scarce literature on the use of profit-driven analytics in forecasting and regression analysis.

Secondly, we notice that univariate time series models that explicitly capture seasonality perform the best in this benchmarking study, although the Seasonal ARIMAX method is an exception to the univariate characteristic. However, this technique only performs on par with the aforementioned univariate methods and we can therefore raise the question whether the addition of external variables is truly useful in this context.
While other studies have shown the value of adding external drivers into the models for sales forecasting on a strategic level (Sagaert et al., 2017), this research shows that we can forecast equally well without any independent variables in the model. When we take into account the additional cost of data collection and model maintenance, we conclude that forecasting the sales on a product category level is more easily achieved by univariate models, without compromising on accuracy or profit. Although we recognize the added explanatory value of integrating features, we question whether it is worth the effort when achieving the best forecast is the goal.

Finally, we compared two categories of forecasting techniques to one another: statistical methods and machine learning techniques. In the case of tactical sales forecasting, we clearly see that simpler models significantly outperform the others for these 35 datasets. This leads us to conclude that the more traditional models are actually still performing the best when tackling this type of time series problem. These findings are in line with (Makridakis and Hibon, 2000), but contradict (Crone et al., 2011). To conclude, seasonal time series models tend to outperform other techniques for a tactical sales forecast. From a business perspective, this conclusion is especially positive, as these models are easy to interpret and have a faster computation time.
5. Conclusion
In this paper, we introduced a new, completely automated and profit-oriented strategy for sales forecasting, which integrates an expected profit function into several steps of model selection. This function can be implemented in any sales forecasting context by letting business experts and previous data set the profit margins for every product. Furthermore, our research has shown that simpler time series models tend to outperform more complex techniques for 35 sales datasets. All of the applied ML techniques achieve significantly worse results than the traditional models, both in accuracy and profit. This implies that less complex techniques are still the best type of method to handle tactical sales forecasting. Finally, we found that univariate time series models that are able to explicitly model the seasonality of a time series perform best. This indicates that the addition of external variables is unnecessary, especially when we consider the additional costs that are linked to maintaining models with external drivers.

In terms of possible limitations, we recognize some shortcomings in this paper. Firstly, it is impossible to come up with an exhaustive list of forecasting techniques. However, we have attempted to implement common methods from all three categories of techniques that are frequently used for forecasting. Furthermore, this research consists of 35 monthly time series, which is significantly less than the larger benchmarks and competitions in the field (Athanasopoulos et al., 2011; Crone et al., 2011; Makridakis and Hibon, 2000). However, this paper particularly focuses on one field, i.e. sales forecasting, and is one of the larger benchmarking studies in this specific area. Furthermore, we have added to the generalizability and reproducibility of the study by including several publicly available datasets as well.
Finally, we only implemented individual forecasting methods, without considering ensemble methods. This type of methodology has become extremely popular in forecasting (Lessmann et al., 2015) and it has been proven that this approach can significantly impact the accuracy of forecasts. Potential future research therefore includes an expansion of this study in three aspects. Firstly, we can include more sales time series in order to further underpin our statements. Secondly, we can implement more techniques and include ensemble methods. Finally, this study can be further expanded to fields other than sales forecasting. However, given the range of techniques and the number of datasets that were already used in this paper, we can state that simple, seasonal time series models are still the best choice for a high-level tactical sales forecast.
Acknowledgements
We would like to acknowledge The Coca-Cola Company for funding this research and providing us with the necessary business expertise and data to conduct our experiments.
References
Aboagye-Sarfo, P., Mai, Q., Sanfilippo, F. M., Preen, D. B., Stewart, L. M., Fatovich, D. M., 2015. A comparison of multivariate and univariate time series approaches to modelling and forecasting emergency department demand in Western Australia. Journal of Biomedical Informatics 57, 62–73.
Akın, M., 2015. A novel approach to model selection in tourism demand modeling. Tourism Management 48, 64–72.
Armstrong, J. S., 2006. Findings from evidence-based forecasting: Methods for reducing forecast error. International Journal of Forecasting 22 (3), 583–598.
Armstrong, J. S., Fildes, R., 2006. Making progress in forecasting. International Journal of Forecasting 22 (3), 433–441.
Arunraj, N. S., Ahrens, D., 2015. A hybrid seasonal autoregressive integrated moving average and quantile regression for daily food sales forecasting. International Journal of Production Economics 170, 321–335.
Athanasopoulos, G., Hyndman, R. J., Song, H., Wu, D. C., 2011. The tourism forecasting competition. International Journal of Forecasting 27 (3), 822–844.
Bansal, G., Sinha, A. P., Zhao, H., 2008. Tuning data mining methods for cost-sensitive regression: a study in loan charge-off forecasting. Journal of Management Information Systems 25 (3), 315–336.
Bertrand, J.-L., Brusset, X., Fortin, M., 2015. Assessing and hedging the cost of unseasonal weather: Case of the apparel sector. European Journal of Operational Research 244 (1), 261–276.
Boivin, J., Ng, S., 2006. Are more data always better for factor analysis? Journal of Econometrics 132 (1), 169–194.
Bozos, K., Nikolopoulos, K., 2011. Forecasting the value effect of seasoned equity offering announcements. European Journal of Operational Research 214 (2), 418–427.
Cang, S., Yu, H., 2014. A combination selection algorithm on forecasting. European Journal of Operational Research 234 (1), 127–139.
Carrizosa, E., Martín-Barragán, B., Morales, D. R., 2014. A nested heuristic for parameter tuning in support vector machines.
Computers & Operations Research 43, 328–334.
Crone, S. F., Hibon, M., Nikolopoulos, K., 2011. Advances in forecasting with neural networks? Empirical evidence from the NN3 competition on time series prediction. International Journal of Forecasting 27 (3), 635–660.
Crone, S. F., Lessmann, S., Stahlbock, R., 2005. Utility based data mining for time series analysis: cost-sensitive learning for neural network predictors. In: Proceedings of the 1st International Workshop on Utility-Based Data Mining. ACM, pp. 59–68.
Currie, C. S., Rowley, I. T., 2010. Consumer behaviour and sales forecast accuracy: What's going on and how should revenue managers respond? Journal of Revenue and Pricing Management 9 (4), 374–376.
Dickey, D. A., Fuller, W. A., 1979. Distribution of the estimators for autoregressive time series with a unit root. Journal of the American Statistical Association 74 (366a), 427–431.
Fagiani, M., Squartini, S., Gabrielli, L., Spinsante, S., Piazza, F., 2015. A review of datasets and load forecasting techniques for smart natural gas and water grids: Analysis and experiments. Neurocomputing 170, 448–465.
Fildes, R., 2006. The forecasting journals and their contribution to forecasting research: Citation analysis and expert opinion. International Journal of Forecasting 22 (3), 415–432.
Franses, P. H., Van Dijk, D., 2005. The forecasting performance of various models for seasonality and nonlinearity for quarterly industrial production. International Journal of Forecasting 21 (1), 87–102.
Friedman, M., 1940. A comparison of alternative tests of significance for the problem of m rankings. The Annals of Mathematical Statistics 11 (1), 86–92.
Gil-Alana, L. A., Cunado, J., Perez de Gracia, F., 2008. Tourism in the Canary Islands: forecasting using several seasonal time series models. Journal of Forecasting 27 (7), 621–636.
Gunter, U., Önder, I., 2015. Forecasting international city tourism demand for Paris: Accuracy of uni- and multivariate models employing monthly data.
Tourism Management 46, 123–135.
Huang, T., Fildes, R., Soopramanien, D., 2014. The value of competitive information in forecasting FMCG retail product sales and the variable selection problem. European Journal of Operational Research 237 (2), 738–748.
Hyndman, R. J., Koehler, A. B., 2006. Another look at measures of forecast accuracy. International Journal of Forecasting 22 (4), 679–688.
Kahn, K. B., 2003. How to measure the impact of a forecast error on an enterprise? The Journal of Business Forecasting 22 (1), 21.
Kolassa, S., 2016. Evaluating predictive count data distributions in retail sales forecasting. International Journal of Forecasting 32 (3), 788–803.
Lessmann, S., Baesens, B., Seow, H.-V., Thomas, L. C., 2015. Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research. European Journal of Operational Research 247 (1), 124–136.
Lessmann, S., Voß, S., 2017. Car resale price forecasting: The impact of regression method, private information, and heterogeneity on forecast accuracy. International Journal of Forecasting 33 (4), 864–877.
Ma, S., Fildes, R., Huang, T., 2016. Demand forecasting with high dimensional data: The case of SKU retail sales forecasting with intra- and inter-category promotional information. European Journal of Operational Research 249 (1), 245–257.
Makridakis, S., Hibon, M., 2000. The M3-competition: results, conclusions and implications. International Journal of Forecasting 16 (4), 451–476.
Nemenyi, P., 1962. Distribution-free multiple comparisons. In: Biometrics. Vol. 18. International Biometric Society, p. 263.
Nikolopoulos, K., Goodwin, P., Patelis, A., Assimakopoulos, V., 2007. Forecasting with cue information: A comparison of multiple regression with alternative forecasting approaches. European Journal of Operational Research 180 (1), 354–368.
Osborn, D. R., Chui, A. P., Smith, J. P., Birchenhall, C. R., 1988.
Seasonality and the order of integration for consumption. Oxford Bulletin of Economics and Statistics 50 (4), 361–377.
Óskarsdóttir, M., Bravo, C., Verbeke, W., Sarraute, C., Baesens, B., Vanthienen, J., 2017. Social network analytics for churn prediction in telco: Model building, evaluation and network architecture. Expert Systems with Applications 85, 204–220.
Peng, H., Long, F., Ding, C., 2005. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (8), 1226–1238.
Petropoulos, F., Makridakis, S., Assimakopoulos, V., Nikolopoulos, K., 2014. 'Horses for courses' in demand forecasting. European Journal of Operational Research 237 (1), 152–163.
Ramos, P., Santos, N., Rebelo, R., 2015. Performance of state space and ARIMA models for consumer retail sales forecasting. Robotics and Computer-Integrated Manufacturing 34, 151–163.
Sagaert, Y. R., Aghezzaf, E.-H., Kourentzes, N., Desmet, B., 2017. Tactical sales forecasting using a very large set of macroeconomic indicators. European Journal of Operational Research.
Santos, A. A., Nogales, F. J., Ruiz, E., 2012. Comparing univariate and multivariate models to forecast portfolio value-at-risk. Journal of Financial Econometrics 11 (2), 400–441.
Sola, J., Sevilla, J., 1997. Importance of input data normalization for the application of neural networks to complex industrial problems. IEEE Transactions on Nuclear Science 44 (3), 1464–1468.
Stripling, E., vanden Broucke, S., Antonio, K., Baesens, B., Snoeck, M., 2015. Profit maximizing logistic regression modeling for customer churn prediction. In: Data Science and Advanced Analytics (DSAA), 2015 IEEE International Conference on. IEEE, pp. 1–10.
Syntetos, A. A., Babai, Z., Boylan, J. E., Kolassa, S., Nikolopoulos, K., 2016. Supply chain forecasting: Theory, practice, their gap and the future. European Journal of Operational Research 252 (1), 1–26.
Tashman, L. J., 2000.
Out-of-sample tests of forecasting accuracy: an analysis and review. International Journal of Forecasting 16 (4), 437–450.
Taylor, J. W., De Menezes, L. M., McSharry, P. E., 2006. A comparison of univariate methods for forecasting electricity demand up to a day ahead. International Journal of Forecasting 22 (1), 1–16.
Van Calster, T., Baesens, B., Lemahieu, W., 2017. ProfARIMA: A profit-driven order identification algorithm for ARIMA models in sales forecasting. Applied Soft Computing.
Verbeke, W., Baesens, B., Bravo, C., 2017. Profit Driven Business Analytics: A Practitioner's Guide to Transforming Big Data into Added Value. John Wiley & Sons.
Verbeke, W., Dejaeger, K., Martens, D., Hur, J., Baesens, B., 2012. New insights into churn prediction in the telecommunication sector: A profit driven data mining approach. European Journal of Operational Research 218 (1), 211–229.
Verbraken, T., Verbeke, W., Baesens, B., 2013. A novel profit maximizing metric for measuring classification performance of customer churn prediction models. IEEE Transactions on Knowledge and Data Engineering 25 (5), 961–973.
Weller, M., Crone, S., 2012. Supply chain forecasting: Best practices & benchmarking study. Technical Paper. Lancaster Centre for Forecasting, 1–42.
Yang, H., King, I., Chan, L., 2002. Non-fixed and asymmetrical margin approach to stock market prediction using support vector regression. In: Neural Information Processing, 2002. ICONIP'02. Proceedings of the 9th International Conference on. Vol. 3. IEEE, pp. 1398–1402.
Zhang, G. P., Qi, M., 2005. Neural network forecasting for seasonal and trend time series. European Journal of Operational Research 160 (2), 501–514.
Zhao, H., Sinha, A. P., Bansal, G., 2011. An extended tuning method for cost-sensitive regression and forecasting. Decision Support Systems 51 (3), 372–383.