Profit-oriented sales forecasting: a comparison of forecasting techniques from a business perspective
Tine Van Calster (a), Filip Van den Bossche (a), Bart Baesens (a, b) and Wilfried Lemahieu (a)
(a) Faculty of Economics and Business, KU Leuven, Naamsestraat 69, 3000 Leuven, Belgium; (b) University of Southampton, University Road, Southampton SO17 1BJ, United Kingdom
ARTICLE HISTORY
Compiled February 5, 2020
ABSTRACT
Choosing the technique that best forecasts your data is a problem that arises in any forecasting application. Decades of research have resulted in an enormous number of forecasting methods that stem from statistics, econometrics and machine learning (ML), which makes the choice in any forecasting exercise difficult and elaborate. This paper aims to facilitate this process for high-level tactical sales forecasts by comparing a large array of techniques on 35 time series that consist of both industry data from The Coca-Cola Company and publicly available datasets. However, instead of solely focusing on the accuracy of the resulting forecasts, this paper introduces a novel and completely automated profit-driven approach that takes into account the expected profit that a technique can create during both the model building and evaluation process. The expected profit function that is used for this purpose is easy to understand and adaptable to any situation by combining forecasting accuracy with business expertise. Furthermore, we examine the added value of ML techniques, the inclusion of external factors and the use of seasonal models in order to ascertain which type of model works best in tactical sales forecasting. Our findings show that simple seasonal time series models consistently outperform other methodologies and that the profit-driven approach can lead to the selection of a different forecasting model.
KEYWORDS
Tactical sales forecasting; Benchmarking; External factors; Forecast evaluation; Forecasting practice
1. Introduction
This paper focuses on one of the most frequently asked questions in forecasting theory and practice: which technique(s) should I choose to forecast this time series? In the literature, this question has been posed many times and has indeed been answered by benchmarks and competitions (Armstrong and Fildes, 2006; Crone, Hibon, and Nikolopoulos, 2011; Petropoulos, Makridakis, Assimakopoulos, and Nikolopoulos, 2014), as forecasting has been an integral part of the business decision-making process for decades and is used for this purpose in many industries (Armstrong and Fildes, 2006; Cang and Yu, 2014; Lessmann and Voß, 2017). However, most studies only take one evaluation criterion into account, i.e. the performance of the techniques on a test set, while the final choice of a model in a business context depends on more considerations.
Corresponding author: [email protected]

Undoubtedly, the costs that are associated with inaccurate forecasts ensure that accuracy will always remain an important evaluation standard (Kahn, 2003). However, from a decision-making perspective, other questions immediately arise in the mind of the business expert as well, such as the potential impact of the forecast on the revenue of the company or the maintenance cost of the model. This paper therefore proposes an expected profit function that can be integrated into several steps of the forecasting process, while also taking a closer look at which types of models perform best in a sales forecast on a tactical or strategic level.

Recent publications have shown a large offering of forecasting techniques, ranging from statistical methods to machine learning techniques. Given all of these theoretical and technological developments, it is becoming increasingly difficult to select the right type of technique for a given use case. Especially the group of Machine Learning (ML) techniques has received a lot of attention recently, as it constitutes one of the most popular topics in the forecasting literature (Fildes, 2006). Most articles on ML techniques report favourable results when compared to more traditional methodologies, both for single use cases and more extensive comparisons (Crone et al., 2011), although publications generally have a tendency to only report positive outcomes (Armstrong, 2006). However, several authors have expressed their reservations concerning these complex techniques (Makridakis and Hibon, 2000). In contrast, Crone et al. (2011) have shown that machine learning has caught up with statistical modelling and should not be dismissed lightly for forecasting exercises.
This paper therefore also aims to investigate whether these more complex ML techniques truly outperform the classical models for a tactical sales forecast.

In this paper, we focus on the field of sales forecasting, as successful sales forecasts are vital in both short- and long-term strategic and financial planning (Ramos, Santos, and Rebelo, 2015). This research specifically deals with high-level forecasts, which are primarily meant for decision-making purposes, as opposed to inventory planning for specific products. In practice, this typically entails a monthly time series, which is non-intermittent and prone to display a trend, a seasonal pattern or a combination of these characteristics. This type of time series is common in other fields as well, and has therefore frequently been used for benchmarking purposes (Armstrong and Fildes, 2006; Crone et al., 2011; Petropoulos et al., 2014). We therefore take a look at the performance of techniques that model seasonality versus methodologies that do not have this ability, as seasonality is a typical characteristic of sales time series. While trying out non-seasonal models might seem counterintuitive for this data, many of the more recently developed techniques do not have a seasonal component and still seem to perform very well in many applications. This paper therefore also investigates whether this type of model can perform well on these seasonal time series, given the necessary pre-processing of the data. Furthermore, this high-level data also raises the question of the usefulness of incorporating external factors into the forecast. While the addition of variables has obvious benefits, such as their explanatory value, it frequently leads to higher model maintenance costs.
Thus, we also compare univariate techniques with and without the ability to add external drivers in this paper.

Our contributions are twofold, as we aim to both benchmark a large set of forecasting techniques and integrate a practical construct, i.e. profit, into the model building and evaluation process. Firstly, we propose a new strategy to inject a profit-oriented view into the entire forecasting process without explicitly forecasting profit itself. In practice, this constitutes a different way of performing feature selection, tuning hyper parameters and evaluating the forecasting techniques, with the goal of achieving the models that yield the highest expected profit. The expected profit function that is used for this purpose is easy to understand and adaptable to any situation by combining forecasting accuracy with business expertise (Van Calster, Baesens, and Lemahieu, 2017). Furthermore, our methodology ensures a completely automated and data-driven model building process. Secondly, we benchmark a large range of forecasting techniques according to three different categorizations. Firstly, as mentioned above, we contrast a range of complex techniques and traditional techniques, in order to assess whether the ML techniques are truly able to perform equally well with regard to tactical sales forecasts. Secondly, we take the seasonal characteristics of sales time series into consideration by distinguishing techniques that model seasonality themselves from methods that require seasonal dummy variables to achieve the same goal. Finally, we contrast techniques with and without variables, as we investigate the value of external factors in a high-level sales forecast. In terms of evaluating the techniques, we take accuracy, expected profit, model complexity and model interpretability into consideration in order to integrate the business aspect of forecasting into the benchmark.
In the end, we aim to quantitatively select the techniques that forecast accurately, lead to the highest expected profit for any business case, and make the most sense from a business perspective. We address these research questions by means of a total of 35 monthly sales datasets. The datasets were collected from both The Coca-Cola Company and publicly available resources in order to add to the generalizability of the study.

The paper is organized as follows. Section 2 deals with the related work that provides a necessary background to the research questions. Section 3 describes the datasets, the forecasting techniques and the general methodology of the experiments. Next, Section 4 focuses on the results of the research, while Section 5 includes the conclusion.
2. Related work
This section focuses on the necessary background literature for the research questions. We take a closer look at the forecasting literature on benchmarking, while also considering recent literature on profit-oriented analytics.
Forecasts are typically performed by three categories of techniques (Cang and Yu, 2014): traditional time series analysis (Aboagye-Sarfo, Mai, Sanfilippo, Preen, Stewart, and Fatovich, 2015; Akın, 2015; Arunraj and Ahrens, 2015; Athanasopoulos, Hyndman, Song, and Wu, 2011; Franses and Van Dijk, 2005; Gil-Alana, Cunado, and Perez de Gracia, 2008; Gunter and Önder, 2015; Petropoulos et al., 2014; Ramos et al., 2015; Santos, Nogales, and Ruiz, 2012), causal regression techniques (Akın, 2015; Arunraj and Ahrens, 2015; Bozos and Nikolopoulos, 2011; Cang and Yu, 2014; Lessmann, Baesens, Seow, and Thomas, 2015; Ma, Fildes, and Huang, 2016; Nikolopoulos, Goodwin, Patelis, and Assimakopoulos, 2007), and more complex artificial intelligence techniques (Akın, 2015; Bozos and Nikolopoulos, 2011; Cang and Yu, 2014; Crone et al., 2011; Fagiani, Squartini, Gabrielli, Spinsante, and Piazza, 2015; Lessmann et al., 2015; Taylor, De Menezes, and McSharry, 2006). The emergence of new techniques often requires a comparison with former methods, which has led to an extensive literature on benchmarking, both for individual use cases (Aboagye-Sarfo et al., 2015; Arunraj and Ahrens, 2015; Bozos and Nikolopoulos, 2011; Gil-Alana et al., 2008; Gunter and Önder, 2015; Lessmann et al., 2015) and for larger sets of time series (Athanasopoulos et al., 2011; Cang and Yu, 2014; Crone et al., 2011; Franses and Van Dijk, 2005; Ma et al., 2016; Makridakis and Hibon, 2000; Petropoulos et al., 2014; Weller and Crone, 2012). This research consists of both field-specific benchmarks (Aboagye-Sarfo et al., 2015; Akın, 2015; Athanasopoulos et al., 2011; Bozos and Nikolopoulos, 2011; Cang and Yu, 2014; Fagiani et al., 2015; Lessmann et al., 2015; Ma et al., 2016; Weller and Crone, 2012) and industry-neutral benchmarks, which are oriented towards general conclusions (Crone et al., 2011; Makridakis and Hibon, 2000; Petropoulos et al., 2014).
While some studies use a combination of generated data and industry data (Petropoulos et al., 2014), most use real-life datasets to answer their research questions (Bozos and Nikolopoulos, 2011; Cang and Yu, 2014; Lessmann et al., 2015; Weller and Crone, 2012).

In terms of the conclusions that have come out of the larger studies, some discrepancies arise. While several studies point out that the newer ML techniques do not perform as well as the more traditional methods for classical time series (Makridakis and Hibon, 2000), others claim that these complex techniques have caught up in recent years (Crone et al., 2011). In this paper, we therefore take a look at a wider range of techniques from all three categories that were mentioned above. Furthermore, we also contrast techniques with and without external factors, which adds another factor that has not been part of many larger benchmarking studies, except for Athanasopoulos et al. (2011). Our paper combines these elements in an extensive benchmark that is based on publicly available data and recent sales time series.
Profit-driven analytics has recently become a hot topic in analytics, as businesses are interested in the actual value that predictive models generate or the influence that they have on their eventual net profits. Integrating this value-centric view into analytics has led to a growing number of profit-driven methodologies, techniques and metrics (Verbeke, Baesens, and Bravo, 2017). These profit functions can be used in different steps of the model building and model selection process. For example, profit has been used as an evaluation metric for benchmarks in different fields (Óskarsdóttir, Bravo, Verbeke, Sarraute, Baesens, and Vanthienen, 2017; Verbraken, Verbeke, and Baesens, 2013), while it has inspired entire profit-driven algorithms as well (Stripling, vanden Broucke, Antonio, Baesens, and Snoeck, 2015; Verbeke, Dejaeger, Martens, Hur, and Baesens, 2012). In this paper, we aim to integrate this profit-oriented view into multiple steps of the forecasting process instead of only using it as an additional evaluation criterion.

In forecasting, research on the profit aspect is scarcer than in other fields. While the monetary value of classification models has been extensively reviewed, the same cannot be said for regression models. However, the impact of forecasting accuracy on net profit is an interesting subject, as under- and over-forecasting lead to completely different costs. The former might lead to a loss in sales and out-of-stock products, while the latter can lead to overstock and storage costs. While both directions of the error inevitably bring about a loss of profit, they are often not equal. Completely symmetric profit loss functions that are solely based on accuracy measures are therefore not representative of the real world. The ultimate goal of profit-oriented analytics is to find the model with the best balance between costs and accuracy. While these two concepts are inevitably linked in a forecasting exercise, we cannot state that they are exactly the same.
Therefore, profit-oriented benchmarking should take into account both traditional accuracy metrics and metrics that point to the costs of the forecast, such as expected profit functions or model complexity. So far, two different views on the integration of profit into forecasting exercises have been proposed in recent literature. The first perspective optimizes an asymmetric loss function during model training to model the imbalance between over- and under-forecasting. Crone, Lessmann, and Stahlbock (2005) apply this methodology to neural networks, while Yang, King, and Chan (2002) take a closer look at support vector regression models. The second way of integrating profit into a forecasting exercise takes place after the training process. Bansal, Sinha, and Zhao (2008) propose a tuning procedure that modifies the predictions so that they are cost-optimal, while Zhao, Sinha, and Bansal (2011) further fine-tune this procedure. Bozos and Nikolopoulos (2011) also take a monetary value into account when evaluating their forecasts, but do not modify the models in any way.

In this paper, we take profit into account in all of the steps that are mentioned above. We optimize the parameters of our models, select features when necessary and evaluate our forecasts based on an asymmetric expected profit function that can easily be adjusted to any business case.
3. Methodology
This methodology section is divided into five parts. We begin by describing the datasets and by explaining the profit function that was used for both optimization and evaluation purposes in this paper. Next, the general experimental set-up is introduced, which also includes the description of the feature selection procedure. The fourth subsection is dedicated to an overview of the forecasting techniques, while the last subsection focuses on the evaluation metrics.
3.1. Data

The datasets in this paper stem from two sources. Firstly, The Coca-Cola Company has given us a total of 20 time series, which represent two of their product categories in ten different countries. These monthly time series all range from January 2004 until September 2016. The external variables that correspond with these datasets were collected by means of in-company data sources and are all based on information about the location of the data. Concretely, they consist of 20 variables that contain information on weather, macro-economic indicators, holidays and pricing. Four weather variables were included, such as temperature and precipitation, while nine variables allude to macro-economic information, such as GDP and CPI. Additionally, three factors refer to calendar effects of public holidays, while the final four variables relate to both in-company and competitor pricing. An overview of these variables can be found in Table 1. These external factors were selected according to data availability, but also take into account the literature on the types of variables that are interesting for sales forecasting. Several types of information have proven to be useful in this field, although this generally depends on the aggregation level of the time series (Syntetos, Babai, Boylan, Kolassa, and Nikolopoulos, 2016) and the volatility of the time series (Currie and Rowley, 2010). Research has shown that factors such as weather (Bertrand, Brusset, and Fortin, 2015), macro-economic influences (Sagaert, Aghezzaf, Kourentzes, and Desmet, 2017) and pricing and promotional information (Huang, Fildes, and Soopramanien, 2014; Ma et al., 2016) all have an impact on sales.

Weather
- Maximum temperature: average daily maximum temperature weighted by population
- Maximum temperature squared: square of the average daily maximum temperature weighted by population
- Precipitation: average daily precipitation volume
- Sunshine hours: average daily number of sunshine hours

Macro-economic indicators
- Consumer Price Index: seasonally adjusted percentage change of CPI with regard to the previous month
- Unemployment rate: percentage of unemployment for the entire population
- Exchange rate: exchange rate with the US dollar
- Short-term interest rate: short-term interest rate in percentage per annum
- Industrial production: seasonally adjusted percentage change of industrial production with regard to the previous month
- Merchandise import: seasonally adjusted percentage change of merchandise import with regard to the previous month
- Merchandise export: seasonally adjusted percentage change of merchandise export with regard to the previous month
- Gross Domestic Product: seasonally adjusted annual rate, percentage change of GDP with regard to the previous month
- Private Consumption: seasonally adjusted annual rate, percentage change of private consumption with regard to the previous month

Holidays
- Public holiday: number of public holidays per month
- Weekend: number of public holidays in the weekend per month (possibility of a long weekend)
- Tuesday/Thursday: number of public holidays on a Tuesday or Thursday per month (possibility of a long weekend)

Pricing
- Company price: average product category price in US dollars
- Company price deflated: average product category price in US dollars, deflated by CPI
- Competitor price: average product category price of the main competitor in US dollars
- Competitor price deflated: average product category price of the main competitor in US dollars, deflated by CPI

Table 1. Summary of external variables
Secondly, we include a total of 15 publicly available datasets with similar characteristics in the analyses, in order to increase the generalizability of our findings; most of these can be found in the Time Series Data Library (https://datamarket.com/data/list/?q=provider:tsdl). The general features of these monthly time series are summarized in Table 2. As all of these datasets also include information on location, we collected twelve external variables that contain information on weather, macro-economic indicators and holidays as well. Concretely, we include four weather variables, seven macro-economic indicators and one holiday variable. The weather variables consist of the same features as defined in Table 1, while the macro-economic information includes all features in Table 1 except Merchandise Import and Merchandise Export. Finally, the models with external factors also contain the number of public holidays for each month. Pricing information was not available for these datasets. The sources for these three categories are publicly available:
- https://opendata.socrata.com/Business/Car-Sales-Data/da8m-smts
- https://crudata.uea.ac.uk/cru/data/hrg/cru_ts_3.23/crucy.1506241137.v3.23/
- https://data.oecd.org/
- https://pypi.python.org/pypi/holidays

Table 2. Public data summary: for each of the public data sources (Beer, Car sales 1, Car sales 2, Champagne, Paper, Petrol and Wine), the table reports the number of product categories, the range, the number of data points and the location.

3.2. Expected profit function

The evaluation of any predictive model is generally focused on the accuracy that it achieves on a test set. In this paper, however, we take both accuracy and a more business-oriented profit measure into account. The profit measure is represented by Equation 2, which depends on our definition of the Percentage Error (PE) in Equation 1. This profit measure was first defined in (Van Calster et al., 2017) and represents an estimation of the expected profit of the target variable.
The formula is very easy to interpret and can easily be adjusted to any business usecase. The two fundamental components of the expected profit measure are the volumeof the sales, as more sales lead to more profit, and the accuracy of the forecast, asbad forecasts inevitably lead to a loss of profit. Next to these two core elements, weintroduce several parameters that integrate expert knowledge into the profit function.
PE_i = (Actuals_i − Forecast_i) / Actuals_i × 100    (1)
Profit_i = (1 − α·|PE_i|) · (β_cat · Volume_cat)    if PE_i > γ or PE_i < δ
Profit_i = β_cat · Volume_cat                       otherwise    (2)

Firstly, the business user can influence the impact of the forecasting error on the expected profit by setting two parameters. The first one deals with how the size of the error is used as a penalization, as both over- and under-forecasting have proven to lead to various costs (Kahn, 2003). This penalization factor α can be modified according to a specific circumstance in a data-driven manner by executing a sensitivity analysis on a validation set. In this instance, α is set at 1. The second element consists of the boundaries γ and δ, which indicate that any forecast that has a PE within these boundaries does not lead to a significant impact on the final profit. Note that γ should always be larger than δ. For example, we set boundaries of 1% error in both directions for The Coca-Cola Company use case (γ = 1 and δ = −1). The γ and δ parameters can also be set unequally, if the forecasting error has a larger impact on profit in one particular direction, or even be completely omitted, if every inaccuracy while forecasting leads to a loss of profit.

Secondly, the β_cat weight refers to the profit margin for the product or product category at hand. This weight can be expressed both relatively between different products and in absolute numbers, such as currencies. For The Coca-Cola Company use case, these β weights were determined by the profit that the product actually generated in the last year of the original training set. It is important to note that these weights remain constant throughout the analyses once they are set on the training set of the first prediction. The actual profit of a product will fluctuate over time and is driven by many external factors that are not captured in the function. We have chosen to keep this parameter constant for two reasons: ease of use and availability of profit data. While the first reason is self-explanatory, the second one is tied to the particular use case of this paper. If data about the actual profit of a product is more readily available, this parameter can be used dynamically by updating it during the testing process. The profit in the analyses of this paper can therefore be viewed as the profit that the product will generate if business stays the same, and must truly be interpreted as the expected profit. The β weights for the publicly available datasets were chosen randomly with values between 0 and 3, and are displayed in Table 3.

Table 3. β weights of the public datasets (Beer, Car sales 1, Car sales 2, Champagne, Paper, Petrol and Wine)

3.3. Experimental set-up

The general experimental set-up consists of hold-out sample forecasts for all datasets. Concretely, the time series are split up into training, validation and test sets. The test set includes the final two years of the data, which leads to 24 data points to forecast. The validation set then consists of the year before the date that will be forecast, and is only used for feature selection and parameter tuning when necessary for the given technique. Parameter tuning is performed once, on the first validation set, in order to avoid computational issues during the testing procedure. However, the feature selection procedure is repeated every three months, in order to keep the model up-to-date. Once the necessary variables and hyper parameters have been selected, the training and validation sets are merged together in order to forecast the test set. Both the training and validation sets change with every forecast, as the set-up consists of an expanding window.
In the end, we therefore collect 24 one-month-ahead forecasts for each technique and for each dataset. The complete experimental set-up is visualized in Figure 1.

Figure 1. Experimental set-up

The feature selection procedure consists of a hybrid method, which combines the Minimum Redundancy Maximum Relevance criterion (mRMR) that was created by Peng, Long, and Ding (2005) as a filtering technique with a simple incremental wrapper method. The mRMR method is a mutual-information-based algorithm that ranks the external factors according to their shared information with the target variable, while also taking into account their dependency on the other external factors. This is achieved by finding the feature set S with m features x_i that maximizes the relevance with the target class c and minimizes the dependency between the independent variables. In short, this filter finds the features that maximize Equation 3, which combines Equation 4 (Relevance) and Equation 5 (Redundancy).

max φ(D, R),  φ = D − R    (3)

D = (1/|S|) Σ_{x_i ∈ S} I(x_i; c)    (4)

R = (1/|S|²) Σ_{x_i, x_j ∈ S} I(x_i; x_j)    (5)

In this paper, the first step selects either the top 15 or the top 10 ranked features, for The Coca-Cola Company datasets and the public datasets respectively, and passes this ranking on to the next step. Next, a simple forward incremental wrapper method starts with the top feature of the ranking and forecasts the validation set. One feature at a time is then added to the feature set until the entire top 15 or top 10 ranking is used in the forecasting model. This methodology therefore takes advantage of the initial ranking that was made by the mRMR filter. The feature set that will be used to forecast the test set is selected out of these 15 or 10 options by maximizing the profit function, which is defined in Section 3.2. This entire procedure is summarized in Algorithm 1.
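To make the filter concrete, the following Python sketch greedily builds an mRMR ranking, approximating relevance (Equation 4) and redundancy (Equation 5) with a simple histogram-based mutual information estimate. The function names, the binning choice and the greedy selection order are our own assumptions, not part of the original mRMR implementation:

```python
import numpy as np


def mutual_info(x, y, bins=5):
    """Histogram-based estimate of the mutual information I(x; y)."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y
    nz = pxy > 0                          # avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())


def mrmr_rank(X, y, k, bins=5):
    """Greedily rank k features: maximize relevance I(x_i; c) minus the
    mean redundancy with the features selected so far (Equations 3-5)."""
    selected, remaining = [], list(range(X.shape[1]))
    relevance = [mutual_info(X[:, j], y, bins) for j in remaining]
    for _ in range(k):
        scores = {j: relevance[j]
                     - (np.mean([mutual_info(X[:, j], X[:, s], bins)
                                 for s in selected]) if selected else 0.0)
                  for j in remaining}
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected
```

On synthetic data where one column is a noisy copy of the target, the ranking places that informative column first, while a redundant duplicate of it is pushed down by the redundancy term.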
In our benchmark, k is either 15 or 10, depending on the dataset at hand, and m is equal to 12 months.

Algorithm 1: Pseudo code for the feature selection procedure
  choose the size of the validation set m
  split the time series into training set S_tr and validation set S_val,m
  choose the initial number of features k
  rank the features according to the mRMR criterion into ranking R_k
  for i = 1 to k do
      select the top i features R_i from R_k
      for j = 1 to m do
          train the model with the R_i features on training set S_tr
          forecast S_val,j
          calculate profit P_i,j
          add S_val,j to training set S_tr
      end for
      calculate profit P_i by summing over all P_i,j
      reset training set S_tr and validation set S_val,m to the original split
  end for
  select the R_i features with the highest profit P_i

Feature selection is generally important for two entirely different reasons. Firstly, some of the variables might be correlated or influenced by the same underlying information, which can lead to less accurate forecasts (Boivin and Ng, 2006). A feature selection procedure is therefore used to determine which set of variables has the highest predictive power, while also eliminating any possible multicollinearity. Secondly, feature selection is equally important from a business perspective, as transparent models also have an explanatory advantage. Business analysts are interested in gaining knowledge on which external factors might influence their target variable, which can be useful for strategic decisions (Athanasopoulos et al., 2011). This knowledge also relates to the maintenance of the model, as the variables that never survive the feature selection procedure during testing are no longer needed.
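The loop of Algorithm 1 can be sketched in Python as follows. The `fit`, `forecast` and `profit` callables stand in for whatever model interface is used; they are placeholders of our own design, not an API from the paper:

```python
def incremental_wrapper(ranking, train, val, fit, forecast, profit):
    """Forward incremental wrapper over an mRMR ranking (Algorithm 1).

    ranking  -- features ordered by the mRMR filter (best first)
    train    -- initial training set (list of observations)
    val      -- validation set of m monthly observations
    fit      -- fit(features, window) -> model          (placeholder)
    forecast -- forecast(model, observation) -> float   (placeholder)
    profit   -- profit(actual, prediction) -> float     (placeholder)
    """
    best_features, best_profit = None, float("-inf")
    for i in range(1, len(ranking) + 1):   # top-1, top-2, ..., top-k subsets
        features = ranking[:i]
        window = list(train)               # reset to the original split
        total = 0.0
        for obs in val:                    # one-month-ahead, expanding window
            model = fit(features, window)
            total += profit(obs["y"], forecast(model, obs))
            window.append(obs)             # add the forecast month to training
        if total > best_profit:
            best_features, best_profit = features, total
    return best_features
```

With a toy "model" that simply sums its selected input columns and a profit function equal to the negative absolute error, the wrapper correctly retains only the informative features of the ranking.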
3.4. Forecasting techniques

In order to conduct the necessary experiments, a total of 17 forecasting techniques were selected, which are summarized in Table 4. These techniques are categorized according to three different types of attributes in order to answer our research questions. Firstly, we organize the methods according to whether they can be used with and/or without external drivers. We define a univariate technique without variables as a technique that only makes use of the sales time series itself to predict the next month. Techniques that are able to include variables, however, also integrate the external drivers, such as the weather, to generate a prediction. Nine techniques can be used in both ways, such as regression models, when past sales values are encoded as independent variables next to the aforementioned external factors. We therefore benchmark a total of 26 techniques in our final analysis. Secondly, Table 4 displays the ability of a technique to explicitly model the seasonality of a time series, as seasonality is a typical characteristic of the sales time series that we are considering in this paper. Thirdly, the forecasting techniques are classified into Machine Learning (ML) techniques and non-ML techniques. Recently, a lot of the forecasting literature has focused on these ML techniques and often reports them to be more accurate than traditional techniques. In order to simplify the issue of what is considered an ML technique and what is not, we chose to consider methods ML if they belong to one of the four following categories: decision tree learning, neural networks, support vector machines and k-nearest neighbours. These three categorizations will underpin the answer to which type of technique is best used to achieve an accurate sales forecast. Finally, Table 4 also contains the hyper parameters that were selected beforehand, and their possible values. Tuning hyper parameters has proven to be essential for truly assessing how well a certain technique can perform (Carrizosa, Martín-Barragán, and Morales, 2014), and is therefore an essential part of benchmarking in general. In this paper, the parameter selection was conducted by evaluating model performance on the validation set and by applying an exhaustive grid search methodology. The evaluation metric that was optimized is again the expected profit function that was defined in Section 3.2. Note that only the parameters that are mentioned in Table 4 are set in this way.

Model | Variables? | Seasonal? | ML? | Hyper parameters | Possible values
Holt-Winters exponential smoothing | No | Yes | No | / | /
Seasonal ARIMA | No | Yes | No | AR, MA, SAR and SMA terms | [0, 5]
Seasonal decomposition by Loess model | No | Yes | No | / | /
Seasonal random walk | No | Yes | No | / | /
ARMA-GARCH | No | No | No | AR and MA terms | [0, 5]
Random walk | No | No | No | / | /
Seasonal ARIMAX | Yes | Yes | No | AR, MA, SAR and SMA terms | [0, 5]
Vector Autoregression | Yes | No | No | AR term | [0, 5]
Conditional Inference Regression Tree | Both | No | Yes | / | /
Multiple Linear Regression | Both | No | No | / | /
Multivariate Adaptive Regression Splines | Both | No | No | Maximum degree of interaction | [1, 2]
Recursive Partitioning Regression Tree | Both | No | Yes | / | /
K Nearest Neighbors Regression | Both | No | Yes | Number of neighbors; weights for neighboring response values | [2, 5]; uniform, by distance
Long Short-Term Memory RNN | Both | No | Yes | Number of hidden neurons | [1, 10]
Random Forests | Both | No | Yes | / | /
Simple Multilayer Perceptron | Both | No | Yes | Number of hidden neurons | [1, 10]
Support Vector Regression | Both | No | Yes | Kernel; penalty parameter of error term; gamma (for rbf kernel only) | radial basis function, linear; 1e0, 1e1, 1e2, 1e3; [1e-2, 1e2]

Table 4. Overview of forecasting techniques

It is important to comment on the influence of the type of technique on the data preprocessing aspect of the analyses. Firstly, we normalized all variables to a range between 0 and 1 for all of the analyses in this paper. This step was especially necessary for techniques such as neural networks, as the literature reports this as a general practice from which they benefit greatly (Sola and Sevilla, 1997). Furthermore, business users can derive insights on the relative importance of variables if the forecasting technique is transparent, in order to identify the most important drivers of their sales.
Secondly, the time series that are part of the analyses all display a certain trend and seasonality, which should be incorporated into the forecasting model if possible. The time series analysis techniques that we consider in this paper explicitly include this seasonality in their model building by, for example, defining seasonal parameters. However, other types of techniques, such as regression models or neural networks, do not have this ability, which can lead to worse forecasts if the trend and season have a strong influence on the sales (Zhang and Qi, 2005). We therefore add two additional data preprocessing steps for this type of model: trend/seasonal differencing and seasonal dummy variables. In the first step, we check whether the time series actually contains either a trend or a season by means of appropriate unit root tests, i.e. the Augmented Dickey-Fuller test (Dickey and Fuller, 1979) and the Osborn-Chui-Smith-Birchenhall test respectively (Osborn, Chui, Smith, and Birchenhall, 1988). If the results thereof show signs of either characteristic, we apply the corresponding differencing. Secondly, if the time series is seasonal, we also include a set of seasonal dummy variables to further model the possible seasonal effects. These variables are not included in the feature selection procedure, but are always included if there is a seasonal component in the time series. Thirdly, when techniques can be used both with and without variables, past sales values need to be encoded as independent variables. We therefore need to determine how many past values will be included in the model. This hyperparameter is selected on the same validation set as the other hyperparameters, and has possible values ranging from one month to seven months. Furthermore, we define methods with external factors as techniques that use both past sales data and external parameters as independent variables.
In this case, the number of past months to use as input to the model is therefore again a hyperparameter. Finally, we note that the list of forecasting techniques is not exhaustive. Two types of methods are notably under-represented: ensemble methodologies and deep learning methods. We opted to include only one technique of each category in order to keep the scope of the paper manageable, i.e. Random Forests and Long Short-Term Memory Neural Networks respectively. However, the obvious next step of this research is to take a closer look at these types of methodologies.
Evaluation for forecasting benchmarks is often entirely based on accuracy metrics. There has been a lot of discussion in the past about which metric gives the best overview of performance when comparing techniques, as many commonly used measures can exhibit strange behavior (Hyndman and Koehler, 2006; Kolassa, 2016; Tashman, 2000). In this paper, we therefore propose a combination of frequently used accuracy metrics and the expected profit function that was defined above, to select the best-performing models. In the first category, we take into account the Mean Absolute Percentage Error (MAPE) and the Root Mean Squared Error (RMSE), as defined in Equations 6 and 7. Furthermore, we include the seasonal version of the Mean Absolute Scaled Error, which was first defined in (Hyndman and Koehler, 2006), based on the seasonality of the time series data. The formula for this metric can be found in Equation 8, with m as the seasonality of the time series. This metric compares a technique's performance to the in-sample error of a seasonal naïve model, which makes it perfect for truly benchmarking techniques. Next to the expected profit function, we also consider the computation time of each forecast as an approximation of the model complexity. We therefore include a total of five quantitative performance metrics in our analysis.

\[
\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{Actual_t - Forecast_t}{Actual_t}\right| \times 100 \tag{6}
\]
\[
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(Actual_t - Forecast_t\right)^2} \tag{7}
\]

\[
\mathrm{MASE} = \frac{\frac{1}{T}\sum_{t=1}^{T}\left|Actual_t - Forecast_t\right|}{\frac{1}{T-m}\sum_{t=m+1}^{T}\left|Actual_t - Actual_{t-m}\right|} \tag{8}
\]
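The three accuracy measures of Equations 6-8 can be implemented directly; a minimal sketch, where the seasonal MASE scales the out-of-sample error by the in-sample error of a seasonal naïve forecast with period m:

```python
# MAPE, RMSE and seasonal MASE as defined in Equations 6-8.
import numpy as np

def mape(actual, forecast):
    # Eq. (6): mean absolute percentage error.
    return np.mean(np.abs((actual - forecast) / actual)) * 100

def rmse(actual, forecast):
    # Eq. (7): root mean squared error.
    return np.sqrt(np.mean((actual - forecast) ** 2))

def mase(actual, forecast, insample, m=12):
    # Eq. (8): scale by the in-sample seasonal naive error with period m.
    scale = np.mean(np.abs(insample[m:] - insample[:-m]))
    return np.mean(np.abs(actual - forecast)) / scale

actual = np.array([100.0, 110.0, 120.0])
forecast = np.array([98.0, 113.0, 118.0])
insample = np.linspace(80, 130, 36)  # three years of monthly history
print(round(mape(actual, forecast), 2), round(rmse(actual, forecast), 2))
```

Because the MASE denominator is computed on the training data only, a value below one means the technique beats the in-sample seasonal naïve benchmark.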
4. Results
The results section of this paper will firstly take a look at the experimental results, which are based on forecasting the 35 datasets with 17 different forecasting techniques. Secondly, we will discuss the implications of these results, and we also comment on the limitations of this study.
The results of the experiments are based on a total of 21840 forecasts, as we performed 24 one-month-ahead forecasts on 35 time series with 26 different models. We only take into account the results for models that have completed both the parameter tuning and feature selection procedures that were explained in the methodology section of this paper, see Section 3. Other model set-ups were disregarded for the final analyses.

In order to compare all of these models to one another, we apply two ranking tests for the 26 forecasting techniques according to five evaluation measures: MAPE, RMSE, MASE, Profit and computation time. Concretely, we rank all of the methods for each of the 840 unique forecasts and then display the average over these forecasts. This methodology ensures a fairer comparison between the techniques than, e.g., simply taking an average MAPE over the 840 forecasts. Furthermore, we can verify if the differences in rank are significantly separate from one another. The Friedman test (Friedman, 1940) is a non-parametric statistical test that verifies whether the difference between two treatments is significant or not. In this benchmark, the 26 forecasting techniques constitute the 'treatments', defined as k in Equation 9, while the 35 time series datasets are the 'blocks', N in Equation 9, which form groups of similar units. The Friedman test will rank the treatments according to a given evaluation criterion and will compare this ranking for each block. Therefore, five different average rankings are made for these experiments, according to the five evaluation measures.

Model MAPE RMSE MASE Profit Time
Without external factors
ARMA-GARCH (GARCH)
Conditional Inference Regression Tree (CtreeUni)
Holt-Winters exponential smoothing (HW)
K Nearest Neighbors Regression (KNNUni)
Long Short Term Memory RNN (LSTMUni)
Multiple Linear Regression (LRUni)
Multivariate Adaptive Regression Splines (MARSUni)
Random Forests (RFUni)
Random walk (RW)
Seasonal ARIMA (SARIMA) 10.45 (/) 10.47 (/) 10.48 (/) 10.70 (/)
Seasonal decomposition by Loess model (DM)
Seasonal random walk (SRW)
Simple Multilayer Perceptron (MLPUni)
Support Vector Regression (SVRUni)
With external factors
Conditional Inference Regression Tree (CtreeMulti)
K Nearest Neighbors Regression (KNNMulti)
Long Short Term Memory RNN (LSTMMulti)
Multiple Linear Regression (LRMulti)
Multivariate Adaptive Regression Splines (MARSMulti)
Random Forests (RFMulti)
Recursive Partitioning Regression Tree (RpartMulti)
Seasonal ARIMAX (SARIMAX)
Simple Multilayer Perceptron (MLPMulti)
Support Vector Regression (SVRMulti)
Vector Autoregression (VAR)
Friedman test
Chi-Squared
P-value < < < < <
Table 5.
Overview of benchmarking results. Columns contain the forecasting techniques and their average ranks according to MAPE, RMSE, MASE, expected profit and computation time. The numbers between brackets are the p-values from the pairwise Nemenyi test that compares the given method to the best technique according to the evaluation metric at hand.

\[
\chi^2_F = \frac{12N}{k(k+1)}\left[\sum_{j} R_j^2 - \frac{k(k+1)^2}{4}\right] \tag{9}
\]

\[
CD = q_\alpha \sqrt{\frac{k(k+1)}{6N}} \tag{10}
\]

with \(q_\alpha\) as critical values, which consist of the Studentized range statistic divided by \(\sqrt{2}\).
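The ranking procedure of Equations 9 and 10 can be illustrated as follows: techniques are ranked per time series, the Friedman test checks whether the average ranks differ, and the Nemenyi critical difference decides which pairs differ significantly. The error values below are simulated for illustration; the q_alpha constant is the standard 5% Nemenyi critical value for four treatments.

```python
# Average ranks, Friedman test and Nemenyi critical difference on a
# simulated block design (N time series x k techniques).
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

rng = np.random.default_rng(2)
N, k = 35, 4  # blocks (time series) x treatments (techniques)
# Simulated errors: technique 0 is systematically the most accurate.
errors = rng.random((N, k)) + np.array([0.0, 0.1, 0.3, 0.6])

# Rank techniques within each time series (rank 1 = lowest error),
# then average the ranks over all series.
ranks = np.apply_along_axis(rankdata, 1, errors)
avg_ranks = ranks.mean(axis=0)

# Friedman test (Eq. 9): are the average ranks significantly different?
stat, p = friedmanchisquare(*errors.T)

# Nemenyi critical difference (Eq. 10); q_alpha = 2.569 is the 5%
# critical value of the Studentized range statistic (divided by sqrt(2))
# for k = 4 treatments.
q_alpha = 2.569
cd = q_alpha * np.sqrt(k * (k + 1) / (6 * N))
print(avg_ranks, cd)
```

Two techniques are then declared significantly different whenever their average ranks differ by more than the critical difference `cd`.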
Figure 2.
Rank comparison

The first research question focused on finding the best-performing technique for tactical sales forecasts, which all display a trend and seasonality. In order to select the top performing techniques, we take Table 5 and Figure 3 into consideration again, which show a clear top four models in terms of both accuracy and profit. The best-performing forecasting techniques are SARIMA, SARIMAX, Holt-Winters and the seasonal decomposition model (DM). The only exception to the seasonal, univariate and non-ML rule is SARIMAX, which also incorporates external drivers into the model.

In Figure 3, we take a closer look at these best-performing models in terms of both accuracy and the expected profit. These figures contain the distributions of the pairwise differences of SARIMA, SARIMAX, Holt-Winters and the seasonal decomposition model (DM). Grey boxplots indicate a significant difference between the two models that are mentioned on the Y-axis. DM seems to consistently perform worse than the other three models, while the remaining three time series models all perform equally well in terms of both MAPE and expected profit.

The last performance metric that can still make a difference in the selection of the best-performing technique is computation time. This measure is indicative of the complexity of the model, but can also have an effect on the final costs of the model, as computation efforts also lead to additional expenses. The average computation time of the top four models is summarized in Table 6 below. Clearly, the training of Holt-Winters and DM takes the least amount of time by far. However, the average computation times of both SARIMA and SARIMAX are still below 10 seconds per

Figure 3.
Pairwise differences of four best-performing models
Model | Average computation time (seconds)
Holt-Winters
Seasonal decomposition model
SARIMA
SARIMAX
Table 6.
Average computation time for best-performing models

forecast. Furthermore, it is logical that these last two techniques require more time to train, given the feature and hyperparameter optimization according to profit for both of them. In conclusion, a top three of equally performing time series models remains: Holt-Winters exponential smoothing, Seasonal ARIMA and Seasonal ARIMAX, but Holt-Winters will save significantly on computation time if there is a large number of time series to forecast.

Finally, we also take a closer look at the interpretability of these top three techniques. As time series models, they are all transparent methodologies that attribute weights to the autoregressive, trend and seasonal components of the time series. Additionally, SARIMAX displays the weights of the added external factors, indicating their impact on the sales, which greatly adds to the explanatory power of the model. This therefore entails a large advantage for the SARIMAX technique in terms of business value. On the other hand, the feature selection procedure leads to a higher computation time and effort, so these two aspects need to be weighed against one another. In the end, the univariate time series models perform equally to SARIMAX, but additional information on the external influences on the sales might be preferable in a business context. Note that this refers to two completely different objectives, i.e. predicting versus explaining. Before the final selection of the best technique, businesses need to clearly outline the objective of a forecasting model.

In terms of variable selection in this paper, Figure 4 shows the average percentage of selected variables for each of the variable types, illustrated for each of the two data sources. From these charts, we can conclude that weather and macro-economic variables are selected the most for all datasets.
On average, 2 weather variables and 2.5 macro-economic variables were selected for the Coca-Cola Company datasets, while 1.78 weather variables and 3.89 macro-economic variables were chosen for the public datasets.
Figure 4.
Average percentage of selected variables
The second research question focused on the integration of the expected profit function into the model selection process. We can clearly see from Table 5 and especially Figure 2 that the rankings of the techniques according to MAPE, RMSE and MASE are virtually the same, while the ranking according to the expected profit function looks somewhat different. Although the top methods perform well according to all of these evaluation measures, the changes in the ranking already indicate that it is valuable to compare models according to profit as well, as it might lead to a different ranking of the possible techniques. For example, the p-values in Table 5 of the DM technique are not significantly different from the top three time series models in terms of the accuracy measures, but they are significantly different from them when we look at the expected profit function. In Figure 5, we look at some pairwise differences of other models according to MAPE and Profit as well. In this figure, we can clearly see that techniques can significantly differ in terms of Profit and not in terms of MAPE, or vice versa. Specifically, we compare the univariate cases of Multiple Linear Regression (LRUni) and Support Vector Regression (SVRUni), and the variant with external factors of Simple Multilayer Perceptron (MLPMulti). The pairwise differences between SVR and LR, and SVR and MLP show that there is a clear difference between the two evaluation measures. It is also important to note that these changes do not exist in pairwise differences when we only compare the accuracy metrics.

Figure 5.
Domination plots of pairwise Nemenyi differences in p-values
The results of this study have three interesting implications for model selection in sales forecasting from a business perspective. Firstly, we proposed a profit-driven approach that provides a completely automated framework for model building and selection. The expected profit function that we implement is completely adaptable to any sales forecasting situation by combining business expertise with traditional accuracy-based evaluation. Furthermore, this profit function can be used as an evaluation criterion that gives a different view on which technique is truly the best one in a benchmarking exercise. While the results in this paper are consistently in line with the accuracy measures, the overall ranking according to profit is still significantly different from the accuracy-based ones. This indicates that a ranking according to profit might yield a different result in model selection. In this paper, however, the top three models' performance was consistently very close to one another, while the same models outperformed others by a significant margin. In other cases, when model performance between techniques is closer, the expected profit function can provide an additional perspective on final model selection. Furthermore, this paper adds to the scarce literature on the use of profit-driven analytics in forecasting and regression analysis.

Secondly, we notice that univariate time series models that explicitly capture seasonality perform the best in this benchmarking study, although the Seasonal ARIMAX method is an exception to the univariate characteristic. However, this technique only performs on par with the aforementioned univariate methods and we can therefore raise the question whether the addition of external variables is truly useful in this context.
While other studies have shown the value of adding external drivers into the models for sales forecasting on a strategic level (Sagaert et al., 2017), this research shows that we can forecast equally well without any independent variables in the model. When we take into account the additional cost of data collection and model maintenance, we conclude that forecasting the sales on a product category level is more easily achieved by univariate models, without compromising on accuracy or profit. Although we recognize the added explanatory value of integrating features, we question whether it is worth the effort when achieving the best forecast is the goal.

Finally, we compared two categories of forecasting techniques to one another: statistical methods and machine learning techniques. In the case of tactical sales forecasting, we clearly see that simpler models significantly outperform the others for these 35 datasets. This leads us to conclude that the more traditional models are actually still performing the best when tackling this type of time series problem. These findings are in line with (Makridakis and Hibon, 2000), but contradict (Crone et al., 2011). To conclude, seasonal time series models tend to outperform other techniques for a tactical sales forecast. From a business perspective, this conclusion is especially positive, as these models are easy to interpret and have a faster computation time.
5. Conclusion
In this paper, we introduced a new, completely automated and profit-oriented strategy for sales forecasting, which integrates an expected profit function into several steps of model selection. This function can be implemented in any sales forecasting context by letting business experts and previous data set the profit margins for every product. Furthermore, our research has shown that simpler time series models tend to outperform more complex techniques for 35 sales datasets. All of the applied ML techniques achieve significantly worse results than the traditional models, both in accuracy and profit. This implies that less complex techniques are still the best type of method to handle tactical sales forecasting. Finally, we found that univariate time series models that are able to explicitly model the seasonality of a time series perform best. This indicates that the addition of external variables is unnecessary, especially when we consider the additional costs that are linked to maintaining models with external drivers.

In terms of possible limitations, we recognize some shortcomings in this paper. Firstly, it is impossible to come up with an exhaustive list of forecasting techniques. However, we have attempted to implement common methods from all three categories of techniques that are frequently used for forecasting. Furthermore, this research consists of 35 monthly time series, which is significantly less than the larger benchmarks and competitions in the field (Athanasopoulos et al., 2011; Crone et al., 2011; Makridakis and Hibon, 2000). However, this paper particularly focuses on one field, i.e. sales forecasting, and is one of the larger benchmarking studies in this specific area. Furthermore, we have added to the generalizability and reproducibility of the study by including several publicly available datasets as well.
Finally, we only implemented individual forecasting methods, without considering ensemble methods. This type of methodology has become extremely popular in forecasting (Lessmann et al., 2015) and it has been proven that this approach can significantly impact the accuracy of forecasts. Potential future research therefore includes an expansion of this study in three aspects. Firstly, we can include more sales time series in order to further underpin our statements. Secondly, we can implement more techniques and include ensemble methods. Finally, this study can be further expanded to fields other than sales forecasting. However, given the range of techniques and the number of datasets that were already used in this paper, we can state that simple, seasonal time series models are still the best choice for a high-level tactical sales forecast.
Acknowledgements
We would like to acknowledge The Coca-Cola Company for funding this research and providing us with the necessary business expertise and data to conduct our experiments.
References
Aboagye-Sarfo, P., Mai, Q., Sanfilippo, F. M., Preen, D. B., Stewart, L. M., Fatovich, D. M., 2015. A comparison of multivariate and univariate time series approaches to modelling and forecasting emergency department demand in Western Australia. Journal of Biomedical Informatics 57, 62–73.
Akın, M., 2015. A novel approach to model selection in tourism demand modeling. Tourism Management 48, 64–72.
Armstrong, J. S., 2006. Findings from evidence-based forecasting: Methods for reducing forecast error. International Journal of Forecasting 22 (3), 583–598.
Armstrong, J. S., Fildes, R., 2006. Making progress in forecasting. International Journal of Forecasting 22 (3), 433–441.
Arunraj, N. S., Ahrens, D., 2015. A hybrid seasonal autoregressive integrated moving average and quantile regression for daily food sales forecasting. International Journal of Production Economics 170, 321–335.
Athanasopoulos, G., Hyndman, R. J., Song, H., Wu, D. C., 2011. The tourism forecasting competition. International Journal of Forecasting 27 (3), 822–844.
Bansal, G., Sinha, A. P., Zhao, H., 2008. Tuning data mining methods for cost-sensitive regression: a study in loan charge-off forecasting. Journal of Management Information Systems 25 (3), 315–336.
Bertrand, J.-L., Brusset, X., Fortin, M., 2015. Assessing and hedging the cost of unseasonal weather: Case of the apparel sector. European Journal of Operational Research 244 (1), 261–276.
Boivin, J., Ng, S., 2006. Are more data always better for factor analysis? Journal of Econometrics 132 (1), 169–194.
Bozos, K., Nikolopoulos, K., 2011. Forecasting the value effect of seasoned equity offering announcements. European Journal of Operational Research 214 (2), 418–427.
Cang, S., Yu, H., 2014. A combination selection algorithm on forecasting. European Journal of Operational Research 234 (1), 127–139.
Carrizosa, E., Martín-Barragán, B., Morales, D. R., 2014. A nested heuristic for parameter tuning in support vector machines.
Computers & Operations Research 43, 328–334.
Crone, S. F., Hibon, M., Nikolopoulos, K., 2011. Advances in forecasting with neural networks? Empirical evidence from the NN3 competition on time series prediction. International Journal of Forecasting 27 (3), 635–660.
Crone, S. F., Lessmann, S., Stahlbock, R., 2005. Utility based data mining for time series analysis: cost-sensitive learning for neural network predictors. In: Proceedings of the 1st International Workshop on Utility-Based Data Mining. ACM, pp. 59–68.
Currie, C. S., Rowley, I. T., 2010. Consumer behaviour and sales forecast accuracy: What's going on and how should revenue managers respond? Journal of Revenue and Pricing Management 9 (4), 374–376.
Dickey, D. A., Fuller, W. A., 1979. Distribution of the estimators for autoregressive time series with a unit root. Journal of the American Statistical Association 74 (366a), 427–431.
Fagiani, M., Squartini, S., Gabrielli, L., Spinsante, S., Piazza, F., 2015. A review of datasets and load forecasting techniques for smart natural gas and water grids: Analysis and experiments. Neurocomputing 170, 448–465.
Fildes, R., 2006. The forecasting journals and their contribution to forecasting research: Citation analysis and expert opinion. International Journal of Forecasting 22 (3), 415–432.
Franses, P. H., Van Dijk, D., 2005. The forecasting performance of various models for seasonality and nonlinearity for quarterly industrial production. International Journal of Forecasting 21 (1), 87–102.
Friedman, M., 1940. A comparison of alternative tests of significance for the problem of m rankings. The Annals of Mathematical Statistics 11 (1), 86–92.
Gil-Alana, L. A., Cunado, J., Perez de Gracia, F., 2008. Tourism in the Canary Islands: forecasting using several seasonal time series models. Journal of Forecasting 27 (7), 621–636.
Gunter, U., Önder, I., 2015. Forecasting international city tourism demand for Paris: Accuracy of uni- and multivariate models employing monthly data.
Tourism Management 46, 123–135.
Huang, T., Fildes, R., Soopramanien, D., 2014. The value of competitive information in forecasting FMCG retail product sales and the variable selection problem. European Journal of Operational Research 237 (2), 738–748.
Hyndman, R. J., Koehler, A. B., 2006. Another look at measures of forecast accuracy. International Journal of Forecasting 22 (4), 679–688.
Kahn, K. B., 2003. How to measure the impact of a forecast error on an enterprise? The Journal of Business Forecasting 22 (1), 21.
Kolassa, S., 2016. Evaluating predictive count data distributions in retail sales forecasting. International Journal of Forecasting 32 (3), 788–803.
Lessmann, S., Baesens, B., Seow, H.-V., Thomas, L. C., 2015. Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research. European Journal of Operational Research 247 (1), 124–136.
Lessmann, S., Voß, S., 2017. Car resale price forecasting: The impact of regression method, private information, and heterogeneity on forecast accuracy. International Journal of Forecasting 33 (4), 864–877.
Ma, S., Fildes, R., Huang, T., 2016. Demand forecasting with high dimensional data: The case of SKU retail sales forecasting with intra- and inter-category promotional information. European Journal of Operational Research 249 (1), 245–257.
Makridakis, S., Hibon, M., 2000. The M3-competition: results, conclusions and implications. International Journal of Forecasting 16 (4), 451–476.
Nemenyi, P., 1962. Distribution-free multiple comparisons. In: Biometrics. Vol. 18. International Biometric Society, p. 263.
Nikolopoulos, K., Goodwin, P., Patelis, A., Assimakopoulos, V., 2007. Forecasting with cue information: A comparison of multiple regression with alternative forecasting approaches. European Journal of Operational Research 180 (1), 354–368.
Osborn, D. R., Chui, A. P., Smith, J. P., Birchenhall, C. R., 1988.
Seasonality and the order of integration for consumption. Oxford Bulletin of Economics and Statistics 50 (4), 361–377.
Óskarsdóttir, M., Bravo, C., Verbeke, W., Sarraute, C., Baesens, B., Vanthienen, J., 2017. Social network analytics for churn prediction in telco: Model building, evaluation and network architecture. Expert Systems with Applications 85, 204–220.
Peng, H., Long, F., Ding, C., 2005. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (8), 1226–1238.
Petropoulos, F., Makridakis, S., Assimakopoulos, V., Nikolopoulos, K., 2014. 'Horses for courses' in demand forecasting. European Journal of Operational Research 237 (1), 152–163.
Ramos, P., Santos, N., Rebelo, R., 2015. Performance of state space and ARIMA models for consumer retail sales forecasting. Robotics and Computer-Integrated Manufacturing 34, 151–163.
Sagaert, Y. R., Aghezzaf, E.-H., Kourentzes, N., Desmet, B., 2017. Tactical sales forecasting using a very large set of macroeconomic indicators. European Journal of Operational Research.
Santos, A. A., Nogales, F. J., Ruiz, E., 2012. Comparing univariate and multivariate models to forecast portfolio value-at-risk. Journal of Financial Econometrics 11 (2), 400–441.
Sola, J., Sevilla, J., 1997. Importance of input data normalization for the application of neural networks to complex industrial problems. IEEE Transactions on Nuclear Science 44 (3), 1464–1468.
Stripling, E., vanden Broucke, S., Antonio, K., Baesens, B., Snoeck, M., 2015. Profit maximizing logistic regression modeling for customer churn prediction. In: Data Science and Advanced Analytics (DSAA), 2015 IEEE International Conference on. IEEE, pp. 1–10.
Syntetos, A. A., Babai, Z., Boylan, J. E., Kolassa, S., Nikolopoulos, K., 2016. Supply chain forecasting: Theory, practice, their gap and the future. European Journal of Operational Research 252 (1), 1–26.
Tashman, L. J., 2000.
Out-of-sample tests of forecasting accuracy: an analysis and review. International Journal of Forecasting 16 (4), 437–450.
Taylor, J. W., De Menezes, L. M., McSharry, P. E., 2006. A comparison of univariate methods for forecasting electricity demand up to a day ahead. International Journal of Forecasting 22 (1), 1–16.
Van Calster, T., Baesens, B., Lemahieu, W., 2017. ProfARIMA: A profit-driven order identification algorithm for ARIMA models in sales forecasting. Applied Soft Computing.
Verbeke, W., Baesens, B., Bravo, C., 2017. Profit Driven Business Analytics: A Practitioner's Guide to Transforming Big Data into Added Value. John Wiley & Sons.
Verbeke, W., Dejaeger, K., Martens, D., Hur, J., Baesens, B., 2012. New insights into churn prediction in the telecommunication sector: A profit driven data mining approach. European Journal of Operational Research 218 (1), 211–229.
Verbraken, T., Verbeke, W., Baesens, B., 2013. A novel profit maximizing metric for measuring classification performance of customer churn prediction models. IEEE Transactions on Knowledge and Data Engineering 25 (5), 961–973.
Weller, M., Crone, S., 2012. Supply chain forecasting: Best practices & benchmarking study. Technical Paper. Lancaster Centre for Forecasting, 1–42.
Yang, H., King, I., Chan, L., 2002. Non-fixed and asymmetrical margin approach to stock market prediction using support vector regression. In: Neural Information Processing, 2002. ICONIP'02. Proceedings of the 9th International Conference on. Vol. 3. IEEE, pp. 1398–1402.
Zhang, G. P., Qi, M., 2005. Neural network forecasting for seasonal and trend time series. European Journal of Operational Research 160 (2), 501–514.
Zhao, H., Sinha, A. P., Bansal, G., 2011. An extended tuning method for cost-sensitive regression and forecasting. Decision Support Systems 51 (3), 372–383.