An industry case of large-scale demand forecasting of hierarchical components
Rodrigo Rivera-Castro∗, Ivan Nazarov∗, Yuke Xiang†, Ivan Maksimov∗, Aleksandr Pletnev∗, Evgeny Burnaev∗
∗ Skolkovo Institute of Science and Technology, [email protected]
† Huawei Noah's Ark Lab
Abstract—Demand forecasting of hierarchical components is essential in manufacturing. However, its discussion in the machine-learning literature has been limited, and judgmental forecasts remain pervasive in the industry. Demand planners require easy-to-understand tools capable of delivering state-of-the-art results. This work presents an industry case of demand forecasting at one of the largest manufacturers of electronics in the world. It seeks to support practitioners with five contributions: (1) a benchmark of fourteen demand forecast methods applied to a relevant data set, (2) a data transformation technique yielding results comparable with the state of the art, (3) an alternative to ARIMA based on matrix factorization, (4) a model selection technique based on topological data analysis for time series, and (5) a novel data set. Organizations seeking to up-skill existing personnel and increase forecast accuracy will find value in this work.

Index Terms—Demand forecasting, machine learning, electronics manufacturing, hierarchical structures
I. Originality and Value

This research presents a demand forecasting system of electronic components in manufacturing validated with real data. The contributions cover the areas of pre-processing, prediction, and model selection and are suited for individuals with domain knowledge but limited understanding of machine learning methods. They are the following:
1) An industry case of demand prediction for a large manufacturer of electronics,
2) An evaluation of 14 different models for demand prediction of items with hierarchical dependencies,
3) An implementation of a method for demand forecasting based on matrix factorization,
4) A feature engineering technique that is both easy to implement and yields results similar to those obtained from feature engineering requiring domain knowledge,
5) A methodology for model selection based on topological data analysis suited for large data sets in an industry setting,
6) For reproducibility purposes, an implementation and data set available for download: https://github.com/rodrigorivera/icmla2019

Sections I, II, III, IV were supported by the Ministry of Education and Science of the Russian Federation (Grant no. 14.756.31.0001). Other sections were supported by the Mexican National Council for Science and Technology (CONACYT), 2018-000009-01EXTF-00154. The authors would like to thank Huawei Noah's Ark Lab for the advice and support.

II. Problem Statement
One of the world's largest manufacturers of electronics has to forecast demand for both its products and their respective individual components, amounting to millions of time series to predict. Traditional forecasting techniques are largely ineffective here. Nevertheless, the manufacturer has to generate reliable estimates of its future demand over multiple periods.

III. Research Abstract and Goals
At the moment, there are more than $12 trillion USD in inventory either stockpiled or in transit, amounting to 17% of the world's Gross Domestic Product (GDP), [1]. Accurate demand forecasting is essential in the industry. Nevertheless, imprecise demand planning is still pervasive. For new products, forecast errors are on average 44-53%, whereas for improved products they are 31%, [2], [3]. Companies compensate for this inaccuracy gap through expensive operational measures such as trans-shipments, [4]. Nevertheless, retailers still experience out-of-stock (OOS) events with rates amounting to 8.3% worldwide, [5].
Fig. 1. Overview of the implemented methodology
The objective of this research is to present three techniques for (1) data pre-processing, (2) prediction, and (3) model selection that are accessible to non-technical business experts and offer competitive results. They represent a cohesive system depicted in Figure 1. The use of novel machine learning methods in this field is a promising area with little academic research and with insufficient efforts to expose practitioners to them, [6]. It is relevant to have robust methods accessible to broader audiences, [7]. [8] observed that for discrepancies as low as 2%, it is worth investing in improving the accuracy of a forecast. [9] goes as far as claiming that a 10% reduction in OOS increases the revenue of retailers by up to 0.5%. Nevertheless, companies struggle to hire adequate personnel to address these tasks, [10]. [11] reported that over 60% of surveyed businesses are resorting to internal training to compensate for this. This work seeks to alleviate this situation by presenting an extensive comparison of methods, proposing a feature engineering technique well-suited for demand forecasting in manufacturing, evaluating a novel method based on matrix factorization, and proposing a technique for model selection that is both accurate and easy to communicate to decision-makers. The research goal of this work is to propose a set of approaches for time series forecasting that can be adopted by business practitioners. For this purpose, the study poses two questions: 1) What is the state of the art in academic research on time series prediction with structures? 2) How does the proposed method differ from popular approaches applied to time series prediction tasks? Two objectives achieve the research goal: a) to review the existing theory on time series prediction, especially techniques for dynamic hierarchical structures; b) to make a performance comparison of the proposed technique.

The object of research is the balance between accessibility and precision of methods for time series in a massive-data context within the industry. The subject of the research is forecasting product demand using techniques for time series with hierarchies.

IV. Literature Review
Supply chain management (SCM) in general and demand forecasting in particular are fields that have commanded attention from different communities, according to [12]. A comprehensive treatment is available in the works of [7] and [13]. Sales forecasting is an essential part of supply chain management. The forecasting community trains quantitative methods from the statistical family of ARIMA, exponential smoothing models, and the like on historical data to forecast future points and improve forecasting accuracy. However, [14] argues that there have been few large-scale comparative studies of machine learning models for regression or time series aimed at forecasting problems. In the retail and manufacturing sectors, authors such as [15] paid attention to the demand forecasting of edibles. Similarly, [16] deals with time series characterized by high volatility and skewness to forecast daily sales for a supermarket chain at the point of sale. [17] experimented with recurrent neural networks for short-term forecasting of real-valued time series, while [18] explored demand forecasting with incomplete information. For the electronics manufacturing industry, [19] introduced SVM regression to the supply chain of various producers. Although SVM regression is a popular method for forecasting, not everyone has identified it as the most effective method. For example, [20] presented a MARS model, and [21] proposed a Bayesian model. Other manufacturing-centric sectors such as fashion have also delved into demand forecasting, but to a different extent. [22] claims that pure statistical methods are not yet commonplace in the fashion industry; it is preferred to make use of judgmental forecasts or a combination of quantitative and qualitative forecasts.

V. Dataset
The data consists of observations from an electronics manufacturer representing a subset of their total inventory. It contains the demand for 2562 different items, each a series of length n = 45. These items have varying amounts of required quantities, with many of them being requested sporadically, as seen in Figure 2, and few of them being requested in large quantities.

Fig. 2. Number of NaNs (zero orders) per item. X-axis: item's ID. Y-axis: number of zeroes.

VI. Diagonal Feeding
One of the contributions of this work is to introduce the practitioner to a data transformation technique useful for multi-step structured forecasting from anticipatory data. It is part of the first step, 'Preprocess,' of the system introduced in Figure 1. The main benefit of Diagonal Feeding is that it helps utilize the anticipatory nature of pre-orders' time-series data and makes forecasting the pre-order structure more streamlined. It is made possible by the data set containing information not only about the current demand but also about the volumes of pre-orders made in advance. Advance pre-orders are expectation-driven, naturally forward-looking, and known beforehand, as they reflect planning and anticipation of the market and economic environment at the end of the period when the order is to be fulfilled. At the same time, forecasting the pre-order structure over the next several periods is of significant practical interest. It is reasonable to leverage the anticipatory information of the advance pre-orders, known by the present, for predicting the pre-order structure in the future, while also taking into account the cross-correlations between the pre-orders.

Let $q^h_t$ be the volume of some item in the "quantity" field of the data set, where $t$ corresponds to the "delivery date" and $h$ to the "periods before delivery date". The value $q^h_t$ denotes the total amount requested via $h$-period advance pre-orders to be delivered by the end of period $t$. The key property of the data set is that for every item, the value $q^h_t$ is effectively known and available for use by the end of period $t - h$, the period when the $h$-ahead pre-orders were made. For example, $q^1_t$ is known at the end of $t - 1$: it corresponds to the quantity requested by the end of $t - 1$, due to the accumulation of pre-orders made by that time. Since $q^h_t$ reflects expectations about the market conditions at $t$ and is known $h$ periods in advance, it seems reasonable to reorder the data set with respect to the period when the values become known and reshape it to keep the pre-order structure. This makes predicting $q^h_t$ with $q^f_s$ data for $t - h > s - f$, which is either past ($q^h_{t-1}$) or anticipatory ($q^{h+1}_t$), more streamlined. The proposed reshaping of the multivariate time series of a particular "item" is illustrated below. Since the quantity $q^h_t$ is known at time $t - h$, each diagonal $(q^h_{t+s+h})_{h \ge 0}$ in the scheme is known at $t + s$, $s \in \mathbb{Z}$; thus, potentially up to infinite periods:

$$\begin{pmatrix} q^0_{t} & q^1_{t} & q^2_{t} \\ q^0_{t+1} & q^1_{t+1} & q^2_{t+1} \\ q^0_{t+2} & q^1_{t+2} & q^2_{t+2} \\ q^0_{t+3} & q^1_{t+3} & q^2_{t+3} \end{pmatrix} \rightarrow \begin{pmatrix} x_t & x_t & x_t \\ y_t & x_t & x_t \\ y_t & y_t & x_t \\ y_t & y_t & y_t \end{pmatrix} \qquad (1)$$

In Equation 1, the target $y_t$ is the output and represents the pre-order structure for the next periods beginning with $t + 1$. The objective is to predict the lower diagonal of the matrix. It is done using $x_t$ and its history as input, i.e., the past pre-order structure. Although, in principle, predicting the structure in $y_t$ allows planning production volumes several months ahead, the most relevant targets for practical demand forecasting are on the largest diagonal of $y_t$, since they are the earliest future volumes.

VII. Matrix Factorization
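The reshaping above can be sketched in a few lines. The following is a minimal illustration, assuming the pre-orders of one item are stored as a matrix `Q[s, h]` holding $q^h_{t_0+s}$ (delivery period down the rows, advance horizon across the columns); the function name and layout are illustrative, not taken from the paper's code base:

```python
import numpy as np

def diagonal_feeding(Q, t):
    """Split the pre-order matrix into known (x) and future (y) parts at time t.

    Q[s, h] holds the volume ordered h periods ahead for delivery at period
    t0 + s. That entry becomes known at period t0 + s - h, so relative index
    t observes every entry with s - h <= t; the rest is the target structure.
    """
    S, H = Q.shape
    s_idx = np.arange(S)[:, None]          # delivery-period index, per row
    h_idx = np.arange(H)[None, :]          # advance-horizon index, per column
    known = (s_idx - h_idx) <= t           # observable by time t
    x = np.where(known, Q, np.nan)         # past / anticipatory pre-orders
    y = np.where(known, np.nan, Q)         # future pre-order structure to predict
    return x, y
```

Applied to a 4x3 matrix at `t = 0`, the NaN pattern of `x` and `y` reproduces the triangular split of Equation 1.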
Matrix Factorization (MF) methods are used in a variety of applications such as recommender systems, signal processing, [23], computer vision, [24], and others. The second contribution of this work is adapting a method discussed in [25] and [26] to demand forecasting in manufacturing. Let $Y$ be a $T \times n$ sparse or dense matrix of observations of $n$ objects spanning a period of $T$ time steps, i.e., each column $i = 1, \ldots, n$ of $Y$ is a time series $y^{(i)} = (Y_{ti})_{t=1}^{T}$ related to the $i$-th object. The problem of factorizing a fully or partially observed $T \times n$ matrix $Y$ consists of finding $d$-dimensional factors $Z$ and the corresponding factor loadings $F$, in the form of $T \times d$ and $d \times n$ matrices respectively, such that their product $ZF$ most accurately recovers the observed $Y$, i.e., $Y_{ti} \approx \sum_{j=1}^{d} Z_{tj} F_{ji}$. This is usually achieved by solving the following optimization problem:

$$\min_{F, Z} \; \frac{1}{|\Omega|} \left\| P_{\Omega}(Y - ZF) \right\|^2 + \lambda_F R_F(F) + \lambda_Z R_Z(Z), \qquad (2)$$

where $\Omega \subset \{1..T\} \times \{1..n\}$ is the sparsity pattern of $Y$ and $P_{\Omega}$ zeroes out unobserved entries. The coefficients $\lambda_F$ and $\lambda_Z$ are non-negative regularization coefficients that govern the trade-off between the reconstruction error and the regularizing terms $R_F$ and $R_Z$. The latter depend on the particular desired properties of the factorization, typically in conjunction with a Ridge regression-type penalty ($\ell_2$ norm).

VIII. Models
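As an illustration of the objective in Equation 2, the sketch below minimizes the masked reconstruction error by alternating ridge least-squares updates of $Z$ and $F$. It uses plain $\ell_2$ penalties as stand-ins for $R_Z$ and $R_F$ (the temporal and graph regularizers of [26] are more elaborate), so it is a simplified sketch rather than the paper's method:

```python
import numpy as np

def factorize(Y, mask, d=2, lam=0.1, iters=50, seed=0):
    """Alternating ridge least squares for masked matrix factorization.

    Approximately minimizes (1/|Omega|)*||P_Omega(Y - Z F)||^2 plus
    lam * (||Z||^2 + ||F||^2); mask marks the observed entries Omega.
    """
    rng = np.random.default_rng(seed)
    T, n = Y.shape
    Z = rng.normal(size=(T, d))
    F = rng.normal(size=(d, n))
    for _ in range(iters):
        # Update each row of Z from the observed entries in that row.
        for t in range(T):
            obs = mask[t]
            if obs.any():
                A = F[:, obs]                                   # d x |obs|
                Z[t] = np.linalg.solve(A @ A.T + lam * np.eye(d),
                                       A @ Y[t, obs])
        # Update each column of F symmetrically.
        for i in range(n):
            obs = mask[:, i]
            if obs.any():
                B = Z[obs]                                      # |obs| x d
                F[:, i] = np.linalg.solve(B.T @ B + lam * np.eye(d),
                                          B.T @ Y[obs, i])
    return Z, F
```

On a fully observed low-rank matrix, a few dozen alternations suffice to recover the product $ZF$ up to the small bias introduced by the ridge terms.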
The third contribution of this work is a large-scale study of various methods for demand forecasting. In the system presented in Figure 1, they belong to the parts 'Training' and 'Testing.' In total, the assessment consists of fourteen different methods: 1) Adaboost, 2) ARIMAX, 3) ARIMA, 4) Bayesian Structural Time Series (BSTS), 5) Bayesian Structural Time Series with a Bayesian Classifier (BSTS Classifier), 6) Ensemble of Gradient Boosting (Ensemble), 7) Ridge regression (Ridge), 8) Kernel regression (Kernel), 9) Lasso, 10) Matrix Factorization from section VII (MF), 11) Neural Network (NN), 12) Poisson regression (Poisson), 13) Random Forest (RF), 14) Support Vector Regression (SVR). Each of them had as a target value three different options: a) quantity (non-transformed), b) log-transformed quantity, c) min-max transformed quantity. Additionally, Diagonal Feeding, presented in section VI, was evaluated for the regression methods. Thus, one evaluates three settings: a) no Diagonal Feeding; b) Diagonal Feeding with item-by-item training (One by One), where a vector containing the input of a specific item is fed individually to a model; c) Diagonal Feeding fitting the model on the full data set (All Items), where one uses a matrix with the input from all items. In all three cases, one obtains an individual vector corresponding to a given item as output. For a), extensive feature engineering is necessary, and the outcome was 360 features; the specific features are documented in the code base provided. The training set consisted of 37 periods and the test set of 8. The Symmetric Mean Absolute Percent Error (SMAPE) serves to evaluate the performance of the models, defined as $\mathrm{SMAPE} = \frac{1}{n} \sum_{t=1}^{n} \frac{|F_t - A_t|}{|A_t| + |F_t|}$, with $F_t$ being the forecasted value and $A_t$ the actual value at time $t$. One can see the results of the experiment in Table I.
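The error metric can be written as a short function. A minimal sketch, assuming the convention that a period with zero actual and zero forecast contributes zero error (the text does not spell out this edge case):

```python
import numpy as np

def smape(actual, forecast):
    """Mean of |F_t - A_t| / (|A_t| + |F_t|) over all periods.

    Periods where both values are zero contribute 0 to the mean.
    """
    a = np.asarray(actual, dtype=float)
    f = np.asarray(forecast, dtype=float)
    denom = np.abs(a) + np.abs(f)
    # Guard the division so that 0/0 periods count as zero error.
    ratio = np.where(denom == 0, 0.0,
                     np.abs(f - a) / np.where(denom == 0, 1.0, denom))
    return ratio.mean()
```

Note that, under this definition, a forecast of 2 against an actual of 0 contributes a full error of 1 for that period, which is why sparse series with many zero orders penalize over-forecasting heavily.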
The table contains both the median and average SMAPE for all models, an average for models fit without Diagonal Feeding (DF), and a second average for the case where it was used. Further, the performance across models was uneven. The top 5 models that achieved the lowest SMAPE for a given item were 1) Adaboost with 222 items, 2) Ensemble of Random Forests with 45, 3) BSTS with 42, 4) BSTS Classifier with 32, and 5) ARIMAX with 21, respectively.

IX. TDA for Model Selection
In the system presented in Figure 1, model selection is done with a method based on Topological Data Analysis (TDA). It represents the fourth contribution of this study. TDA is a new field that emerged from a combination of various statistical, computational, and topological methods during the first decade of the century. It allows us to find shape-like structures in the data and has proven to be a powerful exploratory approach for noisy and multi-dimensional data sets. For a detailed introduction, the reader is invited to consult [27]. Two motivations lie behind this approach. First, in a production setting with millions of time series to forecast, it is necessary to decide in advance on the appropriate model for a particular item in order to minimize computing costs and efforts; there are many periods with zero orders and peaks in demand. Second, SMAPE as the sole metric for decision-making can be misleading, especially if it is evaluated exclusively on the training set. For example, Figure 3 depicts the best forecast using ARIMA. A relatively low SMAPE of 0,20 was obtained; nevertheless, the model is only predicting the value at time t + 1 using the value from time t.

TABLE I. Overview of results using mean SMAPE. Low values are better. 1:1: One by One. AI: All Items. MM: Min-Max. LT: Log-Transform. DF: Diagonal Feeding.

Model | SMAPE | Model | SMAPE
Adaboost | 0,17 | Ridge 1:1 MM DF | 0,42
Ensemble | 0,18 | Adaboost 1:1 LT DF | 0,43
ARIMA | 0,27 | Kernel AI LT DF | 0,43
Ridge | 0,3 | Ridge AI MM DF | 0,43
SVR | 0,3 | Kernel 1:1 DF | 0,44
ARIMAX | 0,32 | Adaboost 1:1 DF | 0,47
RF 1:1 LT DF | 0,34 | Adaboost 1:1 MM DF | 0,47
Poisson AI LT DF | 0,36 | Kernel 1:1 LT DF | 0,47
Lasso AI DF | 0,37 | NN AI MM DF | 0,47
Poisson 1:1 DF | 0,37 | MF | 0,5
Poisson 1:1 LT DF | 0,37 | NN 1:1 MM DF | 0,52
Poisson 1:1 MM DF | 0,37 | |
AVERAGE ALL | | AVERAGE DF |
MEDIAN ALL | | AVERAGE NO DF |

Fig. 3. Top forecast using ARIMA. X-axis: Period. Y-axis: Quantity. Blue color: actual quantity. Orange color: predicted quantity. SMAPE: 0,20.

This research proposes a pipeline consisting of 8 steps to select a model. (A) For a subset of time series, in this case 200, all possible models are fitted; for this experiment, one used only five models, see Figure 4. (B) On the test data set and for the same items, one calculates SMAPE for each model. (C) For each time series, the best model is chosen based on SMAPE; the best model becomes the target. (D) One computes relevant features describing each time series, see [28]; for this experiment, one chooses seven features. (F) A graph is constructed using the Mapper algorithm, see [27]; the Canberra distance, see [29], is used as the distance metric, and the first principal component obtained from the Mapper algorithm as the lens. (G) A graph partitioning algorithm, see [30], is run recursively until reaching the lowest limit of data points per cluster. (H) One chooses the most frequently observed target (model) for all models within a cluster. (I) For a new time series, one can select the best model by running the K-nearest neighbors algorithm on the features obtained in step (D). Based on the described pipeline, one obtains two clusters of nodes from the graph: a) AdaBoost, BSTS, and BSTS classifier for 74% of the time series, and b) Poisson regression and Random Forest for 26%, respectively. They are depicted in Figure 4. Using cross-validation for model selection, AdaBoost, BSTS, and BSTS Classifier were the best choice for 71% of the time series. Hence, using only one graph partitioning yields a small model selection error (6%).
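Steps (D) and (I) of the pipeline can be sketched as follows. This is a minimal illustration: a handful of hand-rolled descriptors stand in for the tsfresh features of [28] (the paper does not list its actual seven features), and a plain majority vote among the nearest neighbours replaces the full Mapper-based assignment:

```python
import numpy as np

def ts_features(x):
    """A few simple per-series descriptors (illustrative stand-ins for
    the tsfresh features used in step (D))."""
    x = np.asarray(x, dtype=float)
    return np.array([
        x.mean(),
        x.std(),
        (x == 0).mean(),                                   # share of zero-order periods
        np.abs(np.diff(x)).mean() if len(x) > 1 else 0.0,  # mean step change
        x.max(),
    ])

def knn_select(train_feats, train_models, new_series, k=3):
    """Step (I): pick a model for a new series via k-nearest neighbours
    in feature space, with a majority vote over the neighbours' best models."""
    f = ts_features(new_series)
    d = np.linalg.norm(train_feats - f, axis=1)
    nearest = np.argsort(d)[:k]
    votes = [train_models[i] for i in nearest]
    return max(set(votes), key=votes.count)
```

In practice, `train_models` would hold the per-series winners found in step (C), so the vote returns the model family of the cluster the new series falls into.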
X. Discussion & Learnings

a) On Diagonal Feeding:
The critical insight from the analysis of the data set through Diagonal Feeding is that the currently known one-period-ahead pre-orders for the period ($q^1_{t+1}$) mostly determine the next period's gross total demand volume $q_{t+1}$. The net next-period volume, $\delta_{t+1}$, is the difference between $q_{t+1}$ and $q^1_{t+1}$. Viewed through Diagonal Feeding, it is mostly independent of the history of net pre-orders for the period $t + 1$ and is thus less predictable from advance pre-order data, as indicated by the correlation analysis and the results of a grid search experiment. The apparent success of forecasting $q_{t+1}$, especially in contrast to the other next periods' pre-order volumes $q^j_{t+1+j}$ for $j \in \{1, 2, 3\}$, might be attributed to an observed high correlation with the one-period-ahead pre-order volume $q^1_{t+1}$. Further, Diagonal Feeding delivers results comparable to those obtained through extensive feature engineering. Along these lines, exploring different transformations of the target value is essential. For example, a Neural Network without a transformed quantity fitted on the full data set had a SMAPE of 1,13; with a log transformation, it was 0,38.

Fig. 4. TDA pipeline for 5 models and 7 features with Canberra distance. Colors: Blue (BSTS), Orange (BSTS classifier), Yellow (Poisson), Green (RF), Grey (Adaboost).

b) On Matrix Factorization: The contribution of this work with respect to [26] is an implementation of MF with temporal regularization that explicitly solves the optimization problem of Equation 2, extended with a graph similarity regularizer. The major advantage is that in the high-dimensional regime $T \ll n$, it has fewer parameters ($Tk + kn + kp$) to estimate than a $p$-th order vector autoregression ($pn^2$), while retaining the power to capture the correlations among the time series in $Y$, [26]. Nevertheless, the criticism is twofold. First, the method is wasteful.
Its most precise forecasts are one-step-ahead, since it relies on the "dynamic" forecasting method: the factor forecasts are computed based on the prior forecasts, $\hat{Z}_{T+h|T,j} = \sum_{i=1}^{p} \varphi_{ji} \hat{Z}_{T+h-i|T,j}$ with $\hat{Z}_{T+h-i|T,j} = Z_{T+h-i,j}$ for $i \ge h$. One attributes the deterioration of forecast accuracy at longer horizons to the forecast error accumulating in this recursion. The secondary reason is that the $\ell_2$ and AR($p$) regularizers jointly force stationary factor time series $(Z_{t,j})_{t=1}^{T}$, with the characteristic roots lying within the complex unit disk. Therefore, the dynamic forecast, although capable of exhibiting complex dynamic patterns for high $p$, still has vanishing oscillations, eventually leveling off to zero. The second shortcoming is that it is impossible to obtain new latent factor values when one updates $Y$ with new data, other than by re-estimating the factorization model. The key issue with re-estimation is that the re-estimated factors and loadings are not guaranteed to resemble the ones from the factorization before the data update. Given these shortcomings, the following guidelines for the application of temporal regularized matrix factorization can be formulated. First, the experiments by [26] suggest that a sufficient share of the entries in $Y$ must be observed for an adequate reconstruction of the missing dynamics within the training set. Second, the structure of the AR($p$) regularizer suggests that the factorization should not express extreme volatility; a comparison of the performance of this method on the second data set (non-sparse and moderately volatile) against the third one (highly sparse and volatile) supports this. Third, due to the dynamic nature of the factor forecasts, the best strategy is to compute one-step-ahead forecasts and re-estimate the factorization upon new data.

c) On the experiment: The results from Table I show that the best model was Adaboost with a SMAPE of 0,17.
It was followed by the Ensemble of Random Forests with 0,18. Both performed significantly better than ARIMAX, the baseline used by the manufacturer, with 0,32. Worth highlighting are the results obtained with Diagonal Feeding: the best method using this transformation technique, a Random Forest with log transform fitted on the full data set, obtained 0,34. It was significantly better than the average of methods trained on 360 features, with a SMAPE of 0,42.

d) On the scope of the study:
The objective of this study was to improve on the results obtained from the forecast method used by the manufacturer, ARIMAX. At the same time, it seeks to provide tools that demand planners at the electronics manufacturer can use without requiring extensive knowledge of computer science. In this study, ARIMAX showed good results using SMAPE as an error metric; however, looking at individual items, it only gave the best results for less than 10% of the inventory. Besides, the study showed that using Diagonal Feeding improves results without extensive feature engineering. From an academic perspective, this study filled a void: in the literature, there are no comprehensive studies on demand forecasting for manufacturers that practitioners can use as a reference.

e) On TDA for Model Selection:
The manufacturer's inventory consists of millions of components; thus, proper and efficient model selection becomes essential. Model selection based on TDA produced fast and explainable results. It worked well even with a small amount of data relative to the number of models, i.e., 200 time series and five models. To further validate this approach in an industry setting, two comparisons were conducted between TDA and Dynamic Time Warping (DTW) with K-Means, see [31]. The first experiment consisted of 80 000 time series generated from the data set with added random noise. TDA took less than 30 minutes on a standard commercial laptop, whereas DTW was not able to complete the process. A second experiment using DTW with K-Means was made under the same conditions described in section IX. It revealed that the first cluster, consisting of AdaBoost, BSTS, and BSTS classifier, is the best suited for 69% of the time series; for the second cluster, containing Poisson regression and Random Forest, it was 31%. Yet, it is still necessary to conduct experiments to verify the purity of the clusters.
XI. Conclusion
This work had as its objective to provide practitioners with a system for demand forecasting consisting of preprocessing, training, and prediction with a large number of models, as well as model selection. As a preprocessing technique, Diagonal Feeding was introduced. It helps demand planners improve the accuracy of their methods whenever future delivery dates are known, without requiring domain knowledge or extensive feature engineering. For prediction and testing, a large study comparing fourteen methods was presented. It also applied a method based on matrix factorization to demand forecasting. Similarly, a model selection method based on TDA was presented. In an industry setting, low error metrics such as SMAPE can be misleading: the trained model might be incapable of forecasting the actual demand. The methodology provided alleviates this and shows better results than similar techniques while being easy to communicate to stakeholders. As further lines of work, this study would like to point out two main directions. First, for matrix factorization, there is a need to improve it for sparse data as well as to make it more computationally efficient. Second, for the model selection based on TDA, it is worth considering different approaches not based on graph partitioning; one example is clustering based on point clouds. In conclusion, there is a need to up-skill existing personnel, and researchers can contribute to closing this gap. Given the significant demand for analytics talent in the years to come, one can expect that the academic community will focus its attention in this direction.

References
[1] D. Bogataj and M. Bogataj, "NPV approach to material requirements planning theory - a 50-year review of these research achievements," International Journal of Production Research, 2018.
[2] K. B. Kahn, "An exploratory investigation of new product forecasting practices," 2002.
[3] C. Jain, "Benchmarking new product forecasting," The Journal of Business Forecasting, 2005.
[4] D. Simchi-Levi and P. Kaminsky, Designing and Managing the Supply Chain: Concepts, Strategies and Case Studies. Tata McGraw-Hill Education, 2008.
[5] T. W. Gruen, D. S. Corsten, and S. Bharadwaj, "Retail out of stocks: A worldwide examination of extent, causes, and consumer responses," 2002.
[6] R. Rivera and E. Burnaev, "Forecasting of commercial sales with large scale Gaussian Processes," ArXiv e-prints, Sep. 2017.
[7] C. W. Chase, Demand-Driven Forecasting. Hoboken, NJ, USA: John Wiley & Sons, Inc., Aug. 2013. [Online]. Available: http://doi.wiley.com/10.1002/9781118691861
[8] E. Fleisch and C. Tellkamp, "Inventory inaccuracy and supply chain performance: a simulation study of a retail supply chain," International Journal of Production Economics, vol. 95, no. 3, pp. 373-385, Mar. 2005.
[9] R. Kaul, "Retail out-of-stock management: An outcome-based approach," 2013.
[10] C. Pompa and T. Burke, "Data Science and Analytics Skills Shortage: Equipping the APEC Workforce with the Competencies Demanded by Employers," APEC Human Resource Development Working Group, 2017.
[11] N. Agell and M. Carricano, "Adopción e impacto del Big Data y Advanced Analytics en España," ESADE Business and Law School, May 2018.
[12] K. S. Attar, "Regression for Demand Forecasting," vol. 5, no. 1, 2016.
[13] M. Gilliland, Business Forecasting, M. Gilliland, L. Tashman, and U. Sglavo, Eds. Hoboken, NJ, USA: John Wiley & Sons, Inc., Dec. 2015. [Online]. Available: http://doi.wiley.com/10.1002/9781119244592
[14] N. K. Ahmed, A. F. Atiya, N. El Gayar, and H. El-Shishiny, "An empirical comparison of machine learning models for time series forecasting," Econometric Reviews, vol. 29, no. 5, 2010.
[15] G. Tirkes, C. Güray, and N. Çelebi, "Demand forecasting: a comparison between the Holt-Winters, trend analysis and decomposition models," Tehnički vjesnik - Technical Gazette, vol. 24, Supplement 2, pp. 503-509, Sep. 2017.
[16] J. W. Taylor, "Forecasting daily supermarket sales using exponentially weighted quantile regression," European Journal of Operational Research, vol. 178, no. 1, pp. 154-167, Apr. 2007. [Online]. Available: http://linkinghub.elsevier.com/retrieve/pii/S0377221706000737
[17] F. M. Bianchi, E. Maiorino, M. C. Kampffmeyer, A. Rizzi, and R. Jenssen, "An overview and comparative analysis of Recurrent Neural Networks for Short Term Load Forecasting," 2017.
[18] R. Carbonneau, K. Laframboise, and R. Vahidov, "Application of machine learning techniques for supply chain demand forecasting," European Journal of Operational Research, vol. 184, no. 3, pp. 1140-1154, Feb. 2008.
[19] X.-l. Wan, Z. Zhang, X.-x. Rong, and Q.-c. Meng, "Exploring an Interactive Value-Adding Data-Driven Model of Consumer Electronics Supply Chain Based on Least Squares Support Vector Machine," Scientific Programming.
[20] Decision Support Systems, 2012.
[21] P. M. Yelland, S. Kim, and R. Stratulate, "A Bayesian Model for Sales Forecasting at Sun Microsystems," Interfaces, vol. 40, no. 2, pp. 118-129, 2010.
[22] N. Liu, S. Ren, T.-M. Choi, C.-L. Hui, and S.-F. Ng, "Sales Forecasting for Fashion Retailing Service Industry: A Review," Mathematical Problems in Engineering.
[23] in Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on. IEEE, 2012, pp. 2697-2700.
[24] P. Chen and D. Suter, "Recovering the missing components in a large noisy low-rank matrix: Application to SfM," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 8, pp. 1051-1063, 2004.
[25] R. Rivera, I. Nazarov, and E. Burnaev, "Towards forecast techniques for business analysts of large commercial data sets using matrix factorization methods," Journal of Physics: Conference Series, vol. 1117, p. 012010, Nov. 2018.
[26] H.-F. Yu, N. Rao, and I. S. Dhillon, "Temporal regularized matrix factorization for high-dimensional time series prediction," in Advances in Neural Information Processing Systems 29, 2016.
[27] F. Chazal and B. Michel, "An introduction to topological data analysis: fundamental and practical aspects for data scientists," 2017.
[28] M. Christ, N. Braun, and J. Neuffer, "Time series feature extraction on basis of scalable hypothesis tests (tsfresh - a Python package)," Neurocomputing, 2018.
[29] G. N. Lance and W. T. Williams, "Mixed-data classificatory programs I - agglomerative systems," 1967.
[30] B. Slininger, "Fiedler's theory of spectral graph partitioning."
[31] D. J. Berndt and J. Clifford, "Using dynamic time warping to find patterns in time series," in