Reinforcement Learning based dynamic weighing of Ensemble Models for Time Series Forecasting
Satheesh K. Perepu, Bala Shyamala Balaji, Hemanth Kumar Tanneru, Sudhakar Kathari, Vivek Shankar Pinnamaraju
Abstract— Ensemble models are powerful model building tools developed with a focus on improving the accuracy of model predictions. They find applications in time series forecasting in varied scenarios, including but not limited to process industries, health care, and economics, where a single model might not provide optimal performance. It is known that if the models selected for data modelling are distinct (linear/non-linear, static/dynamic) and independent (minimally correlated), the accuracy of the predictions is improved. Various approaches suggested in the literature to weigh the ensemble models use a static set of weights. Due to this limitation, approaches using a static set of weights cannot capture the dynamic changes or local features of the data effectively. To address this issue, this work proposes a Reinforcement Learning (RL) approach to dynamically assign and update the weights of each model at different time instants, depending on the nature of the data and the individual model predictions. The RL method, implemented online, essentially learns to update the weights and reduce the errors as time progresses. Simulation studies on time series data showed that the dynamic weighted approach using RL learns the weights better than existing approaches. The accuracy of the proposed method is compared quantitatively with an existing approach of online Neural Network tuning through normalized mean square error (NMSE) values.
I. INTRODUCTION
Model building is a fundamental step in understanding a process, estimating optimal parameters, making predictions, and addressing fault detection, safety, and economic aspects of any process. There are two primary approaches to modelling: (i) first-principles models, where the mathematical model of the process is derived from fundamental physical laws, and (ii) data-driven models, where the model that describes the dynamics of the underlying process is estimated from data. In this paper, we focus on the latter approach for time series forecasting. Although different data-driven methods and model structures exist [1]–[4], forecasting industrial data using a single model structure may not be sufficient to achieve the desired performance, which raises the following two questions:

1) Which is the best model amongst all models built for the given data?
2) Will all the models built in 1) provide a similar performance on the entire data set?

To answer the first question, a trial-and-error approach is generally employed. Different sets of models are built and validated using predictions on the test data. The model with the least prediction error is naturally selected as the best one. However, this does not mean that the chosen model is able to capture the data characteristics well.

Satheesh K Perepu is with Ericsson Global Services, Chennai, Tamil Nadu, India [email protected]
Bala Shyamala Balaji is with the Department of Chemical Engineering, Indian Institute of Technology - Madras, Chennai, Tamil Nadu, India [email protected]
Hemanth Kumar Tanneru is with the Department of Chemical Engineering, Indian Institute of Petroleum and Energy, Visakhapatnam, Andhra Pradesh, India [email protected]
Sudhakar Kathari is with Honeywell Connected Enterprise, Bangalore, Karnataka, India [email protected]
Vivek Shankar Pinnamaraju is an Assistant Professor at Indian Institute of Technology - Dhanbad, Jharkhand, India [email protected]
If the data contains different local features, such as linear or periodic segments, different types of models are required to capture them. However, identifying all the features in the data is strenuous owing to the complex nature of the data. Hence it is difficult to compare the models built in 1), which answers the second question. In order to improve the performance of the resulting model, building an ensemble of individual models has recently gained attention. However, the accuracy of an ensemble model depends on the weights given to the individual models, since the contribution of each model may vary based on the local features of the data. Hence, it is important to decide how to select weights for the different models under consideration. There are various ways in which ensemble models are generated.

Ensemble models in the literature are predominantly built based on two approaches: i) serial and ii) parallel. The serial approach involves dividing the training data into subsets and building different models for each split. For instance, Zhang et al. [5] provided an approach for predicting exchange rates using a serial ensemble of neural network models. To use this method, some prior knowledge about the local features and trend of the data is essential in order to split it into different regions for model building. In the parallel approach, on the other hand, ensemble models are built on the entire training data [6], i.e., many relevant models are built for the entire set of data. In both approaches, the predictions obtained from the individual models are combined to arrive at a single prediction.

A straightforward ensemble would use the first modal value or a simple average of the predictions of all models for a regression problem, or a majority vote for a classification problem. An alternate approach is to compute a weighted average of the individual models by solving an optimization problem to minimize the overall prediction error for a regression problem, or the misclassification rate in the case of a classification problem. The disadvantage of these approaches is that they use a constant set of weights throughout the entire training data for the different models under consideration. However, this may not be appropriate, since the models' performance depends on the local features of the data. Based on the individual model performance, the weights of the ensemble models need to be updated when a new sample arrives. To address this, in this work we resort to Reinforcement Learning (RL) to dynamically estimate the weights of the ensemble model.

RL derives its foundations from optimal control. The basic aim of RL is to maximize the reward (a measure of how well the system performs) obtained by performing actions (inputs, in control terminology) such that the system moves from one state to another (in a direction that yields a good reward). For more details on RL, the reader is directed to the book by Sutton and Barto [7]. A few works in the literature have also reported the usage of RL techniques for pruning of ensemble models [8], [9]. In these works, RL techniques are used for ensemble pruning, i.e., to select which model performs better for the current sample in a classification problem. The action space (weights of the individual models) and state space (prediction error) are considered discrete, specifically binary. However, for forecasting, the state space and action space should be infinite, since the error and weights can take any real value. For ease of solving the problem, we assume the action space is infinite and the state space is finite.
More details on this are explained in Section II.

Addressing the above challenges, the novel contributions of this work are: i) to pose the problem of time series forecasting as a weighted ensemble model, and ii) to compute the weights of the ensemble model dynamically using an RL technique, assuming an infinite action space. It is to be kept in mind that RL is not directly used for the model predictions. The advantage of this approach is that RL can dynamically learn the different model weights depending on the local features of the training data at each instant of time. This can help in time series forecasting or missing data identification after sufficient training using RL. At each instant, the method rewards or penalizes each model depending on the prediction at the previous instant(s), which in turn is an aggregate of the past prediction errors.

The remainder of the paper is organized as follows. Section II explains the proposed approach using the RL methodology and the Neural Network based dynamic weighting of ensemble models with which the results are compared. Section III provides the results on a time series data set and a discussion of the performance of the proposed approach. Section IV summarizes the contributions of the proposed approach and possible extensions for future work.

II. PROBLEM FORMULATION
A general description of ensemble model prediction is given below. Let y ∈ R^N be a univariate time series, and let \hat{y}_i[t] denote the prediction of the series at time instant t from model i. The predictions from the different models are combined linearly as in Eq. (1):

    \hat{y}_p[t] = \sum_{i=1}^{M} w_i \hat{y}_i[t]    (1)

where M is the number of models, \hat{y}_i[t] represents the prediction from the i-th model at time instant t, and w_i represents the weight used to combine the predictions. The individual predictions are weight-averaged to obtain the combined prediction \hat{y}_p[t] as per Eq. (1). The individual weights are chosen to lie in [0, 1], with the weights summing to 1. One can solve this as an optimization problem and compute a static set of weights for the entire training data. However, this formulation has the disadvantage that the individual models can perform differently across the entire stretch of the training data.

Hence it is better to dynamically identify the weights of the models depending on the local features of the data. The RL method can learn new data trends to effectively modify the weights and improve the forecast. Having described the problem statement, we now focus on the interpretation of the RL terminology with respect to the problem under consideration, along with the challenges and solution methods. The important terms in RL are the states, actions, reward, and policy, which are explained as follows.

1) States: States describe the output of the system, i.e., the system response. Here, the state is the normalized combined prediction error of the ensemble of models, expressed as a percentage. Suppose M models are used for training the system.
Then the current state is given by

    S_t = \frac{y[t] - \sum_{i=1}^{M} w_i[t] \hat{y}_i[t]}{y[t]} \times 100    (2)

where y[t] is the true value at time t and \hat{y}_i[t] is the prediction of the i-th model at time t, weighted by the factor w_i[t].

As already mentioned in Section I, the computational cost of RL methods depends on the number of states and actions of the process. Here, the prediction error is considered the state of the process and can take infinitely many values. Hence, we discretize the prediction error into n categories to make the state space finite. The finite state is given by S_t^f = LB when LB ≤ S_t ≤ UB, where the bounds LB and UB are obtained by dividing [0, 100] into n intervals. The process is considered a Markov Decision Process, wherein the next state depends only on the state-action pair at the current instant of time.

2) Actions: Actions represent the movement from one state to another. In this work, the actions are the weights assigned to each of the M models. The actions are considered continuous (infinite) in this approach. Although this results in high computational cost, there is no way to discretize them, as a small change in a weight can lead to large changes in the prediction error. Hence, the action/weight for each ensemble model is a continuous value ranging from 0 to 1. The weight w_i[t] in Eq. (2) is denoted as the action A_t for the RL problem. To summarise, the prediction error forms the state and the assignment of weights becomes the action. Next, we look into the reward function, which forms the objective in the RL framework.

3) Reward and Return: The return is the total discounted reward obtained due to the current action performed to move the system output from the current state to the next. The objective of this work is to apply actions such that the system moves to a state which yields the maximum return.
In RL, the return at the current instant depends not only on the instantaneous reward obtained but also on the future rewards. This can be described as

    G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}    (3)

where R_t is the reward at time t and \gamma is a scalar in [0, 1], known as the discounting factor, used to weight the contribution of rewards from future states due to the current action. In other words, if \gamma is close to 1, more importance is given to the future states, and if \gamma is close to 0, lower importance is given to the farther states. For the proposed method, the reward at instant t is computed as

    R_t = S_{t-1} - S_t    (4)

where S_t is representative of the prediction error at instant t. Naturally, it is desirable to lower the prediction error, and hence this reward function is used: the larger the decrease in the prediction error, the better the reward. Different reward forms, such as the inverse of the prediction error (1/|S_t|) or the inverse of the difference of prediction errors (1/|S_t - S_{t-1}|), can also be used. However, from simulations, Eq. (4) was found to perform better.

4) Policy: The policy provides the distribution over actions for the current state. A policy \pi is given by

    \pi(a|s) = P[A_t = a \mid S_t = s].    (5)

It is required to identify the optimal policy that maximizes the rewards. Using the distribution of actions, we can define state and action value functions, which give the long-term value of states and the expected return from the current state (and action) when following the policy. The state value function is given by

    v_\pi(s) = E_\pi[G_t \mid S_t = s]    (6)

and the action value function is given by

    q_\pi(s, a) = E_\pi[G_t \mid S_t = s, A_t = a].    (7)

The optimal state/action value function is given as the maximum of the state/action value function.
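The state, its discretization, and the reward defined above can be sketched in a few lines of code. This is a minimal illustration, not the paper's implementation: the bin count, the toy predictions, and the use of the error magnitude for binning are assumptions.

```python
import numpy as np

def state(y_true, y_preds, weights):
    """Percentage prediction error of the weighted ensemble, per Eq. (2).
    The magnitude is taken so the value falls in [0, 100] for binning."""
    y_hat = np.dot(weights, y_preds)  # combined prediction, Eq. (1)
    return abs(y_true - y_hat) / abs(y_true) * 100.0

def discretize(s, n_bins=10):
    """Map the continuous error in [0, 100] to one of n_bins finite states."""
    edges = np.linspace(0.0, 100.0, n_bins + 1)
    return int(np.clip(np.digitize(s, edges) - 1, 0, n_bins - 1))

def reward(s_prev, s_curr):
    """Eq. (4): positive reward when the prediction error decreases."""
    return s_prev - s_curr

# Toy example with M = 3 models (all numbers are illustrative)
y_preds = np.array([9.0, 11.0, 10.5])  # individual model predictions at t
weights = np.array([0.2, 0.3, 0.5])    # actions: continuous, summing to 1
s_t = state(10.0, y_preds, weights)
print(discretize(s_t), reward(8.0, s_t))
```

The continuous action vector (the weights) is left unquantized, matching the assumption of an infinite action space, while only the state is binned.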
The objective of RL is to find the optimal policy, which generates the maximum total reward. There are two types of policy learning methods in the literature [7]: (i) off-policy learning, such as Q-learning, where the value function is learned from executing the actions of another policy, and (ii) on-policy learning, such as SARSA, where the value function is learned from executing actions of the same policy. For a detailed analysis of these methods, readers are referred to [7]. In this work, we use online policy learning to update the policy over the episodes.

Delving into the major challenges of using RL: in certain cases it is not trivial to arrive at a reward function, as the system can be a black-box model. In such cases, we resort to deep RL, where the reward function is modeled using a deep neural network. The deep network is trained with states as input and actions and rewards as output. The deep neural network solves a regression problem to estimate the best action, i.e., the one that generates a high reward for a given state. The regression formulation makes it possible to map the infinite action space, which translates to an infinite choice of weights. The weights of the network are updated based on the value predicted from the Bellman equation and the actual reward obtained. In this way, the proposed method identifies the weights of the ensemble.

The methodology of the weight updation using RL is illustrated in Fig. 1. The following steps are performed in order to compute the predictions using the proposed approach:

1) y ∈ R^N is divided into y_train ∈ R^K, used to train the ensemble models and the RL agent, and y_test ∈ R^{N-K}, used to test the predictions y_p.
2) Fit M different models to y_train and estimate the predictions \hat{y}_i, for all i ∈ {1, 2, ..., M}, for the N - K samples.
3) Formulate the RL problem and use episodic RL learning with P episodes by dividing y_train as [y_train,1, y_train,2, ..., y_train,P]^T, where y_train,i ∈ R^{K_b} and K_b = K/P, K_b ∈ Z. The selection of K_b depends on the size of the training data.
4) The output of the RL model is a dynamic set of weights for the ensemble models at each instant of the training data in each episode. Reset the system after each episode.
5) Use the learned RL model to predict the first sample of the test data and compute the prediction error.
6) Input the error to the RL model and obtain the new set of weights, which is used to predict the next instant of the test data.
7) Repeat steps 5-6 until all the test data is predicted.

Fig. 1. Proposed methodology using Reinforcement Learning

The proposed method can be interpreted as an online model which updates the weights of the ensemble models at every instant of time. The weights of the ensemble models are updated based not only on the previous instant's prediction error but also on the instants before that. The weights of the ensemble models can be seen as a non-linear function of the prediction errors of previous instants, and this function is continuously updated at every instant. To demonstrate the efficacy of the method, we compare the results of the proposed method with another popular non-linear approximator, a neural network (NN), used to update the weights of the ensemble models. More details on the results are discussed in the next section.

III. RESULTS AND DISCUSSION
The proposed approach has been tested on the benchmark CATS (Competition on Artificial Time Series) dataset [10], which was released as part of the IJCNN 2004 time series prediction competition. The total length of the time series is 5,000 samples, out of which 100 data points are considered missing; these missing samples are grouped into 5 blocks: 981 to 1,000, 1,981 to 2,000, 2,981 to 3,000, 3,981 to 4,000, and 4,981 to 5,000. The goal of this problem is to predict those 5 × 20 missing data points. Since we solve this problem with a time series forecasting approach, we predict these samples using only the previous samples. For comparison, we also predict the missing values using the online learning NN method, where the weights are updated at every sample, since the proposed RL method also updates the weights at every sample. In general, this may not be as efficient as updating the weights for a batch of samples, which can lower the computational load, because the features of the data may not change from sample to sample. However, this is beyond the scope of this work and can be addressed by adding the batch size as one of the hyperparameters of the problem.

The ensemble of models selected for this data comprises a linear regression model, a Long Short-Term Memory (LSTM) model, an Artificial Neural Network (ANN), and a Random Forest. These models were selected to capture linear trends if the data is linear (linear regression), lower-order non-linearity (ANN with 2 hidden layers), past-data effects (LSTM), and good performance with little interpretation effort (Random Forest).

As discussed, 20 data points are missing in every 1,000 samples, and hence we consider every 980 samples as training data and the subsequent 20 samples as test data. This means we have 5 × 980 samples as training data, and the intermediate missing 5 × 20 samples form the test data. For each set of 980 samples, we fit the different models mentioned above.
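Building the individual ensemble members described above might look like the following sketch. The model choices and hyperparameters here are illustrative scikit-learn stand-ins, not the paper's exact architectures; the LSTM member is omitted to keep the sketch lightweight but would be added analogously with a deep learning library, and the synthetic series merely stands in for the CATS data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import RandomForestRegressor

def make_lagged(series, n_lags=5):
    """Turn a univariate series into (X, y) pairs for one-step-ahead forecasting."""
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    y = series[n_lags:]
    return X, y

rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 20, 300)) + 0.1 * rng.standard_normal(300)
X, y = make_lagged(series)

# Three of the four ensemble members (linear, ANN, random forest);
# each is fit on the training block and predicts the held-out tail.
models = {
    "linear": LinearRegression(),
    "ann": MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=2000, random_state=0),
    "rf": RandomForestRegressor(n_estimators=50, random_state=0),
}
preds = {}
for name, model in models.items():
    model.fit(X[:250], y[:250])
    preds[name] = model.predict(X[250:])  # per-model predictions to be weighted by the RL agent
```

The per-model predictions collected in `preds` play the role of \hat{y}_i[t] in Eq. (1); the RL agent would then supply the weights combining them at each instant.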
Similarly, we fit models for every set of training data to obtain five sets of the 4 different types of models. The parameters of the models for the different sets of data are averaged to obtain the different types of models representing the entire training data.

Coming to the computational details of the problem, each block of training samples is considered a single episode in RL, and the RL agent is run for a number of episodes equivalent to several passes through the entire training data. The reward obtained across each episode is summed, and the total reward is plotted against episodes in Fig. 2. It can be seen from Fig. 2 that as the number of episodes increases, the reward obtained also tends to increase. This is because the system is learning to modify the coefficients (weights) such that the prediction error decreases with every episode.

Fig. 2. Smoothed reward function over different episodes

Though the purpose of this work is not to use different models for different regions of data, it was observed that different models performed better in different regions. The performance is decided by observing the sum of the weights given to the various models in different bands of data; the model with the maximum weight in each band is considered the dominant model. It was observed that the Neural Network model worked better in the first band of samples, whereas the Linear model worked better in the second band. Similarly, the LSTM model was better in the third band, the Neural Network model was again dominant in the fourth, and, finally, the Random Forest performed well in the last band. Not all of these model identifications are visually evident from the data, but a few can be inferred. For instance, it can be observed from Fig. 3 that the region where the Linear model dominates exhibits a roughly linear trend, and hence the Linear model performing well in that region is anticipated.

As explained at the beginning of the section, the weights for the purpose of comparison have been obtained using an online NN. To explain, the same set of ensemble models is built for each training block, and the corresponding model coefficients are averaged to produce a single model of each type. We then formulate a NN with two hidden layers of 4 nodes each with tanh activation, 4 input nodes corresponding to the number of ensemble models, and 4 output nodes giving the weights of the models. The network is initialized with random weights, and stochastic gradient descent is used to update the weights of the model. The loss function takes the estimated output (the weights of the ensemble model), computes the prediction at that instant, and compares it with the true data at that instant. The update happens at every sample, and with every sample we obtain a new set of weights, which is used to predict the next sample. For the test data, the final updated set of weights is used to predict each sample, in contrast to the RL technique, where the set of weights is updated even while progressing through the test data. This is due to the online updating nature of the RL method.

Fig. 3. Comparison of predictions of true data to the RL based proposed approach and Online NN

TABLE I
NMSE SCORES FOR DIFFERENT MODELS

Model              | NMSE score
-------------------|-----------
LSTM               | 0.478
ANN                | 0.26
Linear regression  | 0.75
Random Forest      | 0.84
Online NN          | 0.256
Proposed approach  | 0.143

Figure 3 compares the true values of the time series data with the predictions obtained from the proposed methodology and the online NN. From the plot, it is evident that the proposed methodology gives predictions closer to the true data when compared with the online NN approach. To quantify the performance, we used the NMSE metric to compare both methods. The metric is computed as

    NMSE = \frac{\sum_{t=1}^{n_{test}} (y_t - \hat{y}_t)^2}{\sum_{t=1}^{n_{test}} y_t^2}    (8)

where n_test is the number of test data samples, and \hat{y}_t and y_t are the predicted and true values of the time series at time instant t. To check the usability of these models on the testing data, we computed the NMSE only for the testing samples. Table I provides the NMSE scores obtained from the two methods, along with the individual model performances on the test data. From the table it is evident that the proposed ensemble approach with RL outperforms the individual models as well as the online NN method in terms of NMSE scores. It can also be observed from Table I that the ANN model has the highest accuracy among the individual models. The proposed method achieves a better NMSE of 0.143, which could be improved further by tuning the hyperparameters of the RL model. For a clear picture of the performance of the two methods on the test data, Fig. 4 has been included. Note that the index in Fig. 4 refers to the indices of the missing samples, amounting to 100 samples.

Fig. 4. Comparison on the test data (true data, proposed approach, and ensemble method)

The computational complexity of the proposed method is equivalent to that of its online NN counterpart. In this case study, computing the weights with the RL based technique and estimating the weights with the online NN method each took a fraction of a second on average. These results were obtained by running the code on a local machine with macOS, 16 GB RAM, and an i9 processor with no GPU. From the computation time taken, it can be seen that the proposed method can be employed in online applications even with a sampling time of less than 0.2 seconds. From the results, it can be concluded that the proposed method with RL gives good and accurate predictions when compared with the online counterparts.

IV. CONCLUSIONS
In this paper, a method based on RL is proposed to estimate the weights of an ensemble model dynamically for time series forecasting. Simulation results showed that the proposed method performs well when compared to existing methods that use NN based dynamic weighting. Further, the proposed method can handle local features of the time series data, since the weights are computed dynamically. An added advantage of the proposed method is that the user can identify the predominant model at any instant of time based on the computed weights. The NMSE scores for the different models on the benchmark CATS data set illustrate that the proposed RL-based approach outperforms the existing static weighting methods and online learning methods.

Future work includes the extension of the proposed method to the modeling and forecasting of multivariate time series. Modeling multivariate time series is not an easy task due to the existence of multicollinearity and cross-correlation among predictors. A possible improvement to the proposed method for modeling univariate time series is that the number of RL episodes could be reduced using a better reward function, which is also of interest for future work.

REFERENCES

[1] Lennart Ljung. System identification. Wiley Encyclopedia of Electrical and Electronics Engineering, pages 1–19, 1999.
[2] Minju Kim, Kosuke Nishi, Kandhasamy Sowndhararajan, and Songmun Kim. A time series analysis to investigate the effect of inhalation of aldehyde c10 on the human EEG activity. European Journal of Integrative Medicine, 25:20–27, 2019.
[3] Yongjian Wang and Hongguang Li. A novel intelligent modeling framework integrating convolutional neural network with an adaptive time-series window and its application to industrial process operational optimization. Chemometrics and Intelligent Laboratory Systems, 179:64–72, 2018.
[4] Dazhi Jiang, Jian Gong, and Akhil Garg. Design of early warning model based on time series data for production safety. Measurement, 101:62–71, 2017.
[5] G. Peter Zhang and V. L. Berardi. Time series forecasting with neural network ensembles: an application for exchange rate prediction. Journal of the Operational Research Society, 52(6):652–664, 2001.
[6] Daijin Kim and Chulhyun Kim. Forecasting time series with genetic fuzzy predictor ensemble. IEEE Transactions on Fuzzy Systems, 5(4):523–535, 1997.
[7] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 2018.
[8] Ioannis Partalas, Grigorios Tsoumakas, Ioannis Katakis, and Ioannis Vlahavas. Ensemble pruning using reinforcement learning. In Grigoris Antoniou, George Potamias, Costas Spyropoulos, and Dimitris Plexousakis, editors, Advances in Artificial Intelligence, pages 301–310, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg.
[9] Christos Dimitrakakis. Ensembles for sequence learning, 01 2007.
[10] Amaury Lendasse, Erkki Oja, Olli Simula, and Michel Verleysen. Time series prediction competition: The CATS benchmark.