Optimal Prediction Intervals for Macroeconomic Time Series Using Chaos and NSGA II
Vangala Sarveswararao, Vadlamani Ravi [0000-0003-0082-6227], Sheik Tanveer Ul Huq
Centre of Excellence in Analytics, Institute for Development and Research in Banking Technology, Castle Hills Road No. 1, Masab Tank, Hyderabad-500057, India
SCIS, University of Hyderabad, Hyderabad-500046, India
Abstract
In a first-of-its-kind study, this paper formulates the construction of prediction intervals (PIs) for a time series as a bi-objective optimization problem and solves it with the Nondominated Sorting Genetic Algorithm (NSGA-II). We also propose modelling the chaos present in the time series as a preprocessing step, in order to capture the deterministic uncertainty present in the series. Although the proposed models are general-purpose, they are used here to quantify the uncertainty in macroeconomic time series forecasting. Ideal PIs should be as narrow as possible while capturing most of the data points. Based on these two objectives, we formulate a bi-objective optimization problem to generate PIs in two stages: the phase space is reconstructed using chaos theory (stage 1), and optimal point predictions are generated using NSGA-II, which are in turn used to obtain the PIs (stage 2). We also propose a 3-stage hybrid, whose third stage invokes NSGA-II again to construct the PIs from the point predictions obtained in the second stage. Applied to macroeconomic time series, the proposed models yielded better results in terms of both prediction interval coverage probability (PICP) and prediction interval average width (PIAW) than the state-of-the-art Lower Upper Bound Estimation (LUBE) method trained with gradient descent (GD). The 3-stage model yielded better PICP than the 2-stage model but showed similar performance in PIAW, at the added computational cost of running NSGA-II a second time.
Keywords:
Prediction Intervals, Macroeconomic time series, Bi-objective optimization, NSGA-II, LSTM, LUBE Method
1. Introduction
Even though neural networks (NNs) have shown impressive performance in terms of prediction accuracy, in many real-world applications the uncertainty of each prediction must also be quantified, as there is a large downside to making an incorrect prediction in areas such as finance, weather, traffic, manufacturing, energy networks and prognostics. Krzywinski & Altman [1] and Gal [2] experimented on NNs to meet this requirement. Prediction intervals (PIs) communicate this uncertainty directly by offering a lower and an upper bound for each prediction, and this information helps us make better-informed decisions.

(Corresponding author: Tel.: +914023294310; Fax: +914023534551; E-mail: [email protected])

A variety of approaches have been developed, ranging from fully Bayesian NNs [3] to interpreting dropout as performing variational inference [4]. However, these methods require either strong assumptions or high computational effort. Khosravi et al. [5] developed the Lower Upper Bound Estimation (LUBE) method, which uses an NN to generate PIs; however, its loss function is incompatible with gradient descent (GD) for training. Later, Pearce et al. [6] developed a new loss function which is compatible with GD and outperformed the LUBE-based state-of-the-art methods. In this work, we formulate the construction of PIs as a multi-objective optimization problem and solve it with NSGA-II [7]. PIs should capture as many data points as possible while keeping the width of the interval narrow. Accordingly, the quality of PIs is mainly assessed using two measures, viz., prediction interval coverage probability (PICP) and prediction interval average width (PIAW) [8] [9] [10]. We use these two objectives and invoke NSGA-II to quantify the uncertainty of each prediction. First, we estimate point predictions using NSGA-II, minimizing the forecasting error as one objective and maximizing directional symmetry as the other; we then formulate two equations to compute the lower and upper bounds from these point predictions. These equations involve two random numbers. In the 2-stage method, we use grid search to find the two optimal random numbers to be used in the equations. The 3-stage method, proposed in two variants, instead invokes NSGA-II with PICP as one objective and PIAW as the other to obtain the optimal combination of random numbers.
The chief advantages of the proposed methods are their intuitive objectives and low computational cost compared to the LUBE method, which requires training an NN with the help of evolutionary algorithms. The rest of the paper is organized as follows: Section 2 presents the literature review; Section 3 presents an overview of the techniques used; Section 4 presents our proposed models in detail; Section 5 presents the experimental methodology; Section 6 presents a discussion of the results; and finally, Section 7 concludes the paper and presents future directions.
2. Literature survey
Tibshirani [11], Papadopoulos et al. [8] and Khosravi et al. [9] are the pioneers who worked on quantifying uncertainty in regression with neural networks. While the first focuses on confidence intervals (CIs), the latter works focus specifically on forecasting prediction intervals. Three primary methods have been presented. The Delta method [12] follows the theory for constructing CIs used by non-linear regression models; it is computationally expensive as it requires the Hessian matrix. Mean-Variance Estimation (MVE) (Nix and Weigend [13]) uses a neural network with two output nodes, one representing the mean and the other the variance of a normal distribution, which allows the variance of the data noise to be estimated; the negative log-likelihood of the predictive distribution of the given data is used as the loss function. The Bootstrap method (Heskes [14]) trains multiple neural networks on different resampled versions of the training data with different parameter initializations; it can easily be combined with MVE to estimate the total variance. Lakshminarayanan et al. [15] improved on the work of Heskes [14] by ensembling individual MVE neural networks, resampling the training set and including adversarial training examples, which they refer to as the MVE Ensemble. The Lower Upper Bound Estimation (LUBE) method was developed by Khosravi et al. [5] based on the principle of high-quality prediction intervals; it used a neural network to construct prediction intervals, with the parameters of the NN estimated using the Simulated Annealing algorithm. Various non-gradient methods, such as Genetic Algorithms (Ak et al. [16]), Gravitational Search Algorithms (Lian et al. [17]), Particle Swarm Optimization (Galvan et al. [10]; Wang et al. [18]), Extreme Learning Machines (Sun et al. [19]), and Artificial Bee Colony Algorithms (Shen et al. [20]), were proposed to improve the predictions.
The LUBE method has been used in many applications, including predicting energy load (Wan et al. [21]; Quan et al. [22]), wind speed ([18] [16]), landslide displacement ([17]), solar energy ([10]) and others. Works on PIs for financial time series include Müller and Watson's [23] robust Bayes PI algorithm, which forecasts long-term prediction intervals for economic growth over a horizon of 10-75 years, and Chudý et al.'s [24] computational adjustments to the Bayesian and bootstrapping methods to improve coverage probability for eight macroeconomic indicators. Sarveswararao and Ravi [25] used an LSTM architecture in place of the NN in the LUBE method, as LSTMs are known for their capability to learn complex sequential information, and outperformed the LUBE method with GD for constructing PIs for CPI inflation. Several works have been published on inflation forecasting, as it plays an important role in monetary policy formulation in many countries [26] [27] [28] [29]; these works computed point predictions of inflation without any PIs. Pradeepkumar & Ravi [31] worked on FOREX rate prediction and improved its accuracy by modelling chaos before applying any forecasting algorithm, and Ravi et al. [30] proposed hybrids of neural networks and evolutionary algorithms, quantile regression random forests [32], and multivariate adaptive regression splines [33] for forecasting the same. Works on generating PIs without NNs include Kumar et al.'s [34] MapReduce-based fuzzy very fast decision tree and Ravi et al.'s [35] hybrid of support vector machine and quantile regression random forest. Krishna and Ravi [36] presented a comprehensive survey of evolutionary computing (EC) algorithms applied to Customer Relationship Management (CRM), focusing on the use of various EC algorithms to solve simple to complex analytical CRM tasks, which turn out to be data mining tasks. So far, Evolutionary Multi-Objective (EMO) algorithms have been conspicuous by their absence in generating PIs.
Instead, they have been employed to train the NNs which in turn generate PIs, as in the LUBE method. Our work focuses mainly on constructing PIs for an important macroeconomic time series, namely the Consumer Price Index, using NSGA-II without any NNs.
3. Overview of the techniques used
In the late 1800s, the theory of chaos was proposed by Poincaré and later extended by Lorenz [37] in 1963 to deal with unpredictable complex nonlinear systems [38]. A chaotic system is deterministic and dynamic, evolves from given initial conditions, and can be described by trajectories in the state space. As the governing equations of a chaotic system are not known in advance, the state space is represented by a phase space, which can be reconstructed from the original time series. This reconstructed phase space provides a multi-dimensional view of the original time series [38]. Packard et al. [39] proposed reconstructing the phase space using the method of delays, whereby for a time series $X_i$, $i = 1, 2, 3, \dots, n$, the phase space can be constructed from $m$-dimensional vectors as in Eq. (1):

$Y_i = (x_i, x_{i+\tau}, x_{i+2\tau}, \dots, x_{i+(m-1)\tau})$    (1)

where $\tau$ is the delay time (lag) of the system and $m$ is the embedding dimension used to reconstruct the phase space. After the reconstruction of the phase space, the time series problem is converted into a multi-input single-output (MISO) prediction problem, which can be modelled by methods ranging from linear models to deep neural networks. Rosenstein's algorithm [40] (see Fig. 1) estimates the largest Lyapunov exponent ($\lambda$) [41] from the given time series. If $\lambda \geq 0$ then chaos is present; otherwise chaos is absent in the given time series.

Fig. 1. Rosenstein's method to calculate $\lambda$

Cao proposed a method [42] to find the minimum embedding dimension for a given time series. Let $X = (x_1, x_2, x_3, \dots, x_N)$ be a time series. In the phase space, the time series can be reconstructed as in Eq. (2):

$Y_i = (x_i, x_{i+\tau}, x_{i+2\tau}, \dots, x_{i+(m-1)\tau})$    (2)

where $Y_i$ is the i-th reconstructed vector and $\tau$ is the time delay. MOEAs were proposed to solve optimization problems involving two or more objectives. MOEAs preserve the non-dominated solutions while progressing algorithmically towards the optimal Pareto front and maintaining diversity along it. The decision maker can then select solutions from this optimal Pareto front [43]. Deb [44], Mukhopadhyay et al. [36, 37] and Coello Coello et al. [43] contributed some of the notable works on MOEAs.
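The delay embedding of Eqs. (1) and (2) is straightforward to implement; a minimal NumPy sketch (the function name and toy series below are ours, not from the paper):

```python
import numpy as np

def reconstruct_phase_space(x, tau, m):
    """Reconstruct the phase space of a 1-D series by the method of delays.

    Returns a matrix whose i-th row is (x_i, x_{i+tau}, ..., x_{i+(m-1)tau}),
    i.e. the delay vectors Y_i of Eq. (1).
    """
    x = np.asarray(x, dtype=float)
    n_vectors = len(x) - (m - 1) * tau   # number of complete delay vectors
    if n_vectors <= 0:
        raise ValueError("series too short for the chosen tau and m")
    return np.array([x[i : i + m * tau : tau] for i in range(n_vectors)])

# Toy example: the series 0..9 embedded with tau=1, m=3 turns the univariate
# series into a MISO design matrix (first m-1 columns -> inputs, last -> target).
Y = reconstruct_phase_space(np.arange(10), tau=1, m=3)
print(Y.shape)   # (8, 3)
print(Y[0])      # [0. 1. 2.]
```

In the paper's setting, tau and m would come from the autocorrelation function and Cao's method respectively.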
NSGA [109] is one of the well-known and effective algorithms for solving multi-objective optimization problems. Still, it attracted criticism for its large computational complexity, absence of elitism, requirement of niching, and the need to choose an optimal sharing parameter $\sigma_{share}$. Deb [7] modified the original NSGA to use elitism, a faster sorting algorithm and fewer parameters to be chosen in advance, and named this modified version NSGA-II. In NSGA-II, the population is initialized as in NSGA and then sorted hierarchically into fronts based on the objectives. The first front contains the non-dominated set of individuals of the whole population; the second front is dominated only by the first; the third front only by the first and second; and so on. After the sorting, the crowding distance is calculated for each candidate in a front; it measures how close each candidate is to the other members of the population in the objective space. A fitness value (rank) of 1 is given to members of the first front, 2 to members of the second front, and so on. Based on their rank and crowding distance, non-dominated members of the population are selected for further processing. An offspring population (Qt) is generated from the parent population (Pt) by crossover and mutation, and the two are combined to select the best N members via non-dominated sorting. This process of generating a new population continues until a termination condition is reached. At every generation, all the best solutions from previous generations are carried forward, which ensures elitism. The new generation (Pt+1) is obtained by merging fronts (Fi) one by one until the number of selected individuals would exceed the population size; at that point, crowding distance is used to select the best members of that front until N individuals are reached (see Fig. 2).

Fig. 2.
Non-dominated Sorting Genetic Algorithm (NSGA-II)
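The two building blocks described above, non-dominated sorting and crowding distance, can be sketched as follows. This is a simplified, library-free illustration assuming both objectives are minimized; the function names are ours, not from the paper:

```python
import numpy as np

def non_dominated_sort(F):
    """Sort points (rows of F, both objectives minimized) into Pareto fronts."""
    n = len(F)
    fronts, assigned = [], np.zeros(n, dtype=bool)
    while not assigned.all():
        front = []
        for i in range(n):
            if assigned[i]:
                continue
            # i is dominated if some other unassigned point is no worse in
            # every objective and strictly better in at least one
            dominated = any(
                not assigned[j] and j != i
                and np.all(F[j] <= F[i]) and np.any(F[j] < F[i])
                for j in range(n)
            )
            if not dominated:
                front.append(i)
        for i in front:
            assigned[i] = True
        fronts.append(front)
    return fronts

def crowding_distance(F):
    """Crowding distance of each point within one front (larger = less crowded)."""
    n, m = F.shape
    d = np.zeros(n)
    for k in range(m):
        order = np.argsort(F[:, k])
        d[order[0]] = d[order[-1]] = np.inf   # boundary points are always kept
        span = F[order[-1], k] - F[order[0], k] or 1.0
        for a in range(1, n - 1):
            d[order[a]] += (F[order[a + 1], k] - F[order[a - 1], k]) / span
    return d

F = np.array([[1.0, 4.0], [2.0, 2.0], [4.0, 1.0], [3.0, 3.0]])
print(non_dominated_sort(F))   # [[0, 1, 2], [3]]
```

A production NSGA-II would use the faster bookkeeping of Deb et al. [7]; this quadratic version only illustrates the definitions.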
4. Proposed models
We can formulate generating PIs as a bi-objective optimization problem as follows:
Minimize O1 = Symmetric Mean Absolute Percentage Error (SMAPE) and Maximize O2 = Directional Symmetry statistic (DS) [47], where SMAPE and DS are defined in Eqs. (3) and (4) respectively:

$SMAPE = \frac{100\%}{n} \sum_{t=1}^{n} \frac{|Y_t - \hat{Y}_t|}{(|Y_t| + |\hat{Y}_t|)/2}$    (3)

$DS(A, F) = \frac{100}{n-1} \sum_{i=2}^{n} d_i, \qquad d_i = \begin{cases} 1, & \text{if } (Y_i - Y_{i-1})(\hat{Y}_i - \hat{Y}_{i-1}) > 0 \\ 0, & \text{otherwise} \end{cases}$    (4)

where $Y_t$ is the actual value, $\hat{Y}_t$ is the predicted value and $n$ is the number of data points. In forecasting financial time series, minimizing SMAPE is important as it ignores the scale of the underlying data while minimizing the average prediction error. Equal importance has to be given to the directional symmetry statistic, as it helps in predicting the movement of the series. Let
$Y = (y_1, y_2, y_3, \dots, y_N)$ be a time series of size N. Prediction intervals are constructed using the 2-stage model as follows.

Stage 1: Reconstructing the phase space. Check whether Y contains chaos. If the test is positive, rebuild the phase space of Y with the help of the time delay ($\tau$) and embedding dimension ($m$). Partition Y into $Y_{Train} = \{y_t;\ t = \tau m + 1, \tau m + 2, \dots, h\}$ and $Y_{Test} = \{y_t;\ t = h+1, h+2, \dots, N\}$.

Stage 2: Constructing optimal prediction intervals.
Obtain optimal coefficients for an auto-regression model using NSGA-II, where the auto-regression coefficients are the decision variables, constrained to the range (-0.5, 0.5), with SMAPE and DS as the objective functions to be minimized and maximized respectively. Using those coefficients, obtain the predictions for both the training and test sets. These predictions are then used to compute the lower and upper bounds (i.e., the prediction intervals) for the series with the help of Eqs. (5) and (6):

$\hat{Y}_t = \alpha_0 + \alpha_1 y_{t-\tau} + \alpha_2 y_{t-2\tau} + \alpha_3 y_{t-3\tau} + \dots + \alpha_m y_{t-m\tau}$

where $\hat{Y}_t$ is the point prediction at time t, $(\alpha_0, \alpha_1, \alpha_2, \dots, \alpha_m)$ are the decision variables (auto-regression coefficients), and $(y_{t-\tau}, y_{t-2\tau}, y_{t-3\tau}, \dots, y_{t-m\tau})$ are the input features after chaos modelling.

$\hat{Y}_{lower_t} = \hat{Y}_t - r_1 \cdot \sigma(\hat{Y}_{train})$    (5)
$\hat{Y}_{upper_t} = \hat{Y}_t + r_2 \cdot \sigma(\hat{Y}_{train})$    (6)

where $r_1$ and $r_2$ are two random numbers following a uniform distribution on (0, 1), whose values we find by grid search so as to obtain optimal PICP and PIAW; $(\hat{Y}_{lower_t}, \hat{Y}_{upper_t})$ are the lower and upper bounds for $\hat{Y}_t$; and $\sigma(\hat{Y}_{train})$ is the standard deviation of the point predictions on the training set. The schematic of the 2-stage model is depicted in Fig. 3.

Fig. 3.
Schematic of the 2-stage model
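The second half of stage 2, i.e., the grid search over (r1, r2) in Eqs. (5) and (6), can be sketched as follows. The data are synthetic stand-ins, and the rule used to pick a single (r1, r2) from the grid (narrowest interval meeting a coverage target) is our own assumption, since the text does not specify how the grid search trades off PICP against PIAW:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the paper's quantities (illustration only)
y_true = np.sin(np.linspace(0, 6, 60)) + rng.normal(0, 0.1, 60)  # actuals
y_hat  = np.sin(np.linspace(0, 6, 60))                           # stage-2 point predictions
sigma  = np.std(y_hat)                                           # sigma(Y_hat_train)

def pi_quality(r1, r2):
    """PICP and PIAW for the bounds of Eqs. (5)-(6)."""
    lower, upper = y_hat - r1 * sigma, y_hat + r2 * sigma
    picp = np.mean((y_true >= lower) & (y_true <= upper))
    piaw = np.mean(upper - lower)
    return picp, piaw

# Grid search over (r1, r2) in (0, 1): keep the narrowest interval whose
# coverage reaches a target (an assumed tie-breaking rule).
best = None
for r1 in np.linspace(0.05, 1.0, 20):
    for r2 in np.linspace(0.05, 1.0, 20):
        picp, piaw = pi_quality(r1, r2)
        if picp >= 0.95 and (best is None or piaw < best[2]):
            best = (r1, r2, piaw, picp)

print(best is not None)   # True
```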
Let
$Y = (y_1, y_2, y_3, \dots, y_N)$ be a time series of size N. Prediction intervals for CPI are constructed using the three-stage model as follows; here NSGA-II is invoked twice.

Stage 1: Reconstructing the phase space. Check whether Y contains chaos. If the test is positive, rebuild the phase space of Y with the help of the time delay ($\tau$) and embedding dimension ($m$). Partition Y into $Y_{Train} = \{y_t;\ t = \tau m + 1, \tau m + 2, \dots, h\}$ and $Y_{Test} = \{y_t;\ t = h+1, h+2, \dots, N\}$.

Stage 2: Obtaining optimal auto-regression coefficients.
Obtain optimal coefficients for an auto-regression model using NSGA-II, where the auto-regression coefficients are the decision variables, constrained to the range (-0.5, 0.5), with SMAPE and DS as the objective functions to be minimized and maximized respectively. Using those coefficients, obtain the predictions for both the training and test sets. These predictions are then used to obtain the lower and upper bounds (i.e., the prediction intervals) for the series with the help of Eqs. (6) and (7):

$\hat{Y}_t = \alpha_0 + \alpha_1 y_{t-\tau} + \alpha_2 y_{t-2\tau} + \alpha_3 y_{t-3\tau} + \dots + \alpha_m y_{t-m\tau}$

where $\hat{Y}_t$ is the point prediction at time t, $(\alpha_0, \alpha_1, \alpha_2, \dots, \alpha_m)$ are the decision variables (auto-regression coefficients), and $(y_{t-\tau}, y_{t-2\tau}, y_{t-3\tau}, \dots, y_{t-m\tau})$ are the input features after chaos modelling.

Stage 3: Constructing prediction intervals.
In this final stage, instead of performing a grid search to find the optimal random numbers $r_1$ and $r_2$, we invoke NSGA-II to optimize PICP and PIAW explicitly:

$\hat{Y}_{lower_t} = \hat{Y}_t - r_1 \cdot \sigma(\hat{Y}_{train})$    (6)
$\hat{Y}_{upper_t} = \hat{Y}_t + r_2 \cdot \sigma(\hat{Y}_{train})$    (7)

where $r_1$ and $r_2$ are the decision variables in the range (0, 1), for which NSGA-II finds optimal values; $(\hat{Y}_{lower_t}, \hat{Y}_{upper_t})$ are the lower and upper bounds for $\hat{Y}_t$; and $\sigma(\hat{Y}_{train})$ is the standard deviation of the point predictions on the training set. Thus, NSGA-II is applied in both the 2nd and 3rd stages. In the 3rd stage, we have two variants: (i) a single random number and (ii) two distinct random numbers. In the first variant, NSGA-II finds a single random number and the PIs are obtained under the assumption $r_1 = r_2 = r$, whereas the 2nd variant assumes they are distinct. The optimization problem is as follows: Maximize O1 = PICP and Minimize O2 = PIAW, where

$PICP = \frac{c}{n}, \qquad PIAW = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_{U_i} - \hat{y}_{L_i})$

c is the number of data points out of the total n points captured by the prediction interval, and $\langle \hat{y}_{L_i}, \hat{y}_{U_i} \rangle$ are the predicted lower and upper bounds. The schematic of the 3-stage model is depicted in Fig. 4.

Fig. 4.
Schematic of the 3-stage model
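As a library-free stand-in for the stage-3 NSGA-II run, the sketch below evaluates PICP and PIAW for sampled (r1, r2) candidates and retains the non-dominated set, approximating the Pareto front from which the decision maker would choose. The sampling scheme and all names are ours; an actual run would use NSGA-II's selection, crossover and mutation instead of plain random sampling:

```python
import numpy as np

rng = np.random.default_rng(1)

y_true = np.cos(np.linspace(0, 5, 50)) + rng.normal(0, 0.15, 50)  # actuals (synthetic)
y_hat  = np.cos(np.linspace(0, 5, 50))                            # stage-2 predictions
sigma  = np.std(y_hat)

def objectives(r1, r2):
    """(PICP, PIAW) for the bounds Y_hat - r1*sigma and Y_hat + r2*sigma."""
    lower, upper = y_hat - r1 * sigma, y_hat + r2 * sigma
    return np.mean((y_true >= lower) & (y_true <= upper)), np.mean(upper - lower)

# Sample candidate (r1, r2) pairs in (0, 1)^2 and keep the non-dominated ones:
# a dominates b iff PICP_a >= PICP_b and PIAW_a <= PIAW_b, with one strict.
cands = [(r1, r2, *objectives(r1, r2)) for r1, r2 in rng.uniform(0.01, 1.0, (200, 2))]
front = [a for a in cands
         if not any(b[2] >= a[2] and b[3] <= a[3] and (b[2] > a[2] or b[3] < a[3])
                    for b in cands)]
print(len(front) >= 1)   # True: the non-dominated set of a finite set is never empty
```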
5. Experimental Design
We used the Consumer Price Index (CPI) inflation data of Food & Beverages, Fuel & Light and Headline from the Ministry of Statistics and Programme Implementation (MoSPI), Government of India. The dataset contains monthly inflation from January 2012 to December 2018, presented in Fig. 5. Summary statistics of this dataset, presented in Table 1, indicate that inflation in India is highly volatile. Table 2 presents the Lyapunov exponent values as well as the minimum embedding dimension for each CPI series. Table 1:
Summary Statistics of the Datasets.
Indicator            CPI Food & Beverages   CPI Fuel & Light   CPI Headline
Mean
Standard deviation
Maximum
Minimum              -1.69                  2.49               1.46
Table 2:
Chaotic Modelling of Macroeconomic Variables
Series Lyapunov Exponent Delay Time Embedding Dimension
CPI Headline Inflation           0.074   1   8
CPI Food & Beverages Inflation   0.062   1   8
CPI Fuel & Light Inflation       0.060   1   7
Fig. 5.
CPI Headline and its components: monthly inflation from 2012M01 to 2018M12. We considered the last 6 months' data as the test set and the preceding data as the training set (see Table 3).
Table 3:
Train – Test Split for Data sets
Series Train set Test set
CPI Headline           Jan'12 – Jun'18   Jul'18 – Dec'18
CPI Food & Beverages   Jan'12 – Jun'18   Jul'18 – Dec'18
CPI Fuel & Light       Jan'12 – Jun'18   Jul'18 – Dec'18
The proposed models were executed on an Ubuntu 18.04 LTS platform with 32 GB RAM and a 1 TB HDD; however, they can also be run on other platforms. Table 4 catalogues the various tools employed in this study.

Table 4:
Tools and Techniques used for Proposed Model
Technique used Used for Tool used
Rosenstein's Method         Calculating the Lyapunov exponent            Python
Cao's Method                Finding the minimum embedding dimension      R
Auto-Correlation Function   Finding the optimal lag / time delay         Python
NSGA-II                     Obtaining the coefficients of the AR model   Python
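The autocorrelation-function step in Table 4 can be illustrated as follows; the 1/e cutoff used to pick the delay is a common heuristic that we assume here, since the paper does not state its exact criterion:

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation function up to max_lag (lag 0 is 1 by definition)."""
    x = np.asarray(x, float)
    x = x - x.mean()
    var = np.dot(x, x)
    return np.array([1.0] + [np.dot(x[:-k], x[k:]) / var for k in range(1, max_lag + 1)])

def optimal_delay(x, max_lag=20, threshold=1/np.e):
    """Assumed heuristic: first lag where the ACF falls below 1/e."""
    a = acf(x, max_lag)
    below = np.where(a < threshold)[0]
    return int(below[0]) if below.size else max_lag

# A strongly autocorrelated AR(1)-like series: the ACF decays slowly,
# so the chosen delay is at least 1.
rng = np.random.default_rng(0)
x = np.zeros(200)
for t in range(1, 200):
    x[t] = 0.9 * x[t - 1] + rng.normal()
print(optimal_delay(x) >= 1)   # True
```

For the CPI series studied here, Table 2 reports that this step yields a delay of 1 for all three series.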
We have considered two metrics used in PIs literature for measuring the performance of the prediction intervals, namely, PICP and PIAW.
Prediction Interval Coverage Probability (PICP) gives the percentage of data points lying between the lower and upper bounds, i.e., within the prediction interval, as in Eq. (6):

$PICP = \frac{c}{n}$    (6)

where c is the total number of points captured by the intervals and n is the total number of points.

Prediction Interval Average Width (PIAW) gives the average width between the lower and upper bounds, as in Eq. (7):

$PIAW = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_{U_i} - \hat{y}_{L_i})$    (7)

where $\hat{y}_{U_i}$ and $\hat{y}_{L_i}$ are the predicted upper and lower bounds. All three inflation series turn out to be non-stationary when tested using the Augmented Dickey-Fuller test [48]. We do not account for seasonality in the series, as CPI inflation is assumed to contain no seasonality [26]. The CPI inflation series contain chaos, as confirmed by the Lyapunov exponent, so we have to model chaos before embarking on any forecasting algorithm.
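Eqs. (6) and (7) amount to a coverage rate and a mean interval width; a direct sketch (function names and the toy bounds are ours):

```python
import numpy as np

def picp(y, lower, upper):
    """Eq. (6): fraction of actual points falling inside their interval."""
    y, lower, upper = map(np.asarray, (y, lower, upper))
    return np.mean((y >= lower) & (y <= upper))

def piaw(lower, upper):
    """Eq. (7): average width of the prediction intervals."""
    return np.mean(np.asarray(upper) - np.asarray(lower))

y     = [1.0, 2.0, 3.0, 4.0]
lower = [0.5, 1.5, 3.5, 3.0]
upper = [1.5, 2.5, 4.5, 5.0]
print(picp(y, lower, upper))   # 0.75: the third point lies below its lower bound
print(piaw(lower, upper))      # 1.25
```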
6. Results and Discussion
The best-performing hyperparameters for all the experiments in this study are presented in Table 8. We compared our proposed models with the LUBE method with GD, as that model outperformed all non-gradient-based training algorithms for constructing prediction intervals with LUBE. All three proposed models outperformed LUBE + GD and LUBE + LSTM in terms of both PICP and PIAW when constructing PIs for the CPI inflation data. We ran the experiments 20 times with 20 different seeds to remove the effect of the seed value. Accordingly, we present the mean and standard deviation of the PICP and PIAW corresponding to the best solution of the population over the 20 experiments in Tables 5, 6 and 7, in the form mean ± std. dev.
6. 1. CPI Food and beverages inflation
Table 5 presents the PICP and PIAW values for the two-stage model (Chaos + NSGA-II{SMAPE, DS}) and the three-stage model (Chaos + NSGA-II{SMAPE, DS} + NSGA-II{PICP, PIAW}). According to the results, the 2-stage model improved roughly 2x over LUBE + GD and LUBE + LSTM in terms of the PICP metric and showed similar performance in terms of PIAW. By running NSGA-II a second time with PICP and PIAW as objectives in the 3-stage model, performance improved with respect to PICP but not PIAW, at the extra computational cost of a second NSGA-II run compared to the single run of the 2-stage model. Fig. 6 compares the predicted values of the two-stage model with those of the LUBE + GD model, revealing that the NSGA-II based 2-stage model yields prediction intervals that track the series much more closely than the LUBE + GD method for CPI Food & Beverages. Fig. 7 compares the performance of the two variants of the 3-stage model.
Table 5:
Results for CPI Food & Beverages inflation.
Method PICP PIAW
LUBE Method with GD                                  0.44 ± 0.17   2.60 ± 1.16
LUBE Method with LSTM [25]                           0.42 ± 0.11   2.37 ± 0.30
NSGA-II based 2-stage model (grid search)            0.91 ± 0.11   2.59 ± 0.27
NSGA-II based 3-stage model (single random number)   1.00 ± 0.00   2.57 ± 0.44
NSGA-II based 3-stage model (two random numbers)     1.00 ± 0.00
Fig. 6.
PIs for CPI Food & Beverages using the 2-stage model (left) and LUBE + GD (right)

Fig. 7. PIs for CPI Food & Beverages using the 3-stage model (single random no.) (left) and the 3-stage model (two random no.) (right)

Table 6 presents the PICP and PIAW values for the two-stage model (Chaos + NSGA-II{SMAPE, DS}) and the three-stage model (Chaos + NSGA-II{SMAPE, DS} + NSGA-II{PICP, PIAW}). According to the results, the NSGA-II based 2-stage model improved by 9.4% over LUBE + GD in terms of PICP and by 37% in terms of PIAW; compared with LUBE + LSTM, PICP showed similar performance and PIAW improved by 34%. The NSGA-II based 3-stage model improved by 7.5% with respect to PICP and by 11.12% in terms of PIAW, but the added benefits come at the extra computational cost of a second NSGA-II run compared to the single run of the 2-stage model. Fig. 8 compares the predicted values of the NSGA-II based 2-stage model with those of the LUBE + GD model, revealing that the 2-stage model yields prediction intervals that track the series much more closely than the LUBE + GD method for CPI Headline inflation. Fig. 9 compares the performance of the two variants of the 3-stage model.
Table 6:
Results for CPI Headline inflation
Method PICP PIAW
LUBE Method with Gradient Descent                    0.85 ± 0.14   3.00 ± 1.18
LUBE Method with LSTM [25]                           0.93 ± 0.13   2.54 ± 0.25
NSGA-II based 2-stage model (grid search)            0.93 ± 0.20   1.89 ± 0.12
NSGA-II based 3-stage model (single random number)   1.00 ± 0.00   1.68 ± 0.53
NSGA-II based 3-stage model (two random numbers)     1.00 ± 0.00   1.74 ± 0.28

Fig. 8.
PIs for CPI Headline using the 2-stage model (left) and LUBE + GD (right)

Fig. 9. PIs for CPI Headline using the 3-stage model (single random no.) (left) and the 3-stage model (two random no.) (right)

Table 7 presents the PICP and PIAW values for the two-stage model (Chaos + NSGA-II{SMAPE, DS}) and the three-stage model (Chaos + NSGA-II{SMAPE, DS} + NSGA-II{PICP, PIAW}). According to the results, the 2-stage model improved by 6.4% over LUBE + GD with respect to PICP and by 35.4% in terms of PIAW; compared with LUBE + LSTM, PICP showed similar performance and PIAW improved by 34%. By running NSGA-II a second time with PICP and PIAW as objectives, the model's performance improved by 20% in terms of PICP but deteriorated by 6% in terms of PIAW; this enhancement comes at the extra computational cost of a second NSGA-II run compared to the single run of the 2-stage model. Fig. 10 compares the predicted values of the two-stage model (Chaos + NSGA-II) with those of the LUBE + GD model, revealing that the 2-stage model yields prediction intervals that track the series much more closely than the LUBE + GD method for CPI Fuel & Light. Fig. 11 compares the performance of the two variants of the 3-stage model. Table 7:
Results for the CPI Fuel & Light PICP and PIAW
Method PICP PIAW
LUBE Method with Gradient Descent                    0.78 ± 0.09   2.60 ± 0.49
LUBE Method with LSTM [25]                           0.82 ± 0.08   2.58 ± 0.51
NSGA-II based 2-stage model (grid search)            0.83 ± 0.06   1.68 ± 0.26
NSGA-II based 3-stage model (single random number)   1.00 ± 0.00   1.78 ± 0.33
NSGA-II based 3-stage model (two random numbers)     1.00 ± 0.00   1.82 ± 0.40
Fig. 10.
PIs for CPI Fuel & Light using the 2-stage model (left) and LUBE + GD (right)
Fig. 11.
PIs for CPI Fuel & Light using the 3-stage model (single random no.) (left) and the 3-stage model (two random no.) (right)
It is interesting to note that the NSGA-II based 2-stage and 3-stage models outperformed the LSTM based models across all three datasets. This is primarily because the problem is now explicitly modelled as a bi-objective optimization problem; the superior exploration and exploitation abilities of NSGA-II also contributed immensely. We now discuss the results of the empirical attainment function (EAF), suggested by Fonseca et al. [49], who rightly argued that analyzing the Pareto-optimal fronts of every run of an EMO, which we constructed for all datasets in our study, is a challenging task. The EAF plot describes the probabilistic distribution of the outcomes generated by a stochastic algorithm in objective space [40, 41]. The EAF plots depicted in Figs. 12 through 16 show three types of attainment surfaces, namely the best, median and worst, for the 2-stage and 3-stage models.
Fig.12.
EAF plot for CPI Food & Beverages using the 2-stage model (left) and the 3-stage model with single random number (right)
Fig.13.
EAF plot for CPI Food & Beverages using the 3-stage model with two random numbers (left) and CPI Headline using the 2-stage model (right)
Fig.14.
EAF plot for CPI Headline inflation using the 3-stage model with single random number (left) and the 3-stage model with two random numbers (right)

Fig. 15.
EAF for CPI Fuel & Light inflation using the 2-stage model (left) and the 3-stage model with single random number (right)
Fig.16.
EAF for CPI Fuel & Light inflation using the 3-stage model with two random numbers

Table 8:
Hyperparameters used for different CPI data sets.
Method Hyper parameters Food & Beverages Fuel & Light Headline
LUBE with GD soften 20 20 20
7. Conclusions
This paper proposes novel two-stage and three-stage models, namely Chaos + NSGA-II{SMAPE, DS} and Chaos + NSGA-II{SMAPE, DS} + NSGA-II{PICP, PIAW}, for generating PIs for macroeconomic time series. The results in terms of PICP and PIAW on the test sets indicate that the proposed models outperformed LUBE + GD and LUBE + LSTM. The proposed models' intuitive objective functions and low computational costs are their key advantages over the LUBE method. The 3-stage models showed improvement over the 2-stage models in terms of PICP but lagged behind in terms of PIAW, with the extra computational cost of running NSGA-II a second time. Overall, the results of the proposed models are inspiring, and we recommend applying these NSGA-II based 2-stage and 3-stage models to other related economic and non-economic time series data.
References
[1] M. Krzywinski, N. Altman, Points of Significance: Error bars, Nat. Methods 10 (2013) 921–922. https://doi.org/10.1038/nmeth.2659.
[2] Y. Gal, Uncertainty in deep learning, Univ. Cambridge 1 (2016) 3.
[3] D.J.C. MacKay, A practical Bayesian framework for backpropagation networks, Neural Comput. 4 (1992) 448–472.
[4] Y. Gal, Z. Ghahramani, Dropout as a Bayesian approximation: Representing model uncertainty in deep learning, in: Int. Conf. Mach. Learn., 2016: pp. 1050–1059.
[5] A. Khosravi, S. Nahavandi, D. Creighton, A.F. Atiya, Lower Upper Bound Estimation Method for Construction of Neural Network-Based Prediction Intervals, IEEE Trans. Neural Networks 22 (2011) 337–346. https://doi.org/10.1109/TNN.2010.2096824.
[6] T. Pearce, M. Zaki, A. Brintrup, A. Neely, High-quality prediction intervals for deep learning: A distribution-free, ensembled approach, in: Proc. 35th Int. Conf. Mach. Learn. (ICML), 2018.
[7] K. Deb, A. Pratap, S. Agarwal, T. Meyarivan, A fast and elitist multiobjective genetic algorithm: NSGA-II, IEEE Trans. Evol. Comput. 6 (2002) 182–197. https://doi.org/10.1109/4235.996017.
[8] G. Papadopoulos, P.J. Edwards, A.F. Murray, Confidence estimation methods for neural networks: A practical comparison, IEEE Trans. Neural Networks 12 (2001) 1278–1287. https://doi.org/10.1109/72.963764.
[9] A. Khosravi, S. Nahavandi, D. Creighton, A.F. Atiya, Comprehensive review of neural network-based prediction intervals and new advances, IEEE Trans. Neural Networks 22 (2011) 1341–1356. https://doi.org/10.1109/TNN.2011.2162110.
[10] I.M. Galván, J.M. Valls, A. Cervantes, R. Aler, Multi-objective evolutionary optimization of prediction intervals for solar energy forecasting with neural networks, Inf. Sci. 418 (2017) 363–382.
[11] R. Tibshirani, A comparison of some error estimates for neural network models, Neural Comput. 8 (1996) 152–163. https://doi.org/10.1162/neco.1996.8.1.152.
[12] R.A. Dorfman, A note on the delta-method for finding variance formulae, Biometric Bull. 1 (1938) 129–137.
[13] D.A. Nix, A.S. Weigend, Estimating the mean and variance of the target probability distribution, in: IEEE Int. Conf. Neural Networks, IEEE, 1994: pp. 55–60. https://doi.org/10.1109/icnn.1994.374138.
[14] T. Heskes, Practical confidence and prediction intervals, in: M.C. Mozer, M.I. Jordan, T. Petsche (Eds.), Adv. Neural Inf. Process. Syst. 9, MIT Press, 1997: pp. 176–182. http://papers.nips.cc/paper/1306-practical-confidence-and-prediction-intervals.pdf.
[15] B. Lakshminarayanan, A. Pritzel, C. Blundell, Simple and scalable predictive uncertainty estimation using deep ensembles, in: Adv. Neural Inf. Process. Syst. 30, Curran Associates, Inc., 2017: pp. 6402–6413.
[16] R. Ak, Y.-F. Li, V. Vitelli, E. Zio, Multi-objective genetic algorithm optimization of a neural network for estimating wind speed prediction intervals, (2013).
[17] C. Lian, Z. Zeng, W. Yao, H. Tang, C.L.P. Chen, Landslide displacement prediction with uncertainty based on neural networks with random hidden weights, IEEE Trans. Neural Networks Learn. Syst. 27 (2016) 2683–2695. https://doi.org/10.1109/TNNLS.2015.2512283.
[18] J. Wang, K. Fang, W. Pang, J. Sun, Wind power interval prediction based on improved PSO and BP neural network, J. Electr. Eng. Technol. 12 (2017) 989–995.
[19] X. Sun, Z. Wang, J. Hu, Prediction interval construction for byproduct gas flow forecasting using optimized twin extreme learning machine, Math. Probl. Eng. 2017 (2017).
[20] Y. Shen, X. Wang, J. Chen, Wind power forecasting using multi-objective evolutionary algorithms for wavelet neural network-optimized prediction intervals, Appl. Sci. 8 (2018) 185.
[21] C. Wan, Z. Xu, P. Pinson, Z.Y. Dong, K.P. Wong, Optimal prediction intervals of wind power generation, IEEE Trans. Power Syst. 29 (2014) 1166–1174. https://doi.org/10.1109/TPWRS.2013.2288100.
[22] H. Quan, D. Srinivasan, A. Khosravi, Uncertainty handling using neural network-based prediction intervals for electrical load forecasting, Energy 73 (2014) 916–925.
[23] U. Müller, M.W. Watson, Measuring uncertainty about long-run predictions, Rev. Econ. Stud. 83 (2016) 1711–1740.
[24] M. Chudý, S. Karmakar, W.B. Wu, Long-term prediction intervals of economic time series, Empir. Econ. 58 (2020) 191–222. https://doi.org/10.1007/s00181-019-01689-2.
[25] V. Sarveswararao, V. Ravi, Generating prediction intervals for macroeconomic variables using LSTM based LUBE method, in: 2nd Int. Conf. Cybern. Cogn. Mach. Learn. Appl. (ICCCMLA), Goa, India, n.d.
[26] B. Pratap, S. Sengupta, Macroeconomic forecasting in India: Does machine learning hold the key to better forecasts?, RBI Working Paper Series, 2019.
[27] E. Nakamura, Inflation forecasting using a neural network, Econ. Lett. 86 (2005) 373–378. https://econpapers.repec.org/RePEc:eee:ecolet:v:86:y:2005:i:3:p:373-378.
[28] J.H. Stock, M.W. Watson, Why has U.S. inflation become harder to forecast?, J. Money, Credit Bank. 39 (2007) 3–33. https://doi.org/10.1111/j.1538-4616.2007.00014.x.
[29] M.C. Medeiros, G.F.R. Vasconcelos, Á. Veiga, E. Zilberman, Forecasting inflation in a data-rich environment: The benefits of machine learning methods, J. Bus. Econ. Stat. (2019) 1–22. https://doi.org/10.1080/07350015.2019.1637745.
[30] V. Ravi, D. Pradeepkumar, K. Deb, Financial time series prediction using hybrids of chaos theory, multi-layer perceptron and multi-objective evolutionary algorithms, Swarm Evol. Comput. 36 (2017) 136–149. https://doi.org/10.1016/j.swevo.2017.05.003.
[31] D. Pradeepkumar, V. Ravi, Forex rate prediction using chaos, neural network and particle swarm optimization, in: Lect. Notes Comput. Sci., Springer, 2014: pp. 363–375. https://doi.org/10.1007/978-3-319-11897-0_42.
[32] D. Pradeepkumar, V. Ravi, FOREX rate prediction using chaos and quantile regression random forest, in: 2016 3rd Int. Conf. Recent Adv. Inf. Technol. (RAIT), IEEE, 2016: pp. 517–522. https://doi.org/10.1109/RAIT.2016.7507954.
[33] D. Pradeepkumar, V. Ravi, FOREX rate prediction: A hybrid approach using chaos theory and multivariate adaptive regression splines, in: Adv. Intell. Syst. Comput., Springer, 2017: pp. 219–227. https://doi.org/10.1007/978-981-10-3153-3_22.
[34] O.M. Kumar, K. Ravi, V. Ravi, MapReduce-based fuzzy very fast decision tree for constructing prediction intervals, Int. J. Big Data Intell. 6 (2019) 234–247.
[35] V. Ravi, V. Tejasviram, A. Sharma, R.R. Khansama, Prediction intervals via support vector-quantile regression random forest hybrid, in: Proc. 10th Annu. ACM India Comput. Conf., 2017: pp. 109–113.
[36] G.J. Krishna, V. Ravi, Evolutionary computing applied to customer relationship management: A survey, Eng. Appl. Artif. Intell. 56 (2016) 30–59.
[37] E.N. Lorenz, Deterministic nonperiodic flow, J. Atmos. Sci. 20 (1963) 130–141.
[38] C.T. Dhanya, D. Nagesh Kumar, Nonlinear ensemble prediction of chaotic daily rainfall, Adv. Water Resour. 33 (2010) 327–347. https://doi.org/10.1016/j.advwatres.2010.01.001.
[39] N.H. Packard, J.P. Crutchfield, J.D. Farmer, R.S. Shaw, Geometry from a time series, Phys. Rev. Lett. 45 (1980) 712–716. https://doi.org/10.1103/PhysRevLett.45.712.
[40] M.T. Rosenstein, J.J. Collins, C.J. De Luca, A practical method for calculating largest Lyapunov exponents from small data sets, Phys. D Nonlinear Phenom. 65 (1993) 117–134. https://doi.org/10.1016/0167-2789(93)90009-P.
[41] A. Lyapunov, Problème général de la stabilité du mouvement, in: Ann. Fac. Sci. Toulouse Math., 1907: pp. 203–474.
[42] L. Cao, Practical method for determining the minimum embedding dimension of a scalar time series, Phys. D Nonlinear Phenom. 110 (1997) 43–50.
https://doi.org/10.1016/S0167-2789(97)00118-8. [43] C.A.C. Coello, G.B. Lamont, D.A. Van Veldhuizen, others, Evolutionary algorithms for solving multi-objective problems, Springer, 2007. [44] K. Deb, Multi-objective optimization using evolutionary algorithms, John Wiley & Sons, 2001. [45] A. Mukhopadhyay, U. Maulik, S. Bandyopadhyay, C.A.C. Coello, A survey of multiobjective evolutionary algorithms for data mining: Part I, IEEE Trans. Evol. Comput. 18 (2013) 4–19. [46] A. Mukhopadhyay, U. Maulik, S. Bandyopadhyay, C.A.C. Coello, Survey of multiobjective evolutionary algorithms for data mining: Part II, IEEE Trans. Evol. Comput. 18 (2013) 20–35. [47] A.J. Lawrance, Directionality and Reversibility in Time Series, Int. Stat. Rev. / Rev. Int. Stat. 59 (1991) 67. https://doi.org/10.2307/1403575. [48] R. Mushtaq, Augmented Dickey Fuller Test, SSRN Electron. J. (2012). https://doi.org/10.2139/ssrn.1911068. [49] C.M. Fonseca, A.P. Guerreiro, M. López-Ibáñez, L. Paquete, On the computation of the empirical attainment function, in: Int. Conf. Evol. Multi-Criterion Optim., 2011: pp. 106–120. [50] C.M. Fonseca, V.G. Da Fonseca, L. Paquete, Exploring the performance of stochastic multiobjective optimisers with the second-order attainment function, in: Int. Conf. Evol. Multi-Criterion Optim., 2005: pp. 250–264.[37] E.N. Lorenz, Deterministic nonperiodic flow, J. Atmos. Sci. 20 (1963) 130–141. [38] C.T. Dhanya, D. Nagesh Kumar, Nonlinear ensemble prediction of chaotic daily rainfall, Adv. Water Resour. 33 (2010) 327–347. https://doi.org/10.1016/j.advwatres.2010.01.001. [39] N.H. Packard, J.P. Crutchfield, J.D. Farmer, R.S. Shaw, Geometry from a time series, Phys. Rev. Lett. 45 (1980) 712–716. https://doi.org/10.1103/PhysRevLett.45.712. [40] M.T. Rosenstein, J.J. Collins, C.J. De Luca, A practical method for calculating largest Lyapunov exponents from small data sets, Phys. D Nonlinear Phenom. 65 (1993) 117–134. https://doi.org/10.1016/0167-2789(93)90009-P. [41] A. 
Lyapunov, Problème général de la stabilité du mouvement, in: Ann. La Fac. Des Sci. Toulouse Mathématiques, 1907: pp. 203–474. [42] L. Cao, Practical method for determining the minimum embedding dimension of a scalar time series, Phys. D Nonlinear Phenom. 110 (1997) 43–50. https://doi.org/10.1016/S0167-2789(97)00118-8. [43] C.A.C. Coello, G.B. Lamont, D.A. Van Veldhuizen, others, Evolutionary algorithms for solving multi-objective problems, Springer, 2007. [44] K. Deb, Multi-objective optimization using evolutionary algorithms, John Wiley & Sons, 2001. [45] A. Mukhopadhyay, U. Maulik, S. Bandyopadhyay, C.A.C. Coello, A survey of multiobjective evolutionary algorithms for data mining: Part I, IEEE Trans. Evol. Comput. 18 (2013) 4–19. [46] A. Mukhopadhyay, U. Maulik, S. Bandyopadhyay, C.A.C. Coello, Survey of multiobjective evolutionary algorithms for data mining: Part II, IEEE Trans. Evol. Comput. 18 (2013) 20–35. [47] A.J. Lawrance, Directionality and Reversibility in Time Series, Int. Stat. Rev. / Rev. Int. Stat. 59 (1991) 67. https://doi.org/10.2307/1403575. [48] R. Mushtaq, Augmented Dickey Fuller Test, SSRN Electron. J. (2012). https://doi.org/10.2139/ssrn.1911068. [49] C.M. Fonseca, A.P. Guerreiro, M. López-Ibáñez, L. Paquete, On the computation of the empirical attainment function, in: Int. Conf. Evol. Multi-Criterion Optim., 2011: pp. 106–120. [50] C.M. Fonseca, V.G. Da Fonseca, L. Paquete, Exploring the performance of stochastic multiobjective optimisers with the second-order attainment function, in: Int. Conf. Evol. Multi-Criterion Optim., 2005: pp. 250–264.