Combination of window-sliding and prediction range method based on LSTM model for predicting cryptocurrency


Note: Some of the ideas come from my master's dissertation at the University of Southampton, so parts of this text may show similarity in Turnitin. I have contacted my supervisor, who agreed to the use of this idea because the dissertation is not a formal publication.

First author: Yifan Yao

Second author: Lina Wang

Abstract:

The present study aims to model the cryptocurrency price trend on the basis of financial theory, using an LSTM model with multiple combinations of window length and prediction horizon; a random walk model with different parameter settings is also applied. The object of this study is cryptocurrency, primarily Bitcoin and Ethereum, whose prices exhibit high volatility. Quantitative analysis is adopted as the method. The research tool is the Python programming language, and the TensorFlow package is used to build and analyze the models. The results show the limitations of the LSTM and random walk models for price prediction, while demonstrating the different characteristics of both models under different parameter settings and identifying a balance between model accuracy and model practicality.

Keywords: cryptocurrency, long short-term memory neural network (LSTM), window sliding, random walk model

In the contemporary literature on cryptocurrency analysis, the general dynamics of digital currencies is a popular topic [1].

The transaction volume of cryptocurrencies skyrocketed in 2017, with super-exponential growth in the capital market [2].

However, cryptocurrency prices move with high volatility, which adds uncertainty to the market.

Most articles on cryptocurrency and machine learning focus on model prediction [3][4], but many of them ignore the mathematical principles behind the model and disregard the relationship between accuracy and parameter settings. This leads to models that appear accurate but are not practical in general. This article explores the relationship between mathematical principles and model accuracy and tries to explain the underlying causes of the observed behavior. There are two competing theories about financial markets: one holds that stock prices are predictable [6], and the other that they are completely unpredictable, that is, that prices follow a random walk [5]. The machine learning models described below (LSTM and RNN) are therefore used to test the predictability hypothesis, while random walk theory is also applied, building on previous studies [7][8][9].

This article explores model performance under these two approaches. Long short-term memory (LSTM) networks and recurrent neural networks (RNN) are frequently applied in this field and are preferred over the conventional multilayer perceptron [10].

Sean McNally compared RNN and LSTM models applied to Bitcoin [11]. For the RNN implementation, the author first determined the temporal length of the input window using the autocorrelation function. For the LSTM part, previous research [12] has shown that the LSTM outperforms both the RNN and ARIMA at learning long-term dependencies. The ARIMA (Autoregressive Integrated Moving Average) model is a type of time series model often used in price prediction [13][14]. A model comparison table is shown below [12]:

The results of different model performance

Model Results

| Model | Temporal_Length | Sensitivity | Specificity | Precision | Accuracy | RMSE |
|---|---|---|---|---|---|---|
| LSTM | 100 | 37% | 61.30% | 35.50% | 52.78% | 6.87% |
| RNN | 20 | 40.40% | 56.65% | 39.08% | 50.25% | 5.45% |
| ARIMA | 170 | 14.7 | 1 | 1 | 50.05% | 53.74% |

Table [1]

The table shows that precision and accuracy do not differ significantly between the LSTM and RNN models. Both models are capable on the training data, with the LSTM better suited to long-term dependencies.

As for multiple window length settings, [15] applied different window sizes to an LSTM model to better capture the features of the equipment and concluded that various time window sizes help recognize different temporal dependencies among features, while [16] used 10 combinations of sliding windows and prediction ranges to explore possible accuracy improvements for deep learning and concluded that when the window length is small and the prediction range is far ahead at the same time, the RMSE becomes lower than that of the basic method. To sum up, given the volatility of prices, a nonlinear model should be applied to this topic. Many scholars have compared the RNN and the LSTM, and according to their results the LSTM outperforms the RNN because it is better suited to long-term dependencies. The window sliding method with different prediction ranges is therefore applied in this article. Furthermore, the random walk theory of cryptocurrency prices is tested alongside the price predictability hypothesis.

RNN stands for recurrent neural network, a model in which time is a significant factor [17].

The output at each moment is produced from that moment's input combined with the current state of the model. In Figure[1], the output $h_t$ is produced from both the input $x_t$ and the hidden state from moment $t-1$, which is supplied by the looped edge. In theory, a recurrent neural network can handle sequences of arbitrary length. In practice, however, gradient dissipation (vanishing) or explosion occurs when optimizing over very long sequences. A vanishing gradient prevents the weights associated with earlier time steps from being updated during backpropagation; conversely, an exploding gradient makes training unstable. Either way, the model cannot obtain the optimal parameters.

Figure[1] The structure of RNN

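Consider an RNN unrolled over 3 moments, as in Figure[2], and assume that the leftmost input $S_0$ is a given value and that the neuron has no activation function. The forward process is then:

$$S_1 = W_x X_1 + W_s S_0 + b_1, \qquad O_1 = W_o S_1 + b_2$$
$$S_2 = W_x X_2 + W_s S_1 + b_1, \qquad O_2 = W_o S_2 + b_2$$
$$S_3 = W_x X_3 + W_s S_2 + b_1, \qquad O_3 = W_o S_3 + b_2$$

At time $t = 3$, the loss function can be written as

$$L_3 = \frac{1}{2}(Y_3 - O_3)^2$$

Training the RNN essentially means computing the partial derivatives of the loss with respect to $W_x$, $W_s$, $W_o$, $b_1$ and $b_2$ and adjusting these parameters to minimize it. By the chain rule:

$$\frac{\partial L_3}{\partial W_o} = \frac{\partial L_3}{\partial O_3}\frac{\partial O_3}{\partial W_o}$$

$$\frac{\partial L_3}{\partial W_x} = \frac{\partial L_3}{\partial O_3}\frac{\partial O_3}{\partial S_3}\frac{\partial S_3}{\partial W_x} + \frac{\partial L_3}{\partial O_3}\frac{\partial O_3}{\partial S_3}\frac{\partial S_3}{\partial S_2}\frac{\partial S_2}{\partial W_x} + \frac{\partial L_3}{\partial O_3}\frac{\partial O_3}{\partial S_3}\frac{\partial S_3}{\partial S_2}\frac{\partial S_2}{\partial S_1}\frac{\partial S_1}{\partial W_x}$$

$$\frac{\partial L_3}{\partial W_s} = \frac{\partial L_3}{\partial O_3}\frac{\partial O_3}{\partial S_3}\frac{\partial S_3}{\partial W_s} + \frac{\partial L_3}{\partial O_3}\frac{\partial O_3}{\partial S_3}\frac{\partial S_3}{\partial S_2}\frac{\partial S_2}{\partial W_s} + \frac{\partial L_3}{\partial O_3}\frac{\partial O_3}{\partial S_3}\frac{\partial S_3}{\partial S_2}\frac{\partial S_2}{\partial S_1}\frac{\partial S_1}{\partial W_s}$$

In general, for any time $t$:

$$\frac{\partial L_t}{\partial W_x} = \sum_{k=1}^{t}\frac{\partial L_t}{\partial O_t}\frac{\partial O_t}{\partial S_t}\left(\prod_{j=k+1}^{t}\frac{\partial S_j}{\partial S_{j-1}}\right)\frac{\partial S_k}{\partial W_x}$$

$$\frac{\partial L_t}{\partial W_s} = \sum_{k=1}^{t}\frac{\partial L_t}{\partial O_t}\frac{\partial O_t}{\partial S_t}\left(\prod_{j=k+1}^{t}\frac{\partial S_j}{\partial S_{j-1}}\right)\frac{\partial S_k}{\partial W_s}$$

This formula shows that the product $\prod_{j=k+1}^{t}\partial S_j/\partial S_{j-1}$ causes the gradient dissipation or explosion. With the activation function added, the state becomes

$$S_j = \tanh(W_x X_j + W_s S_{j-1} + b_1)$$

so that

$$\prod_{j=k+1}^{t}\frac{\partial S_j}{\partial S_{j-1}} = \prod_{j=k+1}^{t}\tanh'(W_x X_j + W_s S_{j-1} + b_1)\,W_s$$

where the derivative of tanh never exceeds 1. As the number of factors grows with $t$, the product approaches zero whenever $W_s$ lies between 0 and 1, so the gradient vanishes. Conversely, if $W_s$ is large enough that $\tanh' \cdot W_s$ exceeds 1, the product grows without bound and the gradient explodes. This is why the LSTM is introduced.

Figure[2] The inner structure of RNN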
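To make the product term concrete, the following sketch evaluates $\prod_{j=k+1}^{t}\tanh'(\cdot)\,W_s$ for a scalar RNN. It is an illustration under assumed values (a single hidden unit and the same pre-activation at every step); the helper name `gradient_product` is hypothetical and not part of the paper's code.

```python
import numpy as np

def gradient_product(w_s: float, steps: int, pre_activation: float = 0.0) -> float:
    """Product of dS_j/dS_{j-1} = tanh'(a) * W_s over `steps` steps of a scalar RNN.

    For illustration, the pre-activation a is assumed identical at every step.
    """
    tanh_grad = 1.0 - np.tanh(pre_activation) ** 2  # tanh'(a) = 1 - tanh(a)^2 <= 1
    return float((tanh_grad * w_s) ** steps)

for w_s in (0.5, 1.0, 2.0):
    # 0.5 shrinks towards 0 (vanishing); 2.0 grows without bound (exploding)
    print(w_s, [gradient_product(w_s, t) for t in (1, 10, 50)])

# A saturated tanh (large |a|) shrinks the product even when W_s > 1.
print(gradient_product(1.5, 20, pre_activation=2.0))
```

With $a = 0$ the factor is exactly $W_s$, so the vanishing and exploding regimes follow directly from whether $W_s$ is below or above 1.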

Mathematical Explanation of LSTM Model:

LSTM stands for long short-term memory, a type of RNN. The current cell state $c_t$ can be expressed as:

$$c_t = f_t \otimes c_{t-1} + i_t \otimes \tanh(W_c[h_{t-1}, x_t] + b_c)$$

where $f_t$ is the forget gate,

$$f_t = \sigma(W_f[h_{t-1}, x_t] + b_f)$$

which decides which features of $c_{t-1}$ are carried over into the calculation of $c_t$. The current hidden output is

$$h_t = o_t \otimes \tanh(c_t)$$

and the input and output gates are, respectively,

$$i_t = \sigma(W_i[h_{t-1}, x_t] + b_i), \qquad o_t = \sigma(W_o[h_{t-1}, x_t] + b_o)$$

All three gates use the sigmoid activation, so their outputs lie between 0 and 1 and, when saturated, are close to either 0 or 1. Treating the gate values as constants, the derivatives along the memory path are $\partial c_t/\partial c_{t-1} = f_t$ and $\partial h_t/\partial \tanh(c_t) = o_t$, so these factors are likewise close to 0 or 1. When a gate is close to 1, the gradient is transmitted through the LSTM almost unchanged, which greatly reduces the probability of gradient dissipation. When a gate is close to 0, the information from the previous moment does not affect the current moment, so there is no need to propagate the gradient back through that step to update the parameters [18]. This explains why the LSTM model can mitigate the vanishing gradient problem. See Figure[3].
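The gate equations above can be sketched as a single NumPy time step. This is a minimal, untrained illustration, not the TensorFlow model used in the paper: the function name `lstm_step`, the dictionary layout of the weights, the hidden size of 4 and the random inputs are all assumptions, with 2 input features chosen to mirror the close price and volume used later.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step following the gate equations above.

    W maps each gate name ('f', 'i', 'c', 'o') to a matrix of shape
    (hidden, hidden + inputs) applied to the concatenation [h_{t-1}, x_t];
    b maps each gate name to a bias vector of shape (hidden,).
    """
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])      # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])      # input gate
    o_t = sigmoid(W["o"] @ z + b["o"])      # output gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])  # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde      # element-wise products
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

rng = np.random.default_rng(0)
hidden, n_in = 4, 2  # assumed sizes: 2 inputs, e.g. close price and volume
W = {g: rng.normal(scale=0.1, size=(hidden, hidden + n_in)) for g in "fico"}
b = {g: np.zeros(hidden) for g in "fico"}
h, c = np.zeros(hidden), np.zeros(hidden)
for x in rng.normal(size=(10, n_in)):  # a hypothetical 10-step input window
    h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)
```

Setting the forget-gate bias strongly positive and the input-gate bias strongly negative makes $f_t \approx 1$ and $i_t \approx 0$, so $c_t \approx c_{t-1}$: the memory path then passes both the state and its gradient through almost unchanged.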

Figure[3] The structure of LSTM

For the time series $\{x_t\}$, if it satisfies $x_t = x_{t-1} + w_t$, where $w_t$ denotes white noise with mean 0 and variance $\sigma^2$, then $\{x_t\}$ is a random walk [19]. By definition (taking $x_0 = 0$), the value $x_t$ at any moment $t$ is the sum of all white noise terms up to and including moment $t$:

$$x_t = w_t + w_{t-1} + w_{t-2} + \cdots + w_1$$

The mean and variance of the random walk are therefore

$$\mu_x = E(x_t) = 0, \qquad var(x_t) = var(w_t) + var(w_{t-1}) + \cdots + var(w_1) = t \times var(w_t) = t\sigma^2$$

Although the mean does not change with time $t$, the variance is a function of $t$, so the random walk is not stationary: its variance grows linearly with $t$. For a given interval $k$, the covariance of the random walk is

$$Cov(x_t, x_{t+k}) = Cov(x_t, x_t + w_{t+1} + \cdots + w_{t+k}) = Cov(x_t, x_t) + \sum_{j=t+1}^{t+k} Cov(x_t, w_j) = t\sigma^2 + 0 = t\sigma^2$$

From this variance and covariance, the autocorrelation function $\rho_k(t)$ is

$$\rho_k(t) = \frac{Cov(x_t, x_{t+k})}{\sqrt{var(x_t)\,var(x_{t+k})}} = \frac{t\sigma^2}{\sqrt{t\sigma^2 \cdot (t+k)\sigma^2}} = \frac{1}{\sqrt{1 + k/t}}$$

Clearly, the autocorrelation function depends on the time $t$ and the interval $k$: when the series is long and the interval is small, the autocorrelation coefficient approaches 1. Consequently, if a model uses the price at time $t$ as its forecast for time $t+1$, the correlation between the actual and predicted values equals the lag-1 ($k = 1$) autocorrelation of the price series. Using today's price as tomorrow's forecast therefore yields a correlation very close to 1, which can mislead us into believing that the model is accurate.

Data Collection

All data are collected from CoinMarketCap [20], an authoritative website dedicated to cryptocurrency market statistics. Only Bitcoin and Ethereum data are used to train the LSTM model and the random walk model. For both cryptocurrencies, the data columns cover the daily open, high, low and close values, the transaction volume and the market capitalization. The raw data range from April 2017 to December 2020, a span of more than three years. The data are split into a training set (0.8) and a test set (0.2).

Training Process of Random Walk Model

From the preliminaries illustrated above, the random walk model learns the parameter $\sigma$, which is its only parameter. Figure[4] shows the model performance.

Figure[4] Single point random walk model performance

Based on the preliminaries, the single point random walk model appears to perform well, which is in line with expectations. The model predicts only the next day, so $k = 1$. In addition, the time span is over three years, so $t$ is very large and $\rho_k(t) \approx 1$: the forecast for the next day is essentially a repeat of the current day. Moreover, because of the single point method, the error is reset at every step, since every next input is true data. Figure.[5][6] show that the prediction line looks like a copy of the true line shifted in the horizontal direction. The model's apparent accuracy is therefore due to the mathematical nature of the random walk rather than to the training process. Here, the model trained on the 2017 data shows the details of this horizontal shift.

Figure.[5] The details of single point prediction on Bitcoin

Figure.[6] The details of single point prediction on Ethereum

As mentioned above, to avoid the misleading accuracy caused by the nature of random walks, the value of $k$ can be increased. That is, the interval of the random walk step becomes larger than one day.
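The variance and autocorrelation results can be checked numerically. The following sketch simulates many independent Gaussian random walks started at $x_0 = 0$ and compares the sample variance and lag-$k$ correlation at a fixed $t$ with $t\sigma^2$ and $1/\sqrt{1 + k/t}$. The path count, $t = 500$, $\sigma = 1$ and the helper name `theoretical_acf` are assumptions for illustration; note that $\rho_k(t)$ is a correlation across realizations at fixed $t$, whereas the paper works with a single observed price path.

```python
import numpy as np

def theoretical_acf(t: int, k: int) -> float:
    """rho_k(t) = 1 / sqrt(1 + k/t) for a random walk started at x_0 = 0."""
    return 1.0 / np.sqrt(1.0 + k / t)

rng = np.random.default_rng(42)
sigma, n_paths, t, k = 1.0, 5000, 500, 1
# Each row is one random walk x_1..x_{t+k}: the cumulative sum of Gaussian white noise.
walks = np.cumsum(rng.normal(0.0, sigma, size=(n_paths, t + k)), axis=1)
x_t, x_t_plus_k = walks[:, t - 1], walks[:, t + k - 1]

print("var(x_t):", x_t.var(), "vs t*sigma^2 =", t * sigma**2)
# Correlation between "today" and "tomorrow" across paths, i.e. how the naive
# forecast x_{t+1} := x_t would score if judged by correlation.
print("rho_1(t): empirical", np.corrcoef(x_t, x_t_plus_k)[0, 1],
      "theoretical", theoretical_acf(t, k))
print("rho_k for k = t:", theoretical_acf(t, t))  # 1/sqrt(2), about 0.707
```

With $t = 500$ and $k = 1$ the correlation is about 0.999, while a horizon as long as the history itself ($k = t$) drops it to about 0.707, which is the effect exploited by the multi-point method.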

Therefore, a multi-point prediction method is proposed. In this way, the error is not reset and is compounded by subsequent predictions. The training result is shown in Figure[7].

Figure[7] Full interval random walk performance
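The difference between the single point and full interval random walk forecasts can be sketched as follows. This is an illustration on a synthetic random walk, not the Bitcoin or Ethereum data, and the helper names (`fit_sigma`, `single_point_forecast`, `multi_point_forecast`) are hypothetical. It assumes that $\sigma$ is estimated as the standard deviation of the daily differences in the training data and that each forecast step adds one Gaussian draw $w_t$.

```python
import numpy as np

def fit_sigma(train_prices) -> float:
    """Estimate the random walk's only parameter: the std of daily price changes."""
    return float(np.std(np.diff(np.asarray(train_prices, float)), ddof=1))

def single_point_forecast(test_prices, sigma, rng):
    """Forecast day t+1 from the TRUE price of day t: the error resets every step."""
    prev = np.asarray(test_prices[:-1], float)
    return prev + rng.normal(0.0, sigma, size=prev.shape)

def multi_point_forecast(last_known_price, n_steps, sigma, rng):
    """Full-interval forecast: each step builds on the previous FORECAST,
    so the errors accumulate over the horizon."""
    return last_known_price + np.cumsum(rng.normal(0.0, sigma, size=n_steps))

rng = np.random.default_rng(7)
prices = 100.0 + np.cumsum(rng.normal(0.0, 1.0, size=1000))  # synthetic stand-in series
train, test = prices[:800], prices[800:]
sigma = fit_sigma(train)

single = single_point_forecast(test, sigma, rng)
multi = multi_point_forecast(train[-1], len(test), sigma, rng)
mae_single = np.mean(np.abs(single - test[1:]))
mae_multi = np.mean(np.abs(multi - test))
print(f"sigma={sigma:.3f}  single-point MAE={mae_single:.3f}  multi-point MAE={mae_multi:.3f}")
```

The single point error stays at the scale of one daily step, while the full interval error grows roughly with the square root of the horizon, matching the contrast between Figure[4] and Figure[7].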

Clearly, changing the value of $k$ significantly reduces the model accuracy, since $\rho_k(t) = 1/\sqrt{1 + k/t}$ decreases as $k$ increases. In other words, the result of the model is no longer an artifact of the nature of the random walk. Moreover, because errors are compounded by subsequent predictions, the prediction line deviates seriously from the true values. Note that the random walk model is defined as $x_t = x_{t-1} + w_t$: the price of the day changes randomly from the price of the previous day, and the entire price difference is contained in the random term $w_t$. It follows that the time series of security prices is in a random state and does not exhibit any observable or statistically determined trend. Unlike a machine learning model, the random walk model only draws the random term $w_t$; it does not learn from the inputs or learn any parameters or weights. That is why neither the single-point nor the multi-point random walk model is an ideal solution for predicting the trend of cryptocurrency prices.

The LSTM created here is a 2-dimensional model that uses only the close price and transaction volume features. As Figure[8] shows, the daily price level differs enormously between periods, so the model may not converge on raw prices and normalization is required [21].

Figure[8] Daily price changes on Bitcoin and Ethereum

For the training data, the price changes are normalized with Equation[1], where $p$ is a price in the current window and $p_0$ is the reference price of the window against which $p$ is normalized, so that inputs and outputs are expressed as relative (percentage) changes. For the test data, the output is denormalized with Equation[2], since the predicted real price is what we want to visualize.

Equation[1]:
$$n = \frac{p}{p_0} - 1$$

Equation[2]:
$$p = p_0(n + 1)$$
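Equations [1] and [2] can be sketched as a pair of inverse functions. This assumes, as in common window-normalization schemes, that $p_0$ is the first value of each window; the function names are hypothetical. The same code works column-wise on a `(window_len, n_features)` array, so close price and volume would each be normalized against their own first value.

```python
import numpy as np

def normalize_window(window):
    """Equation[1]: n_i = p_i / p_0 - 1, with p_0 assumed to be the window's first row."""
    window = np.asarray(window, dtype=float)
    return window / window[0] - 1.0  # broadcasts per column for 2-D windows

def denormalize(n, p0):
    """Equation[2]: p_i = p_0 * (n_i + 1), recovering real prices."""
    return np.asarray(p0, dtype=float) * (np.asarray(n, dtype=float) + 1.0)

w = [8000.0, 8400.0, 7600.0]
n = normalize_window(w)
print(n)                   # approximately [0.0, 0.05, -0.05]
print(denormalize(n, w[0]))  # round trip recovers the original prices
```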

Here the model uses the MAE (mean absolute error) to measure the error between the predicted and true values. As the average of the absolute errors, it reflects the actual size of the prediction errors well:

$$MAE = \frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i|$$

After the parameters are selected, the training dataset is used to train the model. The merged data span 2017 to 2020 and the split size is 0.8, so the training dataset runs mainly from 05-2017 to 10-2019. Table[2] shows the Bitcoin training process of the model: from epoch 18 the model begins to converge, with the MAE levelling off at around 0.03. Figure[12] shows the LSTM training loss on Bitcoin.

LSTM single point prediction training process

| Epoch | 18/20 | 19/20 | 20/20 |
|---|---|---|---|
| Step loss | 0.031 | 0.029 | 0.031 |
| mean_absolute_error | 0.031 | 0.029 | 0.031 |
| val_loss | 0.023 | 0.024 | 0.023 |
| val_mean_absolute_error | 0.023 | 0.024 | 0.023 |

Table[2]
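The 0.8 split and the MAE can be sketched in a few lines. The split is assumed to be chronological (no shuffling), which is consistent with the training period (05-2017 to 10-2019) preceding the test period; the helper names are hypothetical. In TensorFlow/Keras the same error is available as the loss string `"mae"`, and the identical loss and mean_absolute_error columns in Table[2] are consistent with, though not proof of, MAE having been the training loss.

```python
import numpy as np

def chronological_split(values, train_size: float = 0.8):
    """Split time-ordered data into train/test WITHOUT shuffling, so the test
    period lies strictly after the training period."""
    values = np.asarray(values)
    cut = int(len(values) * train_size)
    return values[:cut], values[cut:]

def mae(y_true, y_pred) -> float:
    """MAE = (1/n) * sum_i |y_i - y_hat_i|."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

series = np.arange(100.0)  # stand-in for a daily, time-ordered column
train, test = chronological_split(series, 0.8)
print(len(train), len(test))  # 80 20
print(mae([0.01, -0.02, 0.03], [0.02, -0.02, 0.00]))  # (0.01 + 0 + 0.03) / 3, about 0.0133
```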

After convergence, the model is applied to the test dataset, which ranges from 11-2019 to 12-2020. Figure[9] shows the model's performance on the Bitcoin test dataset. As Figure.[12] shows, both the training error and the test error stop decreasing at epoch 20; beyond epoch 20 the training error would continue to decrease, but the test error would start to increase because the model overfits. Figure[10][11] show the performance of the model on Ethereum.

Figure.[9] The performance of LSTM with single point prediction on Bitcoin

Figure[10] The performance of LSTM with single point prediction on Ethereum

Figure.[11] The training error on Ethereum

Figure[12] The model loss on Bitcoin

The model in this part uses the point to point method. Point to point prediction makes the model predict a single value at a time and plot it at the corresponding position in the figure; after predicting this point, the window slides to the next point using the complete (true) test data. The point to point method appears more accurate than full interval prediction, but this does not mean that the point to point model outperforms the full interval model: the error of each single prediction is reset every time, the neural network does not need to know the time series itself, and every next prediction is based on true values. Since the errors are ignored, it is unsurprising that the model looks accurate. Furthermore, Fig. [15] suggests that the predicted values look like a horizontal translation of the true values. For instance, from mid-May to mid-June 2019 the price rose several times, and the predicted peaks followed the fluctuations of the true values with an obvious lag (hysteresis). In other words, on these data the LSTM model effectively reduces to an autoregressive model of order $p$, in which the predicted value is a weighted sum of the previous $p$ values:

$$PredPrice_t = w_0 + w_1 \cdot Price_{t-1} + \cdots + w_p \cdot Price_{t-p} + \epsilon_t, \qquad \epsilon_t \sim N(0, \sigma^2)$$

Each prediction is thus just a weighted combination of the true values $Price_{t-1}, \ldots, Price_{t-p}$, because the point to point method discards the error of every previous prediction, which greatly reduces the apparent error. Therefore, to exploit the time-series strengths of the LSTM while preventing the error from being reset at every step, the model is improved along two dimensions: the window length and the prediction range.
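The autoregressive interpretation can be illustrated with an explicit AR($p$) model fitted by least squares. This is a swapped-in linear model, not the paper's LSTM, and it is fitted to a synthetic random walk rather than the CoinMarketCap data; `fit_ar` and `point_to_point` are hypothetical names. On a random walk the fitted lag weights sum to roughly 1, so each point to point prediction stays close to the previous true price, which produces the lagged copy seen in the figures.

```python
import numpy as np

def fit_ar(series, p):
    """Least-squares fit of Price_t = w_0 + w_1*Price_{t-1} + ... + w_p*Price_{t-p}."""
    series = np.asarray(series, dtype=float)
    # Each row: [1, Price_{t-1}, Price_{t-2}, ..., Price_{t-p}]
    X = np.array([np.r_[1.0, series[t - p:t][::-1]] for t in range(p, len(series))])
    y = series[p:]
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def point_to_point(series, w):
    """One-step predictions in which every input window holds TRUE past prices,
    so no prediction error is carried forward."""
    series = np.asarray(series, dtype=float)
    p = len(w) - 1
    return np.array([w[0] + w[1:] @ series[t - p:t][::-1] for t in range(p, len(series))])

rng = np.random.default_rng(3)
prices = 100.0 + np.cumsum(rng.normal(size=1000))  # synthetic random-walk stand-in
w = fit_ar(prices[:800], p=3)
preds = point_to_point(prices[800:], w)
print("weights:", np.round(w, 3))
print("sum of lag weights:", w[1:].sum())  # roughly 1 for a random walk
```

With weights $(0, 1, 0)$ the prediction for day $t$ is exactly the true price of day $t-1$, which is the pure horizontal shift.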

The window length is the span of history used as input by the LSTM, and the prediction range is the number of future points predicted from the data in that window, as illustrated in Figure[13].

Figure[13] The explanation of window length and prediction range
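The two quantities can be expressed as a windowing function that pairs each input window of `window_len` points with the next `pred_range` points. This sketch only illustrates the definitions; `make_windows` is a hypothetical name, and the paper does not state that its training pairs were built exactly this way.

```python
import numpy as np

def make_windows(series, window_len, pred_range):
    """Pair each input window of `window_len` points with the next `pred_range` points."""
    series = np.asarray(series, dtype=float)
    n = len(series) - window_len - pred_range + 1
    X = np.stack([series[i:i + window_len] for i in range(n)])
    Y = np.stack([series[i + window_len:i + window_len + pred_range] for i in range(n)])
    return X, Y

X, Y = make_windows(np.arange(20.0), window_len=10, pred_range=5)
print(X.shape, Y.shape)  # (6, 10) (6, 5)
print(X[0], Y[0])        # inputs 0..9, targets 10..14
```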

Unlike the limited point to point method, the multiple timepoint method is more practical. It also initializes the test window with true data and keeps sliding to predict the next point, but each new prediction is fed back into the input window, so that within a segment the window eventually consists entirely of past predictions. After a full segment of predictions, the window moves forward and is reset with true test data. Thus, during prediction the error is not reset at every step: it accumulates across each predicted segment and is reset again at the start of the next one. For this reason, the model is more practical: it is neither as deceptive as single-point prediction nor does it drift completely away from the trajectory of the true values.

Figure.[14] The model's performance with window length = 10, prediction range = 5

Figure.[15] The model's performance with window length = 10, prediction range = 10

Figure[16] The model's performance with window length = 10, prediction range = 15

Figure[17] The box plot of Bitcoin MAE for different prediction ranges

Figure.[18] The box plot of Ethereum MAE for different prediction ranges

Figure.[14][15][16] show that the multiple sequence LSTM does not perform as well as expected. The red line in each figure marks the prediction range.

In the training process, the prediction range is set to 5, 10 and 15 respectively, while the window length is fixed at 10.
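A minimal sketch of the multiple timepoint procedure follows. `predict_sequences`, `segment_maes` and `naive` are hypothetical names, and `naive` (repeat the last value in the window) is only a stand-in for the trained LSTM, chosen so that the results can be checked by hand on a steadily rising toy series. Each segment starts from true data, feeds its own predictions back into the window, and is scored by one MAE value, which is the kind of per-segment error a box plot such as Figure[17] summarizes.

```python
import numpy as np

def predict_sequences(test, window_len, pred_range, predict_next):
    """Predict `pred_range` points per segment, feeding predictions back into the
    window, then reset the window with TRUE data at the start of the next segment."""
    test = np.asarray(test, dtype=float)
    preds, truths = [], []
    for start in range(0, len(test) - window_len - pred_range + 1, pred_range):
        window = list(test[start:start + window_len])  # reset with true data
        segment = []
        for _ in range(pred_range):
            y_hat = float(predict_next(np.array(window)))
            segment.append(y_hat)
            window = window[1:] + [y_hat]  # drop the oldest point, append the forecast
        end = start + window_len
        preds.append(np.array(segment))
        truths.append(test[end:end + pred_range])
    return preds, truths

def segment_maes(preds, truths):
    """One MAE per predicted segment; these are the values a box plot would summarize."""
    return [float(np.mean(np.abs(p - t))) for p, t in zip(preds, truths)]

def naive(window):  # hypothetical stand-in for the trained LSTM
    return window[-1]

series = np.arange(60.0)  # a steadily rising toy series
for r in (5, 10, 15):
    p, t = predict_sequences(series, window_len=10, pred_range=r, predict_next=naive)
    print(r, np.median(segment_maes(p, t)))  # 3.0, 5.5, 8.0
```

On this toy series the error at the $h$-th step of a segment is exactly $h$, so the segment MAE grows from 3.0 to 5.5 to 8.0 as the prediction range goes from 5 to 10 to 15. This mirrors the direction of the trend reported for the box plots, not the paper's actual values.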

The predictions within each range do not reflect the subsequent price trend: the model seems to predict only upward movements and does not appear to register downward ones. This may be due to the choice of parameters or of the window length, which reduces the model accuracy. In addition, Figure[17][18] show that when the window length is fixed, the MAE increases with the number of predicted points. This indicates that, for the same window length, the model is more accurate when fewer points are predicted.

Fixed Multi-Point Method Prediction with different window length selection

The previous part verified the impact of the number of predicted points on the model when the window length is fixed. This part verifies the impact of different window lengths on model accuracy when the number of predicted points is fixed. As before, the red line in Figure[19][20][21] marks the different window length settings. In the training process, the window length is set to 10, 50 and 90 respectively, while the prediction range is fixed at 5. Figure[19] shows that with window_len = 10 the model's predicted trend behaves as in the previous part, seeming to predict only upward movements and disregarding downward ones. With window_len = 50, the model generally reflects the downward trend correctly, and with window_len = 90, the model reflects every trend but is largely inaccurate; in particular, from May 2019 to August 2019 the predicted downward trend of Bitcoin is entirely wrong. From Figure[22][23], it can be concluded that with a fixed prediction range, model accuracy decreases as the window length increases, which is caused by the accumulation of errors in the model.

Figure.[19] The model's performance with window length = 10, prediction range = 5

Figure.[20] The model's performance with window length = 50, prediction range = 5

Figure.[21] The model's performance with window length = 90, prediction range = 5

Figure.[22] The box plot of Bitcoin MAE for different window lengths

Figure.[23] The box plot of Ethereum MAE for different window lengths

Conclusion

According to Table[3][4], among the combinations of window length and prediction range tested, window length = 10 with prediction range = 5 gives the smallest errors for the Bitcoin and Ethereum LSTM models, 0.037 and 0.113 respectively. Although the single-point method has the smallest error overall, this is the result of the error being reset at every step. In a real financial market, however, predicting only the next day's price trend is impractical. It is therefore more practical to predict the price over a period of time with an appropriate error-reset frequency, that is, with a certain prediction range. Table[5] shows the relationship between the interval in days and the accuracy; the error is not as large as expected. Therefore, while guaranteeing a certain level of accuracy, the combination of window_length = 10 and prediction range = 5 offers the best balance between practicality and accuracy. In summary, the study of the random walk and LSTM models shows that it is not appropriate to focus only on model accuracy: parameter settings, mathematical meaning and practicality also matter, so the balance between model practicality and accuracy is particularly important.

Limitation and future work

The window length settings span a relatively large range; future research could focus on window lengths of 10 or less. In addition, the research objects are limited to Bitcoin and Ethereum, and more cryptocurrencies could be introduced for experimental modeling.

The results of fixed window length with different prediction range

Window_Length = 10 Prediction_Range

LSTM_MAE_Bitcoin

LSTM_MAE_Ethereum

Table [3]

The results of fixed prediction range with different window length

Prediction_Range = 5 Window_Length

10 50 90

LSTM_MAE_Bitcoin

LSTM_MAE_Ethereum

Table[4]

The error subtraction based on single day, window_length = 10

Days_Interval (based on single day)

4 9 14

Error_Subtraction_Bitcoin

Error_Subtraction_Ethereum

Table[5]

References:

[1] Alessandretti, Laura, et al. "Machine learning the cryptocurrency market." Available at SSRN 3183792 (2018).

[2] ElBahrawy, Abeer, et al. "Evolutionary dynamics of the cryptocurrency market." Royal Society open science

Neurocomputing

Journal of Financial Research

Journal of marketing

Economics Letters 148 (2016): 80-82.

[9] Shah, Devavrat, and Kang Zhang. "Bayesian regression and Bitcoin." IEEE, 2014.

[10] Elman, Jeffrey L. "Finding structure in time." Cognitive science

IEEE, 2018.

[12] Brown, Stephen J., William N. Goetzmann, and Alok Kumar. "The Dow theory: William Peter Hamilton's track record reconsidered." The Journal of finance

Energy policy

Advances in Neural Information Processing Systems. 2013.

[15] Xia, Tangbin, et al. "An ensemble framework based on convolutional bi-directional LSTM with multiple time windows for remaining useful life estimation." Computers in Industry 115 (2020): 103182.

[16] Liu, Yangdong, et al. "Short-term travel time prediction by deep learning: A comparison of different LSTM-DNN models." IEEE, 2017.

[17] Keren, Gil, and Björn Schuller. "Convolutional RNN: an enhanced model for extracting features from sequential data." IEEE, 2016.

[18] Gers, Felix A., Jürgen Schmidhuber, and Fred Cummins. "Learning to forget: Continual prediction with LSTM." (1999): 850-855.

[19] Barro, Robert J., and Xavier Sala-i-Martin. "Convergence." Journal of political Economy

Mastering Bitcoin: unlocking digital cryptocurrencies. O'Reilly Media, Inc., 2014.

[21] Haferkorn, Martin, and Josué Manuel Quintana Diaz. "Seasonality and interconnectivity within cryptocurrencies-an analysis on the basis of bitcoin, litecoin and namecoin." International Workshop on Enterprise Applications and Services in the Finance Industry. Springer, Cham, 2014.