Volatility Forecasting with 1-dimensional CNNs via transfer learning
Bernadett Aradi, Gábor Petneházi, and József Gáll
Department of Applied Mathematics and Probability Theory, University of Debrecen
Doctoral School of Mathematical and Computational Sciences, University of Debrecen
Abstract
Volatility is a natural risk measure in finance, as it quantifies the variation of stock prices. A frequently considered problem in mathematical finance is to forecast different estimates of volatility. What makes deep learning methods promising for the prediction of volatility is the fact that stock price returns satisfy some common properties, referred to as 'stylized facts'. Also, the amount of available data can be large, which favors the application of neural networks. We used 10 years of daily prices for hundreds of frequently traded stocks and compared different CNN architectures: some networks use only the considered stock, but we also tried a construction which, for training, uses many more series, but not the considered stocks. Essentially, this is an application of transfer learning, and its performance turns out to be much better in terms of prediction error. We also compare our dilated causal CNNs to the classical ARIMA method using an automatic model selection procedure.
Keywords: volatility forecasting, dilated causal CNNs, transfer learning
Introduction

Volatility (the variation of prices) has great importance in finance as a natural risk measure. However, it is not observable, so the notion covers different estimates of the true price variability. Volatility estimates might be computed from high-frequency intraday data [Andersen et al., 2003] or from the daily range of prices [Chou et al., 2010], but they are probably most often computed simply as the standard deviation of daily returns. There is also considerable confusion around the concept of volatility [Taleb and Goldstein, 2007]. We chose to examine a fairly standard estimate of stock volatility: the 21-day moving standard deviation of daily logarithmic returns.

In this study, we aim to build a deep learning based general forecasting model to predict future values of these volatility estimates. More specifically, we train a one-dimensional convolutional neural network on multiple stocks' volatility history, and compare its forecasting performance to that of models trained on a single stock's data only. We show that a general model might perform comparably to, or even better than, the stock-specific models, which may justify the application of deep learning methods to financial forecasting.

Stock market returns are often modeled using the random walk hypothesis, even though it has been invalidated empirically by several studies; see, e.g., Lo and MacKinlay [1988]. However, if we intend to use deep learning methods for forecasting volatility, it is more important that these returns have some documented common properties, usually referred to as stylized facts [Cont, 2001, Engle and Patton, 2001]. These patterns in the behaviour of different assets' returns can come in very handy, and they suggest that the variability of asset prices is forecastable.

Some of these similarities are the following. Volatilities cluster (persistence), that is, they display positive autocorrelation.
Returns and volatilities are negatively correlated, so financial asset returns are typically more volatile during recessions. Also, positive and negative shocks in returns have a different impact on volatility. In general, trading volume is positively correlated with volatility. Volatility exhibits mean reversion as well, meaning that, in the long run, it converges to a normal volatility level. Finally, exogenous variables (e.g., other assets or deterministic events) can have an impact on volatility. Thus, there might be general economic factors that influence the volatilities of individual stocks.

These patterns, in accordance with Schwert [1989], who found weak evidence that macroeconomic volatility can help predict stock volatility, imply that the price development of different stocks might have common driving forces. Furthermore, knowledge extracted from the price behaviour of some assets might be useful for describing that of other assets.

This key idea has already been applied by Sirignano and Cont [2019], however only to forecast the direction of price movements, using a high-frequency database of market quotes and transactions. They found that a universal model trained for all stocks outperforms asset-specific models. The authors claim this is evidence of a universal price formation mechanism. Those previous findings encourage us to study whether the remarkable generality in stock volatility formation can help volatility forecasting. That is, we study whether the volatility history of multiple stocks can be used in a joint system to predict the future volatility of the individual securities.
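Volatility clustering in particular is easy to check empirically: the autocorrelation of absolute (or squared) returns is positive. The following minimal sketch illustrates this on simulated data; the crude two-regime series is our own stand-in, not the paper's dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a return series with two volatility regimes: a calm stretch
# followed by a turbulent one (a crude stand-in for volatility clustering).
sigma = np.repeat([0.01, 0.04], 250)          # daily volatility per regime
returns = sigma * rng.standard_normal(500)

# Lag-1 autocorrelation of absolute returns: positive under clustering.
abs_r = np.abs(returns)
acf1 = np.corrcoef(abs_r[:-1], abs_r[1:])[0, 1]
print(f"lag-1 autocorrelation of |returns|: {acf1:.2f}")
```

On real stock returns the effect is weaker but still clearly positive, which is exactly the persistence property described above.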
Convolutional neural networks are most often used with images. LeCun et al. [1989] applied them to handwritten digit recognition and thereby launched a revolution in image processing. Since then, better and better CNN architectures have been proposed to solve difficult image classification and object detection problems, and now their performance is often comparable to that of human experts [Russakovsky et al., 2015]. Due to this undeniable success in computer vision, we usually associate CNNs with images.

However, they can also be used with other data. We can even let go of the restriction of having two dimensions: 1- or 3-dimensional convolutions can be used in pretty much the same way. Actually, the time delay neural network of Waibel et al. [1989] was the first ever CNN: a convolutional network applied to the time dimension. CNNs can extract local features, which is a useful property, since variables spatially or temporally nearby are often highly correlated [LeCun et al., 1995]. These features are learnt by backpropagation, so CNNs constitute a fully self-acting feature extractor. They are also very efficient in the number of parameters, due to the weight sharing in space or time. So, convolutional neural networks can be applied to time series and to images in a similar manner, and extracting local patterns might be just as useful in the time domain.

Bai et al. [2018] claim that while the general belief is that sequence modeling is best handled by recurrent neural networks, convolutional neural networks might even outperform them, considering generic architectures. In light of this, it is not surprising that in the past few years several CNN architectures have been successfully applied to time series forecasting. Mittelman [2015] used fully convolutional networks for time series modeling, replacing the usual subsamplings and upsamplings by upsampling the filter of the l-th layer by a factor of 2^(l-1). Yi et al. [2017] presented structure learning algorithms for CNN models, exploiting the covariance structure of multiple time series. Bińkowski et al. [2017] proposed a CNN architecture to forecast multivariate asynchronous time series. Borovykh et al. [2017] applied a CNN inspired by WaveNet [Oord et al., 2016], using dilated convolutions. Dilations allow an exponential expansion of the receptive field, without loss of coverage [Yu and Koltun, 2015]. Some of the mentioned studies applied the proposed methods to financial datasets, since these often pose a challenge to traditional time series forecasting algorithms.

In this article, we apply convolutional networks to series of stock return volatilities, and use the learned patterns to predict the subsequent values of the series. A CNN seems a good choice for learning from multiple time series, since it can recognize different local patterns in the data. It has enough complexity to account for various time series phenomena, many more than might be present in a single time series. We may also expect this jointly learnt model to help avoid overfitting: instead of just memorizing the given time series history, the algorithm might learn general time series behavior from a much richer source.

Data

We downloaded 10 years (from the beginning of 2009 to the end of 2018) of daily prices for hundreds of frequently traded stocks, constituents of the S&P 500 stock market index. The dataset was obtained through the Python module of Quandl. Volatilities were estimated as 21-day moving standard deviations of daily logarithmic returns. The estimates were annualized by multiplying each value by the square root of 252. After removing stocks with more than 10 missing observations, 440 volatility series remained. Each series was split into overlapping 64-day subseries, which were fed to the algorithm to predict the following, 65th value.
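The preprocessing described above can be sketched in a few lines of NumPy. The paper publishes no code, so the function names and the use of the sample standard deviation are our assumptions; the steps themselves (21-day moving standard deviation of log returns, annualization by the square root of 252, overlapping 64-step windows paired with the following value) are taken from the text.

```python
import numpy as np

def rolling_volatility(prices, window=21):
    """Annualized 21-day moving standard deviation of daily log returns."""
    log_returns = np.diff(np.log(prices))
    n = len(log_returns) - window + 1
    vols = np.array([log_returns[i:i + window].std(ddof=1) for i in range(n)])
    return vols * np.sqrt(252)  # annualize by sqrt(252) trading days

def make_windows(series, length=64):
    """Overlapping 64-step input windows, each paired with the 65th value."""
    X = np.array([series[i:i + length] for i in range(len(series) - length)])
    y = series[length:]          # y[i] is the value following window X[i]
    return X, y

# Toy example on a simulated random-walk price series.
rng = np.random.default_rng(1)
prices = 100 * np.exp(np.cumsum(0.01 * rng.standard_normal(400)))
vol = rolling_volatility(prices)
X, y = make_windows(vol)
print(X.shape, y.shape)
```

With 400 simulated prices this yields 399 returns, 379 volatility estimates, and 315 window/target pairs; the same windowing applied to each of the 440 real series produces the training samples.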
The data was standardized by subtracting the total mean and dividing by the total standard deviation of the whole training set.

Model

Motivated by the recent successes of CNNs in time series forecasting, we chose to use a dilated causal 1-dimensional convolutional neural network. The inputs to this network are 64-step sequences of the computed volatility estimates, while the outputs are the subsequent, 65th values, so that we look one step ahead into the future. Causal convolution means that the output at one point in time depends only on inputs up to that point, and the data is padded such that the input and the output have the same length. Dilated convolution (or convolution with holes) makes the filter larger by dilating it with zeros. The dilation rate of the l-th layer is set to 2^(l-1), which allows an exponential receptive field growth and enables a relatively shallow network to look into a relatively distant past. We use 6 causal convolutional layers with exponentially increasing dilation rates. Each layer uses 8 filters with a kernel size of 2 and a ReLU activation function. This is followed by a final convolutional layer with a kernel size of 1 and a single filter, so that the output shape matches that of the given time series sequence. The networks were trained for 300 epochs, using the Adadelta [Zeiler, 2012] optimizer.

We randomly chose 10 stocks, and used two CNN forecasters for each. The first (so-called individual) model learns from the volatility history of the given stock only. The second (joint) model learns from all stocks' volatilities, except the chosen 10 stocks. This means that the second model learns from more than 400 times as much data; however, it totally disregards the time series that we are forecasting.
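As a sanity check on the architecture described above: with kernel size 2 and dilation rates doubling per layer, 6 layers give a receptive field of exactly 64 steps, matching the input length. The standard receptive-field formula for stacked dilated convolutions (each layer with kernel size k and dilation d extends the field by (k - 1) * d) can be evaluated directly:

```python
def receptive_field(num_layers=6, kernel_size=2):
    """Receptive field of a stack of dilated causal convolutions."""
    field = 1
    for l in range(1, num_layers + 1):
        dilation = 2 ** (l - 1)               # dilation rate of the l-th layer
        field += (kernel_size - 1) * dilation  # each layer widens the field
    return field

print(receptive_field())  # 1 + (1+2+4+8+16+32) = 64
```

So the last output of the network can, in principle, see every one of the 64 input steps, while each extra layer would double the reachable history.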
This method can be considered a way of using transfer learning: the training set contains time series that are different from, but very similar to, those in the test set. We also applied an ARIMA model, in order to extend the comparison to a simpler and more classical time series forecasting method. The models were trained and tested on separate time periods: the first 70% of the available, nearly 10-year-long time period was used for training the models, while the remaining 30% served as the evaluation set. We produced one-day-ahead forecasts, and compared the models' performance in terms of forecast error and directional accuracy.

Results
The results of the forecast comparisons are available in Table 1. The metrics were averaged over the 10 stocks under study. RMSE (root mean squared error) and SMAPE (symmetric mean absolute percentage error) are the reported regression metrics, while directional accuracy and F1 score describe the forecasts' ability to get the directions right. We used both RMSE and SMAPE in order to show absolute as well as relative error measures. Accuracy and F1 score are commonly used metrics for binary classification.

The single-stock CNN's poor performance probably stems from the limited data volume. Neural networks excel when there is a huge training set, and they might struggle with such data scarcity: their unexploited complexity does more harm than good. For this reason, we also compared our joint convolutional neural network to simple ARIMA models fitted to the individual volatility series. Following Hyndman et al. [2007], we used successive KPSS tests [Kwiatkowski et al., 1992] to choose the order of differencing. Then we used grid search to find the proper number of autoregressive and moving average terms between 0 and 3. We chose the best model based on AIC [Akaike, 1974].

Our CNN trained on multiple stocks outperformed ARIMA forecasts of the individual stock volatilities according to all metrics, even though the ARIMA parameters were chosen using a systematic procedure, while the CNN parameters were chosen rather arbitrarily. The convolutional neural network's performance could probably have been improved further by using grid search to find optimal hyperparameters. The joint CNN model outperformed the single models in terms of forecast error and directional accuracy as well. Figure 1 displays the average distance of forecasted and true values in terms of RMSE. Figure 2 shows directional accuracies.
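The evaluation metrics can be sketched in plain NumPy. Two caveats: SMAPE has several variants in the literature and the text does not say which one the authors used, so the definition below is one common choice; likewise, computing direction against the previous observed value is our assumption about how directional accuracy was defined.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: an absolute error measure."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def smape(y_true, y_pred):
    """Symmetric MAPE in percent (one common variant)."""
    return 100 * np.mean(2 * np.abs(y_pred - y_true)
                         / (np.abs(y_true) + np.abs(y_pred)))

def directional_accuracy(y_prev, y_true, y_pred):
    """Share of steps where the forecast moves in the true direction."""
    true_dir = np.sign(y_true - y_prev)
    pred_dir = np.sign(y_pred - y_prev)
    return np.mean(true_dir == pred_dir)

# Tiny illustrative example with made-up annualized volatility values.
y_prev = np.array([0.20, 0.22, 0.25, 0.24])  # last observed values
y_true = np.array([0.22, 0.25, 0.24, 0.26])  # realized next values
y_pred = np.array([0.21, 0.24, 0.25, 0.25])  # one-day-ahead forecasts
print(rmse(y_true, y_pred), smape(y_true, y_pred),
      directional_accuracy(y_prev, y_true, y_pred))
```

F1 can then be computed on the same up/down labels with any standard binary-classification routine.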
                      CNN Individual   ARIMA    CNN Joint
Value forecasts
  RMSE                0.0283           0.0261   0.0154
  SMAPE               9.6324           4.9468   3.9358
Direction forecasts
  Accuracy            0.5333           0.5161   0.6262
  F1                  0.5379           0.4182   0.6835

Table 1: Evaluation metrics averaged over stocks
[Figure 1: RMSE scores per stock for the CNN Individual, ARIMA, and CNN Joint models]

[Figure 2: Accuracy scores per stock for the CNN Individual, ARIMA, and CNN Joint models]

Conclusions

We trained a one-dimensional convolutional neural network on multiple stocks' volatility history, and compared its forecasting performance to benchmark models trained on single series. We found that the deep learning method could take advantage of the multiplied data volume and produce better results, considering both value and direction forecasts and different measures of the goodness of the predictions. This suggests that the generality of stock prices allows a data expansion that might enable deep learning methods to outperform traditional time series models in (short-term) financial forecasting.

These findings open up research opportunities regarding the financial application of deep learning methods. It should be explored whether the results apply to different markets and to different forecasting horizons. For example, it would be worth examining whether our jointly learned models can help forecast the volatilities of less frequently traded stocks, or whether the approach works with different data frequencies. We still used very little data: a few hundred stocks with daily price observations. Using intraday stock market data would seem even more promising. Also, further research should study whether similar joint machine learning models could be applied to time series with higher diversity. Volatilities exhibit a high degree of similarity, which explains the model's performance; however, learning general time series patterns might have a much broader scope.
References
Hirotugu Akaike. A new look at the statistical model identification. In Selected Papers of Hirotugu Akaike, pages 215-222. Springer, 1974.

Torben G Andersen, Tim Bollerslev, Francis X Diebold, and Paul Labys. Modeling and forecasting realized volatility. Econometrica, 71(2):579-625, 2003.

Shaojie Bai, J Zico Kolter, and Vladlen Koltun. Convolutional sequence modeling revisited. 2018.

Mikołaj Bińkowski, Gautier Marti, and Philippe Donnat. Autoregressive convolutional neural networks for asynchronous time series. arXiv preprint arXiv:1703.04122, 2017.

Anastasia Borovykh, Sander Bohte, and Cornelis W Oosterlee. Conditional time series forecasting with convolutional neural networks. arXiv preprint arXiv:1703.04691, 2017.

Ray Yeutien Chou, Hengchih Chou, and Nathan Liu. Range volatility models and their applications in finance. In Handbook of Quantitative Finance and Risk Management, pages 1273-1281. Springer, 2010.

Rama Cont. Empirical properties of asset returns: stylized facts and statistical issues. 2001.

Robert F Engle and Andrew J Patton. What good is a volatility model? Quantitative Finance, 1:237-245, 2001.

Rob J Hyndman, Yeasmin Khandakar, et al. Automatic time series forecasting: the forecast package for R. Number 6/07. Monash University, Department of Econometrics and Business Statistics, 2007.

Denis Kwiatkowski, Peter CB Phillips, Peter Schmidt, and Yongcheol Shin. Testing the null hypothesis of stationarity against the alternative of a unit root: How sure are we that economic time series have a unit root? Journal of Econometrics, 54(1-3):159-178, 1992.

Yann LeCun, Bernhard Boser, John S Denker, Donnie Henderson, Richard E Howard, Wayne Hubbard, and Lawrence D Jackel. Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4):541-551, 1989.

Yann LeCun, Yoshua Bengio, et al. Convolutional networks for images, speech, and time series. The Handbook of Brain Theory and Neural Networks, 3361(10), 1995.

Andrew W Lo and A Craig MacKinlay. Stock market prices do not follow random walks: Evidence from a simple specification test. The Review of Financial Studies, 1(1):41-66, 1988.

Roni Mittelman. Time-series modeling with undecimated fully convolutional neural networks. arXiv preprint arXiv:1508.00317, 2015.

Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499, 2016.

Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211-252, 2015.

G William Schwert. Why does stock market volatility change over time? The Journal of Finance, 44(5):1115-1153, 1989.

Justin Sirignano and Rama Cont. Universal features of price formation in financial markets: perspectives from deep learning. Quantitative Finance, pages 1-11, 2019.

Nassim Nicholas Taleb and DG Goldstein. We don't quite know what we are talking about when we talk about volatility. Journal of Portfolio Management, 33(4), 2007.

A Waibel, T Hanazawa, G Hinton, K Shikano, and KJ Lang. Phoneme recognition using time-delay neural networks. IEEE Transactions on Acoustics, Speech, and Signal Processing, 37(3):328-339, 1989.

Subin Yi, Janghoon Ju, Man-Ki Yoon, and Jaesik Choi. Grouped convolutional neural networks for multivariate time series. arXiv preprint arXiv:1703.09938, 2017.

Fisher Yu and Vladlen Koltun. Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122, 2015.

Matthew D Zeiler. Adadelta: an adaptive learning rate method. arXiv preprint arXiv:1212.5701, 2012.