COVID19-HPSMP: COVID-19 Adopted Hybrid and Parallel Deep Information Fusion Framework for Stock Price Movement Prediction
Farnoush Ronaghi, Mohammad Salimibeni, Farnoosh Naderkhani, Arash Mohammadi
Farnoush Ronaghi†, Mohammad Salimibeni†, Farnoosh Naderkhani†, and Arash Mohammadi†
†Concordia Institute for Information Systems Engineering, Concordia University
Emails: {f_ronagh, m_alimib}@encs.concordia.ca; {farnoosh.naderkhani, arash.mohammadi}@concordia.ca
Corresponding Author: Arash Mohammadi; Email: arash.mohammadi@concordia.ca; Tel: (+1) 514-848-2424 Ext. 2712; Address: 1455 De Maisonneuve Blvd. W., EV-009.187, Montreal, QC, Canada, H3G-1M8.
Abstract
The novel coronavirus (COVID-19) has suddenly and abruptly changed the world as we knew it at the start of the 3rd decade of the 21st century. In particular, the COVID-19 pandemic has negatively affected financial econometrics and stock markets across the globe. Artificial Intelligence (AI) and Machine Learning (ML)-based prediction models, especially Deep Neural Network (DNN) architectures, have the potential to act as a key enabling factor to reduce the adverse effects of the COVID-19 pandemic and possible future ones on financial markets. In this regard, first, a unique COVID-19 related PRIce MOvement prediction (COVID19 PRIMO) dataset is introduced in this paper, which incorporates effects of social media trends related to COVID-19 on stock market price movements. Afterwards, a novel hybrid and parallel DNN-based framework is proposed that integrates different and diversified learning architectures. Referred to as the COVID-19 adopted Hybrid and Parallel deep fusion framework for Stock price Movement Prediction (COVID19-HPSMP), it uses innovative fusion strategies to combine scattered social media news related to COVID-19 with historical market data. The proposed COVID19-HPSMP consists of two parallel paths (hence hybrid), one based on a Convolutional Neural Network (CNN) with Local/Global Attention modules, and one integrated CNN and Bi-directional Long Short-Term Memory (BLSTM) path. The two parallel paths are followed by a multilayer fusion layer acting as a fusion centre that combines localized features. Performance evaluations are performed based on the introduced COVID19 PRIMO dataset, illustrating superior performance of the proposed framework.
Keywords: COVID-19 Pandemic, Deep Neural Networks, Hybrid Models, Information Fusion, Stock Movement Prediction.
January 8, 2021. arXiv [q-fin.ST].
1. Introduction
The novel coronavirus (COVID-19) has suddenly and abruptly changed the world as we knew it at the end of the 2nd decade of the 21st century. The global COVID-19 pandemic sent market volatility (Mazur et al., 2020; Baek et al., 2020) rocketing upward around the world. In particular, the pandemic has negatively affected several sectors including, but not limited to, stock markets, global supply chains, labor markets, and consumption behaviors. Disruptions of such sectors, especially the stock markets (Bustos & Pomares-Quimbaya, 2020; Al-Awadhi et al., 2020; Saleh Ahmar & Boj del Val, 2020), can adversely affect the global economy. Volatility levels in the United States in mid-March 2020 were similar to those last seen in October 1987, from 1929 to 1939, and in 2008. In September 2008, the Dow Jones Industrial Average fell 777 points.
Literature Review:
Stock market movement prediction is a key and challenging problem in financial econometrics and, as such, has attracted extensive recent research focus (Frankel, 1995; Ronaghi et al., 2017; Mohammadi et al., 2017; Edwards et al., 2007; Bollen & H. Mao, 2011; Jiang, 2020; Hu et al., 2019; Koshiyama et al., 2020; Schumaker & Chen, 2009). It is widely acknowledged that investors need high-quality data to make informed and accurate decisions. Particularly in times of market crisis, such as the recent COVID-19 pandemic, investors need advanced Big-Data Analytics and Information Technologies to acquire timely and accurate data. Using high-quality data, investors can perform fast analysis and decision making under market volatility and react quickly to fast-changing conditions. Any positive or negative news related to the stock market crisis can have a ripple effect on the investors' decision-making process within the stock markets. During the pandemic era, stock movement prediction typically becomes significantly more challenging as stock markets tend to face high fluctuations. Consequently, it is of paramount importance to develop innovative and advanced processing and learning solutions to accurately predict stock movements for achieving maximum potential profit. This has resulted in a recent surge of interest in ML/AI-based prediction techniques (Hu et al., 2019; Anik et al., 2019) and fusion of multi-modal information sources. In the context of stock price movement prediction, historical stock prices are typically fused with information obtained from media news. For the latter, in addition to the conventional news platforms, extensive interest has recently been shown in using Internet-based news resources, such as social media, to develop ML/AI predictive models. The manuscript focuses on this topic and examines the role of COVID-19 related social media news on the behavior of the Dow Jones market.
Recent advancements and developments in the field of ML and AI, in particular Deep Neural Networks (DNNs), have motivated different research works to incorporate such advanced modeling techniques for prediction and forecasting tasks in stock markets (Tetlock, 2007). DNN-based solutions are data-driven techniques that learn the underlying dynamics of stock price movements by processing a large amount of data. DNN-based methodologies are typically data hungry and will not perform well in the absence of a large and diversified set of data resources. Availability of public news media, Internet-based news channels, and social media can pave the way to better train DNN models and further increase utilization of AI within stock markets. This research field, however, is still in its infancy due to its high dependence on the reliability and quality of the information available through Internet-based news channels and social media resources (Hu et al., 2019). Furthermore, such data sources cannot be directly used for prediction tasks (Luss & D'Aspremont, 2015) due to the highly correlated nature of stock market price movements. To tackle the aforementioned issues, there is an unmet and timely quest to develop and design: (i) Hybrid processing/learning models based on different and diversified learning architectures to capture underlying correlations and variabilities of the data sources, and; (ii) Smart fusion strategies to combine scattered social media news with historical market data. The paper aims to take the first step towards addressing this gap.
Contributions:
The main objective of the proposed DNN-based predictive model is to construct a new information fusion framework to analyze and interpret ever-changing trends during the COVID-19 pandemic era. In this regard, first, a unique and real COVID-19 related PRIce MOvement prediction (COVID19 PRIMO) dataset (Ronaghi et al.) is constructed to incorporate effects of Internet-based and social media trends related to COVID-19 on stock market price movements. The main component of the constructed COVID19 PRIMO dataset is based on Twitter messages. It is well known that news and media move stock prices (Fama, 1998; Huang & Li, 2020; Yun et al., 2019). Nowadays, information reaches the public via different news platforms ranging from newspapers, radio, and television to social media and Internet-based venues. In this area, social media, especially Twitter, is a popular and widely used platform for sharing personal opinions on different topics. Twitter is also used extensively by politicians, who potentially have a high impact on stock price movements. Based on a survey on Statista (Clement, 2020), from the first quarter of 2017 to 2020, Twitter had 186 million active users worldwide.
Based on the constructed COVID19 PRIMO dataset, the paper proposes a data-driven (DNN-based) COVID-19 adopted Hybrid and Parallel deep fusion framework for Stock price Movement Prediction (COVID19-HPSMP) that uses information fusion to combine COVID-19 related Twitter data with extended horizon market historical data. More specifically, in contrast to existing data-driven movement prediction models, where a single DL model is used (Ronaghi et al., 2017), the proposed COVID19-HPSMP is a hybrid framework with two parallel paths, i.e., one based on a Convolutional Neural Network (CNN) with Local/Global Attention modules, and one integrated CNN and Bi-directional Long Short-Term Memory (BLSTM) path.
The former path is incorporated within the COVID19-HPSMP framework to extract spatial features, while the latter path is used to extract temporal features. The two parallel paths are followed by a multilayer fusion layer acting as a fusion centre that combines localized features extracted in each of the two parallel paths. The COVID19 PRIMO dataset is used to evaluate the performance of the proposed COVID19-HPSMP framework, which illustrates its superior performance compared to its stand-alone (non-hybrid) counterparts.
The remainder of the paper is organized as follows: Section 2 introduces the COVID19 PRIMO dataset and formulates the stock movement prediction task. The COVID19-HPSMP hybrid framework is presented in Section 3. The implementation study and results are presented in Section 4. Finally, Section 5 concludes the paper.
The COVID19 PRIMO dataset is accessible through the following page: https://github.com/MSBeni/COVID19_PRIMO.
2. Problem Definition and COVID19 PRIMO
In this section, first, the COVID19 PRIMO dataset is introduced, which is constructed based on the Dow Jones stock market index and its associated Twitter messages for the period of 01/01/2016 to 30/07/2020. The focus is on the problem of stock price movement prediction, as close observation of market movements can reveal the presence of a significant number of trading targets with minor movement ratios. More specifically, the paper focuses on investigating the effects of the COVID-19 pandemic on stock price movement prediction. In this paper, stock movement prediction is modeled as a two-class classification problem based on the adjusted closing price of the underlying stocks. The adjusted closing price is commonly utilized to compute the associated stock dividends and earnings (Xie et al., 2013).
Furthermore, the adjusted closing price is beneficial for learning and predicting fluctuations in the stock market (Li et al., 2014; Rekabsaz & et al., 2017).
We have prepared a new dataset for the aforementioned prediction problem, which can facilitate analysis and evaluation of the potential impacts of a pandemic on the stock market and can provide valuable insights for combating possible future pandemics. The constructed COVID19 PRIMO dataset consists of two components, i.e., historical prices and Twitter messages. The first component, historical data, is obtained from the Dow Jones stock market index, with the ticker DJI. Dow Jones is a stock market index that measures the performance of 30 large companies such as Apple, Boeing, and Microsoft. Historical stock market prices are obtained from Yahoo Finance. For this task, we used the Yahoo Finance library in Python (https://pypi.org/project/yahoo-finance/) to collect the data from the Yahoo API. For some of the stocks, we also used the Alpha Vantage APIs (https://pypi.org/project/alpha-vantage/). The data is prepared at three different temporal resolutions, i.e., daily, weekly, and monthly. The daily prices are used in our model, described later, for the task of stock movement prediction.
Figure 1: Block diagram of the procedure designed to collect and prepare Tweet component of the COVID19 PRIMO.
Capitalizing on the facts identified in Section 1, for the news component of the COVID-19 price movement prediction dataset, we focused on Twitter. Fig. 1 shows the block diagram of the approach followed to collect and analyze Twitter messages. Web scraping from the Twitter search engine is utilized to build the Twitter dataset. The official Twitter API has limitations that restrict the extent of text that can be extracted. Additionally, the official API truncates tweets at times, which in turn results in items with missing data. We have developed a localized API to address the aforementioned issues.
The localized API uses the Twitter search engine and directly collects the required dataset from Twitter. We set up our data collection platform based on scraping the Twitter website. The Twitter web scraping returns the Tweet text content with a range of useful attributes, for example, Tweet-ID, Tweet Created at, Retweet, Text, Favorite Count, Hashtag Text, User ID, Followers Count, Friends Count, Statuses Count, User Created at, and Location. To collect informative public Tweets, we added a constraint to our implementation to collect only tweets retweeted more than once. Many other unnecessary tweet attributes were also removed during data gathering to focus on the essential information such as date, tweet text, and number of retweets. Fig. 2 shows an illustrative example of raw tweets collected by web scraping.
Figure 2: Illustrative raw Tweets samples in the COVID19 PRIMO dataset before the pre-processing step.
A critical challenge is scraping the raw content of Twitter data. Such a process takes extensive time and needs manual and cumbersome pre-processing procedures. We retrieved Dow Jones and COVID-19 tweets by querying the symbols $DJI, DOW, Covid19, Covid 19, Covid-19, and CoronaVirus. Additionally, the corresponding data associated with historical prices are collected. The constructed dataset includes related tweets from 01/01/2016 to 30/07/2020 for the Dow Jones stock index. Not every day is a trading day, i.e., weekends and holidays are not among the trading dates and are outside the scope of the analysis. To better organize and use the input, we subtract the number of weekend days, half trading days, and market holidays from the number of days in a year. More specifically, our dataframe is created by combining historical prices and Tweet corpora and matching them to the trading days. Consequently, we considered 1,152 trading days from January 2016 to July 2020 to build our dataset. The COVID19 PRIMO dataset is then divided into a training set from January 2016 to January 2020, and a validation set from January 2020 to February 2020. Data from 01/03/2020 to 30/07/2020 is kept for test purposes.
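
The chronological split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the text does not say whether the boundary month of January 2020 belongs to the training or the validation set, so the sketch assumes training ends on 31/12/2019 and validation covers January and February 2020. The function names `assign_split` and `chronological_split` are hypothetical.

```python
from datetime import date

# Assumed boundaries (the text leaves the January 2020 boundary ambiguous).
TRAIN_START = date(2016, 1, 1)
VAL_START = date(2020, 1, 1)    # assumption: validation = January-February 2020
TEST_START = date(2020, 3, 1)   # stated in the text: 01/03/2020
TEST_END = date(2020, 7, 30)    # stated in the text: 30/07/2020


def assign_split(day):
    """Return 'train', 'val', or 'test' for a day, or None if out of range."""
    if day < TRAIN_START or day > TEST_END:
        return None
    if day < VAL_START:
        return "train"
    if day < TEST_START:
        return "val"
    return "test"


def chronological_split(rows):
    """Partition (day, record) pairs by date, without shuffling.

    Keeping the order chronological ensures that no information from the
    validation or test period leaks into training.
    """
    splits = {"train": [], "val": [], "test": []}
    for day, record in rows:
        name = assign_split(day)
        if name is not None:
            splits[name].append((day, record))
    return splits
```

Splitting by date rather than at random matters here because the test period is exactly the pandemic window the paper wants to evaluate on.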
In this sub-section, we visualize the existing relations between different parameters of the COVID19 PRIMO dataset, particularly that of COVID-19 tweets with other parameters. As stated above, the introduced dataset is constructed based on the Dow Jones stock market index and its associated Twitter messages for the period of 01/01/2016 to 30/07/2020. The COVID-19 pandemic crisis covers only a fraction of the data represented in the COVID19 PRIMO dataset but plays an essential role in predicting the pandemic's effects on stock market movements. The COVID-19 related tweets appeared in 2020 (from February to July), specifically starting to show up from the end of February. The dates containing COVID-19 tweets are stamped with True, making it possible to consider their distribution over the whole frame.
Figure 3: (a) The Distribution of The COVID-19 Tweets and Target. (b) The Number of COVID-19 Tweets. (c) Correlation matrix between different variables.
Fig. 3 visualizes different aspects of the COVID19 PRIMO dataset and illustrates the relation of COVID-19 with other parameters of the dataset. A violin plot is shown in Fig. 3(a), where the distribution of COVID-19 tweets and the market movements are illustrated. As shown in Fig. 3(b), the COVID-19 tweets cover fewer than 200 days, which is expected given the recent emergence of the pandemic. Fig. 3(c) shows the correlation between several different variables in the COVID19 PRIMO dataset. Market prices, including Adjusted close price (Adj. Close); Open price (Open); High price (High); Low price (Low); calculated target (Target), and; Presence of COVID-19 in tweets are depicted in this figure. For example, the correlation between normalized adjusted close price and normalized high price is 0.83. Finally, Fig. 4 is a grid of scatter plots used to visualize bivariate relationships between combinations of variables. Fig. 4 is included to give a big picture of the distribution of the data and to better understand existing relations between different parameters in the dataset. It shows the relationship for each combination of variables in a DataFrame as a matrix of plots. The orange dots show the COVID-19 related data in the dataset, while the blue dots represent the lack of COVID-19 related data. Fig. 4 can potentially depict the bivariate relationships between different market price data and COVID-19, together with the relation between the Target recognized in this period of time and the pandemic data.
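
As a small aid to reading the correlation matrix in Fig. 3(c), the sketch below computes a Pearson correlation coefficient between two normalized price columns. It assumes the matrix reports Pearson correlations and that "normalized" means min-max scaling; the text states neither, and the helper names are hypothetical.

```python
import math


def min_max_normalize(values):
    """Scale values to [0, 1] (assumed normalization; not stated in the text)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant column")
    return [(v - lo) / (hi - lo) for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)
```

Because the Pearson coefficient is invariant to positive affine rescaling, min-max normalization does not change its value; the normalization matters for the scatter plots and the model inputs rather than for the correlation entries themselves.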
Figure 4: Grid of scatter plots. The orange dots show COVID-19 related data points, while the blue dots represent lack of COVID-19 related data.
Figure 5: The proposed COVID19-HPSMP framework.
3. Proposed COVID19-HPSMP Framework
In this section, we describe the constituent components used in the development of the proposed COVID19-HPSMP framework. As stated previously, the main architecture of the COVID19-HPSMP is developed based on DNNs. The prominent advantage of DNNs is their ability to extract meaningful patterns from raw data through multiple non-linear transformations and approximation of complex non-linear functions (Al-Dulaimi et al., 2019). More specifically, the proposed COVID19-HPSMP is a data-driven (deep learning) model designed based on hybrid or multiple-model strategies. The COVID19-HPSMP framework extracts and interprets the available news corpus via temporal attention modeling based on two key principles, i.e., "Diverse Influence" and "Sequential Context Dependency". To achieve these objectives, the COVID19-HPSMP is designed as a hybrid multi-modal fusion framework that integrates information obtained from stock market historical data and social media (Twitter data). The proposed hybrid framework consists of three paths: two parallel paths, i.e., the CNN Local/Global path and the CNN-BLSTM path, together with a fusion path. Fig. 5 illustrates the overall structure of the proposed framework. The fusion path is composed of fully connected layers that combine extracted features from each of the two parallel paths. Each of the two parallel paths within the COVID19-HPSMP framework is constructed based on the following two main components:
(i) Word Embedding Module: This module is used to calculate embedded vectors for Twitter data. For this purpose, Glove (Pennington et al., 2014), as a pre-trained unsupervised model, is used within the word embedding module of each of the two parallel paths within the proposed COVID19-HPSMP framework.
(ii) Attention Module: The main objective of this module is to extract specific words with the highest attention weight. The COVID19-HPSMP is a hybrid model where each of the two parallel paths (i.e., the CNN Local/Global path and the CNN-BLSTM path) is a unique attention module extracting different related features. The rationale behind such a hybrid and parallel structure is the significance of the attention network and the intuition that extracting different attention-related features would improve the overall performance of the model.
Figure 6: The CNN Local/Global path of the COVID19-HPSMP framework.
In what follows, we present each of the three constituent paths of the proposed COVID19-HPSMP framework.
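
To make the Word Embedding Module concrete, the sketch below parses vectors in the plain-text layout used by the publicly distributed GloVe files (one token per line followed by its space-separated components) and maps a tokenized Tweet to a sequence of vectors. The helper names, the lower-casing of tokens, and the zero vector for out-of-vocabulary tokens are assumptions of this sketch; the paper does not describe how unknown words are handled.

```python
def parse_glove_lines(lines):
    """Build a token -> vector table from GloVe-style text lines."""
    table = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip empty or malformed lines
        table[parts[0]] = [float(value) for value in parts[1:]]
    return table


def embed_tweet(tokens, table, dim):
    """Map each token to its vector; unknown tokens get a zero vector (assumption)."""
    vectors = []
    for token in tokens:
        vector = table.get(token.lower())
        vectors.append(list(vector) if vector is not None else [0.0] * dim)
    return vectors
```

In practice the lines would come from iterating over an opened pre-trained vector file, and the resulting table would initialize the embedding layer shared by both parallel paths.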
The first parallel path of the proposed COVID19-HPSMP framework is a CNN-based local/global attention model designed to capture and extract spatial features from the input data. More specifically, the CNN-based path consists of Local and Global attention layers, which are described in detail below.
Intuitively speaking, a word embedding model produces a representation for each word in the Twitter corpus. Let us denote the $l$-th Tweet among the available set of $N_T$ Tweets by $T^{(l)}$, for $1 \le l \le N_T$. Furthermore, consider that Tweet $T^{(l)}$ contains $N_W^{(l)}$ words. The embedding can be thought of as a linear operator (function) that takes as input a one-hot vector $e_i^{(l)} \in \mathbb{R}^{N_W}$ corresponding to the $i$-th word of Tweet $T^{(l)}$, for $1 \le i \le N_W^{(l)}$. Note that here $N_W$ denotes the number of words in the overall vocabulary. The embedding then maps the one-hot vector into a dense feature vector $x_i^{(l)} = [x_{i,1}^{(l)}, \ldots, x_{i,N_F}^{(l)}]^T \in \mathbb{R}^{N_F}$, which consists of $N_F$ scalar features for the $i$-th word of the $l$-th Tweet within the Twitter corpus. The feature vector $x_i^{(l)}$ is obtained based on a trainable weighting matrix (to be learned) of the embedding layer as follows
$$ x_i^{(l)} = W^{(Emb)} e_i^{(l)}. \tag{1} $$
The embedding layer's output together with price values are provided as a concatenated input to the Local Attention Layer (LAL). The LAL focuses on the words that are more informative within a localized window. More specifically, let the $l$-th Tweet be represented by $N_W^{(l)}$ word embeddings as $(x_1^{(l)}, \ldots, x_m^{(l)}, \ldots, x_{N_W^{(l)}}^{(l)})$, where $x_m^{(l)}$ is the middle (center) word within the embedding sequence of the $l$-th Tweet. The local attention process is achieved via a sliding window of length $W$ rolling over the word embedding sequence of $T^{(l)}$. The attention score $s_i^{(l,LA)}$ for the $i$-th word of $T^{(l)}$ is computed based on an attention weight matrix $W_i^{(l,LA)} \in \mathbb{R}^{W \times N_F}$, for $1 \le i \le N_W^{(l)}$, and its associated bias vector $b_i^{(l,LA)}$ as follows
$$ s_i^{(l,LA)} = \sigma\left( X_i^{(l,LA)} \circ W_i^{(l,LA)} + b_i^{(l,LA)} \right), \tag{2} $$
where $\circ$ denotes the Hadamard product (element-wise multiplication), $\sigma(\cdot)$ is the sigmoid activation function, and
$$ X_i^{(l,LA)} \triangleq \left[ x^{(l)}_{m+\frac{-W+1}{2}}, \ldots, x_i^{(l)}, \ldots, x^{(l)}_{m+\frac{W+1}{2}} \right]^T, \tag{3} $$
where superscript $T$ denotes the transpose operator. The attention score $s_i^{(l,LA)}$ is used as a weight for the words to form the localized word embedding as follows: $\hat{x}_i^{(l,LA)} = s_i^{(l,LA)} x_i^{(l)}$. A higher attention score can be interpreted as higher importance associated with that specific word than with the others. The weighted sequences then go through a Convolutional layer with a kernel size of 15, which is designed to avoid overfitting. A Max-Pooling layer is then implemented after the convolutional one to create position invariance over larger local regions and down-sample the input. Addition of the Max-Pooling layer also leads to a faster convergence rate by selecting superior invariant features, which in turn improves generalization performance.
The output of the LAL is then provided as input to a Global Attention Layer (GAL). The scoring process of the GAL is similar in nature to that of the LAL (Eqs. (2)-(3)). However, the attention score, now denoted by $s_i^{(l,GA)}$, is computed over the entire input, i.e.,
$$ s_i^{(l,GA)} = \sigma\left( X^{(l,GA)} \circ W_i^{(l,GA)} + b_i^{(l,GA)} \right), \tag{4} $$
where $W_i^{(l,GA)} \in \mathbb{R}^{N_W^{(l)} \times N_F}$, and
$$ X^{(l,GA)} \triangleq \left[ x_1^{(l)}, \ldots, x_{N_W^{(l)}}^{(l)} \right]^T. \tag{5} $$
By applying global attention, the effect of uninformative words is diminished, and the global semantic meaning is captured more precisely through the CNN path. This completes the description of the CNN Local/Global path of the proposed COVID19-HPSMP framework. Next, we present the CNN-BLSTM path.
The second parallel path of the proposed COVID19-HPSMP is a hybrid CNN and BLSTM attention model, referred to as the CNN-BLSTM path. Similar to the CNN Local/Global path, in the first step, the Twitter messages are provided as input to a "Word Embedding layer". As stated previously, a pre-trained unsupervised Glove model (Pennington et al., 2014) is used as the word embedding layer within the COVID19-HPSMP framework.
Afterwards, the corpus and prices are encoded by a CNN layer to extract general contextual features. An attention layer is assigned across all the vectors to calculate the weighted corpus. At the next step, a second CNN layer is implemented to capture and learn more fine-tuned features. The first CNN layer has 50 filters with a window size of 25. The second CNN layer has 100 filters with a window size of 25.
Figure 7: The CNN-BLSTM path of the COVID19-HPSMP framework.
The first Attention layer is used to capture essential and unique features to provide insight into the vector of the data, including tweets and prices. The second Attention layer acts on each vector and calculates the weighted mean of these encoded corpus vectors to represent the overall sequential context information. A global Max-Pooling layer is then applied to capture the essential features and reduce the framework's complexity. Global Max-Pooling is similar to the regular version but with the pool size equal to the size of the input. At the next stage, an attention-based BLSTM layer is designed to remember what has been previously learned in order to better understand the input. The attention-based BLSTM layer is described next.
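
Both parallel paths rely on sigmoid-gated attention over embedded words. As a reference point, the local attention scoring of Eqs. (2)-(3) can be sketched as follows. Several choices here are assumptions rather than the authors' implementation: the window is centred on the $i$-th word, sequence boundaries are zero-padded, the Hadamard product is summed into a single scalar before the sigmoid, and one weight matrix and bias are shared across positions. The function name `local_attention` is hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def local_attention(embeddings, weights, bias):
    """Score each word from a window of neighbours and re-weight its embedding.

    embeddings: N x F list of word vectors for one Tweet.
    weights:    W x F attention weights (W odd), shared across positions.
    bias:       scalar bias.
    Returns (scores, weighted), with weighted[i] = scores[i] * embeddings[i].
    """
    n_feat = len(embeddings[0])
    window = len(weights)
    half = window // 2
    pad = [0.0] * n_feat  # zero padding at the sequence boundaries (assumption)
    scores, weighted = [], []
    for i in range(len(embeddings)):
        total = bias
        for k in range(window):
            j = i - half + k
            row = embeddings[j] if 0 <= j < len(embeddings) else pad
            # Element-wise product of window and weights, summed to a scalar.
            total += sum(x * w for x, w in zip(row, weights[k]))
        score = sigmoid(total)
        scores.append(score)
        weighted.append([score * x for x in embeddings[i]])
    return scores, weighted
```

The global attention of Eqs. (4)-(5) follows the same pattern with the window extended to the whole Tweet, so that every word is scored against the full sequence rather than only its neighbours.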
To encode temporal information based on the available set of news corpus and financial time-series data, BLSTM is incorporated within the COVID19-HPSMP hybrid framework. Learning based on financial time-series data is a sequence learning task for which BLSTMs are considered state-of-the-art DNN architectures. The LSTM architecture was initially developed by Hochreiter and Schmidhuber (Hochreiter & Schmidhuber, 1997) to address the vanishing and exploding gradient problem of conventional Recurrent Neural Networks (RNNs). Since then, LSTM models have gained significant popularity owing to their extensions, advancements, and successful applications in different domains. Generally speaking, LSTM is a memory-based architecture that uses different gating functions and a memory state to manage the processing of information through time (Di et al., 2018). LSTM works based on the following update model at each time step (denoted by $t$)
$$ i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i) \tag{6} $$
$$ f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f) \tag{7} $$
$$ g_t = \tanh(W_c x_t + U_c h_{t-1} + b_c) \tag{8} $$
$$ c_t = f_t \circ c_{t-1} + i_t \circ g_t \tag{9} $$
$$ o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o) \tag{10} $$
$$ h_t = o_t \circ \tanh(c_t) \tag{11} $$
where $W_i$, $W_o$, $W_f$, $W_c$, and $U_i$, $U_o$, $U_f$, $U_c$ are weight matrices; terms $b_i$, $b_o$, $b_f$, $b_c$ are bias vectors, and; $\tanh(\cdot)$ represents the element-wise hyperbolic tangent activation function. Furthermore, $h_t$ and $h_{t-1}$ represent the current and previous hidden states, respectively. In the context of the proposed COVID19-HPSMP and to encode the temporal layer, we adopt the Bidirectional version of the LSTM (BLSTM) to feed the $i$-th word embedding. BLSTM can access both the preceding and succeeding contexts. It separates the hidden layer into two parts, a forward state sequence and a backward state sequence, based on an iterative process.
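
The update model of Eqs. (6)-(11) and its bidirectional use can be sketched in a few lines. This is an illustrative sketch under stated assumptions: parameters are stored in a dictionary keyed by hypothetical names such as `"W_i"` and `"b_c"`, the initial hidden and cell states are zero, each direction has its own parameter set, and the forward and backward hidden states are concatenated per time step (the text does not say how the two directions are merged).

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def lstm_step(x, h_prev, c_prev, p):
    """One LSTM update following Eqs. (6)-(11); p holds W_*, U_*, b_* for i, f, c, o."""
    def gate(name, activation):
        wx = matvec(p["W_" + name], x)
        uh = matvec(p["U_" + name], h_prev)
        return [activation(a + b + c) for a, b, c in zip(wx, uh, p["b_" + name])]

    i = gate("i", sigmoid)    # input gate, Eq. (6)
    f = gate("f", sigmoid)    # forget gate, Eq. (7)
    g = gate("c", math.tanh)  # candidate memory, Eq. (8)
    o = gate("o", sigmoid)    # output gate, Eq. (10)
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]  # Eq. (9)
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]                    # Eq. (11)
    return h, c


def run_lstm(xs, p, hidden):
    """Run the cell over a sequence from zero initial states; return all hidden states."""
    h, c = [0.0] * hidden, [0.0] * hidden
    outputs = []
    for x in xs:
        h, c = lstm_step(x, h, c, p)
        outputs.append(h)
    return outputs


def run_blstm(xs, p_forward, p_backward, hidden):
    """Bidirectional pass: concatenate forward and backward states per step (assumption)."""
    forward = run_lstm(xs, p_forward, hidden)
    backward = run_lstm(xs[::-1], p_backward, hidden)[::-1]
    return [f + b for f, b in zip(forward, backward)]
```

Reversing the backward outputs before concatenation aligns both directions on the same time index, so each position sees a summary of its preceding and succeeding context.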
The final component of the proposed COVID19-HPSMP framework is the Fusion Path, with three fully connected layers for fusing features extracted from each of the two underlying parallel paths and performing the final price movement prediction task. The first fusion layer has 100 neurons and uses the "tanh" activation function, while the second fusion layer has 50 neurons with the same activation function. The final layer of the Fusion Path has 1 neuron and uses the Rectified Linear Unit (ReLU) as its activation function to produce the price movement predictions. The input to the Fusion Path is constructed by concatenating the output of the CNN Local/Global path, which is a flattened 1-dimensional feature vector, with that of the CNN-BLSTM path.
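
A forward pass through the Fusion Path can be sketched as below: the two feature vectors are concatenated and passed through layers of 100, 50, and 1 neurons with tanh, tanh, and ReLU activations, as described above. The weight initialization, the feature sizes in the usage example, and the helper names are assumptions made for illustration only.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def dense(x, weights, bias, activation):
    """Fully connected layer: activation(W x + b) on nested lists."""
    if len(x) != len(weights[0]):
        raise ValueError("input size does not match the layer's weight matrix")
    return [activation(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    """Small uniform random weights and zero biases (hypothetical initialization)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def fusion_path(cnn_features, blstm_features, layers):
    """Concatenate both path outputs and apply the 100-50-1 fusion layers."""
    x = list(cnn_features) + list(blstm_features)
    (w1, b1), (w2, b2), (w3, b3) = layers
    x = dense(x, w1, b1, math.tanh)   # first fusion layer: 100 neurons, tanh
    x = dense(x, w2, b2, math.tanh)   # second fusion layer: 50 neurons, tanh
    return dense(x, w3, b3, relu)[0]  # output layer: 1 neuron, ReLU
```

For example, with a 5-dimensional CNN feature vector and a 3-dimensional BLSTM feature vector, the layers would be built as `init_layer(8, 100, rng)`, `init_layer(100, 50, rng)`, and `init_layer(50, 1, rng)` for some `rng = random.Random(0)`.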
4. Experiments
Experimental results and comparisons are presented in this section to evaluate the proposed hybrid COVID19-HPSMP framework for the task of stock movement prediction.
Table 1: Accuracy comparisons.
Model Variations | Accuracy (%)
The COVID19-HPSMP Framework | 66.48
Standalone CNN Local/Global Model | 64.65
Standalone CNN-BLSTM Model | 62.06
As stated previously, the problem at hand is a classification one with the following expected outputs: (i) On one hand, within a 5-day prediction horizon, if the adjusted stock price of a specific day is more than that of the previous day, the output of that specific day would be 1. Then, the sum of the output values is computed over the 5-day horizon and, if the sum is greater than a pre-defined threshold of 3, we consider the final output for that 5-day horizon to be 1, denoting a rise, and; (ii) On the other hand, when the adjusted stock price associated with a specific day is less than that of its previous day, value 0 is assigned as the output of that specific day. When the number of such 0 output values within the 5-day window is more than 3, we consider the final output to be 0, representing the fall prediction/state.
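
The labelling rule described above can be written down directly. The sketch below follows the text literally: a window is labelled 1 when more than 3 of its 5 daily outputs are rises and 0 when more than 3 are falls. The text does not say how to treat unchanged prices or windows that satisfy neither condition, so this sketch returns `None` in those cases; that choice, and the function names, are assumptions.

```python
def daily_moves(adj_close):
    """Per-day output: 1 if the adjusted close rose, 0 if it fell, None if unchanged."""
    moves = []
    for previous, current in zip(adj_close, adj_close[1:]):
        if current > previous:
            moves.append(1)
        elif current < previous:
            moves.append(0)
        else:
            moves.append(None)  # ties are not covered by the text
    return moves


def horizon_label(moves, threshold=3):
    """Label one prediction horizon from its daily outputs.

    Returns 1 (rise) if more than `threshold` days rose, 0 (fall) if more than
    `threshold` days fell, and None when neither condition holds.
    """
    rises = sum(1 for m in moves if m == 1)
    falls = sum(1 for m in moves if m == 0)
    if rises > threshold:
        return 1
    if falls > threshold:
        return 0
    return None
```

Under this literal reading, a 5-day window with a 3-to-2 split receives no label, so such windows would have to be either dropped or assigned by an additional convention.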
To perform the evaluations, the available Twitter news corpora is tokenized and words occurringless than 5 times are removed to construct the vocabulary. It is worth noting that removing wordswith limited usage will reduce the associated memory cost of the DNN models. As stated above,we consider a five day horizon and used a batch size of 64 within 15 epoch. In addition, Glove,which is an unsupervised word embedding algorithm, is used within the embedding modules of thetwo parallel paths of the COVID19-HPSMP. For comparison purposes, three different models areimplemented as follows:(ii)
The proposed COVID19-HPSMP Framework : The proposed hybrid COVID19-HPSMP frame-work developed in Section 3 is the first implemented stock movement prediction model. TheCOVID19-HPSMP consists of 2 parallel paths and a fusion path integrating extracted fea-tures of each of the two parallel paths.(ii)
Stand-Alone CNN Local/Global Model: The second implemented movement prediction model is the CNN Local/Global path implemented independently (stand-alone, as a single model). To implement the stand-alone version of the CNN Local/Global model, initialization is performed following the guideline provided in (Seo et al., 2017). A pre-trained GloVe model (Pennington et al., 2014) is used for weighting the corpus within the word embedding layer. In the LAL, we use a window of size 5 with a sigmoid function (σ). A total of 80 filters are implemented within the LAL. In the GAL, we use 50 filters of lengths 2 and 3. Finally, a fully connected layer with 0. [...]
(iii) Stand-Alone CNN-BLSTM Model: The third implemented movement prediction model is the CNN-BLSTM path implemented independently as a single model. Similar to the stand-alone CNN Local/Global model, a pre-trained GloVe model (Pennington et al., 2014) is used for weighting the corpus within the embedding layer. A convolutional layer with 64 filters and a window size of 25 is followed by an attention layer. To extract essential features and reduce the framework's complexity, a max-pooling layer is used. The output of the max-pooling layer is the input of the next layer, which is an attention-based Bidirectional LSTM with 250 hidden layers. Finally, two fully connected layers with 300 and 1 hidden neurons, respectively, form the price movement prediction results.

These three implemented models are trained with the Adam optimizer (Kingma & Ba, 2014) with a learning rate of 0. [...] (Abadi et al., 2016) to fine-tune different hyper-parameters.

In this sub-section, we present experimental results to evaluate the performance of the proposed COVID19-HPSMP framework for stock movement prediction. The accuracies of the implemented models are as follows: 64.65% for the stand-alone CNN-based Local/Global model; 62.06% for the stand-alone CNN-BLSTM model; and 66.48% for the hybrid attention model, i.e., the COVID19-HPSMP framework. The accuracy of all three implemented models is shown in Fig. 8(a). Accuracy is the ratio of correct predictions to the total number of predictions. The loss function associated with the evaluated models is illustrated in Fig. 8(b). The loss function measures the discrepancy between the output of the model and the target value, and thereby reflects the probability of misclassification.

Figure 8: (a) Accuracy of the models. (b) Loss of the models.
Figure 9: Accuracy and loss contrast of the different price movement prediction models.
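The two evaluation quantities described above can be illustrated with a minimal sketch. The text does not name the specific loss function, so binary cross-entropy is used here purely as an assumption, since it is a common choice for a two-class (rise/fall) output. The function names and the toy labels and probabilities are hypothetical and are not taken from the COVID19 PRIMO dataset.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true movement labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def binary_cross_entropy(y_true, p_up, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted P(rise).

    Assumption: the paper does not state which loss it uses.
    """
    total = 0.0
    for t, p in zip(y_true, p_up):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


# Hypothetical toy data: 1 = price rises, 0 = price falls.
labels = [1, 0, 1, 1, 0]
probs = [0.9, 0.2, 0.4, 0.7, 0.6]
preds = [1 if p >= 0.5 else 0 for p in probs]  # threshold at 0.5

toy_accuracy = accuracy(labels, preds)
toy_loss = binary_cross_entropy(labels, probs)
```

Thresholding the predicted probability at 0.5 is itself an assumption; a lower loss indicates that the predicted probabilities sit closer to the true labels, which is the sense in which the loss reflects the chance of misclassification.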
Table 1 compares the performance of the three implemented models. As can be observed, the hybrid model (the COVID19-HPSMP) outperforms its counterparts. It is worth mentioning that the achieved accuracy of 66.48% is significant, although in absolute terms it may seem low. First, the average accuracy achieved in the literature for the task of price movement prediction is around 50%. Second, these lower accuracies are obtained based on a much wider window of information compared to the limited duration of the introduced COVID19 PRIMO dataset. The limited duration of the dataset is due to the recent emergence of the COVID-19 pandemic.
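To make the comparison with the roughly 50% literature average concrete, the short sketch below computes the margin of each reported accuracy over that reference, both in percentage points and in relative terms. The accuracies are the ones reported in the text; the value 50.0 is the approximate figure quoted above, and the helper name is hypothetical.

```python
# Accuracies reported in the text (in percent).
reported = {
    "Stand-alone CNN Local/Global": 64.65,
    "Stand-alone CNN-BLSTM": 62.06,
    "COVID19-HPSMP (hybrid)": 66.48,
}
REFERENCE = 50.0  # approximate literature average quoted in the text


def margin_over_reference(acc, reference=REFERENCE):
    """Return (gain in percentage points, relative gain in percent)."""
    absolute = acc - reference
    relative = 100.0 * absolute / reference
    return round(absolute, 2), round(relative, 2)


margins = {name: margin_over_reference(acc) for name, acc in reported.items()}
best = max(reported, key=reported.get)  # model with the highest accuracy
```

This is plain arithmetic on the reported numbers and says nothing about statistical significance, which would additionally require the size of the test set.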
5. Conclusion
Motivated by the abrupt, sudden, and negative effects of the COVID-19 pandemic on stock markets, the paper first introduced a unique COVID-19 related PRIce MOvement prediction (COVID19 PRIMO) dataset. The constructed dataset incorporates effects of social media trends related to COVID-19 on stock market price movements. Based on the constructed COVID19 PRIMO dataset, the paper then proposed a novel data-driven (DNN-based) COVID-19 adopted Hybrid and Parallel deep fusion framework for Stock price Movement Prediction (COVID19-HPSMP). The proposed framework uses information fusion to combine COVID-19 related Twitter data with extended horizon market historical data. More specifically, in contrast to the existing data-driven stock price movement prediction models, where a single DNN model is used, the COVID19-HPSMP framework is a hybrid model consisting of two parallel paths (i.e., the CNN Local/Global path and the CNN-BLSTM path) and a fusion path that combines localized features. Each of the two parallel paths is a unique attention module extracting different attention-related features. The rationale behind such a hybrid and parallel structure is the significance of the attention network and the intuition that extracting different attention-related features would improve the overall performance of the model. The proposed COVID19-HPSMP architecture can predict stock price movements during the pandemic crisis to forecast sudden sharp movements (fall or rise) in the stock market. Based on the results of the COVID19-HPSMP architecture, we can predict the stock market's fluctuations with more than 66% accuracy, which will hopefully help in being better prepared for unexpected havoc.
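The hybrid and parallel structure summarized above can be illustrated with a minimal, dependency-free sketch of the final fusion step. Everything specific here is an assumption made for illustration: the fusion is shown as simple concatenation followed by a two-layer head with ReLU and sigmoid activations, the feature vectors are hypothetical stand-ins for the outputs of the two paths, and the weights are random rather than trained. It is not the authors' implementation.

```python
import math
import random


def dense(x, weights, bias):
    """Affine layer: one output per row of `weights` (weights @ x + bias)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_params(in_dim, hidden_dim, seed=0):
    """Small random weights for a hypothetical two-layer fusion head."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(hidden_dim)],
        "b1": [0.0] * hidden_dim,
        "W2": [[rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]],
        "b2": [0.0],
    }


def fuse_and_predict(local_global_feats, cnn_blstm_feats, params):
    """Concatenate the features of both paths and map them to P(price rises)."""
    fused = list(local_global_feats) + list(cnn_blstm_feats)  # assumed fusion: concatenation
    hidden = [max(0.0, h) for h in dense(fused, params["W1"], params["b1"])]  # ReLU
    logit = dense(hidden, params["W2"], params["b2"])[0]
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid


# Hypothetical feature vectors standing in for the outputs of the two paths.
path_a = [0.2, -0.1, 0.7]  # CNN Local/Global path
path_b = [0.5, 0.3]        # CNN-BLSTM path
params = init_params(in_dim=len(path_a) + len(path_b), hidden_dim=4)
p_up = fuse_and_predict(path_a, path_b, params)
movement = "rise" if p_up >= 0.5 else "fall"
```

The point of the sketch is only the data flow: each path produces its own feature vector independently, and the fusion head sees both at once, so it can weigh features that either path alone would miss.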
References
Abadi, M., et al. (2016). TensorFlow: Large-scale machine learning on heterogeneous distributed systems. ArXiv:1603.04467.
Al-Awadhi, A., Alsaifi, K., Al-Awadhi, A., & Alhammadi, S. (2020). Death and contagious infectious diseases: Impact of the COVID-19 virus on stock market returns. Journal of Behavioral and Experimental Finance.
Al-Dulaimi, A., Zabihi, S., Asif, A., & Mohammadi, A. (2019). A multimodal and hybrid deep neural network model for remaining useful life estimation. Computers in Industry, 186–196.
Anik, M., Arefin, M., & Dewan, M. (2019). An intelligent technique for stock market prediction. International Joint Conference on Computational Intelligence. Algorithms for Intelligent Systems.
Baek, S., Mohanty, S., & Glambosky, M. (2020). COVID-19 and stock market volatility: An industry level analysis. Finance Research Letters.
Bollen, J., & Mao, H. (2011). Twitter mood as a stock market predictor. Computer, 91–94.
Bustos, O., & Pomares-Quimbaya, A. (2020). Stock market movement forecast: A systematic review. Expert Systems with Applications.
Chong, E., Han, C., & Park, F. (2019). Deep learning networks for stock market analysis and prediction: Methodology, data representations, and case studies. Expert Systems with Applications.
[...] Deep Learning Essentials: Your Hands-on Guide to the Fundamentals of Deep Learning and Neural Network Modeling. Packt Publishing - ebooks Account.
Edwards, R., Bassetti, W., & Magee, J. (2007). Technical Analysis of Stock Trends. CRC Press.
Fama, E. (1998). Market efficiency, long-term returns, and behavioral finance. Journal of Financial Economics, (pp. 283–306).
Frankel, J. (1995). Financial Markets and Monetary Policy. MIT Press.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 1735–1780.
Hoseinzade, E., & Haratizadeh, S. (2019). CNNpred: CNN-based stock market prediction using a diverse set of variables. Expert Systems with Applications.
Hu, Z., Liu, W., Bian, J., & Liu, X. (2019). Listening to chaotic whispers: A deep learning framework for news-oriented stock trend prediction. ACM International Conference on Web Search and Data Mining.
Huang, J., & Li, J. (2020). Using social media mining technology to improve stock price forecast accuracy. Journal of Forecasting, (pp. 104–116).
Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. ArXiv:1502.03167.
Jiang, W. (2020). Applications of deep learning in stock market prediction: Recent progress. ArXiv:2003.01859.
Kingma, D., & Ba, J. (2014). Adam: A method for stochastic optimization. ArXiv:1412.6980.
Koshiyama, S., Firoozye, N., & Treleaven, P. (2020). Algorithms in future capital markets. SSRN.
Li, X., Xie, H., Chen, L., Wang, J., & Deng, X. (2014). News impact on stock price return via sentiment analysis. Knowledge-Based Systems, 14–23.
Luss, R., & D'Aspremont, A. (2015). Predicting abnormal returns from news using text classification. Quantitative Finance, 999–1012.
Mazur, M., Dang, M., & Vega, M. (2020). COVID-19 and the March 2020 stock market crash. Evidence from S&P1500. Finance Research Letters.
Mohammadi, A., Zhang, X., & Plataniotis, K. (2017). Interactive Gaussian-sum filtering for estimating systematic risk in financial econometrics. IEEE Global Conference on Signal and Information Processing (GlobalSIP), (pp. 903–907).
Pennington, J., Socher, R., & Manning, C. (2014). GloVe: Global vectors for word representation. Empirical Methods in Natural Language Processing (EMNLP).
Radojicic, D., & Kredatus, S. (2020). The impact of stock market price Fourier transform analysis on the gated recurrent unit classifier model. Expert Systems with Applications.
Rekabsaz, N., et al. (2017). Volatility prediction using financial disclosures sentiments with word embedding-based IR models. Association for Computational Linguistics, (pp. 1712–1721).
Rezaei, H., Faaljou, H., & Mansourfar, G. (2020). Stock price prediction using deep learning and frequency decomposition. Expert Systems with Applications.
Ronaghi, F., Salimibeni, M., Naderkhani, F., & Mohammadi, A. (2020). COVID19 PRIMO dataset. https://github.com/MSBeni/COVID19 PRIMO
[...] IEEE International Conference on Information Fusion (FUSION), (pp. 1–7).
Saleh Ahmar, A., & Boj del Val, E. (2020). SutteARIMA: Short-term forecasting method, a case: COVID-19 and stock market in Spain. Science of the Total Environment.
Schumaker, R., & Chen, H. (2009). Textual analysis of stock market prediction using breaking financial news: The AZFin text system. ACM Transactions on Information Systems.
Seo, S., Huang, J., Yang, H., & Liu, Y. (2017). Interpretable convolutional neural networks with dual local and global attention for review rating prediction. Proceedings of the RecSys, (pp. 297–305).
Seong, N., & Nam, K. (2021). Predicting stock movements based on financial news with segmentation. Expert Systems with Applications.
Tetlock, P. (2007). Giving content to investor sentiment: The role of media in the stock market. Journal of Finance, 1139–1168.
Xie, B., Passonneau, R., Wu, L., & Creamer, G. (2013). Semantic frames to predict stock price movement. Association for Computational Linguistics, (pp. 873–883).
Yun, H., Sim, G., & Seok, J. (2019). Stock prices prediction using the title of newspaper articles with Korean natural language processing. International Conference on Artificial Intelligence in Information and Communication (ICAIIC), (pp. 019–021).
Zhang, Y., Chu, G., & Shen, D. (2020). The role of investor attention in predicting stock prices: The long short-term memory networks perspective. Finance Research Letters.