Mining the Relationship Between COVID-19 Sentiment and Market Performance
Ziyuan Xia (a, *, 1), Jeffrey Chen (b, 2)
arXiv [econ.GN], Jan
ARTICLE INFO
Keywords: Coronavirus; COVID-19 and finance; Stock Market; Index Prediction; Financial market risk
ABSTRACT
At the beginning of the COVID-19 outbreak in March, we observed one of the largest stock market crashes in history. In the months that followed, the market made a volatile, bullish climb back to pre-pandemic levels and beyond. In this paper, we study stock market behavior during the first few months of the COVID-19 pandemic in relation to COVID-19 sentiment. Using text sentiment analysis of Twitter data, we examine tweets that contain keywords related to the COVID-19 pandemic, together with their sentiment, to understand whether sentiment can be used as an indicator of stock market performance. Previous research has applied natural language processing and text sentiment analysis to understand stock market performance; given how pervasive the impact of COVID-19 is on the economy, we extend these techniques to study the relationship between COVID-19 and stock market performance. Our findings show a strong relationship between COVID-19 sentiment derived from tweets and stock market performance, which could be used to predict stock market performance in the future.
1. Introduction
Understanding and predicting stock market behavior has always been a major goal for academia and industry, but an extremely difficult one. The efficient market hypothesis, an early theory of stock market prices developed by economist Eugene Fama, states that in an efficient market, current prices reflect all available and relevant information Fama (1965); Fama, Fisher, Jensen and Roll (1969). Any change in stock prices results from newly revealed information, independent of existing information. As such, market prices were believed to be unpredictable, because new information is itself unpredictable. Weaker forms of market efficiency are a consequence of incomplete public information, which causes a stock's price to misrepresent its true value. When information asymmetry is reduced, market anomalies and volatility diminish or disappear, and by increasing public access to information and transparency, strong market efficiency is achieved Fama (1991).
In theory, strong market efficiency is achieved when we have perfect information. However, information is not always easily accessible to the general public, and when it is, there is a delay between the actual event and its reporting. Although news can be unpredictable, certain indicators derived from online social media platforms and tools could act as a form of new, asymmetric information for predicting stock market performance. In this paper, we want to understand how public sentiment from social media platforms can be used to determine stock market performance.
More specifically, we examine how public sentiment about the severity of the COVID-19 pandemic, including how it is being handled, infection rates, and confidence in the economy and the government, affects consumer investment behavior and, as a result, stock market performance Stojkoski, Utkovski, Jolakoski, Tevdovski and Kocarev (2020).
In terms of timing, the reaction of the major global equity markets to COVID-19 in early 2020 can be divided into the following stages Chernozhukov, Kasahara and Schrimpf (2021).
• Early stage of the crisis: Infection was limited to Asia, and only the Hong Kong stock market fell significantly. On 11 January 2020, the first fatal case of the novel coronavirus pneumonia was reported in Wuhan, China, and human-to-human transmission of the virus was subsequently confirmed. Wuhan was locked down on January 23, and confirmed cases continued to emerge in Thailand, Japan, Korea, Hong Kong, and Singapore. The World Health Organization announced on January 31 that it had declared the outbreak an "international public health emergency", which was followed by the Diamond Princess incident in Japan and the declaration of community transmission in Korea, among other events Liu, Choo and Lee (2020). Although the global market was concerned about the resumption of work in China and the outbreak in Asia, confirmed cases were still concentrated in Asia, so global stock markets did not yet reflect the outbreak significantly and showed only small gains and losses Zhang, Hu and Ji (2020).
• In late February, a massive COVID-19 outbreak occurred in Europe, with the number of newly diagnosed patients rising; after the partial closure of the Lombardy region of Italy on March 4, European and global stock markets took a sharp turn for the worse. With news in March that the healthcare systems in Europe and the U.S. were on the verge of a tipping point, global stock markets fell amid violent shocks, and U.S. stocks triggered market-wide circuit breakers four times in March Huang and Zheng (2020). The Dow Jones collapsed […] points on March 12 (down […] on that day) and fell another […] points on March 17 (down […] on that day), the largest drop in history, triggering global panic ZEREN and HIZARCI (2020).
From January to mid-March 2020, European and U.S. stocks fell steeply, by roughly 24% to 28%, while Asia-Pacific markets such as Japan, Korea, and New Zealand fell by a relatively small amount (roughly 18% to 22%). In China, where the outbreak originated, the Shanghai and Hong Kong markets fell by only about 10% and 18%, respectively, owing to the significant decrease in new cases announced from March onward, which made them rare markets that resisted the global stock market meltdown. By April 2020, although infection rates had not improved Manski and Molinari (2020), the stock market was recovering steadily, indicating that news of growing coronavirus case counts was no longer hurting the stock market Topcu and Gulal (2020); Salisu and Vo (2020). As of June 8, 2020, the WHO had announced that the COVID-19 situation was still worsening, yet the stock market was rising for the fourth week in a row, with the S&P 500 returning to its pre-pandemic level Gunther Capelle-Blancard (2020).
[email protected] (Z. Xia); [email protected] (J. Chen)
ORCID(s): (Z. Xia)
Technology Management and Innovation, New York University
Decision, Operations & Information Technologies, University of Maryland Robert H. Smith School of Business
Ziyuan Xia et al.: Preprint submitted to Elsevier
The pandemic's impact on the economy Keane and Neal (2021), together with public opinion of and confidence in the governments and institutions responsible for handling the pandemic in areas such as supply chains, leadership, policy, and research, can strongly affect consumer behavior in investing and spending. A study of consumer panic in over 54 countries, using Google search data from the first four months of 2020, showed that consumer panic occurred at various levels of severity over those four months, earlier in some cases and later in others. This study showed that severe consumer panic can occur when a government imposes movement restrictions (Consumer panic in the COVID-19 Pandemic, Keane and Neal). Another study found that policies mandating face masks, stay-at-home orders, and restrictions on non-essential businesses effectively reduced the spread of COVID-19 (Causal Impact of Masks, Policies, Behavior on Early Covid-19 Pandemic in the US, Chernozhukov, Kasahara). Consumers' perception of economic recovery or of the government's competence can be shaped by the enactment of government policy and its impact on supply chains.
This new information could then be reflected in the value of a stock during specific economic and political periods. Other research has found that both positive and negative COVID-19 information have a heavy impact on market volatility Baek, Mohanty and Glambosky (2020). We therefore want to capture population sentiment with an indiscriminate sample of new information that could reflect the public perception of the COVID-19 situation. The method we propose is natural language processing for text sentiment analysis.
We found Twitter to be an ideal source of natural language public sentiment data, as it is a social media platform with over 300 million monthly active users. Early 2010s research on behavioral economics and societal mood states indicates that text sentiment extracted with neural network models can be correlated with the performance of the Dow Jones Industrial Average (DJIA) over time Bollen, Mao and Zeng (2011). Later, a short-window event study of a UK political event collected Twitter messages (tweets) filtered by labels pertaining to the event, and its statistical forecasts provided evidence of causation between public sentiment and closing prices with a slight time lag Nisar and Yeung (2018). We extend this line of research by using text sentiment analysis to assess whether public opinion, representative of consumers' confidence in the COVID-19 situation, can also be used to predict stock market indexes Baker, Bloom, Davis, Kost, Sammon and Viratyosin (2020).
To conduct our research, we used data extracted from the Twitter API (Twitter). The technological affordances of social media platforms like Twitter allow us to filter text data for keywords (listed in Table 1) which indicate that a message pertains to COVID-19. We then record the sentiment score, normalize the scores, and compare them with stock market index performance over time. For the stock market indexes, we used historical data for the S&P 500, DJIA, and NASDAQ retrieved using the Yahoo Finance API (Yahoo).
Table 1
Keywords used to crawl COVID-19 tweets
Epidemic: covid-19, corona virus, sars-cov-2, pandemic, mask, stay home, work from home, endemic, breathing, China, Wuhan, lock down, outbreak, positive, testing site, asymptomatic, epidemic, quarantine, vaccine, CDC, isolation, N95, KN95, transmission, community spread, flu shot
Panic-Buying: toilet paper, pasta, rice, flour, hoarding, fruit, vegetables, panic buying, supermarket
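The keyword groups in Table 1 can also be applied locally with a small case-insensitive matcher, for example to re-check which group a stored tweet belongs to. This is only an illustrative sketch: the paper does not describe how tweets were matched beyond listing the keywords, so the whole-word rule (which keeps "rice" from matching inside "price") and the helper names `compile_group` and `keyword_groups` are assumptions.

```python
import re

# Keyword groups transcribed from Table 1.
KEYWORDS = {
    "epidemic": [
        "covid-19", "corona virus", "sars-cov-2", "pandemic", "mask", "stay home",
        "work from home", "endemic", "breathing", "China", "Wuhan", "lock down",
        "outbreak", "positive", "testing site", "asymptomatic", "epidemic",
        "quarantine", "vaccine", "CDC", "isolation", "N95", "KN95",
        "transmission", "community spread", "flu shot",
    ],
    "panic_buying": [
        "toilet paper", "pasta", "rice", "flour", "hoarding", "fruit",
        "vegetables", "panic buying", "supermarket",
    ],
}


def compile_group(words):
    """Compile one case-insensitive alternation; longer phrases are tried first."""
    alternatives = sorted((re.escape(w) for w in words), key=len, reverse=True)
    # Assumed whole-word rule: no letter, digit, underscore, or hyphen may touch a match.
    return re.compile(r"(?<![\w-])(?:" + "|".join(alternatives) + r")(?![\w-])",
                      re.IGNORECASE)


PATTERNS = {name: compile_group(words) for name, words in KEYWORDS.items()}


def keyword_groups(text):
    """Return the sorted names of the keyword groups that `text` mentions."""
    return sorted(name for name, pattern in PATTERNS.items() if pattern.search(text))
```

Whole-word matching also means that plural forms such as "masks" do not match "mask"; whether that is desirable depends on how broad the crawl is meant to be.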
2. Data
In this section, we describe how the data used in our study were obtained. Our pipeline is a fast, efficient, and autonomous way to obtain real-time data and store it on a server. The Twitter data help us monitor the sentiment of Twitter users toward COVID-19, and the approach could be generalized to other topics after further experiments. For stock market data, we use Yahoo Finance's API to obtain data on the U.S. stock market.
We built a sentiment analysis tool that extracts COVID-19 related tweets every thirty minutes. A sentiment analysis model trained on Twitter data analyzes these stored tweets and outputs a sentiment score every thirty minutes. Sentiment scores in the ranges [−1, 0), 0, and (0, +1] denote negative, neutral, and positive sentiment, respectively.
We created a cluster on the Google Cloud Platform (GCP) Google and built a streaming sentiment analysis tool for Twitter text data with Flume, PySpark, and PyTorch. Twitter offers developer accounts, which we used to access Twitter's API, and we set up a Flume collector on the GCP cluster to collect streaming tweets. The keywords used to fetch tweets from the API are listed in Table 1.
The tool first reads the tweets streamed into the Hadoop Distributed File System (HDFS); it scans the HDFS path and converts all readable text files in that path to RDD format. Once the streaming data file is complete, the text must be cleaned, because Flume streams all text returned by the Twitter API. Reviewing the raw text files, we found that the texts were Unicode-encoded and contained many different languages. They also included emojis, the Apple emoticon package, href and HTTP links, and so on. Regular expressions are the most popular way to clean text data; a regular expression (regex or regexp) is a sequence of characters that defines a search pattern. We also built a Unicode range check to pick out Chinese, Korean, and Japanese text. Chinese and Japanese sentences, for example, do not place spaces between words, so the best way to split them would be to segment them into words with a natural language processing package. Instead, we simply split Chinese, Korean, and Japanese text into individual characters and keep other languages unchanged.
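The cleaning steps described above can be sketched with Python's standard `re` module. The specific patterns, the emoji and CJK code-point ranges (only a partial list of the relevant Unicode blocks), and the helper names are assumptions rather than the authors' code; `sentiment_label` simply encodes the score ranges [−1, 0), 0, and (0, +1] given above.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
TAG_RE = re.compile(r"<[^>]+>")  # leftover <a href=...> markup
# Common emoji/pictograph blocks plus the emoji variation selector (a partial list).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F]")
# CJK Unified Ideographs, Hiragana, Katakana, Hangul Syllables (a partial list).
CJK_RANGES = [(0x4E00, 0x9FFF), (0x3040, 0x309F), (0x30A0, 0x30FF), (0xAC00, 0xD7AF)]


def is_cjk(ch):
    """Unicode range check for a single character."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)


def clean_tweet(text):
    """Strip markup, links, and emojis; split CJK text into single characters."""
    text = TAG_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    text = EMOJI_RE.sub(" ", text)
    tokens, word = [], []
    for ch in text:
        if ch.isspace() or is_cjk(ch):
            if word:  # close the current non-CJK word
                tokens.append("".join(word))
                word = []
            if is_cjk(ch):
                tokens.append(ch)  # each CJK character becomes its own token
        else:
            word.append(ch)  # other scripts keep their original words
    if word:
        tokens.append("".join(word))
    return tokens


def sentiment_label(score):
    """Map a model score to the label ranges [-1, 0), 0, and (0, +1]."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must lie in [-1, 1]")
    if score < 0:
        return "negative"
    return "neutral" if score == 0 else "positive"
```

Character-level splitting is the cheap option the text describes; a word segmenter would produce more meaningful tokens for Chinese and Japanese at the cost of an extra dependency.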
After this initial cleaning, we stored the tweets on the server and ran our pre-trained PyTorch sentiment analysis model to generate a sentiment score for the tweets collected in each thirty-minute interval.
On the server, we also ran the Yahoo Finance API and recorded the daily opening and closing prices, as well as the trading volume, of the three major stock indices (NASDAQ Composite, DJIA, and S&P 500). The stock market is closed on weekends and holidays (although some trading still occurs), so we use Lagrange interpolation to fill in the missing data so that it corresponds to the daily sentiment scores.
Financial time series data (especially stock prices) are subject to constant fluctuations due to seasonality, noise, and automatic corrections Jiang, Zhao and Shao (2020). Traditional financial time series (FTS) forecasting methods use moving averages and differencing to reduce noise. However, financial time series are often unstable, and useful signals overlap with noise, which makes traditional denoising methods ineffective.
Wavelet analysis has achieved impressive results in image and signal processing. It compensates for the shortcomings of Fourier analysis and is therefore gradually being introduced into economics and finance. The wavelet transform has a unique advantage for traditional time series analysis problems because it can decompose and reconstruct financial time series across a wide range of time and frequency scales; it uses these multi-scale features to denoise the data, efficiently separating the useful signal from the noise. Qiu, Wang and Zhou (2020) used the coif3 wavelet function with three decomposition levels and evaluated the effect of the wavelet transform by the signal-to-noise ratio (SNR) and the root mean square error (RMSE): the higher the SNR and the smaller the RMSE, the better the denoising. The SNR is defined as
$$\mathrm{SNR} = 10 \log_{10} \left[ \frac{\sum_{j=1}^{N} x_j^2}{\sum_{j=1}^{N} \left( x_j - \hat{x}_j \right)^2} \right] \qquad (1)$$
where $x_j$ is the original signal and $\hat{x}_j$ is the denoised signal. By denoising the collected stock data in this way, we obtain the cleaned data used by our method.
In the sentiment analysis, we found that, with existing models trained on Twitter data, the average sentiment score was greater than 0 (positive) on 98.1% of the dates. This could be because the majority of Twitter users have a positive view of the COVID-19 situation, or because the models used in our analysis were trained on regular Twitter data, which makes them more likely to output positive than negative scores; even tweets about COVID-19 scored somewhat on the positive side. To account for this bias, we normalized the sentiment scores by subtracting their mean, $S_i' = S_i - \bar{S}$, and compared them with the three major stock indices; the results are shown in Fig. 2.
Figure 1: Sentiment score for COVID-19 tweets
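The price preprocessing described above (Lagrange gap filling for non-trading days, then wavelet denoising judged by SNR and RMSE) can be sketched in dependency-free Python. This is not the authors' code: interpolating from at most two known days on each side, substituting the simple orthonormal Haar wavelet for coif3, and using the Donoho-Johnstone universal soft threshold are all assumptions made to keep the example self-contained. With PyWavelets installed, `pywt.wavedec(x, 'coif3', level=3)` and `pywt.waverec` would replace the Haar helpers.

```python
import math
import statistics


def lagrange_eval(xs, ys, x):
    """Evaluate the Lagrange polynomial through the points (xs, ys) at x."""
    total = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        basis = 1.0
        for m, xm in enumerate(xs):
            if m != j:
                basis *= (x - xm) / (xj - xm)
        total += yj * basis
    return total


def fill_gaps(values, window=2):
    """Fill None entries (non-trading days) from up to `window` known days per side."""
    known = [(i, v) for i, v in enumerate(values) if v is not None]
    filled = list(values)
    for i, v in enumerate(values):
        if v is None:
            before = [p for p in known if p[0] < i][-window:]
            after = [p for p in known if p[0] > i][:window]
            xs, ys = zip(*(before + after))
            filled[i] = lagrange_eval(xs, ys, i)
    return filled


def haar_step(x):
    """One level of the orthonormal Haar transform (len(x) must be even)."""
    s = 1 / math.sqrt(2)
    approx = [(x[k] + x[k + 1]) * s for k in range(0, len(x), 2)]
    detail = [(x[k] - x[k + 1]) * s for k in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Undo one haar_step."""
    s = 1 / math.sqrt(2)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * s, (a - d) * s])
    return out


def wavelet_denoise(x, levels=3, threshold=None):
    """Soft-threshold the detail coefficients of a `levels`-deep Haar decomposition."""
    if len(x) % (2 ** levels):
        raise ValueError("length must be divisible by 2**levels")
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)  # details[0] is the finest scale
    if threshold is None:
        # Universal threshold with a median-based noise estimate (an assumption).
        sigma = statistics.median(abs(c) for c in details[0]) / 0.6745
        threshold = sigma * math.sqrt(2 * math.log(len(x)))
    for d in reversed(details):  # reconstruct from the coarsest level
        shrunk = [math.copysign(max(abs(c) - threshold, 0.0), c) for c in d]
        approx = haar_inverse(approx, shrunk)
    return approx


def snr_db(x, x_hat):
    """Eq. (1): 10 * log10(sum x_j^2 / sum (x_j - x_hat_j)^2)."""
    noise = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return 10 * math.log10(sum(a * a for a in x) / noise)


def rmse(x, x_hat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x))
```

Using only a few neighbors per side keeps each interpolating polynomial low-order; fitting one Lagrange polynomial through the entire series would oscillate badly between distant points (Runge's phenomenon).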
By comparing sentiment scores with stock market changes, we can perform a preliminary analysis of how COVID-19-related changes in Twitter sentiment affect stock prices.
The Dow Jones Industrial Average has long been a key indicator of overall U.S. stock market performance. As one of the oldest stock price indices in the world, with a history of over 100 years, it consists of only 30 large, well-known U.S. listed companies. However, with more than 10,000 stocks listed on the U.S. stock market, many experts and scholars doubt that the DJIA, with only 30 constituents, is an effective market index. Still, the 30 constituents are all significant U.S. corporations, each carrying substantial reference value for investors as an indicator of overall market performance. Our sentiment analysis also shows that the Dow Jones is the index most closely correlated with changes in the sentiment scores.
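A minimal sketch of this comparison follows: thirty-minute scores are averaged per day, mean-centered as $S_i - \bar{S}$, and correlated with daily index returns. Averaging per day, pairing each day's sentiment with that day's return, and using Pearson correlation are assumptions (the paper does not state how closeness of correlation was measured), and all helper names are hypothetical.

```python
import math
from collections import defaultdict


def daily_means(interval_scores):
    """Average (day, score) pairs, e.g. one score per 30-minute interval, by day."""
    buckets = defaultdict(list)
    for day, score in interval_scores:
        buckets[day].append(score)
    return {day: sum(v) / len(v) for day, v in sorted(buckets.items())}


def center(scores):
    """Normalization S_i - mean(S) applied before comparing with the indices."""
    mean = sum(scores) / len(scores)
    return [s - mean for s in scores]


def pct_change(prices):
    """Day-over-day fractional change of an index level."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)
```

Because mean-centering only shifts the series, it leaves the Pearson correlation unchanged; it matters for the visual comparison against the indices, not for the correlation value itself.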
Figure 2: Stock index
The NASDAQ Composite index was created in 1971. Its constituents include all shares listed on the NASDAQ in the United States, and it is a key indicator of technology stocks around the world. The index has thousands of constituents covering many areas of technology, such as computer hardware, software, semiconductors, and network communications, and it is the preferred reference for investing in technology stocks. However, changes in this technology-heavy index differ considerably from changes in the sentiment scores, possibly because technology stocks depend more on companies' own technological capabilities and product research and development, and are therefore less affected by public sentiment about the COVID-19 situation.
The S&P 500 Index is an overall measure of the top 500 publicly traded companies in the U.S. The rating company Standard & Poor's selects 500 leading companies across industries, based on market capitalization and liquidity, from the two major U.S. stock exchanges (the New York Stock Exchange and the Nasdaq Stock Exchange). Because it contains more companies than the Dow Jones Industrial Average, the S&P 500 better reflects changes in the stock market and is more diversified. In addition, the two indices use different weightings: the Dow is weighted by stock price, whereas the S&P 500 is weighted by market capitalization, which better reflects the actual value of a company's stock and can even reflect the rise and fall of the U.S. economy.
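To make the weighting difference concrete, here is a toy two-stock example. The prices, share counts, and divisor of 1.0 are hypothetical; real index divisors are maintained by the index providers and are not modeled here, and because the divisor cancels in a percentage move, it does not affect the comparison.

```python
def price_weighted(prices, divisor):
    """Dow-style level: sum of share prices divided by a divisor."""
    return sum(prices) / divisor


def cap_weighted(prices, shares, divisor):
    """S&P-style level: total market capitalization divided by a divisor."""
    return sum(p * s for p, s in zip(prices, shares)) / divisor


def pct(before, after):
    return (after - before) / before


shares = [1_000_000, 100_000_000]  # hypothetical: small firm A, large firm B
day0 = [400.0, 50.0]               # A has the much higher share price
day1 = [440.0, 50.0]               # A gains 10%, B is flat

pw_move = pct(price_weighted(day0, 1.0), price_weighted(day1, 1.0))
cw_move = pct(cap_weighted(day0, shares, 1.0), cap_weighted(day1, shares, 1.0))
```

Here the price-weighted index rises by 40/450, about 8.9%, while the capitalization-weighted index rises by less than 1%, because firm A's high share price gives it outsized influence only under price weighting.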
3. Method
Predicting stock prices is nothing new. In econometrics, many different methods have been applied to stock price prediction. One of the best-known approaches is financial time series (FTS) modeling Tsay (2005). FTS modeling has a long history, having first revolutionized algorithmic trading in the early 1970s. FTS analysis consists of two types: fundamental and technical. However, both have been challenged by the efficient market hypothesis (EMH) Malkiel (2003), a controversial hypothesis that has been around since 1970 and holds that stock prices are ultimately unpredictable. This has not stopped researchers from modeling FTS with linear, nonlinear, and machine learning (ML) based models. Because financial time series are non-stationary, nonlinear, and noisy, traditional statistical models struggle to predict them accurately. In recent years, a growing number of studies have applied deep learning to stock market forecasting, although the results are still far from perfect.
Lin, Guo and Hu (2013) propose a support vector machine (SVM) based stock prediction method with a two-part feature selection and prediction model, and demonstrate that it generalizes better than traditional methods. Wanjawa and Muchemi (2014) propose a feedforward multilayer perceptron trained with error backpropagation for predicting stock prices; their results show that the model is capable of predicting a typical stock market.
Research entered the LSTM era in 2017, with a proliferation of studies using LSTM networks to process time series data. LSTM was proposed by Hochreiter and Schmidhuber (1997) and later refined and popularized by Alex Graves. Zhao, Rao, Tu and Shi (2017) propose adding a time-weighted function to LSTM, and their results outperform other models. Qiu et al. (2020) combine LSTM with an attention mechanism to design an attention-based LSTM, then compare it with the plain LSTM, an LSTM with wavelet denoising, and a gated recurrent unit (GRU) neural network to show the advantages of incorporating attention. Around the same time, a new neural network architecture, the Deep Wide Area Neural Network (DWNN), was proposed; Kim and Won (2018) proposed integrating CNN and DWNN models into a single model, which can reduce the mean squared error of forecasts by 30% compared with conventional RNN models. Hybrid neural network models have also been proposed for quantitative stock selection strategies, which determine stock market trends and then predict stock prices using LSTM, and for quantitative timing strategies that aim to increase profits. Jiang, Tang, Chen, Wang and Huang (2018) build models with LSTM and RNN and find that LSTM is better suited to stock prediction. Jin, Yang and Liu (2019) add investors' sentiment propensity to the model and combine empirical mode decomposition (EMD) with LSTM to obtain more accurate stock predictions. LSTM models with attention mechanisms are common in speech and image recognition but are rarely used in finance.
Deep learning methods cannot simply be applied directly to the stock market. The ability of algorithms such as LSTM to handle sequential data has long been demonstrated in research, yet their predictive power for stock analysis remains far from adequate, because the stock market is far more complex than a generic sequence-modeling problem.
There have been thousands of projects trying to use LSTM or other time series analysis methods to predict stocks, but few published algorithms can be applied in practice in the market. Deep learning is essentially an inductive method based on fitting historical data. If deep learning alone is used to make stock predictions, long-term returns are likely to be negative, because the market changes, its laws change, and history may repeat itself but never in quite the same way.
This does not mean that applying deep learning to the stock market is meaningless. For example, sentiment analysis can be applied to news, social media, and other sources to analyze the overall market sentiment toward a particular stock or sector Haroon and Rizvi (2020). The stock market is essentially a game among its participants, and observing that game through sentiment analysis is a better entry point for introducing existing analytic algorithms into the financial market. Stock prediction is never as simple as feeding time series data into a deep learning model and making money. A wide variety of data, such as stock buy points, trading volume, and historical prices, serve different purposes. Instead of trying to uncover complex mathematical models of the financial market, we change the entry point and analyze the relationship between stock price changes and emotional fluctuations from the perspective of the ordinary stockholders who buy stocks Schumaker and Chen (2009).
Table 2
Performance of the main forecasting models on the DJIA data set
Model | Accuracy (normal) | F1-score (normal) | Accuracy (COVID-19) | F1-score (COVID-19) | Accuracy difference
ARIMA | 48.16% | 0.345 | 43.40% | 0.298 | −4.76%
TCN | 56.48% | 0.514 | 48.69% | 0.441 | −7.79%
CNN | 53.28% | 0.522 | 49.13% | 0.473 | −4.15%
LSTM | 55.71% | 0.508 | 51.21% | 0.451 | −4.50%
In 2020, one of the most important events in the world was COVID-19, a global epidemic that affected all industries and caused extreme volatility in stock market prices. Given COVID-19's impact on work and life, we believe that people's sentiment toward COVID-19 is a promising starting point for analyzing stock prices, because it reveals how the public's sentiment about the pandemic on social media relates to their investments and to financial-sector information during an unprecedented pandemic. After the COVID-19 outbreak, many researchers turned their attention to social media and tried to uncover useful information related to COVID-19. Twitter is now considered one of the reliable indicators for analyzing the spread of epidemics, and the data generated by users' activities on social media are becoming an important basis for tracking and analyzing epidemic outbreaks Polyzos, Samitas and Spyridou (2020). Thus, we use both the stock index time series and the sentiment scores of COVID-19 tweets as input data for the LSTM model.
In this paper, we use a traditional LSTM neural network for financial time series Chen, Chen, Huang, Huang and Chen (2016), whose information propagation can be written as
$$f_t = \sigma\left( W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf} \right) \qquad (2)$$
$$i_t = \sigma\left( W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi} \right) \qquad (3)$$
$$o_t = \sigma\left( W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho} \right) \qquad (4)$$
$$g_t = \tanh\left( W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg} \right) \qquad (5)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot g_t \qquad (6)$$
$$h_t = o_t \odot \tanh\left( c_t \right) \qquad (7)$$
where $f$, $i$, and $o$ are the forget, input, and output gate coefficients, respectively; $g$, $c$, and $h$ are the candidate state, cell state, and hidden state, respectively; $\sigma$ is the sigmoid function; and $\odot$ denotes element-wise multiplication. The gate coefficients all pass through a sigmoid, which limits them to the range (0, 1), and the candidate state depends on the current input and the hidden state of the previous time step. The cell state acts as a memory cell: when it is updated, part of the previous memory is forgotten and part of the new information is accepted, and the hidden state is obtained from the current cell state through the output gate $o_t$.
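The recurrence in Eqs. (2)-(7) can be traced with a dependency-free single-cell sketch. The gate layout matches the formulation documented for PyTorch's `torch.nn.LSTM`, which a real model would use instead; the nested-list parameter layout and the helper names (`zero_params`, `lstm_step`, `lstm_forward`) are hypothetical, and no training is shown.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gate(params, name, x, h_prev, act):
    """act(W_i* x_t + b_i* + W_h* h_{t-1} + b_h*), the common form of Eqs. (2)-(5)."""
    W_in, b_in, W_hid, b_hid = params[name]
    pre = [a + b + c + d for a, b, c, d in
           zip(matvec(W_in, x), b_in, matvec(W_hid, h_prev), b_hid)]
    return [act(z) for z in pre]


def lstm_step(params, x, h_prev, c_prev):
    f = gate(params, "f", x, h_prev, sigmoid)    # forget gate, Eq. (2)
    i = gate(params, "i", x, h_prev, sigmoid)    # input gate, Eq. (3)
    o = gate(params, "o", x, h_prev, sigmoid)    # output gate, Eq. (4)
    g = gate(params, "g", x, h_prev, math.tanh)  # candidate state, Eq. (5)
    c = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, g)]  # Eq. (6)
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]                    # Eq. (7)
    return h, c


def lstm_forward(params, sequence, hidden_size):
    """Run the cell over a sequence of input vectors; return the final (h, c)."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in sequence:
        h, c = lstm_step(params, x, h, c)
    return h, c


def zero_params(input_size, hidden_size):
    """All-zero weights and biases for gates f, i, o, g (useful for checking)."""
    return {name: ([[0.0] * input_size for _ in range(hidden_size)], [0.0] * hidden_size,
                   [[0.0] * hidden_size for _ in range(hidden_size)], [0.0] * hidden_size)
            for name in "fiog"}
```

With all-zero parameters every sigmoid gate equals 0.5 and the candidate state is 0, so one step simply halves the previous cell state, which is an easy way to check the wiring against Eq. (6). In the S-LSTM setting, each input vector $x_t$ would hold one day's features, such as a closing price and a sentiment score.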
4. Experiment
We use pre-trained models to predict a benchmark for stock price changes during COVID-19. We use part of the data from September-October 2020, before the U.S. election, as a test set to avoid the dramatic impact of other major events on stock prices. Table 2 shows the performance of the major stock market forecasting models on stock price forecasts for the COVID-19 period. As expected, existing deep neural network stock market prediction models are unable to predict stock market prices effectively when major events occur, because they are trained on data from past decades, fit the patterns of those decades, and cannot cope with sudden major events or changes. Under normal conditions, some existing prediction models, such as TCN, CNN, and LSTM, obtain an accuracy greater than 50% and an F1-score higher than 0.5. However, when the trained models are tested on data from the pandemic, only the LSTM obtains slightly higher than 50% accuracy, and none of the tested models obtains an F1-score higher than 0.5 on the pandemic test data. In addition, every tested model shows an accuracy drop of more than 4% when predicting the test data from the COVID-19 period.
Table 3
Input data for different models
Model | ARIMA | CNN | TCN | WB-TCN | LSTM | S-LSTM
Raw data | R | R | R | N | R | S + R
Processed training data | R | R | R | word embedding | R | sentiment score + R
Figure 3: Prediction results of the tested methods
Table 4
Prediction accuracy of different models
Model | TCN | WB-TCN | LSTM | S-LSTM
Accuracy | 75.26% | 83.11% | 80.31% | 92.04%
The algorithm we use to uncover the relationship between the stock market and social media sentiment is Sentiment (S)-LSTM, which, as described in Section 3, extracts time-series patterns from past stock market data and combines them with the social media sentiment scores produced by our sentiment analysis model to learn a stock market index prediction model. The input data types for the tested models are shown in Table 3, where 1) time series stock price data R: a stock price dataset consisting of daily records of the Dow Jones Industrial Average; 2) text news data N: a news dataset consisting of historical news from the Reddit WorldNews channel; 3) Twitter sentiment analysis data S: sentiment scores generated from relevant Twitter data.
We tested the models using the DJIA closing price for each day of September 2020 as the test dataset. The inputs are the DJIA closing prices for the previous three days, the Twitter sentiment scores for the previous three days, and the COVID-19 related news text data for the previous three days. We chose September 2020 because the U.S. stock market was relatively stable then and there were no high-impact events in the United States or internationally. By contrast, the U.S. stock market was more volatile in November due to the election, which is not conducive to testing the models' performance with respect to COVID-19. The test results are shown in Figure 3.
We compute the accuracy of each model from the relative prediction error $\left| \frac{P_{predict} - P_{truth}}{P_{truth}} \right|$ and report it in Table 4. In this test, S-LSTM is clearly the most accurate prediction algorithm, with a significant improvement over the traditional LSTM. Meanwhile, WB-TCN, which adds news text input, also shows advantages over the traditional TCN and LSTM.
This suggests that models combining more inputs have clear advantages in dealing with unexpected events.
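The S-LSTM input construction (three previous days of closing price and sentiment for each target day) and the evaluation metric can be sketched as follows. The paper gives only the relative error $|P_{predict} - P_{truth}| / P_{truth}$; reading "accuracy" as one minus the mean relative error is our assumption, as are the helper names.

```python
def make_windows(closes, sentiments, lookback=3):
    """Pair each day's close (target) with the previous `lookback` days of
    (close, sentiment) features."""
    if len(closes) != len(sentiments):
        raise ValueError("series must be aligned by day")
    features, targets = [], []
    for t in range(lookback, len(closes)):
        features.append(list(zip(closes[t - lookback:t], sentiments[t - lookback:t])))
        targets.append(closes[t])
    return features, targets


def relative_errors(predicted, truth):
    """|P_predict - P_truth| / P_truth for each day."""
    return [abs(p - q) / q for p, q in zip(predicted, truth)]


def accuracy(predicted, truth):
    """Assumed reading of the metric: 1 minus the mean relative error."""
    errs = relative_errors(predicted, truth)
    return 1.0 - sum(errs) / len(errs)
```

As a sanity check unrelated to the paper's models, `accuracy([w[-1][0] for w in features], targets)` scores a naive persistence forecast that predicts each day's close as the previous day's close; any learned model should beat that baseline.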
5. Discussion
As Table 2 shows, an LSTM neural network alone may not be enough to predict stock market prices. As our experiment in Section 4 shows, it is by combining the LSTM with text sentiment over a short period of time that we can make full use of the model. Our results imply that applying sentiment to neural networks, most notably in the S-LSTM model, yields better results and can predict stock closing prices with a high degree of accuracy.
From a broad perspective, quantitative analysis can be divided into three levels: macro, meso, and micro. Macro analysis studies macroeconomic data to judge future economic trends; meso analysis is based on industry data, industry trends, sector rotation, and so on; and micro analysis is based on companies' fundamental data and stock selection. Economic trends are largely reflected in the stock market: when the economy is in good shape, the stock market is more likely to be in a bull market, and when economic fundamentals are weak and downward pressure is high, the stock market is lukewarm. For example, in 2018, amid an overall economic downturn, the stock market fell substantially, and stock market trends and macroeconomic trends showed a strong correlation. Macroeconomic forecasts are therefore the most important reference when making investment decisions, and industry- and firm-level analysis builds on them.
Existing models generally use stock market data from past decades for benchmarking, to measure the general market trend, and to forecast future trends based on macro data analysis. However, these strategies account for relatively few factors, are unlikely to explain the complexity of the stock market, and need to be used in conjunction with other strategies. The performance of different models on the COVID-19 stock market test set shows that historical price sequences alone are clearly insufficient for complex stock market prediction.
Even published stock market forecasting models fail to predict market fluctuations well when large unexpected events occur. In such cases, a prediction model that incorporates more information has an advantage: whether the extra input is news or social media content, it gives the model more macro-level information and allows more reliable forecasts.

In terms of specific model inputs, the emotional responses of Twitter users to COVID-19 can significantly improve the prediction accuracy of the LSTM model, and this improvement is greater than the one obtained by using news text as input to the TCN.
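To make the idea of feeding sentiment into an LSTM concrete, the sketch below shows one plausible way to build training windows in which every day carries both the closing price and a sentiment score. It is a minimal illustration under stated assumptions, not the exact preprocessing behind our S-LSTM: the per-day feature layout `[scaled close, sentiment]`, the min-max scaling, the window length, and the helper names `min_max_scale` and `make_windows` are all hypothetical.

```python
from typing import List, Tuple


def min_max_scale(values: List[float]) -> List[float]:
    """Rescale a series to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def make_windows(
    close: List[float], sentiment: List[float], window: int
) -> Tuple[List[List[List[float]]], List[float]]:
    """Build (X, y) training pairs for a sentiment-augmented LSTM.

    Hypothetical layout: each sample X[i] covers `window` consecutive days,
    each day being the feature vector [scaled close, sentiment score]; the
    target y[i] is the scaled close on the day after the window.
    """
    if len(close) != len(sentiment):
        raise ValueError("price and sentiment series must be aligned by day")
    if window < 1:
        raise ValueError("window must be at least one day")
    # For brevity the scaler sees the whole series; in a real train/test
    # split it should be fitted on the training period only.
    scaled = min_max_scale(close)
    X, y = [], []
    for start in range(len(close) - window):
        end = start + window
        X.append([[scaled[t], sentiment[t]] for t in range(start, end)])
        y.append(scaled[end])
    return X, y


if __name__ == "__main__":
    X, y = make_windows([100.0, 102.0, 101.0, 105.0, 104.0],
                        [-0.2, 0.1, 0.0, 0.4, 0.3], window=3)
    print(len(X), y)  # prints: 2 [1.0, 0.8]
```

The resulting `X` has the (samples, timesteps, features) shape that LSTM layers typically expect, with two features per timestep; dropping the sentiment column recovers a price-only LSTM input, which is the kind of baseline the sentiment-augmented model is compared against.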
6. Conclusion
We set out to determine whether text sentiment from social media, treated as a form of news, can advance research on predicting stock market prices. Using social media as a news source, we apply text sentiment to extend econometric work on understanding and predicting stock market prices. We achieve the highest model accuracy with an S-LSTM model. We highlight the data processing required for text sentiment analysis, as well as for the financial time series of market index performance over time. Fitting each index separately, we measured the performance of ARIMA, TCN, CNN and LSTM models on the DJIA data set. Lastly, we ran an experiment on the September to October 2020 period using pre-trained models and found that the S-LSTM model gave the best fit. We found varying degrees of correlation between COVID-19 sentiment and prices. In conclusion, further research on applying text sentiment to predict stock market performance is necessary: the more insight text sentiment data can give us, the more news and information we can incorporate to approach full information and reflect the true value of a stock price.
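One simple way to quantify a range of correlation between COVID-19 sentiment and prices is a Pearson coefficient computed at several lags, so that the sentiment score on day t is compared with the closing price on day t + lag. The sketch below is an illustrative assumption rather than the procedure behind our reported results; the helpers `pearson`, `lagged_correlation` and `correlation_by_lag` are hypothetical, and the inputs are assumed to be daily series already aligned on trading days.

```python
import math
from typing import Dict, List

# Hypothetical helpers for illustration; not the code used in our experiments.


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two aligned series with at least two points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        # An exactly constant series has zero variance.
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sd_x * sd_y)


def lagged_correlation(sentiment: List[float], close: List[float], lag: int) -> float:
    """Correlate sentiment on day t with the closing price on day t + lag."""
    if len(sentiment) != len(close):
        raise ValueError("series must be aligned by day")
    if lag < 0:
        raise ValueError("lag must be non-negative")
    # Drop the last `lag` sentiment values and the first `lag` prices.
    return pearson(sentiment[: len(sentiment) - lag], close[lag:])


def correlation_by_lag(
    sentiment: List[float], close: List[float], max_lag: int
) -> Dict[int, float]:
    """Map each lag in 0..max_lag to its sentiment-price correlation."""
    return {lag: lagged_correlation(sentiment, close, lag) for lag in range(max_lag + 1)}
```

Because price levels trend over time, correlating sentiment with daily returns (for example `close[t] / close[t - 1] - 1`) instead of price levels is a common way to reduce spurious correlation; the same helpers apply once the sentiment series is trimmed to the length of the returns series.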
CRediT authorship contribution statement
Ziyuan Xia:
Conceptualization of this study, Methodology, Modeling tool, Economic analysis, and Writing.
Jeffrey Chen:
Data curation, Big data, Cloud computing, and Technical analysis.
References
Baek, S., Mohanty, S.K., Glambosky, M., 2020. Covid-19 and stock market volatility: An industry level analysis. Finance Research Letters 37, 101748.
Baker, S.R., Bloom, N., Davis, S.J., Kost, K., Sammon, M., Viratyosin, T., 2020. The unprecedented stock market reaction to covid-19. The Review of Asset Pricing Studies 10, 742–758.
Bollen, J., Mao, H., Zeng, X., 2011. Twitter mood predicts the stock market. Journal of Computational Science 2, 1–8.
Chen, J.F., Chen, W.L., Huang, C.P., Huang, S.H., Chen, A.P., 2016. Financial time-series data analysis using deep convolutional neural networks, in: 2016 7th International Conference on Cloud Computing and Big Data (CCBD), IEEE. pp. 87–92.
Chernozhukov, V., Kasahara, H., Schrimpf, P., 2021. Causal impact of masks, policies, behavior on early covid-19 pandemic in the U.S. Journal of Econometrics 220, 23–62.
Fama, E.F., 1965. The behavior of stock market prices. Journal of Business 38.
Fama, E.F., 1991. Efficient capital markets: II. The Journal of Finance 46, 1575–1617.
Fama, E.F., Fisher, L., Jensen, M.C., Roll, R., 1969. The adjustment of stock prices to new information. International Economic Review 10, 1–21.
Google. Financial services | Google Cloud. https://cloud.google.com/solutions/financial-services (Accessed on 12/27/2020).
Gunther Capelle-Blancard, A.D., 2020. The stock market and the economy: Insights from the covid-19 crisis | VoxEU, CEPR Policy Portal. https://voxeu.org/article/stock-market-and-economy-insights-covid-19-crisis (Accessed on 12/27/2020).
Haroon, O., Rizvi, S.A.R., 2020. Covid-19: Media coverage and financial markets behavior: A sectoral inquiry. Journal of Behavioral and Experimental Finance, 100343.
Hochreiter, S., Schmidhuber, J., 1997. Long short-term memory. Neural Computation 9, 1735–1780.
Huang, W., Zheng, Y., 2020. Covid-19: Structural changes in the relationship between investor sentiment and crude oil futures price. Energy Research Letters 1, 13685.
Jiang, F., Zhao, Z., Shao, X., 2020. Time series analysis of covid-19 infection curve: A change-point perspective. Journal of Econometrics.
Jiang, Q., Tang, C., Chen, C., Wang, X., Huang, Q., 2018. Stock price forecast based on LSTM neural network, in: International Conference on Management Science and Engineering Management, Springer. pp. 393–408.
Jin, Z., Yang, Y., Liu, Y., 2019. Stock closing price prediction based on sentiment analysis and LSTM. Neural Computing and Applications, 1–17.
Keane, M., Neal, T., 2021. Consumer panic in the covid-19 pandemic. Journal of Econometrics 220, 86–105.
Kim, H.Y., Won, C.H., 2018. Forecasting the volatility of stock price index: A hybrid model integrating LSTM with multiple GARCH-type models. Expert Systems with Applications 103, 25–37.
Lin, Y., Guo, H., Hu, J., 2013. An SVM-based approach for stock market trend prediction, in: The 2013 International Joint Conference on Neural Networks (IJCNN), IEEE. pp. 1–7.
Liu, M., Choo, W.C., Lee, C.C., 2020. The response of the stock market to the announcement of global pandemic. Emerging Markets Finance and Trade 56, 3562–3577.
Malkiel, B.G., 2003. The efficient market hypothesis and its critics. Journal of Economic Perspectives 17, 59–82.
Manski, C.F., Molinari, F., 2020. Estimating the covid-19 infection rate: Anatomy of an inference problem. Journal of Econometrics.
Nisar, T.M., Yeung, M., 2018. Twitter as a tool for forecasting stock market movements: A short-window event study. The Journal of Finance and Data Science 4, 101–119.
Polyzos, S., Samitas, A., Spyridou, A.E., 2020. Tourism demand and the covid-19 pandemic: An LSTM approach. Tourism Recreation Research, 1–13.
Qiu, J., Wang, B., Zhou, C., 2020. Forecasting stock prices with long-short term memory neural network based on attention mechanism. PLoS ONE 15, e0227222.
Salisu, A.A., Vo, X.V., 2020. Predicting stock returns in the presence of covid-19 pandemic: The role of health news. International Review of Financial Analysis 71, 101546.
Schumaker, R.P., Chen, H., 2009. Textual analysis of stock market prediction using breaking financial news: The AZFin text system. ACM Transactions on Information Systems (TOIS) 27, 1–19.
Stojkoski, V., Utkovski, Z., Jolakoski, P., Tevdovski, D., Kocarev, L., 2020. The socio-economic determinants of the coronavirus disease (covid-19) pandemic. arXiv preprint arXiv:2004.07947.
Topcu, M., Gulal, O.S., 2020. The impact of covid-19 on emerging stock markets. Finance Research Letters 36, 101691.
Tsay, R.S., 2005. Analysis of financial time series. volume 543. John Wiley & Sons.
Twitter. Twitter API documentation | Docs | Twitter Developer. https://developer.twitter.com/en/docs/twitter-api (Accessed on 12/27/2020).
Wanjawa, B.W., Muchemi, L., 2014. ANN model to predict stock prices at stock exchange markets. arXiv preprint arXiv:1502.06434.
Yahoo. Yahoo Finance API documentation (apidojo) | Rakuten RapidAPI. https://english.api.rakuten.net/apidojo/api/yahoo-finance1 (Accessed on 12/27/2020).
Zeren, F., Hizarci, A., 2020. The impact of covid-19 coronavirus on stock markets: Evidence from selected countries. Muhasebe ve Finans İncelemeleri Dergisi 3, 78–84.
Zhang, D., Hu, M., Ji, Q., 2020. Financial markets under the global pandemic of covid-19. Finance Research Letters, 101528.
Zhao, Z., Rao, R., Tu, S., Shi, J., 2017. Time-weighted LSTM model with redefined labeling for stock trend prediction, in: 2017 IEEE 29th International Conference on Tools with Artificial Intelligence (ICTAI), IEEE. pp. 1210–1217.

Ziyuan Xia is a second-year student in the Management of Technology Master of Science program at the New York University Tandon School of Engineering. He received his bachelor's degree in Economics with a minor in Textile & Clothing from the University of California, Davis. His research interests lie in applying data analytics techniques within the fields of management, finance, and economics.

Jeffrey Chen is a second-year student in the Master of Information Systems program at the University of Maryland Robert H. Smith School of Business. His research interests are in deep learning, AI, and operations management.