Topic, Sentiment and Impact Analysis: COVID19 Information Seeking on Social Media
Md Abul Bashar, Richi Nayak, Thirunavukarasu Balasubramaniam
Queensland University of Technology, Brisbane, Queensland, Australia
{m1.bashar,r.nayak,t.balasubramaniam}@qut.edu.au
ABSTRACT
When people notice something unusual, they discuss it on social media. They leave traces of their emotions via text expressions. A systematic collection, analysis, and interpretation of social media data across time and space can give insights into local outbreaks, mental health, and social issues. Such timely insights can help in developing strategies and resources for an appropriate and efficient response. This study analysed a large spatio-temporal tweet dataset of the Australian sphere related to COVID19. The methodology included a volume analysis, dynamic topic modelling, sentiment detection, and semantic brand score to obtain insights into the COVID19 pandemic outbreak and public discussion in different states and cities of Australia over time. The obtained insights are compared with independently observed phenomena such as government-reported instances.
CCS CONCEPTS
• Applied computing → Sociology.

KEYWORDS
COVID19, Sentiment Analysis, Topic Analysis, Impact Analysis,CNN, ULMFit, Dynamic Topic Modelling, SBS
ACM Reference Format:
Md Abul Bashar, Richi Nayak, Thirunavukarasu Balasubramaniam. 2018.Topic, Sentiment and Impact Analysis: COVID19 Information Seeking onSocial Media. In
Woodstock ’18: ACM Symposium on Neural Gaze Detection,June 03–05, 2018, Woodstock, NY.
ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/1122445.1122456
INTRODUCTION
An outbreak of an infectious disease such as COVID19 has a devastating impact on society with severe socio-economic consequences. The COVID19 pandemic has already caused the largest global recession in history; global stock markets have crashed, travel and trade industries are losing billions, schools are closed, and health care systems are overwhelmed. Mental health and social issues creep up as people fear catching the disease or losing loved ones, as they lose jobs, or as they are required to stay in isolation.
Woodstock ’18, June 03–05, 2018, Woodstock, NY
© 2018 Association for Computing Machinery.
ACM ISBN 978-1-4503-XXXX-X/18/06...$15.00
https://doi.org/10.1145/1122445.1122456
An insight into an outbreak is essential for controlling infectious diseases and identifying subsequent mental and social issues [2]. This will help in reducing costs to the economy over the long term and bringing harmony to society. In particular, early detection helps in placing strategies and resources for an appropriate and efficient response. On social media, people discuss things that they observe in the community [2]. They leave traces of their emotions via text expressions [9]. A systematic collection, analysis, and interpretation of social media data can give insight into an outbreak. Twitter is one of the most popular micro-blogging social media websites where users express their thoughts and opinions on real-world events [7]. Social scientists have used tweet datasets for various purposes such as investigating public opinion of Hurricane Irene [15] and election result prediction [20].

Recently, spatio-temporal texts collected from Sina-Weibo (a Twitter-like microblogging system in China) were analysed to understand public opinions on COVID19 related topics [10]. A static topic modelling technique (LDA) and a Random Forest classifier were used to group tweets into topics for analysis. Studies have also been published that analyse climate change-related tweets to understand the topics of discussion and how tweet volume and sentiment changed over time [1, 3, 7]. Authors in [13] applied topic modelling to a corpus of geotagged tweets collected from the London sphere. Topic modelling has also been used to estimate the similarity between users in a location-based social network [14] and to estimate the relatedness of businesses based on business descriptions [19].

In this paper, we analyse a large spatio-temporal tweet dataset of the Australian Twitter sphere containing certain keywords relating to COVID19.
The methodology included a volume analysis, Dynamic Topic Modelling [5], Sentiment Detection [16], and Semantic Brand Score (SBS) [8] to obtain an insight into the COVID19 outbreak in different states and cities of Australia over time. The obtained insights are compared with independently observed phenomena such as government reported instances and news in newspapers. To our knowledge, ours is the first in-depth study of Australian people’s perception of the ongoing COVID19 pandemic using a large Twitter data collection. More specifically, this study makes the following main contributions. (a) Investigates how closely the insights into local outbreaks match independently observed phenomena in space and time. (b) Understands what topics related to COVID19 have been discussed in communities and how they change over time. (c) Understands the COVID19 related sentiments in communities over time. (d) Investigates the impact of COVID19 related concepts or words on social media discussion. (e) Proposes a simple but effective CNN architecture for sentiment analysis. (By the Australian sphere, we mean that the location of the author or tweet, or a location mentioned in the tweet, is Australia or any of its cities.)

Figure 1: The Experimental Workflow
The aim of this study is to use social media analysis to uncover what is happening in communities and to give insight into (a) how the virus and lockdown are affecting community emotions, (b) the main topics or themes emerging and evolving in the conversation, and (c) the impact of different COVID19 related concepts.

In this study, we conduct a spatio-temporal analysis of volume, sentiment, topic, and impact on a large volume of COVID19 related tweets from the Australian sphere. First, we collect a dataset of tweets from the Australian sphere containing geospatial and temporal values. The dataset is then preprocessed and prepared for volume analysis, sentiment analysis, dynamic topic modelling, and impact analysis. The volume analysis aims to identify basic geospatial and temporal facts from the dataset, which put subsequent analyses such as sentiment and topic into context. Dynamic topic modelling extracts topics present in the dataset and shows how those topics evolve over time. Sentiment analysis determines the sentiment of every tweet to show how community sentiments change over time. Impact analysis generates networks of concepts or words from the text collection and uses those networks to measure how differently the concepts or words impact a discussion. The analytical findings are then discussed and evaluated along with a comparison with independent observations.
As we practice social distancing, our embrace of social media only grows. The major social media platforms have emerged as critical information purveyors during the expanding pandemic. Twitter’s number of active users in the first three months of 2020 increased by 23% compared to the end of 2019, about 12 million more users (https://bit.ly/2Z6RUvU).

We collected Twitter conversation in the Australian sphere on coronavirus since November 27th, when the first outbreak occurred in China. The data collection was done via the QUT Digital Observatory facility using the Twitter Stream Application Programming Interface (API). The dataset consists of 2.9 million tweets from 27 November 2019 to 7 April 2020. Every tweet in the dataset contains, or uses as a hashtag, at least one of the following keywords: coronavirus, covid19, covid-19, covid_19, coronovirusoutbreak, covid2019, covid, and coronaoutbreak.

The body of each tweet, i.e. the tweet message, is used for analysing sentiment, topics and impact. Location and time information give each tweet spatio-temporal dimensions. The location information for each tweet comes from one of three sources based on availability: (a) tweet location, i.e. where the user was when the tweet was posted; (b) user location, i.e. the residence of the user; or (c) a location mentioned in the tweet message. The locations are mapped to capital cities, states, or the country Australia depending on how granular the extracted location information is. The time information of each tweet comes from the time and day the tweet was posted. Table 1 shows a few examples.

For preprocessing, we removed stopwords, punctuation, and invalid characters. We dropped any non-English tweets. We fixed repeating characters, converted text to lowercase, replaced any occurrence of a link or URL with a token named xurl, and stemmed the text.
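The preprocessing steps above can be sketched as follows. This is a minimal illustration, not the paper's released code: the stopword list is a tiny illustrative subset, and the suffix stripper is a crude stand-in for a real stemmer such as NLTK's PorterStemmer.

```python
import re

STOPWORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in", "on"}  # illustrative subset

def preprocess(tweet: str) -> str:
    """Apply the cleaning steps described in the text, in order."""
    text = tweet.lower()
    # Replace any occurrence of a link or URL with the token 'xurl'
    text = re.sub(r"https?://\S+|www\.\S+", "xurl", text)
    # Fix repeating characters (e.g. 'sooooo' -> 'soo')
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)
    # Remove punctuation and invalid characters
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = [t for t in text.split() if t not in STOPWORDS]
    # Crude suffix stripping purely for illustration; a real stemmer is used in practice
    tokens = [re.sub(r"(ing|ed|s)$", "", t) if len(t) > 4 else t for t in tokens]
    return " ".join(tokens)

print(preprocess("Sooooo worried!!! https://t.co/abc"))  # soo worri xurl
```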
We used the Named-entity Recognition (NER) technique from spacy (https://spacy.io/usage/linguistic-features) to extract locations.

For sentiment analysis, we collected the sentiment140 dataset from Kaggle. This dataset contains 1.6 million annotated tweets. The tweets are annotated with two classes of sentiment: positive and negative. We train a classifier model using these tweets to detect sentiment in the collected dataset of 2.9 million tweets. We use the same preprocessing as above to prepare this dataset.

Analysing the volume of tweets posted from a particular area at a particular time is an important step of data exploration that can provide interesting insights into observations [7]. We analyse the number of tweets posted in each state and capital in Australia over time.
Sentiment analysis is used to identify the emotional state or opinion polarity in the samples. We identify the sentiment of each tweet using our proposed CNN based sentiment classifier. We then aggregate tweets by location and time to obtain the spatio-temporal distribution of sentiments. The following section gives a summary of the proposed CNN architecture for sentiment classification.
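The aggregation by location and time can be sketched as a simple group-by. The data layout (location, week, sentiment triples) is an assumption for illustration, not the paper's actual data structure:

```python
from collections import defaultdict

def sentiment_ratio_by_location_week(tweets):
    """tweets: iterable of (location, week_index, is_positive) triples.
    Returns {(location, week): positive_tweets / total_tweets}."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for location, week, is_positive in tweets:
        key = (location, week)
        totals[key] += 1
        positives[key] += int(is_positive)
    return {key: positives[key] / totals[key] for key in totals}

sample = [("syd", 10, True), ("syd", 10, False), ("mel", 10, True)]
print(sentiment_ratio_by_location_week(sample))
# {('syd', 10): 0.5, ('mel', 10): 1.0}
```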
Figure 2 illustrates the architecture of the CNN model used to identify sentiments of COVID19 related tweets. The model uses word embedding [4, 18] to represent each word w as an n-dimensional word vector w ∈ R^n, where the dimension n is empirically set to 200. We represent a tweet t with m words as a matrix t ∈ R^{m×n} using word embedding. We apply the convolution operation to the tweet matrix with stride one.

Each convolution operation applies a filter f_i ∈ R^{h×n} of size h. The convolution is a function c(f_i, t) = r(f_i · t_{k:k+h−1}), where t_{k:k+h−1} is the k-th vertical slice of the tweet matrix from position k to k+h−1, f_i is the given filter, and r is a Rectified Linear Unit (ReLU) function. The function c(f_i, t) produces a feature c_k, similar to an n-gram, for each slice k, resulting in m−h+1 features per filter. Max-pooling then selects c_i = max(c(f_i, t)), capturing the most important feature for each filter.

Table 1: Example of tweets in the dataset. Location is extracted, @someone and @something are used to anonymise a person or an organisation mentioned in the tweet, and a token URL is used to replace any occurrence of a hyperlink or URL.
Location | Tweet Text | Time
Australia | RT @someone: Coronavirus patient sealed in a PLASTIC TUBE to avoid contamination URL @something | 22/01/2020 17:47
Melbourne | RT @someone: Me seeing the doomsday clock going to a 100 seconds, Australia on fire and the coronavirus all trending on the same day |
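The convolution and max-pooling operations described above can be sketched in NumPy. This is an illustrative helper, not the paper's released implementation (which is linked from its repository):

```python
import numpy as np

def conv_feature(tweet_matrix: np.ndarray, filt: np.ndarray) -> float:
    """Apply one convolution filter to a tweet matrix with stride 1,
    followed by ReLU and max-pooling, as in the described CNN.

    tweet_matrix: m x n (m words, n-dimensional embeddings)
    filt:         h x n (filter spanning h consecutive words)
    """
    m, _ = tweet_matrix.shape
    h = filt.shape[0]
    feats = []
    for k in range(m - h + 1):                               # m - h + 1 slices
        window = tweet_matrix[k:k + h]                       # t_{k:k+h-1}
        feats.append(max(0.0, float(np.sum(filt * window)))) # ReLU(f_i . slice)
    return max(feats)                                        # max-pooling

rng = np.random.default_rng(0)
t = rng.normal(size=(12, 200))    # a 12-word tweet, 200-dim embeddings
f = rng.normal(size=(3, 200))     # one trigram filter (h = 3)
print(conv_feature(t, f) >= 0.0)  # True: ReLU output is non-negative
```

In the full model this is repeated for all 1024 filters, yielding the 1024 pooled features passed to the dense layer.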
Figure 2: CNN Model for Sentiment Identification

Empirically, based on the accuracy improvement in ten-fold cross-validation, 256 filters are used for each of two filter sizes and 512 for the third, with h ∈ {3, 4, 5}. As there are a total of 1024 filters (256+256+512) in the proposed model, the 1024 most important features are learned from the convolution layer. We then pass these features to a fully connected hidden layer with 256 perceptrons that use the ReLU activation function. This fully connected hidden layer learns the complex non-linear interactions between the features from the convolution layer and generates 256 higher-level new features. Finally, we pass these 256 higher-level features to the output layer with a single perceptron that uses the sigmoid activation function. The perceptron in the output layer generates the probability of a tweet in our data collection having positive or negative sentiment.

In this architecture (Figure 2), a proportion of units is randomly dropped out from each layer except the output. This is done to prevent the co-adaptation of units in a layer and to reduce overfitting. We set 50% unit dropout in the input layer, the filters of size 3, and the fully connected hidden layer, based on the best empirical results. Only 20% of units are dropped out from the filters of sizes 4 and 5. Python code for this model is available online at https://github.com/mdabashar/sentiment_analysis.

A variety of subjects or topics are usually discussed in the tweets over time. Knowing those topics and how they evolve is important to understand the dynamics of the discussion related to coronavirus. Because of the large size of the tweet dataset, it is very difficult, if not impossible, to read all of the tweets to find their topics. We use an unsupervised machine learning technique known as topic modelling [5] to discover the subjects or topics of discussion in tweets and how those topics evolve over time.
Topic models are among the most popular statistical methods for analysing the words in a document collection to discover the themes that run through the documents, how those themes are connected, and how they change over time [5, 6].

In general, a topic modelling technique (e.g. Latent Dirichlet Allocation [6]) uses word co-occurrences within documents to find topics in a document collection. Words occurring in the same document are more likely to come from the same topics [6, 7], and documents that contain the same words are more likely to consist of the same topics [6, 7]. We use each tweet as a document to discover topics in our tweet collection.

Static topic modelling [6] explicitly treats words as exchangeable and implicitly treats documents as exchangeable [5]. However, the assumption of exchangeable documents is inappropriate for many collections, such as tweets, news articles, and scholarly articles, as their content evolves. For example, tweets published in different time periods may relate to a specific topic, say coronavirus cure, but the coronavirus cure discussion can be much different in later stages than in the early stage. The themes in a tweet collection evolve, and it is of interest to explicitly model the dynamics of the underlying topics.

Dynamic topic modelling [5] extends static topic modelling [6] to incorporate topic evolution. Dynamic topic modelling can capture the evolution of topics in a sequentially organised collection of tweets or documents. In this setting, tweets are grouped by weeks, and each week’s tweets arise from a set of topics that have evolved from the last week’s topics. We use dynamic topic modelling to observe topics of discussion and how they change over time.

Choosing a reasonable number of topics is important because too few topics could merge distinct topics, whereas too many topics could fragment what would otherwise be a cohesive topic.
Therefore, we manually evaluated topic models with the number of topics ranging from 5 to 50 to determine the optimal number of topics. Other hyperparameters are set to the default values in gensim (the Python software library used for topic modelling).

To ensure that the topics discovered by dynamic topic modelling are meaningful and not dominated by the same top words, we removed the keywords and hashtags (e.g. covid19, coronavirus, etc.) that we used for collecting our tweets. We also removed rare words (i.e. words with a very low frequency) to reduce noise in the topics.
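The weekly grouping that dynamic topic modelling relies on can be sketched as follows. Sequential models such as gensim's LdaSeqModel take a `time_slice` argument listing the number of documents in each consecutive period; the helper below (an illustrative name, not the paper's code) builds that list from tweet dates:

```python
from datetime import date

def weekly_time_slices(tweet_dates, start=date(2019, 11, 27)):
    """Group tweets into weekly bins, returning the number of documents
    per consecutive week (the `time_slice` format expected by a dynamic
    topic model such as gensim's LdaSeqModel). tweet_dates must be
    supplied in the same order as the corpus."""
    n_weeks = max((d - start).days // 7 for d in tweet_dates) + 1
    slices = [0] * n_weeks
    for d in tweet_dates:
        slices[(d - start).days // 7] += 1
    return slices

dates = [date(2019, 11, 27), date(2019, 11, 30), date(2019, 12, 5)]
print(weekly_time_slices(dates))  # [2, 1]
```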
The Semantic Brand Score (SBS) estimates the impact or importance of concepts, brands, or entities in a text collection [8]. We use SBS to understand the impact of different COVID19 related concepts or entities on social media discussion.

SBS is based on graph theory and combines methods of Social Network and Semantic Analysis using the word co-occurrence network. It is estimated as the standardised sum of three components: prevalence, diversity, and connectivity.

Prevalence
PREV(c) counts the number of times a word or concept c is mentioned in a text collection. Prevalence is associated with the idea of brand awareness, assuming that frequently mentioning a concept increases its recognition and recall for those who read it.

Diversity DIV(c) of a word or concept c estimates the heterogeneity of the concepts surrounding c. It is the degree centrality of c in the co-occurrence network, estimated by counting the number of edges directly connected to the concept node c.

Connectivity CON(c) of a word or concept c estimates the connectivity of c with respect to the general discourse. It represents the ability of the concept node c to act as a bridge between other nodes in the network. Connectivity is widely used in social network analysis as a measure of influence or control of information that goes beyond direct links. It is estimated as

CON(c) = Σ_{j≠k} d_jk(c) / d_jk

where d_jk is the number of shortest paths linking any two nodes j and k, and d_jk(c) is the number of those shortest paths that contain the given concept node c.

The Semantic Brand Score is estimated as

SBS(c) = (PREV(c) − mean(PREV)) / std(PREV) + (DIV(c) − mean(DIV)) / std(DIV) + (CON(c) − mean(CON)) / std(CON)

where mean(·) represents the mean value and std(·) the standard deviation.

The temporal dimension (27 November 2019 to 7 April 2020) of the tweet collection is discretised by weeks (roughly 17 weeks) or days as appropriate to the nature of the analysis. The geospatial dimension is discretised by Australian states and capital cities. A tweet whose user location does not list a city but lists the country as Australia is categorised as Australia (au). The locations of a small portion of tweets could not be extracted or mapped to our selected categories; we categorise these as others (oth).
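Given the three component scores, the standardised SBS sum above can be computed directly. A minimal sketch with toy values (illustrative numbers only, not results from the paper's data):

```python
from statistics import mean, stdev

def semantic_brand_score(prev, div, con):
    """Standardised sum of prevalence, diversity and connectivity,
    following the SBS formula above. Each argument maps concept -> raw score."""
    def z(scores):
        mu, sd = mean(scores.values()), stdev(scores.values())
        return {c: (v - mu) / sd for c, v in scores.items()}
    zp, zd, zc = z(prev), z(div), z(con)
    return {c: zp[c] + zd[c] + zc[c] for c in prev}

# Toy component scores for three concepts
prev = {"china": 900, "case": 700, "toilet": 100}   # mention counts
div = {"china": 50, "case": 40, "toilet": 10}       # degree centrality
con = {"china": 0.30, "case": 0.20, "toilet": 0.02} # betweenness
scores = semantic_brand_score(prev, div, con)
print(max(scores, key=scores.get))  # china
```

In practice the degree and betweenness components would come from the word co-occurrence network, e.g. via a graph library.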
Figure 3 shows a word cloud generated from the entire tweet collection. It gives a quick look into the subjects Australian people discussed during the COVID19 pandemic. Subjects such as ‘stay home’, ‘work from home’, ‘toilet paper crisis’, ‘slow the spread’, etc. are commonly discussed.

Figure 4 shows the geospatial and temporal distribution of tweet counts. Significant changes in tweet counts over locations and weeks can be noted throughout the time period. For a closer
Figure 3: A Word Cloud Generated from the Entire Tweet Collection
Figure 4: Geospatial and Temporal Distribution of Tweet Count

examination, we separate the geospatial and temporal dimensions in Figures 5 and 6 respectively.

Figure 5a shows the tweet counts in states, territories and capital cities in Australia. Figure 5b shows the actual number of COVID19 positive cases in states and territories in Australia. A strong correlation can be noted between tweet counts and COVID19 cases: the more COVID19 cases in a location, the more tweets there. For example, the highest number of COVID19 related tweets was observed in Sydney (syd), the capital city of New South Wales (NSW), where the highest number of COVID19 cases occurred. The second and third-highest numbers of COVID19 related tweets were observed in Melbourne (mel), the capital city of Victoria (VIC), and VIC respectively, where the second-highest number of COVID19 cases occurred. The same is true for Queensland (QLD). Other cities follow a similar pattern with minor order variations.

(a) Tweet Count Distributed over States, Territories and Capital Cities (b) COVID19 Cases Distributed over States and Territories [11]
Figure 5: Correlation between Tweet Counts and COVID19 Cases Distributed over States, Territories and Capital Cities
Figure 6 shows the correlation between tweet counts and COVID19 cases distributed over time. A comparison between Figures 6a and 6b shows that the total number of COVID19 related tweets over time is strongly correlated with the number of new COVID19 positive cases by notification date.

Figure 6a shows that when COVID19 hit China on 27 November 2019, there was not much discussion in the Australian space. A noticeable number of coronavirus related tweets started to be posted after 60 days, or around eight weeks, i.e. at the end of January. Over the next week the number increased and then started to fall. The main burst of tweets started after another 30 days, or 4 weeks, i.e. at the end of February. This might be because at this time several people arriving in Australia from overseas were identified as COVID19 positive. The number increased exponentially for the next 20 days and reached its peak by the third quarter of March. This exponential increase might have occurred because during this time many Australians were identified as COVID19 positive and some were reported dead. Then it started to fall gradually, possibly because government initiatives and strict social distancing worked and the COVID19 infection and death rates started to decrease.

(a) Tweet Count Distributed over Time (b) COVID19 Cases Distributed over Time [11]
Figure 6: Correlation between Tweet Counts and COVID19 Cases Distributed over Time
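The strong correlation observed visually in Figures 5 and 6 could be quantified with a Pearson coefficient. A self-contained sketch with toy weekly series (illustrative numbers, not the paper's data, which the paper compares visually rather than numerically):

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy weekly tweet counts and new-case counts
tweet_counts = [5, 20, 80, 300, 250, 180]
case_counts = [0, 2, 15, 60, 55, 30]
print(round(pearson(tweet_counts, case_counts), 2))  # 0.99
```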
The experimental performance comparison of the proposed CNN model and the pretrained language model ULMFiT [12] is given in Table 2. ULMFiT is pretrained on Wikitext-103, which contains 28,595 preprocessed Wikipedia articles and 103 million words [17]. The model is then finetuned with the 1.6 million tweets from the sentiment140 dataset without labels. Finally, we add an extra classifier layer at the end of the model and train it with the 1.6 million labelled tweets from the sentiment140 dataset. We use the same architecture, hyperparameters and training strategy for ULMFiT as described in [12]. We use an 80%, 10%, and 10% split of data for training, validation, and testing respectively.

Table 2 shows that the proposed CNN achieves similar performance to ULMFiT. The performance of the CNN-based model shows that a carefully designed simple model can achieve performance similar to a sophisticated model when a reasonably sized training dataset is available. The significance of this finding is that sophisticated pretrained language models, such as ULMFiT, are computationally expensive and memory intensive. Effectively using them for monitoring (i.e. classifying) a large tweet collection becomes difficult in the resource-constrained environments commonly available to practitioners. A simple model, such as our proposed CNN, that achieves similar performance can greatly help in this regard.
Table 2: Performance Comparison of ULMFiT and the Proposed CNN Model
Metric | CNN | ULMFiT
True Positive | 63465 | 65144
True Negative | 63399 | 62300
False Positive | 16407 | 17738
False Negative | 16729 | 14808
Accuracy | 0.793 | 0.797
Precision | 0.795 | 0.786
Recall | 0.791 | 0.815
F Score | 0.793 | 0.800
Cohen Kappa | 0.586 | 0.593
Area Under Curve | 0.793 | 0.797

The following results of sentiment analysis are based on the proposed CNN-based model. Figure 7 shows the geospatial and temporal distribution of positive sentiment tweet counts vs total tweet counts for an overall observation. As soon as COVID19 hit the world, positive sentiment dropped sharply (from roughly 85% to roughly 48% on average). The percentage stayed there for around 12 weeks. Then it changed gradually for three weeks with a very marginal increment. For the final two weeks, the increment was a bit larger than in the previous three weeks.

A possible explanation of the trend is as follows. As soon as COVID19 hit the world, the online community was shocked by the news. It took some time for world leaders to come up with plans to combat COVID19. During this period (12 weeks), people remained stressed. When world leaders explained their plans and ideas, Twitter users talked about those positive initiatives during the following period (three weeks). During the final two weeks, the Australian government announced social safety plans, e.g. economic aid to organisations, businesses, and individuals; it announced stricter rules for social distancing, and the COVID19 infection curve started flattening. People started to become a bit more comfortable and discussed these positive aspects in their tweets. Consequently, the number of positive tweets increased. All these patterns show that by monitoring conversational dynamics on social media, we can identify how people are feeling during the COVID19 pandemic, and what initiatives are working or making people comfortable.

For a closer look into the trend, the geospatial and temporal dimensions are decoupled in Figures 9 and 8 respectively.
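The summary metrics in Table 2 follow directly from the confusion counts in its first four rows. A quick check in Python (standard metric formulas, not the paper's evaluation script):

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, F1 and Cohen's kappa from a confusion matrix."""
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # Cohen's kappa: observed agreement corrected for chance agreement
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return accuracy, precision, recall, f1, kappa

# CNN column of Table 2
acc, prec, rec, f1, kappa = metrics(tp=63465, tn=63399, fp=16407, fn=16729)
print(round(acc, 3), round(prec, 3), round(rec, 3), round(f1, 3), round(kappa, 3))
# 0.793 0.795 0.791 0.793 0.586
```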
Figure 8 shows the volume of COVID19 related tweets (total volume), the volume of positive tweets related to COVID19 (positive volume), and their ratio (positive vs total ratio). This figure shows that, at roughly any time, only about 50% of all the COVID19 related posts were positive. We see two significant drops in the ratio of positive sentiment: one at the beginning, when the world was hit by COVID19, and the next by week seven, the third quarter of January 2020. During this period there was not much discussion of COVID19 in Australia. However, the second drop triggered an increase in the number of COVID19 related posts. In other words, this second drop alerted the community to the upcoming danger of COVID19. We can assume that the small number of tweets related to COVID19 might come from journalists, social workers, health care workers, or people who are conscious of health issues.

During the period when a noticeable number of posts were related to COVID19 (weeks 8 to 18), there are two small drops in the ratio of positive sentiment: one in week 10 and another in week 14. Both falls are followed by a significant increase in the number of COVID19 related posts. Even though these two drops are small in sentiment ratio, the drops in the number of positive tweets were large enough to act as triggers. This ascertains that monitoring positive sentiment tweets can signal the trigger of an increase in COVID19 related posts.
Figure 7: Geospatial and Temporal Distribution of Positive vs Total Tweet Ratio
Figure 8: Temporal Distribution of Positive and Total Volume of Tweets
Figure 9a shows how the ratio of positive tweets to total tweets varies across Australian states and territories. It shows that all states and territories have a positive sentiment tweet ratio of around 0.5, except the Northern Territory, which has a slightly better ratio. This implies there is emotional stress in people across all the states and territories. However, this figure does not clearly capture the positive sentiment drop, as cities are averaged into the states and territories. In reality, some cities are affected more than others by COVID19. Therefore, we add capital cities in Figure 9b along with states and territories.

Figure 9b shows the counts of COVID19 related tweets and positive tweets in states, territories, and capital cities. Capital cities and states that have a significant drop in positive tweet count are Sydney (syd), Melbourne (mel), Victoria (vic), and Queensland (qld). A comparison between Figure 9b and Figure 5b shows that these locations had most of the COVID19 cases. A drop in positive sentiment is correlated with the number of COVID19 cases. A drop in positive sentiment is also correlated with early mental health issues, suggesting that the community might need an allocation of mental health care resources in the near future.

Two interesting facts in Figure 9b can be observed in the varied behaviour of two state and capital city pairs, (qld, bne) and (nsw, syd). There is a significant drop in positive tweets in qld but not in bne; the majority of COVID19 cases in Queensland happened in the Gold Coast and other surrounding areas rather than Brisbane. Conversely, there is a significant drop in positive tweet count in syd but not in nsw; the majority of COVID19 cases happened in Sydney rather than in other parts of nsw. This again emphasises that a drop in positive sentiment is directly correlated with the number of COVID19 cases.
(a) Geospatial Distribution of Positive vs Total Number of Tweets Ratio (b) Geospatial Distribution of Positive and Total Number of Tweets
Figure 9: Geospatial Distribution of Sentiment
This section shows some of the experimental results on how COVID19 related topics changed over time semantically, morphologically and sentimentally. Figure 10 shows the evolution of five topics. Topic 0: controlling the spread; Topic 1: staying in isolation and working from home; Topic 2: COVID19 cases; Topic 3: racism against the Chinese community; and Topic 4: the impact of the COVID19 outbreak worldwide.

Topics 0, 2 and 4 show a similar trend even though their magnitudes and change rates differ. A close investigation shows that these three topics share a high similarity of subject matter. On the other hand, Topics 3 and 1 do not resemble any trend; however, they somewhat inversely follow each other. It is apparent that all the topics evolved over time in terms of semantics, morphology and sentiment. For example, in Topic 0, which talks about controlling the spread of coronavirus, the words ‘need’ and ‘worker’ newly emerged during weeks 11 and 13, whereas the words ‘island’, ‘china’, ‘travel’ and ‘ban’ lost their significance during weeks 12, 15, 16 and 17 respectively.

(a)–(j) Word Clouds and Evolution over Time for Topics 0–4 (k) Topic Sentiment Change over Time
Figure 10: Topic Clouds and Topic Evolutions
Semantic Brand Score (SBS) can capture the impact of concepts or words in text collections, which can be useful for monitoring social matters or instances. During the COVID19 pandemic, some social instances were ‘stay home’, ‘positive cases’, ‘slow the spread’, ‘wash your hands’, ‘toilet paper’, and ‘China’. Figure 11a shows the change in SBS over time for some of the words in these instances. The figure shows that ‘china’ had the highest SBS score most of the time compared with the other words; a discussion of this SBS is given below. The second-highest SBS is counted for the word ‘case’ (i.e. positive cases). This might be because people were discussing COVID19 positive cases and their implications for health, the economy and jobs. The word ‘hand’ (i.e. wash your hands) had a stable SBS score throughout, except for a spike in week 15. The word ‘toilet’ (i.e. toilet paper) had a low SBS but a spike in week 15, when there were toilet paper related instances in Australia (e.g. toilet paper sold out in most stores, people fighting over buying toilet paper, etc.).

Figure 11b shows how the SBS score varied in space and time for the word China. In certain periods and some places, the word China had a high SBS in COVID19 related tweets. This means China was mentioned in a lot of tweets, in a variety of topics, and a lot of topics of discussion involved the word China. One reason may be that many tweets discussed the first detection of COVID19 in China. However, a high SBS also indicates that many diverse topics were influenced by the word China, and many topics were intensely discussed in relation to it. This might have been influenced by the wrong assumption that China is responsible for the spread of coronavirus because coronavirus was first detected there. This kind of assumption can disrupt social harmony.
As SBS can identify such incidents in space and time, it can be used for positive intervention, such as providing the right information to communities and providing necessary security to vulnerable communities.

[Figure 11 panels: (a) temporal distribution of SBS; (b) geospatial and temporal distribution of SBS for the word 'china']
Figure 11: Distribution of SBS (Impact)
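The SBS of a word combines three components computed on a word co-occurrence network: prevalence (how often the word occurs), diversity (how many distinct words it co-occurs with) and connectivity (how often it bridges other words). The sketch below is a minimal, unweighted approximation using networkx; the full measure z-standardises each component over the whole vocabulary before summing, and the tweets shown here are hypothetical.

```python
import itertools
from collections import Counter

import networkx as nx

def sbs_components(docs, word):
    """Return (prevalence, diversity, connectivity) for one word.

    Simplified SBS sketch: prevalence is term frequency, diversity is
    degree in the co-occurrence network, and connectivity is
    (unweighted) betweenness centrality.
    """
    tokens = [doc.lower().split() for doc in docs]
    prevalence = Counter(w for doc in tokens for w in doc)[word]

    # Co-occurrence network: link every pair of words in the same tweet.
    graph = nx.Graph()
    for doc in tokens:
        for u, v in itertools.combinations(sorted(set(doc)), 2):
            weight = graph.get_edge_data(u, v, {"weight": 0})["weight"]
            graph.add_edge(u, v, weight=weight + 1)

    diversity = graph.degree(word) if word in graph else 0
    connectivity = (nx.betweenness_centrality(graph).get(word, 0.0)
                    if word in graph else 0.0)
    return prevalence, diversity, connectivity

# Hypothetical tweets: 'china' bridges two otherwise separate word groups.
tweets = ["china virus spread", "china travel ban", "wash your hands"]
```

In this toy network 'china' is the only bridge between the spread-related and travel-related word groups, so all three of its components are non-zero, mirroring its persistently high SBS in Figure 11a.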
This research shows that social media data analysis is a powerful method for observing social phenomena relevant to an outbreak of an infectious disease such as COVID19. Collecting data through traditional surveys and clinical reports is time consuming and costly, and there can be a time lag of a few weeks between the time of medical diagnosis and the time when the data become available. Unlike traditional methods, social media data analysis is time- and cost-effective and can uncover momentum and spontaneity in conversations. Besides, it can be done systematically and can be generalised to a wide range of objectives. This study analysed the discussion dynamics of COVID19 on Twitter from a geospatial and temporal context using various methods. The analysis methods were effective in capturing interesting insights that correlated directly with real-world events.

The overall coronavirus-related discussion on Twitter was negative in tone. People were concerned about jobs, the economy and isolation, in addition to health and safety. However, initiatives such as government subsidies had a positive influence on the discussion. For example, there were changes in published tweets and negative sentiments when new coronavirus cases were found or deaths occurred. Peaks in positive sentiment occurred when leaders took positive initiatives or there was a positive development in the health care sector. When the spread of new cases started to decline, the number of coronavirus-related posts declined and positive sentiment increased. Our observations show how social media platforms can influence the public's risk perception and their hope and reliance on different organisational initiatives. They can even change real-world behaviour, which can have an impact on control measures enacted to mitigate an outbreak.

Dynamic topic modelling discovered a wide variety of topics in the discussion, covering the consequences, initiatives, impacts and people's behaviour associated with coronavirus.
These topics evolved and their significance changed over time. Topic analysis provides an understanding of the community discussion of COVID19 with reasonable objectivity, precision and generality. We found that most of the COVID19-related discussion was concentrated around a relatively small number of influential topics. For example, at the beginning most discussion related to the COVID19 outbreak in China, then to COVID19 cases in Australia and health care, and then to staying home and job loss. Topic modelling uncovered instances of racism, and SBS identified their impact on the discussion. For example, our analysis showed that the COVID19 pandemic created fear, and that fear allowed racism to thrive, disproportionately affecting marginalised groups.

The findings can help governments, emergency agencies, clinicians, health practitioners and caregivers to better utilise social media to understand public opinion, sentiments, and the social and mental health issues related to COVID19. Such an understanding will enable proactive decision making for prioritising support in geospatial locations. For example, timely dissemination and updating of information related to social issues by the government can contribute to stabilising social harmony.
Advanced analysis of social media data related to a pandemic such as COVID19 is critical to protect public health, maintain social harmony and save lives. By leveraging anonymised and aggregated geospatial and temporal data from social media, institutions and organisations can gain insights into community discussion to understand, and act on, how the spread of COVID19 is affecting people's lives and behaviour. Specifically, governments and emergency agencies can use these insights to better understand public opinion and sentiment, accelerate emergency responses and support post-pandemic management.