How Have We Reacted To The COVID-19 Pandemic? Analyzing Changing Indian Emotions Through The Lens of Twitter
Rajdeep Mukherjee, Sriyash Poddar, Atharva Naik, Soham Dasgupta
Rajdeep Mukherjee, IIT Kharagpur
Sriyash Poddar ∗, Kharagpur, India
Atharva Naik ∗, Kharagpur, India
Soham Dasgupta, Mallya Aditi International School, Bangalore
ABSTRACT
Since its outbreak, the ongoing COVID-19 pandemic has caused unprecedented losses to human lives and economies around the world. As of 18th July 2020, the World Health Organization (WHO) has reported more than 13 million confirmed cases including close to 600,000 deaths across 216 countries and territories. Despite several government measures, India has gradually moved up the ranks to become the third worst-hit nation by the pandemic after the US and Brazil, thus causing widespread anxiety and fear among her citizens. As the majority of the world's population continues to remain confined to their homes, more and more people have started relying on social media platforms such as Twitter for expressing their feelings and attitudes towards various aspects of the pandemic. With rising concerns of mental well-being, it becomes imperative to analyze the dynamics of public affect in order to anticipate any potential threats and take precautionary measures. Since affective states of the human mind are more nuanced than mere binary sentiments, here we propose a deep learning-based system to identify people's emotions from their tweets. We achieve competitive results on two benchmark datasets for multi-label emotion classification. We then use our system to analyze the evolution of emotional responses among Indians as the pandemic continues to spread. We also study the development of salient factors contributing towards the changes in attitudes over time. Finally, we discuss directions to further improve our work and hope that our analysis can aid in better public health monitoring.
KEYWORDS
Covid-19, Pandemic, Fine-grained Sentiment Analysis, Multi-label Emotion Classification, Tweets

∗ Both authors contributed equally to this research.

CODS-COMAD '21, January 02–04, 2021, IIIT Bangalore, India. © 2020 Association for Computing Machinery.
The coronavirus disease 2019, or COVID-19, caused by the SARS-CoV-2 virus, has taken the shape of a pandemic since its outbreak in late December 2019, causing enormous losses to societies and economies around the world [12]. Although the enforcement of several protective measures such as lockdown, social distancing, etc. has had positive effects on containing the spread of the virus, the loneliness caused by self-confinement and reduced access to family, friends, and other social support systems has posed severe threats to the physical and mental well-being of people around the globe [2, 8, 10]. While dealing with such unprecedented challenges, as more and more people express their emotions and opinions towards various aspects of the pandemic through social media platforms such as Twitter, Facebook, etc., it becomes essential to monitor the dynamics of public affect from such user-generated textual content in order to understand general concerns and anticipate any potential threats.

Human-written texts such as tweets reflect an author's affective state or thought process. Although emotions and sentiments are closely related, merely classifying the subjective content of a tweet into positive, negative or neutral sentiment categories may not be sufficient to understand a person's true feelings. Under trying situations such as the COVID-19 pandemic, emotions can be even more complicated [23]. A person might feel optimistic about vaccine development, thankful towards the doctors, or even both, while still being positive overall. Therefore, in order to capture people's emotions from their tweets, we build a transformer-based [21] supervised multi-label emotion classifier that achieves state-of-the-art results on two benchmark datasets, AIT [19] and SenWave [23].

With a record number of cases being reported daily and more than 1 million confirmed cases to date, India has now become the third worst-hit nation by the pandemic after the US and Brazil. We therefore use our classifier, trained on the COVID-19-specific SenWave dataset, to monitor the evolution of emotional attitudes among Indians towards the ongoing pandemic by analyzing tweets posted between March 1st 2020 and July 4th 2020. We also study the development of salient factors or triggers contributing towards the changes in emotions over time. We hope that our proposed system can aid the concerned authorities in real-time identification of people's mental health conditions and concerns from their social media posts, and in making timely interventions in fighting the global crisis.

RELATED WORK
Studies based on social media data around the COVID-19 pandemic have seen a steep surge over the past few months. Prior works [1, 13, 14] have stressed the need for automatic detection and monitoring of public affect from user-generated content, especially during trying times such as the current pandemic. Although recent benchmark annotated datasets [5, 18, 19] have facilitated the training and development of supervised deep learning-based classifiers for automated sentiment analysis and emotion detection, their results are not directly applicable to COVID-19 analysis [23] due to considerable differences in vocabulary between the training and testing domains. The majority of recent works around COVID-19, such as [11, 22], have thus relied on feature engineering-based methods or supervised methods with limited training data [12]. As the pandemic continues to spread, we draw our motivation from [14, 19, 23] for building a multi-label emotion classifier and studying the evolution of public emotions towards COVID-19 from an Indian perspective, through the lens of Twitter.

We use the following datasets:

• AIT: The Affect in Tweets dataset (or AIT for short) was created in [19] as part of SemEval-2018 Task 1. This non-COVID-19 dataset consists of 7724 English tweets, each labeled either as neutral (no emotion) or with one or more of 11 emotions: anger, anticipation, disgust, fear, joy, love, optimism, pessimism, sadness, surprise, and trust.
• SenWave: We consider the 10K English tweets from the largest fine-grained annotated COVID-19 tweets dataset, created by the authors in [23]. Here, each tweet is labeled with one or more of 10 emotions: optimistic, thankful, empathetic, pessimistic, anxious, sad, annoyed, denial (towards conspiracy theories), official report, and joking.
• Twitter_IN: We use version 17 of the COVID-19 Twitter chatter dataset released by the authors in [3]. Twitter_IN consists of around 1.4 lakh (140,000) English tweets from India posted between January 25th 2020 and July 4th 2020.

The version of SenWave obtained from the authors contains an additional label, surprise. Although we use all 11 categories to train our models, we leave out surprise (since it is not documented in [23]) and official report (since it is not an affective state) from further consideration while performing our analysis with Twitter_IN.

We preprocessed the raw tweets in all the datasets, first by removing mentions (@user) and URLs. We then filtered out noisy symbols such as "RT" (retweet). Since hashtags can provide meaningful semantics, we removed the "#" symbol from each hashtag and kept the word. We used a mapping between emoticons and their meanings (https://en.wikipedia.org/wiki/List_of_emoticons/) and the emoji package (https://pypi.org/project/emoji/) to replace emoticons and emojis, respectively, with their semantic phrases. We then used a slang_translator (https://github.com/rishabhverma17/sms_slang_translator) to convert online slang, commonly used in short text message conversations, into the corresponding phrases. For example, "CUL8R" gets replaced by "see you later". We further expanded contractions using the contractions package (https://pypi.org/project/contractions/). Finally, we filtered out punctuation, line breaks, tabs, redundant blank characters, etc., since they do not provide any relevant lexical or semantic information.

Figure 1: Model Architecture

We used RoBERTa-Base [15] as the backbone of our multi-label emotion classifier. A tweet t to be classified is first broken down by the RobertaTokenizer into its constituent tokens. A special token [CLS] is added to obtain the final sequence [CLS], t_1, t_2, ..., t_n, which is then passed through 12 layers of attention-based transformers to obtain a 768-dimensional contextualized representation of the tweet, h_CLS. Motivated by prior works [19], we simultaneously obtain a 194-dimensional feature vector derived from affect lexicons using Empath [7]. We concatenate the two to obtain a 962-dimensional vector, which is then passed through a fully connected layer with 11 output neurons. At each neuron, we use the tanh activation function with a threshold of 0.33 to predict the presence or absence of one of the 11 emotions. Hereafter, we refer to this model as EC_ROBERTA. A variant of our model with BERT-Base [6] as the backbone is referred to as EC_BERT.

Among our non-BERT variants, EC_CNN consists of five parallel 1-D convolutions with kernel sizes ranging from 2 to 6 and the ReLU activation function, each followed by a 1-D max-pooling layer. All five outputs are merged to obtain a single representation vector for the tweet. EC_LSTM consists of a single layer of unidirectional LSTM cells with 256 hidden units. The hidden state from the last LSTM cell is taken as the tweet representation vector. For both of these models, the obtained tweet vector is passed through a hidden layer consisting of 128 neurons, followed by a fully connected layer with 11 output neurons. Labels at each neuron are predicted with the sigmoid activation function.

Table 1: Comparison of Results on AIT.

Methods                     Jaccard Acc.   F1-Macro   F1-Micro
NTUA-SLP [4]                0.588          0.528      0.701
Current Leader              0.594          0.565      0.704
EC_CNN (Word2Vec)           0.475          0.409      0.601
EC_CNN (Glove)              0.479          0.438      0.609
EC_CNN (Glove + Empath)     0.489          0.445      0.613
EC_LSTM (Glove + Empath)    0.509          0.463      0.625
EC_BERT
EC_ROBERTA
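The preprocessing steps described in the text can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the `SLANG`, `CONTRACTIONS`, and `EMOTICONS` dictionaries are tiny hypothetical stand-ins for the slang translator, the contractions package, and the emoticon mapping, and emoji-to-phrase conversion (done with the emoji package in the paper) is omitted.

```python
import re
import string

# Hypothetical, tiny stand-ins for the external resources named in the text.
SLANG = {"cul8r": "see you later"}
CONTRACTIONS = {"can't": "cannot", "don't": "do not"}
EMOTICONS = {":)": "happy face", ":(": "sad face"}


def preprocess(tweet: str) -> str:
    """Clean one raw tweet following the order of steps described in the text."""
    text = re.sub(r"@\w+", " ", tweet)           # remove mentions (@user)
    text = re.sub(r"https?://\S+", " ", text)    # remove URLs
    text = re.sub(r"\bRT\b", " ", text)          # remove the retweet marker
    text = text.replace("#", "")                 # drop '#', keep the hashtag word

    tokens = []
    for tok in text.split():
        if tok in EMOTICONS:                     # emoticon -> semantic phrase
            tokens.append(EMOTICONS[tok])
            continue
        core = tok.strip(string.punctuation).lower()
        if core in SLANG:                        # slang -> full phrase
            tokens.append(SLANG[core])
        elif core in CONTRACTIONS:               # expand contractions
            tokens.append(CONTRACTIONS[core])
        else:
            tokens.append(tok)

    text = " ".join(tokens)
    text = re.sub(r"[^\w\s]", " ", text)         # remove remaining punctuation
    return re.sub(r"\s+", " ", text).strip()     # collapse whitespace


print(preprocess("RT @user CUL8R! #StayHome https://t.co/x :)"))
```

Emoticons are replaced before punctuation is stripped, since otherwise they would be deleted along with ordinary punctuation.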
Table 2: Comparison of Results on SenWave.

Methods         Acc.   J.Acc.   F1-Ma.   F1-Mi.   LRAP   H.Loss
SenWave [23]
EC_BERT
EC_ROBERTA
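The output layer of the transformer-based classifier described earlier (a 768-dimensional tweet representation concatenated with a 194-dimensional Empath vector, one fully connected layer with 11 tanh outputs, and a 0.33 threshold) can be sketched in plain Python as below. This is only an illustration under stated assumptions: the weights are random placeholders rather than trained parameters, the input vectors are supplied by the caller instead of coming from RoBERTa and Empath, and treating a score strictly greater than the threshold as "present" is our assumption.

```python
import math
import random

CLS_DIM, EMPATH_DIM, NUM_LABELS = 768, 194, 11   # dimensions stated in the text
THRESHOLD = 0.33                                 # decision threshold from the text


def init_head(seed: int = 0):
    """Create placeholder weights for a (768 + 194) -> 11 fully connected layer."""
    rng = random.Random(seed)
    in_dim = CLS_DIM + EMPATH_DIM
    weights = [[rng.uniform(-0.05, 0.05) for _ in range(in_dim)]
               for _ in range(NUM_LABELS)]
    biases = [0.0] * NUM_LABELS
    return weights, biases


def predict_labels(h_cls, empath_vec, weights, biases, threshold=THRESHOLD):
    """Concatenate both feature vectors, apply tanh per output, then threshold."""
    features = list(h_cls) + list(empath_vec)    # 962-dimensional vector
    if len(features) != CLS_DIM + EMPATH_DIM:
        raise ValueError("expected a 768-d and a 194-d input vector")
    scores = [math.tanh(sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(weights, biases)]
    labels = [1 if s > threshold else 0 for s in scores]  # assumption: strict '>'
    return labels, scores


weights, biases = init_head()
labels, scores = predict_labels([0.1] * CLS_DIM, [0.0] * EMPATH_DIM, weights, biases)
print(labels)
```

Because each output is thresholded independently, a tweet can receive several labels at once or none at all, which is what makes the classifier multi-label.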
Apart from Macro-F1 and Micro-F1, both [19] and [23] consider Jaccard Accuracy as the principal metric for evaluating models designed for multi-label emotion detection. [23] additionally considers Label Ranking Average Precision (LRAP), Hamming Loss, and a relaxed accuracy measure called Weak Accuracy (or just "Accuracy") to evaluate the models.
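For concreteness, three of the metrics named above can be computed as in the following sketch. These are the commonly used definitions and not necessarily the exact evaluation scripts of [19] or [23]; in particular, scoring a tweet as 1.0 when both the gold and predicted label sets are empty, and when a tweet has no relevant labels for LRAP, are assumptions. Weak Accuracy is not shown because its definition is not given in the text.

```python
from typing import Sequence


def jaccard_accuracy(y_true: Sequence[Sequence[int]],
                     y_pred: Sequence[Sequence[int]]) -> float:
    """Mean over tweets of |gold ∩ predicted| / |gold ∪ predicted|."""
    total = 0.0
    for gold, pred in zip(y_true, y_pred):
        g = {i for i, v in enumerate(gold) if v}
        p = {i for i, v in enumerate(pred) if v}
        union = g | p
        # Assumption: an empty gold set matched by an empty prediction scores 1.
        total += 1.0 if not union else len(g & p) / len(union)
    return total / len(y_true)


def hamming_loss(y_true: Sequence[Sequence[int]],
                 y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of individual label decisions that are wrong."""
    wrong = sum(g != p
                for gold, pred in zip(y_true, y_pred)
                for g, p in zip(gold, pred))
    return wrong / (len(y_true) * len(y_true[0]))


def lrap(y_true: Sequence[Sequence[int]],
         y_score: Sequence[Sequence[float]]) -> float:
    """Label ranking average precision over real-valued label scores."""
    total = 0.0
    for gold, scores in zip(y_true, y_score):
        relevant = [i for i, v in enumerate(gold) if v]
        if not relevant or len(relevant) == len(gold):
            total += 1.0  # assumption: degenerate samples count as perfect
            continue
        sample = 0.0
        for j in relevant:
            rank = sum(s >= scores[j] for s in scores)           # rank of label j
            hits = sum(scores[k] >= scores[j] for k in relevant)  # relevant labels at or above j
            sample += hits / rank
        total += sample / len(relevant)
    return total / len(y_true)


gold = [[1, 0, 1], [0, 1, 0]]
pred = [[1, 0, 0], [0, 1, 0]]
print(jaccard_accuracy(gold, pred), hamming_loss(gold, pred))
```

Jaccard Accuracy rewards partial overlap between predicted and gold label sets, which is why it is preferred over exact-match accuracy for multi-label tasks.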
Table 1 compares our results with those of the baselines for the task of multi-label emotion classification when the models are trained on the AIT dataset. An ablation study performed on EC_CNN demonstrates the advantage of using richer word representations such as Glove [20] over Word2Vec [17]. It further establishes the importance of using affect representations from text, such as Empath embeddings, for sentiment analysis tasks. Among our non-BERT variants, EC_LSTM performs the best. We obtain our best results with EC_ROBERTA when trained end-to-end for 5 epochs with the AdamW optimizer [16] and a learning rate of 2e-5. As observed, we comfortably outperform the scores of NTUA-SLP [4], which achieved the top rank for this subtask as part of SemEval-2018 Task 1. With a better F1-Macro score, we narrowly surpass the results of the current top-ranked system on the leaderboard for this subtask; the competition is currently running in its post-evaluation phase.

As can be observed from Table 2, EC_ROBERTA, when trained on the SenWave dataset, further achieves state-of-the-art results on four out of six metrics, including Jaccard Accuracy, when compared with [23].
Owing to the fact that there were only three reported cases of COVID-19 in India before March 2020, and that there exist very few tweets in Twitter_IN for the months of January and February, we consider all tweets posted on or after March 1st 2020 for our analysis. We first use our EC_ROBERTA classifier, trained on the SenWave dataset, to predict the emotions for all tweets under consideration from the Twitter_IN dataset. A few tweets with their predicted emotions are listed in Table 3. Figure 2 shows the percentage distribution of tweets, on a weekly basis, containing one or more from a total of nine emotions. We observe that annoyed and thankful show contrasting trends. The first and second peaks of annoyance correspond to the declaration of the nationwide lockdown and the Tablighi Jamaat gatherings, respectively. Although people initially showed their gratitude towards the doctors and health workers for their efforts in dealing with such unprecedented challenges, the sense of thankfulness gradually declined as people started to feel more helpless and anxious with the growing number of cases each day. Here, we highlight that the number of tweets posted varies greatly across weeks. In Figure 3, we therefore create bins that span unequal time intervals but each contain exactly 5000 tweets (except the last one, which contains around 4,300 tweets), and observe the trends for the six most relevant emotions. We observe that the trends of optimism and pessimism complement each other well as people gradually start to adjust to the new normal of life.

Figure 2: Weekly percentage distribution of various public emotions on Twitter.

Figure 3: Count distribution of tweets containing a specific emotion per 5000 tweets posted.

Table 3: Few Examples of Single and Multi-label Predictions

Single Label
Tweet: This is the time to fight Covid19 at present but some intelligent Generals are focusing on war and terrorism
Predicted Labels: Annoyed
Tweet: let us stay together and fight against Corona Virus CoronavirusPandemic Lockdown2
Predicted Labels: Optimistic
Tweet: it is a serious matter and thousands of students live are in danger due to increasing cases of COVID19 Cancel exams and promote the students on moderation policy
Predicted Labels: Anxious

Multiple Labels
Tweet: Media is so obsessed with a particular community that they even misspell coronavirus
Predicted Labels: Annoyed, Joking, Surprise
Tweet: Very nice Measures taken for The Development Of Economy keeping in view the safest side
Predicted Labels: Thankful, Optimistic
Tweet: The first Covid 19 positive from Meghalaya Dr John Sailo Rintathiang passed away early this morning Sailo 69 who was also the owner of Bethany hospital was tested positive on April 13 2020
Predicted Labels: Sad, Official Report

In order to extract relevant factors or triggers affecting public sentiments and emotions over time, we make use of ABAE, an unsupervised autoencoder-based neural topic model proposed by Ruidan He et al. in [9]. We separately run the ABAE algorithm on the tweets of each of the six emotions captured in Table 4 and report their top contributing aspects. The extracted aspect terms for each emotion are filtered and assigned meaningful sub-categories by means of a many-to-many mapping. In Figure 4, we plot the trends of the subcategories for annoyed (Fig. 4a) and optimism (Fig. 4b).

Table 4: Top contributing factors or aspects affecting different emotions on Twitter regarding COVID-19 pandemic.

Emotion       Top 10 aspects
Anxious       family, symptom, test, treatment, rate, risk, mask, spread, zone, assault
Annoyed       govt, politics, death, news, religion, jamaat, work, China, assault, border
Sad           lockdown, death, distancing, life, family, economy, village, doctor, worker, school
Pessimistic   price, business, infection, demise, peak, curve, communalism, war, situation, transport
Optimistic    initiative, opportunity, measure, arogyasetuapp, IndiaFightsCorona, stayhome, contribution, change, support, action
Thankful      doctor, service, staff, nurse, app, fund, assistance, leadership, IndiaFightsCorona

Figure 4: Change in Subcategories of Emotional Triggers over time. (a) Annoyed (b) Optimism

In Fig. 4a, we observe a clear peak from March 28th 2020 to April 8th 2020, coinciding with the religious gatherings of a certain community, which triggered widespread criticism and hatred from the public. In Fig. 4b, the plots show a high level of community gratitude in general, with occasional peaks which may be attributed to events aimed at raising solidarity among the public. For the technological measures, we see a gradual increase and a peak near the launch date of the Arogya Setu App, developed by the Indian Government to help citizens during this pandemic.
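The two aggregations behind Figures 2 and 3 (weekly percentages, and counts within consecutive bins of 5000 tweets) can be sketched as follows. This is an illustrative reconstruction: the function names, the use of ISO calendar weeks, and the input format of (date, predicted labels) pairs sorted by time are our assumptions, not details given in the paper.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Dict, List, Sequence, Tuple


def weekly_percentages(tweets: Sequence[Tuple[date, Sequence[str]]]
                       ) -> Dict[Tuple[int, int], Dict[str, float]]:
    """Percentage of tweets per (ISO year, ISO week) that carry each emotion."""
    totals: Counter = Counter()
    per_week = defaultdict(Counter)
    for day, labels in tweets:
        week = tuple(day.isocalendar()[:2])   # assumption: ISO weeks
        totals[week] += 1
        per_week[week].update(set(labels))    # count each emotion once per tweet
    return {week: {emo: 100.0 * n / totals[week] for emo, n in counts.items()}
            for week, counts in per_week.items()}


def counts_per_bin(predictions: Sequence[Sequence[str]],
                   bin_size: int = 5000) -> List[Counter]:
    """Emotion counts in consecutive bins of `bin_size` time-ordered tweets.

    The last bin may be smaller, as with the ~4,300-tweet bin in the text.
    """
    bins = []
    for start in range(0, len(predictions), bin_size):
        chunk = predictions[start:start + bin_size]
        bins.append(Counter(emo for labels in chunk for emo in set(labels)))
    return bins


sample = [
    (date(2020, 3, 2), ["annoyed"]),
    (date(2020, 3, 3), ["annoyed", "sad"]),
    (date(2020, 3, 9), ["sad"]),
]
print(weekly_percentages(sample))
print(counts_per_bin([labels for _, labels in sample], bin_size=2))
```

Fixed-size bins make counts directly comparable across periods with very different tweet volumes, at the cost of each bin covering a different span of time.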
To summarize, we first build a multi-label emotion classifier that achieves state-of-the-art results on two benchmark datasets, AIT (non-COVID) and SenWave (COVID-specific). We then use our trained model to predict the emotions from India-specific tweets posted between March 1st 2020 and July 4th 2020. Using these predictions, we study the evolution of emotions among Indians towards the ongoing COVID-19 pandemic. We further study the development of salient factors contributing towards the changes in attitudes and draw interesting inferences. In future, we would like to extend our work with the Valence-Arousal-Dominance (VAD) model and take a multi-task learning approach to build our classifiers.
REFERENCES
[1] Wasim Ahmed, Peter A. Bath, Laura Sbaffi, and Gianluca Demartini. 2018. Moral Panic through the Lens of Twitter: An Analysis of Infectious Disease Outbreaks. Proceedings of the 9th International Conference on Social Media and Society (2018).
[2] Abdurahman Ammar, Kods Trabelsi, Mirjam Brach, Hassen Chtourou, Omar Boukhris, L. Masmoudi, Bassem Bouaziz, E. Bentlage, Daniel How, M. Ahmed, P. Mueller, N. Mueller, Ali Aloui, Omar Hammouda, Laisa Liane Paineiras-Domingos, A. Braakman-jansen, C. Wrede, Stefano Bastoni, Carlos Soares Pernambuco, Leonardo Mataruna, M. Taheri, Kamran Irandoust, Aimen Khacharem, Nicola Luigi Bragazzi, Karim Chamari, Jane Matthews Glenn, Nicholas T. Bott, Fatma Gargouri, Latifa Chaari, Hadj Batatia, G. Mohamed Ali, Osama Abdelkarim, Marion Jarraya, Kais El Abed, Najah Souissi, Lisette Van Gemert-pijnen, Bryan L. Riemann, L. Riemann, W. Moalla, Jonathan Gómez-Raja, Marina Epstein, Robbert Sanderman, and Sc. 2020. Effects of home confinement on mental health and lifestyle behaviours during the COVID-19 outbreak: Insight from the "ECLB-COVID19" multi countries survey. medRxiv (2020).
[3] Juan M. Banda, Ramya Tekumalla, Guanyu Wang, Jingyuan Yu, Tuo Liu, Yuning Ding, and Gerardo Chowell. 2020. A large-scale COVID-19 Twitter chatter dataset for open scientific research - an international collaboration. ArXiv (2020). https://doi.org/10.5281/zenodo.3930903
[4] Christos Baziotis, Athanasiou Nikolaos, Alexandra Chronopoulou, Athanasia Kolovou, Georgios Paraskevopoulos, Nikolaos Ellinas, Shrikanth S. Narayanan, and Alexandros Potamianos. 2018. NTUA-SLP at SemEval-2018 Task 1: Predicting Affective Content in Tweets with Deep Attentive RNNs and Transfer Learning. ArXiv abs/1804.06658 (2018).
[5] Dorottya Demszky, Dana Movshovitz-Attias, Jeongwoo Ko, Alan S. Cowen, Gaurav Nemade, and Sujith Ravi. 2020. GoEmotions: A Dataset of Fine-Grained Emotions. ArXiv abs/2005.00547 (2020).
[6] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL-HLT.
[7] Ethan Fast, Binbin Chen, and Michael S. Bernstein. 2016. Empath: Understanding Topic Signals in Large-Scale Text. Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (2016).
[8] Junling Gao, Pinpin Zheng, Yingnan Jia, Hao Chen, Yimeng Mao, Suhong Chen, Yi Wang, Hua Fu, and Jun ming Dai. 2020. Mental health problems and social media exposure during COVID-19 outbreak. PLoS ONE 15 (2020).
[9] Ruidan He, Wee Sun Lee, Hwee Tou Ng, and Daniel Dahlmeier. 2017. An Unsupervised Neural Attention Model for Aspect Extraction. In ACL.
[10] Pavan Hiremath, C S Suhas Kowshik, Maitri Manjunath, and Manjunath Shettar. 2020. COVID 19: Impact of lock-down on mental health and tips to overcome. Asian Journal of Psychiatry 51 (2020), 102088 – 102088.
[11] Md. Yasin Kabir and Sanjay Madria. 2020. CoronaVis: A Real-time COVID-19 Tweets Analyzer. ArXiv abs/2004.13932 (2020).
[12] Bennett Kleinberg, Isabelle van der Vegt, and Maximilian Mozes. 2020. Measuring Emotions in the COVID-19 Real World Worry Dataset. ArXiv abs/2004.04225 (2020).
[13] Angela Leis, Francesco Ronzano, Miguel Angel Mayer, Laura Inés Furlong, and Ferran Sanz. 2019. Detecting Signs of Depression in Tweets in Spanish: Behavioral and Linguistic Analysis. Journal of Medical Internet Research 21 (2019).
[14] Xiaoya Li, Mingxin Zhou, Jiawei Wu, Arianna Yuan, Fei Wu, and Jiwei Li. 2020. Analyzing COVID-19 on Online Social Media: Trends, Sentiments and Emotions. ArXiv abs/2005.14464 (2020).
[15] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. ArXiv abs/1907.11692 (2019).
[16] Ilya Loshchilov and Frank Hutter. 2019. Decoupled Weight Decay Regularization. In ICLR.
[17] Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. ArXiv abs/1310.4546 (2013).
[18] Saif M. Mohammad and Felipe Bravo-Marquez. 2017. WASSA-2017 Shared Task on Emotion Intensity. ArXiv abs/1708.03700 (2017).
[19] Saif M. Mohammad, Felipe Bravo-Marquez, Mohammad Salameh, and Svetlana Kiritchenko. 2018. SemEval-2018 Task 1: Affect in Tweets. In SemEval@NAACL-HLT.
[20] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. Glove: Global Vectors for Word Representation. In EMNLP.
[21] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. ArXiv abs/1706.03762 (2017).
[22] Jia Xue, Junxiang Chen, Chen Chen, Chengda Zheng, and Tingshao Zhu. 2020. Machine learning on Big Data from Twitter to understand public reactions to COVID-19. ArXiv abs/2005.08817 (2020).
[23] Qiang Yang, Hind Alamro, Somayah Albaradei, Adil Salhi, Xiao ting Lv, Changsheng Ma, Manal Alshehri, Inji Jaber, Faroug Tifratene, Wei Wang, Takashi Gojobori, Carlos Marmolejo Duarte, Xin Gao, and Xiangliang Zhang. 2020. SenWave: Monitoring the Global Sentiments under the COVID-19 Pandemic.