Extracting and categorising the reactions to COVID-19 by the South African public -- A social media study
Vukosi Marivate
Dept. Computer Science, University of Pretoria; CSIR, South Africa [email protected]
Avashlin Moodley
CSIR, South Africa [email protected]
Athandiwe Saba
Mail and Guardian [email protected]
Abstract
Social media can be used to extract discussion topics during a disaster. With the COVID-19 pandemic's impact on South Africa, we need to understand how the laws and regulations promulgated by the government in response to the pandemic contrast with the discussion topics that social media users have been engaging in. In this work, we expand on traditional media analysis by using social media discussions driven by or directed to South African government officials. We find topics that are similar as well as different in some cases. The findings can inform further study into social media during disaster settings in South Africa and beyond.
It is a few months since the first reported COVID-19 case in South Africa (Kiewit, 2020). This has been a time in the country's history characterised by unprecedented government engagement. The government is trying to balance its response to the health crisis with its response to the social and economic impacts (de Kadt, 2020). Research continues on the spread of the pandemic in South Africa (Mbuvha and Marwala, 2020; Arashi et al., 2020; Marivate and Combrink, 2020). We take a different direction in this work: we look at engagement on social media as a window into some of the concerns that the public in South Africa might have.

In this work, we extract patterns of social media interactions between the South African government (through its representatives) and citizens (the public). Extracting the discussion patterns is important, even with the limitation that social media does not represent all parts of our society. South Africa has 60% internet penetration, and 36% of the population is on some form of social media. If we concentrate on Twitter, South Africa has just over 2.3 million Twitter users (Kemp, 2020), out of an estimated total population of 58.8 million (Statistics South Africa, 2019).

We use an empirical approach to extract topics from social media data gathered in South Africa. We analyse this data to answer the following questions: What are the most discussed issues? How have the discussions changed over time in relation to the policy changes announced by the government? What topics are similar and dissimilar between official and public sources?
The COVID-19 response by the South African government has been multi-modal (briefings, announcements, question & answer sessions) as the government tries to bring citizens into its confidence (Dayile, 2020). The Minister of Health, Dr. Zweli Mkhize, holds daily briefings on a number of topics related to the healthcare system's response to COVID-19. The president, Cyril Ramaphosa, has held briefings before major changes (such as declaring a national state of disaster and announcing the lockdown and lockdown levels).

The public has many opinions on and reactions to the COVID-19 pandemic and the interventions by government and other sectors. There is a need to find ways to better understand these reactions. Social media analysis and traditional media analysis can be useful in extracting topics that are important in tracking a pandemic (Ghosh et al., 2017). Analysis of media information (traditional and social) can extract what may be of concern to citizens and can also help us highlight gaps in the engagement (Dayile, 2020; Ghosh et al., 2017). Social media, in South Africa and beyond, has been used to engage in debate, to organise groups that mobilise volunteers to assist the vulnerable (Hamann et al., 2020), and for COVID-19 data collection efforts (Marivate and Combrink, 2020), mirroring past use in times of disaster (Chae et al., 2014).

We collected data from Twitter that covers the period from 1 March 2020 to 17 May 2020. We collected government communication from a few official accounts (see Table 1). Traditional media analysis has highlighted that journalists have focused on President Ramaphosa and Dr Zweli Mkhize for comment on COVID-19 in South Africa (Dayile, 2020). We utilised TWINT to collect the data.
The data collection was under ethical clearance EBIT/88/2020.

TWINT: https://github.com/twintproject/twint

National Account | Description
@CyrilRamaphosa | President of RSA
@DrZweliMkhize | Minister of Health, RSA
@HealthZA | National Department of Health [NDOH]
@nicd_sa | National Institute of Communicable Diseases [NICD] RSA

Table 1: Twitter keywords (in the form of accounts) tracked for our study

[Plot: "Social media engagements per week"; x-axis: Week, y-axis: Number of posts]

Figure 1: Weekly number of posts [Timeline]

We collected 919,019 social media posts from 189,266 unique users. The data was segmented into two subsets, TweetsOfficial and TweetsPublic. TweetsOfficial contains the tweets published by the accounts in Table 1; TweetsPublic contains the remaining tweets, which reference the accounts in Table 1. TweetsOfficial contains 2,537 tweets from 4 unique users.
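The segmentation step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout (a dict with `username` and `text` keys) and the names `OFFICIAL_ACCOUNTS` and `segment_tweets` are assumptions introduced for the example; only the four handles come from Table 1.

```python
# Handles from Table 1, lower-cased so that matching ignores case.
OFFICIAL_ACCOUNTS = {"cyrilramaphosa", "drzwelimkhize", "healthza", "nicd_sa"}


def segment_tweets(tweets):
    """Split tweets into (official, public) lists.

    Each tweet is assumed to be a dict with "username" and "text" keys;
    this layout is hypothetical and chosen only for illustration.
    """
    official, public = [], []
    for tweet in tweets:
        if tweet["username"].lower() in OFFICIAL_ACCOUNTS:
            official.append(tweet)  # published by a tracked account
        else:
            public.append(tweet)    # every remaining collected tweet
    return official, public


sample = [
    {"username": "HealthZA", "text": "Wash your hands regularly."},
    {"username": "some_user", "text": "@CyrilRamaphosa when does lockdown end?"},
]
tweets_official, tweets_public = segment_tweets(sample)
```

Because every collected tweet was either written by or references a tracked account, splitting on the author alone is enough to separate the two subsets in this sketch.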
TweetsOfficial contains a coherent message from government officials to the public, whereas TweetsPublic contains the public response, which can be incoherent and noisy due to the number of users contributing to the data. Preliminary experiments with an unsupervised topic model built on TweetsPublic produced topics that could not be deciphered. Furthermore, the model segmented the data poorly into representative clusters. In work by Ghosh et al. (2017) and Gallagher et al. (2017), a supervised topic modelling approach was used in which the topic model is seeded with a manually curated set of words to guide the formulation of topics. Our approach aims to test whether topics created by a model trained on TweetsOfficial and supervised by curated seed words can be used to supervise the training of a model on TweetsPublic, in order to identify associated topics in the larger, noisier corpus. If the TweetsOfficial model can effectively supervise the TweetsPublic model to find topics that are associated with TweetsOfficial topics, it will provide a reflection of the public response to the different topics of information propagated by government. Each model was trained with 20 topics for 100 iterations. The models were then used to assign a topic label to each tweet so that the topics produced could be analysed.
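The two-stage supervision idea can be illustrated with a deliberately simplified stand-in. The sketch below is not a topic model and is not the authors' implementation: it replaces the seeded topic model with plain keyword co-occurrence, purely to show how seed words curated for the official corpus can yield word sets that then guide labelling of the public corpus. All names (`expand_seeds`, `label_tweet`, the seed sets, the stop-word list and the sample tweets) are hypothetical.

```python
from collections import Counter

PUNCTUATION = ".,!?#@:;()\"'"
STOPWORDS = {"and", "or", "the", "is", "we", "all", "no"}  # tiny illustrative list


def tokenize(text):
    """Lower-case a tweet, strip surrounding punctuation and drop stop words."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w and w not in STOPWORDS]


def expand_seeds(texts, seeds, top_n=2):
    """Grow each seed set with the words that most often co-occur with it."""
    expanded = {}
    for topic, words in seeds.items():
        counts = Counter()
        for text in texts:
            tokens = tokenize(text)
            if set(tokens) & words:  # the tweet mentions a seed word
                counts.update(t for t in tokens if t not in words)
        expanded[topic] = set(words) | {w for w, _ in counts.most_common(top_n)}
    return expanded


def label_tweet(text, topic_words):
    """Return the topic sharing the most words with the tweet, or None."""
    tokens = set(tokenize(text))
    scores = {topic: len(tokens & words) for topic, words in topic_words.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Stage 1: curated seeds guide the "official" word sets.
curated_seeds = {"Lockdown": {"lockdown"}, "Cigarettes": {"cigarettes"}}
official_texts = [
    "Lockdown rules: stay home",
    "Lockdown means we all stay home",
    "Cigarettes and tobacco sales remain banned",
    "No tobacco or cigarettes sold during level 5",
]
official_topics = expand_seeds(official_texts, curated_seeds)

# Stage 2: the official word sets guide labelling of the noisier public tweets.
public_texts = ["Why is tobacco still banned?", "Staying home is hard", "Good morning"]
public_labels = [label_tweet(t, official_topics) for t in public_texts]
```

In this toy run the public tweet about tobacco is matched to the cigarettes topic even though it never uses the original seed word, which is the effect the supervised approach relies on at a much larger scale.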
The topics produced by the TweetsOfficial and TweetsPublic models were very similar due to the effects of the supervised approach taken. The topics in Table 2 provide a general overview of the COVID-19 themes that were central to the engagements during the observed period. The keywords were similar for both models, so they are shown together in one column for brevity. The models produced topics directly related to COVID-19 and healthcare; the officials from Table 1; the imposed lockdown; and the policies applied by government for the lockdown (alcohol and cigarette bans, school closures, job loss mitigation and hygiene guidelines).
Topic | Keywords | Label
1 | travel, movement, year, old, male, female, travelled, relocation, italy, switzerland | Travel
2 | cases, recoveries, death, confirmed, total, number, deaths, today, recovered, related | Case Reports
3 | testing, tests, screening, conducted, eligible, listed, feeling, flattenthecurve, hotline, sick | Testing/Screening
 | alcohol, ly, bit, spreadthefacts, guide, wash, vvnfkf, step, dry, ih | Alcohol/Hygiene
 | smoking, smoke, cigarettes, ukuthi, abantu, lesifo, bakithi, ukhozi_fm, uma, ngoba | Cigarettes
 | lockdown, home, distancing, social, essential, hygiene, stay, groceries, grants, graphic | Lockdown/Distancing
 | avoid, touching, droplets, coughing, nose, markets, tv, pscp, spoiled, stray | COVID-19 Info
 | children, school, earn, tax, reprieve, salary, option, considering, uif, provisions | Schools/Jobs
 | minister, president, mkhize, zweli, ramaphosa, command, dr, cyril, health, cyrilramaphosa | Officials/Command Council
12 | hospital, healthcare, masks, nurses, doctor, ppe, doctors, nurse, collecting, workers | PPE/Healthcare
 | fake, news, smart, kind, stayathome, help, spread, ps, message, deal | Fake News

Table 2: Highlighted topics

We use topic timelines to study the temporal properties of topics. Since the topics from both models are aligned in context, the objective is to analyse spikes in volume to understand the interaction between government and the public. Key dates are highlighted to associate spikes in volume with the events that transpired.

Figure 2: Official vs Public Topics on Lockdown Related Issues

Figure 2 illustrates topic timelines for lockdown-related topics. The lockdown topic is consistently discussed, with spikes in volume occurring near key dates. Official communication spikes in volume at the start of lockdown but receives limited coverage elsewhere. The cigarettes topic receives spikes in public attention near key dates, most notably in the period between level 4 being announced and coming into effect, because the government reversed its decision to allow cigarette sales under level 4 lockdown during that period. Official tweets do not appear to focus on this topic. The other topics experience spikes aligned with the lockdown topic.

Figure 3: Official vs Public Topics on Other Issues

Figure 3 illustrates topic timelines for topics related to public officials, personal protective equipment (PPE) and fake news. Public engagement towards key officials occurred consistently throughout, with spikes in volume near key dates. The spike in volume before the announcement of level 4 lockdown can be attributed to the public's eagerness for the lockdown to be eased. The PPE/healthcare topic receives low volumes of attention but is consistently present in the public discourse. The fake news topic peaks alongside the public officials topic.
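A topic timeline of the kind discussed above amounts to counting labelled tweets per topic per week. The following is a minimal sketch under stated assumptions: weeks are taken to start on Monday, the input is assumed to be `(date, topic)` pairs produced by an earlier labelling step, and the function names and sample dates are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def week_start(day):
    """Return the Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())


def topic_timeline(labelled_tweets):
    """Count tweets per (week, topic) from an iterable of (date, topic) pairs."""
    counts = Counter()
    for day, topic in labelled_tweets:
        counts[(week_start(day), topic)] += 1
    return counts


# Illustrative input only; these are not tweets from the study.
labelled = [
    (date(2020, 3, 24), "Lockdown"),
    (date(2020, 3, 27), "Lockdown"),
    (date(2020, 4, 30), "Cigarettes"),
]
timeline = topic_timeline(labelled)
```

Running the same function separately on the official and public subsets gives two series per topic that can be plotted against each other, with key dates overlaid to inspect spikes.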
As a result of the supervised topic modelling approach, the TweetsOfficial and TweetsPublic models had topics described by very similar keywords. We use topic similarity heatmaps to analyse the syntactic similarity between topics from different models. The method consists of creating a sub-corpus for each topic from each model and training a vectorizer on the union of each pair of sub-corpora, one from each model. A heatmap then illustrates the pairwise cosine similarity between the topics of the two models.

Figure 4: Topic similarity heatmap

Figure 4 illustrates the topic similarity heatmap for the official and public topics. The strong similarity seen on the diagonal of the heatmap indicates that related topics from each model have a strong syntactic similarity to each other. The 'heat' seen in most of the heatmap is a result of the data covering a limited domain of public engagements with government officials related to the pandemic. The very high similarity seen for topics 2 (Case Reports), 3 (Testing/Screening) and 12 (PPE/Healthcare) is likely due to these topics being informative in nature, which limits the scope for public opinion.
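The pairwise similarity computation can be sketched as below. The text does not say which vectorizer was used, so this example assumes plain term-count vectors over a vocabulary built from the union of the two sub-corpora being compared; the function names and sample sub-corpora are hypothetical.

```python
import math
from collections import Counter


def count_vector(docs, vocabulary):
    """Term counts of a sub-corpus, ordered by the shared vocabulary."""
    counts = Counter(word for doc in docs for word in doc.lower().split())
    return [counts[word] for word in vocabulary]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def topic_similarity(official_docs, public_docs):
    """Fit the vocabulary on the union of the pair, then compare the two vectors."""
    vocabulary = sorted(
        {word for doc in official_docs + public_docs for word in doc.lower().split()}
    )
    return cosine(
        count_vector(official_docs, vocabulary),
        count_vector(public_docs, vocabulary),
    )


def similarity_matrix(official_topics, public_topics):
    """Rows are official topics, columns are public topics."""
    return [[topic_similarity(o, p) for p in public_topics] for o in official_topics]


# Illustrative sub-corpora only: one list of tweet texts per topic.
official_topics = [["new cases confirmed today"], ["stay home during lockdown"]]
public_topics = [["confirmed cases today"], ["lockdown stay home please"]]
matrix = similarity_matrix(official_topics, public_topics)
```

The resulting matrix is what a heatmap would display; in this toy input the diagonal entries are high and the off-diagonal entries are zero because the two topics share no words.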
The topics produced highlight central themes in the discourse between government officials and the public. The topics address themes such as the lockdown, the restrictions put in place, and general information on COVID-19. The topic timelines indicated that the public attempted to engage more near key dates in the observed period. Peaks in volume provided insights into when certain topics received attention. The high syntactic similarity seen between related topics from each model indicates that the supervised topic modelling approach taken can be used to find targeted insights in a larger, noisier data set by seeding the training process with a smaller and more coherent data set that relates to the domain being studied. The application of this approach can yield informative insights about the interactions between government officials and the public. Furthermore, the topics can be used as features in downstream supervised tasks aimed at stronger citizen engagement on social media.

References
Statistics South Africa. 2019. Mid-year population estimates. StatsSA.

Mohammad Arashi, Andriette Bekker, Mahdi Salehi, Sollie Millard, Barend Erasmus, Tanita Cronje, and Mohammad Golpaygani. 2020. Spatial analysis and prediction of COVID-19 spread in South Africa after lockdown. arXiv preprint arXiv:2005.09596.

Junghoon Chae, Dennis Thom, Yun Jang, SungYe Kim, Thomas Ertl, and David S Ebert. 2014. Public behavior response analysis in disaster events utilizing visual analytics of microblog data. Computers & Graphics, 38:51–60.

Azola Dayile. 2020. Analysis of COVID-19 media coverage: Brief 2. Technical report, Media Monitoring Africa.

Ryan J Gallagher, Kyle Reing, David Kale, and Greg Ver Steeg. 2017. Anchored correlation explanation: Topic modeling with minimal domain knowledge. Transactions of the Association for Computational Linguistics, 5:529–542.

Saurav Ghosh, Prithwish Chakraborty, Elaine O Nsoesie, Emily Cohn, Sumiko R Mekaru, John S Brownstein, and Naren Ramakrishnan. 2017. Temporal topic modeling to assess associations between news trends and infectious disease outbreaks. Scientific Reports, 7(1):1–12.

Ralph Hamann, Annika Surmeier, Jody Delichte, and Scott Drimie. 2020. Local networks can help people in distress: South Africa's COVID-19 response needs them. The Conversation.

Julia de Kadt. 2020. COVID-19 highlights South Africa's need for local level social data. The Conversation.

Simon Kemp. 2020. Digital 2020: South Africa. DataReportal.

Lester Kiewit. 2020. Health department confirms South Africa's first COVID-19 case. Mail and Guardian.

Vukosi Marivate and Herkulaas MvE Combrink. 2020. Use of available data to inform the COVID-19 outbreak in South Africa: A case study. Data Science Journal, 19(1).

Rendani R Mbuvha and Tshilidzi Marwala. 2020. On data-driven management of the COVID-19 outbreak in South Africa. medRxiv.