A Stance Data Set on Polarized Conversations on Twitter about the Efficacy of Hydroxychloroquine as a Treatment for COVID-19
Ece Çiğdem Mutlu, Toktam A. Oghaz, Jasser Jasser, Ege Tütüncüler, Amirarsalan Rajabi, Aida Tayebi, Ozlem Ozmen, Ivan Garibay
Ece Çiğdem Mutlu∗, Toktam Oghaz, Jasser Jasser, Ege Tütüncüler, Amirarsalan Rajabi, Aida Tayebi, Ozlem Ozmen, Ivan Garibay†
Department of Industrial Engineering, Department of Computer Science
{ece.mutlu, jasser.jasser, ege.tutunculer, igaribay, ozlem}@ucf.edu, {toktam}@cs.ucf.edu, {amirarsalan, aida.tayebi}@knights.ucf.edu

ABSTRACT
At the time of this study, the SARS-CoV-2 virus that caused the COVID-19 pandemic has spread significantly across the world. Considering the uncertainty about policies, health risks, financial difficulties, etc., online media, especially the Twitter platform, is experiencing a high volume of activity related to this pandemic. Among the hot topics, the polarized debates about unconfirmed medicines for the treatment and prevention of the disease have attracted significant attention from online media users. In this work, we present a stance data set, COVID-CQ, of user-generated content on Twitter in the context of COVID-19. We investigated more than 14 thousand tweets and manually annotated the opinions of the tweet initiators regarding the use of “chloroquine” and “hydroxychloroquine” for the treatment or prevention of COVID-19. To the best of our knowledge, COVID-CQ is the first data set of Twitter users’ stances in the context of the COVID-19 pandemic, and the largest Twitter data set on users’ stances towards a claim, in any domain. We have made this data set available to the research community via GitHub. We expect this data set to be useful for many research purposes, including stance detection, evolution and dynamics of opinions regarding this outbreak, and changes in opinions in response to exogenous shocks such as policy decisions and events.

By August 2020, about 20 million confirmed cases of infection with the SARS-CoV-2 virus had been reported worldwide. Among the affected countries, the United States of America has reported the highest number of infections, about 5 million infected individuals. The rapid spread of the virus and the uncertainty and risks associated with the COVID-19 pandemic have led institutions, policymakers, and individuals to seek drastic countermeasures against the spread of this disease, while imposing the least costs [1].
For instance, many governments have taken severe measures to contain the virus, such as locking down the infected regions and shuttering their economies for weeks, closing their borders, and investing an unprecedented amount of funding in medical facilities and equipment. In the absence of a vaccine, healthcare professionals resorted to alternative uses of existing drugs. Examples of such drugs are chloroquine (CQ) and hydroxychloroquine (HCQ), which are immunosuppressive and anti-parasite drugs that have been in use to treat malaria and lupus. Hydroxychloroquine is a less toxic metabolite of chloroquine and has been identified to have fewer side effects [2]. Both of these drugs are included in the treatment regimen of COVID-19 patients by physicians in Italy, France, and China, after some studies suggested that their use could be effective in inhibiting COVID-19 infections [3, 4].

∗ All authors contributed equally to this study.
† Corresponding author: Ivan Garibay.
https://github.com/eceveco/COVID-CQ

While the majority of these studies came under heavy criticism due to the lack of scientific rigor, such as unusually small sample sizes and the absence of randomized trials, certain doctors, physicians and politicians quickly embraced the idea that hydroxychloroquine could be a “miracle cure”. On the other hand, some studies have cautioned against the use of chloroquine and hydroxychloroquine for the treatment of COVID-19 due to their hazardous side effects and their inefficacy [5, 6]. In this information-scape of conflicting results and uncertainty, the unproven claim that chloroquine and hydroxychloroquine are cures for COVID-19 quickly spread in online social networks and mainstream media outlets. In the White House press briefing dated March 19, 2020, hydroxychloroquine received endorsement from the US Administration, which accelerated the spread of the rumors concerning the effectiveness of hydroxychloroquine against the disease.
Thus, extensive polarization is observed on the effectiveness of these two drugs, especially hydroxychloroquine, and this polarization is fueled by societal, political, and medical discussions about these drugs.

Following these events and the political attention that hydroxychloroquine received, the claim that chloroquine and hydroxychloroquine may be effective tools against COVID-19 created tensions in social media, with many people posting about evidence for or against the effectiveness of both drugs, supporting or rejecting their use in a political tone, or just making neutral remarks about the ongoing chloroquine/hydroxychloroquine and COVID-19 situation. In this study, we aim to identify and analyze Twitter users’ stances about the rumor that chloroquine and hydroxychloroquine are cures for the novel coronavirus. For this work, we define a rumor as an unofficial story or piece of news that might be true or invented, and quickly spreads from person to person, per the Cambridge Dictionary. Our contributions are as follows:

• We present a manually annotated Twitter stance data set for the unproven claim of “chloroquine/hydroxychloroquine are cure for the novel coronavirus”, which we call COVID-CQ. It is accessible to the public from our GitHub repository.
• To the best of our knowledge, COVID-CQ is the first stance data set regarding COVID-19. This data can be used by the research community to analyze the dynamics of the opinions of social media users in a worldwide pandemic.
• To the best of our knowledge, COVID-CQ is currently the largest human-labeled stance data set on Twitter conversations, with more than 14 thousand stance labels towards a claim.
• The conducted annotation procedure includes joint annotation of tweets and shared URLs for disambiguation. This technique allowed us to produce accurate annotations for a challenging data set that requires deep understanding of the content to identify the true stance of a tweet.
Detecting opposite opinions in a polarized conversation towards a target subject is a sub-domain of sentiment analysis, usually referred to as stance classification [7]. In stance classification, the subject under investigation is generally a person, organization, policy, or opinion [8]. The importance of this field of research lies in its applications, including the automatic extraction of attitudes and opinions towards events, and fake news or rumor identification [9]. Many studies in the fact-checking literature apply stance classification to potentially related documents to gain knowledge on varying sources of information, and predict the factuality of a claim according to the strength of aggregated stances [10]. Generally, three stance classes have been considered in the literature for the task of stance classification: i) positive (in favor), ii) negative (against), and iii) neutral (none or neither); however, some studies have approached this classification by considering an extra class for the unrelated or irrelevant documents [11]. In contrast to this common categorization of stances, the idea of relative classification of stances using a ranking mechanism is proposed in [12]. The problem of stance classification evolved further with the introduction of a complementary task, which focuses on the stances towards given claims rather than entities, proposed in [13]. The first data set on stance classification towards claims was also proposed in [13], which introduced a stance data set for 2,394 claims on Wikipedia articles. This data set is the most similar corpus to our data set. However, our focus is on the Twitter platform.

Despite the existence of many stance data sets for polarized social media subjects, the available data sets are mostly focused on multiple target subjects, which results in having a small number of samples for each of the stance classes.
For instance, the data set introduced in [14] only contains labels for 4,455 tweets regarding the US presidential candidates for the 2016 election: “Donald Trump”, “Hillary Clinton”, “Ted Cruz”, and “Bernie Sanders”. Also, the SemEval-2016 Task 6 benchmark [15] only contains 4,870 tweets with stance labels for 6 targets: “Atheism”, “Climate Change is a Real Concern”, “Feminist Movement”, “Hillary Clinton”, “Legalization of Abortion”, and “Donald Trump”. In addition to having a small size, the SemEval data set was collected via hashtag queries and only specific tweets in which

Eisenberg and Finlayson define annotation as the "process of explicitly encoding information about a text that would otherwise remain implicit" [18]. For this study, annotation is the record of the Twitter audiences’ stances about the running debate on chloroquine and hydroxychloroquine as treatments for coronavirus. Our purpose is to create a purely human-annotated data set rather than an ML-based data set with or without supervision, as we believe that human annotation renders further studies more robust and reliable. The annotation procedure was conducted by a team of 6 graduate and 3 undergraduate students in order to reach a consensus on the annotation guideline. In our annotation procedure, each student was asked to annotate the individual tweets as "Against", "Favor" or "Neutral/None" for the unproven claim of "chloroquine/hydroxychloroquine is cure for the novel coronavirus", relying on the well-known rumor listing website fullfact.org as in Kwan et al.’s study [19].

We have done our analysis on online users’ conversations regarding topics related to the COVID-19 pandemic on the Twitter platform. The data was collected using the Twitter API and Hydrator as suggested in [20]. The detailed list of keywords used for our data collection is given in the cited study.
We considered only tweets related to the specific rumor and filtered for tweets that include "hydroxychloroquine", "chloroquine", or "HCQ" as keywords, published between 04/01/2020 and 04/30/2020. This data set includes 98,671 tweets generated by 75,685 unique Twitter users. Since the stances of retweets may be easily attained with assumptions, we focused only on the 14,374 unique tweets (tweets/mentions/replies) generated by 11,552 unique users (the most active user has 91 user-generated contents) to decrease the workload.

Each of the investigated tweets is annotated by at least two different annotators. In the first round, the data was partitioned into 15 different clusters according to the time of the information creation, and the tweet clusters were randomly assigned to every possible combination of our annotator pairs. Thus, any possible biases due to annotator-pair match and time are prevented (Figure 1). Among all the performed annotations, the inter-annotator agreement on this set was 87.37%, which demonstrates that our annotators were effectively trained before conducting the procedure. In the second round, the remaining 12.63% of the data was assigned to a third annotator who had not been asked to annotate that specific tweet in the first round. After labeling all tweets, the cosine similarity between tf-idf tokenized and vectorized tweet texts and their labels were compared, based on the assumption that similar tweets are more likely to have a similar stance. Then, all our annotators were asked to discuss their annotations and the reasoning behind their decisions, to reach a consensus on the inconsistent tweets. Thus, noisy labeling has been prevented.

Some of the challenges we faced while annotating this data are: i) Some of the tweets include irony/sarcasm; therefore, the true label might be hard to catch when the annotator is focused on the sentiment of the tweet only.
ii) Since this debate is based on a health-related claim, the stances of a large number of the investigated tweets were ambiguous when the annotator only focused on the tweet text rather than a joint annotation of the tweet text and the shared URLs, if any. The source of this ambiguity is that many Twitter users shared URLs to academic studies and news websites. iii) Each tweet can only contain up to 280 characters, which often makes it difficult to fully understand the true meaning of the message, because Twitter users are constrained to writing a short text. To overcome these challenges, we investigated the content of the URLs to understand the correct stances, only if the tweet text was not self-explanatory.

Figure 1: A visualization of the distribution of our data set for stance annotation (data partitions assigned to annotators).

https://fullfact.org
https://github.com/DocNow/hydrator
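The two-round procedure described above (pairwise annotation, then a third annotator for the disagreements) can be illustrated with a small sketch. The tweet IDs and labels below are invented, and the majority-vote rule is an assumption made for illustration; the text only states that disagreements went to a third annotator and that inconsistent tweets were discussed until consensus.

```python
from collections import Counter


def percent_agreement(first, second):
    """Share of commonly annotated tweets that received the same label."""
    shared = first.keys() & second.keys()
    if not shared:
        raise ValueError("the two annotators share no tweets")
    same = sum(first[t] == second[t] for t in shared)
    return same / len(shared)


def resolve(first, second, third):
    """Keep agreed labels; otherwise use a majority of three votes (assumed rule).

    Tweets without a majority are returned separately for group discussion.
    """
    final, unresolved = {}, []
    for tweet_id in sorted(first.keys() & second.keys()):
        a, b = first[tweet_id], second[tweet_id]
        if a == b:
            final[tweet_id] = a
            continue
        votes = Counter([a, b])
        if tweet_id in third:
            votes[third[tweet_id]] += 1
        label, count = votes.most_common(1)[0]
        if count >= 2:
            final[tweet_id] = label
        else:
            unresolved.append(tweet_id)
    return final, unresolved


# Toy annotations keyed by hypothetical tweet IDs.
ann_a = {1: "Favor", 2: "Against", 3: "Neutral/None", 4: "Favor"}
ann_b = {1: "Favor", 2: "Against", 3: "Favor", 4: "Against"}
ann_c = {3: "Favor", 4: "Neutral/None"}  # third annotator sees only disagreements

agreement = percent_agreement(ann_a, ann_b)
final_labels, to_discuss = resolve(ann_a, ann_b, ann_c)
print(f"agreement: {agreement:.2%}")
print(final_labels, to_discuss)
```

Raw percent agreement is used here because it is the quantity the text reports; a chance-corrected statistic such as Cohen's kappa would be a different measure.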
We asked our annotators to label each of the tweets in our corpus using one of the three labels: "Against", "Favor", or "Neutral/None" regarding the unproven claim of "Hydroxychloroquine and chloroquine are cures for the novel coronavirus.". To gain a deep understanding of the stances in tweets, and due to the high subjectivity related to this classification method, the ambiguous tweets were discussed in detail among the annotation team. In the following sub-sections, the three classification labels and their corresponding tweet examples are explained in more detail.

I. AGAINST:
This stance label was used for the annotation of tweets that imply an opposition to the claim, either directly or indirectly.

The stances of some of the tweets in this category are easily comprehensible, as the tweet initiator expresses a direct opposition to the claim. For instance: "I’m a physician. I would not take .

Some of the other tweets that were identified as belonging to this category do not include personal opinions. Instead, these tweets might contain URLs to academic studies or news articles in which hydroxychloroquine is demonstrated to be not effective against COVID-19, or simply contain the heading of the news article. It is assumed that the tweet initiator aims to share this information since he/she opposes the claim. An example of such a tweet is: "No evidence of clinical efficacy of hydroxychloroquine in patients hospitalized for COVID-19 infection with oxygen requirement: results of a study using routinely collected data to emulate a target trial | medRxiv".

Another way of expressing a counter attitude towards the aforementioned unproven claim, which was frequently observed in our data set, is rationalizing against the claim. For instance, we observed that many Twitter users initiate content that opposes the claim directly, through expressing a reasoning such as the side effects of the drugs, or indirectly, via sharing the headings of news articles that imply the same concept: "French Hospital Stops Hydroxychloroquine Treatment for COVID-19 Patients Over Major Cardiac Risk", "Mr. Trump himself has a small personal financial interest in Sanofi, the French drugmaker that makes Plaquenil, the brand-name version of hydroxychloroquine.".

Some tweets, on the other hand, include sarcasm/irony, which challenges the understanding of the true stance behind the textual content, even for human annotators. For instance: "This is why you don’t take your medical advice from a reality TV host. .

II. FAVOR:
This stance label was used for the annotation of tweets that imply a supporting opinion towards the claim, either directly or indirectly.

The stance label for some of the tweets in this category can be easily inferred by the annotator, as the tweet initiator expresses direct support in straightforward language. For instance: "It is TIME to open up businesses and schools again! Corona numbers are inflated and you know it! And...there’s a cure!...hydroxychloroquine!"

Figure 2: The most frequently used keywords in each stance category: (a) favor, (b) neutral, (c) against.

In contrast to the above example, we observed that some of the tweet initiators convey their support of the claim in an indirect manner, by questioning the objections to what seems to them to be an obvious solution. For instance:
"Why did Fauci CHEER when hydroxychloroquine was used in 2013 for MERS, but is now skeptical for coronavirus?"

III. NEUTRAL/NONE:
This stance label is the last category in the annotation of our data set, and it was used for the tweets that are neither in favor of, nor against, the aforementioned claim.

We observed that most of the tweets in this category fall into one of these groups: i) tweets that are in the form of a question about the truthfulness of the claim, ii) tweets that convey a question with the aim of gaining more knowledge on the topic, iii) tweets that are written as a statement in a purely neutral tone, iv) tweets that only contain a neutral heading of a news article or an academic publication, and finally, v) tweets that contain the query keywords but imply no clear relation between the claim under study and the tweet content.

An example of the tweets which convey a question, as in group ii discussed above, is: "What is
Finally, an example of the tweets that are focused on the events related to the chloroquine/hydroxychloroquine drugs, rather than having a direct relation to the claim under study, is: "India sends hydroxychloroquine to UAE for COVID-19 patients."

The annotated COVID-CQ data set includes 14,374 original tweets (tweets/mentions/replies), which were generated by 11,552 unique users on Twitter. We excluded the retweets from our data set annotation. Table 1 shows the size of our data set and the frequency of tweets for each class: the "Favor" class, with 6,841 tweets, is the largest class, followed by the "Against" class with 4,685 tweets, and finally, the smallest class is "Neutral" with 2,848 tweets.

Table 1: The frequency of the annotated tweets belonging to each stance class

Stance     Number of Tweets
Neutral    2848
Against    4685
Favor      6841
Total      14374
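
As a quick arithmetic check on Table 1, the class shares can be computed directly from the reported counts. The sketch below also derives the accuracy of a trivial classifier that always predicts the largest class (roughly 47.6% here); that baseline is our own illustrative addition, not a result reported by the authors.

```python
# Tweet counts per stance class, copied from Table 1.
counts = {"Neutral": 2848, "Against": 4685, "Favor": 6841}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# Accuracy of always predicting the most frequent class (illustrative baseline).
majority_label = max(counts, key=counts.get)
majority_baseline = shares[majority_label]

for label, share in shares.items():
    print(f"{label:<8} {counts[label]:>6} {share:6.1%}")
print(f"total    {total:>6}")
print(f"majority-class baseline ({majority_label}): {majority_baseline:.1%}")
```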

To briefly provide information on the underlying topics in our corpus, we show the most frequently used words for each stance category in Figure 2. These word clouds are obtained after preprocessing the textual content of the data set, including the elimination of stopwords and the common domain words, such as chloroquine, hydroxychloroquine, covid19, and coronavirus. A detailed explanation of text preprocessing and cleaning is provided in section 6. Despite the high similarity of the content in our corpus, and the intertwined topics for all three clusters (i.e., the topics related to the available drugs, hospitalized patients, and the treatment methods), it is clearly observable that some of the keywords appear in a specific stance class at a higher frequency. Since a considerable number of tweets in the Neutral/None class are related to the import of the hydroxychloroquine drug from India to the US, the word “India” appears as one of the most frequent words in this class. For the Favor stance class, we observed that plenty of tweets in this category are related to the effectiveness of hydroxychloroquine in combination with two other drugs, Zinc and Azithromycin. Thus, the word cloud for the Favor class contains these keywords. Additionally, positive terms such as “effective”, “help”, “save”, and “success” are also more observable in the textual content of this category. Finally, the Against stance class has been observed to include words with a negative sentiment, including “risk”, “stop”, “warn”, “kill”, and “death”.

Figure 3: The daily tweet counts in April 2020, classified into three categories: ’neutral’, ’against’, and ’favor’. The black line refers to the ratio of the number of ’favor’ labeled tweets to the number of tweets with the ’against’ stance, for a 3 day moving average.

Figure 3 represents the daily counts of tweets that are labeled as "Favor" (green), "Neutral" (gray), and "Against" (pink) over a one-month time period. As the month of April is the beginning of the chloroquine/hydroxychloroquine debate, the fluctuation in the ratio of "Favor" to "Against" tweets is found to be very drastic; the black line shows the 3-day moving average of the ratio of "Favor" tweets to "Against" tweets. Therefore, this data set not only offers a challenging corpus for the stance detection task, but also presents a drastic dynamic for researchers who focus on a better understanding of information diffusion, polarization, and opinion changes over time.

The prepared data set is available to the public via our GitHub repository, accessible at https://github.com/eceveco/COVID-CQ. We adhere to Twitter’s terms and conditions by not providing the tweet JSON, but sharing the stance labels with the tweet IDs, so that the tweets can be rehydrated from the Twitter API.

Throughout the COVID-19 pandemic, many news articles and research publications are being discussed among social media users. Although many drugs made their way to mass clinical trials and research across the world, the fierce debates surrounding the drug hydroxychloroquine are more frequently observable among political figures, medical scientists, and social media users, while hydroxychloroquine is a less toxic metabolite of chloroquine and has been identified to have fewer side effects [2].
As the COVID-CQ data set is focused on a month of Twitter activity regarding the COVID-19 pandemic, and particularly on the chloroquine/hydroxychloroquine conversations, this data contains textual content in relation to many major events that occurred from the beginning of the pandemic up to the end of April 2020. As discussed in [21], narrative summaries can be constructed from an ordered chain of individual events with causality relationships amongst events that appear within a specific topic. According to this definition, below we briefly narrate the major events that are discussed in the Twitter conversations in our data set. However, this narration does not reflect the authors’ personal opinions towards any of the reviewed events.

Amongst the narratives that are discussed in this data set, although it occurred before April, is the news related to the results of a study published on March 20 regarding hydroxychloroquine to treat COVID-19 patients, which found that treating patients with a combination of hydroxychloroquine and azithromycin results in a more efficient virus elimination [22]. This combination of the drugs has been referred to as a “game changer” and the “beginning of the end of the pandemic” by many Twitter users. However, plenty of studies later failed to replicate these results [23].

Examples of other events that attracted significant attention from Twitter users are the purchases of hydroxychloroquine sulfate tablets by the Department of Veterans Affairs and the Bureau of Prisons in March 2020. We observed in our data that the tweet initiators who propagated news on these events were referring to many news articles regarding this narrative, including the articles published by New York Post and The Daily Beast.
Further discussions regarding the purchase and storage of these drugs in our data set are related to the US government's announcement of stockpiling the drug hydroxychloroquine in late March, an event that, according to Bloomberg, was later followed by many hospitals in the United States. Although a substantial number of users on social media only shared the URLs to news articles and/or included the headings on these topics, many individuals argued that the mass storage of the unproven drugs by the US administration and hospitals might deprive patients in true need of access to the drugs, such as the Lupus patients who receive antimalarial drugs to ease their symptoms [24]. A Bloomberg article refers to this issue as the shortage of the drug for Lupus patients, as a consequence of its high demand for COVID-19.

Another important event in March 2020 that affected the opinions on online social media was the Emergency Use Authorization (EUA) for oral formulations of hydroxychloroquine sulfate and chloroquine phosphate by the Food and Drug Administration (FDA), granted on March 28. We observed that this event attracted significant attention on Twitter during April, and that it encouraged the appearance of a positive attitude towards the drugs.

On March 25, India, which is one of the largest manufacturers of the drug hydroxychloroquine, announced that the directorate-general of foreign trade had prohibited the export of this drug to any other countries amid the coronavirus outbreak. This decision was partially lifted on April 5 in response to a call from President Trump to Indian Prime Minister Narendra Modi, according to BBC. Many Twitter users propagated the news on the lift of export ban for this drug by using on April 2.

On April 7, another rumor emerged, this time regarding the immunity of patients with rheumatology illnesses against the novel coronavirus, as a result of taking the drug hydroxychloroquine.
Twitter users started to propagate this rumor after the rheumatologist Dr. Daniel Wallance mentioned in an interview with Dr. Oz on April 7 that out of 800 patients who are taking the drug, none have been reported to contract the virus. However, contradicting reports appeared from late April, in some of which patients with Lupus were considered to be at higher risk of infection and development of severe symptoms of coronavirus.

The support of hydroxychloroquine during the White House press briefings on the coronavirus pandemic influenced the opinions on Twitter negatively. Consequently, social media users started to relate political and/or financial benefits to the support of the drug, which was also discussed in many news articles, including what Washington Post calls “the real reason behind hydroxychloroquine obsession”, published on April 7.

Despite the positive news regarding the efficacy of the drugs until the middle of April 2020, the results published in a study of hundreds of patients at US Veterans Health Administration medical centers suggested that patients taking hydroxychloroquine are no less likely to get infected by the virus; instead, the death rates in these patients were observed to be higher [25]. These results appeared in many news articles, including a CNN article published on April 21.

nypost.com/2020/04/07/federal-agencies-purchase-large-supply-of-hydroxychloroquine/
thedailybeast.com/the-bureau-of-prisons-just-bought-a-ton-of-hydroxychloroquine-trumps-covid-19-miracle-drug?ref=scroll
bloomberg.com/news/articles/2020-03-20/hospitals-stockpile-malaria-drug-trump-says-could-treat-covid-19
usatoday.com/story/news/health/2020/04/18/hydroxychloroquine-coronavirus-creates-shortage-lupus-drug/5129896002/
fda.gov/media/136534/download
statnews.com/pharmalot/2020/03/25/india-trump-hydroxychloroquine-coronavirus-covid19/
bbc.com/news/world-asia-india-52196730
nypost.com/2020/04/02/hydroxychloroquine-most-effective-coronavirus-treatment-poll/
youtube.com/watch?v=kd7Jec3pZBk
medicalnewstoday.com/articles/lupus-and-covid-19
washingtonpost.com/opinions/2020/04/07/real-reason-trump-is-obsessed-with-hydroxychloroquine/
cnn.com/2020/04/21/health/hydroxychloroquine-veterans-study/index.html

The severe negative effects reported for the use of this drug include abnormal heart rhythms that might threaten patients’ lives. The FDA warning related to the use of antimalarial drugs to treat COVID-19 patients, and the publication of academic research which reported the inefficacy of these drugs against the novel coronavirus, caused the United States to be left with massive supplies of hydroxychloroquine, a concern that was reflected in many news websites at the end of April, including the article published in USA Today on April 27. Finally, in June 2020 the FDA withdrew the granted emergency use authorization (EUA) for the drug hydroxychloroquine. However, the EUA withdrawal event does not appear in our data set.

After the preparation of the data set according to the annotation guidelines, we conducted extensive analysis on the data to ensure the quality of the annotation, including the consistency of the stance labels in our corpus.
In this regard, we investigated the semantic similarity of the tweets by computing the pairwise cosine similarity on the vector representations of the tweets. The Universal Sentence Encoder from the TensorFlow library was used to obtain the semantic vector representations of the tweets. The sentence-level embeddings capture high-level semantic relationships between sentences, which enables the comparison of tweet contents against each other to assure labeling consistency. After the calculation of pairwise cosine similarity for the tweets in our corpus, we reevaluated the labels of the tweets whose cosine similarity was at or above a threshold, where this threshold was identified by human judgement via comparing the tweet similarity results. The reevaluation procedure for the identified highly similar tweets includes manual investigation of these tweets by the annotators, such that the highly similar tweets that convey the same stance towards the claim are categorized into the same stance class.

To demonstrate the potential of evaluating many stance classification models using this data, and to evaluate the quality of our data set, we conducted extensive analysis using six different classification methods. The implemented models for this purpose include Multilayer Perceptron (MLP), Logistic Regression (LR), Support Vector Machine (SVM), Multinomial Naive Bayes (MNB), Stochastic Gradient Descent (SGD), Gradient Boosting (GB), and finally, Convolutional Neural Network (CNN). Furthermore, all the classification methods are compared for the computed vector representations of word unigrams and bigrams using the term frequency-inverse document frequency (tf-idf). The implemented Multilayer Perceptron contained 2 dense layers, the rectified linear unit (ReLU) as the activation function, and the cross-entropy loss as the loss function. For the Logistic Regression classifier, the limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm (lbfgs) was used as the solver.
The Support Vector Machine model was implemented with a linear kernel. For the Stochastic Gradient Descent model, the perceptron loss was used. In the implementation of the Gradient Boosting model, the deviance loss was used for model optimization. Finally, the Convolutional Neural Network (CNN) was implemented in two different ways, receiving its inputs as vectors computed by one-hot encoding and by GloVe word embedding, both with 5 convolutional layers with a kernel size of 3 and a stride of 2, and with the Exponential Linear Unit (ELU) activation function. The training stop criterion for all the classifiers was reaching a maximum of 1000 training iterations.
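The label-consistency check described earlier in this section (pairwise cosine similarity over sentence embeddings, followed by manual review of highly similar tweets) can be illustrated with a minimal sketch. It assumes the embeddings have already been computed, for instance with the Universal Sentence Encoder, and are available as plain lists of floats. The function names, the toy vectors, the stance label strings, and the threshold of 0.9 are all hypothetical and chosen only for illustration. Flagging only the pairs whose labels disagree is one possible way to prioritize the manual review, not necessarily the procedure the annotators followed.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for all-zero vectors
    return dot / (norm_u * norm_v)


def flag_inconsistent_pairs(embeddings, labels, threshold):
    """Return (i, j, similarity) for every pair of tweets whose similarity
    is at or above `threshold` but whose stance labels differ."""
    flagged = []
    for i, j in combinations(range(len(embeddings)), 2):
        sim = cosine_similarity(embeddings[i], embeddings[j])
        if sim >= threshold and labels[i] != labels[j]:
            flagged.append((i, j, sim))
    return flagged


# Toy stand-ins for sentence embeddings (real ones have hundreds of dimensions).
embeddings = [
    [1.0, 0.0, 0.2],
    [0.9, 0.1, 0.2],
    [0.0, 1.0, 0.0],
]
labels = ["favor", "against", "against"]  # hypothetical label names

# The threshold is an assumption for this example only.
flagged = flag_inconsistent_pairs(embeddings, labels, threshold=0.9)
for i, j, sim in flagged:
    print(f"tweets {i} and {j}: similarity {sim:.3f}, labels differ")
```

The all-pairs loop is quadratic in the number of tweets, which is tolerable for a sketch but would usually be replaced by a matrix product over normalized embeddings for a corpus of this size.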
time.com/5827085/fda-warning-hydroxychloroquine/ usatoday.com/story/news/politics/2020/04/27/coronavirus-states-stockpile-hydroxychloroquine-drug-trump-touted/3031660001/ fda.gov/media/136534/download tensorflow.org/hub/tutorials/semantic_similarity_with_tf_hub_universal_encoder
Multinomial Naive Bayes (MNB) 0.7182 0.7182
Gradient Boosting Classifier (GB) 0.6764 0.6768
To prepare the input to the classifiers, we first filtered out the tweets that Twitter identified as being in a language other than English. After excluding the non-English tweets, we removed any punctuation marks and non-ASCII characters, and replaced the integers with their textual representation. Further preprocessing of the data included mapping all the input text to lowercase, followed by word stemming, lemmatization, and the removal of stopwords. Additionally, the URLs, hashtag signs (#), and emoticons were removed to achieve a higher textual quality. It should be noted that none of these classification methods have been trained or tested on the content of the shared URLs as part of the input data. After this step, the term frequency-inverse document frequency (tf-idf) was used to generate the vector representation of the input tweets. Using tf-idf, we generated a vector space with a weighting scheme based on the frequency of unigrams and bigrams in a tweet relative to their total frequency in the entire data set. Thus, tf-idf captures the most distinctive words while ignoring semantic and syntactic attributes. After this step, the vectorized tweets were used as the input to all the classifiers. For the training and testing of all the models, we used 80% of the tweets for training and the remaining 20% to evaluate the models.
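As a rough illustration of the preparation steps above, the following sketch cleans a tweet, builds unigram and bigram features, weights them with tf-idf, and produces an 80/20 split. It is a simplified stand-in rather than the implementation used for the reported results: the stopword list is a tiny hypothetical one, the number-to-text, stemming, and lemmatization steps are omitted, and the tf-idf variant is the plain unsmoothed form (term frequency times log(N/df)), whereas library implementations often apply smoothing and normalization. All function names are assumptions made for this example.

```python
import math
import random
import re
from collections import Counter

STOPWORDS = {"the", "a", "is", "for", "of", "to", "and"}  # tiny illustrative list


def preprocess(text):
    """Clean one tweet and return its list of tokens."""
    text = re.sub(r"https?://\S+", " ", text)        # drop URLs
    text = text.encode("ascii", "ignore").decode()   # drop non-ASCII characters
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)         # drop punctuation, '#', emoticons
    return [tok for tok in text.split() if tok not in STOPWORDS]


def ngrams(tokens, max_n=2):
    """Unigrams, plus bigrams when max_n >= 2."""
    feats = list(tokens)
    if max_n >= 2:
        feats += [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return feats


def tfidf_vectors(docs):
    """docs: list of feature lists. Returns one {feature: weight} dict per doc."""
    n_docs = len(docs)
    doc_freq = Counter(feat for doc in docs for feat in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            feat: (count / total) * math.log(n_docs / doc_freq[feat])
            for feat, count in counts.items()
        })
    return vectors


def train_test_split(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of `items` and cut it into train and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Made-up example tweets, not taken from the data set.
tweets = [
    "Hydroxychloroquine is a cure for COVID-19! https://t.co/abc #HCQ",
    "No evidence that hydroxychloroquine works for COVID-19",
]
features = [ngrams(preprocess(t)) for t in tweets]
vectors = tfidf_vectors(features)
train, test = train_test_split(list(range(10)))
print(features[0])
print(len(train), len(test))
```

With this unsmoothed weighting, a feature that occurs in every tweet receives a weight of zero, which reflects the point made above that tf-idf emphasizes the most distinctive terms.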
The comparison of the results for the 6 classification models on our stance data set is provided in Table 2. Among the investigated classification methods, the Logistic Regression model achieved the best accuracy, 0.76, for both the unigram and the bigram tf-idf feature vectors. The next best performance was achieved by the SVM, very close to that of LR. The Gradient Boosting model achieved the lowest performance for both unigram and bigram tf-idf vectorized inputs. Surprisingly, using bigrams to generate the tf-idf feature vectors had virtually no effect on accuracy for any of the models. Nevertheless, even though general-purpose classification methods were used rather than state-of-the-art stance detection models, and even though the contents of the URLs were not used in the evaluation of these classifiers, all of the models were able to classify the tweets with acceptable performance. For further analysis, we implemented a Convolutional Neural Network (CNN) classifier for stance classification with one-hot encoding and GloVe vectorization of the words in tweets. This classifier achieved an accuracy of 0.73 for both vectorization methods. Additionally, stance classification using the MLP model was repeated for the one-hot encoded tweets. The classification accuracy achieved by this classifier was 0.75, a slight improvement over the MLP model using the tf-idf feature vectors.
In this work, we introduced a large data set of Twitter stances towards the unproven claim “chloroquine and hydroxychloroquine are cure for the new coronavirus”. Our data set, COVID-CQ, contains stance labels for more than 14 thousand original tweets, after discarding the retweets. COVID-CQ differs from existing corpora in that the true underlying stances in Twitter conversations were identified via joint annotation of the tweets’ text and the shared URLs when the tweets were not self-explanatory. Accordingly, our data set challenges prediction models for stance detection to incorporate further information besides the text of the tweets. We have made the annotated corpus available to the public through our GitHub repository, in which the tweet IDs and the stance labels are provided to the research community; this information can be used for tweet rehydration via the Twitter API. To the best of our knowledge, COVID-CQ is the first data set of stances regarding the COVID-19 pandemic, besides being the largest human-annotated social media data set on stances towards a claim.
Acknowledgments
The authors gratefully thank Mina Sonbol and Nicholas Wiesenthal, who assisted in the annotation and analysis of the data set.
References
[1] Amirarsalan Rajabi, Alexander V Mantzaris, Ece C Mutlu, and Ivan Garibay. Investigating dynamics of covid-19 spread and containment with agent-based modeling. medRxiv, 2020.
[2] Thomas J Stokkermans and Georgios Trichonas. Chloroquine and hydroxychloroquine toxicity. 2019.
[3] Jia Liu, Ruiyuan Cao, Mingyue Xu, Xi Wang, Huanyu Zhang, Hengrui Hu, Yufeng Li, Zhihong Hu, Wu Zhong, and Manli Wang. Hydroxychloroquine, a less toxic derivative of chloroquine, is effective in inhibiting sars-cov-2 infection in vitro. Cell discovery, 6(1):1–4, 2020.
[4] Nicola Principi and Susanna Esposito. Chloroquine or hydroxychloroquine for prophylaxis of covid-19. The Lancet Infectious Diseases, 2020.
[5] Muskaan Sachdeva, Asfandyar Mufti, Khalad Maliyar, Yuliya Lytvyn, and Jensen Yeung. Hydroxychloroquine effects on psoriasis: a systematic review and a cautionary note for covid-19 treatment. Journal of the American Academy of Dermatology, 2020.
[6] Matthieu Mahevas, Viet-Thi Tran, Mathilde Roumier, Amelie Chabrol, Romain Paule, Constance Guillaud, Sebastien Gallien, Raphael Lepeule, Tali-Anne Szwebel, Xavier Lescure, et al. No evidence of clinical efficacy of hydroxychloroquine in patients hospitalized for covid-19 infection with oxygen requirement: results of a study using routinely collected data to emulate a target trial. MedRxiv, 2020.
[7] Yaakov HaCohen-Kerner, Ziv Ido, and Ronen Ya’akobov. Stance classification of tweets using skip char ngrams. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 266–278. Springer, 2017.
[8] Parinaz Sobhani. Stance detection and analysis in social media. PhD thesis, Université d’Ottawa/University of Ottawa, 2017.
[9] Ahmet Aker, Leon Derczynski, and Kalina Bontcheva. Simple open stance classification for rumour analysis. arXiv preprint arXiv:1708.05286, 2017.
[10] Ramy Baly, Mitra Mohtarami, James Glass, Lluís Màrquez, Alessandro Moschitti, and Preslav Nakov. Integrating stance detection and fact checking in a unified corpus. arXiv preprint arXiv:1804.08012, 2018.
[11] Isabelle Augenstein, Tim Rocktäschel, Andreas Vlachos, and Kalina Bontcheva. Stance detection with bidirectional conditional encoding. arXiv preprint arXiv:1606.05464, 2016.
[12] Qiang Zhang, Emine Yilmaz, and Shangsong Liang. Ranking-based method for news stance detection. In Companion Proceedings of the The Web Conference 2018, pages 41–42, 2018.
[13] Roy Bar-Haim, Indrajit Bhattacharya, Francesco Dinuzzo, Amrita Saha, and Noam Slonim. Stance classification of context-dependent claims. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 251–261, 2017.
[14] William Ferreira and Andreas Vlachos. Emergent: a novel data-set for stance classification. In Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics: Human language technologies, pages 1163–1168, 2016.
[15] Saif Mohammad, Svetlana Kiritchenko, Parinaz Sobhani, Xiaodan Zhu, and Colin Cherry. Semeval-2016 task 6: Detecting stance in tweets. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), pages 31–41, 2016.
[16] Ramon Villa-Cox, Sumeet Kumar, Matthew Babcock, and Kathleen M Carley. Stance in replies and quotes (srq): A new dataset for learning stance in twitter conversations. arXiv preprint arXiv:2006.00691, 2020.
[17] Dimitar Dimitrov, Erdal Baran, Pavlos Fafalios, Ran Yu, Xiaofei Zhu, Matthäus Zloch, and Stefan Dietze. Tweetscov19–a knowledge base of semantically annotated tweets about the covid-19 pandemic. arXiv preprint arXiv:2006.14492, 2020.
[18] Joshua Eisenberg and Mark Finlayson. Annotation guideline no. 1: Cover sheet for narrative boundaries annotation guide. Journal of Cultural Analytics, page 11199, 2019.
[19] Sejeong Kwon, Meeyoung Cha, and Kyomin Jung. Rumor detection over varying time windows. PloS one, 12(1):e0168344, 2017.
[20] Emily Chen, Kristina Lerman, and Emilio Ferrara. Tracking social media discourse about the covid-19 pandemic: Development of a public coronavirus twitter data set. JMIR Public Health and Surveillance, 6(2):e19273, 2020.
[21] Toktam A. Oghaz, Ece Çiğdem Mutlu, Jasser Jasser, Niloofar Yousefi, and Ivan Garibay. Probabilistic model of narratives over topical trends in social media: A discrete time model. In Proceedings of the 31st ACM Conference on Hypertext and Social Media, HT ’20, page 281–290, New York, NY, USA, 2020. Association for Computing Machinery.
[22] Philippe Gautret, Jean-Christophe Lagier, Philippe Parola, Line Meddeb, Morgane Mailhe, Barbara Doudier, Johan Courjon, Valérie Giordanengo, Vera Esteves Vieira, Hervé Tissot Dupont, et al. Hydroxychloroquine and azithromycin as a treatment of covid-19: results of an open-label non-randomized clinical trial. International journal of antimicrobial agents, page 105949, 2020.
[23] Joshua Geleris, Yifei Sun, Jonathan Platt, Jason Zucker, Matthew Baldwin, George Hripcsak, Angelena Labella, Daniel K Manson, Christine Kubin, R Graham Barr, et al. Observational study of hydroxychloroquine in hospitalized patients with covid-19. New England Journal of Medicine, 2020.
[24] C Ponticelli and G Moroni. Hydroxychloroquine in systemic lupus erythematosus (sle). Expert opinion on drug safety, 16(3):411–419, 2017.
[25] Joseph Magagnoli, Siddharth Narendran, Felipe Pereira, Tammy H Cummings, James W Hardin, S Scott Sutton, and Jayakrishna Ambati. Outcomes of hydroxychloroquine usage in united states veterans hospitalized with covid-19.