Eating Garlic Prevents COVID-19 Infection: Detecting Misinformation on the Arabic Content of Twitter
Sarah Alqurashi, Btool Hamoui, Abdulaziz Alashaikh, Ahmad Alhindi, Eisa Alanazi
Sarah Alqurashi∗
Center of Innovation and Development in Artif. Intell. (CIADA), Umm Al-Qura University, Makkah, Saudi Arabia

Btool Hamoui
Center of Innovation and Development in Artif. Intell. (CIADA), Umm Al-Qura University, Makkah, Saudi Arabia

Abdulaziz Alashaikh
Computer and Networks Engineering Department, University of Jeddah, Jeddah, Saudi Arabia

Ahmad Alhindi
Center of Innovation and Development in Artif. Intell. (CIADA), Umm Al-Qura University, Makkah, Saudi Arabia

Eisa Alanazi
Center of Innovation and Development in Artif. Intell. (CIADA), Umm Al-Qura University, Makkah, Saudi Arabia

January 15, 2021

Abstract
The rapid growth of social media content during the current pandemic provides useful tools for disseminating information, but it has also become a root for misinformation. There is therefore an urgent need for fact-checking and for effective techniques to detect misinformation on social media. In this work, we study misinformation in the Arabic content of Twitter. We construct a large Arabic dataset related to COVID-19 misinformation and gold-annotate its tweets into two categories: misinformation or not. We then apply eight different traditional and deep machine learning models with different features, including word embeddings and word frequency. The word embedding models (FastText and word2vec) exploit more than two million Arabic tweets related to COVID-19. Experiments show that optimizing the area under the curve (AUC) improves the models' performance, and that Extreme Gradient Boosting (XGBoost) achieves the highest accuracy in detecting COVID-19 misinformation online.

∗ Corresponding author: [email protected]
The new coronavirus pandemic was accompanied by a large and rapid spread of rumors, false information, and fake news. Misinformation has existed over the years and usually flourishes around important issues such as health outbreaks, climate change, and vaccinations. Human crises are fertile ground for misinformation, as happened during the Zika virus [1], Ebola [2], and other outbreaks. Moreover, misinformation is intensified during sudden and intense crises such as the COVID-19 pandemic. In the modern era, social media has helped magnify the spread of misinformation among individuals: in recent times there has been a global increase in the spread of information in general, and of COVID-19-related misinformation in particular, through various social media. This unprecedented amount of information poses serious public health challenges, especially concerning infectious diseases, and prompted the World Health Organization (WHO) to warn against the infodemic. The infodemic is a massive amount of correct and incorrect information that makes it difficult for individuals to access reliable information and credible guidance when needed [3]. This phenomenon, in turn, leads to the fast and easy spread of fake and unreliable information, especially on social media, which facilitates its diffusion.

Several conspiracy theories about the origins of the COVID-19 virus have spread on Arabic social media, all sharing the idea that the virus is a biological weapon. This misinformation started from social media accounts with no reliable proof to back their claims. Moreover, misleading information about the virus's symptoms and about how to cure the new virus and reduce its transmission circulates on social media. For example, a widespread piece of misinformation claimed, with a complete lack of evidence, that home remedies such as taking vitamin C and eating garlic can treat and prevent COVID-19 infection. Although some home remedies are harmless, others can be very dangerous. While such misinformation serves its promoters' interests, it also harms societies, especially since a high percentage of individuals depend on social media platforms for information and news. Research has shown that the more individuals are exposed to false information and fake news, the more likely they are to accept and believe it [4]. Misinformation confuses people and harms the health of individuals. It may also incite violence, discrimination, or hostility against specific groups in society, and it may obstruct efforts to control the current health crisis.

Twitter is one of the most used social networking sites in the Arab world and has become a tool for spreading misinformation regarding COVID-19. A recent study shows that false information spreads six times faster than correct information on Twitter [5], which makes it challenging to find accurate information there and increases mental distress and anxiety during the pandemic. One concerning factor is that the spread rate on Twitter is not bounded by physical distance. Conspiracy theories and other false and misleading information may first appear on Twitter, then reach a larger audience once amplified by social media influencers and by reports on unreliable media sites, which reduces the effectiveness of officials' attempts to slow their spread. As a result, individuals around the world are affected mentally and physically by misinformation. The World Health Organization has teamed up with prominent social media platforms such as Facebook, Twitter, and YouTube to fight the infodemic by verifying misleading information and providing evidence-based information to the public [6]. Despite the efforts made by different entities around the globe, including the WHO, governments, and social media sites, misinformation continues to spread widely.
The problem lies in the difficulty of detecting and correcting misinformation in Arabic content before it spreads more widely. The Arabic language also poses a challenge because it has many dialects and a rich vocabulary, so a piece of misleading information may exist in more than one dialect, making it harder to detect. As a result, there is an urgent need to develop systems capable of automatically identifying misinformation in Arabic content. In this work, we investigate detecting Arabic misinformation on Twitter using natural language processing and machine learning. Our contributions to this area are summarized as follows:

• We extract a sample of tweets from a large Arabic dataset related to the COVID-19 pandemic and employ human annotators to label it. With high-quality, human-powered data annotation, we can estimate the credibility of the considered tweets automatically.

• We build two Arabic word embedding models using FastText and word2vec, based on more than two million Arabic tweets related to COVID-19, for a comparative analysis between the classifiers.

• We examine the prediction performance of five traditional classifiers: Random Forests (RF), Extreme Gradient Boosting (XGB), Naive Bayes (NB), Stochastic Gradient Descent (SGD), and Support Vector Machines (SVM) with different features, in addition to three deep learning classifiers: CNN, RNN, and CRNN.

• We improve the performance of all the models by optimizing the area under the curve (AUC), using grid search for the traditional classifiers and an AUC loss function for the deep learning models.
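Since AUC optimization is central to the last contribution, it is worth recalling what the metric computes. The sketch below is a generic rank-based formulation, not the authors' implementation: it evaluates AUC as the probability that a randomly chosen positive example outranks a randomly chosen negative one.

```python
def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney U statistic: the fraction of
    (positive, negative) pairs where the positive example scores
    higher; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A classifier that ranks every misinformation tweet above every
# other tweet gets a perfect AUC of 1.0.
print(roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.2]))  # → 1.0
```

For the traditional classifiers this quantity can be maximized by scoring a hyperparameter grid search with AUC; for the deep models an AUC-based loss serves the same purpose.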
To reduce misinformation on social media, it is essential to understand what the term misinformation means. Some scholars describe misinformation as false and inaccurate information that is transmitted unintentionally [7]. Ordinary users usually spread this type of misinformation because of their confidence in the information source, whether they are personally acquainted with it or it is an influential user on their social network. They share the misinformation to inform people in their surroundings about a specific situation or story because they believe it is true.

In contrast, disinformation is false and inaccurate information that is transmitted intentionally [7]. It is usually carried by a group of people, writers, or even publishers with the common goal of deceiving the public. Disinformation includes conspiracy theories, fake news, and spam. The outcome of mis- and disinformation is the same, whether the content is published intentionally or not.

On social media, where users can post anything, it is difficult for researchers to determine whether a piece of information was intentionally created. Therefore, misinformation has been adopted as an umbrella term for all false and inaccurate information, regardless of the goal or intention [8]. The umbrella term covers fake news, a type of misinformation that mimics traditional news; rumors, unverified information that can turn out to be correct; and spam, unwanted information that exhausts its recipients [8, 9]. These types of misinformation share a negative impact that extends to every aspect of life and may have social and economic consequences. Furthermore, misinformation significantly affects emergency response during disasters: it misleads and confuses public opinion and threatens public security and community stability, especially in the absence of immediate intervention to combat it [9].
Due to the ease of use of social media, the spread of misinformation has expanded widely. Its impact goes beyond personal life to affect society and even the economy. One example of misinformation with a negative effect is inaccurate information related to vaccinations. Anti-vaccination groups claim that vaccines cause autism, which caused fear of vaccination among many parents, making them refuse or at least hesitate to vaccinate their children and leading to an unprecedented increase in preventable diseases [10]. The fear of vaccination continued during the global COVID-19 pandemic, as some conspiracy theories spread through social media platforms claiming that the COVID-19 vaccine contains a chip that controls humans.

The amount of data on social media makes it difficult to distinguish between misleading and accurate information. Therefore, identifying misinformation on social media has been a popular research topic in recent years. Many studies on the English language have examined the presence of misinformation on social media, such as detecting rumors [11], fake news [12], spam [13], and health misinformation [1, 2]. However, most Arabic-language research has focused on assessing the credibility of news disseminated on Twitter. Often, the tweets were annotated based on an annotator's judgment, and machine learning models were built on user features, content features, or a combination of both [14, 15]. Some studies added new features such as sentiment analysis [16, 17], the polarity of user replies [18], the similarity between username and display name [19], and TF-IDF [20]. The work in [21] used content and user features to detect Arabic rumors on Twitter using semi-supervised expectation-maximization (EM); the proposed model achieved an F1 score of 80%. However, little work so far has focused on detecting and tracking health misinformation in the Arabic language.
Recently, a study tackling the detection of Arabic cancer-treatment rumors on Twitter was presented in [22]. The authors applied ten machine learning models using TF-IDF features with different n-grams extracted from a dataset of 208 annotated tweets. An oversampling technique was applied to the dataset, and the random forest model with oversampling and 5-gram TF-IDF features achieved an F1 score of 0.86.

There is a great body of work related to the COVID-19 infodemic on social media. The evolution of misinformation was studied on the Weibo social media platform [23] using misinformation identified by fact-checking platforms. Another study [24] examined the identification of misinformation videos on YouTube using NLP and machine learning. Furthermore, the work in [25] analyzed the evolution of opinion regarding COVID-19 in a Singapore Telegram group chat.

The vast majority of COVID-19 infodemic studies on social media focus on Twitter, largely because Twitter is one of the most popular platforms and provides access to a large amount of content in many languages. Along this line, many studies of misinformation on Twitter analyzed the content of tweets to understand Twitter conversations during COVID-19 [26, 27, 28]. To study the development of conversation around misinformation on Twitter, Singh et al. [26] collected five common pieces of misinformation related to COVID-19, concerning the virus's origin, vaccine development, comparisons with the flu, the claim that heat kills the disease, and home remedies. Each tweet was assigned to the corresponding misinformation based on the words and phrases it contained. The authors noticed an increase in conversation around the misinformation since January 2020. In [29], a study of the dissemination of COVID-19 misleading and reliable information on Twitter using communicative content analysis showed that misleading information is less likely to be retweeted than accurate information.

Several studies have relied on fact-checking websites as ground truth. In [30], the authors collected COVID-19-related tweets that were mentioned in fact-checking articles to study the sources of misinformation and how it spreads, using retweet speed as a proxy for propagation speed; their work suggests that misinformation propagates faster than accurate information. Another study of how misinformation content spread over five months on Twitter was presented in [31]. On a different note, the work in [27] presents a measure of tweet credibility based on user specialty and occupation.

Considerable work has also focused on the quality of the links and information sources found in tweets in many languages (e.g., English [26], Italian [32]). Links were classified as reputable sources or not using fact-checking websites [32] and well-known domains [26]; low-quality links appeared in tweets less often than high-quality links.

Researchers have also studied the types of accounts that help spread false information about COVID-19. The role and behavior of bot accounts on Twitter during COVID-19 were analyzed in [33, 34], where it was shown that Twitter bots participate in spreading misinformation, either for political or marketing gain [33].

Machine learning techniques have also been adapted to detect misinformation.
The work in [35] discussed the challenges in designing and developing AI solutions for infodemic detection; the authors also presented a tool that estimates whether an article is misinformation based on a URL checker, a fake news classifier, and a website matcher. A misleading-information detection system was presented in [36]; it relies on fact-checking websites and international organization data and is built as an ensemble of 10 machine learning models with 7 feature extraction techniques. Another study [37] applied ensemble machine learning techniques to Twitter misinformation based on user-level and tweet-level features; SVM and random forest showed the best accuracy.

Most of this research has focused on the English language, and there are very few studies on Arabic. In [38], the authors applied SVM, FastText, and BERT to 218 Arabic tweets and 504 English tweets; the FastText model provided the best results for Arabic text. The work in [39] studied the Arabic conversation on Twitter by applying topic modeling. Machine learning models such as logistic regression, support vector machines, and naive Bayes were trained on 2,000 labeled tweets to build a rumor detection system; the highest accuracy, 84%, was achieved by the logistic regression classifier with count vector features. The authors also found that rumors are usually written in an academic style and promoted by fake health professionals. Another study of COVID-19 misinformation was presented in [40], which published a large, manually annotated dataset of Arabic tweets related to COVID-19. The tweets were labeled with 13 classes, including only 421 rumors. The authors employed machine learning and transformer models using Mazajak embeddings and TF-IDF n-grams over words and characters; the best model was SVC with TF-IDF character n-grams, at an F1 score of 0.79.

In all previous studies investigating misinformation in Arabic social media content, the datasets used were very limited. In this work, we construct one of the largest datasets of Arabic tweets for misinformation. We provide a comparative analysis between classifiers using TF-IDF and Arabic word embedding models built from more than two million Arabic tweets related to COVID-19. Furthermore, we optimize the area under the curve (AUC) to further improve the models' accuracy.
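A common way to use such word embeddings for tweet classification is to represent each tweet by the average of its word vectors; this aggregation is an illustrative assumption, not a statement of the exact pipeline used in any of the cited works.

```python
def tweet_vector(tokens, embeddings, dim):
    """Average the embedding vectors of a tweet's in-vocabulary
    tokens. `embeddings` maps token -> vector of length `dim`,
    e.g., loaded from a trained word2vec or FastText model."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim  # no known tokens: fall back to a zero vector
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

# Toy 2-dimensional vocabulary; unknown tokens are simply ignored.
emb = {"corona": [1.0, 0.0], "garlic": [0.0, 1.0]}
print(tweet_vector(["corona", "garlic", "unknown"], emb, 2))  # → [0.5, 0.5]
```

The resulting fixed-length vector can be fed to any of the traditional classifiers discussed above.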
The proposed system comprises several stages, shown in Figure 1. It begins by collecting tweets using the Twitter streaming API and ends with evaluating and comparing the models' performance. In the remainder of this section, we describe the steps in detail.
We collected a large number of Arabic tweets using the Twitter streaming application interface and the Tweepy Python library over four months, from January 1, 2020, to April 30, 2020. We extracted tweets based on a list of the most common Arabic keywords associated with COVID-19, filtering the Twitter stream for the Arabic language to obtain relevant tweets about the pandemic. Table 2 shows the list of relevant Arabic keywords used to collect
Paper    | Purpose                          | Size                     | Features                                  | Models
[38]     | Detect COVID-19 misinformation   | 218 Arabic + 504 English | —                                         | SVM, FastText, and BERT
[39]     | Detect COVID-19 misinformation   | 2000                     | Count vector, TF-IDF, and word embedding  | Traditional classifiers
[40]     | Analyzing COVID-19 Arabic tweets | 8000                     | TF-IDF n-grams for words and characters   | Traditional classifiers
Our work | Detect COVID-19 misinformation   | 8786                     | TF-IDF, FastText, and word2vec embeddings | Traditional and deep learning classifiers

Table 1: Summary of different attempts at detecting misinformation in the Arabic content of Twitter.

tweets about COVID-19. The dataset contains more than 4.5 million tweets (4,514,136 in total). We store each tweet's full object, including its timestamp and id, user profile information (including the number of followers), and geolocation, in a MongoDB NoSQL database. The dataset is available online on GitHub.

When dealing with Arabic data, it is important to recognize the rich cultural and linguistic diversity across the Arab region, which translates into challenges (e.g., dialects) that must be addressed during model development. It is also essential to consider the general characteristics of Twitter data. For example, tweets are limited to 280 characters; despite this, their content varies and can consist of text, symbols, URLs, pictures, and videos. Furthermore, Twitter users tend to use informal writing to reduce a text's length while keeping it comprehensible, and Twitter data contains large amounts of spelling errors and does not necessarily follow the language's formal structure. Twitter data is thus very noisy, and it is essential to apply some pre-processing to the raw text before feeding it to the classifiers.
We perform the following preprocessing steps on the tweets:

• We removed non-Arabic words.
• We removed special characters (e.g., punctuation marks and symbols).
• We performed text correction using the TextBlob Python library [41].
• We normalized the Arabic text by unifying common letter variants (e.g., the different forms of alef, and the common variant pairs of ta marbuta/ha and alef maqsura/ya).
• We removed character repetitions (e.g., elongated words are reduced to their normal spelling).
• We removed stop words (e.g., the Arabic equivalents of "from", "to", and "in").
• We performed word stemming to convert each word to its corresponding root using the farasapy library [42].

https://github.com/SarahAlqurashi/COVID-19-Arabic-Tweets-Dataset
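The steps above can be sketched as a small pipeline. The normalization map below is the standard Arabic letter unification and is an assumption where the paper's exact glyph table was lost in extraction; the TextBlob correction and Farasa stemming steps are omitted for brevity.

```python
import re

# Assumed standard normalization pairs (the paper's exact table did not
# survive PDF extraction): alef variants, alef maqsura, ta marbuta.
NORMALIZE = {"أ": "ا", "إ": "ا", "آ": "ا", "ى": "ي", "ة": "ه"}
STOP_WORDS = {"من", "الى", "في"}  # "from", "to", "in" (illustrative subset)

def preprocess(text):
    # Keep only Arabic letters and whitespace (drops non-Arabic words,
    # digits, and special characters).
    text = re.sub(r"[^\u0621-\u064A\s]", " ", text)
    # Unify common letter variants.
    for src, dst in NORMALIZE.items():
        text = text.replace(src, dst)
    # Collapse runs of three or more identical characters to one.
    text = re.sub(r"(.)\1{2,}", r"\1", text)
    # Tokenize and remove stop words.
    return [t for t in text.split() if t not in STOP_WORDS]

print(preprocess("كوروناااا في الصين!!!"))  # → ['كورونا', 'الصين']
```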
Figure 1: System architecture.
This work deals with detecting health-related misinformation by relying on trusted sources of information. A recent work [43] shows that the official account of the Ministry of Health in Saudi Arabia was among the most influential accounts in March 2020. Hence, we collected the false information reported on both the World Health Organization (WHO) website and the website of the Ministry of Health in Saudi Arabia. Table 3 shows a sample of tweets containing misinformation.
Dataset annotation: our misinformation dataset is sampled from tweets collected from early March 2020 to the end of April 2020. To narrow down the set of tweets without misinformation content, we followed a procedure similar to that of [1]. We first manually crafted a set of terms that best describe the different pieces of misinformation, then retrieved tweets related to those terms (e.g., "Vitamin C", "Sarin gas", "Mosquitoes", and "Biological warfare"). The retrieved tweets were combined into one dataset and labeled by two volunteer native Arabic speakers. Before labeling, the annotators reviewed the list of collected misinformation. Due to the substantial manual effort involved, each tweet in the dataset was labeled by exactly one annotator: tweets containing misinformation were labeled "1" and the others "0".

In total, our misinformation dataset consists of 8,786 Arabic tweets containing 36,198 unique words after pre-processing. Overall, the labeled dataset covers significant misleading and inaccurate content that circulated widely among Arabic tweeters during March and April. The number of tweets containing misinformation in April (709 tweets) was higher than in March (602 tweets). Table 4 shows general statistics about the dataset. Recall that we consider a tweet labeled "1" as (Misinformation) and a tweet labeled "0" as (Other).
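The candidate-retrieval step described above can be sketched as simple term matching; the term list and tweet fields here are illustrative, not the authors' actual query set.

```python
def retrieve_candidates(tweets, terms):
    """Return the tweets whose text mentions at least one of the
    hand-crafted misinformation-related terms (substring match)."""
    return [t for t in tweets if any(term in t["text"] for term in terms)]

# Illustrative terms: "Vitamin C" and "Sarin gas" in Arabic.
terms = ["فيتامين سي", "غاز السارين"]
tweets = [
    {"id": 1, "text": "فيتامين سي يقي من كورونا"},  # mentions Vitamin C
    {"id": 2, "text": "اغسلوا أيديكم جيدا"},        # unrelated advice
]
print([t["id"] for t in retrieve_candidates(tweets, terms)])  # → [1]
```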
From Table 4, we observe that the dataset is unbalanced: the majority class (Other) has 7,475 tweets against 1,311 for the minority class (Misinformation). The misinformation dataset is freely accessible on GitHub: https://github.com/SarahAlqurashi/COVID19-Misinformation-dataset-
Keyword (English translation) | Tracing date
Coronavirus                   | 2020-01-01
Corona                        | 2020-01-01
Wuhan                         | 2020-01-01
China                         | 2020-01-01
Outbreak                      | 2020-01-01
Mask                          | 2020-01-01
Masks                         | 2020-01-01
Sterilizers                   | 2020-01-01
Sterilization                 | 2020-01-01
Washing hands                 | 2020-01-01
Home isolation                | 2020-02-01
Pandemic                      | 2020-01-22
Epidemic                      | 2020-01-22
Home quarantine               | 2020-02-01
COVID 19                      | 2020-03-01
Curfew                        | 2020-03-15
Social distancing             | 2020-04-01
Ventilator                    | 2020-04-01
Shortness of breath           | 2020-04-01
Cough                         | 2020-04-01
Temperature                   | 2020-04-01
One and a half meters         | 2020-04-01
Quarantine activities         | 2020-04-01
Quarantine                    | 2020-04-01
Malaria medicine              | 2020-04-25
Remdesivir                    | 2020-04-25
Curfew lift                   | 2020-04-26
Partial curfew                | 2020-04-26
Active surveillance           | 2020-04-29
Active testing                | 2020-04-29

Table 2: The list of Arabic keywords (shown here by their English translations) that we used to collect the tweets.
This step transforms the pre-processed tweet texts into feature vectors. From the tokenized words of each tweet, we build its feature vector using either TF-IDF [44] or word embedding techniques [45, 46]. The concept behind each is briefly explained as follows:

• Sparse Vector Based on TF-IDF:
In this representation, the importance of a term (or n-gram) in a tweet is evaluated in relation to the whole dataset. The method gives high weights to terms that are specific to some tweets and decreases the weight of words that occur frequently across the whole dataset. It combines term frequency (TF) and inverse document frequency (IDF) and is computed using Equation 1:

TF-IDF(w_ij) = TF_ij × log(N / DF_i)    (1)

where TF_ij is the frequency of term i in tweet j, N is the total number of tweets, and DF_i is the number of tweets containing term i.
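Equation 1 can be implemented directly; the snippet below is a plain illustration of the formula (natural logarithm assumed, since the base is not stated), not the vectorizer used in the experiments.

```python
import math

def tf_idf(term, tweet_tokens, corpus):
    """TF-IDF(w_ij) = TF_ij * log(N / DF_i): TF_ij is the term's
    count in tweet j, N the number of tweets in the dataset, and
    DF_i the number of tweets that contain the term."""
    tf = tweet_tokens.count(term)
    df = sum(1 for doc in corpus if term in doc)
    return tf * math.log(len(corpus) / df) if df else 0.0

corpus = [["garlic", "cures", "corona"],
          ["garlic", "is", "food"],
          ["wash", "hands"]]
# "garlic" occurs once in the first tweet and in 2 of 3 tweets overall,
# so its weight is 1 * log(3/2); rarer terms like "corona" weigh more.
print(round(tf_idf("garlic", corpus[0], corpus), 4))  # → 0.4055
```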
Misinformation headline                      | Tweet example (English translation)
Eating garlic protects against coronavirus   | "Eating garlic helps prevent coronavirus infection and strengthens the body's immunity."
Gargling with salt protects against coronavirus | "Exposure to the sun and gargling with warm water and salt are simple ways to protect you from coronavirus."
Coronavirus is sarin gas                     | "Sarin gas is spread in the atmosphere and not coronavirus; its diffusion in the air lasts months."
Mosquitoes transmit infection                | "Trustable sources: pets and mosquitoes transfer the coronavirus."
5G networks spread the coronavirus           | (Arabic tweet text)
En In China Wuhan, Network 5G helped spread the virus in a catastrophic manner, causing a lack of oxygen in Humanbodies and falling onto the streets. China realized this, so they returned to using Network 3G in their phones
Table 3: Examples of Misinformation in tweetsMisinformation Other Total
TF_i is the number of occurrences of word i in tweet j, N is the total number of tweets, and DF_i is the number of tweets containing word i. We constructed TF-IDF vectors twice: once with unigrams and once with n-grams. In both settings, each tweet is represented by a sparse vector of dimension 5000; we used scikit-learn for the implementation.

• Word Embeddings Creation:
In natural language processing, word embeddings are techniques that map words or phrases to vectors of real numbers. Word embedding methods represent words as continuous vectors in a low-dimensional space; these vectors capture semantic relations between words, so that words with similar meanings have vectors close to each other.

Building a word embedding model on a large-scale training dataset is important for obtaining meaningful embeddings [47]. We built word vector models exploiting our whole COVID-19 dataset collected from January 2020 to April 2020. After removing retweets and duplicated tweets, we ended up with 2,821,940 tweets. We consider two prominent word embedding methods, word2vec and FASTTEXT, and adopt the pre-processing pipeline of [48] to train our models. We investigate the following two types of word embeddings in this work:

– word2vec [45]: This is probably the most widely used technique for learning word embeddings, using a shallow feed-forward neural network. To build the word2vec model, we take into account that the maximum length of a tweet is 280 characters; hence we use a small context window of size W = 3. The model was trained with the CBOW algorithm and dimension D = 200. For the remaining parameters, we set fixed values for the batch size, negative sampling, minimum word frequency, and number of iterations.

– FASTTEXT [49]: In FASTTEXT, the smallest unit is the character-level n-gram, and each word consists of a bag of character n-grams. This representation helps capture the meaning of shorter words and allows extraction of all prefixes and suffixes of a given word. For this reason, FASTTEXT has been shown to be more accurate and effective than word2vec [50]. To train the FASTTEXT model, we used a small context window and a dimension of D = 200, with fixed values for the minimum word frequency and number of iterations.

It is worth noting that very few studies have developed FASTTEXT models for the Arabic language. General-purpose models, such as the FASTTEXT model of [46], were trained on Arabic Wikipedia articles written in Modern Standard Arabic (MSA); employing such a model will not perform well on Twitter datasets. In the Arabic misinformation literature, FASTTEXT was used in an unsupervised manner to produce feature vectors in [39], and in a supervised manner to predict class labels in [38]. However, [39] provides little detail about the training set sizes and the parameters used to train its word embedding model. A recent study showed that unsupervised pre-training of FASTTEXT on domain-specific data can improve classification quality over the supervised variant, particularly when labeled data is limited [51]. Hence, we opted to employ FASTTEXT models pre-trained in an unsupervised manner in our classification models.

We use the Gensim [52] implementations of word2vec and FASTTEXT, and we follow the scheme of [53] to build the tweet-level representation for the machine learning models. Given the word2vec or FASTTEXT model, we retrieve the vector representation of each word in a tweet and average the word vectors of all words per tweet, as follows:

V_tweet = (Σ_{i=1}^{n} W_i) / n   (2)

where n is the number of words in the tweet and W_i is the embedding vector of the i-th word. This representation retains the dimensionality (D = 200) of the word embedding models. The word embedding models used in our work are made freely available.

To automatically predict COVID-19 misinformation on Arabic Twitter, we used different types of classifiers, which we present in detail in this section. The first type comprises traditional (i.e., not deep) classifiers: support vector machine (SVM), multinomial naive Bayes (NB), Extreme Gradient Boosting (XGBoost), random forest (RF), and Stochastic Gradient Descent (SGD). We used the implementations of these classifiers from the scikit-learn library [54].

The second type are deep learning models: a convolutional neural network (CNN), a recurrent neural network with bidirectional long short-term memory (RNN BiLSTM), and a convolutional recurrent neural network (CRNN). We used the implementations of these classifiers in PyTorch [55]. Each proposed deep learning model consists of an input embedding layer, a hidden layer, a dense output layer, and an activation function. The embedding layer is the first layer of our deep learning models; it creates a dense vector representation from the input text sequence, and it can be initialized with a pre-trained word embedding model or learned while training the model. We experiment with three types of embedding layers. In the first experiment, the weights of the embedding layer are initialized randomly and the embedding is learned for all words in the dataset; the second and third embedding layers are initialized with the weights of the pre-trained word2vec and FASTTEXT models, respectively. Once the embedding layer maps each text sequence to a vector representation, this representation is fed into the classifier. The dense output layer takes the number of categories as its output dimension. We used the sigmoid activation function and the cross-entropy loss function. In the following, we describe the model structures.
• Convolutional Neural Network (CNN): In the CNN model, we use a one-dimensional convolution layer with multi-scale kernels of sizes 4 and 5 and a fixed filter dimensionality of 100 for each kernel. The kernel size defines the number of words considered as the convolution passes over the word vectors, yielding different n-grams. Applying a convolution operation with one filter window over the word vectors produces a new feature map. After each convolution operation, we apply a nonlinear transformation using a Rectified Linear Unit (ReLU) [56]. The convolved result is pooled using max pooling to capture the text's most relevant features. All feature maps are then concatenated into a single fixed-length vector. Finally, we feed this vector through a fully connected layer with a 0.5 dropout rate.
• Recurrent Neural Network (RNN): The RNN model consists of one bi-directional LSTM layer. The bi-directional LSTM trains two LSTMs on the input sequence: the first examines the sequence in forward order, the second examines it backward, and their information is combined into a single representation. This helps learn a better feature representation and capture sequential patterns from both directions. The bi-directional LSTM layer is followed by a dropout layer and a fully connected layer.
• Convolutional Recurrent Neural Network (CRNN): For the final model, we combine a one-dimensional convolution layer with five bi-directional LSTM layers to create a CRNN. The model uses a multi-scale convolutional layer with kernels of sizes 4 and 5 to extract multiple feature maps from the input text; each kernel has a fixed filter dimensionality of 100. We apply a nonlinear transformation using a Rectified Linear Unit (ReLU) [56] to each feature map, and a max-pooling layer then pools them separately to extract the essential text features. The extracted features are concatenated and fed as input to the bi-LSTM layers, whose output is fed to a fully connected layer.

The word embedding models are available at: https://github.com/BatoolHamawi/COVID-19WordEmbeddings

Deep learning models have some advantages. For example, a CNN automatically selects relevant words in tweets, while the RNN-BiLSTM network captures word patterns in tweets in both directions (right to left and vice versa) and, unlike a CNN, can handle tweets of different lengths. The CRNN model combines the benefits of both networks.
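As a rough illustration of the CNN architecture described above, the following PyTorch sketch builds the multi-scale convolutional classifier (kernel sizes 4 and 5, 100 filters each, 0.5 dropout, sigmoid output). The vocabulary size and batch shapes are placeholders, and the embedding layer here is randomly initialized rather than loaded from the pre-trained word2vec/FASTTEXT weights used in the experiments.

```python
# Minimal sketch of the multi-scale CNN text classifier; vocab_size and the
# random token batch below are illustrative placeholders.
import torch
import torch.nn as nn

class TweetCNN(nn.Module):
    def __init__(self, vocab_size, embed_dim=200, n_filters=100,
                 kernel_sizes=(4, 5), dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # One 1-D convolution per kernel size (multi-scale n-gram detectors).
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, n_filters, k) for k in kernel_sizes)
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(n_filters * len(kernel_sizes), 1)

    def forward(self, x):                      # x: (batch, seq_len)
        e = self.embedding(x).transpose(1, 2)  # (batch, embed_dim, seq_len)
        # Convolve, apply ReLU, then max-pool over time for each scale.
        pooled = [torch.relu(c(e)).max(dim=2).values for c in self.convs]
        h = self.dropout(torch.cat(pooled, dim=1))
        return torch.sigmoid(self.fc(h)).squeeze(1)  # misinformation prob.

model = TweetCNN(vocab_size=3194)
probs = model(torch.randint(0, 3194, (8, 40)))  # batch of 8 tweets, 40 tokens
```

To initialize from pre-trained vectors, the `nn.Embedding` weights would be loaded via `nn.Embedding.from_pretrained` instead of learned from scratch.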
When dealing with imbalanced classification tasks, the classifier naturally becomes biased toward the majority class. One of the most common ways to address the imbalance is to change the evaluation metric to one that tells a more truthful story. Therefore, when evaluating the models' performance, we report several metrics, including the Area Under the ROC Curve (AUC), precision, recall, and F1. These measurements are briefly defined as follows:
The Area Under the ROC Curve (AUC): indicates the classifier's ability to distinguish between classes via the probability curve (ROC). Using the trapezoidal rule over the points of the ROC curve, the AUC is computed as:

AUC = Σ_i [ (TPR_i + TPR_{i-1}) / 2 ] · (FPR_i - FPR_{i-1})   (3)

where the sum runs over the thresholds of the ROC curve and TPR_i and FPR_i denote the true positive rate and false positive rate at the i-th threshold.

Precision: represents the percentage of positively classified tweets that are actually correct. The precision is mathematically expressed as follows:
Precision = TP / (TP + FP)   (4)
Recall: indicates the ability of the classifier to correctly classify all positive instances. The recall is mathematically expressed as follows:
Recall = TP / (TP + FN)   (5)

where TP is the number of tweets correctly identified as misinformation, FP is the number of tweets incorrectly identified as misinformation, TN is the number of tweets correctly identified as not misinformation, and FN is the number of tweets incorrectly identified as not misinformation.
F1 score: indicates the weighted harmonic mean of precision and recall. The F1 is mathematically expressed as follows:

F1 = 2 · (Precision · Recall) / (Precision + Recall)   (6)
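The metrics above can be computed directly with scikit-learn, which the experiments already use. The labels and scores below are toy values, not from the paper's dataset; note that AUC is computed from the classifier's scores, while precision, recall, and F1 need thresholded predictions.

```python
# Toy evaluation sketch (1 = misinformation); values are illustrative only.
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.4, 0.6, 0.1, 0.3, 0.8, 0.2]   # classifier scores
y_pred = [int(p >= 0.5) for p in y_prob]            # hard labels at 0.5

precision = precision_score(y_true, y_pred)  # TP / (TP + FP), Eq. (4)
recall    = recall_score(y_true, y_pred)     # TP / (TP + FN), Eq. (5)
f1        = f1_score(y_true, y_pred)         # harmonic mean, Eq. (6)
auc       = roc_auc_score(y_true, y_prob)    # threshold-free ranking quality
```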
First, we shuffled the data to ensure that the model is not affected by the order of the data. We randomly split the sample of 8,786 annotated tweets into training and testing sets (80:20) for the traditional classifiers; for the deep learning classifiers, the sample was randomly split into training, testing, and validation sets (60:20:20).
Classifier | Hyper-parameters
RF  | criterion: entropy, max_depth: 8, max_features: log2, n_estimators: 500, class_weight: balanced
XGB | colsample_bytree: 0.8, gamma: 2, max_depth: 5, min_child_weight: 1, subsample: 1.0
NB  | alpha: 0.5, fit_prior: True
SGD | alpha: 0.0056, l1_ratio: 0.13, loss: modified_huber, penalty: l2, max_iter: 6000, class_weight: balanced
SVC | C: 1, gamma: 1, kernel: rbf, probability: True, class_weight: balanced

Table 5: Hyper-parameter settings for traditional classifiers.
Cross-Entropy Loss:
CNN  | max_features: 3194, max_features (FASTTEXT): 26275, max_features (word2vec): 247180, dropout: 0.5, kernel sizes: 4 & 5, filters: 100, epochs: 500, batch size: 32, embedding size: 200
RNN  | max_features: 3194, max_features (FASTTEXT): 26275, max_features (word2vec): 247180, hidden nodes: 1, dropout: 0.5, epochs: 500, batch size: 32, embedding size: 200
CRNN | max_features: 3194, max_features (FASTTEXT): 26275, max_features (word2vec): 247180, hidden nodes: 5, dropout: 0.5, kernel sizes: 4 & 5, filters: 100, epochs: 500, batch size: 32, embedding size: 200

AUCPR Loss:
CNN  | max_features (FASTTEXT): 26275, max_features (word2vec): 247180, dropout: 0.5, kernel sizes: 4 & 5, filters: 100, epochs: 600, batch size: 32, embedding size: 200
RNN  | max_features (FASTTEXT): 26275, max_features (word2vec): 247180, hidden nodes: 1, dropout: 0.5, epochs: 600, batch size: 32, embedding size: 200
CRNN | max_features (FASTTEXT): 26275, max_features (word2vec): 247180, hidden nodes: 5, dropout: 0.5, kernel sizes: 4 & 5, filters: 100, epochs: 600, batch size: 32, embedding size: 200

Table 6: Hyper-parameter settings for deep learning classifiers.
Since our dataset is imbalanced, we performed a grid search to find the hyper-parameters that maximize the AUC score. In the grid-search function, we chose AUC as the scoring parameter with 5-fold cross-validation. In each fold, the model is trained on the training data with every parameter combination, and each trained model is evaluated on the validation fold to find the optimal parameters; the model trained with the optimal parameters is then applied to the test set. This procedure is repeated until the model that maximizes the AUC score is found. Table 5 shows the hyper-parameter settings for the traditional classifiers.
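A minimal sketch of this procedure using scikit-learn's GridSearchCV with AUC as the scoring metric and 5-fold cross-validation. The random-forest grid and the synthetic imbalanced data below are illustrative placeholders, not the paper's full grids or dataset.

```python
# Grid search maximizing AUC on a small synthetic imbalanced dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# ~85/15 class split stands in for the imbalanced tweet labels.
X, y = make_classification(n_samples=300, weights=[0.85], random_state=0)

grid = GridSearchCV(
    RandomForestClassifier(class_weight="balanced", random_state=0),
    param_grid={"max_depth": [4, 8], "n_estimators": [50, 100]},
    scoring="roc_auc",   # optimize AUC, as described above
    cv=5,
)
grid.fit(X, y)
best = grid.best_params_   # parameter combination with highest mean CV AUC
```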
Classifier | TF-IDF word level: Accuracy / AUC / Precision / Recall / F1 | TF-IDF n-gram: Accuracy / AUC / Precision / Recall / F1
RF  | 80.7% / 80.3% / 0.37 / 0.57 / 0.45 | 78.2% / 75.8% / 0.32 / 0.54 / 0.41
XGB | 87.5% / 80.6% / 0.73 / 0.16 / 0.26 | 87.0% / 70.9% / 0.73 / 0.10 / 0.18
NB  | 87.9% / 80.9% / 0.70 / 0.22 / 0.34 | 86.7% / 78.1% / 0.55 / 0.24 / 0.34
SGD | 79.9% / 83.3% / 0.37 / 0.68 / 0.48 | 79.4% / 78.9% / 0.35 / 0.54 / 0.44
SVC | 87.8% / 82.9% / 0.59 / 0.39 / 0.47 | 82.0% / 75.8% / 0.38 / 0.50 / 0.43

Table 7: Traditional classifier overall performance.

We trained the traditional classifiers using unigram and n-gram TF-IDF feature representations, where the n-gram setting combines bigrams and trigrams. We also report results based on the word2vec and FASTTEXT embedding methods. Table 7 shows the accuracy, AUC, precision, recall, and F1 results of the traditional classifiers for the unigram and n-gram TF-IDF representations. The SVM classifier with unigram features achieved the best accuracy of 87.8%, an AUC score of 82.9%, and an F-measure of 0.47, while the SGD classifier reached the highest recall (0.68) and the best AUC score (83.3%) with unigram features. The XGB classifier achieved the highest precision, 0.73, for both unigrams and n-grams. The results indicate that the n-gram size influences the accuracy across classifiers: all classifiers achieved their highest performance with TF-IDF unigrams.

Figure 2: Visualization of the ROC curves of traditional classifiers using (a) word2vec and (b) FASTTEXT word embedding techniques.
Classifier | word2vec: Accuracy / AUC / Precision / Recall / F1 | FASTTEXT: Accuracy / AUC / Precision / Recall / F1
RF  | 83.3% / 83.6% / 0.47 / 0.56 / 0.51 | 84.3% / 84.3% / 0.50 / 0.53 / 0.52
XGB | 86.2% / 85.4% / 0.67 / 0.25 / 0.37 | 86.8% / 85.4% / 0.72 / 0.27 / 0.39
NB  | 74.4% / 81.2% / 0.35 / 0.74 / 0.47 | 73.4% / 80.4% / 0.33 / 0.69 / 0.45
SGD | 74.0% / 81.0% / 0.34 / 0.71 / 0.46 | 73.8% / 81.4% / 0.34 / 0.74 / 0.47
SVC | 76.6% / 84.2% / 0.38 / 0.81 / 0.52 | 77.8% / 85.3% / 0.40 / 0.80 / 0.53

Table 8: Traditional classifier overall performance based on word embedding methods.

Using the word2vec and FASTTEXT word embeddings results in slightly higher AUC and F1 scores: the overall AUC increases by 1 to 5 points, as shown in Table 8. Almost all classifiers improved with the trained word embeddings, except for the SGD classifier, whose AUC score decreased by 1 to 2 points. The highest AUC score was generated by the XGB classifier with both embedding methods. The XGB classifier with FASTTEXT performs best among the traditional classifiers, reaching an AUC score of 85.4% as well as the second-best precision of 0.72 and an F1 score of 0.39, which signifies that the predictions of the XGB classifier are stronger than those of the other classifiers. It is followed by the SVC classifier, with a close AUC score of 85.3%, a recall of 0.80, and the highest F1 score (0.53) among all classifiers. The ROC curves generated by the traditional classifiers using both word embedding methods are shown in Figure 2.
We trained the deep learning classifiers using the Adam optimizer with varying learning rates, a batch size of 32, and 500 epochs to optimize the cross-entropy loss. Table 6 shows the hyper-parameter settings for the deep learning classifiers. We report the results with and without the pre-trained word embeddings. Without the pre-trained word embeddings, the accuracies of the CNN, RNN, and CRNN were 85.0%, 84.3%, and 85.3%, respectively, and the AUC score of each was 50%, as shown in Table 9: the deep learning classifiers generalize toward the majority class. With the pre-trained word2vec and FASTTEXT embeddings, the classifiers' performance increased. The CRNN improved most with the word2vec embedding, outperforming the other classifiers in AUC, precision, recall, and F1 score, whereas the CNN and RNN improved most with the FASTTEXT embedding; the CRNN showed the worst performance with the FASTTEXT embedding. The most significant improvement was in the CNN classifier's AUC score with the FASTTEXT embedding, which improved by about 37.8% compared to the previous results.

Embedding method | Classifier | Accuracy | AUC Score | Precision | Recall | F1 Score
Without pre-trained embedding | CNN  | 85.0% | 50%   | 0    | 0    | 0
Without pre-trained embedding | RNN  | 84.3% | 50%   | 0    | 0    | 0
Without pre-trained embedding | CRNN | 85.3% | 50%   | 0    | 0    | 0
FASTTEXT | CNN  | 70.0% | 68.6% | 0.75 | 0.43 | 0.54
FASTTEXT | RNN  | 83.6% | 64.3% | 1    | 0.28 | 0.44
FASTTEXT | CRNN | 85.0% | 50%   | 0    | 0    | 0
word2vec | CNN  | 85.7% | 57.1% | 1    | 0.14 | 0.25
word2vec | RNN  | 82.9% | 57.1% | 1    | 0.14 | 0.25
word2vec | CRNN | 85.3% | 64.3% | 1    | 0.29 | 0.44

Table 9: Deep learning classifier overall performance.

To handle the imbalanced dataset and further improve classifier performance, we conducted a second experiment: we trained the classifiers using the AUCPR loss function, which optimizes for AUC based on [57]. That work introduced simple building-block bounds that provide a unified framework for efficient, scalable optimization of a wide range of objectives, including directly optimizing AUC. We used the Adam optimizer with varying learning rates, a batch size of 32, and 600 epochs. Table 10 shows the deep learning classifiers' results after AUC optimization. Without the pre-trained word embeddings, the performance remarkably improved for all classifiers: the loss function improved the AUC score by about 26.8% for all classifiers, and the models were able to detect the minority class.
Embedding method | Classifier | Accuracy | AUC Score | Precision | Recall | F1 Score
Without pre-trained embedding | CNN  | 84.2% | 64.3% | 1    | 0.29 | 0.44
Without pre-trained embedding | RNN  | 81.6% | 64.3% | 1    | 0.29 | 0.44
Without pre-trained embedding | CRNN | 86.0% | 64.3% | 1    | 0.29 | 0.44
FASTTEXT | CNN  | 74.9% | 54.3% | 0.50 | 0.14 | 0.22
FASTTEXT | RNN  | 83.5% | 64.3% | 1    | 0.14 | 0.25
FASTTEXT | CRNN | 85.0% | 50%   | 0    | 0    | 0
word2vec | CNN  | 68.4% | 61.8% | 0.67 | 0.29 | 0.40
word2vec | RNN  | 84.5% | 57.1% | 1    | 0.14 | 0.25
word2vec | CRNN | 85.1% | 64.3% | 1    | 0.29 | 0.44

Table 10: Deep learning classifier overall performance after AUC score optimization.

Using the pre-trained word embeddings with the AUCPR loss function improves only some of the classifiers; word2vec improves the CNN classifier by 4.7 points. Further hyper-parameter tuning may increase performance. Table 10 shows the overall performance of the deep learning classifiers after optimizing the AUC. Among the deep learning classifiers, the CNN with FASTTEXT embeddings and cross-entropy loss achieved the best overall performance, with the highest AUC score, precision, recall, and F1 score.
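The AUC optimization in the experiments uses the bound-based losses of [57]. As a simpler, hypothetical stand-in that conveys the idea, the sketch below uses a common differentiable AUC surrogate: it penalizes positive/negative score pairs that are not ranked at least a margin apart. This is NOT the exact AUCPR loss used in the experiments, only an illustration of optimizing a ranking objective instead of cross-entropy.

```python
# Pairwise squared-hinge surrogate for AUC (illustrative, not the paper's loss).
import torch

def pairwise_auc_loss(scores, labels, margin=1.0):
    """Differentiable AUC surrogate over all positive/negative score pairs."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # diff[i, j] = score(pos_i) - score(neg_j); we want diff >= margin.
    diff = pos.unsqueeze(1) - neg.unsqueeze(0)
    return torch.clamp(margin - diff, min=0).pow(2).mean()

scores = torch.tensor([2.5, 0.3, 1.9, -0.2])
labels = torch.tensor([1, 0, 1, 0])
loss = pairwise_auc_loss(scores, labels)  # low when positives outrank negatives
```

Because every loss term involves one positive and one negative example, the majority class cannot dominate the gradient the way it does with per-example cross-entropy, which is why such losses help on imbalanced data.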
Figure 3: Visualization of the ROC curves of deep learning classifiers after AUC score optimization, using (a) word2vec and (b) FASTTEXT word embedding techniques.

Arabic is a rich and complex language with a vast vocabulary; it is also highly morphological and derivational. The complexity increases with the informal nature of social media texts. Two main forms of Arabic are present on social media: Modern Standard Arabic (MSA), used for formal writing, and Dialectal Arabic (DA), used for informal daily communication. The latter is the most common form.

Further complicating matters, Arabic has many different dialects that people use on social media. These dialects introduce many new words into the language, especially stop words [58]. Another challenge is the diacritics used in the Arabic orthographic system. Diacritics represent short vowels and clarify the meanings of words; there are thirteen diacritics in Arabic [59]. Many Arabic words have more than one meaning depending on the diacritics, such as حسب, which can mean thought or counting depending on the context. Furthermore, some Arabic words have different meanings based on context alone, like عام, which can mean public or year [59]. Most Arabic tweets are written without diacritics, and the reader is expected to infer the intended meaning; this, however, does not apply to machines. In addition, Arabic contains many grammatical rules that change the shape and meaning of words.

Despite all these challenges, the classifiers showed promising results in distinguishing COVID-19 misinformation in Arabic tweets, which means that available machine learning methods can deliver high-performing classifiers from an imbalanced dataset of tweets. Compared to deep learning, the traditional classifiers performed better, with higher AUC values. Based on the experimental results, it is evident that feature selection can be an effective technique for improving the traditional classifiers' performance.

Although the deep learning models are biased toward the majority class, their performance can be increased using pre-trained word embeddings or by optimizing the AUC score. Pre-trained word embeddings built on a disease-specific dataset can be more accurate for detecting health misinformation than generic pre-trained embeddings [60]. As the results show, all classifiers perform better with pre-trained embeddings than without them. The FASTTEXT word embedding improved all classifiers' performance except for the CRNN and NB, while word2vec improved the results for the CRNN. The key difference between word2vec and FASTTEXT is that, during the learning phase, FASTTEXT treats each word as composed of character n-grams, whereas word2vec treats the word as the smallest unit. Arabic is a morphologically rich language; in addition, social media posts such as tweets are usually informal and often contain misspellings, which creates ambiguity. For example, the word corona (كورونا) can be written with several different spellings. FASTTEXT trains its embedding vectors on subword units, which accounts for the language's morphology [46]. Therefore, FASTTEXT handles misspellings and learns meaningful representations for rare words, while word2vec ignores them. For this reason, we believe the FASTTEXT tweet vector representation is better than the word2vec representation.

The Area Under the ROC Curve (AUC) measures how well a machine learning model distinguishes between positive and negative tweets. Several studies have shown that optimizing the AUC is extremely useful when class distributions are heavily imbalanced [61, 62]. Our results confirm that maximizing the AUC score improves some classifiers' results on imbalanced datasets: it increases the classifiers' ability to recognize the minority class. Nevertheless, optimizing for AUC is difficult because it requires sorting the dataset, which makes it relatively expensive. Moreover, AUC is not continuous in the training set, so most studies optimize a differentiable variant of AUC. Many methods have been developed that directly optimize the AUC during training [63, 64]. Future studies are needed to investigate the impact of different AUC optimization techniques on detecting misinformation.

While the proposed dataset covers a diverse range of misinformation content, one limitation is that our work is restricted to tweets disseminated during March and April 2020. This is largely motivated by the fact that most Arabic-speaking countries reported their first confirmed cases of COVID-19 during March 2020; due to the lack of proper awareness and knowledge among people, false information mostly spread at the early stage of the pandemic. Experimenting with a larger dataset spanning a longer duration (e.g., five months) would be useful for extending and validating our work.
Figure 4: Learning curves for the classifiers: (a) learning curves for the XGB classifier, (b) learning curves for the SVC classifier.
With the increasing use of social media as a primary source of information, distinguishing between correct and misleading information becomes very difficult and critical, especially during the ongoing COVID-19 pandemic. Many intervention strategies for COVID-19 depend on the quality and reliability of information shared between people, and several features of social media facilitate the spread of inaccurate information among users worldwide. Identifying and combating misinformation is therefore a critical task during pandemics.

In this work, we conducted an extensive experiment using real misinformation content from Twitter. We examined different machine learning classifiers to automatically identify Arabic misinformation related to COVID-19, using an annotated dataset of 8,786 tweets and the word2vec and FASTTEXT embeddings. Our results show that using word embeddings indeed enhances classifier performance: FASTTEXT produces better results with the traditional classifiers and the CNN, while word2vec yields better results with the other deep learning classifiers. Optimizing the AUC score improved the classifiers' performance and their ability to handle imbalanced datasets. The XGB classifier was shown to be capable of accurately identifying Arabic misinformation based solely on a tweet's text, outperforming all other classifiers in terms of AUC, precision, recall, and F1. In the foreseeable future, we plan to improve the deep learning classifiers by stacking multiple layers and further optimizing the hyper-parameters, and possibly to extend the study with the recent AdaBelief optimizer [65]. Finally, we plan to consider other social networks, which will help enrich our dataset and widen its applications.
Acknowledgement
This work was supported by King Abdulaziz City for Science and Technology. Grant Number: 5-20-01-007-0033.
References

[1] A. Ghenai and Y. Mejova, "Catching zika fever: Application of crowdsourcing and machine learning for tracking health misinformation on twitter," arXiv preprint arXiv:1707.03778, 2017.
[2] S. O. Oyeyemi, E. Gabarron, and R. Wynn, "Ebola, twitter, and misinformation: a dangerous combination?," BMJ.
Proceedings of the National Academy of Sciences, vol. 113, no. 3, pp. 554–559, 2016.
[5] Peter Dizikes, "Study: on twitter, false news travels faster than true stories." https://news.mit.edu/2018/study-twitter-false-news-travels-faster-true-stories-0308, 2020. (accessed: 2020-10-12).
[6] J. Donovan, "Here's how social media can combat the coronavirus 'infodemic'," 2020.
[7] N. Persily and J. A. Tucker, Social Media and Democracy: The State of the Field, Prospects for Reform. Cambridge University Press, 2020.
[8] L. Wu, F. Morstatter, K. M. Carley, and H. Liu, "Misinformation in social media: definition, manipulation, and detection," ACM SIGKDD Explorations Newsletter, vol. 21, no. 2, pp. 80–90, 2019.
[9] M. R. Islam, S. Liu, X. Wang, and G. Xu, "Deep learning for misinformation detection on online social networks: a survey and new perspectives," Social Network Analysis and Mining, vol. 10, no. 1, pp. 1–20, 2020.
[10] S. Lewandowsky, U. K. Ecker, C. M. Seifert, N. Schwarz, and J. Cook, "Misinformation and its correction: Continued influence and successful debiasing," Psychological Science in the Public Interest, vol. 13, no. 3, pp. 106–131, 2012.
[11] M. S. Akhtar, A. Ekbal, S. Narayan, V. Singh, and E. Cambria, "No, that never happened!! investigating rumors on twitter," IEEE Intelligent Systems, vol. 33, no. 5, pp. 8–15, 2018.
[12] C. Buntain and J. Golbeck, "Automatically identifying fake news in popular twitter threads," in , pp. 208–215, IEEE, 2017.
[13] A. H. Wang, "Don't follow me: Spam detection in twitter," in , pp. 1–10, IEEE, 2010.
[14] R. M. B. Al-Eidan, H. S. Al-Khalifa, and A. S. Al-Salman, "Measuring the credibility of arabic text content in twitter," in , pp. 285–291, IEEE, 2010.
[15] N. Y. Hassan, W. H. Gomaa, G. A. Khoriba, and M. H. Haggag, "Supervised learning approach for twitter credibility detection," in , pp. 196–201, IEEE, 2018.
[16] G. Jardaneh, H. Abdelhaq, M. Buzz, and D. Johnson, "Classifying arabic tweets based on credibility using content and user features," in , pp. 596–601, IEEE, 2019.
[17] R. El Ballouli, W. El-Hajj, A. Ghandour, S. Elbassuoni, H. Hajj, and K. Shaban, "Cat: Credibility analysis of arabic content on twitter," in Proceedings of the Third Arabic Natural Language Processing Workshop, pp. 62–71, 2017.
[18] S. F. Sabbeh and S. Y. Baatwah, "Arabic news credibility on twitter: An enhanced model using hybrid features," Journal of Theoretical & Applied Information Technology, vol. 96, no. 8, 2018.
[19] R. Mouty and A. Gazdar, "The effect of the similarity between the two names of twitter users on the credibility of their publications," in , pp. 196–201, IEEE, 2019.
[20] N. Hassan, W. Gomaa, G. Khoriba, and M. Haggag, "Credibility detection in twitter using word n-gram analysis and supervised machine learning techniques," Int. J. Intell. Eng. Syst., vol. 13, pp. 291–300, 2020.
[21] S. M. Alzanin and A. M. Azmi, "Rumor detection in arabic tweets using semi-supervised and unsupervised expectation–maximization," Knowledge-Based Systems, vol. 185, p. 104945, 2019.
[22] F. Saeed, M. Al-Sarem, E. A. Hezzam, and W. M. Yafooz, "Detecting health-related rumors on twitter using machine learning methods,"
[23] Y. Leng, Y. Zhai, S. Sun, Y. Wu, J. Selzer, S. Strover, J. Fensel, A. Pentland, and Y. Ding, "Analysis of misinformation during the covid-19 outbreak in china: cultural, social and political entanglements," arXiv preprint arXiv:2005.10414, 2020.
[24] J. C. M. Serrano, O. Papakyriakopoulos, and S. Hegelich, "Nlp-based feature extraction for the detection of covid-19 misinformation videos on youtube," in Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020, 2020.
[25] L. H. X. Ng and L. J. Yuan, "Is this pofma? analysing public opinion and misinformation in a covid-19 telegram group chat," arXiv preprint arXiv:2010.10113, 2020.
[26] L. Singh, S. Bansal, L. Bode, C. Budak, G. Chi, K. Kawintiranon, C. Padden, R. Vanarsdall, E. Vraga, and Y. Wang, "A first look at covid-19 information and misinformation sharing on twitter," arXiv preprint arXiv:2003.13907, 2020.
[27] A. Mourad, A. Srour, H. Harmanani, C. Jenainatiy, and M. Arafeh, "Critical impact of social networks infodemic on defeating coronavirus COVID-19 pandemic: Twitter-based study and research directions," arXiv preprint arXiv:2005.08820, 2020.
[28] R. J. Medford, S. N. Saleh, A. Sumarsono, T. M. Perl, and C. U. Lehmann, "An 'infodemic': Leveraging high-volume Twitter data to understand public sentiment for the COVID-19 outbreak," medRxiv, 2020.
[29] C. M. Pulido, B. Villarejo-Carballido, G. Redondo-Sama, and A. Gómez, "COVID-19 infodemic: More retweets for science-based information on coronavirus than for false information," International Sociology, p. 0268580920914755, 2020.
[30] G. K. Shahi, A. Dirkson, and T. A. Majchrzak, "An exploratory study of COVID-19 misinformation on Twitter," arXiv preprint arXiv:2005.05710, 2020.
[31] L. McQuillan, E. McAweeney, A. Bargar, and A. Ruch, "Cultural convergence: Insights into the behavior of misinformation networks on Twitter," arXiv preprint arXiv:2007.03443, 2020.
[32] G. Caldarelli, R. De Nicola, M. Petrocchi, M. Pratelli, and F. Saracco, "Analysis of online misinformation during the peak of the COVID-19 pandemics in Italy," arXiv preprint arXiv:2010.01913, 2020.
[33] T. Graham, A. Bruns, G. Zhu, and R. Campbell, "Like a virus: The coordinated spread of coronavirus disinformation," 2020.
[34] E. Ferrara, "What types of COVID-19 conspiracies are populated by Twitter bots?," First Monday, 2020.
[35] K. Ding, K. Shu, Y. Li, A. Bhattacharjee, and H. Liu, "Challenges in combating COVID-19 infodemic – data, tools, and ethics," arXiv preprint arXiv:2005.13691, 2020.
[36] M. K. Elhadad, K. F. Li, and F. Gebali, "Detecting misleading information on COVID-19," IEEE Access, vol. 8, pp. 165201–165215, 2020.
[37] M. S. Al-Rakhami and A. M. Al-Amri, "Lies kill, facts save: Detecting COVID-19 misinformation in Twitter," IEEE Access, vol. 8, pp. 155961–155970, 2020.
[38] F. Alam, F. Dalvi, S. Shaar, N. Durrani, H. Mubarak, A. Nikolov, G. D. S. Martino, A. Abdelali, H. Sajjad, K. Darwish, et al., "Fighting the COVID-19 infodemic in social media: A holistic perspective and a call to arms," arXiv preprint arXiv:2007.07996, 2020.
[39] L. Alsudias and P. Rayson, "COVID-19 and Arabic Twitter: How can Arab world governments and public health organizations learn from social media?," in Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020, 2020.
[40] H. Mubarak and S. Hassan, "ArCorona: Analyzing Arabic tweets in the early days of coronavirus (COVID-19) pandemic," arXiv preprint arXiv:2012.01462, 2020.
[41] "TextBlob: Simplified text processing." https://textblob.readthedocs.io/en/dev/, 2020. (accessed: 2020-10-12).
[42] "farasapy." https://pypi.org/project/farasapy/, 2020. (accessed: 2020-10-12).
[43] S. Alqurashi, A. Alashaikh, and E. Alanazi, "Identifying information superspreaders of COVID-19 from Arabic tweets," 2020.
[44] G. Salton and C. Buckley, "Term-weighting approaches in automatic text retrieval," Information Processing & Management, vol. 24, no. 5, pp. 513–523, 1988.
[45] T. Mikolov, K. Chen, G. Corrado, and J. Dean, "Efficient estimation of word representations in vector space," arXiv preprint arXiv:1301.3781, 2013.
[46] P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, "Enriching word vectors with subword information," Transactions of the Association for Computational Linguistics, vol. 5, pp. 135–146, 2017.
[47] M. Elrazzaz, S. Elbassuoni, K. Shaban, and C. Helwe, "Methodical evaluation of Arabic word embeddings," in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 454–458, 2017.
[48] M. A. Zahran, A. Magooda, A. Y. Mahgoub, H. Raafat, M. Rashwan, and A. Atyia, "Word representations in vector space and their applications for Arabic," in International Conference on Intelligent Text Processing and Computational Linguistics, pp. 430–443, Springer, 2015.
[49] A. Joulin, E. Grave, P. Bojanowski, M. Douze, H. Jégou, and T. Mikolov, "FastText.zip: Compressing text classification models," arXiv preprint arXiv:1612.03651, 2016.
[50] A. Joulin, E. Grave, P. Bojanowski, and T. Mikolov, "Bag of tricks for efficient text classification," arXiv preprint arXiv:1607.01759, 2016.
[51] A. Agibetov, K. Blagec, H. Xu, and M. Samwald, "Fast and scalable neural embedding models for biomedical sentence classification," BMC Bioinformatics, vol. 19, no. 1, p. 541, 2018.
[52] R. Řehůřek and P. Sojka, "Software framework for topic modelling with large corpora," in Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, (Valletta, Malta), pp. 45–50, ELRA, May 2010.
[53] X. Yang, C. Macdonald, and I. Ounis, "Using word embeddings in Twitter election classification," Information Retrieval Journal, vol. 21, no. 2-3, pp. 183–207, 2018.
[54] G. Varoquaux, L. Buitinck, G. Louppe, O. Grisel, F. Pedregosa, and A. Mueller, "Scikit-learn: Machine learning without learning the machinery," GetMobile: Mobile Computing and Communications, vol. 19, no. 1, pp. 29–33, 2015.
[55] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al., "PyTorch: An imperative style, high-performance deep learning library," in Advances in Neural Information Processing Systems, pp. 8026–8037, 2019.
[56] V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in ICML, 2010.
[57] E. Eban, M. Schain, A. Mackey, A. Gordon, R. Rifkin, and G. Elidan, "Scalable learning of non-decomposable objectives," in Artificial Intelligence and Statistics, pp. 832–840, PMLR, 2017.
[58] F. R. Alharbi and M. B. Khan, "Identifying comparative opinions in Arabic text in social media using machine learning techniques," SN Applied Sciences, vol. 1, no. 3, p. 213, 2019.
[59] H. A. Almuzaini and A. M. Azmi, "Impact of stemming and word embedding on deep learning-based Arabic text categorization," IEEE Access, vol. 8, pp. 127913–127928, 2020.
[60] A. Khatua, A. Khatua, and E. Cambria, "A tale of two epidemics: Contextual word2vec for classifying Twitter streams during outbreaks," Information Processing & Management, vol. 56, no. 1, pp. 247–257, 2019.
[61] A. P. Bradley, "The use of the area under the ROC curve in the evaluation of machine learning algorithms," Pattern Recognition, vol. 30, no. 7, pp. 1145–1159, 1997.
[62] J. Sulam, R. Ben-Ari, and P. Kisilev, "Maximizing AUC with deep learning for classification of imbalanced mammogram datasets," in VCBM, pp. 131–135, 2017.
[63] Q. Wang and A. Guo, "An efficient variance estimator of AUC and its applications to binary classification," Statistics in Medicine, vol. 39, no. 28, pp. 4281–4300, 2020.
[64] M. Liu, Z. Yuan, Y. Ying, and T. Yang, "Stochastic AUC maximization with deep neural networks," arXiv preprint arXiv:1908.10831, 2019.
[65] J. Zhuang, T. Tang, S. Tatikonda, N. Dvornek, Y. Ding, X. Papademetris, and J. S. Duncan, "AdaBelief optimizer: Adapting stepsizes by the belief in observed gradients," arXiv preprint arXiv:2010.07468, 2020.