Eating Garlic Prevents COVID-19 Infection: Detecting Misinformation on the Arabic Content of Twitter
Sarah Alqurashi, Btool Hamoui, Abdulaziz Alashaikh, Ahmad Alhindi, Eisa Alanazi
Sarah Alqurashi∗
Center of Innovation and Development in Artif. Intell. (CIADA), Umm Al-Qura University, Makkah, Saudi Arabia

Btool Hamoui
Center of Innovation and Development in Artif. Intell. (CIADA), Umm Al-Qura University, Makkah, Saudi Arabia

Abdulaziz Alashaikh
Computer and Networks Engineering Department, University of Jeddah, Jeddah, Saudi Arabia

Ahmad Alhindi
Center of Innovation and Development in Artif. Intell. (CIADA), Umm Al-Qura University, Makkah, Saudi Arabia

Eisa Alanazi
Center of Innovation and Development in Artif. Intell. (CIADA), Umm Al-Qura University, Makkah, Saudi Arabia

January 15, 2021

Abstract
The rapid growth of social media content during the current pandemic provides useful tools for disseminating information, but it has also become a root for misinformation. There is therefore an urgent need for fact-checking and for effective techniques to detect misinformation on social media. In this work, we study misinformation in the Arabic content of Twitter. We construct a large Arabic dataset related to COVID-19 misinformation and gold-annotate its tweets into two categories: misinformation or not. We then apply eight different traditional and deep machine learning models with different features, including word embeddings and word frequency. The word embedding models (FastText and word2vec) exploit more than two million Arabic tweets related to COVID-19. Experiments show that optimizing the area under the curve (AUC) improves the models' performance, and that Extreme Gradient Boosting (XGBoost) achieves the highest accuracy in detecting COVID-19 misinformation online.

∗ Corresponding author: [email protected]
The new coronavirus pandemic was accompanied by a large and rapid spread of rumors, false information, and fake news. Misinformation has existed over the years and usually flourishes around important issues such as health outbreaks, climate change, and vaccinations. Human crises are fertile ground for misinformation, as happened during the Zika virus [1], Ebola [2], and other outbreaks. Moreover, misinformation is intensified during sudden and intense crises such as the COVID-19 pandemic. In the modern era, social media has helped magnify the spread of misinformation among individuals: in recent times there has been a global increase in the spread of information in general, and of COVID-19-related misinformation in particular, through various social media. This unprecedented amount of information poses serious public health challenges, especially concerning infectious diseases, and prompted the World Health Organization (WHO) to warn against the infodemic. The infodemic is a massive amount of correct and incorrect information that makes it difficult for individuals to access reliable information and credible guidance when needed [3]. This phenomenon, in turn, leads to the fast and easy spread of fake and unreliable information, especially on social media, which facilitates its diffusion.

Several conspiracy theories about the origins of the COVID-19 virus have spread on Arabic social media, all sharing the idea that the virus is a biological weapon. This misinformation started from social media accounts with no reliable proof to back their claims. Moreover, misleading information about the virus's symptoms and about how to cure the new virus and reduce its transmission circulates on social media. For example, a widespread piece of misinformation claimed, with a complete lack of evidence, that home remedies such as taking vitamin C and eating garlic can treat and prevent COVID-19 infection. Although some home remedies are harmless, others can be very dangerous. While such misinformation serves its promoters' interests, it also harms societies, especially since a high percentage of individuals depend on social media platforms for information and news. Research has shown that the more individuals are exposed to false information and fake news, the more likely they are to accept and believe it [4]. Misinformation confuses people and harms the health of individuals. It may also incite violence, discrimination, or hostility against specific groups in society, and it may obstruct efforts to control the current health crisis.

Twitter is one of the most used social networking sites in the Arab world and has become a tool for spreading misinformation regarding COVID-19. A recent study shows that false information spreads six times faster than correct information on Twitter [5], which makes it challenging to find accurate information there and increases mental distress and anxiety during the pandemic. One concerning factor is that the spread rate on Twitter is not bounded by physical distance. Conspiracy theories and other false and misleading information may first appear on Twitter, then reach a larger audience once amplified by social media influencers and by reports on unreliable media sites, which reduces the effectiveness of officials' attempts to slow their spread. As a result, individuals around the world are affected mentally and physically by misinformation. The World Health Organization has teamed up with prominent social media platforms such as Facebook, Twitter, and YouTube to fight the infodemic by verifying misleading information and providing evidence-based information to the public [6]. Despite the efforts made by different entities around the globe, including the WHO, governments, and social media sites, misinformation continues to spread widely.
The problem lies in the difficulty of detecting and correcting misinformation in Arabic content before it spreads more widely. The Arabic language also poses a challenge because it has many dialects and a rich vocabulary, so a piece of misleading information may exist in more than one dialect, making it harder to detect. As a result, there is an urgent need to develop systems capable of automatically identifying misinformation in Arabic content. In this work, we investigate detecting Arabic misinformation on Twitter using natural language processing and machine learning. Our contributions to this area are summarized as follows:

• We extract a sample of tweets from a large Arabic dataset related to the COVID-19 pandemic and employ human annotators to label it. With high-quality, human-powered data annotation, we can estimate the credibility of the considered tweets automatically.

• We build two Arabic word embedding models using FastText and word2vec, based on more than two million Arabic tweets related to COVID-19, for a comparative analysis between the classifiers.

• We examine the prediction performance of five traditional classifiers: Random Forests (RF), Extreme Gradient Boosting (XGB), Naive Bayes (NB), Stochastic Gradient Descent (SGD), and Support Vector Machines (SVM) with different features, in addition to three deep learning classifiers: CNN, RNN, and CRNN.

• We improve the performance of all the models by optimizing the area under the curve (AUC), using grid search for the traditional classifiers and an AUC loss function for the deep learning models.
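Since AUC optimization is central to the last contribution, it is worth recalling what the metric computes. The sketch below is a generic rank-based formulation, not the authors' implementation: it evaluates AUC as the probability that a randomly chosen positive example outranks a randomly chosen negative one.

```python
def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney U statistic: the fraction of
    (positive, negative) pairs where the positive example scores
    higher; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A classifier that ranks every misinformation tweet above every
# other tweet gets a perfect AUC of 1.0.
print(roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.2]))  # → 1.0
```

For the traditional classifiers this quantity can be maximized by scoring a hyperparameter grid search with AUC; for the deep models an AUC-based loss serves the same purpose.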
To reduce misinformation on social media, it is essential to understand what the term misinformation means. Some scholars describe misinformation as false and inaccurate information that is transmitted unintentionally [7]. Ordinary users usually spread this type of misinformation because of their confidence in the information source, whether they are personally acquainted with it or it is an influential user on their social network. They share the misinformation to inform people in their surroundings about a specific situation or story because they believe it is true.

In contrast, disinformation is false and inaccurate information that is transmitted intentionally [7]. It is usually carried by a group of people, writers, or even publishers with the common goal of deceiving the public. Disinformation includes conspiracy theories, fake news, and spam. The outcome of mis- and disinformation is the same, whether the content is published intentionally or not.

On social media, where users can post anything, it is difficult for researchers to determine whether a piece of information was intentionally created. Therefore, misinformation has been adopted as an umbrella term for all false and inaccurate information, regardless of the goal or intention [8]. The umbrella term covers fake news, a type of misinformation that mimics traditional news; rumors, unverified information that can turn out to be correct; and spam, unwanted information that exhausts its recipients [8, 9]. These types of misinformation share a negative impact that extends to every aspect of life and may have social and economic consequences. Furthermore, misinformation significantly affects emergency response during disasters: it misleads and confuses public opinion and threatens public security and community stability, especially in the absence of immediate intervention to combat it [9].
Due to the ease of use of social media, the spread of misinformation has expanded widely. Its impact goes beyond personal life to affect society and even the economy. One example of misinformation with a negative effect is inaccurate information related to vaccinations. Anti-vaccination groups claim that vaccines cause autism, which caused fear of vaccination among many parents, making them refuse or at least hesitate to vaccinate their children and leading to an unprecedented increase in preventable diseases [10]. The fear of vaccination continued during the global COVID-19 pandemic, as some conspiracy theories spread through social media platforms claiming that the COVID-19 vaccine contains a chip that controls humans.

The amount of data on social media makes it difficult to distinguish between misleading and accurate information. Therefore, identifying misinformation on social media has been a popular research topic in recent years. Many studies on the English language have examined the presence of misinformation on social media, such as detecting rumors [11], fake news [12], spam [13], and health misinformation [1, 2]. However, most Arabic-language research has focused on assessing the credibility of news disseminated on Twitter. Often, the tweets were annotated based on an annotator's judgment, and machine learning models were built on user features, content features, or a combination of both [14, 15]. Some studies added new features such as sentiment analysis [16, 17], the polarity of user replies [18], the similarity between username and display name [19], and TF-IDF [20]. The work in [21] used content and user features to detect Arabic rumors on Twitter using semi-supervised expectation-maximization (EM); the proposed model achieved an F1 score of 80%. However, little work so far has focused on detecting and tracking health misinformation in the Arabic language.
Recently, a study tackling the detection of Arabic cancer-treatment rumors on Twitter was presented in [22]. The authors applied ten machine learning models using TF-IDF features with different n-grams extracted from a dataset of 208 annotated tweets. An oversampling technique was applied to the dataset, and the random forest model with oversampling and 5-gram TF-IDF features achieved an F1 score of 0.86.

There is a great body of work related to the COVID-19 infodemic on social media. The evolution of misinformation was studied on the Weibo social media platform [23] using misinformation identified by fact-checking platforms. Another study [24] examined the identification of misinformation videos on YouTube using NLP and machine learning. Furthermore, the work in [25] analyzed the evolution of opinion regarding COVID-19 in a Singapore Telegram group chat.

The vast majority of COVID-19 infodemic studies on social media focus on Twitter, largely because Twitter is one of the most popular platforms and provides access to a large amount of content in many languages. Along this line, many studies of misinformation on Twitter analyzed the content of tweets to understand Twitter conversations during COVID-19 [26, 27, 28]. To study the development of conversation around misinformation on Twitter, Singh et al. [26] collected five common pieces of misinformation related to COVID-19, concerning the virus's origin, vaccine development, comparisons with the flu, the claim that heat kills the disease, and home remedies. Each tweet was assigned to the corresponding misinformation based on the words and phrases it contained. The authors noticed an increase in conversation around the misinformation since January 2020. In [29], a study of the dissemination of COVID-19 misleading and reliable information on Twitter using communicative content analysis showed that misleading information is less likely to be retweeted than accurate information.

Several studies have relied on fact-checking websites as ground truth. In [30], the authors collected COVID-19-related tweets that were mentioned in fact-checking articles to study the sources of misinformation and how it spreads, using retweet speed as a proxy for propagation speed; their work suggests that misinformation propagates faster than accurate information. Another study of how misinformation content spread over five months on Twitter was presented in [31]. On a different note, the work in [27] presents a measure of tweet credibility based on user specialty and occupation.

Considerable work has also focused on the quality of the links and information sources found in tweets in many languages (e.g., English [26], Italian [32]). Links were classified as reputable sources or not using fact-checking websites [32] and well-known domains [26]; low-quality links appeared in tweets less often than high-quality links.

Researchers have also studied the types of accounts that help spread false information about COVID-19. The role and behavior of bot accounts on Twitter during COVID-19 were analyzed in [33, 34], where it was shown that Twitter bots participate in spreading misinformation, either for political or marketing gain [33].

Machine learning techniques have also been adapted to detect misinformation.
The work in [35] discussed the challenges in designing and developing AI solutions for infodemic detection; the authors also presented a tool that estimates whether an article is misinformation based on a URL checker, a fake news classifier, and a website matcher. A misleading-information detection system was presented in [36]; it relies on fact-checking websites and international organization data and is built as an ensemble of 10 machine learning models with 7 feature extraction techniques. Another study [37] applied ensemble machine learning techniques to Twitter misinformation based on user-level and tweet-level features; SVM and random forest showed the best accuracy.

Most of this research has focused on the English language, and there are very few studies on Arabic. In [38], the authors applied SVM, FastText, and BERT to 218 Arabic tweets and 504 English tweets; the FastText model provided the best results for Arabic text. The work in [39] studied the Arabic conversation on Twitter by applying topic modeling. Machine learning models such as logistic regression, support vector machines, and naive Bayes were trained on 2,000 labeled tweets to build a rumor detection system; the highest accuracy, 84%, was achieved by the logistic regression classifier with count vector features. The authors also found that rumors are usually written in an academic style and promoted by fake health professionals. Another study of COVID-19 misinformation was presented in [40], which published a large, manually annotated dataset of Arabic tweets related to COVID-19. The tweets were labeled with 13 classes, including only 421 rumors. The authors employed machine learning and transformer models using Mazajak embeddings and TF-IDF n-grams over words and characters; the best model was SVC with TF-IDF character n-grams, at an F1 score of 0.79.

In all previous studies investigating misinformation in Arabic social media content, the datasets used were very limited. In this work, we construct one of the largest datasets of Arabic tweets for misinformation. We provide a comparative analysis between classifiers using TF-IDF and Arabic word embedding models built from more than two million Arabic tweets related to COVID-19. Furthermore, we optimize the area under the curve (AUC) to further improve the models' accuracy.
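A common way to use such word embeddings for tweet classification is to represent each tweet by the average of its word vectors; this aggregation is an illustrative assumption, not a statement of the exact pipeline used in any of the cited works.

```python
def tweet_vector(tokens, embeddings, dim):
    """Average the embedding vectors of a tweet's in-vocabulary
    tokens. `embeddings` maps token -> vector of length `dim`,
    e.g., loaded from a trained word2vec or FastText model."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim  # no known tokens: fall back to a zero vector
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

# Toy 2-dimensional vocabulary; unknown tokens are simply ignored.
emb = {"corona": [1.0, 0.0], "garlic": [0.0, 1.0]}
print(tweet_vector(["corona", "garlic", "unknown"], emb, 2))  # → [0.5, 0.5]
```

The resulting fixed-length vector can be fed to any of the traditional classifiers discussed above.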
The proposed system comprises several stages, shown in Figure 1. It begins by collecting tweets using the Twitter streaming API and ends with evaluating and comparing the models' performance. In the remainder of this section, we describe the steps in detail.
We collected a large number of Arabic tweets using the Twitter streaming application interface and the Tweepy Python library over four months, from January 1, 2020, to April 30, 2020. We extracted tweets based on a list of the most common Arabic keywords associated with COVID-19, filtering the Twitter stream for the Arabic language to obtain relevant tweets about the pandemic. Table 2 shows the list of relevant Arabic keywords used to collect
Paper    | Purpose                          | Size                     | Features                                  | Models
[38]     | Detect COVID-19 misinformation   | 218 Arabic + 504 English | —                                         | SVM, FastText, and BERT
[39]     | Detect COVID-19 misinformation   | 2000                     | Count vector, TF-IDF, and word embedding  | Traditional classifiers
[40]     | Analyzing COVID-19 Arabic tweets | 8000                     | TF-IDF n-grams for words and characters   | Traditional classifiers
Our work | Detect COVID-19 misinformation   | 8786                     | TF-IDF, FastText, and word2vec embeddings | Traditional and deep learning classifiers

Table 1: Summary of different attempts at detecting misinformation in the Arabic content of Twitter.

tweets about COVID-19. The dataset contains more than 4.5 million tweets (4,514,136 in total). We store each tweet's full object, including its timestamp and id, user profile information (including the number of followers), and geolocation, in a MongoDB NoSQL database. The dataset is available online on GitHub.

When dealing with Arabic data, it is important to recognize the rich cultural and linguistic diversity across the Arab region, which translates into challenges (e.g., dialects) that must be addressed during model development. It is also essential to consider the general characteristics of Twitter data. For example, tweets are limited to 280 characters; despite this, their content varies and can consist of text, symbols, URLs, pictures, and videos. Furthermore, Twitter users tend to use informal writing to reduce a text's length while keeping it comprehensible, and Twitter data contains large amounts of spelling errors and does not necessarily follow the language's formal structure. Twitter data is thus very noisy, and it is essential to apply some pre-processing to the raw text before feeding it to the classifiers.
We perform the following preprocessing steps on the tweets:

• We removed non-Arabic words.
• We removed special characters (e.g., punctuation marks and symbols).
• We performed text correction using the TextBlob Python library [41].
• We normalized the Arabic text by unifying common letter variants (e.g., the different forms of alef, and the common variant pairs of ta marbuta/ha and alef maqsura/ya).
• We removed character repetitions (e.g., elongated words are reduced to their normal spelling).
• We removed stop words (e.g., the Arabic equivalents of "from", "to", and "in").
• We performed word stemming to convert each word to its corresponding root using the farasapy library [42].

https://github.com/SarahAlqurashi/COVID-19-Arabic-Tweets-Dataset
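The steps above can be sketched as a small pipeline. The normalization map below is the standard Arabic letter unification and is an assumption where the paper's exact glyph table was lost in extraction; the TextBlob correction and Farasa stemming steps are omitted for brevity.

```python
import re

# Assumed standard normalization pairs (the paper's exact table did not
# survive PDF extraction): alef variants, alef maqsura, ta marbuta.
NORMALIZE = {"أ": "ا", "إ": "ا", "آ": "ا", "ى": "ي", "ة": "ه"}
STOP_WORDS = {"من", "الى", "في"}  # "from", "to", "in" (illustrative subset)

def preprocess(text):
    # Keep only Arabic letters and whitespace (drops non-Arabic words,
    # digits, and special characters).
    text = re.sub(r"[^\u0621-\u064A\s]", " ", text)
    # Unify common letter variants.
    for src, dst in NORMALIZE.items():
        text = text.replace(src, dst)
    # Collapse runs of three or more identical characters to one.
    text = re.sub(r"(.)\1{2,}", r"\1", text)
    # Tokenize and remove stop words.
    return [t for t in text.split() if t not in STOP_WORDS]

print(preprocess("كوروناااا في الصين!!!"))  # → ['كورونا', 'الصين']
```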
Figure 1: System architecture.
This work deals with detecting health-related misinformation by relying on trusted sources of information. A recent work [43] shows that the official account of the Ministry of Health in Saudi Arabia was among the most influential accounts in March 2020. Hence, we collected the false information reported on both the World Health Organization (WHO) website and the website of the Ministry of Health in Saudi Arabia. Table 3 shows a sample of tweets containing misinformation.
Dataset annotation: our misinformation dataset is sampled from tweets collected from early March 2020 to the end of April 2020. To narrow down the set of tweets without misinformation content, we followed a procedure similar to that of [1]. We first manually crafted a set of terms that best describe the different pieces of misinformation, then retrieved tweets related to those terms (e.g., "Vitamin C", "Sarin gas", "Mosquitoes", and "Biological warfare"). The retrieved tweets were combined into one dataset and labeled by two volunteer native Arabic speakers. Before labeling, the annotators reviewed the list of collected misinformation. Due to the substantial manual effort involved, each tweet in the dataset was labeled by exactly one annotator: tweets containing misinformation were labeled "1" and the others "0".

In total, our misinformation dataset consists of 8,786 Arabic tweets containing 36,198 unique words after pre-processing. Overall, the labeled dataset covers significant misleading and inaccurate content that circulated widely among Arabic tweeters during March and April. The number of tweets containing misinformation in April (709 tweets) was higher than in March (602 tweets). Table 4 shows general statistics about the dataset. Recall that we consider a tweet labeled "1" as (Misinformation) and a tweet labeled "0" as (Other).
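The candidate-retrieval step described above can be sketched as simple term matching; the term list and tweet fields here are illustrative, not the authors' actual query set.

```python
def retrieve_candidates(tweets, terms):
    """Return the tweets whose text mentions at least one of the
    hand-crafted misinformation-related terms (substring match)."""
    return [t for t in tweets if any(term in t["text"] for term in terms)]

# Illustrative terms: "Vitamin C" and "Sarin gas" in Arabic.
terms = ["فيتامين سي", "غاز السارين"]
tweets = [
    {"id": 1, "text": "فيتامين سي يقي من كورونا"},  # mentions Vitamin C
    {"id": 2, "text": "اغسلوا أيديكم جيدا"},        # unrelated advice
]
print([t["id"] for t in retrieve_candidates(tweets, terms)])  # → [1]
```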
From Table 4, we observe that the dataset is unbalanced: the majority class (Other) has 7,475 tweets against 1,311 for the minority class (Misinformation). The misinformation dataset is freely accessible on GitHub: https://github.com/SarahAlqurashi/COVID19-Misinformation-dataset-
Keyword (English translation) | Tracing date
Coronavirus                   | 2020-01-01
Corona                        | 2020-01-01
Wuhan                         | 2020-01-01
China                         | 2020-01-01
Outbreak                      | 2020-01-01
Mask                          | 2020-01-01
Masks                         | 2020-01-01
Sterilizers                   | 2020-01-01
Sterilization                 | 2020-01-01
Washing hands                 | 2020-01-01
Home isolation                | 2020-02-01
Pandemic                      | 2020-01-22
Epidemic                      | 2020-01-22
Home quarantine               | 2020-02-01
COVID 19                      | 2020-03-01
Curfew                        | 2020-03-15
Social distancing             | 2020-04-01
Ventilator                    | 2020-04-01
Shortness of breath           | 2020-04-01
Cough                         | 2020-04-01
Temperature                   | 2020-04-01
One and a half meters         | 2020-04-01
Quarantine activities         | 2020-04-01
Quarantine                    | 2020-04-01
Malaria medicine              | 2020-04-25
Remdesivir                    | 2020-04-25
Curfew lift                   | 2020-04-26
Partial curfew                | 2020-04-26
Active surveillance           | 2020-04-29
Active testing                | 2020-04-29

Table 2: The list of Arabic keywords (shown here by their English translations) that we used to collect the tweets.
This step transforms the pre-processed tweet texts into feature vectors. From the tokenized words of each tweet, we build its feature vector using either TF-IDF [44] or word embedding techniques [45, 46]. The concept behind each is briefly explained as follows:

• Sparse Vector Based on TF-IDF:
In this representation, the importance of a term (or n-gram) in a tweet is evaluated in relation to the whole dataset. The method gives high weights to terms that are specific to some tweets and decreases the weight of words that occur frequently across the whole dataset. It combines term frequency (TF) and inverse document frequency (IDF) and is computed using Equation 1:

TF-IDF(w_ij) = TF_ij × log(N / DF_i)    (1)

where TF_ij is the frequency of term i in tweet j, N is the total number of tweets, and DF_i is the number of tweets containing term i.
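Equation 1 can be implemented directly; the snippet below is a plain illustration of the formula (natural logarithm assumed, since the base is not stated), not the vectorizer used in the experiments.

```python
import math

def tf_idf(term, tweet_tokens, corpus):
    """TF-IDF(w_ij) = TF_ij * log(N / DF_i): TF_ij is the term's
    count in tweet j, N the number of tweets in the dataset, and
    DF_i the number of tweets that contain the term."""
    tf = tweet_tokens.count(term)
    df = sum(1 for doc in corpus if term in doc)
    return tf * math.log(len(corpus) / df) if df else 0.0

corpus = [["garlic", "cures", "corona"],
          ["garlic", "is", "food"],
          ["wash", "hands"]]
# "garlic" occurs once in the first tweet and in 2 of 3 tweets overall,
# so its weight is 1 * log(3/2); rarer terms like "corona" weigh more.
print(round(tf_idf("garlic", corpus[0], corpus), 4))  # → 0.4055
```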
Misinformation headline                      | Tweet example (English translation)
Eating garlic protects against coronavirus   | "Eating garlic helps prevent coronavirus infection and strengthens the body's immunity."
Gargling with salt protects against coronavirus | "Exposure to the sun and gargling with warm water and salt are simple ways to protect you from coronavirus."
Coronavirus is sarin gas                     | "Sarin gas is spread in the atmosphere and not coronavirus; its diffusion in the air lasts months."
Mosquitoes transmit infection                | "Trustable sources: pets and mosquitoes transfer the coronavirus."
5G networks spread the coronavirus           | (Arabic tweet text)
En In China Wuhan, Network 5G helped spread the virus in a catastrophic manner, causing a lack of oxygen in Humanbodies and falling onto the streets. China realized this, so they returned to using Network 3G in their phones
Table 3: Examples of Misinformation in tweetsMisinformation Other Total
TF_i is the number of occurrences of word i in tweet j, N is the total number of tweets, and DF_i is the number of tweets containing word i. We constructed TF-IDF vectors twice: once with unigrams and once with n-grams. In both settings, each tweet is represented by a sparse vector of dimension 5000; we used scikit-learn for the implementation.

• Word Embeddings Creation:
In natural language processing, word embeddings are techniques that map words or phrases to vectors of real numbers. Word embedding methods represent words as continuous vectors in a low-dimensional space; these vectors capture semantic relations between words, so that words with similar meanings have vectors close to each other.

Building a word embedding model on a large-scale training dataset is important for obtaining meaningful embeddings [47]. We built word vector models exploiting our whole COVID-19 dataset collected from January 2020 to April 2020. After removing retweets and duplicated tweets, we ended up with 2,821,940 tweets. We consider two prominent word embedding methods, word2vec and FASTTEXT, and adopt the pre-processing pipeline of [48] to train our models. We investigate the following two types of word embeddings in this work:

– word2vec [45]: This is probably the most widely used technique for learning word embeddings, using a shallow feed-forward neural network. To build the word2vec model, we take into account that the maximum length of a tweet is 280 characters; hence we use a small context window of size W = 3. The model was trained with the CBOW algorithm and dimension D = 200. For the remaining parameters, we set fixed values for the batch size, negative sampling, minimum word frequency, and number of iterations.

– FASTTEXT [49]: In FASTTEXT, the smallest unit is the character-level n-gram, and each word consists of a bag of character n-grams. This representation helps capture the meaning of shorter words and allows extraction of all prefixes and suffixes of a given word. For this reason, FASTTEXT has been shown to be more accurate and effective than word2vec [50]. To train the FASTTEXT model, we used a small context window and a dimension of D = 200, with fixed values for the minimum word frequency and number of iterations.

It is worth noting that very few studies have developed FASTTEXT models for the Arabic language. General-purpose models, such as the FASTTEXT model of [46], were trained on Arabic Wikipedia articles written in Modern Standard Arabic (MSA); employing such a model will not perform well on Twitter datasets. In the Arabic misinformation literature, FASTTEXT was used in an unsupervised manner to produce feature vectors in [39], and in a supervised manner to predict class labels in [38]. However, [39] provides little detail about the training set sizes and the parameters used to train its word embedding model. A recent study showed that unsupervised pre-training of FASTTEXT on domain-specific data can improve classification quality over the supervised variant, particularly when labeled data is limited [51]. Hence, we opted to employ FASTTEXT models pre-trained in an unsupervised manner in our classification models.

We use the Gensim [52] implementations of word2vec and FASTTEXT, and we follow the scheme of [53] to build the tweet-level representation for the machine learning models. Given the word2vec or FASTTEXT model, we retrieve the vector representation of each word in a tweet and average the word vectors of all words per tweet, as follows:

V_tweet = (Σ_{i=1}^{n} W_i) / n   (2)

where n is the number of words in the tweet and W_i is the embedding vector of the i-th word. This representation retains the dimensionality (D = 200) of the word embedding models. The word embedding models used in our work are made freely available.

To automatically predict COVID-19 misinformation on Arabic Twitter, we used different types of classifiers, which we present in detail in this section. The first type comprises traditional (i.e., not deep) classifiers: support vector machine (SVM), multinomial naive Bayes (NB), Extreme Gradient Boosting (XGBoost), random forest (RF), and Stochastic Gradient Descent (SGD). We used the implementations of these classifiers from the scikit-learn library [54].

The second type are deep learning models: a convolutional neural network (CNN), a recurrent neural network with bidirectional long short-term memory (RNN BiLSTM), and a convolutional recurrent neural network (CRNN). We used the implementations of these classifiers in PyTorch [55]. Each proposed deep learning model consists of an input embedding layer, a hidden layer, a dense output layer, and an activation function. The embedding layer is the first layer of our deep learning models; it creates a dense vector representation from the input text sequence, and it can be initialized with a pre-trained word embedding model or learned while training the model. We experiment with three types of embedding layers. In the first experiment, the weights of the embedding layer are initialized randomly and the embedding is learned for all words in the dataset; the second and third embedding layers are initialized with the weights of the pre-trained word2vec and FASTTEXT models, respectively. Once the embedding layer maps each text sequence to a vector representation, this representation is fed into the classifier. The dense output layer takes the number of categories as its output dimension. We used the sigmoid activation function and the cross-entropy loss function. In the following, we describe the model structures.
• Convolutional Neural Network (CNN): In the CNN model, we use a one-dimensional convolution layer with multi-scale kernels of sizes 4 and 5 and a fixed filter dimensionality of 100 for each kernel. The kernel size defines the number of words considered as the convolution passes over the word vectors, yielding different n-grams. Applying a convolution operation with one filter window over the word vectors produces a new feature map. After each convolution operation, we apply a nonlinear transformation using a Rectified Linear Unit (ReLU) [56]. The convolved result is pooled using max pooling to capture the text's most relevant features. All feature maps are then concatenated into a single fixed-length vector. Finally, we feed this vector through a fully connected layer with a 0.5 dropout rate.
• Recurrent Neural Network (RNN): The RNN model consists of one bi-directional LSTM layer. The bi-directional LSTM trains two LSTMs on the input sequence: the first examines the sequence in forward order, the second examines it backward, and their information is combined into a single representation. This helps learn a better feature representation and capture sequential patterns from both directions. The bi-directional LSTM layer is followed by a dropout layer and a fully connected layer.
• Convolutional Recurrent Neural Network (CRNN): For the final model, we combine a one-dimensional convolution layer with five bi-directional LSTM layers to create a CRNN. The model uses a multi-scale convolutional layer with kernels of sizes 4 and 5 to extract multiple feature maps from the input text; each kernel has a fixed filter dimensionality of 100. We apply a nonlinear transformation using a Rectified Linear Unit (ReLU) [56] to each feature map, and a max-pooling layer then pools them separately to extract the essential text features. The extracted features are concatenated and fed as input to the bi-LSTM layers, whose output is fed to a fully connected layer.

The word embedding models are available at: https://github.com/BatoolHamawi/COVID-19WordEmbeddings

Deep learning models have some advantages. For example, a CNN automatically selects relevant words in tweets, while the RNN-BiLSTM network captures word patterns in tweets in both directions (right to left and vice versa) and, unlike a CNN, can handle tweets of different lengths. The CRNN model combines the benefits of both networks.
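As a rough illustration of the CNN architecture described above, the following PyTorch sketch builds the multi-scale convolutional classifier (kernel sizes 4 and 5, 100 filters each, 0.5 dropout, sigmoid output). The vocabulary size and batch shapes are placeholders, and the embedding layer here is randomly initialized rather than loaded from the pre-trained word2vec/FASTTEXT weights used in the experiments.

```python
# Minimal sketch of the multi-scale CNN text classifier; vocab_size and the
# random token batch below are illustrative placeholders.
import torch
import torch.nn as nn

class TweetCNN(nn.Module):
    def __init__(self, vocab_size, embed_dim=200, n_filters=100,
                 kernel_sizes=(4, 5), dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # One 1-D convolution per kernel size (multi-scale n-gram detectors).
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, n_filters, k) for k in kernel_sizes)
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(n_filters * len(kernel_sizes), 1)

    def forward(self, x):                      # x: (batch, seq_len)
        e = self.embedding(x).transpose(1, 2)  # (batch, embed_dim, seq_len)
        # Convolve, apply ReLU, then max-pool over time for each scale.
        pooled = [torch.relu(c(e)).max(dim=2).values for c in self.convs]
        h = self.dropout(torch.cat(pooled, dim=1))
        return torch.sigmoid(self.fc(h)).squeeze(1)  # misinformation prob.

model = TweetCNN(vocab_size=3194)
probs = model(torch.randint(0, 3194, (8, 40)))  # batch of 8 tweets, 40 tokens
```

To initialize from pre-trained vectors, the `nn.Embedding` weights would be loaded via `nn.Embedding.from_pretrained` instead of learned from scratch.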
When dealing with imbalanced classification tasks, the classifier naturally becomes biased toward the majority class. One of the most common ways to address the imbalance is to change the evaluation metric to one that tells a more truthful story. Therefore, when evaluating the models' performance, we report several metrics, including the Area Under the ROC Curve (AUC), precision, recall, and F1. These measurements are briefly defined as follows:
The Area Under the ROC Curve (AUC): indicates the classifier's ability to distinguish between classes via the probability curve (ROC). Using the trapezoidal rule over the points of the ROC curve, the AUC is computed as:

AUC = Σ_i [ (TPR_i + TPR_{i-1}) / 2 ] · (FPR_i - FPR_{i-1})   (3)

where the sum runs over the thresholds of the ROC curve and TPR_i and FPR_i denote the true positive rate and false positive rate at the i-th threshold.

Precision: represents the percentage of positively classified tweets that are actually correct. The precision is mathematically expressed as follows:
Precision = TP / (TP + FP)   (4)
Recall: indicates the ability of the classifier to correctly classify all positive instances. The recall is mathematically expressed as follows:
Recall = TP / (TP + FN)   (5)

where TP is the number of tweets correctly identified as misinformation, FP is the number of tweets incorrectly identified as misinformation, TN is the number of tweets correctly identified as not misinformation, and FN is the number of tweets incorrectly identified as not misinformation.
F1 score: indicates the weighted harmonic mean of precision and recall. The F1 is mathematically expressed as follows:

F1 = 2 · (Precision · Recall) / (Precision + Recall)   (6)
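The metrics above can be computed directly with scikit-learn, which the experiments already use. The labels and scores below are toy values, not from the paper's dataset; note that AUC is computed from the classifier's scores, while precision, recall, and F1 need thresholded predictions.

```python
# Toy evaluation sketch (1 = misinformation); values are illustrative only.
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.4, 0.6, 0.1, 0.3, 0.8, 0.2]   # classifier scores
y_pred = [int(p >= 0.5) for p in y_prob]            # hard labels at 0.5

precision = precision_score(y_true, y_pred)  # TP / (TP + FP), Eq. (4)
recall    = recall_score(y_true, y_pred)     # TP / (TP + FN), Eq. (5)
f1        = f1_score(y_true, y_pred)         # harmonic mean, Eq. (6)
auc       = roc_auc_score(y_true, y_prob)    # threshold-free ranking quality
```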
First, we shuffled the data to ensure that the model is not affected by the order of the data. We randomly split the sample of 8,786 annotated tweets into training and testing sets (80:20) for the traditional classifiers; for the deep learning classifiers, the sample was randomly split into training, testing, and validation sets (60:20:20).
Classifier | Hyper-parameters
RF  | criterion: entropy, max_depth: 8, max_features: log2, n_estimators: 500, class_weight: balanced
XGB | colsample_bytree: 0.8, gamma: 2, max_depth: 5, min_child_weight: 1, subsample: 1.0
NB  | alpha: 0.5, fit_prior: True
SGD | alpha: 0.0056, l1_ratio: 0.13, loss: modified_huber, penalty: l2, max_iter: 6000, class_weight: balanced
SVC | C: 1, gamma: 1, kernel: rbf, probability: True, class_weight: balanced

Table 5: Hyper-parameter settings for traditional classifiers.
Cross-Entropy Loss:
CNN  | max_features: 3194, max_features (FASTTEXT): 26275, max_features (word2vec): 247180, dropout: 0.5, kernel sizes: 4 & 5, filters: 100, epochs: 500, batch size: 32, embedding size: 200
RNN  | max_features: 3194, max_features (FASTTEXT): 26275, max_features (word2vec): 247180, hidden nodes: 1, dropout: 0.5, epochs: 500, batch size: 32, embedding size: 200
CRNN | max_features: 3194, max_features (FASTTEXT): 26275, max_features (word2vec): 247180, hidden nodes: 5, dropout: 0.5, kernel sizes: 4 & 5, filters: 100, epochs: 500, batch size: 32, embedding size: 200

AUCPR Loss:
CNN  | max_features (FASTTEXT): 26275, max_features (word2vec): 247180, dropout: 0.5, kernel sizes: 4 & 5, filters: 100, epochs: 600, batch size: 32, embedding size: 200
RNN  | max_features (FASTTEXT): 26275, max_features (word2vec): 247180, hidden nodes: 1, dropout: 0.5, epochs: 600, batch size: 32, embedding size: 200
CRNN | max_features (FASTTEXT): 26275, max_features (word2vec): 247180, hidden nodes: 5, dropout: 0.5, kernel sizes: 4 & 5, filters: 100, epochs: 600, batch size: 32, embedding size: 200

Table 6: Hyper-parameter settings for deep learning classifiers.
Since our dataset is imbalanced, we performed a grid search to find the hyper-parameters that maximize the AUC score. In the grid-search function, we chose AUC as the scoring parameter with 5-fold cross-validation. In each fold, the model is trained on the training data with every parameter combination, and each trained model is evaluated on the validation fold to find the optimal parameters; the model trained with the optimal parameters is then applied to the test set. This procedure is repeated until the model that maximizes the AUC score is found. Table 5 shows the hyper-parameter settings for the traditional classifiers.
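A minimal sketch of this procedure using scikit-learn's GridSearchCV with AUC as the scoring metric and 5-fold cross-validation. The random-forest grid and the synthetic imbalanced data below are illustrative placeholders, not the paper's full grids or dataset.

```python
# Grid search maximizing AUC on a small synthetic imbalanced dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# ~85/15 class split stands in for the imbalanced tweet labels.
X, y = make_classification(n_samples=300, weights=[0.85], random_state=0)

grid = GridSearchCV(
    RandomForestClassifier(class_weight="balanced", random_state=0),
    param_grid={"max_depth": [4, 8], "n_estimators": [50, 100]},
    scoring="roc_auc",   # optimize AUC, as described above
    cv=5,
)
grid.fit(X, y)
best = grid.best_params_   # parameter combination with highest mean CV AUC
```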
Classifier | TF-IDF word level: Accuracy / AUC / Precision / Recall / F1 | TF-IDF n-gram: Accuracy / AUC / Precision / Recall / F1
RF  | 80.7% / 80.3% / 0.37 / 0.57 / 0.45 | 78.2% / 75.8% / 0.32 / 0.54 / 0.41
XGB | 87.5% / 80.6% / 0.73 / 0.16 / 0.26 | 87.0% / 70.9% / 0.73 / 0.10 / 0.18
NB  | 87.9% / 80.9% / 0.70 / 0.22 / 0.34 | 86.7% / 78.1% / 0.55 / 0.24 / 0.34
SGD | 79.9% / 83.3% / 0.37 / 0.68 / 0.48 | 79.4% / 78.9% / 0.35 / 0.54 / 0.44
SVC | 87.8% / 82.9% / 0.59 / 0.39 / 0.47 | 82.0% / 75.8% / 0.38 / 0.50 / 0.43

Table 7: Traditional classifier overall performance.

We trained the traditional classifiers using unigram and n-gram TF-IDF feature representations, where the n-gram setting combines bigrams and trigrams. We also report results based on the word2vec and FASTTEXT embedding methods. Table 7 shows the accuracy, AUC, precision, recall, and F1 results of the traditional classifiers for the unigram and n-gram TF-IDF representations. The SVM classifier with unigram features achieved the best accuracy of 87.8%, an AUC score of 82.9%, and an F-measure of 0.47, while the SGD classifier reached the highest recall (0.68) and the best AUC score (83.3%) with unigram features. The XGB classifier achieved the highest precision, 0.73, for both unigrams and n-grams. The results indicate that the n-gram size influences the accuracy across classifiers: all classifiers achieved their highest performance with TF-IDF unigrams.

Figure 2: Visualization of the ROC curves of traditional classifiers using (a) word2vec and (b) FASTTEXT word embedding techniques.
Classifier | word2vec: Accuracy / AUC / Precision / Recall / F1 | FASTTEXT: Accuracy / AUC / Precision / Recall / F1
RF  | 83.3% / 83.6% / 0.47 / 0.56 / 0.51 | 84.3% / 84.3% / 0.50 / 0.53 / 0.52
XGB | 86.2% / 85.4% / 0.67 / 0.25 / 0.37 | 86.8% / 85.4% / 0.72 / 0.27 / 0.39
NB  | 74.4% / 81.2% / 0.35 / 0.74 / 0.47 | 73.4% / 80.4% / 0.33 / 0.69 / 0.45
SGD | 74.0% / 81.0% / 0.34 / 0.71 / 0.46 | 73.8% / 81.4% / 0.34 / 0.74 / 0.47
SVC | 76.6% / 84.2% / 0.38 / 0.81 / 0.52 | 77.8% / 85.3% / 0.40 / 0.80 / 0.53

Table 8: Traditional classifier overall performance based on word embedding methods.

Using the word2vec and FASTTEXT word embeddings results in slightly higher AUC and F1 scores: the overall AUC increases by 1 to 5 points, as shown in Table 8. Almost all classifiers improved with the trained word embeddings, except for the SGD classifier, whose AUC score decreased by 1 to 2 points. The highest AUC score was generated by the XGB classifier with both embedding methods. The XGB classifier with FASTTEXT performs best among the traditional classifiers, reaching an AUC score of 85.4% as well as the second-best precision of 0.72 and an F1 score of 0.39, which signifies that the predictions of the XGB classifier are stronger than those of the other classifiers. It is followed by the SVC classifier, with a close AUC score of 85.3%, a recall of 0.80, and the highest F1 score (0.53) among all classifiers. The ROC curves generated by the traditional classifiers using both word embedding methods are shown in Figure 2.
We trained the deep learning classifiers using the Adam optimizer with varying learning rates, a batch size of 32, and 500 epochs to optimize the cross-entropy loss. Table 6 shows the hyper-parameter settings for the deep learning classifiers. We report the results with and without the pre-trained word embeddings. Without the pre-trained word embeddings, the accuracies of the CNN, RNN, and CRNN were 85.0%, 84.3%, and 85.3%, respectively, and the AUC score of each was 50%, as shown in Table 9: the deep learning classifiers generalize toward the majority class. With the pre-trained word2vec and FASTTEXT embeddings, the classifiers' performance increased. The CRNN improved most with the word2vec embedding, outperforming the other classifiers in AUC, precision, recall, and F1 score, whereas the CNN and RNN improved most with the FASTTEXT embedding; the CRNN showed the worst performance with the FASTTEXT embedding. The most significant improvement was in the CNN classifier's AUC score with the FASTTEXT embedding, which improved by about 37.8% compared to the previous results.

Embedding method | Classifier | Accuracy | AUC Score | Precision | Recall | F1 Score
Without pre-trained embedding | CNN  | 85.0% | 50%   | 0    | 0    | 0
Without pre-trained embedding | RNN  | 84.3% | 50%   | 0    | 0    | 0
Without pre-trained embedding | CRNN | 85.3% | 50%   | 0    | 0    | 0
FASTTEXT | CNN  | 70.0% | 68.6% | 0.75 | 0.43 | 0.54
FASTTEXT | RNN  | 83.6% | 64.3% | 1    | 0.28 | 0.44
FASTTEXT | CRNN | 85.0% | 50%   | 0    | 0    | 0
word2vec | CNN  | 85.7% | 57.1% | 1    | 0.14 | 0.25
word2vec | RNN  | 82.9% | 57.1% | 1    | 0.14 | 0.25
word2vec | CRNN | 85.3% | 64.3% | 1    | 0.29 | 0.44

Table 9: Deep learning classifier overall performance.

To handle the imbalanced dataset and further improve classifier performance, we conducted a second experiment: we trained the classifiers using the AUCPR loss function, which optimizes for AUC based on [57]. That work introduced simple building-block bounds that provide a unified framework for efficient, scalable optimization of a wide range of objectives, including directly optimizing AUC. We used the Adam optimizer with varying learning rates, a batch size of 32, and 600 epochs. Table 10 shows the deep learning classifiers' results after AUC optimization. Without the pre-trained word embeddings, the performance remarkably improved for all classifiers: the loss function improved the AUC score by about 26.8% for all classifiers, and the models were able to detect the minority class.
Embedding method | Classifier | Accuracy | AUC Score | Precision | Recall | F1 Score
Without pre-trained embedding | CNN  | 84.2% | 64.3% | 1    | 0.29 | 0.44
Without pre-trained embedding | RNN  | 81.6% | 64.3% | 1    | 0.29 | 0.44
Without pre-trained embedding | CRNN | 86.0% | 64.3% | 1    | 0.29 | 0.44
FASTTEXT | CNN  | 74.9% | 54.3% | 0.50 | 0.14 | 0.22
FASTTEXT | RNN  | 83.5% | 64.3% | 1    | 0.14 | 0.25
FASTTEXT | CRNN | 85.0% | 50%   | 0    | 0    | 0
word2vec | CNN  | 68.4% | 61.8% | 0.67 | 0.29 | 0.40
word2vec | RNN  | 84.5% | 57.1% | 1    | 0.14 | 0.25
word2vec | CRNN | 85.1% | 64.3% | 1    | 0.29 | 0.44

Table 10: Deep learning classifier overall performance after AUC score optimization.

Using the pre-trained word embeddings with the AUCPR loss function improves only some of the classifiers; word2vec improves the CNN classifier by 4.7 points. Further hyper-parameter tuning may increase performance. Table 10 shows the overall performance of the deep learning classifiers after optimizing the AUC. Among the deep learning classifiers, the CNN with FASTTEXT embeddings and cross-entropy loss achieved the best overall performance, with the highest AUC score, precision, recall, and F1 score.
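The AUC optimization in the experiments uses the bound-based losses of [57]. As a simpler, hypothetical stand-in that conveys the idea, the sketch below uses a common differentiable AUC surrogate: it penalizes positive/negative score pairs that are not ranked at least a margin apart. This is NOT the exact AUCPR loss used in the experiments, only an illustration of optimizing a ranking objective instead of cross-entropy.

```python
# Pairwise squared-hinge surrogate for AUC (illustrative, not the paper's loss).
import torch

def pairwise_auc_loss(scores, labels, margin=1.0):
    """Differentiable AUC surrogate over all positive/negative score pairs."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # diff[i, j] = score(pos_i) - score(neg_j); we want diff >= margin.
    diff = pos.unsqueeze(1) - neg.unsqueeze(0)
    return torch.clamp(margin - diff, min=0).pow(2).mean()

scores = torch.tensor([2.5, 0.3, 1.9, -0.2])
labels = torch.tensor([1, 0, 1, 0])
loss = pairwise_auc_loss(scores, labels)  # low when positives outrank negatives
```

Because every loss term involves one positive and one negative example, the majority class cannot dominate the gradient the way it does with per-example cross-entropy, which is why such losses help on imbalanced data.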
Figure 3: Visualization of the ROC curves of deep learning classifiers after AUC score optimization, using (a) word2vec and (b) FASTTEXT word embedding techniques.

Arabic is a rich and complex language with a vast vocabulary; it is also highly morphological and derivational. The complexity increases with the informal nature of social media texts. Two main forms of Arabic are present on social media: Modern Standard Arabic (MSA), used for formal writing, and Dialectal Arabic (DA), used for informal daily communication. The latter is the most common form.

Further complicating matters, Arabic has many different dialects that people use on social media. These dialects introduce many new words into the language, especially stop words [58]. Another challenge is the diacritics used in the Arabic orthographic system. Diacritics represent short vowels and clarify the meanings of words; there are thirteen diacritics in Arabic [59]. Many Arabic words have more than one meaning depending on the diacritics, such as حسب, which can mean thought or counting depending on the context. Furthermore, some Arabic words have different meanings based on context alone, like عام, which can mean public or year [59]. Most Arabic tweets are written without diacritics, and the reader is expected to infer the intended meaning; this, however, does not apply to machines. In addition, Arabic contains many grammatical rules that change the shape and meaning of words.

Despite all these challenges, the classifiers showed promising results in distinguishing COVID-19 misinformation in Arabic tweets, which means that available machine learning methods can deliver high-performing classifiers from an imbalanced dataset of tweets. Compared to deep learning, the traditional classifiers performed better, with higher AUC values. Based on the experimental results, it is evident that feature selection can be an effective technique for improving the traditional classifiers' performance.

Although the deep learning models are biased toward the majority class, their performance can be increased using pre-trained word embeddings or by optimizing the AUC score. Pre-trained word embeddings built on a disease-specific dataset can be more accurate for detecting health misinformation than generic pre-trained embeddings [60]. As the results show, all classifiers perform better with pre-trained embeddings than without them. The FASTTEXT word embedding improved all classifiers' performance except for the CRNN and NB, while word2vec improved the results for the CRNN. The key difference between word2vec and FASTTEXT is that, during the learning phase, FASTTEXT treats each word as composed of character n-grams, whereas word2vec treats the word as the smallest unit. Arabic is a morphologically rich language; in addition, social media posts such as tweets are usually informal and often contain misspellings, which creates ambiguity. For example, the word corona (كورونا) can be written with several different spellings. FASTTEXT trains its embedding vectors on subword units, which accounts for the language's morphology [46]. Therefore, FASTTEXT handles misspellings and learns meaningful representations for rare words, while word2vec ignores them. For this reason, we believe the FASTTEXT tweet vector representation is better than the word2vec representation.

The Area Under the ROC Curve (AUC) measures how well a machine learning model distinguishes between positive and negative tweets. Several studies have shown that optimizing the AUC is extremely useful when class distributions are heavily imbalanced [61, 62]. Our results confirm that maximizing the AUC score improves some classifiers' results on imbalanced datasets: it increases the classifiers' ability to recognize the minority class. Nevertheless, optimizing for AUC is difficult because it requires sorting the dataset, which makes it relatively expensive. Moreover, AUC is not continuous in the training set, so most studies optimize a differentiable variant of AUC. Many methods have been developed that directly optimize the AUC during training [63, 64]. Future studies are needed to investigate the impact of different AUC optimization techniques on detecting misinformation.

While the proposed dataset covers a diverse range of misinformation content, one limitation is that our work is restricted to tweets disseminated during March and April 2020. This is largely motivated by the fact that most Arabic-speaking countries reported their first confirmed cases of COVID-19 during March 2020; due to the lack of proper awareness and knowledge among people, false information mostly spread at the early stage of the pandemic. Experimenting with a larger dataset spanning a longer duration (e.g., five months) would be useful for extending and validating our work.
Figure 4: Learning curves for the classifiers: (a) learning curves for the XGB classifier, (b) learning curves for the SVC classifier.
With the increasing use of social media as a primary source of information, distinguishing between correct and misleading information becomes very difficult and critical, especially during the ongoing COVID-19 pandemic. Many intervention strategies for COVID-19 depend on the quality and reliability of information shared between people, and several features of social media facilitate the spread of inaccurate information among users worldwide. Identifying and combating misinformation is therefore a critical task during pandemics.

In this work, we conducted an extensive experiment using real misinformation content from Twitter. We examined different machine learning classifiers to automatically identify Arabic misinformation related to COVID-19, using an annotated dataset of 8,786 tweets and the word2vec and FASTTEXT embeddings. Our results show that using word embeddings indeed enhances classifier performance: FASTTEXT produces better results with the traditional classifiers and the CNN, while word2vec yields better results with the other deep learning classifiers. Optimizing the AUC score improved the classifiers' performance and their ability to handle imbalanced datasets. The XGB classifier was shown to be capable of accurately identifying Arabic misinformation based solely on a tweet's text, outperforming all other classifiers in terms of AUC, precision, recall, and F1. In the foreseeable future, we plan to improve the deep learning classifiers by stacking multiple layers and further optimizing the hyper-parameters, and possibly to extend the study with the recent AdaBelief optimizer [65]. Finally, we plan to consider other social networks, which will help enrich our dataset and widen its applications.
Acknowledgement
This work was supported by King Abdulaziz City for Science and Technology. Grant Number: 5-20-01-007-0033.
References

[1] A. Ghenai and Y. Mejova, "Catching zika fever: Application of crowdsourcing and machine learning for tracking health misinformation on twitter," arXiv preprint arXiv:1707.03778, 2017.
[2] S. O. Oyeyemi, E. Gabarron, and R. Wynn, "Ebola, twitter, and misinformation: a dangerous combination?," BMJ.
Proceedings of the National Academy of Sciences, vol. 113, no. 3, pp. 554–559, 2016.
[5] Peter Dizikes, "Study: on twitter, false news travels faster than true stories." https://news.mit.edu/2018/study-twitter-false-news-travels-faster-true-stories-0308, 2020. (accessed: 2020-10-12).
[6] J. Donovan, "Here's how social media can combat the coronavirus 'infodemic'," 2020.
[7] N. Persily and J. A. Tucker, Social Media and Democracy: The State of the Field, Prospects for Reform. Cambridge University Press, 2020.
[8] L. Wu, F. Morstatter, K. M. Carley, and H. Liu, "Misinformation in social media: definition, manipulation, and detection," ACM SIGKDD Explorations Newsletter, vol. 21, no. 2, pp. 80–90, 2019.
[9] M. R. Islam, S. Liu, X. Wang, and G. Xu, "Deep learning for misinformation detection on online social networks: a survey and new perspectives," Social Network Analysis and Mining, vol. 10, no. 1, pp. 1–20, 2020.
[10] S. Lewandowsky, U. K. Ecker, C. M. Seifert, N. Schwarz, and J. Cook, "Misinformation and its correction: Continued influence and successful debiasing," Psychological Science in the Public Interest, vol. 13, no. 3, pp. 106–131, 2012.
[11] M. S. Akhtar, A. Ekbal, S. Narayan, V. Singh, and E. Cambria, "No, that never happened!! investigating rumors on twitter," IEEE Intelligent Systems, vol. 33, no. 5, pp. 8–15, 2018.
[12] C. Buntain and J. Golbeck, "Automatically identifying fake news in popular twitter threads," in , pp. 208–215, IEEE, 2017.
[13] A. H. Wang, "Don't follow me: Spam detection in twitter," in , pp. 1–10, IEEE, 2010.
[14] R. M. B. Al-Eidan, H. S. Al-Khalifa, and A. S. Al-Salman, "Measuring the credibility of arabic text content in twitter," in , pp. 285–291, IEEE, 2010.
[15] N. Y. Hassan, W. H. Gomaa, G. A. Khoriba, and M. H. Haggag, "Supervised learning approach for twitter credibility detection," in , pp. 196–201, IEEE, 2018.
[16] G. Jardaneh, H. Abdelhaq, M. Buzz, and D. Johnson, "Classifying arabic tweets based on credibility using content and user features," in , pp. 596–601, IEEE, 2019.
[17] R. El Ballouli, W. El-Hajj, A. Ghandour, S. Elbassuoni, H. Hajj, and K. Shaban, "Cat: Credibility analysis of arabic content on twitter," in Proceedings of the Third Arabic Natural Language Processing Workshop, pp. 62–71, 2017.
[18] S. F. Sabbeh and S. Y. Baatwah, "Arabic news credibility on twitter: An enhanced model using hybrid features," Journal of Theoretical & Applied Information Technology, vol. 96, no. 8, 2018.
[19] R. Mouty and A. Gazdar, "The effect of the similarity between the two names of twitter users on the credibility of their publications," in , pp. 196–201, IEEE, 2019.
[20] N. Hassan, W. Gomaa, G. Khoriba, and M. Haggag, "Credibility detection in twitter using word n-gram analysis and supervised machine learning techniques," Int. J. Intell. Eng. Syst., vol. 13, pp. 291–300, 2020.
[21] S. M. Alzanin and A. M. Azmi, "Rumor detection in arabic tweets using semi-supervised and unsupervised expectation–maximization," Knowledge-Based Systems, vol. 185, p. 104945, 2019.
[22] F. Saeed, M. Al-Sarem, E. A. Hezzam, and W. M. Yafooz, "Detecting health-related rumors on twitter using machine learning methods,"
[23] Y. Leng, Y. Zhai, S. Sun, Y. Wu, J. Selzer, S. Strover, J. Fensel, A. Pentland, and Y. Ding, "Analysis of misinformation during the covid-19 outbreak in china: cultural, social and political entanglements," arXiv preprint arXiv:2005.10414, 2020.
[24] J. C. M. Serrano, O. Papakyriakopoulos, and S. Hegelich, "Nlp-based feature extraction for the detection of covid-19 misinformation videos on youtube," in Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020, 2020.
[25] L. H. X. Ng and L. J. Yuan, "Is this pofma? analysing public opinion and misinformation in a covid-19 telegram group chat," arXiv preprint arXiv:2010.10113, 2020.
[26] L. Singh, S. Bansal, L. Bode, C. Budak, G. Chi, K. Kawintiranon, C. Padden, R. Vanarsdall, E. Vraga, and Y. Wang, "A first look at covid-19 information and misinformation sharing on twitter," arXiv preprint arXiv:2003.13907, 2020.
[27] A. Mourad, A. Srour, H. Harmanani, C. Jenainatiy, and M. Arafeh, "Critical impact of social networks infodemic on defeating coronavirus COVID-19 pandemic: Twitter-based study and research directions," arXiv preprint arXiv:2005.08820, 2020.
[28] R. J. Medford, S. N. Saleh, A. Sumarsono, T. M. Perl, and C. U. Lehmann, "An 'infodemic': Leveraging high-volume Twitter data to understand public sentiment for the COVID-19 outbreak," medRxiv, 2020.
[29] C. M. Pulido, B. Villarejo-Carballido, G. Redondo-Sama, and A. Gómez, "COVID-19 infodemic: More retweets for science-based information on coronavirus than for false information," International Sociology, p. 0268580920914755, 2020.
[30] G. K. Shahi, A. Dirkson, and T. A. Majchrzak, "An exploratory study of COVID-19 misinformation on Twitter," arXiv preprint arXiv:2005.05710, 2020.
[31] L. McQuillan, E. McAweeney, A. Bargar, and A. Ruch, "Cultural convergence: Insights into the behavior of misinformation networks on Twitter," arXiv preprint arXiv:2007.03443, 2020.
[32] G. Caldarelli, R. De Nicola, M. Petrocchi, M. Pratelli, and F. Saracco, "Analysis of online misinformation during the peak of the COVID-19 pandemics in Italy," arXiv preprint arXiv:2010.01913, 2020.
[33] T. Graham, A. Bruns, G. Zhu, and R. Campbell, "Like a virus: The coordinated spread of coronavirus disinformation," 2020.
[34] E. Ferrara, "What types of COVID-19 conspiracies are populated by Twitter bots?," First Monday, 2020.
[35] K. Ding, K. Shu, Y. Li, A. Bhattacharjee, and H. Liu, "Challenges in combating COVID-19 infodemic – data, tools, and ethics," arXiv preprint arXiv:2005.13691, 2020.
[36] M. K. Elhadad, K. F. Li, and F. Gebali, "Detecting misleading information on COVID-19," IEEE Access, vol. 8, pp. 165201–165215, 2020.
[37] M. S. Al-Rakhami and A. M. Al-Amri, "Lies kill, facts save: Detecting COVID-19 misinformation in Twitter," IEEE Access, vol. 8, pp. 155961–155970, 2020.
[38] F. Alam, F. Dalvi, S. Shaar, N. Durrani, H. Mubarak, A. Nikolov, G. D. S. Martino, A. Abdelali, H. Sajjad, K. Darwish, et al., "Fighting the COVID-19 infodemic in social media: A holistic perspective and a call to arms," arXiv preprint arXiv:2007.07996, 2020.
[39] L. Alsudias and P. Rayson, "COVID-19 and Arabic Twitter: How can Arab world governments and public health organizations learn from social media?," in Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020, 2020.
[40] H. Mubarak and S. Hassan, "ArCorona: Analyzing Arabic tweets in the early days of coronavirus (COVID-19) pandemic," arXiv preprint arXiv:2012.01462, 2020.
[41] "TextBlob: Simplified text processing." https://textblob.readthedocs.io/en/dev/, 2020. (accessed: 2020-10-12).
[42] "farasapy." https://pypi.org/project/farasapy/, 2020. (accessed: 2020-10-12).
[43] S. Alqurashi, A. Alashaikh, and E. Alanazi, "Identifying information superspreaders of COVID-19 from Arabic tweets," 2020.
[44] G. Salton and C. Buckley, "Term-weighting approaches in automatic text retrieval," Information Processing & Management, vol. 24, no. 5, pp. 513–523, 1988.
[45] T. Mikolov, K. Chen, G. Corrado, and J. Dean, "Efficient estimation of word representations in vector space," arXiv preprint arXiv:1301.3781, 2013.
[46] P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, "Enriching word vectors with subword information," Transactions of the Association for Computational Linguistics, vol. 5, pp. 135–146, 2017.
[47] M. Elrazzaz, S. Elbassuoni, K. Shaban, and C. Helwe, "Methodical evaluation of Arabic word embeddings," in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 454–458, 2017.
[48] M. A. Zahran, A. Magooda, A. Y. Mahgoub, H. Raafat, M. Rashwan, and A. Atyia, "Word representations in vector space and their applications for Arabic," in International Conference on Intelligent Text Processing and Computational Linguistics, pp. 430–443, Springer, 2015.
[49] A. Joulin, E. Grave, P. Bojanowski, M. Douze, H. Jégou, and T. Mikolov, "FastText.zip: Compressing text classification models," arXiv preprint arXiv:1612.03651, 2016.
[50] A. Joulin, E. Grave, P. Bojanowski, and T. Mikolov, "Bag of tricks for efficient text classification," arXiv preprint arXiv:1607.01759, 2016.
[51] A. Agibetov, K. Blagec, H. Xu, and M. Samwald, "Fast and scalable neural embedding models for biomedical sentence classification," BMC Bioinformatics, vol. 19, no. 1, p. 541, 2018.
[52] R. Řehůřek and P. Sojka, "Software framework for topic modelling with large corpora," in Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, (Valletta, Malta), pp. 45–50, ELRA, May 2010.
[53] X. Yang, C. Macdonald, and I. Ounis, "Using word embeddings in Twitter election classification," Information Retrieval Journal, vol. 21, no. 2-3, pp. 183–207, 2018.
[54] G. Varoquaux, L. Buitinck, G. Louppe, O. Grisel, F. Pedregosa, and A. Mueller, "Scikit-learn: Machine learning without learning the machinery," GetMobile: Mobile Computing and Communications, vol. 19, no. 1, pp. 29–33, 2015.
[55] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al., "PyTorch: An imperative style, high-performance deep learning library," in Advances in Neural Information Processing Systems, pp. 8026–8037, 2019.
[56] V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in ICML, 2010.
[57] E. Eban, M. Schain, A. Mackey, A. Gordon, R. Rifkin, and G. Elidan, "Scalable learning of non-decomposable objectives," in Artificial Intelligence and Statistics, pp. 832–840, PMLR, 2017.
[58] F. R. Alharbi and M. B. Khan, "Identifying comparative opinions in Arabic text in social media using machine learning techniques," SN Applied Sciences, vol. 1, no. 3, p. 213, 2019.
[59] H. A. Almuzaini and A. M. Azmi, "Impact of stemming and word embedding on deep learning-based Arabic text categorization," IEEE Access, vol. 8, pp. 127913–127928, 2020.
[60] A. Khatua, A. Khatua, and E. Cambria, "A tale of two epidemics: Contextual word2vec for classifying Twitter streams during outbreaks," Information Processing & Management, vol. 56, no. 1, pp. 247–257, 2019.
[61] A. P. Bradley, "The use of the area under the ROC curve in the evaluation of machine learning algorithms," Pattern Recognition, vol. 30, no. 7, pp. 1145–1159, 1997.
[62] J. Sulam, R. Ben-Ari, and P. Kisilev, "Maximizing AUC with deep learning for classification of imbalanced mammogram datasets," in VCBM, pp. 131–135, 2017.
[63] Q. Wang and A. Guo, "An efficient variance estimator of AUC and its applications to binary classification," Statistics in Medicine, vol. 39, no. 28, pp. 4281–4300, 2020.
[64] M. Liu, Z. Yuan, Y. Ying, and T. Yang, "Stochastic AUC maximization with deep neural networks," arXiv preprint arXiv:1908.10831, 2019.
[65] J. Zhuang, T. Tang, S. Tatikonda, N. Dvornek, Y. Ding, X. Papademetris, and J. S. Duncan, "AdaBelief optimizer: Adapting stepsizes by the belief in observed gradients," arXiv preprint arXiv:2010.07468, 2020.