Cross-SEAN: A Cross-Stitch Semi-Supervised Neural Attention Model for COVID-19 Fake News Detection
William Scott Paka†, Rachit Bansal⋆, Abhay Kaushik‡, Shubhashis Sengupta◇, Tanmoy Chakraborty†

† IIIT-Delhi, New Delhi, India; ⋆ DTU-Delhi, India; ‡ IIT-Kanpur, India; ◇ Accenture Labs, India
{william18026, tanmoy}@iiitd.ac.in; rachitbansal [email protected]; [email protected]; [email protected]

Abstract
As the COVID-19 pandemic sweeps across the world, it has been accompanied by a tsunami of fake news and misinformation on social media. At a time when reliable information is vital for public health and safety, COVID-19 related fake news has been spreading even faster than the facts. During times such as the COVID-19 pandemic, fake news can not only cause intellectual confusion but can also place people's lives at risk. This calls for an immediate need to contain the spread of such misinformation on social media. We introduce
CTF, the first COVID-19 Twitter fake news dataset with labelled genuine and fake tweets. Additionally, we propose
Cross-SEAN, a cross-stitch based semi-supervised end-to-end neural attention model, which leverages the large amount of unlabelled data. Cross-SEAN partially generalises to emerging fake news as it learns from relevant external knowledge. We compare Cross-SEAN with seven state-of-the-art fake news detection methods. We observe that it achieves . F1 Score on
CTF, outperforming the best baseline by . We also develop Chrome-SEAN, a Cross-SEAN based chrome extension for real-time detection of fake tweets.
1. Introduction
Preprint submitted to Applied Soft Computing, February 19, 2021.

The increase in accessibility to the Internet has dramatically changed the way we communicate and share ideas. Social media consumption is one of the most popular online activities, and nowadays it is a trend to rely on such platforms for news updates. The absence of a verification barrier allows misinformation to spread unchecked on such sites. Due to the complexity of the issue, a universal definition for the term "Fake News" is non-existent. Some of the definitions used in prior work are as follows: 'A news article that is intentionally and verifiably false' [1, 2], relating to news that is deceptive in nature; 'A news article or message published and propagated through media, carrying false information regardless of the means and motives behind it', relating to various forms of false news and misinformation [3, 4, 5]. Broader definitions from the survey by Zhou et al. [6] state 'Fake news is false news' and 'Fake news is intentionally false news published by a news outlet.' For our purpose, we define a fake tweet as any tweet with information which contradicts the statements released by governmental health organisations, and genuine tweets to be the tweets obtained from their official accounts.

On 30 January 2020, the World Health Organisation (WHO) declared COVID-19 to be a Public Health Emergency of International Concern and issued a set of Temporary Recommendations. A recent study observed a 25% increase in average user social media activity due to the global lockdown [7]. UNESCO stated, "during this coronavirus pandemic, fake news is putting lives at risk." Fake news, ranging from speculations around the origin of the virus to baseless preventions and cures, is spreading rapidly without any valid evidence. WHO has recently declared the spread of COVID-19 related misinformation an 'Infodemic'; according to their definition, "An infodemic is an overabundance of information, both online and offline.
It includes deliberate attempts to disseminate wrong information to undermine the public health response and advance alternative agendas of groups or individuals." WHO, CDC (Centers for Disease Control and Prevention) and other government bodies have set up specific web pages in order to curb major misconceptions about the virus and to maintain public awareness. Any single false news item that gains enormous traction can negate the significance of a body of verified facts. When a tweet with misinformation is retweeted by an influential person or by a verified account, its marginal impact grows largely. The analysis, identification, and elimination of fake news have thus become a task of utmost importance. Therefore, there is an immediate need to detect the fake news and

(Footnote: https://en.wikipedia.org/wiki/List_of_health_departments_and_ministries)

Figure 1: An example of Origin, Propagation and Social Context of a popular misinformed Tweet. The responses to a tweet with misinformation seem to be coherent with it, and could ultimately spread it wider and deeper into follower networks. Both the tweet and the responses contradict the reliable news source.

are also exploited to spread false information. Twitter usually deletes tweets and users that are flagged post-verification; however, this is not a scalable solution for automated fake news verification.

Due to the lockdown and work-from-home conditions during the COVID-19 pandemic, Twitter witnessed a 30% rise in daily average usage. With isolation from the external world, users turn to social media platforms for any updates related to the pandemic. With incomplete knowledge, users tend to retweet content which may not be totally accurate. At the beginning of the pandemic, very limited information was available to the public on the realities of the virus. Even verified users such as Elon Musk tweeted stating that "Kids are essentially immune", citing statistics in which there were no infected people below the age of 19.
Public health experts later released a statement debunking his claim. Due to the scarcity of reliable information sources, multiple fact-checking sites depend on statements released by public health bodies. Although a few users tweet and retweet false content without any ill-intention, there exist users who create and spread false news for political gains. Diffusion of fake tweets and genuine tweets varies in a pandemic setting such as this [12]. Tweeting a political tweet with false information multiple times from several accounts with various trending hashtags, called 'hashtag hijacking', is also observed.

Figure 2: Correlations between (a) user features and (b) tweet features for genuine and fake tweets. (a) Friend count vs follower count for users tweeting genuine and fake tweets (plotted across 500 samples for each class). (b) Favourite count vs retweet count of users posting genuine and fake tweets (plotted across 250 samples for each class). In (b), note that a large number of samples are present close to the origin.

Fig. 2(b) shows the count of favourites and retweets for both genuine and fake tweets, whereas Fig. 2(a) shows the friend and follower counts of users posting genuine and fake tweets. We can clearly observe from Fig. 2(b) that genuine tweets tend to have a higher favourite count compared to retweet count, whereas fake tweets tend to have a higher retweet count, propagating the false information to a wider range. We can also observe from Fig. 2(a) that users posting genuine content have a higher number of friends than followers, and users posting fake content have a higher number of followers than friends – this setting again allows the spread of fake news towards larger audiences through the users posting fake content.

The rest of our paper is organised as follows: we discuss related works on fake news detection and semi-supervised models for text classification in Section 2.
Section 3 describes our four-stage dataset collection and annotation process, and further analysis of this dataset on various feature aspects is shown in Section 4. The proposed Cross-SEAN model architecture and training strategies are introduced in Section 5, while its evaluation and a detailed ablation study are presented in Section 6. For real-time usage of Cross-SEAN, the developed chrome extension, Chrome-SEAN, and the user study are described in Section 7. Finally, the paper is concluded with discussions on the shortcomings and future work in Section 8.

Our contribution I:
CTF - A COVID-19 fake news dataset and its analysis.
With the aforementioned concerns, it is evident that more research is required to detect and neutralise fake tweets and keep users warned. Although research communities are interested in working on the challenging task of COVID-19 fake news detection, one of the pressing issues of our time, the absence of a publicly available labelled COVID-19 misinformation dataset is a major bottleneck in designing automated detection models. Also, not everyone possesses the resources to collect such a dataset, as the process is cumbersome. We fill this gap by introducing
CTF, the first COVID-19 Twitter fake news dataset, consisting of a mixture of both labelled and unlabelled tweets. Our dataset contains a total of . K labelled tweets, among which . K are labelled as 'genuine' and . K as 'fake'. In addition, it contains . M unlabelled tweets, which can be used to enrich the diversity of the dataset in terms of linguistic and contextual features in general. A detailed analysis of the dataset unfolds many interesting observations. E.g., fake news content tends to – (i) accompany fewer URLs and more multimedia content, (ii) receive far fewer likes and retweets, and (iii) exhibit mostly neutral and negative sentiment, as compared to genuine content. Our dataset collection is a four-stage process, starting from hydration of tweets, through collection of supporting statements and the use of fine-tuned Transformer models such as BERT and RoBERTa, to manual annotation. As COVID-19 is an emerging topic, we rely on certain governmental health organisations and fact-checking sites such as PolitiFact, Snopes, TruthOrFiction, etc., which release statements on widely popular misconceptions. We then match tweets against the collected facts using BERT and RoBERTa to identify supporting or contradicting claims, which are then partially annotated. The major part of our genuine tweets is taken from governmental health organisations.

Our contribution II: Cross-SEAN.
Two major issues in any fake news detection task are the lack of labelled data to train a deep neural model and the inability to detect fake news that differs from the training data (emerging fake news). To address these issues, we propose Cross-SEAN, a cross-stitch based semi-supervised attention neural model. Cross-SEAN works in a semi-supervised way, leveraging the vast unlabelled data to learn the writing style of tweets in general. It considers user metadata, tweet metadata, and external knowledge in addition to tweet text as its inputs. External knowledge is collected on the fly in the form of stances close to tweets from trusted domains, and allows Cross-SEAN not to be restricted to the training data, as external knowledge can contain information which is absent from the training data, partially helping with early detection. When multiple inputs are involved, simple concatenation of layers might undermine some inputs' significance to the model. We employ a cross-stitch mechanism, which provides a way to find the optimal combination of model parameters used to pass the inputs to various sections of the network. Attention mechanisms have the ability of 'attending to' particular parts of the input when processing the data, allowing Cross-SEAN to represent the words which are being concentrated on for a given tweet text.

We compare Cross-SEAN with seven state-of-the-art models for fake news detection. Experimental results show that Cross-SEAN achieves . F1 Score on
CTF, outperforming seven baselines by at least . We show a comparative evaluation of the baselines with Cross-SEAN on various features and present a thorough ablation study of Cross-SEAN to understand the importance of different features and various components of the objective function.

Our contribution III: Chrome-SEAN.
For easy and real-time usage by Twitter users, we finally introduce a chrome extension, called Chrome-SEAN, which uses Cross-SEAN to classify a tweet while on the tweet page. To evaluate Chrome-SEAN, we collect feedback from human subjects. We further perform online learning conditioned on the feedback and the confidence of the model. The extension is deployed and configured to handle concurrent requests.

In summary, our major contributions are four-fold:
• CTF, the first labelled COVID-19 misinformation dataset.
• Cross-SEAN, a model to curb COVID-19 fake news on Twitter. It is one of the few semi-supervised models introduced for the task of fake news detection.
• Detailed analyses of the dataset to unfold the underlying patterns of COVID-19 related fake tweets.
• Chrome-SEAN, a chrome extension to flag COVID-19 fake news on Twitter.
Reproducibility:
We have made the code and a part of CTF publicly available at https://github.com/williamscott701/Cross-SEAN. The complete dataset will be made public upon acceptance of the paper. Section 6 describes the settings needed to reproduce the results.
2. Related Work
As our work revolves around fake news and semi-supervised learning, we present the related work in two parts: (i) fake news detection, and (ii) text-based semi-supervised learning. Due to the abundance of literature in both these areas, we focus our attention on those studies which we deem pertinent to the current work.
Fake news detection:
Fake news or misinformation on social media has gained a lot of attention due to the exponential growth in social media usage. Some early studies tried to detect fake news on the basis of linguistic features of the text [13, 14, 15]. A group of recent approaches used temporal linguistic features with recurrent neural networks (RNNs) [16] and modified RNNs [17, 18] to detect fake news. Hybrid approaches by Kwon et al. [19] combined user, linguistic, structural and temporal features for fake news classification. Lately, convolutional networks have been adopted along with recurrent networks to detect fake news [20, 21]. [22] used graph convolutional networks and Transformer-based encodings for the task of rumour detection in tweets, leveraging the structural and graphical properties of a tweet's propagation along with its text. Since satire can also lead to the spread of misinformation, Rubin et al. [23] proposed a classification model using 5 features to identify satire and humour news. Another study focused on detecting fake news using n-gram analysis through the lenses of different feature extraction methods [24]. Granik and Mesyura [25] detected fake news using a Naive Bayes classifier and also suggested potential avenues to improve their model. [26] proposed a combination of text mining techniques and supervised artificial intelligence algorithms for the task of fake news detection, showing that the best mean values in terms of accuracy, precision, and F-measure were obtained with the Decision Tree algorithm. Apart from textual features, visual features have also been employed for fake news detection. [27] proposed a similarity-aware fake news detection method which utilises multi-modal data for effective fake news detection. On similar lines, [28] developed a clickbait video detector, clickbait being another prevalent form of online false content.
Despite the success of supervised models, news spreads on social media at very high speed when an event happens, and only very limited labelled data is available in practice for fake news detection. Some studies, such as [29, 30], have revolved around weakly supervised learning for fake news detection. In similar directions, Yu et al. [31] used constrained semi-supervised learning for social media spammer detection, while Guacho et al. [32] used tensor embeddings to form a semi-supervised model for content-based fake news detection. [33] proposed a two-path deep semi-supervised learning method for timely detection of fake news, verified on two datasets with effective results. [34] analysed credible web sources and proposed a reality parameter for effective fake news prediction. Many recent studies [35, 36, 6] have provided extensive literature surveys investigating datasets, features and models along with potential future research prospects for fake news detection.
Semi-supervised models for text classification:
Semi-supervised learning (SSL) has proved to be powerful for leveraging unlabelled data when we lack the resources to create a large-scale labelled dataset. Prior research on semi-supervised learning can broadly be divided into three classes – multi-view, data augmentation and transfer learning [37]. The objective of multi-view approaches is to use multiple views of labelled as well as unlabelled data. Johnson and Zhang [38] obtained multiple views for text categorisation by learning embeddings of small text regions from unlabelled data and integrating them into a supervised model. Gururangan et al. [39] and Chen et al. [40] leveraged variational autoencoders in the form of sequence-to-sequence modelling for text classification and sequential labelling. Data augmentation approaches involve augmenting either the features or the labels. Nigam et al. [41] classified text using a combination of Naive Bayes and Expectation Maximisation algorithms and demonstrated substantial performance improvements. Miyato et al. [42] brought adversarial and virtual adversarial training to the text domain by applying perturbations to the word embeddings. Chen et al. [43] introduced MixText, which combines labelled, unlabelled and augmented data for the task of text classification; they interpolated text in hidden space using Mixup [44] to create a large number of augmented training samples. Xie et al. [45] used advanced augmentation methods (RandAugment and back-translation) to effectively noise unlabelled examples. Transfer learning approaches aim to initialise task-specific model weights with the help of weights pre-trained on auxiliary tasks. Dai and Le [46] used a sequence autoencoder, which reads the input sequence into a vector and predicts the input sequence again, to use unlabelled data for improving sequence learning with recurrent networks.
Amir and Erik [47] employed a semi-supervised model based on the combined use of random projection scaling and support vector machines to perform reasoning on a knowledge base, showing significant improvements in emotion recognition and polarity detection tasks over the then state-of-the-art methods. Howard et al. [48] proposed Universal Language Model Fine-tuning (ULMFiT), which has proved to be an effective transfer learning method for various NLP tasks. Both studies [48, 46] showed improvements in text classification performance using transfer learning.

Most of the aforementioned methods for fake news detection are tested on datasets with a high volume of labelled data. Moreover, when multiple features are considered, their optimal combination is not explored. There is no published work related to COVID-19 fake news detection. We strive to address these issues by first introducing the novel
CTF dataset and then leveraging the unlabelled data in order to reduce the vast dependency on labelled data in our proposed Cross-SEAN model. We also employ cross-stitch for the optimal combination of inputs into various sections of the model and show interesting analysis.
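The cross-stitch mechanism referred to above learns a linear combination of activations from parallel branches of a network. A minimal NumPy sketch, not the authors' implementation: the 2x2 mixing matrix `alpha` is a learnable parameter in practice and is fixed here only for illustration.

```python
import numpy as np

def cross_stitch(x_a, x_b, alpha):
    """Mix the activations of two parallel subnetwork branches.

    x_a, x_b : (D,) activation vectors from the two branches.
    alpha    : (2, 2) mixing matrix; alpha[i, j] weights the
               contribution of branch j to the output of branch i.
    """
    stacked = np.stack([x_a, x_b])   # (2, D)
    mixed = alpha @ stacked          # (2, D) linear combination
    return mixed[0], mixed[1]

x_a, x_b = np.array([1.0, 2.0]), np.array([3.0, 4.0])
# Identity mixing leaves both branches unchanged.
out_a, out_b = cross_stitch(x_a, x_b, np.eye(2))
# A non-trivial alpha shares information across the branches.
shared_a, shared_b = cross_stitch(x_a, x_b, np.array([[0.9, 0.1],
                                                      [0.1, 0.9]]))
```

During training, `alpha` would be updated by backpropagation together with the rest of the network, letting the model decide how strongly each input stream influences the others.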
3. Dataset Collection and Annotation
In this section, we introduce our novel dataset, called
CTF (COVID-19 Twitter Fake news). The formation of this dataset underwent the four stages mentioned below.

Stage 1. Segregating COVID-19 related tweets:
Multiple COVID-19 Twitter datasets (unlabelled) have recently been made public on Kaggle and other sources; among them, we used the datasets released by [49], [50], and [51]. Alongside, there exist a few publicly available datasets containing COVID-19 related tweet IDs, released every day in chronological order. We collected the tweet IDs from [52] and [53]. Due to the hydrating process (which is time consuming) and the non-existence of fake tweets (as Twitter deletes them upon identification), the tweet IDs did not turn out to be very useful. However, we still considered them in our dataset to learn the language semantics, as explained in the subsequent sections. We also collected tweets using the Twitter API based on some predefined hashtags (e.g., 'WHO', 'covid19', 'wuhan', 'bioweapon', etc.). Since the genuineness of news correlates with the credibility of the source, we collected tweets published by the aforementioned governmental health organisations and gathered their official Twitter IDs. We extracted tweets from these accounts and considered them genuine.

Table 1: Different attributes including keywords, hashtags, and sources for statements and URLs, along with the respective number of tweets they are responsible for. The table compiles the numeric details of Section 3. Here, WHO: World Health Organisation, CDC: Centers for Disease Control, NIH: National Institute of Health, CPHO: Central Public Health Office, PHE: Public Health England, HHS: Health and Human Services.
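Hydration turns tweet IDs back into full tweet objects. Since Twitter's lookup endpoint accepts at most 100 IDs per request, hydration is done in batches; a hedged sketch of that batching (the `api.lookup_statuses` call is a placeholder for a real Twitter client and is not executed here):

```python
def batch_ids(tweet_ids, batch_size=100):
    """Split tweet IDs into batches of at most `batch_size`
    (the Twitter lookup endpoint caps each request at 100 IDs)."""
    return [tweet_ids[i:i + batch_size]
            for i in range(0, len(tweet_ids), batch_size)]

def hydrate(tweet_ids, api=None):
    """Hypothetical hydration loop; `api` stands in for a real
    Twitter client (e.g. tweepy) and may be None in a dry run."""
    tweets = []
    for batch in batch_ids(tweet_ids):
        if api is not None:
            tweets.extend(api.lookup_statuses(batch))
    return tweets

batches = batch_ids(list(range(250)))  # 3 batches: 100, 100, 50
```

In practice the loop also has to respect the API's rate limits and skip IDs of deleted tweets, which is what makes hydration of large ID lists time consuming.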
Stage 2. Collecting COVID-19 supporting statements:
There exist fact-checking sites which analyse popular news across social media and label it as fake or genuine based on verified sources. We crawled various fact-checking sites such as Snopes, PolitiFact, FactCheck and TruthOrFiction for content related to COVID-19. We extracted URLs, the content of the URLs and their corresponding labels (genuine or fake) from the fact-checking websites. To supplement this data, more genuine URLs were extracted from the Twitter accounts of the official health bodies. To increase public awareness about any widely accepted misinformation, governmental bodies across the world have set up specific web pages, which are also scraped. This stage resulted in a bulk amount of data related to content and URLs which are known to be fake/genuine and acts as the supporting statements for the next stage.
Stage 3: Filtering genuine and fake tweets:
We assumed that when a fake or genuine URL is shared, all the tweets accompanying the URL also belong to the same class, as URLs are generally added in support of the text. Based on this, a total of . K and . K tweets were labelled as fake and genuine, respectively. Although this assumption may garner some unwanted noise, since a tweet might contradict the opinion presented in the referred URL, on manual inspection we found that this assumption surprisingly held true for most of the cases, as elaborated in the next section. In addition, all the tweets posted by governmental health organisations related to COVID-19 with the specific hashtags mentioned above form a majority of our genuine data. This is based on the assumption that such health organisations post content which either curbs fake news or is genuine in itself. We gathered K genuine tweets via this method. Next, we used the pre-processed tweet texts with two Transformer models, BERT [54] and RoBERTa [55], to populate the dataset further. Several studies [56, 57, 58] have proposed numerous improved methods for sentiment analysis on the ever-growing online social data; Basiri et al. [58] achieved state-of-the-art results on long review and short tweet polarity classification. BERT is used to generate embeddings of both the tweet text and the collected supporting statements, and the cosine distance is computed with a high threshold of . to label the tweet as genuine or fake based on the polarity. This step resulted in . K tweets labelled as fake. For RoBERTa, we used the version fine-tuned on the Stanford Natural Language Inference (SNLI) Corpus [59]; this allowed us to take in a pair of sentences and check whether they are contradicting, neutral or entailing. We formed pairs of tweets and supporting statements to identify genuine or fake tweets based on the contradiction and entailment results. This approach gave us an extensive set of . K fake tweets.

Stage 4: Human annotation:
We performed manual verification of a part of the , labelled tweets ( , fake, , genuine) obtained from Stage 3. We employed three human annotators, who are experts in social media and have significant expertise in fact verification, to verify the labels. The annotators ended up annotating , tweets ( fake and genuine) with an inter-annotator agreement of . (Krippendorff's α), with the following instructions provided:

• A tweet is considered to be 'fake' if and only if:
  – It contradicts or undermines facts from a pre-defined list. Note that a combined list was made from the aforementioned genuine sources.
  – It supports or elevates a commonly identified misinformation.
  – It is written in the form of sarcasm or humour, but promotes a misleading statement.
• Other tweets which do not satisfy any of the above would be either unlabelled or genuine, as per the annotator's discretion.
• If the tweet text in itself does not provide enough context to annotate with confidence, the annotators could refer to the tweet and user features.

On further observation, it was found that an average of labels given by the automated techniques from Stage 3 matched the labels given by the human annotators for , samples. Thus, despite using a fully-automated and fast annotation pipeline, which allowed us to have a relatively large labelled corpus, only a noise of exists. During cross-validation, we use 20% of the human-verified tweets for testing, and the remaining 80% of the tweets, along with the unverified tweets, constitute the training set. We maintain the same distribution of fake and genuine tweets present in the entire dataset in both the training and test sets.

Figure 3: (a), (b) and (c) show the distribution of hashtags, sentiment and likes across the tweets, respectively.
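The embedding-similarity labelling of Stage 3 can be sketched as follows. This is an illustrative reconstruction rather than the authors' code: toy vectors stand in for Sentence-BERT embeddings, and the threshold value is a placeholder, since the exact value is elided in this copy of the paper.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def label_tweet(tweet_emb, statements, threshold=0.9):
    """Assign the label of the most similar supporting statement if
    its cosine similarity clears a high threshold; otherwise leave
    the tweet unlabelled (return None).

    statements: list of (embedding, label) pairs,
                label in {'genuine', 'fake'}.
    """
    best_sim, best_label = -1.0, None
    for emb, label in statements:
        sim = cosine(tweet_emb, emb)
        if sim > best_sim:
            best_sim, best_label = sim, label
    return best_label if best_sim >= threshold else None

# Toy embeddings standing in for Sentence-BERT vectors.
stmts = [(np.array([1.0, 0.0, 0.0]), 'fake'),
         (np.array([0.0, 1.0, 0.0]), 'genuine')]
lab = label_tweet(np.array([0.99, 0.05, 0.0]), stmts)  # close to the fake statement
```

A high threshold keeps precision up at the cost of leaving many tweets unlabelled, which is acceptable here because the unlabelled pool is used anyway for semi-supervised training.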
4. Dataset Analysis

Presence of hashtags:
Hashtags have long been an important tool on Twitter to organise, sort, follow and spread tweets. Our dataset consists of a total of and , unique hashtags in genuine and fake tweets, respectively. We tabulate the distribution of hashtags for tweets in Fig. 3(a). It is evident that '

Presence of URLs:
To account for the prevalence of misinformation, we analyse the URLs present in our entire dataset. A total of , genuine and , fake tweets contain URLs, with . and . URLs per genuine and fake tweet, respectively. The contrast between the numbers may suggest that, in general, genuine tweets have a higher tendency of supporting their claims.

(Footnote: It may plant some noise in the training set, which a sophisticated classifier should ignore while being trained.)
Presence of multimedia:
Twitter supports three types of media formats in a tweet – photo (P), video (V) and GIF (G). However, it supports only one type of media in a particular tweet, with a limit of four photos and only one video/GIF. In our dataset, fake tweets contain a total of , media files (2036P, 381V, 74G) across , tweets, with an average of . per tweet, while genuine tweets contain , media files (1129P, 339V, 5G) with an average of . .

Sentiment of tweets:
To obtain an overall sense of public opinion related to COVID-19, we analyse the sentiment of the tweets [60] using the textblob tool. Fig. 3(b) shows that in the highly negative (-2) and neutral (0) sentiment zones, fake news is grouped more than genuine news. The average sentiment polarity for fake tweets is . compared to . for genuine tweets, on a scale of -2 to 2, as shown in Fig. 3(b).

Likes and retweets:
The existing propagation-based approaches [25, 61] showed the significance of likes and retweets for fake news detection. The average number of likes per genuine tweet is found to be . , which is significantly higher than that ( . ) of a fake tweet. The tweet-wise data of likes is summarised in Fig. 3(c). The large number of tweets from popular public health organisations explains the higher average likes per genuine tweet. About of fake tweets in our dataset are retweets of some other tweet, of the fake retweets are quoted with comments, and of genuine tweets are retweets, with 8% of them being retweets with comment.

Visual representations:
We show t-SNE visual representations of labelled and unlabelled tweets on tweet text, tweet features and user features in Figs. 4 and 5. Figs. 4(a) and 5(a) show the tweet text representations for labelled and unlabelled data, respectively; Sentence BERT is used to convert the tweet text to vector form. While the overlap of genuine and fake tweets can be observed in Fig. 4(a), the polarisation of topics can be observed in the unlabelled data in Fig. 5(a). Certain user features and tweet features are identified and are mentioned in Section 5.1; these are in turn used for the visualisations on labelled and unlabelled data in Figs. 4 and 5, respectively. The polarisation across Fig. 5 supports that in Fig. 5(a). The labelled representations show a high non-linear overlap, indicating the complexity of the classification task.

(Footnote: https://textblob.readthedocs.io/en/dev/)

Figure 4: (a), (b) and (c) show the t-SNE visual representations of tweet text, tweet features and user features of the labelled data, respectively. Here, TF → Tweet Features, UF → User Features and L → Labelled Data.

Figure 5: (a), (b) and (c) show the t-SNE visual representations of tweet text, tweet features and user features of the unlabelled data, respectively. Here, TF → Tweet Features, UF → User Features and UL → Unlabelled Data.
5. Cross-SEAN: Our Proposed Method
In this section, we describe Cross-SEAN for fake news detection. We explain the individual components of the model, followed by the training strategy. Fig. 6 shows the architecture of Cross-SEAN.

Monti et al. [62] showed that content, social context or propagation in isolation is insufficient for neural models to detect fake news. Hence, we employ additional features related to both the users and the tweets along with the content of the tweets. For the tweet features (TFs), we consider the attributes available in the tweet object and some handcrafted features from the tweet, amounting to a total of 10 features – number of hashtags, number of favourites, number of retweets, retweet status, number of URLs present, average domain score of the URLs, number of user mentions, media count in the tweet, sentiment of the tweet text, counts of various part-of-speech tags and counts of various linguistic sub-entities. Polarisation of users around similar beliefs is widely observed on Twitter [63]. To capture this, we extract 8 features for each corresponding user (UFs) – verified status, follower count, favourites count, number of tweets, recent tweets per week, length of description, presence of URLs and average duration between successive tweets.

These features can provide additional information on user characteristics and activities. They not only help the model identify bots and malicious fake accounts, but also help recognise patterns amongst users who post false and unverified information.

On visualising the tweet and user features on labelled and unlabelled data in Figs. 4 and 5, we observe the formation of clusters of similar tweets, indicating the polarity of the

(Footnote: Cross-SEAN: Cross-Stitch based Semi-Supervised End-to-End Attention Neural Network.)
(Footnote: k (=10, by default) sentences are retrieved for the tweet.)

In addition to this, we make use of the large amount of unlabelled data ( . M) available in
CTF:
• We use one half of the unlabelled data to fine-tune the word embeddings used to encode the tweet text. We expect this to help the model learn the linguistic, syntactic and contextual composition of not only general Twitter data but also the domain data, i.e., the COVID-19 pandemic in the case of
CTF.
• We leverage the other half of the unlabelled data for unsupervised training using an additional adversarial loss. Experimental results presented in Section 6.2 show that doing this reduces stochasticity and makes the model more robust owing to the nature of adversarial training.

Figure 6: A schematic diagram of Cross-SEAN.
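The adversarial loss in the second bullet perturbs inputs in the direction that most increases the loss. A minimal sketch of the idea for a single logistic unit with an analytic gradient; this is illustrative only (ε is a hypothetical value), whereas Cross-SEAN applies such perturbations to word embeddings inside a larger network.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adversarial_perturbation(e, w, y, epsilon=0.1):
    """Gradient-direction perturbation of an embedding `e` for a
    logistic classifier p(y=1) = sigmoid(w . e).

    The gradient of the binary cross-entropy w.r.t. e is (p - y) * w;
    moving e along the normalised gradient by epsilon maximally
    increases the loss within an epsilon-ball.
    """
    p = sigmoid(w @ e)
    grad = (p - y) * w
    norm = np.linalg.norm(grad)
    if norm == 0.0:
        return np.zeros_like(e)
    return epsilon * grad / norm

e = np.array([1.0, -1.0])   # toy embedding
w = np.array([0.5, 0.5])    # toy classifier weights
r_adv = adversarial_perturbation(e, w, y=1)
# Training additionally on e + r_adv encourages robustness to
# worst-case perturbations of the embeddings.
```

For unlabelled data, the virtual-adversarial variant replaces the true label `y` with the model's own current prediction, which is what makes the loss usable without supervision.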
We elaborate on the various components of the model architecture and the training intricacies in the following sections.
Our entire training data is composed of labelled and unlabelled samples, denoted by $X_L$ and $X_U$ respectively. $X_L$ consists of a total of $n_L$ data points: $(x_L^1, y_L^1), (x_L^2, y_L^2), \cdots, (x_L^{n_L}, y_L^{n_L})$, where $x_L^i$ is the $i$-th tweet and $y_L^i$ is its label. $X_U$ consists of a total of $n_U$ unlabelled data points: $x_U^1, x_U^2, \ldots, x_U^{n_U}$. In both cases, each input sample $x_K^i$ (for $K \in \{L, U\}$) comprises four input sub-sets: tweet text ($x_{TT}^i$), external knowledge text ($x_{EK}^i$), tweet features ($x_{TF}^i$) and user features ($x_{UF}^i$). In each pass through our model, these four inputs are encoded separately, as described below.

Encoding textual data:
The tweet text of sequence length $N$ is represented as a one-hot vector of vocabulary size $V$. A word embedding layer $E \in \mathbb{R}^{V \times D}$ transforms the one-hot vector into a dense tensor $e \in \mathbb{R}^{N \times D}$ consisting of $(e_1, e_2, \ldots, e_N)$. These token vectors are further encoded using a Bidirectional LSTM, the forward and backward layers of which process the $N$ vectors in opposite directions.

The forward LSTM emits a hidden state $h_t^f$ at each time-step, which is concatenated with the corresponding hidden state $h_t^b$ of the backward LSTM to produce a vector $h_t \in \mathbb{R}^{2H}$:

$$h_t = h_t^f \oplus h_t^b, \quad \forall t \in [1, N] \quad (1)$$

where $H$ is the hidden size of each LSTM layer. At each layer, a final state output $f^k \in \mathbb{R}^H$ is also obtained ($\forall k \in \{f, b\}$).

At this stage, the net hidden vector $h$ containing the $N$ hidden vectors from the two LSTM layers is combined with the final state vector $f$ using attention across the hidden states:

$$v = \sum_{j=1}^{N} \alpha_j h_j; \quad \alpha_j = \mathrm{Softmax}(h_j \cdot f) \quad (2)$$

where

$$f = f^f \oplus f^b, \quad f \in \mathbb{R}^{2H} \quad (3)$$

$$h = h_1 \oplus \cdots \oplus h_N, \quad h \in \mathbb{R}^{N \times 2H} \quad (4)$$

We refer to the vector $v$ obtained after attention across the hidden states as $v_{TT}$, representing the encoded feature of the tweet text.

In addition to this, we use Sentence-BERT [54] to find a contextual embedding $e_{EK}$ of the external knowledge corresponding to each input batch. We do this considering the vast difference between our tweet text input and the external knowledge text. The $e_{EK}$ vector is then passed through a linear layer to obtain an encoded representation $v_{EK}$ of the external knowledge.

Encoding tweet and user features:
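The attention pooling of Eqs. (2)-(4) can be sketched as follows; the hidden states here are random stand-ins for what a real Bi-LSTM would produce, so only the pooling step itself is illustrated.

```python
import numpy as np

# Sketch of attention across Bi-LSTM hidden states (Eqs. 2-4): each h_j is
# weighted by its dot-product similarity with the concatenated final-state
# vector f, and the weighted sum gives the encoded tweet text v_TT.

def attention_pool(h, f):
    """h: (N, 2H) hidden states; f: (2H,) concatenated final states.
    Returns v = sum_j alpha_j * h_j, with alpha = softmax(h @ f)."""
    scores = h @ f                                 # (N,) dot products h_j . f
    scores = scores - scores.max()                 # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()  # softmax over time-steps
    return alpha @ h                               # (2H,) attended vector

rng = np.random.default_rng(0)
N, H = 5, 4                          # sequence length, per-direction size
h = rng.normal(size=(N, 2 * H))      # stand-in Bi-LSTM hidden states
f = rng.normal(size=2 * H)           # stand-in concatenated final state f^f + f^b
v = attention_pool(h, f)
assert v.shape == (2 * H,)
```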
As shown in Fig. 3, we follow a highly concurrent yet distinct mechanism to encode both tweet and user features. Firstly, $x_{TF} \in \mathbb{R}^{K_t}$ and $x_{UF} \in \mathbb{R}^{K_u}$ are passed through separate linear layers, which interpolate them to higher-dimensional dense feature vectors $v_{TF} \in \mathbb{R}^{K_T}$ and $v_{UF} \in \mathbb{R}^{K_U}$, respectively. As both $x_{TF}$ and $x_{UF}$ are handcrafted, we employ cross-stitch units, which not only allow the model to learn the best combination of inputs from both feature sets and share it across multiple layers in the network, but also introduce a common gradient flow path through the non-linear transformation. The transformation produced by the cross-stitch is as follows:

$$v'_j = \alpha_{ij} \cdot v_j + \beta_i, \quad \forall i, j \in (1, K_T + K_U) \quad (5)$$

where $\alpha_{ij}$ and $\beta_i$ denote the weights of the fully connected layer performing the cross-stitch operation. The two outputs of the cross-stitch are denoted by $v_{TU}$ and $v_{UT}$, respectively. Note that the shapes of the two vectors remain unchanged after this transformation.

Figure 7: Working of a cross-stitch unit. Here, the notation is as defined in Eq. 5. Note that the weights of the linear layers in the cross-stitch unit are initialised with a unit matrix.

Connected components in the network:
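The cross-stitch transformation of Eq. (5), with the identity initialisation noted in the caption of Fig. 7, can be sketched as:

```python
import numpy as np

# Sketch of a cross-stitch unit (Eq. 5): a fully connected layer over the
# concatenation of the two feature vectors, with the weight matrix
# initialised to the identity so that training starts from "no mixing".
# Dimensions K_T and K_U are illustrative.

K_T, K_U = 4, 3
alpha = np.eye(K_T + K_U)   # unit-matrix initialisation (Fig. 7 caption)
beta = np.zeros(K_T + K_U)

def cross_stitch(v_tf, v_uf):
    v = np.concatenate([v_tf, v_uf])     # (K_T + K_U,)
    v_prime = alpha @ v + beta           # Eq. (5)
    return v_prime[:K_T], v_prime[K_T:]  # v_TU and v_UT halves

v_tf = np.arange(K_T, dtype=float)
v_uf = np.ones(K_U)
v_tu, v_ut = cross_stitch(v_tf, v_uf)
# With identity weights, the unit is initially a pass-through:
assert np.allclose(v_tu, v_tf) and np.allclose(v_ut, v_uf)
```

As training proceeds, the off-diagonal weights of `alpha` learn how much of each feature set to mix into the other.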
We concatenate $v_{TT}$ and $v_{TU}$, which are the transformed feature vectors of the tweet text $x_{TT}$ and the tweet features $x_{TF}$, respectively. This produces $v_T = v_{TT} \oplus v_{TU}$, a concatenated representation of all textual features. This is done considering the inherent similarity of the tweet text to the tweet features over the user features. We then perform affine transformations of the three vectors, $v_T$, $v_{EK}$ and $v_{UT}$, through separate feed-forward linear layers and concatenate them to obtain the final decoded vector $v$, effectively containing transformed feature representations of all inputs. (The first letter in the subscript of $v$ denotes the feature vector assumed to contain most information from the same vector.) $v$ is then down-scaled using a fully-connected network, regularised using dropout, before finally obtaining the probability distribution across the two classes:

$$p(y \mid x; \theta) = \mathrm{Softmax}(v') = \mathrm{Softmax}(v'_T \,\|\, v'_{EK} \,\|\, v'_{UT}) \quad (6)$$

where $v'$ represents the transformed vector after it passes through the respective feed-forward sub-network, and $\theta$ represents the model parameters at the current time (from now on, we refer to the model as $f(x)$).

For training our model, we use a mixed objective function, which is a weighted sum of both supervised and unsupervised losses:

$$\mathcal{L}_{mix} = \lambda_{ML} \mathcal{L}_{ML} + \lambda_{AT} \mathcal{L}_{AT} + \lambda_{VAT} \mathcal{L}_{VAT}$$
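The weighted combination above is straightforward; a minimal sketch, with the individual losses as stand-in scalars, makes the ablation mechanics explicit:

```python
# Sketch of the mixed objective: a weighted sum of the ML, AT and VAT
# losses. The scalar inputs here are stand-ins for the losses defined in
# Eqs. (7), (9) and (13); the paper keeps all three weights at 1.

def mixed_loss(l_ml, l_at, l_vat, lam_ml=1.0, lam_at=1.0, lam_vat=1.0):
    """L_mix = lam_ml*L_ML + lam_at*L_AT + lam_vat*L_VAT."""
    return lam_ml * l_ml + lam_at * l_at + lam_vat * l_vat

assert mixed_loss(1.0, 2.0, 3.0) == 6.0
# Setting a weight to zero ablates that term (cf. Table 3):
assert mixed_loss(1.0, 2.0, 3.0, lam_vat=0.0) == 3.0
```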
The losses are as follows. (i) $\mathcal{L}_{ML}$ is the maximum likelihood loss, which minimises the divergence between the predicted and true labels. (ii) Additionally, we use the adversarial training loss $\mathcal{L}_{AT}$, which regularises model training by adding a denoising objective [42]. The goal of this training is to make the model robust to adversarial perturbations in the input. We find this especially useful for fake news detection, as it allows the model to attend to a wide spectrum of tweets with minor variations, improving generality. An adversarial signal $r_{adv}$, defined in terms of the $L_2$ norm of the gradient $g_L$ under the current model parameters, is used to perturb the word embedding inputs $e$ of $x_{TT}$, $e^* = e + r_{adv}$; note that this perturbation depends on the gradient of the output computed w.r.t. all the labelled inputs $x_L$. The $\mathcal{L}_{AT}$ objective function in Eq. 9 is given as a modification of $\mathcal{L}_{ML}$ (Eq. 7). (iii) The above two objectives require the true label of the input, and thus pertain to the labelled data only. To extend the concept of adversarial training to unlabelled data, we make use of the virtual adversarial training loss $\mathcal{L}_{VAT}$, which is also aimed at adding robustness against adversarial inputs. Just as in Eq. 9, we apply the perturbation on the word embedding $e$, except that the perturbation is now defined as in Eq. 12, where $\delta$ represents a small random perturbation vector [42], obtained using a 2nd-order Taylor series expansion followed by the power iteration method. The VAT loss is then defined as in Eq. 13. We denote $f(x) = h(E(x))$, where $E(x) \in \mathbb{R}^{N \times D}$ is the word embedding vector.
$$\mathcal{L}_{ML} = -\frac{1}{n_L} \sum_{i=1}^{n_L} y_i \log(f(x_i)) + (1 - y_i) \log(1 - f(x_i)) \quad (7)$$

$$r_{adv} = -\epsilon\, g_L / \|g_L\|_2; \quad g_L = -\nabla_{x_L} \log(f(x_L)) \quad (8)$$

$$\mathcal{L}_{AT} = -\frac{1}{n_L} \sum_{i=1}^{n_L} P + Q \quad (9)$$

where

$$P = y_i \log(h(E(x_i) + r_{adv})) \quad (10)$$

$$Q = (1 - y_i) \log(1 - h(E(x_i) + r_{adv})) \quad (11)$$

$$r_{v\text{-}adv} = \epsilon\, g / \|g\|_2; \quad g = -\nabla_x\, \mathrm{KL}[f(x) \,\|\, h(E(x) + \delta)] \quad (12)$$

$$\mathcal{L}_{VAT} = \frac{1}{n_L + n_U} \sum_{i=1}^{n_L + n_U} \mathrm{KL}[f(x_i) \,\|\, h(E(x_i) + r_{v\text{-}adv})] \quad (13)$$
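The adversarial perturbation of Eq. (8) can be illustrated on a toy logistic model; the model, its parameters and the finite-difference gradient below are all stand-ins for the network and its backpropagated gradients.

```python
import numpy as np

# Sketch of the adversarial perturbation (Eq. 8): a step of norm epsilon
# along the gradient of the negative log-likelihood w.r.t. the embedding
# input. The classifier here is a toy sigmoid, not Cross-SEAN.

def model(e, w):
    """Toy classifier on an 'embedding' e: sigmoid of a linear score."""
    return 1.0 / (1.0 + np.exp(-(e @ w)))

def grad(fn, e, eps=1e-5):
    """Numerical gradient of a scalar function fn at e."""
    g = np.zeros_like(e)
    for i in range(e.size):
        d = np.zeros_like(e)
        d[i] = eps
        g[i] = (fn(e + d) - fn(e - d)) / (2 * eps)
    return g

w = np.array([0.5, -1.0, 2.0])
e = np.array([1.0, 0.0, 0.5])   # stand-in word-embedding input, true label y=1

# Gradient of the negative log-likelihood, then a norm-epsilon step along it.
g_L = grad(lambda z: -np.log(model(z, w)), e)
epsilon = 0.1
r_adv = epsilon * g_L / np.linalg.norm(g_L)
e_adv = e + r_adv               # perturbed embedding e* = e + r_adv

assert np.isclose(np.linalg.norm(r_adv), epsilon)
# The perturbation is adversarial: it increases the loss on this example.
assert -np.log(model(e_adv, w)) > -np.log(model(e, w))
```

VAT (Eqs. 12-13) follows the same pattern, but replaces the label-dependent loss with the KL divergence between the clean and perturbed predictions, which is why it extends to unlabelled data.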
Table 2: Features used by the competing models and performance comparison on CTF (TT: Tweet Text, TF: Tweet Features, UF: User Features, UL: Unlabelled Data).

6. Experimental Setup and Results

All our experiments were performed on a single 16 GB Nvidia Tesla V-100 GPU. Our base model is a single-layer Bi-LSTM with a maximum sequence length of 128 and a hidden dimension of 512. We performed experiments with a wide range of embedding sizes, from 128 to 768, and found the best results with 300 dimensions. We initially fine-tuned the word embeddings on ∼ M unlabelled tweet texts before using them for training. We used the Adam optimiser for all our experiments with a learning rate of 0.001, $\beta_1 = 0.$, $\beta_2 = 0.$ and a decay factor of 0.5. We used dropout with $p_{drop}$ of 0.3 in all our feed-forward networks where the number of layers exceeds 2. Early stopping with a patience of was also used, along with gradient clipping with a maximum $L_2$ norm of 1. We kept $\lambda_{ML}$, $\lambda_{AT}$ and $\lambda_{VAT}$ as 1.
We compare Cross-SEAN with seven state-of-the-art methods described as follows.
MTL [65] uses a multitask learning framework by leveraging soft parameter sharing between classification (primary) and regression (secondary) tasks based on tweet text and tweet features. 1HAN and 3HAN [66] use hierarchical attention based GRU networks; 1HAN is the base version of 3HAN, while 3HAN uses 3-level hierarchical attention over words, sentences and headlines, learning in a bottom-up manner. The model of [67] uses a hierarchical structure, applying the attention mechanism at both word and sentence levels.
CSI [68] uses a three-module approach consisting of Capture, Score and Integrate, combining what the authors define as the three common characteristics of fake news, i.e., text, response and source, to identify misinformation. Furthermore, we also use dEFEND [69] as a baseline, which uses GRU-based word-level and sentence-level encoding along with a module for sentence-comment co-attention. MixText [43] is a semi-supervised approach that produces results by leveraging a large number of training samples and interpolating text in hidden space.

Table 2 shows that Cross-SEAN outperforms all the baselines by a margin of at least 6% accuracy and 9% F1 Score, with dEFEND being the best baseline.
Table 3: Results of Cross-SEAN with different variations of the mixed objective function.
Table 4: Results with various fully-connected network combinations. Here, TF$_i$ and UF$_i$ represent the $i$-th layer transposing a feature vector of tweet features and user features, respectively. Two joined cells represent a concatenated form of the respective vectors feeding as inputs to the corresponding layer.

(a) Objective functions: In Table 3, we test the performance of Cross-SEAN with different combinations of the mixed objective function, varying the values of $\lambda_{ML}$, $\lambda_{AT}$ and $\lambda_{VAT}$. A steady increase in performance can be seen as we move from a vanilla supervised training objective (only the maximum likelihood loss) to the additional semi-supervised mixed objective function.

Fig. 8 shows the variation of the different objective functions (ML, AT and VAT) individually when trained with different combinations of the mixed objective function. For instance, Fig. 8(a) shows the variation of the individual ML loss when different combinations of the net objective function are used. From Fig. 8(a), the regularisation effect of the two adversarial losses, AT and VAT, is apparent: their introduction considerably affects the individual ML loss, making it drop to a larger extent in fewer iterations. Even though the introduction of AT alone seems to make the loss curve more stochastic, the net loss is considerably lower. This is in addition to the smoothing effect observed wherever the VAT loss is included, as in Figs. 8(b) and 8(c). These two properties of the AT and VAT losses respectively motivate their joint usage, resulting in an efficient and smooth decrease of the loss and strengthening our hypothesis of leveraging unlabelled data.

Figure 8: Variation of individual loss functions of Cross-SEAN with different combinations of the mixed objective function. Panels: (a) ML loss, (b) AT loss, (c) VAT loss, (d) net loss.
This is further supported by another interesting observation when using only the AT and VAT losses for training: although, as expected, we achieve deteriorated accuracy (Table 3), the corresponding losses in Figs. 8(b)-8(c) show high consistency and smoothness. Fig. 8(d) shows the final loss curve when all three losses are used, i.e., when $\lambda_{ML} = \lambda_{AT} = \lambda_{VAT} = 1$.

(b) Model components: Fig. 4 shows the importance of different components used in Cross-SEAN, such as the cross-stitch, attention and feed-forward layers for tweet and user features. We experiment across several combinations of tweet features and user features, with concatenation and cross-stitch outputs passing through various layers, as shown in Fig. 4. We find that the best architecture uses the cross-stitch on tweet and user features, with one output of the cross-stitch combined in the early stages of the network and the other fused at a later stage. The use of attention also improves the performance of the final model.

In our initial set of experiments, the cross-stitch was introduced between the encoded representation of the tweet text, obtained after passing it through the Bi-LSTM, and a concatenated form of the tweet and user features. A considerable difference in performance is observed between the two configurations, the former being superior. We relate this to the fact that the encoded representation of the tweet text is considerably different from the additional features, while the latter are in themselves very similar. Further, since the tweet features are inherently more similar to the tweet text, the cross-stitch output corresponding to the tweet features is first concatenated to the encoded tweet text and lastly with the user features. This is also shown in Table 4, where the architecture used in the last row evidently outperforms the one in the 3rd row, which represents concatenation of the three outputs at the same level.
7. Chrome-SEAN: A Chrome Extension
Cross-SEAN is an end-to-end model which enables identification of fake tweets in real time. Keeping users warned is an important step, and easy access through the browser helps with this. In order to help users detect misinformation on Twitter in real time, we deploy Cross-SEAN as a Chrome browser extension, called Chrome-SEAN, which exposes the model's predictions while offering several other features as well.

Chrome-SEAN is built as a Chrome extension that uses jQuery to send and receive requests via a POST API. We deployed the Cross-SEAN model using Flask on our local servers, which can receive concurrent POST API requests. To handle load balancing over multiple concurrent requests, we use Redis. The server is not burdened with resource-intensive requests, and the combination of Flask and Redis performs efficient communication through APIs.

Chrome-SEAN first identifies the tweet ID from the URL while scanning Twitter, and sends it to the server using an API; it also provides the option to enter the tweet ID manually. Upon a request to Cross-SEAN, the raw data is first transformed into the necessary format and then passed through the model. The detected class, along with its confidence from the softmax layer, is returned to the extension and displayed. Fig. 9 shows the working of Chrome-SEAN in two stages. The former stage, extraction of the tweet, is performed on the browser side and is instant, whereas the latter stage, verification of the tweet, takes on average 1.2 seconds per tweet. As shown in Fig. 9, we take users' feedback on our final classification output and consider it as a true label in the extended online dataset.

https://jquery.com
https://flask.palletsprojects.com/en/1.1.x/
https://redis.io/

Figure 9: The working of Chrome-SEAN, a Chrome extension of Cross-SEAN.
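The server-side request flow can be sketched framework-agnostically; `fetch_tweet` and `model_predict` below are hypothetical stand-ins for the actual Twitter-API and Cross-SEAN calls, and the real service wraps this logic in Flask with Redis for load balancing.

```python
# Schematic of the Chrome-SEAN server-side handler: the extension POSTs a
# tweet ID; the server fetches and transforms the raw tweet, runs the
# model, and returns the detected class with its softmax confidence.
# fetch_tweet and model_predict are hypothetical stand-ins.

def handle_request(payload, fetch_tweet, model_predict):
    tweet = fetch_tweet(payload["tweet_id"])     # raw tweet data
    label, confidence = model_predict(tweet)     # forward pass of the model
    return {"tweet_id": payload["tweet_id"],
            "label": label,                      # "fake" or "genuine"
            "confidence": round(confidence, 3)}  # softmax probability

# Example with dummy stand-ins:
resp = handle_request(
    {"tweet_id": "1234"},
    fetch_tweet=lambda tid: {"text": "example tweet"},
    model_predict=lambda t: ("genuine", 0.9731),
)
assert resp == {"tweet_id": "1234", "label": "genuine", "confidence": 0.973}
```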
Additionally, we employ an online training mechanism based on users' feedback: if the feedback differs from the identified class, we check the confidence of the model, and the model is trained only if the confidence is lower than . We take special care before online training to make the model robust to attackers attempting to pollute the results.

User Study:
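The gate on online training described above can be sketched as a simple predicate; the confidence threshold below is purely illustrative, since the paper elides the exact value.

```python
# Sketch of the online-training gate: user feedback becomes a training
# signal only when it disagrees with the model's prediction AND the
# model's confidence is below a threshold, as a simple guard against
# feedback-based poisoning. THRESHOLD is a hypothetical value.

THRESHOLD = 0.7  # illustrative; the paper does not state the number

def should_retrain(predicted_label, confidence, feedback_label,
                   threshold=THRESHOLD):
    disagrees = feedback_label != predicted_label
    uncertain = confidence < threshold
    return disagrees and uncertain

assert should_retrain("fake", 0.55, "genuine") is True
assert should_retrain("fake", 0.95, "genuine") is False  # confident: ignore
assert should_retrain("fake", 0.55, "fake") is False     # agreement: ignore
```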
To date, Chrome-SEAN has been tested by 35 users. We first randomly sampled tweets from the human-annotated set of tweets which were not a part of the training set, assigned them to users and asked them to test on similar tweets, totalling tweet inputs ranging over a wide variety of sub-topics, users and timelines. It was observed that of these input tweets were made within the last days, were from new users with less than tweets, and had a retweet count of less than 10. We asked users to provide feedback on each tweet they tested with Chrome-SEAN, in accordance with the true label. We found that out of ratings were positive, i.e., deeming the prediction by Chrome-SEAN correct, resulting in an accuracy of and an F1 Score of . Such a high level of accuracy on such a diverse set of inputs depicts Cross-SEAN's ability to pick the appropriate input features when making a prediction.

8. Discussion and Conclusion

This work introduced the task of COVID-19 fake news detection on Twitter. We collected related tweets from diverse sources. Post human annotation, we proposed
CTF, the first labelled Twitter dataset consisting of COVID-19 related labelled genuine and fake tweets along with a huge set of unlabelled data. We also presented a thorough analysis to understand surface-level linguistic features.

As the amount of labelled data is limited, we made use of the vast unlabelled data to train the neural attention model in a semi-supervised fashion, since learning the semantic structure of language around COVID-19 helps the model learn better. We collected external knowledge for all the tweets by taking the most relevant stance from credible sources on the web. As fake news around COVID-19 is constantly emerging, even if the model is not trained on certain fake news topics, we assume that external knowledge from a trusted source can aid the classification. We built a neural attention model which takes various inputs for each tweet: tweet text, tweet features, user features and external knowledge. We employed cross-stitch units for optimal sharing of parameters between tweet features and user features. As tweet text and tweet features are closely related, we performed optimal sharing of information by concatenating one output of the cross-stitch early in the network and the other later. Maximum likelihood and adversarial training are used for the supervised loss, while virtual adversarial training is used for the unsupervised loss; the adversarial losses further add regularisation and robustness to the model. These components together constitute Cross-SEAN, a novel cross-stitch model which operates in a semi-supervised setting by leveraging both unlabelled and labelled data with optimal sharing across the various tweet information.

Cross-SEAN is highly effective, outperforming seven state-of-the-art models significantly. We contrasted the features used by the baseline models with those of Cross-SEAN and reported various metrics.
We presented a thorough ablation study with various fully-connected network combinations of the model and the respective accuracies, contrasting the importance of the individual components of the model. We also showed the variation of the individual loss functions under different configurations of the mixed objective function.

To make Cross-SEAN usable in real time by general users, we developed Chrome-SEAN, a Chrome extension based on Cross-SEAN to flag fake tweets, which showed reasonable performance in a small-scale user study. Chrome-SEAN is built to robustly handle a vast number of concurrent requests. We introduced several features to Chrome-SEAN which can further help collect labelled data using user feedback; Cross-SEAN additionally trains in an online fashion on a given feedback instance if the confidence of the model is low.
Shortcomings of Cross-SEAN:
We observe the following shortcomings of Cross-SEAN:
• The nature of language used on micro-blogging sites such as Twitter at times makes the external knowledge noisy. Oftentimes, a few trusted news sources on the Internet are biased on political topics, which in turn creates bias in the external knowledge.
• Although external knowledge adds information available at test time, helping with emerging fake news, it cannot promise complete robustness or early detection.
• Although the tweet features, user features and external knowledge are applicable to general fake news, Cross-SEAN is a model specifically tuned for COVID-19 fake news and has not been tested on general fake news on Twitter.
Future work:
We plan to improve on the following points:
• We intend to study the dynamic graph structure of the follower-followee and tweet-retweet networks, and extract representations from tweet and user nodes to help early detection of COVID-19 fake news.
• We will add improved filters to the process of extracting external knowledge to remove possible bias and noise.
• We will work towards the explainability of Cross-SEAN using the current structures of the attention mechanism.
• We plan to incorporate semantic information from other forms of media, such as images, GIFs or videos, that could be available in the tweet. Even the textual information present in such media will be extracted and used for detection.
References

[1] H. Allcott, M. Gentzkow, Social media and fake news in the 2016 election, Journal of Economic Perspectives 31 (2) (2017) 211–236.
[2] K. Shu, A. Sliva, S. Wang, J. Tang, H. Liu, Fake news detection on social media: A data mining perspective (2017). arXiv:1708.01967.
[3] N. Kshetri, J. Voas, The economics of "fake news", IT Professional 19 (6) (2017) 8–12.
[4] A. Kucharski, Study epidemiology of fake news, Nature 540 (7634) (2016) 525–525.
[5] J. Golbeck, M. Mauriello, B. Auxier, K. H. Bhanushali, C. Bonk, M. A. Bouzaghrane, C. Buntain, R. Chanduka, P. Cheakalos, J. B. Everett, et al., Fake news vs satire: A dataset and analysis, in: Proceedings of the 10th ACM Conference on Web Science, 2018, pp. 17–21.
[6] X. Zhou, R. Zafarani, A survey of fake news: Fundamental theories, detection methods, and opportunities, ACM Computing Surveys (CSUR) 53 (5) (2020) 1–40.
[7] K.-C. Yang, C. Torres-Lugo, F. Menczer, Prevalence of low-credibility information on twitter during the covid-19 outbreak, arXiv preprint arXiv:2004.14484.
[8] P. N. Howard, G. Bolsover, B. Kollanyi, S. Bradshaw, L.-M. Neudert, Junk news and bots during the US election: What were Michigan voters sharing over Twitter, CompProp, OII, Data Memo.
[9] K. H. Jamieson, J. N. Cappella, Echo Chamber: Rush Limbaugh and the Conservative Media Establishment, Oxford University Press, 2008.
[10] J. H. Harvey, W. J. Ickes, R. F. Kidd, New Directions in Attribution Research: Volume 1, Vol. 1, Psychology Press, 2018.
[11] M. Fisher, Syrian hackers claim AP hack that tipped stock market by $136 billion. Is it terrorism?, Washington Post 23.
[12] S. Masud, S. Dutta, S. Makkar, C. Jain, V. Goyal, A. Das, T. Chakraborty, Hate is the new infodemic: A topic-aware modeling of hate speech diffusion on twitter, arXiv preprint arXiv:2010.04377.
[13] C. Castillo, M. Mendoza, B. Poblete, Information credibility on twitter, in: Proceedings of the 20th International Conference on World Wide Web, 2011, pp. 675–684.
[14] V. Qazvinian, E.
Rosengren, D. R. Radev, Q. Mei, Rumor has it: Identifying misinformation in microblogs, in: Proceedings of the Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2011, pp. 1589–1599.
[15] A. Gupta, P. Kumaraguru, C. Castillo, P. Meier, Tweetcred: Real-time credibility assessment of content on twitter, in: International Conference on Social Informatics, Springer, 2014, pp. 228–243.
[16] J. Ma, W. Gao, P. Mitra, S. Kwon, B. J. Jansen, K.-F. Wong, M. Cha, Detecting rumors from microblogs with recurrent neural networks, in: IJCAI, AAAI Press, 2016, pp. 3818–3824.
[17] T. Chen, X. Li, H. Yin, J. Zhang, Call attention to rumors: Deep attention based recurrent neural networks for early rumor detection, in: Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, 2018, pp. 40–52.
[18] S. R. Sahoo, B. Gupta, Multiple features based approach for automatic fake news detection on social networks using deep learning, Applied Soft Computing 100 (2021) 106983. doi:10.1016/j.asoc.2020.106983.
[19] S. Kwon, M. Cha, K. Jung, Rumor detection over varying time windows, PloS One 12 (1).
[20] Y. Liu, Y.-F. B. Wu, Early detection of fake news on social media through propagation path classification with recurrent and convolutional networks, in: Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[21] R. K. Kaliyar, A. Goswami, P. Narang, S. Sinha, FNDNet – a deep convolutional neural network for fake news detection, Cognitive Systems Research 61 (2020) 32–44. doi:10.1016/j.cogsys.2019.12.005.
[22] B. Malhotra, D. K. Vishwakarma, Classification of propagation path and tweets for rumor detection using graphical convolutional networks and transformer based encodings, in: 2020 IEEE Sixth International Conference on Multimedia Big Data (BigMM), IEEE, 2020, pp. 183–190.
[23] V. L. Rubin, N. Conroy, Y. Chen, S. Cornwell, Fake news or truth?
Using satirical cues to detect potentially misleading news, in: Proceedings of the Second Workshop on Computational Approaches to Deception Detection, 2016, pp. 7–17.
[24] H. Ahmed, I. Traore, S. Saad, Detecting opinion spams and fake news using text classification, Security and Privacy 1 (1) (2018) e9.
[25] M. Granik, V. Mesyura, Fake news detection using naive bayes classifier, in: 2017 IEEE First Ukraine Conference on Electrical and Computer Engineering (UKRCON), IEEE, 2017, pp. 900–903.
[26] F. A. Ozbay, B. Alatas, Fake news detection within online social media using supervised artificial intelligence algorithms, Physica A: Statistical Mechanics and its Applications 540 (2020) 123174.
[27] X. Zhou, J. Wu, R. Zafarani, SAFE: Similarity-aware multi-modal fake news detection, in: Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, 2020, pp. 354–367.
[28] D. Varshney, D. K. Vishwakarma, A unified approach for detection of clickbait videos on youtube using cognitive evidences, Applied Intelligence 1–22.
[29] S. Helmstetter, H. Paulheim, Weakly supervised learning for fake news detection on twitter, in: 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), IEEE, 2018, pp. 274–277.
[30] G. Gravanis, A. Vakali, K. Diamantaras, P. Karadais, Behind the cues: A benchmarking study for fake news detection, Expert Systems with Applications 128 (2019) 201–213.
[31] D. Yu, N. Chen, F. Jiang, B. Fu, A. Qin, Constrained nmf-based semi-supervised learning for social media spammer detection, Knowledge-Based Systems 125 (2017) 64–73.
[32] G. B. Guacho, S. Abdali, N. Shah, E. E. Papalexakis, Semi-supervised content-based detection of misinformation via tensor embeddings, in: 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), IEEE, 2018, pp. 322–325.
[33] X. Dong, U. Victor, L.
Qian, Two-path deep semisupervised learning for timely fake news detection, IEEE Transactions on Computational Social Systems.
[34] D. K. Vishwakarma, D. Varshney, A. Yadav, Detection and veracity analysis of fake news via scrapping and authenticating the web search, Cognitive Systems Research 58 (2019) 217–229.
[35] P. Meel, D. K. Vishwakarma, Fake news, rumor, information pollution in social media and web: A contemporary survey of state-of-the-arts, challenges and opportunities, Expert Systems with Applications (2019) 112986.
[36] A. Bondielli, F. Marcelloni, A survey on fake news and rumour detection techniques, Information Sciences 497 (2019) 38–55.
[37] D. S. Sachan, M. Zaheer, R. Salakhutdinov, Revisiting lstm networks for semi-supervised text classification via mixed objective function, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, 2019, pp. 6940–6948.
[38] R. Johnson, T. Zhang, Semi-supervised convolutional neural networks for text categorization via region embedding, in: Advances in Neural Information Processing Systems, 2015, pp. 919–927.
[39] S. Gururangan, T. Dang, D. Card, N. A. Smith, Variational pretraining for semi-supervised text classification, in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Florence, Italy, 2019, pp. 5880–5894. doi:10.18653/v1/P19-1590.
[40] M. Chen, Q. Tang, K. Livescu, K. Gimpel, Variational sequential labelers for semi-supervised learning, in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Brussels, Belgium, 2018, pp. 215–226. doi:10.18653/v1/D18-1020.
[41] K. Nigam, A. K. McCallum, S. Thrun, T. Mitchell, Text classification from labeled and unlabeled documents using EM, Machine Learning 39 (2-3) (2000) 103–134.
[42] T. Miyato, A. M. Dai, I.
Goodfellow, Adversarial training methods for semi-supervised text classification, arXiv preprint arXiv:1605.07725.
[43] J. Chen, Z. Yang, D. Yang, MixText: Linguistically-informed interpolation of hidden space for semi-supervised text classification, in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Online, 2020, pp. 2147–2157. doi:10.18653/v1/2020.acl-main.194.
[44] H. Zhang, M. Cisse, Y. N. Dauphin, D. Lopez-Paz, mixup: Beyond empirical risk minimization, arXiv preprint arXiv:1710.09412.
[45] Q. Xie, Z. Dai, E. Hovy, M.-T. Luong, Q. V. Le, Unsupervised data augmentation for consistency training, arXiv preprint arXiv:1904.12848.
[46] A. M. Dai, Q. V. Le, Semi-supervised sequence learning, in: Advances in Neural Information Processing Systems, 2015, pp. 3079–3087.
[47] A. Hussain, E. Cambria, Semi-supervised learning for big social data analysis, Neurocomputing 275 (2018) 1662–1673.
[48] J. Howard, S. Ruder, Universal language model fine-tuning for text classification, arXiv preprint arXiv:1801.06146.
[49] Carlson, Coronavirus Tweets, Tweets (json) for Coronavirus on Kaggle, online; accessed 2020 (2020).
[50] Shane Smith, Coronavirus (covid19) Tweets - early April, online; accessed 2020 (2020).
[51] Sven Celin, COVID-19 tweets afternoon 31.03.2020., online; accessed 2020 (2020).
[52] Echen, COVID-19-TweetIDs-GIT, https://github.com/echen102/covid-19-TweetIDs, online; accessed 2020 (2020).
[53] U. Qazi, M. Imran, F. Ofli, GeoCoV19: a dataset of hundreds of millions of multilingual covid-19 tweets with location information, SIGSPATIAL Special 12 (1) (2020) 6–15.
[54] N. Reimers, I.
Gurevych, Sentence-BERT: Sentence embeddings using Siamese BERT-networks, in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics, Hong Kong, China, 2019, pp. 3982–3992. doi:10.18653/v1/D19-1410.
[55] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov, RoBERTa: A robustly optimized BERT pretraining approach, arXiv preprint arXiv:1907.11692.
[56] E. Cambria, D. Das, S. Bandyopadhyay, A. Feraco, Affective computing and sentiment analysis, in: A Practical Guide to Sentiment Analysis, Springer, 2017, pp. 1–10.
[57] M. S. Akhtar, A. Ekbal, E. Cambria, How intense are you? Predicting intensities of emotions and sentiments using stacked ensemble [application notes], IEEE Computational Intelligence Magazine 15 (1) (2020) 64–75.
[58] M. E. Basiri, S. Nemati, M. Abdar, E. Cambria, U. R. Acharya, ABCDM: An attention-based bidirectional CNN-RNN deep model for sentiment analysis, Future Generation Computer Systems 115 (2021) 279–294.
[59] S. R. Bowman, G. Angeli, C. Potts, C. D. Manning, A large annotated corpus for learning natural language inference, in: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Lisbon, Portugal, 2015, pp. 632–642. doi:10.18653/v1/D15-1075.