Self-Supervised Claim Identification for Automated Fact Checking
Archita Pathak
University at Buffalo (SUNY), Buffalo, NY
[email protected]
Mohammad Abuzar Shaikh
University at Buffalo (SUNY), Buffalo, NY
[email protected]
Rohini K. Srihari
University at Buffalo (SUNY), Buffalo, NY
[email protected]
Abstract
We propose a novel, attention-based self-supervised approach to identify "claim-worthy" sentences in a fake news article, an important first step in automated fact-checking. We leverage the aboutness of the headline and content using an attention mechanism for this task. The identified claims can be used for the downstream task of claim verification, for which we are releasing a benchmark dataset of manually selected compelling articles with veracity labels and associated evidence. This work goes beyond stylistic analysis to identifying content that influences reader belief. Experiments with three datasets show the strength of our model.

Introduction

The explosion of fake news on social media has resulted in global unrest and has been a major concern for governments and societies worldwide. According to a recent Pew Research study, Americans rate it as a larger problem than racism, climate change, or illegal immigration. Since it is inexpensive to create a website and easy to disseminate content on social media platforms, there is a rising need for automated fake news detection. Furthermore, AI solutions are also required to follow good practices, specifically avoiding censorship and violation of fundamental rights such as freedom of expression, and ensuring data privacy (de Cock Buning, 2018).
However, to date, AI models proposed for fake news detection do not scale for detecting real-time fake news. (Data and code available at: https://github.com/architapathak/Self-Supervised-ClaimIdentification)

Much of the research on automated text-based fake news detection can be classified into three broad categories: (1) the linguistic approach, which focuses on lexical, stylometric and pattern-learning mechanisms (Potthast et al., 2017; Rashkin et al., 2017; Wang, 2017; Singhania et al., 2017; Pérez-Rosas et al., 2018); (2) the network-based approach, which leverages features such as the speed and volume of propagation of fake news articles on social media platforms (Castillo et al., 2011; Yang et al., 2012; Kwon et al., 2013; Ma et al., 2015; Jin et al., 2016; Ruchansky et al., 2017; Wu and Liu, 2018); and (3) the automated fact-checking approach, which is an effort to assist manual fact-checkers by automating some of their tasks, such as the detection and verification of claims (Graves, 2018).

While most work in automated fact-checking has focused on the claim verification task, very few methods have been proposed for the detection of claims (Hassan et al., 2017; Jaradat et al., 2018; Konstantinovskiy et al., 2018). The approaches in these efforts mostly relate to political discourse. However, our focus is on fake news, which is broader than political discourse since (i) fake news is deliberately written with a divisive agenda to cause social unrest, (ii) it is not constrained to politics only, and (iii) the headline plays an equally important role in compelling people to read the article.

In this paper, we focus on articles where there is a deliberate intent to influence readers through fabricated or manipulated claims in the headline and the content. Such articles have a compelling writing style similar to that of the mainstream media. Hence, we build datasets containing these types of compelling articles along with veracity labels and associated evidence supporting the label of each article. We then use these datasets to identify "claim-worthy" sentences. In our work, we define a "claim" as a statement which is important to the point of the article but which one would require to have verified. Our working hypothesis is that in fake news created to cause harm, these are the sentences most relevant to the headline. Exploiting the hypothesis that the essence of a news article is encapsulated in its headline (Jaime Sisó and Mercedes, 2009; Kuiken et al., 2017; Wahl-Jorgensen and Hanitzsch, 2009), we propose a self-supervised method that explores the aboutness of the content with respect to the headline of the article to extract the most relevant sentences. Bruza and Huibers (1996) define aboutness as follows: an information carrier i will be said to be about information carrier j if the information borne by j holds in i. The idea is taken from the Information Retrieval domain, where it is used to signify implications between a query and a document, specifically to explore the underlying meaning or concept within the document and the query (Azzopardi et al., 2009). In our work, the headline is modelled as a query while each sentence of the article acts as a document, and we use the concept of aboutness to find the relevant sentences. We show that attention-based mechanisms are able to successfully capture this concept in news articles.
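The headline-as-query framing can be sketched with a toy ranker: embed the headline and each sentence, then score every sentence by cosine similarity to the headline. This is only an illustrative stand-in for aboutness, not the paper's model (which uses learned attention rather than raw cosine similarity), and the vectors below are made-up values, not real embeddings.

```python
import numpy as np

def rank_by_aboutness(headline_vec, sentence_vecs):
    # Treat the headline as a query and each sentence as a document:
    # score every sentence by cosine similarity to the headline.
    h = headline_vec / np.linalg.norm(headline_vec)
    scores = [float(h @ (s / np.linalg.norm(s))) for s in sentence_vecs]
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return order, scores

# Made-up 4-dimensional vectors standing in for sentence embeddings
headline = np.array([1.0, 0.2, 0.0, 0.1])
sentences = [
    np.array([0.9, 0.3, 0.1, 0.0]),  # paraphrases the headline
    np.array([0.0, 0.1, 1.0, 0.8]),  # unrelated filler
]
order, scores = rank_by_aboutness(headline, sentences)
print(order[0])  # 0: the headline-like sentence ranks first
```

In the full model this fixed similarity is replaced by learned attention coefficients, but the ranking idea is the same: sentences whose representations carry the headline's information score highest.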
Contribution:
In this work: (i) we introduce a self-supervised representation learning model that eliminates the prerequisite of human-annotated data, which is time-consuming and costly to obtain; (ii) the proposed headline-to-sentence attention-based approach for claim identification is novel; previous unsupervised approaches for this task use a weak supervisory signal which does not capture the context of the article efficiently; and (iii) we propose a benchmark dataset for evidence-based fake news detection. Our dataset contains evidence for each of the fake news articles that contributes to the overall degree of veracity of the article.
Claim Identification/Detection:
The task of claim identification/detection was first introduced by Levy et al. (2014) who, with the help of human annotators, provided a dataset and a fundamental approach for identifying context-dependent claims. In their dataset, which was originally developed by Aharoni et al. (2014), each statement indicates whether it should be considered a context-dependent claim (CDC) or not. Levy et al. (2014) reported encouraging results obtained through a supervised learning algorithm using a cascade of classifiers. A rule-based model was introduced by Eckle-Kohler et al. (2015) to separate claim and premise statements in an argumentative discourse environment. However, these methods were applicable only to a small set of corpora. Furthermore, Levy et al. (2017) also introduced an unsupervised approach to detect claims, which involves the weak supervisory signal "that" for training. However, this approach does not capture the aboutness of the article to understand the context of "claim-worthy" sentences.

In 2017, Hassan et al. (2017) introduced ClaimBuster, a platform developed by training a supervised learning model on a large annotated corpus of televised debates in the USA. Their model used an SVM classifier to detect claim-worthy factual claims and produced a score of how important a claim is to fact-check. The 20,000 sentences in the corpus were annotated by human coders to distinguish claim-worthy factual claims from opinions and boring statements. However, annotating a sentence as an important or unimportant claim is a non-trivial task, as this decision changes depending on who is asking, the political context, and the annotator's background (Graves, 2018). The model proposed by Hassan et al. (2017) only learns the labelled instances and does not explore the contextual information of the written text. A context-aware approach in the political discourse environment was introduced by Gencheva et al.
(2017), who created a rich representation of the sentences from the 2016 US presidential debates. Their dataset was compiled by taking the outputs of fact-checking of the debates from 9 fact-checking organizations. Their models were created to predict whether a claim would be highlighted by at least one organization or by a specific one. However, the authors give no formal definition of a claim in their paper, and their model is specific to certain organizations, which led to several false positives.

Another context-aware approach for claim detection was proposed by Konstantinovskiy et al. (2018), who used sentence embeddings pre-trained on a large NLI dataset. This work also created a crowd-sourced annotated dataset of sentences from UK political TV shows, annotated across 7 classes. However, their classifiers for fine-grained classification into the 7 sentence classes did not yield good results due to the lack of enough annotated data, thus requiring more annotations, which is a costly and time-consuming task.

We build a model that can be trained in a self-supervised setting to overcome the challenges associated with annotated datasets of claims. We also use an attention-based approach to capture aboutness and rich contextual information between the headline and all the sentences of the article. The performance on manually created test sets demonstrates promising results in identifying "claim-worthy" sentences even though no sentence-level annotation was used for training.
Fake News Dataset:
A variety of fake news datasets have been released in recent years, most notably the Buzzfeed and Stanford (Allcott and Gentzkow, 2017) datasets containing lists of popular fake news articles from the 2016 US presidential elections. However, these datasets only contain webpage URLs of the original articles, and the majority of them don't exist anymore. Following this, several other datasets were published, such as the Fake News Challenge dataset, which was used for the task of stance detection; the Getting Real about Fake News
Kaggle dataset, which was created using the BS Detector tool; and FakeNewsCorpus (https://github.com/several27/FakeNewsCorpus), an open-source large-scale collection of fake news articles. However, these articles are labelled as fake based on the domains of the websites they come from. Since the content of these articles is not verified for degree of veracity, using them directly for training may lead to several false positives. This problem was overcome by the recently released large dataset NELA-GT-2018 (Norregaard et al., 2019), which contains articles with ground-truth ratings retrieved from 8 different assessment sites. However, the label definitions are not generic and depend on the external organizations. Pathak and Srihari (2019) also introduced intuitive ground-truth labels based on the degree of veracity of fake news articles; however, their dataset is not publicly available. Additionally, they do not specify the relationship of their labels to the labels used by established fact-checking organizations. Furthermore, due to the lack of evidence in these datasets, they cannot be used for the downstream task of evidence-based verification, which is one of the motivations of this paper. We overcome all these limitations in our datasets, described in the following section.

We introduce two datasets of compelling fake news articles which have a writing style similar to mainstream media. The first dataset, DNF-700, where DNF stands for DisiNFormation, contains articles on politics published within 4 months of the 2016 US Presidential Elections from questionable (non-mainstream) sources. To compile this dataset, we first extracted fake news articles from the working webpage URLs of the Stanford dataset (Allcott and Gentzkow, 2017). However, the majority of webpage URLs in this dataset are expired, and we could extract only 26 fake news articles. Therefore, we then used the "Getting Real about Fake News" Kaggle dataset to sample more articles on politics.
Since most of the articles in this dataset contain anomalies (e.g., incomplete articles, social media comments labelled as fake, etc.), we manually verified the writing style and discarded obvious fakes: articles with poor grammar and excessive use of punctuation. However, the degree of veracity of each article in this dataset is not checked, and some articles may contain personal opinions.

The second dataset, DNF-300, is a more sophisticated subset of DNF-700, containing 290 compelling articles on politics and 10 on health/medical news. Unlike other fake news datasets, in which veracity and evidence for articles are not provided, DNF-300 contains articles associated with veracity labels as well as corresponding evidence. The process of annotating this dataset involves identifying sentences from each article based on their persuasive tone and relevance to the headline. These sentences were then queried on the web, and the top 10 results were considered to gather evidence from credible sources. (Credible sources were extracted from https://mediabiasfactcheck.com/; the sources range between left-, center- and right-biased news sources.) Based on the evidence found, we label the entire article into four categories: {(0) false; (1) partial truth; (2) opinion stated as fact; (3) true}. These labels are inspired by Pathak and Srihari (2019); Table 1 shows the description and distribution of these labels, while a comparison with two popular fact-checking websites is displayed in Figure 1.

Label                   Description                                                                                                      Total
False                   (i) No evidence could be found, or (ii) found evidence refuting the entire article                               126
Partial Truth           Article about a true event; however, found evidence refuting some of the claims                                  75
Opinion Stated as Fact  Article contains false/manipulated claims; however, it is an opinion article which cannot be labelled as fake    79
True                    Found evidence supporting the entire article                                                                     20

Table 1: DNF-300 label description and distribution. Claims here are the sentences manually selected based on their persuasive tone and relevance to the headline. Interestingly, some of the articles, which were labelled as fake in other datasets due to the domain of the publishing website, turned out to be true news.

An example from the dataset is shown in Table-2.
Figure 1: Label comparison with Snopes and PolitiFact ratings.
This dataset is also a key contribution of this paper, as the articles were manually read and verified. Additionally, the dataset contains two novel features which are essential for the fake news verification task: (i) a generic veracity-based label set, independent of any external organization, and (ii) ground-truth evidence corresponding to each label. In addition to these two datasets, we also train our model for claim identification on the dataset introduced for context-dependent claim detection (CDCD) by Levy et al. (2014). Although this dataset (CDC) does not contain fake news articles, it has manually annotated sentences based on their relevance to a certain topic. These annotations were utilized for the evaluation of our self-supervised learning model, described in the following section. More details on the datasets and examples can be found in the Appendix.
Problem Definition:
Given an article with a set of sentences S = {S_1, S_2, ..., S_i, ..., S_n} and a headline H, the task of our multihead attention claim identification network (MA-CIN) is to extract the sentences most relevant to the headline. Our self-supervised model exploits rich contextual information to extract the relevant sentences, which are considered "claim-worthy".

Approach:
For this task, we implement two types of attention: (i) self-attention over all sentence vectors, so that each sentence S_i is aware of all other sentences in S; (ii) cross-attention of the headline vector on each sentence vector, so that all self-attended sentences are also aware of the headline's context. We then generate a headline based on the context-aware sentences and compare it with the original headline in three different settings, as listed below:

1. Headline Vector (MA-CIN (HV)): In this setting, the original headline vector acts as the supervisory signal for self-supervised learning. We minimize the mean squared error (MSE) between the generated and the original headline vectors for training.

2. Headline One-Hot Word Vector (MA-CIN (OHWV)): In this setting, the words in the original headline act as the supervisory signal. We use an LSTM (Hochreiter and Schmidhuber, 1997) to predict at most 50 words, from a vocabulary of 20,000 words, generating a one-hot vector for each word of the new headline. We then minimize the categorical cross-entropy error (CCE) at each time step corresponding to each word in the original and new headlines for training.

3. Combined HV & OHWV (MA-CIN (Combined)): In this setting, both the original headline vector and the words act as supervisory signals. Therefore, we combine the two loss functions mentioned above to train the model.

For this, we build several layers in our architecture (see Figure-2), which are delineated as follows:
Each sentence S_i ∈ S and the headline H are converted to fixed-length 300-dimensional vectors s_i and h, such that s_i, h ∈ R^{1×D}, where D = 300. For uniformity, we calculate the maximum number of sentences L that an article can contain in the respective corpus. Next, we zero-pad the difference in the number of sentence vectors in each article, so that every article can be represented as a vector A ∈ R^{L×D}.

Headline: Allergens in Vaccines Are Causing Life-Threatening Food Allergies

Content: It would probably surprise few people to hear that food allergies are increasingly common in U.S. children and around the world. According to one public health website, food allergies in children aged 0-17 in the U.S. increased by 50% from 1997 to 2011. Although food allergies are now so widespread as to have become almost normalized, it is important to realize that millions of American children and adults suffer from severe rapid-onset allergic reactions that can be life-threatening. Foods represent the most common cause of anaphylaxis among children and adolescents. The United Kingdom has witnessed a 700% increase in hospital admissions for anaphylaxis and a 500% increase in admissions for food allergy since 1990. The question that few are asking is why life-threatening food allergies have become so alarmingly pervasive. A 2015 open access case report by Vinu Arumugham in the Journal of Developing Drugs, entitled "Evidence that Food Proteins in Vaccines Cause the Development of Food Allergies and Its Implications for Vaccine Policy," persuasively argues that allergens in vaccines—and specifically food proteins—may be the elephant in the room. As Arumugham points out, scientists have known for over 100 years that injecting proteins into humans or animals causes immune system sensitization to those proteins. And, since the 1940s, researchers have confirmed that food proteins in vaccines can induce allergy in vaccine recipients. Arumugham is not the first to bring the vaccine-allergy link to the public's attention. Heather Fraser makes a powerful case for the role of vaccines in precipitating peanut allergies in her 2011 book, The Peanut Allergy Epidemic: What's Causing It and How to Stop It.

Type: 1 (Partial Truth)
Authors: Admin - Orissa
URL: galacticconnection.com
Evidence:
Reason: The key claim is written in such a way that it misleads people into thinking all the food-related allergies in the US are caused by vaccines. Found evidence which says these types of allergies are rare.

Table 2: An example of the Partial Truth type from the DNF-300 dataset.
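The zero-padding step that builds the fixed-size article matrix A can be sketched as follows; the function name and the toy dimensions are illustrative, not from the paper's released code.

```python
import numpy as np

def pad_article(sentence_vecs, L, D=300):
    # Stack an article's sentence vectors into a fixed (L, D) matrix,
    # zero-padding articles that have fewer than L sentences.
    A = np.zeros((L, D), dtype=np.float32)
    for i, s in enumerate(sentence_vecs[:L]):
        A[i] = s
    return A

# Hypothetical article with 2 sentence vectors; toy sizes L=5, D=4
article = [np.ones(4), 2.0 * np.ones(4)]
A = pad_article(article, L=5, D=4)
print(A.shape)  # (5, 4)
```

Fixing L to the corpus maximum lets every article share the same shape, so a whole batch can be fed to the 1D-CNN layer described next.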
To effectively capture local relevance, we leverage a 1D-CNN (LeCun et al., 1998) to extract features from the article vector A. For our experiments, the kernel size of each convolution layer is K × D × C, where K is the kernel width and C is the number of filters. This means the network processes K sentences at a time. The sizes of K and C are hyper-parameters; as per our experiments, we set K = 4 under the assumption that no more than 4 consecutive sentences will be relevant to each other.

Inspired by the attention implementations in (Zhang et al., 2018; Vaswani et al., 2017), to capture global relevance, the article features from the previous 1D-CNN layer are transformed into feature spaces q and k to calculate attention, where q(x) = W_q x and k(x) = W_k x:

β_{j,i} = exp(z_{ij}) / Σ_{i=1}^{N} exp(z_{ij}),  where z_{ij} = q(x_i)^T k(x_j)    (1)

β ∈ R^{N×N} is the attention coefficient, which is the normalized relevance score between the sentences x_i and x_j. β is then matrix-multiplied by v, where v(x) = W_v x, to obtain the context-rich output ō_j ∈ R^{C̄×1}:

ō_j = Σ_{i=1}^{N} β_{j,i} v(x_i),  where ō_j ∈ {ō_1, ō_2, ..., ō_N}    (2)

Finally, the output of the self-attention layer is o ∈ R^{C×N}, which is computed as

o_j = g(ō_j),  where g(x) = W_g x    (3)

In the above equations, x ∈ R^{C×N} is obtained after applying the 1D convolution to the sentence vectors, with W_q ∈ R^{C̄×C}, W_k ∈ R^{C̄×C}, W_v ∈ R^{C̄×C}, W_g ∈ R^{C×C̄}, and output o ∈ R^{C×N}. Following the work by Zhang et al. (2018), we use a reduced dimension C̄ < C for computational effectiveness. We also multiply λ and γ, learnable scale parameters, with the output of our attention module and the input vector respectively, to allow the network to choose between local and global sentences effectively:

ô = γx + λo    (4)

γ is initialized to 1 and λ to 0, so as to allow the local context to be captured effectively during the early iterations; as the value of λ increases, the network adds more global context to the representation.

In the architecture, we apply self-attention to the input x M times, resulting in M attention heads. The output of the i-th attention head is denoted by o^i. We concatenate the outputs of the M heads to get a richer representation, allowing the network to capture various relationships:

msa_o = ‖_{i=1}^{M} o^i = o^1 ‖ ... ‖ o^M    (5)

where msa_o ∈ R^{MC×N} is the long-range context-aware output of multihead self-attention. Here, ‖ denotes concatenation across the C axis.

The headline vector is transformed into a feature space h̄ = W_h h, where h̄ ∈ R^{C×1}, and its relevance to msa_o, obtained from the previous layer, is calculated using the equations defined in Section 4.3. Finally, after applying multihead concatenation using Equation (5), we obtain the headline-context-aware representation mca_o ∈ R^{MC×N}. We fix M = 4 for all our experiments.

Figure 2: Architecture of the Multihead Attention Claim Identification Network (MA-CIN). The model is trained using a self-supervised learning approach with three variants of the supervisory signal: the headline vector, the headline words, and the combination of both.
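The self-attention equations (1)-(5) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: random (untrained) weights, toy dimensions, a single article, and no 1D-CNN or cross-attention; each head gets its own projection matrices, as in a standard multihead design.

```python
import numpy as np

rng = np.random.default_rng(0)
C, C_bar, N, M = 8, 2, 6, 4    # filters C, reduced dim C_bar, positions N, heads M

x = rng.normal(size=(C, N))    # stand-in for 1D-CNN features of one article

def make_head():
    # Each head gets its own projections W_q, W_k, W_v (C_bar x C) and W_g (C x C_bar)
    return (rng.normal(size=(C_bar, C)), rng.normal(size=(C_bar, C)),
            rng.normal(size=(C_bar, C)), rng.normal(size=(C, C_bar)))

def self_attention(x, W_q, W_k, W_v, W_g):
    q, k, v = W_q @ x, W_k @ x, W_v @ x       # feature-space projections
    z = q.T @ k                                # z_ij = q(x_i)^T k(x_j)
    beta = np.exp(z) / np.exp(z).sum(axis=0)   # Eq. (1): softmax over i per column j
    o_bar = v @ beta                           # Eq. (2): o_bar_j = sum_i beta_ji v(x_i)
    return W_g @ o_bar                         # Eq. (3): map back to C channels

heads = [make_head() for _ in range(M)]
o = self_attention(x, *heads[0])

gamma, lam = 1.0, 0.0                          # Eq. (4) initial values
out = gamma * x + lam * o                      # early training favors local context

msa_o = np.concatenate([self_attention(x, *w) for w in heads], axis=0)  # Eq. (5)
print(msa_o.shape)  # (32, 6), i.e. (M*C, N)
```

With γ = 1 and λ = 0 at initialization, the combined output equals the local CNN features exactly; the learned λ then gradually mixes in the global attention signal.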
To generate a headline vector d_h as close as possible to the input headline vector h, we apply mean squared error between d_h and h to calculate the headline-vector generation loss L_v:

L_v = (1/n) Σ_{i=1}^{n} (d_{h_i} − h_i)^2    (6)

For estimating the probability of each word from the vocabulary in the predicted headline, we calculate the cross-entropy between the predicted headline words d_hw and the input headline one-hot vector HW:

L_w = − Σ_i HW_i log(d_{hw_i})    (7)

The total loss L_total = L_v + L_w is then evaluated for all samples b ∈ B, where B is one batch.

We train our multihead attention model for claim identification, MA-CIN, on the datasets mentioned in Section 3. The CDC dataset contains a total of 522 articles. Among these, the 47 articles with 8 or more annotated claim sentences are used as the evaluation set (CDC Eval) for this dataset. Next, for DNF-300 and DNF-700, we asked two annotators to manually tag at least 5 sentences as "claim-worthy" in each of 50 articles. Sentences on which both annotators agreed as "claim-worthy" were finalized as ground-truth claims for these 50 articles and used as the test set for evaluating model performance on the DNF datasets. The remaining 475 articles from CDC, 250 articles from DNF-300, and 650 articles from DNF-700 were split into 5 folds to train the model using 5-fold cross-validation (Kohavi, 1995), with 4 folds used for training and 1 fold for validation. Each of the three settings described in Section 4 (MA-CIN(HV), MA-CIN(OHWV), and MA-CIN(Combined)) was trained on each of the three datasets and evaluated on DNF Eval and CDC Eval. The total numbers of parameters for the three settings are 15,012,916 (10,240 non-trainable), 40,975,656 (10,240 non-trainable), and 41,812,564 (12,288 non-trainable), respectively. All other network parameters are displayed in the supplemental material.

In each setting, we use batch normalization, ReLU non-linearity as the activation function, and a dropout of 0.5 for every convolution operation. We trained all the models for 2000 epochs, using the Adam optimizer for every training run. There was no weight decay set, as the model was trained in a self-supervised setting with finite epochs and an already small learning rate. GloVe 300D word embeddings were used for all our experiments, and the number of input sentences was set to 500. The models were trained on three 11 GiB Nvidia 1080Ti GPUs in parallel.

Dataset  Configuration                  CDC Eval             DNF Eval
                                        Prec.  Rec.  F1      Prec.  Rec.  F1
Spacy    Baseline                       0.09   0.14  0.11    0.33   0.42  0.37
CDC      Baseline (Levy et al., 2014)   0.23   -     -       -      -     -
         MA-CIN(HV)                     0.18   0.08  0.11    0.39   0.53  0.45
         MA-CIN(OHWV)                   0.25   0.10  0.15    0.40   0.54  0.46
         MA-CIN(Combined)
DNF-300  MA-CIN(HV)                     0.20   0.09  0.12    0.37   0.54  0.44
         MA-CIN(OHWV)                   0.19   0.08  0.11    0.40   0.50  0.44
         MA-CIN(Combined)
DNF-700  MA-CIN(HV)                     0.19   0.08  0.11    0.39   0.53  0.48
         MA-CIN(OHWV)

Table 3: Comparison of MA-CIN model configurations over three datasets and two evaluation sets for identification of "claim-worthy" sentences.

Headline: Clinton Received Debate Questions Week Before Debate

Figure 3: Interpretation of the relevance of sentences to the headline of an example article from DNF-300. GT and PD indicate ground-truth and top-5 predicted "claim-worthy" sentences, respectively. The MA-CIN model was able to predict the 3 most relevant sentences correctly. The last column shows the attention weights between the headline and each of the sentences of the article. Sentence 2 has been correctly predicted as the most relevant, while sentence 1 is the least relevant.

We evaluate MA-CIN models on two evaluation sets, DNF Eval and CDC Eval. In the self-supervised setting, we first rank the sentences by relevance to the headline and then extract the top five sentences, along with their sentence ids, as "claim-worthy" sentences. For evaluation on DNF Eval, we calculate the true positives (TP), false positives (FP) and false negatives (FN) from the ground-truth claim ids. To evaluate on CDC Eval, since we do not have ground-truth claim ids, we calculate the cosine similarity between the extracted sentences and the ground-truth claims. We experimented with various similarity thresholds to calculate TP, FP and FN, and set the final threshold to 0.95 to report the best-performing results. Finally, these metrics are used to report Precision@1, Recall@1 and F1 scores.
Table-3 shows the performance of the baseline (CDC) (Levy et al., 2014) and the three variants of MA-CIN models. We report two baselines: (1) spacy, and (2) Levy et al. (2014), a supervised learning method on the CDC dataset, which contains annotated claims. Since Levy et al. (2014) do not report Recall and F1 scores, we report only their Precision@1 score in this paper. We also train MA-CIN models on this dataset after removing all the annotations, for self-supervised training. We observe that:

1. The combined variant of our self-supervised approach performs slightly better than the baseline on the CDC dataset. This shows that MA-CIN models are able to learn similar properties to the baseline, but without any sentence-level annotations. This eliminates the need for an annotated dataset for claim identification.

2. MA-CIN models give comparable results on all three datasets. This shows the scalability of the models in identifying "claim-worthy" sentences from any given article.

3. The combined variant of MA-CIN, which generates both the headline vector and the words in the headline, performs better on all the datasets except one: the MA-CIN (OHWV) model trained on DNF-300 and evaluated on CDC Eval performs slightly better than the combined model; however, the difference in performance is very small.
Figure 4: Interpretation of sentence-to-sentence relevance through attention weights.
Attention weights help make the model interpretable to end users by depicting the relationships among all sentences as well as with the headline. From Figure-3, we can see that, of the top-5 predicted claims, 3 are present in the human-evaluated test set. The last column, which contains the attention coefficients between the headline and each sentence, depicts some interesting results:

(i) Based on the human evaluation, the sentence having the least relevance to the headline is sentence 1. While this sentence contains words also present in the headline, the underlying meaning is not the same. This has been successfully captured by the MA-CIN model, which predicts sentence 1 as the least relevant claim.

(ii) Further, the highly ranked sentences 2, 12, and 13 have been correctly predicted as relevant claims by the model. This shows the model's ability to learn the semantic relationship between the headline and the content of the article, and subsequently to put importance on sentences that are relevant to the headline's underlying meaning. This property, also called "aboutness", is efficiently exhibited by the model.

(iii) Sentence 3, which is predicted by the MA-CIN model as relevant with a score of 0.73, is not present in the ground truth. This indicates that the two annotators did not agree to have this sentence verified, even though it is relevant to the point of the article. To analyze this further, we plan to conduct user studies as one of the future avenues.

(iv) Sentence 4 is also predicted as a relevant claim, but it is missing from the ground truth since the annotators did not agree to have it verified. The reason for this prediction could be that self-attention is able to identify the premise of highly relevant sentences. Hence, sentence 4, which is the continuation of the highly relevant sentence 3, is also given importance by the headline.
This relevance between sentences 3 and 4 is depicted in Figure-4, where the attention weight between these two is the highest.

From Figure-4, we also observe that:

(i) sentence 4 is highly relevant to sentences 3 to 8, which is intuitive, since the story of the intern forms the premise of the claims in the article;

(ii) sentences 2 and 4 have been shown to have the least relevance to each other, which is also true, as shown in Figure-3. The two sentences, if considered in isolation, make two different claims which are not related to each other;

(iii) the model has made sure that a sentence does not assign high relevance to itself, as that would be counter-intuitive.
Since the evaluation methodology for the CDC dataset has not been explained clearly, in our paper we have considered vector cosine similarity between the ground-truth claims in CDC Eval and the claims extracted by the model, which may leave a margin of error in the evaluation scores. Additionally, the ground truth in DNF Eval is manually generated and may contain subjective biases. Although these biases have been overcome by MA-CIN models, as explained in Section 6.1, we also plan to enhance the ground-truth judgement using crowd-sourced annotation. We intend to use these annotations to fine-tune the models.
In this work, we build a novel, self-supervised approach to identify "claim-worthy" sentences, an important task for automated fact-checking. The focus of this work is on fake news articles where there is a deliberate intent to influence people or cause social unrest. We have introduced novel datasets of such articles with features essential for the downstream task of fake news verification. Using powerful attention models, we explore the notion of aboutness of the headline and the content of the article to identify "claim-worthy" sentences. Experiments with three datasets show the strength of our model architecture in overcoming human-induced biases, which are quite common when using sentence-level claim-annotated datasets. Based on the comparison with the baseline, which was implemented using an annotated dataset, we show that our models do not require annotated claims for training to identify claim-worthy sentences efficiently. We have also shown that our model is scalable to datasets of any topic and content.

Future work involves increasing the robustness of the models presented in this paper. We plan to use crowdsourced annotation on the dataset released with this paper to measure the influence of the articles on general readers and then use these indicators to fine-tune our models. Experimentation with more robust sentence encoders is another avenue of future work. Additionally, going forward, we plan to identify a maximum of 3 claims per article, which will be used for evidence-based fake news detection. We also plan to expand the dataset presented in this work to include fake news articles on topics other than politics and health.
References
Ehud Aharoni, Anatoly Polnarov, Tamar Lavee, Daniel Hershcovich, Ran Levy, Ruty Rinott, Dan Gutfreund, and Noam Slonim. 2014. A benchmark dataset for automatic detection of claims and evidence in the context of controversial topics. In Proceedings of the First Workshop on Argumentation Mining, pages 64–68, Baltimore, Maryland. Association for Computational Linguistics.

Hunt Allcott and Matthew Gentzkow. 2017. Social media and fake news in the 2016 election. Journal of Economic Perspectives, 31(2):211–36.

Leif Azzopardi, Gabriella Kazai, Stephen Robertson, Stefan Rüger, Milad Shokouhi, Dawei Song, and Emine Yilmaz. 2009. Advances in Information Retrieval Theory: Second International Conference on the Theory of Information Retrieval, ICTIR 2009, Cambridge, UK, September 10-12, 2009, Proceedings, volume 5766. Springer.

PD Bruza and Theo WC Huibers. 1996. A study of aboutness in information retrieval. Artificial Intelligence Review, 10(5-6):381–407.

Carlos Castillo, Marcelo Mendoza, and Barbara Poblete. 2011. Information credibility on Twitter. In Proceedings of the 20th International Conference on World Wide Web, pages 675–684. ACM.

de Cock Buning. 2018. A Multi-dimensional Approach to Disinformation: Report of the Independent High Level Group on Fake News and Online Disinformation. Publications Office of the European Union.

Judith Eckle-Kohler, Roland Kluge, and Iryna Gurevych. 2015. On the role of discourse markers for discriminating claims and premises in argumentative discourse. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 2236–2242, Lisbon, Portugal. Association for Computational Linguistics.

Pepa Gencheva, Preslav Nakov, Lluís Màrquez, Alberto Barrón-Cedeño, and Ivan Koychev. 2017. A context-aware approach for detecting worth-checking claims in political debates. In Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017, pages 267–276, Varna, Bulgaria. INCOMA Ltd.

D Graves. 2018. Understanding the promise and limits of automated fact-checking. Reuters Institute for the Study of Journalism.

Naeemul Hassan, Fatma Arslan, Chengkai Li, and Mark Tremayne. 2017. Toward automated fact-checking: Detecting check-worthy factual claims by ClaimBuster. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1803–1812. ACM.

Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation, 9(8):1735–1780.

Mercedes Jaime Sisó. 2009. Titles or headlines? Anticipating conclusions in biomedical research article titles as a persuasive journalistic strategy to attract busy readers. Miscelánea: A Journal of English and American Studies, 39:29–51.

Israa Jaradat, Pepa Gencheva, Alberto Barrón-Cedeño, Lluís Màrquez, and Preslav Nakov. 2018. ClaimRank: Detecting check-worthy claims in Arabic and English. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations, pages 26–30, New Orleans, Louisiana. Association for Computational Linguistics.

Zhiwei Jin, Juan Cao, Yongdong Zhang, and Jiebo Luo. 2016. News verification by exploiting conflicting social viewpoints in microblogs. In Thirtieth AAAI Conference on Artificial Intelligence.

Ron Kohavi. 1995. A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proceedings of the 14th International Joint Conference on Artificial Intelligence - Volume 2, IJCAI'95, pages 1137–1143, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.

Lev Konstantinovskiy, Oliver Price, Mevan Babakar, and Arkaitz Zubiaga. 2018. Towards automated factchecking: Developing an annotation schema and benchmark for consistent automated claim detection. arXiv preprint arXiv:1809.08193.

Jeffrey Kuiken, Anne Schuth, Martijn Spitters, and Maarten Marx. 2017. Effective headlines of newspaper articles in a digital environment. Digital Journalism, 5(10):1300–1314.

Sejeong Kwon, Meeyoung Cha, Kyomin Jung, Wei Chen, and Yajun Wang. 2013. Prominent features of rumor propagation in online social media. In 2013 IEEE 13th International Conference on Data Mining, pages 1103–1108. IEEE.

Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324.

Ran Levy, Yonatan Bilu, Daniel Hershcovich, Ehud Aharoni, and Noam Slonim. 2014. Context dependent claim detection. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 1489–1500, Dublin, Ireland. Dublin City University and Association for Computational Linguistics.

Ran Levy, Shai Gretz, Benjamin Sznajder, Shay Hummel, Ranit Aharonov, and Noam Slonim. 2017. Unsupervised corpus-wide claim detection. In ArgMining@EMNLP.

Jing Ma, Wei Gao, Zhongyu Wei, Yueming Lu, and Kam-Fai Wong. 2015. Detect rumors using time series of social context information on microblogging websites. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, CIKM '15, pages 1751–1754, New York, NY, USA. ACM.

Jeppe Norregaard, Benjamin D. Horne, and Sibel Adali. 2019. NELA-GT-2018: A large multi-labelled news dataset for the study of misinformation in news articles. CoRR, abs/1904.01546.

Archita Pathak and Rohini Srihari. 2019. BREAKING! Presenting fake news corpus for automated fact checking. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, pages 357–362, Florence, Italy. Association for Computational Linguistics.

Verónica Pérez-Rosas, Bennett Kleinberg, Alexandra Lefevre, and Rada Mihalcea. 2018. Automatic detection of fake news. In Proceedings of the 27th International Conference on Computational Linguistics, pages 3391–3401, Santa Fe, New Mexico, USA. Association for Computational Linguistics.

Martin Potthast, Johannes Kiesel, Kevin Reinartz, Janek Bevendorff, and Benno Stein. 2017. A stylometric inquiry into hyperpartisan and fake news. arXiv preprint arXiv:1702.05638.

Hannah Rashkin, Eunsol Choi, Jin Yea Jang, Svitlana Volkova, and Yejin Choi. 2017. Truth of varying shades: Analyzing language in fake news and political fact-checking. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2931–2937.

Natali Ruchansky, Sungyong Seo, and Yan Liu. 2017. CSI: A hybrid deep model for fake news detection. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, CIKM '17, pages 797–806, New York, NY, USA. ACM.

Sneha Singhania, Nigel Fernandez, and Shrisha Rao. 2017. 3HAN: A deep neural network for fake news detection. In International Conference on Neural Information Processing, pages 572–581. Springer.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. arXiv preprint arXiv:1706.03762.

Karin Wahl-Jorgensen and Thomas Hanitzsch. 2009. The Handbook of Journalism Studies. Routledge.

William Yang Wang. 2017. "Liar, liar pants on fire": A new benchmark dataset for fake news detection. arXiv preprint arXiv:1705.00648.

Liang Wu and Huan Liu. 2018. Tracing fake-news footprints: Characterizing social media messages by how they propagate. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, WSDM '18, pages 637–645, New York, NY, USA. ACM.

Fan Yang, Yang Liu, Xiaohui Yu, and Min Yang. 2012. Automatic detection of rumor on Sina Weibo. In Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics, page 13. ACM.

Han Zhang, Ian Goodfellow, Dimitris Metaxas, and Augustus Odena. 2018. Self-attention generative adversarial networks.
A Appendices
A.1 Definitions
Fake News: Articles where there is a deliberate intent to influence readers through fabricated or manipulated claims in the headline and the content. Such articles have a compelling writing style similar to the mainstream media.
"Claim-worthy": Statements which are important to the point of the article but which require verification.
Compelling Fake News Articles:
Articles which make persuasive claims in the headline and content that may influence readers to believe a fabricated/manipulated story.
Credible Sources: Mainstream media, established fact-checking websites and government documents.
Questionable Sources: Non-mainstream media like Infowars, NaturalNews, Breitbart, etc.
A.2 Experiment Architectures
A.2.1 Vector Generation
The architecture setting for generating the Headline Vector (HV) is displayed in Figure 5.

Input/Model       Output size   [Kernel size, Filters, Strides] x Repeats
sentence vectors  500 x 300
conv1d_1          500 x 256     [4, 256, 1] x 1
conv1d_2          500 x 512     [4, 256, 1] x 1
SelfAttention     500 x 512     [1, 64, 1] x 4
Concat            500 x 2048
conv1d_3          500 x 512     [4, 256, 1] x 1
headline vector   1 x 300
conv1d_4          1 x 512       [1, 512, 1] x 1
CrossAttention    500 x 512     [1, 64, 1] x 4
Concat            500 x 2048
conv1d_5          250 x 512     [4, 512, 2] x 1
conv1d_6          125 x 512     [4, 512, 2] x 1
conv1d_7          63 x 512      [4, 512, 2] x 1
conv1d_8          32 x 512      [4, 512, 2] x 1
Global Pooling    512
FC_1              1024
output_vector     300
Figure 5: Architecture setting for generating Headline Vector (HV).
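A note on the downsampling arithmetic in the table above: the stride-2 convolutions (conv1d_5 through conv1d_8) reduce the sequence length 500, 250, 125, 63, 32, which matches "same" padding, where the output length is ceil(L / stride). The padding choice is our inference from the listed output sizes, not something stated in the table; a quick check:

```python
import math

def conv1d_same_out_len(length, stride):
    """Output length of a 1-D convolution with 'same' padding."""
    return math.ceil(length / stride)

lengths = [500]
for _ in range(4):  # conv1d_5 .. conv1d_8, each with stride 2
    lengths.append(conv1d_same_out_len(lengths[-1], 2))

print(lengths)  # [500, 250, 125, 63, 32], matching the table
```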
A.2.2 Word Generation
The architecture setting for generating Headline Vector Word Probabilities (OHWV) is displayed in Figure 6.
A.3 DNF-700 Dataset Details
Each article is identified by an id. The content of the article is stored in a separate text file with file name "article id", for example, article 122. A JSON file is also provided with the following fields:

id: Unique identifier of the article starting from 0.
authors: Authors of the article.
headline: Headline of the article.
type: "fake" (articles from the Stanford and Buzzfeed datasets which are already proven fake); and "questionable" (articles from the Getting Real About Fake News Kaggle dataset which require manual verification of the degree of veracity).
urls: Source/domain URL of the article.

Input/Model                     Output size   [Kernel size, Filters, Strides] x Repeats
sentence vectors                500 x 300
conv1d_1                        500 x 256     [4, 256, 1] x 1
conv1d_2                        500 x 512     [4, 256, 1] x 1
SelfAttention                   500 x 512     [1, 64, 1] x 4
Concat                          500 x 2048
conv1d_3                        500 x 512     [4, 256, 1] x 1
headline vector                 1 x 300
conv1d_4                        1 x 512       [1, 512, 1] x 1
CrossAttention                  500 x 512     [1, 64, 1] x 4
Concat                          500 x 2048
conv1d_5                        250 x 512     [4, 512, 2] x 1
conv1d_6                        125 x 512     [4, 512, 2] x 1
conv1d_7                        63 x 512      [4, 512, 2] x 1
conv1d_8                        32 x 512      [4, 512, 2] x 1
Global Pooling                  512
Repeat                          50 x 512
Bi-LSTM                         50 x 1024
TimeDistributed Dense, softmax  50 x 20000

Figure 6: Architecture setting for generating Headline Vector Word Probabilities (OHWV).
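The DNF-700 layout described in A.3 (a JSON metadata file plus one text file per article) can be read with a short helper. This is an illustrative sketch, not part of the release: the metadata file name and the exact article file naming (here assumed to be "article_<id>") are our assumptions, so adjust them to the released data.

```python
import json
from pathlib import Path

def load_article(metadata_path, articles_dir, article_id):
    """Return the metadata record and raw text content for one DNF-700 article."""
    with open(metadata_path, encoding="utf-8") as f:
        records = json.load(f)
    record = next(r for r in records if r["id"] == article_id)
    # Assumed naming scheme: one plain-text file per article, e.g. article_122.
    content = (Path(articles_dir) / f"article_{article_id}").read_text(encoding="utf-8")
    return record, content
```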
A.4 DNF-300 Dataset Details
DNF-300 is a more sophisticated subset of DNF-700 with additional fields based on manual verification of the articles. The JSON file of this dataset contains the following fields:

id: Unique identifier of the article starting from 0.
authors: Authors of the article.
headline: Headline of the article.
type: { (0) False; (1) Partial Truth; (2) Opinions Stated As Fact; (3) True }
urls: Source/domain URL of the article.
evidence: URLs of credible sources supporting or refuting the article. This field is empty when no evidence was found that addressed the claims made in the article, meaning the claims are invented lies. In such cases, the type field is set to 0.
reason: Reason for the verdict. It can be one of the following:
1. Based on Snopes rating 'False', which means 'the primary elements of a claim are demonstrably false.'
2. Based on Snopes rating 'Unproven', which means 'insufficient evidence exists to establish the given claim as true, but the claim cannot be definitively proved false.'
3. Based on Snopes rating 'Mixture', which means 'a claim has significant elements of both truth and falsity to it such that it could not fairly be described by any other rating.'
4. Based on Snopes rating 'Mostly False', which means 'the primary elements of a claim are demonstrably false, but some of the ancillary details surrounding the claim may be accurate.'
5. The key claim is false (based on Snopes rating); however, the article also contains opinions stated as fact.
6. Snopes mentions that a true story was manipulated to mislead people.
7. The key claims are true but exaggerated by adding personal opinions stated as fact.
8. No reports from trusted sources for the key claims.
9. True story manipulated to mislead readers by making unverifiable claims such as 'some claim'.
10. Article is fraught with opinions stated as fact about a true event.
11. Found evidence to refute key claims.
12. Article contains opinions stated as fact.
13. Evidence found to support key claims.
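One property of the schema above is machine-checkable: an empty evidence field implies the claims could not be corroborated, so the type field must be 0 (False). The following is a hedged sketch of such a consistency check, run over toy records rather than the real dataset:

```python
# Label mapping from the DNF-300 "type" field.
LABELS = {0: "False", 1: "Partial Truth", 2: "Opinions Stated As Fact", 3: "True"}

def check_evidence_convention(records):
    """Return ids of records violating the empty-evidence => type 0 convention."""
    return [r["id"] for r in records if not r["evidence"] and r["type"] != 0]

# Toy records for illustration only; real records carry the full field set.
records = [
    {"id": 0, "type": 0, "evidence": ""},                     # consistent
    {"id": 1, "type": 3, "evidence": "https://example.org"},  # consistent
    {"id": 2, "type": 1, "evidence": ""},                     # violates the rule
]
print(check_evidence_convention(records))  # [2]
```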
Figure 7: Examples of CDCs and of statements that should not be considered CDCs. The ✓ and ✗ indicate whether the candidate is a CDC for the given Topic, or not, respectively.
A.5 Examples
We present examples for all 4 label types {False; Partial Truth; Opinion stated as fact; True} present in our dataset, DNF-300. Please refer to Tables 4, 5, 6, and 7. An annotated example from the CDC dataset is displayed in Figure 7.

Headline: Clinton Received Debate Questions Week Before Debate
The first presidential debate was held and Hillary Clinton was proclaimed the winner by the media. Indeed Clinton was able to turn in a strong debate performance, but did she do so fairly? Multiple reports and leaked information from inside the Clinton camp claim that the Clinton campaign was given the entire set of debate questions an entire week before the actual debate. Earlier last week an NBC intern was seen hand delivering a package to Clinton's campaign headquarters, according to sources. The package was not given to secretarial staff, as would normally happen, but the intern was instead ushered into the personal office of Clinton campaign manager Robert Mook. Members of the Clinton press corps from several media organizations were in attendance at the time, and a reporter from Fox News recognized the intern, but said he was initially confused because the NBC intern was dressed like a Fed Ex employee. The reporter from Fox questioned campaign staff about the intern, but campaign staff at first claimed ignorance and then claimed that it was just a Fed Ex employee who had already left. No reporters present who had seen the intern dressed as a Fed Ex employee go into Mook's office saw him leave by the same front entrance. The Fox reporter who recognized the intern also immediately looked outside of the campaign headquarters and noted that there were no Fed Ex vehicles parked outside. Clinton seemed to have scripted responses ready for every question she was asked at the first debate. She had facts and numbers memorized for specific questions that it is very doubtful she would have had without being furnished the questions beforehand. The entire mainstream media has specifically been trying to portray Trump as a racist and a poor candidate. By furnishing Clinton with the debate questions NBC certainly hoped to make Clinton appear much more knowledgeable and competent than Trump. And though it is unlikely that anyone will be able to conclusively prove that Clinton was given the debate questions, it seems both logical and likely.
Type: 0 (False)
Authors: Baltimore Gazette
URLs:
Evidence:
Reason: Based on Snopes rating 'False', which means 'the primary elements of a claim are demonstrably false.'

Table 4: An example of the False type from the DNF-300 dataset.

Headline: Allergens in Vaccines Are Causing Life-Threatening Food Allergies
It would probably surprise few people to hear that food allergies are increasingly common in U.S. children and around the world. According to one public health website, food allergies in children aged 0-17 in the U.S. increased by 50% from 1997 to 2011. Although food allergies are now so widespread as to have become almost normalized, it is important to realize that millions of American children and adults suffer from severe rapid-onset allergic reactions that can be life-threatening. Foods represent the most common cause of anaphylaxis among children and adolescents. The United Kingdom has witnessed a 700% increase in hospital admissions for anaphylaxis and a 500% increase in admissions for food allergy since 1990. The question that few are asking is why life-threatening food allergies have become so alarmingly pervasive. A 2015 open access case report by Vinu Arumugham in the Journal of Developing Drugs, entitled "Evidence that Food Proteins in Vaccines Cause the Development of Food Allergies and Its Implications for Vaccine Policy," persuasively argues that allergens in vaccines—and specifically food proteins—may be the elephant in the room. As Arumugham points out, scientists have known for over 100 years that injecting proteins into humans or animals causes immune system sensitization to those proteins. And, since the 1940s, researchers have confirmed that food proteins in vaccines can induce allergy in vaccine recipients. Arumugham is not the first to bring the vaccine-allergy link to the public's attention. Heather Fraser makes a powerful case for the role of vaccines in precipitating peanut allergies in her 2011 book, The Peanut Allergy Epidemic: What's Causing It and How to Stop It.
Type: 1 (Partial Truth)
Authors: Admin - Orissa
URLs: galacticconnection.com
Evidence:
Reason: The key claim is written in such a way that it misleads people into thinking all the food-related allergies in the US are caused by vaccines. Found evidence which says these types of allergies are rare.

Table 5: An example of the Partial Truth type from the DNF-300 dataset.

Headline: George Soros: Trump Will Win Popular Vote by a Landslide but Clinton Victory a 'Done Deal'
In recent weeks, Democrats have attempted to paint Republican presidential nominee Donald J. Trump as a lunatic for claiming that the election is going to be rigged in favor of his Democratic rival, Hillary Clinton. Even Republican politicians and former politicians are telling Trump to knock off such talk. But, as usual, Trump's shrewdness and defiance of standard political decorum – in which the "opposition" party merely rolls over and surrenders in the face of Democratic pressure – is winning the day. None other than billionaire investor and longtime Democratic supporter George Soros has said that the fix is literally in for the election, in favor of Clinton – no matter how much of the popular vote, and from which battleground states, Trump captures. As reported by Top Right News and other outlets, during a recent interview with Bloomberg News, Soros – a Democrat mega-donor – openly admitted that Trump will win the popular vote in a "landslide." However, he said that none of that would matter, because a President Hillary Clinton is already a "done deal." In the interview, which is now going viral, Soros says with certainty that Trump will take the popular vote, despite what the polls say now (which are completely rigged to oversample Democrats), but not the Electoral College, which will go to Clinton. When the reporter asks if that is already a "done deal" – that Clinton will be our next president no matter what – Soros says "yes," and nods his head. Is Soros just making a prediction out of overconfidence? Or does he truly know something most of us don't know?
Type: 2 (Opinion Stated As Fact)
Authors: J. D. Heyes
URLs:
Evidence:
Reason: The key claim is false (based on Snopes rating); however, the article also contains opinions stated as fact.

Table 6: An example of the Opinion stated as fact type from the DNF-300 dataset.

Headline: Donald Trump: Minnesota Has 'Suffered Enough' Accepting Refugees
In a pitch to suspend the nation's Syrian refugee program, Donald Trump said Minnesotans have "suffered enough" from accepting Somali immigrants into their state. "Here in Minnesota you have seen first hand the problems caused with faulty refugee vetting, with large numbers of Somali refugees coming into your state, without your knowledge, without your support or approval," Trump said at a Minneapolis rally Sunday afternoon. He said his administration would suspend the Syrian refugee program and not resettle refugees anywhere in the United States without support from the communities, while Hillary Clinton's "plan will import generations of terrorism, extremism and radicalism into your schools and throughout your communities."
Type: 3 (True)
Authors: Henry Wolff
URLs: amren.com
Evidence:
Reason: Evidence found to support key claims.

Table 7: An example of the True type from the DNF-300 dataset.