REMOD: Relation Extraction for Modeling Online Discourse
Matthew Sumpter ([email protected])
University of South Florida

Giovanni Luca Ciampaglia ([email protected])
University of South Florida
ABSTRACT
The enormous amount of discourse taking place online poses chal-lenges to the functioning of a civil and informed public sphere.Efforts to standardize online discourse data, such as ClaimReview,are making available a wealth of new data about potentially inac-curate claims, reviewed by third-party fact-checkers. These datacould help shed light on the nature of online discourse, the roleof political elites in amplifying it, and its implications for the in-tegrity of the online information ecosystem. Unfortunately, thesemi-structured nature of much of this data presents significantchallenges when it comes to modeling and reasoning about on-line discourse. A key challenge is relation extraction, which is thetask of determining the semantic relationships between namedentities in a claim. Here we develop a novel supervised learningmethod for relation extraction that combines graph embeddingtechniques with path traversal on semantic dependency graphs.Our approach is based on the intuitive observation that knowledgeof the entities along the path between the subject and object of atriple (e.g.
Washington,_D.C. , and
United_States_of_America )provides useful information that can be leveraged for extractingits semantic relation (i.e. capitalOf ). As an example of a potentialapplication of this technique for modeling online discourse, weshow that our method can be integrated into a pipeline to reasonabout potential misinformation claims.
CCS CONCEPTS
• Information systems → Web mining; Semantic web description languages; Information extraction.

KEYWORDS
relation extraction, semi-structured data, semantic ontology, claim matching, fact-checking
The prevalence of false and inaccurate information in its myriad of forms (a persistent and dangerous societal problem) is still a poorly understood phenomenon [1, 7, 30], especially in the context of political communication [21]. Even though strong exposure to so-called "fake news" is limited to the segment of most active news consumers [19], individual claims echoing the false or misleading content shared by these audiences can spread rapidly through social media [57, 68], amplified by bots [46] or other malicious actors [60], who often target elites, like celebrities, pundits, or politicians. From there, false claims rebroadcast by these elites enjoy further dissemination, reaching even wider audiences.

Misinformation has become an emerging focus of computational social scientists seeking to understand and combat it [10, 56]. Network analysis and natural language processing (NLP) provide insight into the community organization and stylistic patterns that
Figure 1: Schematic example of our approach. The RDF graphlet generated by a machine-reading tool (FRED) for the claim "Tej Pratap Yadav receives a doctorate degree from Takshsila University in Bihar" (a known misinformation claim [26]). The shortest undirected path between the source (dbpedia:Tej_Pratap_Yadav) and target (dbpedia:Doctorate) is shown in red. The nodes along the path are highlighted in gray.

are indicative of misinformation, respectively; however, they often fail to engage with the ideological content being shared. Online discourse typically takes the form of unorganized and unstructured data, which is a significant limiting factor to performing content analysis. Existing work on semantic ontologies and knowledge base development has proved to be a guiding method in structuring online information. A knowledge base most commonly structures knowledge in the shape of semantic triples; a semantic triple is composed of two entities (e.g. a person, place, or thing) and a predicate relation between them. An example of a semantic triple is
(Washington,_D.C., capitalOf, United_States_of_America). Knowledge of the entities along the path between the subject and object of such a triple provides useful information that can be leveraged for extracting its relation (i.e. capitalOf). This well-established phenomenon was first observed by Richards and Mooney [41]. Later, Bunescu and Mooney [9] used it in the context of a kernel-based approach. Here, we take advantage of recent advances in graph representation learning to overcome the above challenges posed by online discourse in applying such an approach. Specifically, we parse a large corpus of Wikipedia snippets, annotated with information about one of 5 relations from the DBpedia ontology, combine the resulting dependency trees into a larger semantic network, and finally use node embedding techniques to obtain a high-dimensional representation of this corpus-level network. We find that graph traversal in this learned representation provides a strong signal to discriminate between multiple possible relations. This approach allowed us to effectively extract these relations in natural language (extraction accuracy measured as the area under the ROC curve, AUC = .). We then tested this model's ability to generalize to a set of real-world claims (reviewed by professional fact-checkers and annotated using the ClaimReview [22] schema), obtaining again a very good signal (extraction AUC = .).

As an example of a potential application of this technique, we show that, thanks to our method, a wider range of online discourse samples is amenable to analysis than before. In particular, we integrate our approach into a pipeline (see Figure 2) that uses off-the-shelf fact-checking algorithms to analyze a subset of ClaimReview-annotated online discourse samples. Using this pipeline, we obtain very encouraging results on two separate tasks: First, on samples of 'simple' online discourse claims, which can be effectively summarized (and thus fact-checked) by extracting a single RDF triple, we outperform a claim-matching baseline based on state-of-the-art representation learning (verification AUC = .). Second, on
[Figure 2 pipeline components: GREC and ClaimReview corpora → FRED RDF graphs (one per snippet) → relational corpus graph → Node2Vec model → path traversal and shortest-path vector generation → node embeddings → training/testing data → relation classifier → classify ClaimReview relations → Knowledge Linker.]
Figure 2: Schematic illustration of an integrated extraction and verification pipeline using our relation extraction tool REMOD. The white components correspond to the various steps needed to perform relation extraction. Numbered labels correspond to section headings in the manuscript. To show the potential for integration with external tools, as an additional step in the pipeline the green node shows the use of an off-the-shelf fact-checking algorithm [11].

more complex claims, from which one can extract multiple relevant relations, and which therefore cannot be fact-checked directly, the fact-checker can still identify evidence in support of or against the claim with good accuracy (verification
AUC = .).

The rest of this paper is structured as follows: Section 2 details the datasets used, as well as the methods used in the various steps of the pipeline. Section 3 shows the results of both the relation classification task and the fact-checking tasks. Section 4 goes into detail on relevant prior work from the literature on relation classification, misinformation detection, and computational fact-checking. Finally, Section 5 discusses the impact and importance of our results, as well as methods that may be used to improve upon this work in the future.

Our relation extraction pipeline is described in Figure 2. Roughly speaking, the main task of our pipeline is a supervised relation extraction task (white nodes), but since we later show how this task can be integrated to perform an additional unsupervised fact-checking step, the figure also shows this final step (green node). Collectively these two tasks leverage a number of different data sources, so we start by describing the various datasets used in building the pipeline. We then describe the various components of the pipeline proper.
For the main relation extraction task, we use two main corpora, bothcompiled by Google: the Google Relation Extraction Corpus (GREC)and the Google Fact Check Explorer corpus, described below.
The dataset of relations used was the Google Relation Extraction Corpus (GREC) [37].

Table 1: Number of snippets per relation before and after filtering the GREC corpus.

Relation         Total    Retained   % Retained
Institution      42,628   19,900     46.7
Education         1,850      806     43.6
Date of Birth     2,490    1,010     40.6
Place of Birth    9,566    4,005     41.9
Place of Death    3,042    1,307     43.0

This dataset contains text snippets extracted from Wikipedia articles that represent a subject/object relation, which can be described by the following defining questions:
Institution "What educational institution did the subject attend?"
Education “What degree did the subject receive?”
Date of Birth (DOB) “On what date was the subject born?”
Place of Birth (POB) "Where was the subject born?"
Place of Death (POD) "Where did the subject die?"

Each entry in the dataset consists of a natural language snippet of text, the URL of the Wikipedia entry from which the text was pulled, the Freebase predicate, a Freebase ID for subject and object, and the judgements of five human annotators on whether the snippet does or does not contain the relation (some annotators also voted to "skip", representing no decision either way). Freebase has been replaced with the Google Knowledge Graph since this dataset was generated, which limited the use of this dataset in its original form. We made a set of addenda to the GREC to update it to be more machine-ready for current relation extraction tasks and knowledge bases. The addenda include the following for each entry: text strings for both subject and object, DBpedia URIs for both subject and object, Wikidata QIDs for both subject and object, a unique identifier, and the majority annotator vote.

The snippets varied considerably in length. The distribution of word lengths can be found in Figure 3. Because we relied on a third-party API to parse the snippets, to reduce potential bias due to snippet length, and to ensure only the most characteristic relations were modeled, snippets were removed if they were not within ± . standard deviations of the mean snippet length (measured in words), per relation. Table 1 shows the number of snippets retained per relation.

Researchers at Duke University and Google have developed an annotation standard named ClaimReview [44] to help annotate structured fact-checks on the web. It allows fact-checkers to add structured markup to their fact-checks with information that identifies distinct properties of a claim (i.e. the claim reviewed, the rating decision, the source, etc.). This semi-structured data allows fact-checks to be catalogued and queried by search engines.
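As an illustration, the per-relation length filter described earlier (keeping only snippets within a fixed number of standard deviations of the mean word length) can be sketched in a few lines of Python. This is a hypothetical sketch, not part of the released code; `n_sigma` stands in for the cutoff, whose exact value is given in the text.

```python
from statistics import mean, stdev

def filter_by_length(snippets, n_sigma):
    """Keep only snippets whose word count lies within
    mean +/- n_sigma standard deviations for their relation."""
    lengths = [len(s.split()) for s in snippets]
    mu, sigma = mean(lengths), stdev(lengths)
    lo, hi = mu - n_sigma * sigma, mu + n_sigma * sigma
    return [s for s, n in zip(snippets, lengths) if lo <= n <= hi]

# Applied separately per relation, e.g.:
snippets = ["short one", "a b c d e", "w " * 200]
kept = filter_by_length(snippets, n_sigma=1.0)  # drops the 200-word outlier
```

In the actual pipeline this filter runs once per relation, since mean and standard deviation differ across relations (see Figure 3).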
The Google Fact Check Explorer tool collects all the ClaimReview fragments published by fact-checking organizations that meet a set of established guidelines, which are

https://github.com/mjsumpter/google-relation-extraction-corpus-augmented
https://toolbox.google.com/factcheck/explorer
https://developers.google.com/search/docs/data-types/factcheck

Table 2: The set of WordNet synonyms used to extract relevant claims from the ClaimReview database

Relation          WordNet synonyms
Institution       attend, university, college, graduate
Education graduate, degree
Date of Birth born, born on
Place of Birth born, birthplace, place of birth, place of origin
Place of Death    deceased, died, perished, passed away, expired

the same standards for accountability, transparency, and accuracy used by Google News to select publishers. We collected claims from the Google Fact Check Explorer tool up until 04/2020. From this corpus, we produced a dataset of 49,770 ClaimReview-annotated claims. Of the 20,817 English claims in the dataset, we searched for claims that contained one of the relations represented in the GREC, using WordNet [14] synonyms to select search terms (see Table 2). This procedure yielded a subset of 28 claims that met these criteria (see Table 7).
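The synonym-based selection can be sketched as a simple keyword filter over the claim text. The function name below is ours, and the synonym sets are transcribed from Table 2; this is an illustrative sketch, not the released code.

```python
# Sketch: select claims mentioning a synonym for one of the
# five GREC relations (synonym sets from Table 2).
SYNONYMS = {
    "Institution": {"attend", "university", "college", "graduate"},
    "Education": {"graduate", "degree"},
    "Date of Birth": {"born", "born on"},
    "Place of Birth": {"born", "birthplace", "place of birth", "place of origin"},
    "Place of Death": {"deceased", "died", "perished", "passed away", "expired"},
}

def candidate_relations(claim):
    """Return the GREC relations whose synonyms appear in the claim text."""
    text = claim.lower()
    return {rel for rel, terms in SYNONYMS.items()
            if any(term in text for term in terms)}

claim = "Tej Pratap Yadav receives a doctorate degree from Takshsila University"
# 'degree' matches Education; 'university' matches Institution
```

A claim matching at least one relation becomes a candidate for the ClaimReview test set; the final relation label is then assigned by the classifier.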
The main contribution of this work is REMOD (which stands for Relation Extraction for Modeling of Online Discourse), a novel tool for relation extraction that extracts RDF triples from semi-structured samples of online discourse. To do so, our tool leverages an annotated corpus of past claims and relations. In the example pipeline shown in Figure 2, the various steps of REMOD correspond to the white nodes, which we describe in more detail below. (The figure is labeled with numbers corresponding to the following section numbers, which elaborate on each step of the process.) To facilitate the replication of our results, the source code of REMOD is freely available online at https://github.com/mjsumpter/remod.
Our workflow begins with natural language snippets. To parse these snippets we used FRED, a machine reading tool based on Discourse Representation Theory and linguistic frames [17], described by the authors as "semantic middleware". FRED is an NLP tool that combines frame detection, type induction, named-entity recognition, semantic parsing, and ontology alignment, all into a single tool. The authors provide a RESTful API to access it. When provided with a text string as input, it returns a Resource Description Framework (RDF) graphlet of the semantic parse tree of the input. (In practice, FRED produces DAGs instead of trees due to entity linking to external ontologies, hence our referring to them as 'graphlets'.) An example of these RDF graphlets is shown in Figure 1 for the ClaimReview snippet of a known misinformation claim [26].
In a realistic environment, many claims of different relations will exist in the same corpus. To mimic this environment, we composed a single 'corpus' graph, built from every FRED RDF graphlet generated from the corpus snippets. For named entities, FRED defaults to generating nodes in its own namespace (e.g. fred:Doctorate); then, if it finds that the same entity is present in an existing ontology, it links to that ontology (e.g. dbpedia:Doctorate). Since these equivalent
Figure 3: Distribution of snippet lengths found in the GREC. The red solid line corresponds to the average snippet length (inwords) and the dashed lines to ± . 𝜎 of the average. Snippets were kept if they were within this interval.Figure 4: A visualization of how two separate RDF graphletswere stitched together along identical nodes. entities were redundant, we contracted the two nodes into a singlevertex, and use the URI from the linked ontology (i.e. DBpedia inthis example) as its new URI. The corpus graph was than created bystitching together all the contracted RDF graphlets: if two graphletsshare one or more nodes (i.e. two or more nodes have the sameURI), then we consider the union of the two graphlets, and contractany pair of such nodes into a single node. This new node is incidentto the union of all incident edges in the two original graphlets. Anexample of this is shown in Figure 4. The resulting corpus graphconsists of , nodes and , edges. The corpus graph is effectively a combinedsemantic parse tree of the selected snippets from the corpus. Tobetter exploit this structure in machine learning tasks, we generatednode embeddings using the Node2Vec algorithm [20]. Node2Vecgenerates sets of random walks for each node, which are thensubstituted in place of natural language sentences as input into theWord2Vec model. There are two important parameters which willinfluence the nature of the embeddings: the return parameter 𝑝 and the in–out parameter 𝑞 . For 𝑝 > there is a higher likelihoodof returning to a visited node in the random walks, whereas for 𝑞 > there is an increased likelihood of exploring unvisited nodes.We performed a grid search of 𝑝 and 𝑞 parameters (see §3.2), anddetermined the best choice for these parameters to be 𝑝 = and 𝑞 = ; this configuration captures what the authors of Node2Vec callthe ‘global’ topological structure of the graph. 
The other parameters of Node2Vec were chosen as follows: the dimension of the vector space was set to ; the number of walks to ; the walk length to ; and, finally, the context window to .

Our approach is inspired by the well-known idea that finding paths over structured knowledge representations can help in learning new concepts [41]. More recently, Bunescu and Mooney [9] confirmed the intuitive conclusion that the shortest path between entities in a dependency tree captures the significant information contained between them. Therefore, we sought to develop a classifier that could distinguish between the shortest paths of different semantic relationships. To do so, for each snippet in the corpus, the subject and object were retrieved, along with the original (i.e., non-stitched) RDF graphlet of that specific snippet. The nodes corresponding to the subject and object were identified in the RDF graphlet. With the terminal nodes identified, the shortest path in the original RDF graphlet was calculated (Figure 1). Finally, we generated a final embedding by summing along the path:

    Σ_{i=1}^{n} v⃗_i

where 𝑣_1, . . . , 𝑣_𝑛 is a path and v⃗_𝑖 ∈ ℝ^𝑑 is the vector associated with 𝑣_𝑖 ∈ 𝑉. This results in a final vector representing the aggregated sequence of nodes along the shortest path between subject and object.

This process resulted in a 𝑑-dimensional vector for each snippet in the corpus. All results shown in the next section were obtained from these vectors. We projected the vectors into a lower-dimensional space using t-SNE. The visualization of these vectors is shown in Figure 5, where each color corresponds to a different relation. The projection reveals a good separation of vectors based on the relation they represent. We trained a number of classifiers on the resulting set of shortest path vectors. The selected classifiers were Logistic Regression, 𝑘-NN, SVM, Random Forest, Decision Tree, and a Wide Neural Net.
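The shortest-path feature construction just described (BFS between the terminal nodes, then a sum of node embeddings along the path) can be sketched as follows. The toy graph and 2-dimensional embeddings are illustrative only; in the pipeline the embeddings come from Node2Vec and the graphlets from FRED.

```python
from collections import deque

def shortest_path(adj, src, dst):
    """BFS shortest undirected path between two URIs in a graphlet
    (adj maps each node to a set of neighbors); None if unreachable."""
    prev, frontier = {src: None}, deque([src])
    while frontier:
        node = frontier.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for n in adj[node]:
            if n not in prev:
                prev[n] = node
                frontier.append(n)
    return None

def path_vector(path, emb):
    """Sum the node embeddings along the path: the feature vector."""
    dim = len(next(iter(emb.values())))
    return [sum(emb[n][i] for n in path) for i in range(dim)]

adj = {"s": {"m"}, "m": {"s", "o"}, "o": {"m"}}
emb = {"s": [1.0, 0.0], "m": [0.0, 1.0], "o": [1.0, 1.0]}
vec = path_vector(shortest_path(adj, "s", "o"), emb)  # [2.0, 2.0]
```

The resulting vectors are what the classifiers in the next section are trained on.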
Samples that were rated by the annotators as not containing the specified relation were removed, and then the dataset was balanced to the lowest-frequency class (Education, 𝑁 = samples). Readers will note this is a decrease from the counts reported in Table 1; FRED was not always accurate (leading to inaccurate terminal nodes) and occasionally returned samples as corrupted RDF graphs, which resulted in a small loss of data.

Figure 5: The shortest path vectors of GREC relations projected into 2D using t-SNE. Each color represents a different semantic relation (Institution, Education, Date of Birth, Place of Birth, Place of Death), with a sixth color to mark snippets for which a majority of annotators voted 'No (relation)'.

To effectively compare different classifiers, training was done using a 64%/16%/20% training/validation/testing split, rather than cross-validation. This resulted in a final training dataset of , samples (5 classes, 𝑁 ≈ samples/class), with a validation set of samples, and an additional samples held out for testing. The 28 selected ClaimReview claims were held as an additional test set, which is elaborated on in Section 2.2.6.

To demonstrate the usefulness of our method, we show that REMOD can be integrated into a fact-checking pipeline using existing, off-the-shelf tools to verify online discourse claims annotated using the ClaimReview standard. To perform fact-checking, we rely on the work of Shiralkar et al. [48], who provide open-source implementations of several fact-checking algorithms. These algorithms can be used to assess the truthfulness of a statement, but of course any tool that takes an RDF triple as input could be used as well.
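Concretely, the hand-off between the relation classifier and a triple-based fact-checker only requires assembling an (s, p, o) triple from the two linked entities and the predicted relation. A minimal sketch of that glue step follows; the predicate URIs and function name are illustrative, not the exact identifiers used by REMOD or by the algorithms of Shiralkar et al. [48].

```python
# Hypothetical glue code: turn a classified claim into an RDF triple
# for a downstream triple-based fact-checker. Predicate URIs are
# illustrative placeholders.
PREDICATES = {
    "Education": "dbo:education",
    "Institution": "dbo:almaMater",
    "Date of Birth": "dbo:birthDate",
    "Place of Birth": "dbo:birthPlace",
    "Place of Death": "dbo:deathPlace",
}

def to_triple(subject_uri, object_uri, predicted_relation):
    """Assemble the (s, p, o) triple handed to the fact-checker;
    returns None if the relation is not one of the five GREC relations."""
    pred = PREDICATES.get(predicted_relation)
    if pred is None:
        return None
    return (subject_uri, pred, object_uri)

t = to_triple("dbpedia:Tej_Pratap_Yadav", "dbpedia:Doctorate", "Education")
```

Note that this step presupposes that both entities were successfully linked to DBpedia, which, as discussed next, is not always the case.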
To extract relations from ClaimReview snippets, we used the deep neural network classifier, which was the most successful classifier from the prior step, and fed the extracted triples into the fact-checker.

Of course, when integrating two distinct tools one has to make sure that any error originating in the first tool does not affect the performance of the second tool. Therefore, to avoid cascading errors we removed some claims from our dataset. We removed two types of errors. First, we removed any claim where the relation was misclassified, to avoid feeding inaccurate inputs into the fact-checker. Second, FRED is not always able to link both the subject and object entities to DBpedia, which is a requirement for using the fact-checking algorithms of Shiralkar et al. [48]. Thus we also removed claims that did not have both the subject and object linked to the DBpedia ontology. Of the original 28 claims, this filtering resulted in 13 remaining ClaimReview claims used in our evaluation.

Additionally, we also manually checked whether the overall claim reduces to the extracted triple (in the sense that verifying the triple also verifies the overall claim). This distinction is important since it allows us to gauge the ability of our system to check entire claims automatically, in a purely end-to-end fashion. Finally, these

https://github.com/shiralkarprashant/knowledgestream

Table 3: AUC of the Wide DNN on the relation classification task using different types of graph to represent the corpus graph.

Graph         AUC (Unweighted)   AUC (Weighted)
Undirected

… the 𝑘 most similar matching claims. We removed fact-checking organizations that used scaleless fact-check verdicts (i.e. factcheck.org); for those that had scales, we assigned truth scores to every claim, setting "False" to a baseline of 0, unless a scale explicitly stated a different baseline (i.e. PolitiFact ranks "Pants on Fire" lower than "False").

The corpus graph is composed of dependency trees, and so the corpus graph is naturally a directed graph; edges are also all weighted equally. This design has a strong influence on path traversal, since directed edges reduce the number of available paths, and the cost of taking an edge (or its absence) influences the choice of one path over another. For completeness, we considered all four combinations of taking either a directed or undirected graph, and of having edge weights or not. Let 𝑣_𝑖, 𝑣_𝑗 ∈ 𝑉 represent two nodes in the dependency graph that are incident on the same edge. The weight 𝑤_𝑖𝑗 between them is the angular distance between the respective node embeddings:

    𝑤_𝑖𝑗 = (1/𝜋) arccos( (v⃗_𝑖 · v⃗_𝑗) / (‖v⃗_𝑖‖ ‖v⃗_𝑗‖) )

where v⃗ is the vector associated with 𝑣 ∈ 𝑉.

Table 3 shows that the undirected, unweighted graph yields the best classification results, which prompts two observations. The first is that directed edges reduce the number of available pathways to connect two nodes. Second, and perhaps a bit surprisingly, we observe that the unweighted network performs better than the weighted one. Because node embeddings were the same in the two variants, the final feature vector used for relation classification would be different only if a different shortest path was found. This could happen if edges more relevant to discriminating the relation were assigned large weights, compared to other, less relevant edges.
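The angular-distance weight defined above reduces to a few lines of code: the arccosine of the cosine similarity, normalized by 𝜋 so that weights fall in [0, 1]. This standalone sketch uses our own function name; the clamp guards against floating-point values just outside [-1, 1].

```python
from math import acos, pi, sqrt

def angular_distance(u, v):
    """Edge weight w_ij: angular distance between two node embeddings,
    i.e. arccos of their cosine similarity, scaled to [0, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    cos = dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))
    return acos(max(-1.0, min(1.0, cos))) / pi  # clamp for float error

# Orthogonal embeddings are a quarter turn apart:
w = angular_distance([1.0, 0.0], [0.0, 1.0])  # 0.5
```

Identical directions give weight 0 and opposite directions give weight 1, so shortest paths in the weighted variant prefer edges between similar embeddings.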
The results of the relation classification task are shown in Table 4. The outcome of these various tests reveals that the node embeddings do contain information regarding the semantic nature of the GREC relations; however, they are not neatly separable by decision planes. It is notable that the models we tested are often more successful in precision than in recall. This suggests that a more complex model, such as a DNN, is necessary to identify the less characteristic samples of a relation. To improve these results, we performed a grid search on the Node2Vec 𝑝 and 𝑞 parameters (across a range of candidate values). The best overall results were a product of a 'global' configuration, using 𝑝 = and 𝑞 = , which achieved an AUC of . on the test set.

Table 4: Results of the relation classification task using different ML models, on an unweighted, undirected corpus graph, as compared to training with Word2Vec embeddings.

Model                  Precision  Recall  F1    AUC
Decision Tree          0.64       0.64    0.64  0.773
Random Forest          0.81       0.67    0.61  0.793
𝑘-NN                   0.78       0.74    0.74  0.841
SVM                    0.81       0.77    0.77  0.855
Log. Regr.             0.80       0.71    0.71  0.827
Wide DNN
Word2Vec+Log. Regr.    0.66       0.47    0.44  0.658
Word2Vec+Wide DNN      0.61       0.63    0.61  0.883

To evaluate our method, as a baseline we generated 300-dimension vectors for each snippet from a Word2Vec model pre-trained on Wikipedia [65]. This is the same source as the GREC corpus, which provided the training data for our model. These embeddings were then used as features to train a DNN and a logistic regression model for relation extraction. REMOD shows a marked improvement in both instances, indicating an effective approach to relation extraction.

The claims selected from the ClaimReview corpus, along with their predicted and correct relations, are shown in Table 7 in the appendix. The AUC of the predicted relations is . . Inspecting the misclassified samples, we see that REMOD made mistakes between similar relations (e.g. place of birth and date of birth), which often occur in similar sentences.
We next test the integration with fact-checking algorithms. In particular, we use the fact-checker for two similar but conceptually distinct tasks: 1) fact-checking an entire claim (fact-checking), and 2) identifying evidence in support of or against a claim (fact verification). Table 7 indicates, for each claim, whether "claim ≡ triple" holds, which is true (indicated by a checkmark) when the extracted relation summarizes the whole claim.
Table 5: The performance of the fact-checking algorithms on predicting the validity of the relations.

Method                        AUC
Knowledge Linker              0.636
Relational Knowledge Linker
Knowledge Stream

An important caveat: as mentioned before, although our relation extraction pipeline is capable of predicting a relation for all the entries in Table 7, not all triples that are correctly predicted can be fed to the fact-checking algorithms, due to incomplete entity linking. For the task of identifying supporting evidence, we find a total of 13 ClaimReview claims that are amenable to fact-checking. For the task of checking an entire claim, this number is further reduced to 7 claims.
Table 5 shows the results of verifying individual pieces of evidence in support of or against any of the 13 ClaimReview claims identified by REMOD, using any of the three algorithms for fact-checking RDF triples. Relational Knowledge Linker and Knowledge Stream were the best performers. Note that since our baseline is intended to emulate a true fact-checking task, in this case we do not run the baseline: its similarity is based on the whole claim, and thus would not be a meaningful comparison with our method, which focuses only on a specific relation within a larger claim.
We test here the subset of claims for which checking the triple is equivalent to checking the entire claim. In this case, REMOD yields 7 claims that can be used as inputs to the fact-checking algorithms. Table 6 shows the results for our 7 ClaimReview claims on the three fact-checking algorithms, along with the baseline. Here, the baseline emulates fact-checking by claim matching.

Since we are using claim matching to perform fact-checking, we consider three different scenarios to make the task more realistic. In particular, we match the claim against three different corpora, ordered by increasing degree of realism: 1) the full ClaimReview corpus ('All'), 2) all ClaimReview entries by PolitiFact only ('PolitiFact'), and 3) all ClaimReview entries from the same fact-checker as the claim of interest ('Same'). The first case ('All') is meant to give an upper bound on the performance of claim matching but is not realistic, since it makes use of knowledge of the truth scores of potentially future claims, as well as of ratings for the same claim by different fact-checkers. The second case ('PolitiFact') partially addresses this second unrealistic assumption by using only claims from a single source. Thus, it does not have access to truth scores by different organizations for the same claim, but it does still have access to future information. Both 1) and 2) can thus be regarded as gold-standard measures of performance. The last one ('Same') is the most realistic, since it emulates the scenario of a fact-checker who may check a claim for the first time, and who thus has access neither to claims fact-checked afterwards nor to ratings of the same claim by other fact-checkers. In all three cases, the claim being matched was removed from the corpus, to prevent trivially perfect predictions.

Table 6: Results of the fact-checking algorithms. (CM = Claim Matching; KL = Knowledge Linker; Rel. KL = Relational Knowledge Linker; KS = Knowledge Stream.)

Method            𝑘 =     𝑘 =     𝑘 =     𝑘 =
CM (All)          0.417
CM (PolitiFact)   0.666   0.625
KS

Relational Knowledge Linker and Knowledge Stream are still the best performing of the fact-checking algorithms and manage to reach, if not exceed, the performance of the gold standard (Claim Matching–All or –PolitiFact).

Relation extraction and classification is the task of extracting semantic relationships between two entities in natural language text and matching them to semantically equivalent or similar relations. This task is at the core of information extraction and knowledge base construction, as it effectively reduces statements to their core meaning; this is typically modeled as a semantic triple, (s, p, o), where two entities (s and o) are connected by a predicate, p. There are several distinct nuances and open challenges to effective relation extraction. Identifying attributes that discriminate between two objects provides a descriptive explanation to supplement word embeddings (i.e. lime is separated from lemon by the attribute 'green'), and is currently most successful with SVM classifiers [27]. Multi-way classification attempts to distinguish the direction of one-way relations (the sonOf relation is not bidirectional between two people), and has seen similar levels of success from solutions built with language models [3], convolutional neural networks [58], and recurrent neural networks [63]. Distantly supervised relation extraction is a two-way approach whereby semantic triples are generated from natural language by aligning them with information already present in knowledge graphs [64]. Relation extraction performance is often assessed on the TACRED dataset [67]. This is a large-scale dataset of , examples used in the annual TAC Knowledge Base Population challenges, and covers relation types. The most successful solution to date is from Baldini Soares et al. [3], who achieved a micro-averaged F1 score of . .
Despite the increasing availability of state-of-the-art machine learning architectures, relation extraction continues to be an open problem with much room for improvement.

Knowledge base augmentation is a task that aims to add new relations to existing knowledge bases in an automated fashion [61]. This task takes one of two approaches. The first infers new relations from existing triples in a knowledge base [8, 53]; this is essentially a link-prediction task that builds upon patterns found between entities in knowledge bases. The second approach mines data found on the web for knowledge discovery [12, 66]. This approach relies on redundant relations found among the selected source materials, which may be as restrictive as Wikipedia articles [39] or as extensive as the entire web [12]. Due to the potential for error based on the sources, Dong et al. [13] developed a Knowledge-Based Trust (KBT) score for measuring the trustworthiness of selected sources. Yu et al. [66] expand upon this by combining KBT scores with other entity/relation-based features to assign a unique score to each individual triple.
Information disorder is a catch-all term for the many kinds of unreliable information that one may encounter online or in the real world [59], including disinformation, misinformation, fake news, rumor, and spam. Information disorder can also take on several modalities, including text, video, and images. The many varieties of information disorder make it challenging to develop any one approach for detection. This has led to multi-modal approaches to detection based on three main signals: the content of the information, the users who shared it, and the patterns of information dissemination on a network. Bad content is often generated by bots, which suggests that features captured from user profiles can be useful for distinguishing bots from humans [50]. Content detection depends on the medium: lexical features, sentiment, and readability metrics are used for text, while neural visual features are extracted from other content [40, 42, 43]. Network detection methods model social media networks as propagation networks, measuring the flow of information [49]. There has also been promising work on crowd-sourcing the task by allowing users to flag questionable content [55]. This approach, while likely to remain imperfect, adds the important supplement of human supervision to all of the aforementioned techniques.
Hassan et al. [24] released the first end-to-end fact-checking system, ClaimBuster, in 2017. ClaimBuster is composed of several distinct components that work in sequence to accomplish the task of automated fact-checking. The first, the claim monitor, continuously monitors text published as broadcast television closed captions, Twitter accounts, and content on a selected set of websites. This text is passed to the claim spotter, which scores every sentence by its likelihood of containing a claim that is worthy of fact-checking; subjective and opinionated sentences receive a low score in this task. Once it has identified a set of check-worthy sentences, it uses a claim matcher to search through fact-check repositories and return existing fact-checks that match the selected sentences.
The claim checker generates questions from the selected sentences and uses those questions to query Wolfram Alpha and Google to fetch supporting or debunking evidence as a supplement to the findings of the claim matcher. Finally, the fact-check reporter builds a report from all of the gathered evidence, summarizing the findings of the ClaimBuster pipeline, and disseminates these findings through social media.

Claim Verification

Claim verification is arguably the key task of fact-checking: to check a claim against existing evidence. It is related to the matching and checking subtasks of ClaimBuster, in that it is the task of checking whether a natural language sentence selected as evidence supports or debunks the correlated claim. To build computational solutions to this task, datasets containing claims and their corresponding evidence are needed. There have been some datasets [2, 15, 56] relevant to this task; however, they are either not machine-readable or lacking in size. Thorne et al. [54] recognized this gap and released a large-scale dataset, called FEVER, to address these concerns. This dataset contains 185,445 claims with corresponding evidence that were manually classified as
SUPPORTED, REFUTED, or
NOTENOUGHINFO. This has been followed up with annual workshops that encourage participants to improve upon both the dataset and the claim verification task. The CLEF CheckThat! [4] series of workshops and conferences also seeks to bring researchers together to improve claim verification, along with identifying and extracting check-worthy claims.
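A minimal sketch of what one FEVER-style claim-verification record looks like; the field names here are our assumption for illustration, not the exact FEVER schema:

```python
# The three FEVER verdict labels.
LABELS = {"SUPPORTED", "REFUTED", "NOTENOUGHINFO"}

def make_example(claim, evidence, label):
    """Bundle a claim with its evidence sentences and a verdict label."""
    if label not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    return {"claim": claim, "evidence": evidence, "label": label}

example = make_example(
    claim="Barack Obama was born in the United States.",
    evidence=["Obama was born in Honolulu, Hawaii."],
    label="SUPPORTED",
)
```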
Besides claim-matching approaches, there are a handful of existing algorithms for fact-checking, mostly based on exploiting the content or characteristics of existing knowledge bases. Embedding approaches, such as TransE [5], seek to generate vector embeddings of knowledge bases, a task which is conceptually related to our approach. By generating these embeddings, they can perform link prediction based on structural patterns of (s, p, o) triples. In terms of a knowledge base, this amounts to adding new facts without any source material. For fact-checking, this approach can be used to test whether a triple extracted from a claim is a predicted link in the knowledge base; the pitfall of these methods, as with all embedding techniques, is that they lack both interpretability and scalability. Other algorithms similarly consider paths within knowledge bases, but seek to address the interpretability problem. PRA [28], SFE [18], PredPath [47], and AMIE [16] all take the approach of mining possible pathways between two entities within a knowledge base. From these mined pathways, they generate sets of features to be used in supervised learning models for link prediction. These have shown promise at predicting the validity of a claim; however, they also suffer from scalability issues. Knowledge bases that contain enough relevant information to be useful are very large, so path mining and feature generation necessarily become time-consuming. There are also a few rule-based methods [38] for fact-checking, which rely on logical constraints of a knowledge graph and are naturally explainable. General, large-scale knowledge graphs do not have the logical constraints from which to build such rules, leaving this approach to fact-checking an open problem [25].
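The scoring idea behind TransE can be shown in a few lines: a triple (s, p, o) is plausible when the relation vector approximately translates the subject embedding onto the object embedding, i.e. when ||s + p − o|| is small. The vectors below are hand-constructed toys, not learned embeddings, and the entity names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4

# Toy embeddings; in TransE these are learned with a margin-based
# ranking loss over the knowledge base.
entity = {name: rng.normal(size=dim)
          for name in ["Washington_D.C.", "United_States", "Paris"]}
# Construct capitalOf so that the true triple translates exactly:
relation = {"capitalOf": entity["United_States"] - entity["Washington_D.C."]}

def score(s, p, o):
    """TransE plausibility: smaller ||s + p - o|| means more plausible."""
    return float(np.linalg.norm(entity[s] + relation[p] - entity[o]))

true_score = score("Washington_D.C.", "capitalOf", "United_States")
false_score = score("Paris", "capitalOf", "United_States")
```

For fact-checking, a triple extracted from a claim would be accepted when its score falls below a learned threshold.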
No method is perfect, and our approach suffers from a number of limitations, which we briefly describe here. The main limitation of our pipeline lies in its discrete structure, which is prone to cascading failures. Our main NLP tool, FRED, is a powerhouse of a tool and performs many important NLP tasks at once; however, it was not always completely accurate, and many of our samples were returned as corrupted RDF graphs. Additionally, it was not always able to link nodes to DBpedia, which limited the number of triples we could feed into our fact-checking algorithms. Cascading failures are common to many machine reading pipelines [35]. One way to overcome this issue would be to rely on a joint inference approach [52]. Another limitation of our methodology has to do with our use of distributed representations. For the task of fact-checking, the corpus is always growing; Node2Vec cannot generalize to unseen data and requires retraining. An inductive learning framework, such as GraphSAGE [23], can generate embeddings for unseen nodes, and is therefore a more practical algorithm for extending this pipeline. For the classification task, our machine learning models were relatively simple, and optimizing both the parameters and the architecture of the neural network would likely increase the accuracy and effectiveness of this method.
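The transductive limitation of Node2Vec can be made concrete: after training, the model is effectively a fixed node-to-vector lookup table, so any entity that was not in the training graph has no embedding. The node names and vectors below are illustrative:

```python
# After Node2Vec training, embeddings are a fixed node -> vector table;
# entities absent from the training graph simply have no entry.
trained = {
    "dbpedia:Barack_Obama": [0.12, -0.40, 0.33],
    "dbpedia:United_States": [0.08, 0.21, -0.15],
}

def embed(node):
    """Look up a trained embedding. Fails for unseen nodes, which is why
    an inductive method (e.g. GraphSAGE) suits a growing corpus better."""
    if node not in trained:
        raise KeyError(f"unseen node {node!r}: Node2Vec must be retrained")
    return trained[node]

vec = embed("dbpedia:Barack_Obama")
try:
    embed("dbpedia:Some_New_Entity")
    unseen_raises = False
except KeyError:
    unseen_raises = True
```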
In this paper, we have presented a novel relation extraction algorithm and previewed its application to classifying relations present in online discourse and automatically fact-checking them against the information present in a general knowledge graph. We developed a pipeline to facilitate the linkage of these two tasks. Our relation classification method leverages graph representation learning on the shortest paths between entities in semantic dependency trees; it was shown to be comparable to state-of-the-art methods on a corpus of labeled relations. This classifier was then used to reduce claims from online discourse to semantic triples, which were in turn used as input to fact-checking algorithms to predict the accuracy of each claim. On our selected claims, the resulting performance is at the least comparable to claim matching, but without the need for the corpus of existing claims that claim matching relies on. Our relation extraction method is a promising approach to distinguishing relations present in large online discourse corpora; scaling up this algorithm could provide an outlet for modeling online discourse within an established ontology. Additionally, our pipeline may serve as a proof of concept for future research into automated fact-checking. While it is a challenge to model all possible relations in a generalist ontology like DBpedia, this pipeline could form the basis of tools for reducing the time needed to research an online discourse claim.

Acknowledgements
The authors would like to thank Google for making publicly available both the GREC dataset and the Fact Check Explorer tool, and Alexios Mantzarlis for feedback on the manuscript.
REFERENCES

[1] Hunt Allcott and Matthew Gentzkow. 2017. Social media and fake news in the 2016 election.
Journal of economic perspectives
31, 2 (2017), 211–36.[2] Gabor Angeli and Christopher D. Manning. 2014. NaturalLI: Natural LogicInference for Common Sense Reasoning. In
Proceedings of the 2014 Conferenceon Empirical Methods in Natural Language Processing (EMNLP) . Association forComputational Linguistics, Doha, Qatar, 534–545. https://doi.org/10.3115/v1/D14-1059[3] Livio Baldini Soares, Nicholas FitzGerald, Jeffrey Ling, and Tom Kwiatkowski.2019. Matching the Blanks: Distributional Similarity for Relation Learning. In
Proceedings of the 57th Annual Meeting of the Association for Computational inguistics . Association for Computational Linguistics, Florence, Italy, 2895–2905.https://doi.org/10.18653/v1/P19-1279[4] Alberto Barron-Cedeno, Tamer Elsayed, Preslav Nakov, Giovanni Da San Martino,Maram Hasanain, Reem Suwaileh, and Fatima Haouari. 2020. CheckThat! atCLEF 2020: Enabling the Automatic Identification and Verification of Claims inSocial Media. arXiv:2001.08546 [cs.CL][5] Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Ok-sana Yakhnenko. 2013. Translating Embeddings for Modeling Multi-relationalData. In Advances in Neural Information Processing Systems 26 , C. J. C. Burges,L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger (Eds.). CurranAssociates, Inc., Red Hook, NY, United States, 2787–2795.[6] B. Borel. 2016.
The Chicago Guide to Fact-Checking . University of Chicago Press,Chicago, IL, USA.[7] Alexandre Bovet and Hernán A. Makse. 2019. Influence of fake news in Twitterduring the 2016 US presidential election.
Nature Communications
10, 1 (Jan. 2019),7. https://doi.org/10.1038/s41467-018-07761-2[8] Lorenz Bühmann and Jens Lehmann. 2013. Pattern Based Knowledge BaseEnrichment. In
The Semantic Web – ISWC 2013 , Harith Alani, Lalana Kagal, AchilleFokoue, Paul Groth, Chris Biemann, Josiane Xavier Parreira, Lora Aroyo, NatashaNoy, Chris Welty, and Krzysztof Janowicz (Eds.). Springer Berlin Heidelberg,Berlin, Heidelberg, 33–48.[9] Razvan C. Bunescu and Raymond J. Mooney. 2005. A Shortest Path Depen-dency Kernel for Relation Extraction. In
Proceedings of the Conference on HumanLanguage Technology and Empirical Methods in Natural Language Processing (Vancouver, British Columbia, Canada) (HLT ’05) . Association for ComputationalLinguistics, USA, 724–731. https://doi.org/10.3115/1220575.1220666[10] Giovanni Luca Ciampaglia. 2018. Fighting fake news: a role for computationalsocial science in the fight against digital misinformation.
Journal of ComputationalSocial Science
1, 1 (29 Jan. 2018), 147–153. https://doi.org/10.1007/s42001-017-0005-6[11] Giovanni Luca Ciampaglia, Prashant Shiralkar, Luis M. Rocha, Johan Bollen,Filippo Menczer, and Alessandro Flammini. 2015. Computational Fact Checkingfrom Knowledge Networks.
PLOS ONE
10, 6 (06 2015), 1–13. https://doi.org/10.1371/journal.pone.0128193[12] Xin Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Mur-phy, Thomas Strohmann, Shaohua Sun, and Wei Zhang. 2014. Knowledge vault:A web-scale approach to probabilistic knowledge fusion. In
Proceedings of theACM SIGKDD International Conference on Knowledge Discovery and Data Mining .Association for Computing Machinery, New York, New York, USA, 601–610.https://doi.org/10.1145/2623330.2623623[13] Xin Luna Dong, Evgeniy Gabrilovich, Kevin Murphy, Van Dang, Wilko Horn,Camillo Lugaresi, Shaohua Sun, and Wei Zhang. 2015. Knowledge-Based Trust:Estimating the Trustworthiness of Web Sources.
Proc. VLDB Endow.
8, 9 (May2015), 938–949. https://doi.org/10.14778/2777598.2777603[14] C. Fellbaum and G.A. Miller. 1998.
WordNet: An Electronic Lexical Database . MITPress, Cambridge, MA, USA.[15] Kim Fridkin, Patrick J. Kenney, and Amanda Wintersieck. 2015. Liar, Liar,Pants on Fire: How Fact-Checking Influences Citizens’ Reactions to Nega-tive Advertising.
Political Communication
32, 1 (Jan 2015), 127–151. https://doi.org/10.1080/10584609.2014.914613[16] Luis Antonio Galárraga, Christina Teflioudi, Katja Hose, and Fabian Suchanek.2013. AMIE: association rule mining under incomplete evidence in ontologicalknowledge bases. In
Proceedings of the 22nd international conference on WorldWide Web - WWW ’13 . ACM Press, New York, New York, USA, 413–422. https://doi.org/10.1145/2488388.2488425[17] Aldo Gangemi, Valentina Presutti, Diego Reforgiato Recupero, Andrea GiovanniNuzzolese, Francesco Draicchio, and Misael Mongiovì. 2017. Semantic WebMachine Reading with FRED.
Semantic Web
8, 6 (2017), 873–893. https://doi.org/10.3233/SW-160240[18] Matt Gardner and Tom Mitchell. 2015. Efficient and Expressive KnowledgeBase Completion Using Subgraph Feature Extraction. In
Proceedings of the 2015Conference on Empirical Methods in Natural Language Processing . Associationfor Computational Linguistics, Lisbon, Portugal, 1488–1498. https://doi.org/10.18653/v1/D15-1173[19] Nir Grinberg, Kenneth Joseph, Lisa Friedland, Briony Swire-Thompson, andDavid Lazer. 2019. Fake news on Twitter during the 2016 US presidential election.
Science
Proceedings of the ACM SIGKDD International Conference on KnowledgeDiscovery and Data Mining
Nature human behaviour
4, 5 (2020),472–480.[22] R. V. Guha, Dan Brickley, and Steve Macbeth. 2016. Schema.Org: Evolution ofStructured Data on the Web.
Commun. ACM
59, 2 (Jan. 2016), 44–51. https://doi.org/10.1145/2844544 [23] William L. Hamilton, Rex Ying, and Jure Leskovec. 2017. Inductive RepresentationLearning on Large Graphs. In
Proceedings of the 31st International Conference on Neural Information Processing Systems (Long Beach, California, USA) (NIPS'17). Curran Associates Inc., Red Hook, NY, USA, 1025–1035. [24] Naeemul Hassan, Gensheng Zhang, Fatma Arslan, Josue Caraballo, Damian Jimenez, Siddhant Gawsane, Shohedul Hasan, Minumol Joseph, Aaditya Kulkarni, Anil Kumar Nayak, Vikas Sable, Chengkai Li, and Mark Tremayne. 2017. ClaimBuster: The First-ever End-to-end Fact-checking System.
Proceedings of the VLDBEndowment
10, 12 (2017), 1945–1948. https://doi.org/10.14778/3137765.3137815[25] Viet Phi Huynh and Paolo Papotti. 2019. A benchmark for fact checking algo-rithms built on knowledge bases. In
International Conference on Information andKnowledge Management, Proceedings
Proceedings of The 12thInternational Workshop on Semantic Evaluation . Association for ComputationalLinguistics, New Orleans, Louisiana, 741–746. https://doi.org/10.18653/v1/S18-1118[28] Ni Lao and William W. Cohen. 2010. Relational retrieval using a combination ofpath-constrained random walks.
Machine Learning
81, 1 (2010), 53–67. https://doi.org/10.1007/s10994-010-5205-8[29] Ni Lao, Tom Mitchell, and William W. Cohen. 2011. Random Walk Inferenceand Learning in A Large Scale Knowledge Base. In
Proceedings of the 2011 Con-ference on Empirical Methods in Natural Language Processing . Association forComputational Linguistics, Edinburgh, Scotland, UK., 529–539.[30] David MJ Lazer, Matthew A Baum, Yochai Benkler, Adam J Berinsky, Kelly MGreenhill, Filippo Menczer, Miriam J Metzger, Brendan Nyhan, Gordon Penny-cook, David Rothschild, et al. 2018. The science of fake news.
Science
Proceedings of the 31st International Conference on InternationalConference on Machine Learning - Volume 32 (ICML’14) . JMLR.org, Beijing, China,II–1188–II–1196.[32] Stephan Lewandowsky, Ullrich K. H. Ecker, Colleen M. Seifert, Norbert Schwarz,and John Cook. 2012. Misinformation and Its Correction: Continued Influenceand Successful Debiasing.
Psychological Science in the Public Interest
13, 3 (2012),106–131. https://doi.org/10.1177/1529100612451018[33] David Liben-Nowell and Jon Kleinberg. 2007. The Link-Prediction Problem forSocial Networks.
Journal of the American society for Information Science andTechnology
58, 7 (2007), 1019–1031.[34] Xiaomo Liu, Armineh Nourbakhsh, Quanzhi Li, Rui Fang, and Sameena Shah.2015. Real-Time Rumor Debunking on Twitter. In
Proceedings of the 24th ACMInternational on Conference on Information and Knowledge Management (Mel-bourne, Australia) (CIKM ’15) . Association for Computing Machinery, New York,NY, USA, 1867–1870. https://doi.org/10.1145/2806416.2806651[35] T. Mitchell, W. Cohen, E. Hruschka, P. Talukdar, B. Yang, J. Betteridge, A. Carlson,B. Dalvi, M. Gardner, B. Kisiel, J. Krishnamurthy, N. Lao, K. Mazaitis, T. Mohamed,N. Nakashole, E. Platanios, A. Ritter, M. Samadi, B. Settles, R. Wang, D. Wijaya,A. Gupta, X. Chen, A. Saparov, M. Greaves, and J. Welling. 2018. Never-EndingLearning.
Commun. ACM
61, 5 (April 2018), 103–115. https://doi.org/10.1145/3191513[36] Brendan Nyhan and Jason Reifler. 2015. The Effect of Fact-Checking on Elites: AField Experiment on U.S. State Legislators.
American Journal of Political Science
59, 3 (2015), 628–640. https://doi.org/10.1111/ajps.12162[37] Dave Orr. 2013. 50,000 Lessons on How to Read: a Relation Extraction Cor-pus. https://ai.googleblog.com/2013/04/50000-lessons-on-how-to-read-relation.html[38] Stefano Ortona, Venkata Vamsikrishna Meduri, and Paolo Papotti. 2018. Robustdiscovery of positive and negative rules in knowledge bases. In (Paris, France). IEEE, IEEE,Piscataway, NJ, USA, 1168–1179.[39] Heiko Paulheim and Simone Paolo Ponzetto. 2013. Extending DBpedia withWikipedia List Pages. In
Proceedings of the 2013th International Conference on NLP& DBpedia - Volume 1064 (Sydney, Australia) (NLP-DBPEDIA’13) . CEUR-WS.org,Aachen, DEU, 85–90.[40] Hannah Rashkin, Eunsol Choi, Jin Yea Jang, Svitlana Volkova, and Yejin Choi. 2017.Truth of Varying Shades: Analyzing Language in Fake News and Political Fact-Checking. In
Proceedings of the 2017 Conference on Empirical Methods in NaturalLanguage Processing . Association for Computational Linguistics, Copenhagen,Denmark, 2931–2937. https://doi.org/10.18653/v1/D17-1317[41] Bradley L. Richards and Raymond J. Mooney. 1992. Learning Relations byPathfinding. In
Proceedings of the Tenth National Conference on Artificial Intelli-gence (San Jose, California) (AAAI’92) . AAAI Press, Palo Alto, CA, USA, 50–55.
[42] Victoria L. Rubin, Yimin Chen, and Nadia K. Conroy. 2015. Deception detection for news: Three types of fakes.
Proceedings of the Association for Information Scienceand Technology
52, 1 (2015), 1–4. https://doi.org/10.1002/pra2.2015.145052010083[43] Victoria L. Rubin and Tatiana Vashchilko. 2012. Identification of Truth andDeception in Text: Application of Vector Space Model to Rhetorical StructureTheory. In
Proceedings of the Workshop on Computational Approaches to DeceptionDetection . Association for Computational Linguistics, Avignon, France, 97–106.[44] schema.org. 2020. ClaimReview schema. https://schema.org/ClaimReview[45] Chengcheng Shao, Giovanni Luca Ciampaglia, Alessandro Flammini, and Fil-ippo Menczer. 2016. Hoaxy: A Platform for Tracking Online Misinformation.In
Proceedings of the 25 th International Conference Companion on World WideWeb (Montréal, Québec, Canada) (WWW ’16 Companion) . International WorldWide Web Conferences Steering Committee, Republic and Canton of Geneva,Switzerland, 745–750. https://doi.org/10.1145/2872518.2890098[46] Chengcheng Shao, Giovanni Luca Ciampaglia, Onur Varol, Kai-Cheng Yang,Alessandro Flammini, and Filippo Menczer. 2018. The spread of low-credibilitycontent by social bots.
Nature communications
9, 1 (2018), 1–9.[47] Baoxu Shi and Tim Weninger. 2016. Discriminative predicate path mining forfact checking in knowledge graphs.
Knowledge-Based Systems
104 (Jul 2016),123–133. https://doi.org/10.1016/j.knosys.2016.04.015 arXiv:1510.05911[48] Prashant Shiralkar, Alessandro Flammini, Filippo Menczer, and Giovanni LucaCiampaglia. 2017. Finding Streams in Knowledge Graphs to Support Fact Check-ing. In (New Orleans,Louisiana, USA). IEEE, Piscataway, NJ, 859–864. https://doi.org/10.1109/ICDM.2017.105 arXiv:1708.07239 [cs.AI] Extended Version.[49] Kai Shu, Deepak Mahudeswaran, Suhang Wang, and Huan Liu. 2020. Hierarchicalpropagation networks for fake news detection: Investigation and exploitation. In
Proceedings of the International AAAI Conference on Web and Social Media , Vol. 14.AAAI, Palo Alto, CA, USA, 626–637.[50] Kai Shu, Suhang Wang, and Huan Liu. 2018. Understanding User Profiles onSocial Media for Fake News Detection. In
IEEE 1st Conference on MultimediaInformation Processing and Retrieval (MIPR 2018) . IEEE, Piscataway, NJ, USA,430–435. https://doi.org/10.1109/MIPR.2018.00092[51] Craig Silverman (Ed.). 2014.
Verification Handbook . European Journalism Center,Maastricht, the Netherlands.[52] Sameer Singh, Sebastian Riedel, Brian Martin, Jiaping Zheng, and Andrew McCal-lum. 2013. Joint Inference of Entities, Relations, and Coreference. In
Proceedingsof the 2013 Workshop on Automated Knowledge Base Construction (San Francisco,California, USA) (AKBC ’13) . Association for Computing Machinery, New York,NY, USA, 1–6. https://doi.org/10.1145/2509558.2509559[53] Richard Socher, Danqi Chen, Christopher D Manning, and Andrew Ng.2013. Reasoning With Neural Tensor Networks for Knowledge Base Com-pletion. In
Advances in Neural Information Processing Systems , C. J. C.Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger(Eds.), Vol. 26. Curran Associates, Inc., 57 Morehouse Lane, Red Hook,NY, United States, 926–934. https://proceedings.neurips.cc/paper/2013/file/b337e84de8752b27eda3a12363109e80-Paper.pdf[54] James Thorne, Andreas Vlachos, Christos Christodoulopoulos, and Arpit Mittal.2018. FEVER: a Large-scale Dataset for Fact Extraction and VERification. In
Proceedings of the 2018 Conference of the North American Chapter of the Associationfor Computational Linguistics: Human Language Technologies, Volume 1 (LongPapers) . Association for Computational Linguistics, Stroudsburg, PA, USA, 809–819. https://doi.org/10.18653/v1/N18-1074[55] Sebastian Tschiatschek, Adish Singla, Manuel Gomez Rodriguez, Arpit Merchant,and Andreas Krause. 2018. Fake News Detection in Social Networks via CrowdSignals. In
Companion Proceedings of the The Web Conference 2018 (Lyon, France) (WWW ’18) . International World Wide Web Conferences Steering Committee,Republic and Canton of Geneva, CHE, 517–524. https://doi.org/10.1145/3184558.3188722[56] Andreas Vlachos and Sebastian Riedel. 2014. Fact Checking: Task definitionand dataset construction. In
Proceedings of the ACL 2014 Workshop on LanguageTechnologies and Computational Social Science . Association for ComputationalLinguistics, Baltimore, MD, USA, 18–22. https://doi.org/10.3115/v1/W14-2508[57] Soroush Vosoughi, Deb Roy, and Sinan Aral. 2018. The spread of true and falsenews online.
Science
Proceedings of the 54th AnnualMeeting of the Association for Computational Linguistics (Volume 1: Long Pa-pers) . Association for Computational Linguistics, Berlin, Germany, 1298–1307.https://doi.org/10.18653/v1/P16-1123[59] Claire Wardle and Hossein Derakhshan. 2017.
Information disorder: Toward aninterdisciplinary framework for research and policy making . Technical Report.Council of Europe Report.[60] Jen Weedon, William Nuland, and Alex Stamos. 2017.
Information Operationsand Facebook . Technical Report. Facebook, Inc. [61] Gerhard Weikum and Martin Theobald. 2010. From Information to Knowledge:Harvesting Entities and Relationships from Web Sources. In
Proceedings of theTwenty-Ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Data-base Systems (Indianapolis, Indiana, USA) (PODS ’10) . Association for ComputingMachinery, New York, NY, USA, 65–76. https://doi.org/10.1145/1807085.1807097[62] You Wu, Pankaj K. Agarwal, Chengkai Li, Jun Yang, and Cong Yu. 2014. TowardComputational Fact-Checking.
Proc. VLDB Endow.
7, 7 (March 2014), 589–600.https://doi.org/10.14778/2732286.2732295[63] Minguang Xiao and Cong Liu. 2016. Semantic Relation Classification via Hierar-chical Recurrent Neural Network with Attention. In
Proceedings of COLING 2016,the 26th International Conference on Computational Linguistics: Technical Papers .The COLING 2016 Organizing Committee, Osaka, Japan, 1254–1263.[64] Peng Xu and Denilson Barbosa. 2019. Connecting Language and Knowledge withHeterogeneous Representations for Neural Relation Extraction. In
Proceedingsof the 2019 Conference of the North American Chapter of the Association for Com-putational Linguistics: Human Language Technologies, Volume 1 (Long and ShortPapers) . Association for Computational Linguistics, Minneapolis, Minnesota,3201–3206. https://doi.org/10.18653/v1/N19-1323[65] Ikuya Yamada, Hiroyuki Shindo, Hideaki Takeda, and Yoshiyasu Takefuji. 2016.Joint Learning of the Embedding of Words and Entities for Named Entity Dis-ambiguation. In
Proceedings of The 20th SIGNLL Conference on ComputationalNatural Language Learning . Association for Computational Linguistics, 209 N.Eighth Street, Stroudsburg PA 18360, USA, 250–259.[66] Ran Yu, Ujwal Gadiraju, Besnik Fetahu, Oliver Lehmberg, Dominique Ritze, andStefan DIetze. 2018. KnowMore - Knowledge base augmentation with structuredweb markup. , 159–180 pages. https://doi.org/10.3233/SW-180304[67] Yuhao Zhang, Victor Zhong, Danqi Chen, Gabor Angeli, and Christopher D.Manning. 2017. Position-aware Attention and Supervised Data Improve SlotFilling. In
Proceedings of the 2017 Conference on Empirical Methods in NaturalLanguage Processing . Association for Computational Linguistics, Copenhagen,Denmark, 35–45. https://doi.org/10.18653/v1/D17-1004[68] Zhao, Zilong, Zhao, Jichang, Sano, Yukie, Levy, Orr, Takayasu, Hideki, Takayasu,Misako, Li, Daqing, Wu, Junjie, and Havlin, Shlomo. 2020. Fake news propagatesdifferently from real news even at early stages of spreading.
EPJ Data Sci.
9, 1 (2020), 7. https://doi.org/10.1140/epjds/s13688-020-00224-z

SELECTED CLAIMREVIEW CLAIMS
Table 7: Selected ClaimReview claims, the relation they contain, and the relation predicted by the model. The text in bold indicates the entities participating in the relation.

ID | Claim | Actual | Predicted | Rating | Claim ≡ Triple
1 | Malaysian-born Senator Penny Wong ineligible for Australian parliament | POB | DOB | False |
2 | Donald Trump says President Obama's grandmother in Kenya said he was born in Kenya and she was there and witnessed the birth. | POB | institution | False | ✓
3 | Fred Trump was born in a very wonderful place in Germany. | POB | POB | False | ✓
4 | Barack Obama was born in the United States. | POB | POB | True | ✓
5 | Barron Trump was born in March 2006 and Melania wasn't a legal citizen until July 2006. So under this executive order, his own son wouldn't be an American citizen. | DOB | POB | False |
6 | Isabelle Duterte was born on January 26, 2002, which makes her only 15 years old today. | DOB | DOB | False |
7 | Tej Pratap Yadav receives a doctorate degree from Takshsila University in Bihar | education | education | False | ✓
8 | Smriti Irani has a MA degree. | education | institution | False | ✓
9 | Melania Trump lied under oath in 2013 about graduating from college with a bachelor's degree in architecture. | education | institution | False |
10 | Did Michelle Obama recently earn a doctorate degree in law? | education | education | False | ✓
11 | Pravin Gordhan does not have a degree. | education | education | False | ✓
12 | Alexandria Ocasio-Cortez's economics degree recalled. | education | institution | False | ✓
13 | Ilocos Norte Governor Imee Marcos claimed on January 16 that she earned a degree from Princeton University. | education | education | False |
14 | Ilocos Norte Governor Imee Marcos claimed on January 16 that she earned a degree from Princeton University. | institution | institution | False | ✓
15 | Tej Pratap Yadav receives a doctorate degree from Takshsila University in Bihar. | institution | education | False |
16 | Patrick Murphy embellished, according to reports, his University of Miami academic achievement. | institution | institution | True |
17 | Mahmoud Abbas, Ali Khamenei, and Vladimir Putin met each other in the class of 1968 at Patrice Lumumba University in Moscow | institution | institution | False |
18 | Mahmoud Abbas, Ali Khamenei, and Vladimir Putin met each other in the class of 1968 at Patrice Lumumba University in Moscow | institution | institution | False |
19 | Mahmoud Abbas, Ali Khamenei, and Vladimir Putin met each other in the class of 1968 at Patrice Lumumba University in Moscow | institution | institution | False |
20 | Maria Butina is a human rights activist, a student of the American University, and the most relevant is that she is a person who did not work (collaborate) with the Russian state bodies. | institution | institution | False |
21 | Ilocos Norte Governor Imee Marcos graduated cum laude from the University of the Philippines (UP) College of Law. | institution | institution | False |
22 | David Hogg graduated from Redondo Shores High School in 2015. | institution | institution | False | ✓
23 | Sadhvi Pragya Singh Thakur said Manohar Parrikar died of cancer because he allowed the consumption of beef in Goa. | POD | POD | False |
24 | Fox star Tucker Carlson in critical condition (then died) after head on collision driving home in Washington D.C. | POD | POD | False | ✓
25 | Nasser Al Kharafi died in Kuwait. | POD | POD | False | ✓
26 | DCP Amit Sharma passed away in Delhi riots | POD | institution | False | ✓
27 | It is being claimed that Jason Statham was murdered at his home in New York by assailants who broke into his mansion. | POD | POD | False |
28 | Actor Robert Downey Jr. died in a car crash stunt in Hollywood on July 8. | POD | POD | False |