Adversarial Active Learning based Heterogeneous Graph Neural Network for Fake News Detection
Yuxiang Ren∗, Bo Wang†, Jiawei Zhang∗ and Yi Chang†
∗IFM Lab, Department of Computer Science, Florida State University, FL, USA
†Artificial Intelligence, Jilin University, Jilin, China
Email: [email protected], [email protected], [email protected], [email protected]
Abstract—The explosive growth of fake news, along with its destructive effects on politics, the economy, and public safety, has increased the demand for fake news detection. Fake news on social media does not exist independently in the form of an article: many other entities, such as news creators and news subjects, exist on social media and have relationships with news articles. These different entities and relationships can be modeled as a heterogeneous information network (HIN). In this paper, we attempt to solve the fake news detection problem with the support of a news-oriented HIN. We propose a novel fake news detection framework, namely the Adversarial Active Learning-based Heterogeneous Graph Neural Network (AA-HGNN), which employs a novel hierarchical attention mechanism to perform node representation learning in the HIN. AA-HGNN utilizes an active learning framework to enhance learning performance, especially when facing a paucity of labeled data. An adversarial selector is trained to query high-value candidates for the active learning framework. When the adversarial active learning is completed, AA-HGNN detects fake news by classifying news article nodes. Experiments with two real-world fake news datasets show that, benefiting from the adversarial active learning, our model can outperform text-based models and other graph-based models while using less labeled data. As a generalizable model, AA-HGNN can also be widely used in other node classification-related applications on heterogeneous graphs.
Index Terms—Heterogeneous Network, Graph Neural Network, Fake News Detection, Data Mining
I. INTRODUCTION
With the widespread use of social networks, fake news has become a serious social problem that cannot be ignored. In politics, fake news biases people's judgments about major issues like Brexit [4] and the 2016 US presidential election [2]. A large amount of fake news was spread on various social platforms during the 2016 US presidential election; on Facebook, for example, 115 pro-Trump fake stories were observed to be shared a total of 30 million times, and 41 pro-Clinton fake stories a total of 7.6 million times [2]. In the economic field, the extreme sensitivity of the capital market has caused it to suffer from fake news. For instance, $130 billion in stock value was wiped out after a piece of fake news claimed that then-president Barack Obama was injured in an explosion [28]. In public safety affairs, people's responses to emergencies, from natural disasters to terrorist attacks, have been disrupted by the spread of false news online [23], [14], [45]. In view of this, the detection and mitigation of fake news is imperative. However, detecting fake news on social media is particularly challenging. First, fake news is written and published
Fig. 1. An illustrative example of a heterogeneous information network based on PolitiFact data (News-HIN). (a) A News-HIN consisting of three types of nodes and two types of links. (b) Three types of nodes (i.e., Creator, News article, Subject). (c) Network schema of the News-HIN.

intentionally, so the content is carefully camouflaged. Fake content may account for only 1% of a news article, but it is sufficient for the purpose. This makes it difficult to detect fake news based on news articles alone. Secondly, fake news spreads much faster than real news. According to the research in [45], many more people retweeted falsehoods than the truth on Twitter. Therefore, the detection of fake news has high requirements for timeliness: once a large number of users have received false information, the destructive effects have already been caused. What is more, it is expensive and time-consuming for experts to check and label the credibility of news articles manually. Fake news detection methods requiring a large number of labels are thus not practical in the real world.

On social media, focusing on news articles alone is not comprehensive, because news does not exist independently in the form of articles. In fact, there are many entities related to news articles, such as news creators and news subjects. These different types of entities and their relationships provide a more comprehensive perspective for identifying news articles. A heterogeneous information network (HIN for short) [42], [39] can be utilized to represent these entities and relationships. An illustration of such a news-oriented heterogeneous information network (News-HIN) based on
PolitiFact data is presented in Figure 1. In addition to the information provided in the news article, we are able to collect profile information of news creators from social networks and other supplementary knowledge libraries. For the news subjects, background and auxiliary knowledge can be collected to support fake news detection. With the support of a News-HIN, the fake news detection task can be formulated as a node classification problem. In this way, more sufficient information and knowledge can be used to check the credibility of news articles.

The main challenges of the fake news detection problem in a News-HIN lie in the following points:
• Paucity of training data: Fake news appears and spreads very quickly, and the real-time nature of news makes outdated labels worthless. Therefore, fake news detection often faces the challenge of lacking valuable training data. This requires models that can effectively detect potential fake news with the support of a small amount of training data.
• Heterogeneity: Multiple types of heterogeneous information exist in a News-HIN, which can provide key signals for identifying fake news article nodes. At the same time, learning effective node representations in a News-HIN that consider both structural and type information is non-trivial.
• Generalizability: In order to ensure the applicability of the proposed model to diverse and possibly changing News-HINs, we need to provide a general detection model that can handle News-HINs containing any types of nodes and different schemas.

To solve the aforementioned challenges, we propose a novel Adversarial Active Learning-based Heterogeneous Graph Neural Network (AA-HGNN) to detect fake news in the News-HIN. For the first challenge, the proposed framework is built on an active learning framework, which includes a classifier and a selector. By continuously querying high-value candidate nodes for classifier training and tuning, excellent performance can be achieved with a small amount of labeled data. For the second challenge, a heterogeneous graph neural network with a novel Hierarchical Graph Attention (HGAT) mechanism is utilized in both the classifier and the selector. Based on the two-level attention mechanism (node-level & schema-level), HGAT can obtain the optimal combination of different types of neighbors in a hierarchical manner. The HGAT-based classifier is responsible for conducting classification on news article nodes. The HGAT-based selector is used to evaluate the predicted labels from the classifier for high-value selection. The selected candidate nodes become part of the training set after expert labeling. The classifier and the selector are trained based on adversarial learning: as the classifier improves the quality of the predicted labels, the evaluation ability of the selector improves, so that it continuously selects better candidates. The overall architecture of the proposed framework is shown in Figure 2. AA-HGNN places no limitation on the structure of News-HINs, so it has good generalizability and solves the third challenge well.
Fig. 2. Overall Framework.
We focus on applying AA-HGNN to the fake news detection domain in this paper, but AA-HGNN is also applicable to the more general problem of node classification on heterogeneous graphs.

The contributions of our work are summarized as follows:
• We are the first to apply adversarial active learning to fake news detection, which can achieve excellent detection performance with much less training data. This is of great significance for fake news detection, because the urgent timeliness of fake news detection makes sufficient training data impossible.
• We propose a novel adversarial active learning-based framework, AA-HGNN, which can handle the heterogeneity of News-HINs effectively through a two-level attention mechanism. AA-HGNN is applicable to HINs with different schemas.
• We conduct extensive experiments on two real-world datasets to demonstrate the effectiveness of AA-HGNN. The results show the superiority of AA-HGNN compared with state-of-the-art models in detecting fake news, especially when facing a paucity of training data.

II. RELATED WORK
A. Fake News Detection
As fake news detection is an emerging topic, a number of research works have been proposed. Content-based fake news detection relies primarily on deep mining of the news content. [6], [38] extract knowledge, a set of (Subject, Predicate, Object) triples [10], from the news content and assess the authenticity of news by comparing them with real knowledge. However, the timeliness and integrity of the knowledge map still limit their application [52]. Some methods extract the writing style and use it to measure the credibility of news. [34] employs rhetorical structure theory to evaluate authenticity at the discourse level. [25], [26] capture the sentiment and readability of the news content to assess the extent of falsehood. But these methods based on writing style can hardly work in the face of carefully camouflaged fake news.

Some methods use not only the news content but also other information related to the news. Guo et al. [13] utilize LSTM and a hierarchical attention mechanism to detect rumors, making use of social information through a proposed social feature. Shu et al. [7] study the explainable detection of fake news with the support of both news contents and user comments. Jin et al. evaluate news credibility within a graph optimization framework [18]. Methods based on matrix factorization [40], tensor factorization [15], and recurrent neural networks (RNNs) [35], [51], [32] have been proposed to work on news-oriented networks.

In this paper, we model the news content and related entities as a News-HIN. Both the structural information and the node content of the News-HIN are utilized by AA-HGNN to identify fake news.
B. Graph Neural Network
Graph Neural Networks (GNNs) learn new feature vectors for nodes through a recursive neighborhood aggregation scheme [12], [36], [49]. A propagation model incorporating gated recurrent units to propagate information across all nodes is proposed in [22]. Recently, there has been a surge of work generalizing the convolution operation to graph-structured data. Joan Bruna et al. [5] extend convolution to general graphs via a novel Fourier transformation on graphs. Kipf et al. [20] propose the Graph Convolutional Network (GCN). Hamilton et al. [16] introduce GraphSAGE, which generates embeddings by directly aggregating features from a node's local neighborhood. The Graph Attention Network (GAT) [44] first imports the attention mechanism into graphs; it learns the importance of neighbors and aggregates them to learn node representations. However, the above graph neural networks are designed for homogeneous graphs. Wang et al. [48], [31] consider the attention mechanism in heterogeneous graph learning through the model HAN, where information from multiple meta-path-defined connections can be learned effectively. However, meta-paths, as handcrafted features, limit HAN. In addition, HAN only considers the different types of connections between target nodes through meta-paths, and ignores the node contents carried by different types of nodes.
C. Adversarial and Active Learning
The principle of adversarial learning was introduced in generative adversarial networks (GANs) by Goodfellow et al. [11]. The adversarial learning principle has achieved excellent performance on many different topics, such as text classification [21], information retrieval [46], and network embedding [17], [8]. The adversarial learning method on heterogeneous network embeddings [17] can be used to learn a more efficient representation of news nodes in a News-HIN. However, in order to detect fake news, HeGAN [17] still requires a large amount of labeled data to train a classifier. Active learning is an effective way to train a model with less labeled data, because not all training samples are equally important [1]. The number of labels needed for active learning can be logarithmic in the usual sample complexity of passive learning [9]. Active learning has also proved its value and robustness on different topics, including recommendation systems [33], social network alignment [30], [29], image classification [47] and graph matching [37].

In this paper, AA-HGNN combines adversarial learning and active learning. The selector, trained in an adversarial manner, can continuously select high-value candidates for active learning. The high-value candidates further improve the performance of the classifier.

III. CONCEPT AND PROBLEM DEFINITION
A. Terminology Definition
In order to make related concepts easier to understand, we use the PolitiFact data as a running example. The PolitiFact data contain News articles, Subjects, and Creators, which can be modeled as three types of nodes in a heterogeneous network, with different types of links constructed from the connections among them. We define News Oriented Heterogeneous Information Networks (News-HIN) formally as follows:

DEFINITION 1 (News Oriented Heterogeneous Information Network (News-HIN)): A news oriented heterogeneous information network (News-HIN) can be defined as $G = (V, E)$, where the node set $V = C \cup N \cup S$. Here $C$, $N$ and $S$ represent Creators, News articles and Subjects respectively; we define the different types of nodes in detail below. The link set $E = E_{c,n} \cup E_{n,s}$ involves the "Write" links between creators and news articles, and the "Belongs to" links between news articles and subjects.

News articles refer to the news content posted on social media or public platforms. We can define news articles formally as:

DEFINITION 2 (News Articles): The set of news articles can be represented as $N = \{n_1, n_2, \cdots, n_m\}$. Each news article $n_i \in N$ contains its textual contents. The credibility label of $n_i$ takes its value from the label set $Y = \{Fake, Real\}$. In this paper, the original label set contains 6 different class labels (True, Mostly True, Half True, Mostly False, False, Pants on Fire). We group the labels Pants on Fire, False, and Mostly False as fake news, and group True, Mostly True, and Half True as real news.

Subjects denote the central ideas of news articles, which normally are the main objectives of writing the news articles.

DEFINITION 3 (Subjects): The set of subjects can be denoted as $S = \{s_1, s_2, \cdots, s_k\}$. Each subject $s_i \in S$ contains a textual description.

Creators denote the people who write news articles. We can also define this concept formally.

DEFINITION 4 (Creators): The set of creators can be represented as $C = \{c_1, c_2, \cdots, c_n\}$. Each creator $c_i \in C$ contains profile information. In the PolitiFact dataset, a creator's profile contains their title, political party membership, and geographical residential location. The profile information can be described by a sequence of words.

In order to better understand the News-HIN and utilize the type information, it is necessary to define a schema-level description. The schema of a News-HIN serves for learning the importance of nodes and links of different types.

DEFINITION 5 (News-HIN Schema): Formally, the schema of a given News-HIN $G = (V, E)$ can be represented as $S_G = (V_T, E_T)$, where $V_T$ and $E_T$ denote the set of node types and link types in the network respectively. Here, $V_T = \{\phi_n, \phi_c, \phi_s\}$ and $E_T = \{$Write, Belongs to$\}$.

An illustration of the News-HIN schema based on the PolitiFact data is shown in Figure 1(c).
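To make the definitions concrete, the sketch below encodes a News-HIN with plain Python containers. The class and field names are illustrative assumptions, not part of the paper; only the node types, link types, and label grouping come from the definitions above.

```python
from dataclasses import dataclass, field

# A minimal sketch of the News-HIN G = (V, E) from Definitions 1-5;
# the container layout is an assumption made for illustration.
@dataclass
class NewsHIN:
    creators: dict = field(default_factory=dict)   # c_i -> profile text
    articles: dict = field(default_factory=dict)   # n_i -> article text
    subjects: dict = field(default_factory=dict)   # s_i -> description
    write: set = field(default_factory=set)        # (c_i, n_i) "Write" links
    belongs_to: set = field(default_factory=set)   # (n_i, s_i) "Belongs to" links

FAKE = {"Pants on Fire", "False", "Mostly False"}  # grouped as Fake
REAL = {"True", "Mostly True", "Half True"}        # grouped as Real

def credibility(label: str) -> str:
    """Map a PolitiFact fact-checking result to Y = {Fake, Real}."""
    return "Fake" if label in FAKE else "Real"
```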
B. Problem Formulation

Given a News-HIN $G = (V, E)$, the fake news detection problem aims at learning a classification function $f: N \rightarrow Y$ that classifies the news article nodes in the set $N$ into the correct class with a credibility label in $Y$. The news article nodes with labels are grouped into a labeled set $L$, and the remaining news article nodes form the unlabeled set $U = N \setminus L$. Under the active learning setting, we are also allowed to query for labels of news article nodes in $U$, up to an upper-limit query budget $b$. We further want to propose a mechanism that achieves an optimal query set $U_q$ to improve the classification function $f: N \rightarrow Y$. To resolve the above fake news detection problem, we introduce the proposed adversarial active learning-based heterogeneous graph neural network AA-HGNN in Section IV.

IV. PROPOSED METHOD
In this section, we present the proposed Adversarial Active Learning-based Heterogeneous Graph Neural Network (AA-HGNN) for detecting fake news. As shown in Figure 2, AA-HGNN consists of two major components: (1) the HGAT-based classifier, and (2) the HGAT-based selector. We begin with an overview of the model, followed by a detailed description of the hierarchical graph attention neural network (HGAT). Then we illustrate the HGAT-based classifier and the HGAT-based selector respectively. At last, we elaborate on the optimization of AA-HGNN.

A. Model Overview
The architecture of AA-HGNN is shown in Figure 2. The News-HIN $G$ is the input of the HGAT-based classifier. $h_L$ and $h_U$ denote the initial features of a labeled node and an unlabeled node respectively. The HGAT-based classifier is trained with both labeled and unlabeled data to predict labels $\{\hat{y}\}$ for unlabeled news article nodes. The HGAT-based selector evaluates the quality of the predicted labels and selects high-value candidates from them based on a query strategy. We take the pairs of labeled nodes and their ground-truth labels $\{y\}$ as positive samples, and the pairs of unlabeled nodes and their predicted labels $\{\hat{y}\}$ as negative samples. A portion of the positive and negative pairs are sampled to train the HGAT-based selector. After being trained, the selector outputs the confidence $P$ of the pairs in the test set. Based on this confidence, the proposed selection strategy selects a set of $k$ high-value unlabeled nodes as candidates. These candidates are labeled by experts; in our experiments, they are moved to the training set before the next round of optimization. A query budget $b$ is pre-specified for AA-HGNN. When the query budget $b$ is exceeded, the adversarial active learning stops.

Since the Hierarchical Graph Attention Neural Network (HGAT) is the basis of the classifier and the selector, and is the key to handling the heterogeneity, we first introduce HGAT in detail in the next section.

Fig. 3. Hierarchical Graph Attention Neural Network.
B. Hierarchical Graph Attention Neural Network (HGAT)
The novel HGAT employs a two-level attention mechanism consisting of node-level attention and schema-level attention. The structure of HGAT is shown in Figure 3. Node-level attention is responsible for learning the weights of neighbors belonging to the same type and aggregating them to get a type-specific neighbor representation. Schema-level attention enables HGAT to learn the information of node types and obtain the optimal weighted combination of the type-specific neighbor representations. Through the two-level attention mechanism, the representations of news article nodes contain both structural and node content information.
1) Node-level attention:
The node-level attention learns the importance of neighbors belonging to the same type, separately for each news article node $n_i \in N$, and then aggregates the representations of the same-type neighbors to form an integrated representation, which we define as a schema node. The inputs of the node-level attention layer are the initial node feature vectors $\{h\}$. Because multiple types of nodes exist in the News-HIN, the initial feature vectors belong to feature spaces with different dimensions. In order to enable the attention mechanism to output comparable and meaningful weights between different types of nodes, we first utilize a type-specific transformation matrix to project features with different dimensions into the same feature space. Take the news article node $n_i \in N$ as an example. The transformation matrix for type $\phi_n$ is $M_{\phi_n} \in \mathbb{R}^{F \times F_{\phi_n}}$, where $F_{\phi_n}$ is the dimension of the initial feature $h_{n_i} \in \mathbb{R}^{F_{\phi_n}}$ of the news article node $n_i$, and $F$ is the dimension of the feature space mapped to. The projection process is as follows:

$$h'_{n_i} = M_{\phi_n} \cdot h_{n_i} \qquad (1)$$

Fig. 4. Explanation of the aggregating process at the node level ((a) node-level aggregation) and the schema level ((b) schema-level aggregation).
Here $h'_{n_i}$ is the projected feature of node $n_i$, and $F$ is the same for all type-specific transformation matrices. Through the type-specific projection operation, the feature spaces of nodes with different types are unified, so that the self-attention mechanism can learn weights among various kinds of nodes.

For fake news detection, the target node is the news article node $n_i \in N$, whose neighbors belong to $N \cup S \cup C$. Note that we also regard the target node itself as a neighbor node, to cooperate with the self-attention mechanism. Let $T \in \{N, S, C\}$, where nodes in $T$ have the type $\phi_t$. For $n_i$'s neighbor nodes in $T$, the node-level attention learns the importance $e^{\phi_t}_{ij}$, which indicates how important node $t_j \in T$ is for $n_i$. The importance of the node pair $(n_i, t_j)$ can be formulated as:

$$e^{\phi_t}_{ij} = att(h'_{n_i}, h'_{t_j}; \phi_t) \qquad (2)$$

Here, the node-level attention $att$ denotes the same deep neural network as in [44], and $att$ is shared by all neighbor nodes of the same type $\phi_t$. The masked attention captures the network structure information: only for nodes $t_j \in neighbor_{n_i}$ (the neighbors of node $n_i$) is the importance calculated and recorded as $e^{\phi_t}_{ij}$; otherwise, the attention weight is 0. We normalize these values to get the weight coefficients $\alpha^{\phi_t}_{ij}$ via the softmax function:

$$\alpha^{\phi_t}_{ij} = \mathrm{softmax}_j(e^{\phi_t}_{ij}) = \frac{\exp(e^{\phi_t}_{ij})}{\sum_{t_k \in neighbor_{n_i}} \exp(e^{\phi_t}_{ik})} \qquad (3)$$

Then, the schema node $T_{n_i}$ can be aggregated from the neighbors' projected features with the corresponding weights:

$$T_{n_i} = \sigma\Big(\sum_{t_j \in neighbor_{n_i}} \alpha^{\phi_t}_{ij} \cdot h'_{t_j}\Big) \qquad (4)$$

Similar to the Graph Attention Network (GAT) [44], a multi-head attention mechanism can be used to stabilize the learning process of the self-attention in node-level attention. In detail, $K$ independent node-level attentions execute the transformation of Equation (4), and the features obtained by the $K$ heads are concatenated, resulting in the output representation of the schema node:

$$T_{n_i} = \Big\Vert_{k=1}^{K} \sigma\Big(\sum_{t_j \in neighbor_{n_i}} \alpha^{\phi_t}_{ij} \cdot h'_{t_j}\Big) \qquad (5)$$
where $\Vert$ represents concatenation. In our problem, every target node $n_i$ has 3 schema nodes corresponding to the 3 different types of neighbors (including itself), based on Definition 5. They are denoted as $N_{n_i}$, $C_{n_i}$, $S_{n_i}$.

Fig. 5. HGAT-based Classifier and HGAT-based Selector.
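As a concrete illustration of Eqs. (1)-(4), the sketch below implements one head of node-level attention for a single neighbor type in plain Python. The GAT-style scoring (a LeakyReLU over a learned vector applied to the concatenated projected pair) is an assumption carried over from [44], which the paper cites for $att$; all parameter names are illustrative.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def node_level_attention(h_ni, neighbors, M_n, M_phi, a, sigma=np.tanh):
    """One attention head for one neighbor type phi_t (Eqs. 1-4).

    h_ni      : (F_n,)     initial feature of the target news article node n_i
    neighbors : (m, F_phi) initial features of n_i's neighbors of type phi_t
    M_n, M_phi: type-specific projection matrices into the shared space R^F
    a         : (2F,)      attention vector; GAT-style scoring is assumed [44]
    """
    h_i = M_n @ h_ni                    # Eq. (1): project the target node
    h_j = neighbors @ M_phi.T           # Eq. (1): project each neighbor
    pairs = np.concatenate([np.repeat(h_i[None, :], len(h_j), axis=0), h_j], axis=1)
    e = leaky_relu(pairs @ a)           # Eq. (2): importance scores e_ij
    alpha = softmax(e)                  # Eq. (3): softmax over the neighbors
    return sigma(alpha @ h_j)           # Eq. (4): the schema node T_ni
```

Running $K$ such heads with independent parameters and concatenating their outputs, e.g. `np.concatenate([node_level_attention(...) for _ in range(K)])`, yields the multi-head schema node of Eq. (5).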
2) Schema-level attention:
Through the node-level attention, we fuse information from neighbor nodes of the same type into the representation of a schema node. Now, HGAT needs to learn the representation of a news article node from all of its schema nodes. Different schema nodes contain type-specific information, which requires us to learn the importance of the different node types. Hence, schema-level attention is proposed to learn the importance of the different schema nodes, and the learned coefficients are finally used for a weighted combination.

In order to obtain sufficient expressive power to calculate the attention weights between schema nodes, a learnable linear transformation is applied to the schema nodes. The linear transformation is parametrized by a weight matrix $W \in \mathbb{R}^{F' \times KF}$, where $K$ is the number of heads in node-level attention. The schema-level attention $schema$ is a single-layer feedforward neural network with a Sigmoid activation and output dimension $F'$. For the schema node $T_{n_i}$, its importance is denoted as $w^{\phi_t}_i$:

$$w^{\phi_t}_i = schema(W \cdot T_{n_i}, W \cdot N_{n_i}) \qquad (6)$$

We normalize the importance of each schema node through a softmax function. The coefficients of the final fusion are then denoted as $\beta^{\phi_t}_i$, calculated as follows:

$$\beta^{\phi_t}_i = \mathrm{softmax}_t(w^{\phi_t}_i) = \frac{\exp(w^{\phi_t}_i)}{\sum_{\phi \in V_T} \exp(w^{\phi}_i)} \qquad (7)$$

Based on the learned coefficients, we can fuse all schema nodes to get the final representation $r_{n_i} \in \mathbb{R}^{F'}$ of the target node $n_i$:

$$r_{n_i} = \sum_{\phi_t \in V_T} \beta^{\phi_t}_i \cdot T_{n_i} \qquad (8)$$

We also illustrate the two-level aggregating process in Figure 4 for reference.
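The following sketch mirrors Eqs. (6)-(8) for one target node. Since the paper only names the layer type of $schema$, the concrete scoring (a sigmoid of the dot product between the transformed schema node and the transformed news schema node) is an assumption, as is fusing the transformed vectors $W \cdot T$ so that $r_{n_i}$ lands in $\mathbb{R}^{F'}$.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def schema_level_fusion(schema_nodes, W, news_type="N"):
    """Fuse the type-specific schema nodes into r_ni (Eqs. 6-8).

    schema_nodes: dict type -> schema node of size K*F,
                  e.g. {"N": N_ni, "C": C_ni, "S": S_ni}
    W           : (F_prime, K*F) shared linear transformation
    """
    ref = W @ schema_nodes[news_type]                # W . N_ni in Eq. (6)
    types = list(schema_nodes)
    # Eq. (6): sigmoid scoring against the news schema node (an assumption)
    w = np.array([sigmoid((W @ schema_nodes[t]) @ ref) for t in types])
    beta = softmax(w)                                # Eq. (7)
    # Eq. (8): weighted combination over all node types in V_T
    return sum(b * (W @ schema_nodes[t]) for b, t in zip(beta, types))
```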
C. HGAT-based Classifier

As shown on the left side of Figure 5, HGAT and a classification layer constitute the HGAT-based classifier. The input of the HGAT-based classifier is the same as that of HGAT, i.e., the initial feature vectors of the nodes. The classification layer outputs the predicted labels $\{\hat{y}\}$ of unlabeled news article nodes. In our experiments, a logistic regression layer works as the classification layer.

For the fake news detection task, the optimization objective of the HGAT-based classifier is cross-entropy loss minimization, and the classifier can be optimized in an end-to-end manner by backpropagation. We define the set of labeled news article nodes as $N_L$ and the set of unlabeled news article nodes as $N_U$; the cross-entropy loss can then be written as:

$$Loss_{classifier} = -\sum_{n_i \in N_L} \big(y_{n_i} \log(p_{n_i}) + (1 - y_{n_i}) \log(1 - p_{n_i})\big) \qquad (9)$$

Here, $y_{n_i}$ is a binary indicator (0 or 1) of whether the binary class label is the correct classification for the news article node representation $r_{n_i}$, and $p_{n_i}$ is the predicted probability for the labeled news article node $n_i$.

When the optimization is completed, the predicted probabilities of the unlabeled news article nodes in $N_U$ are rounded and cast into predicted labels $\{\hat{y}\}$. The predicted labels $\{\hat{y}\}$ are then evaluated by the HGAT-based selector, described in the next section.

D. HGAT-based Selector

The structure of the HGAT-based selector is shown on the right side of Figure 5. The inputs of the HGAT layers are the initial feature vectors $\{h\}$. Based on the learned representation $r_{n_i}$, we concatenate $r_{n_i}$ with the predicted label $\hat{y}$ (or the ground-truth label $y$ for a labeled node). We denote this concatenated vector as $z_{n_i} \in \mathbb{R}^{(F'+1)}$:

$$z_{n_i} = [r_{n_i}, \hat{y}] \qquad (10)$$

The purpose of the HGAT-based selector is to evaluate the probability that $z_{n_i}$ comes from the set of labeled news article nodes $N_L$. A higher probability indicates that a news article node matches its predicted label better; conversely, if a node does not match its predicted label, the predicted label is likely to be wrong. The output layer is responsible for predicting the probability $P(\hat{y}; r_{n_i})$; here, we use a logistic regression layer as the output layer. We sample $z_{n_j}, n_j \in N_L$ as positive samples, and the same number of $z_{n_k}, n_k \in N_U$ as negative samples. These positive and negative samples constitute the training set for the HGAT-based selector. The loss function used by the HGAT-based selector is a cross-entropy loss:

$$Loss_{selector} = -\sum \big(y \log(P) + (1 - y) \log(1 - P)\big) \qquad (11)$$

Here $y \in \{0, 1\}$ denotes the negative/positive label of a concatenated vector in the training set, and $P$ is the predicted probability of the label being positive. This loss function can be optimized by backpropagation.

The remaining concatenated vectors of unlabeled news article nodes form the testing set. After training, the HGAT-based selector outputs the probability $P$ for the testing samples. Based on this probability, we propose a query strategy to select high-value candidates for active learning. As mentioned before, a lower probability $P$ indicates that the unlabeled news article node and its predicted label do not match, and hence that the predicted label is likely wrong. Obviously, if a queried news article node could not be classified correctly by the HGAT-based classifier, it is more "informative" than the nodes that have been correctly classified. Moreover, after expert labeling, it becomes part of the training set in the next round of training, thereby helping correct similarly misclassified nodes in the test set. The query strategy is therefore:

DEFINITION 6 (Query Strategy): All samples in the test set are sorted in ascending order according to the predicted probability $P$, and the top $k$ candidates are added to $U_q$. Here, $k$ denotes the query batch size.
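A minimal sketch of the selector's input construction (Eq. 10) and the query strategy of Definition 6; the function names are illustrative.

```python
import numpy as np

def build_selector_input(r_ni, y_hat):
    """Eq. (10): concatenate the node representation with its label."""
    return np.concatenate([r_ni, [float(y_hat)]])   # z_ni in R^(F'+1)

def query_candidates(node_ids, P, k):
    """Definition 6: sort the test samples by the selector's confidence P
    in ascending order and return the k least-confident nodes, i.e., the
    nodes whose predicted labels are most likely wrong."""
    order = np.argsort(P)                           # lowest P first
    return [node_ids[i] for i in order[:k]]
```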
E. Adversarial Active Optimization

In AA-HGNN, the HGAT-based classifier and the HGAT-based selector cooperate in an adversarial active manner. We adopt iterative optimization to train these components: in each iteration, the HGAT-based classifier and the HGAT-based selector are trained alternately. Specifically, we first train the HGAT-based classifier to output the predicted labels. Then the HGAT-based selector is trained with the predicted labels from the classifier. Based on the optimized selector, $k$ candidates are queried in each iteration and added to $U_q$, to be used as training data in the next iteration. Each time $k$ candidates are obtained, the classification performance of the HGAT-based classifier can be improved in the next iteration, and consequently the credibility of the predicted labels increases. Better predicted labels further improve the evaluation performance of the HGAT-based selector. We repeat this iteration until the size of $U_q$ exceeds the query budget $b$. The adversarial active optimization of AA-HGNN is described in Algorithm 1.

Algorithm 1: Adversarial Active Optimization of AA-HGNN
Input: The News-HIN $G = (V, E)$; the set of labeled news article nodes $N_L$; the set of unlabeled news article nodes $N_U$; the query budget $b$; the query batch size $k$; the number of samples $m$.
$U_q = \emptyset$;
while $|U_q| < b$ do
    // Optimization of the HGAT-based classifier
    Train the HGAT-based classifier on $N_L$ via Eq. (9);
    Predict the labels of the nodes in $N_U$ and update the set of predicted labels $\{\hat{y}\}$;
    // Optimization of the HGAT-based selector
    Sample $m$ nodes from $N_L$ to construct positive samples via Eq. (10), i.e., $z_{n_j}, n_j \in N_L$;
    Sample $m$ nodes from $N_U$ to construct negative samples via Eq. (10), i.e., $z_{n_k}, n_k \in N_U$;
    Train the HGAT-based selector on the positive and negative samples;
    Predict the probability $P$ via Eq. (11);
    Query $k$ candidates based on Definition 6; $U_q = U_q \cup \{candidates\}$;
    Label the $k$ candidates by experts;
    $N_L = N_L \cup \{candidates\}$; $N_U = N_U \setminus \{candidates\}$;
return the set of predicted labels $\{\hat{y}\}$
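For reference, a compact Python driver for Algorithm 1 is sketched below. The `classifier`/`selector` objects, their methods, and `expert_label` are hypothetical interfaces standing in for the HGAT-based components; only the control flow comes from Algorithm 1.

```python
import random

def adversarial_active_learning(graph, labeled, unlabeled,
                                classifier, selector, expert_label,
                                budget_b, batch_k, samples_m):
    """labeled: dict node -> ground-truth label; unlabeled: set of nodes."""
    queried = set()
    while len(queried) < budget_b:
        classifier.fit(graph, labeled)                    # minimize Eq. (9)
        y_hat = classifier.predict(graph, unlabeled)      # predicted labels
        # positive pairs (h_L, y) and negative pairs (h_U, y_hat), m of each
        pos = {n: labeled[n] for n in random.sample(list(labeled), samples_m)}
        neg = {n: y_hat[n] for n in random.sample(list(unlabeled), samples_m)}
        selector.fit(graph, pos, neg)                     # minimize Eq. (11)
        P = {n: selector.confidence(graph, n, y_hat[n]) for n in unlabeled}
        candidates = sorted(unlabeled, key=P.get)[:batch_k]  # Definition 6
        for n in candidates:                              # expert labeling
            labeled[n] = expert_label(n)
            unlabeled.remove(n)
        queried.update(candidates)
    return classifier
```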
V. EXPERIMENTS

To test the effectiveness of AA-HGNN, extensive experiments are designed and conducted on two real-world fake news datasets. We first introduce the datasets, then describe the experimental settings. Based on the experimental results and detailed analysis, we aim to answer the following evaluation questions:
• Question 1: Can AA-HGNN improve fake news detection performance by modeling the data as a News-HIN?
• Question 2: Can the Hierarchical Graph Attention (HGAT) mechanism handle the heterogeneity of the News-HIN effectively?
• Question 3: Can the active learning setting of AA-HGNN overcome the paucity of training data?
• Question 4: Does the adversarial learning between the classifier and the selector significantly help improve the performance?
A. Dataset Description
TABLE I: Properties of the Heterogeneous Networks (PolitiFact Network and BuzzFeed Network).
We use two datasets to verify our model in the experiments. The main dataset is collected from the fact-checking platform PolitiFact, which is operated by the Tampa Bay Times. The fact-checked news items from PolitiFact are mainly statements or news articles posted by politicians (Congress members, White House staff, lobbyists) and political groups; these are the creators of news articles in our experiments. For these news articles, PolitiFact provides the original contents, the fact-checking results, and comprehensive fact-checking reports on its website. When presenting these news articles, the platform categorizes them into different subjects based on contents and topics, and a brief description of each subject is provided as well. The fact-checking results indicate the credibility of the corresponding news articles and take values from {True, Mostly True, Half True, Mostly False, False, Pants on Fire!}; the PolitiFact dataset contains news articles spanning all six labels. We group the labels {Pants on Fire, False, Mostly False} as fake news and {True, Mostly True, Half True} as real news. The fact-checking results are used as the ground truth in our experiments; we do not make use of the comprehensive fact-checking reports in this paper. We have established a heterogeneous information network based on the PolitiFact dataset. The HIN includes three types of nodes (article, creator, and subject) and two types of links: Write (between article and creator) and Belongs to (between article and subject). In order to verify the generalization and stability of AA-HGNN, we also use a public dataset, BuzzFeed, from Shu et al. [41] (https://github.com/KaiDMML/FakeNewsNet/tree/old-version). BuzzFeed contains both real and fake news articles, and we construct a HIN based on it as well, with three types of nodes: article, Twitter user, and publisher. The key statistics describing the two HINs can be found in Table I.
B. Experimental Settings

1) Experimental Setup: In the experiments, we acquire the set of news article nodes, which are the target nodes for classification. For the PolitiFact dataset, the fact-checking results corresponding to the news articles are used as the ground truth for model learning and evaluation. We group the fact-checking results {Pants on Fire, False, Mostly False} into a Fake class and {True, Mostly True, Half True} into a Real class. Because our target is to detect fake news, we treat the Fake class as the positive class and the Real class as the negative class. For all comparison methods, we use 20% of the news article nodes as the training set and 10% of the nodes as the validation set. In addition, the testing ratio is fixed at 10%. For AA-HGNN, we use 1000 nodes to initialize the active learning; the query budget $b$ is 1800 and the query batch size $k$ is 200. In this way, 2800 nodes (20% of the news article nodes) are ultimately utilized to train AA-HGNN. The BuzzFeed dataset has only two types of labels, real and fake, so we can use it directly; the remaining settings are the same as for the PolitiFact dataset. We run the experiments on a Dell PowerEdge T630 server with two 20-core Intel CPUs and 256GB memory, and another server with three GTX-1080 Ti GPUs. Code is available at the link.
2) Data Preprocessing:
Both datasets contain textual data of varying lengths. In order to fit the non-sequential models, we transform the input features of each type of node into a vector of fixed length. To this end, we use the TfidfVectorizer from the scikit-learn package to extract features. For the PolitiFact dataset, the dimensions of the initial features of news articles, creators, and subjects are 3000, 3109, and 191 respectively. For the BuzzFeed dataset, the parameter max_features for the news article nodes is set to 3000.
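For illustration, the snippet below shows the kind of preprocessing described above; the toy corpus is a stand-in for the actual article texts.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus standing in for the news article texts.
articles = [
    "Senator claims the bill cuts taxes for every household.",
    "Fact check: the statement about job numbers is misleading.",
]

# Cap the vocabulary at 3000 terms, as done for the article nodes.
vectorizer = TfidfVectorizer(max_features=3000)
X = vectorizer.fit_transform(articles)   # sparse (n_articles, <=3000) matrix
h_articles = X.toarray()                 # fixed-length initial features {h}
```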
3) Comparison Methods: We classify the comparison methods into three categories: graph neural network methods, text classification methods, and network embedding methods.

Graph neural network methods (GNNs)
• AA-HGNN: AA-HGNN is the proposed model.
• AA-HGNN_entropy: We keep the active learning setting of AA-HGNN but query the candidates according to entropy. Here, we define that the closer the probability of a node being fake news is to 0.5, the higher its entropy.
• AA-HGNN_random: Here, we query the candidates for active learning randomly.
• HGAT-based classifier: The classifier in the proposed AA-HGNN. We test its performance without the active learning setting.
• HAN [48]: HAN employs node-level attention and semantic-level attention to capture the information from all meta-paths. In our experiments, we utilize two meta-paths (article-creator-article, article-subject-article) in HAN.
• GAT [44]: GAT is also an attention-based graph neural network for node classification, but it is designed for homogeneous graphs. The News-HIN is treated as a homogeneous graph (ignoring the type information) when testing this model.
• GCN [20]: GCN is a semi-supervised method for node classification in homogeneous graphs. The News-HIN is treated as a homogeneous graph when testing it.
Text classification methods
• SVM: SVM is a classic supervised learning model. The feature vector used for building the SVM model is extracted merely from the news article contents with TF-IDF.
• Text-CNN [19]: Text-CNN is a text classification method based on a convolutional neural network. It utilizes convolution filters of various sizes to capture key local features in news contents.
• LIWC [24]: LIWC stands for Linguistic Inquiry and Word Count, which is widely used to extract lexicons falling into psycho-linguistic categories. It learns a feature vector from a psychology and deception perspective.
Network embedding methods (NE)
• Label Propagation (LP) [53]: LP is based merely on the network structure. The prediction scores are rounded and cast into labels.
• DeepWalk [3]: A random walk based embedding method designed for homogeneous networks. Based on the embedding results, we train a logistic regression model to perform the classification of news articles.
• LINE [43]: LINE optimizes an objective function that preserves the local and the global network structure simultaneously. We likewise learn a logistic regression model to conduct the classification based on the learned embeddings.

We have also noticed some recently proposed methods for fake news detection [7], [40], [27], but did not compare against them. The main consideration is the difference between the scenarios we face: the above works all utilize social context such as user comments, whereas AA-HGNN aims at detecting fake news at a relatively early stage with less labeled data. We do not utilize user comments about the news or a large amount of training data, because by the time many users have started to discuss a piece of fake news, its bad influence has already spread.
C. Experimental Results with Analysis

1) Assessing the Impact of the News-HIN: In order to answer Question 1, we first present the experimental results in Table II, comparing AA-HGNN with the three categories of methods introduced in Section V-B3. For the text classification methods SVM, LIWC, and Text-CNN, which use the textual information of news article nodes for classification, we see that Text-CNN > SVM & LIWC in all metrics. This result shows that Text-CNN can better capture the important textual features in news contents by utilizing multiple convolution filters. The network embedding methods, which rely on graph structure, all achieve poor recall. Recall is a rather critical metric for the fake news detection problem: a low recall means we miss a lot of fake news, which then causes undesirable social influence. A News-HIN integrates all heterogeneous available data in the form of a graph structure. Intuitively, the methods (AA-HGNN, HAN) making full use of the News-HIN as training data achieve better results. Through the comparison among the GNN methods, we verify that the heterogeneity of the network should be handled in a more effective way: if we simply treat a heterogeneous network as a homogeneous network by ignoring the types, the results (reported for GAT and GCN) are very disappointing. We continue to discuss performance with respect to heterogeneity in the next section.
Fig. 6. The advantage in training with less labeled data.
2) Method Performance on the Heterogeneous Graph: To answer Question 2, we further investigate the performance of the different GNN methods besides AA-HGNN and its variants. As we utilize a heterogeneous network as the source data, the heterogeneity should be handled in an effective manner. In Table II, we observe that HGAT achieves the best accuracy, recall, and F1. GAT and GCN obtain high precision but low recall; in particular, on the PolitiFact dataset, GCN reaches 0.9688 in precision but only 0.0246 in recall. This occurs because, on the News-HIN, these models prefer to classify a sample as real news. They are not powerful methods for the fake news problem because they were originally designed for homogeneous networks. As another method for heterogeneous graphs, the HGAT-based classifier also shows an advantage over HAN. As the basic classifier, the HGAT-based classifier can handle the heterogeneity of the News-HIN well.
TABLE II: Performance comparison of different methods. The training ratio is 20%.

                            PolitiFact                           BuzzFeed
      Methods      Accuracy Precision Recall  F1      Accuracy Precision Recall  F1
Text  SVM          0.5432   0.4975    0.32    0.3894  0.5398   0.6011    0.5109  0.5523
      LIWC         0.4544   0.4415    0.23    0.3023  0.6137   0.6459    0.5885  0.6175
      Text-CNN     0.5658   0.5873    0.2824  0.3814  0.6317   0.6415    0.6233  0.6322
NE    Label Prop.  0.5796   0.7005    0.1164  0.1996  0.5867   0.6409    0.223   0.3309
      DeepWalk     0.5297   0.4639    0.2881  0.4639  0.3721   0.3083    0.4322  0.3599
      LINE         0.5012   0.4109    0.1215  0.4109  0.5899   0.6123    0.3057  0.4077
GNNs  GAT          0.5765   0.7569    0.0453  0.0854  0.5885   0.654     0.3367  0.4445
      GCN          0.5611   —         —       —       —        —         —       —
TABLE III: Adversarial Active Learning Performance of AA-HGNN on PolitiFact.

           Number of training nodes
Metrics    1000    1200    1400    1600    1800    2000    2200    2400    2600    2800
Accuracy   0.5658  0.5878  0.6049  0.6053  0.6013  0.5984  0.597   0.597   0.5955  0.6155
Precision  0.5142  0.5246  0.5218  0.5245  0.5135  0.5115  0.516   0.5136  0.5342  0.5661
Recall     0.3241  0.4526  0.4869  0.5065  0.5277  0.5441  0.5539  0.5523  0.5688  0.5804
F1         0.3975  0.4859  0.5038  0.5154  0.5205  0.5273  0.5342  0.5323  0.5456  0.5732
Fig. 7. Performance analysis of the query strategy on PolitiFact: (a) F1, (b) Recall, (c) Precision, and (d) Accuracy versus the query budget for AA-HGNN, AA-HGNN_entropy, and AA-HGNN_random.
3) Active Learning Setting on Scarce Training Data: To answer Question 3, Figure 6 compares the performance of the HGAT-based classifier and AA-HGNN. The F1 score of the classifier shown in Figure 6 is achieved with 2800 training nodes; in comparison, AA-HGNN can outperform the classifier when trained with only 1200 labeled nodes. Moreover, the score of AA-HGNN with the active learning setting increases significantly: when the number of training nodes is 2800, the performance of AA-HGNN improves by nearly 9% over the model without the active learning setting. From Table II, we observe that AA-HGNN has an apparent advantage at a 20% training ratio, while the other methods cannot perform well due to the paucity of training data. Table III further shows that AA-HGNN reaches satisfactory results even when the training data is more scarce.
4) Adversarial Learning Impacts on Active Learning: In order to answer Question 4, we build the two variants AA-HGNN_entropy and AA-HGNN_random to demonstrate the effect of the adversarial learning setting. These two variants provide different query strategies for active learning. Based on the comparative results in Figure 7, it is obvious that AA-HGNN outperforms AA-HGNN_entropy and AA-HGNN_random in every query batch. The adversarial learning between the classifier and the selector indeed provides an effective query strategy for active learning. The queried candidates are of high value to the classifier, so the classifier's performance can be significantly improved. Moreover, the adversarial learning-based query strategy can consistently provide high-value candidates, as the performance of the selector also improves during adversarial learning.

VI. CONCLUSION
In this paper, we study the HIN-based fake news detection problem and propose a novel adversarial active learning-based graph neural network, AA-HGNN, to solve it. AA-HGNN employs a novel hierarchical attention mechanism to deal with the heterogeneity of News-HINs, learning textual and structural information simultaneously. An active learning framework is applied in AA-HGNN to enhance learning performance, especially when facing a paucity of labeled data. A selector is trained in an adversarial manner to query high-value candidates for the active learning setting. Experiments with real-world fake news data show that our model can outperform text-based models and other graph-based models while using less labeled data. The experiments also verify the effectiveness of the adversarial learning-based query strategy, which consistently queries high-value candidates to improve performance. As an adversarial active learning-based model, AA-HGNN is ideal for detecting fake news in the early stages when training data is lacking. Finally, due to its good generalizability, AA-HGNN can be widely used in other node classification-related applications on heterogeneous graphs, where there are no obstacles to the transfer.

VII. ACKNOWLEDGEMENT
This work is partially supported by NSF through grant IIS-1763365 and by FSU.

REFERENCES

[1] Charu C. Aggarwal, Xiangnan Kong, Quanquan Gu, Jiawei Han, and Philip S. Yu. Active learning: A survey. In Data Classification, pages 599-634. Chapman and Hall/CRC, 2014.
[2] H. Allcott and M. Gentzkow. Social media and fake news in the 2016 election. Journal of Economic Perspectives, 2017.
[3] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. DeepWalk: Online learning of social representations. In KDD, 2014.
[4] Marco T. Bastos and Dan Mercea. The Brexit botnet and user-generated hyperpartisan news. Social Science Computer Review, 37(1):38-54, 2019.
[5] Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203, 2013.
[6] Giovanni Luca Ciampaglia, Prashant Shiralkar, Luis M. Rocha, Johan Bollen, Filippo Menczer, and Alessandro Flammini. Computational fact checking from knowledge networks. PLoS ONE, 2015.
[7] Limeng Cui, Kai Shu, Suhang Wang, Dongwon Lee, and Huan Liu. dEFEND: A system for explainable fake news detection. In CIKM, pages 2961-2964. ACM, 2019.
[8] Quanyu Dai, Qiang Li, Jian Tang, and Dan Wang. Adversarial network embedding. In AAAI, 2018.
[9] Sanjoy Dasgupta, Adam Tauman Kalai, and Claire Monteleoni. Analysis of perceptron-based active learning. In International Conference on Computational Learning Theory, pages 249-263. Springer, 2005.
[10] Xin Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Murphy, Thomas Strohmann, Shaohua Sun, and Wei Zhang. Knowledge Vault: A web-scale approach to probabilistic knowledge fusion. In KDD, 2014.
[11] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In NIPS, pages 2672-2680, 2014.
[12] Marco Gori, Gabriele Monfardini, and Franco Scarselli. A new model for learning in graph domains. In IJCNN, 2005.
[13] Han Guo, Juan Cao, Yazi Zhang, Junbo Guo, and Jintao Li. Rumor detection with hierarchical social attention network. In CIKM, 2018.
[14] Aditi Gupta, Hemank Lamba, Ponnurangam Kumaraguru, and Anupam Joshi. Faking Sandy: characterizing and identifying fake images on Twitter during Hurricane Sandy. In WWW, pages 729-736. ACM, 2013.
[15] Shashank Gupta, Raghuveer Thirukovalluru, Manjira Sinha, and Sandya Mannarswamy. CIMTDetect: A community infused matrix-tensor coupled factorization based method for fake news detection. arXiv preprint arXiv:1809.05252, 2018.
[16] William L. Hamilton, Rex Ying, and Jure Leskovec. Inductive representation learning on large graphs. In NIPS, 2017.
[17] Binbin Hu, Yuan Fang, and Chuan Shi. Adversarial learning on heterogeneous information networks. In KDD, 2019.
[18] Zhiwei Jin, Juan Cao, Yongdong Zhang, and Jiebo Luo. News verification by exploiting conflicting social viewpoints in microblogs. In AAAI, 2016.
[19] Yoon Kim. Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882, 2014.
[20] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In ICLR, 2017.
[21] Yan Li and Jieping Ye. Learning adversarial networks for semi-supervised text classification via policy gradient. In KDD, pages 1715-1723. ACM, 2018.
[22] Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. Gated graph sequence neural networks. In ICLR, 2016.
[23] Marcelo Mendoza, Barbara Poblete, and Carlos Castillo. Twitter under crisis: Can we trust what we RT? In Proceedings of the First Workshop on Social Media Analytics, pages 71-79. ACM, 2010.
[24] J. Pennebaker, R. Boyd, K. Jordan, and K. Blackburn. The development and psychometric properties of LIWC. Technical Report, 2015.
[25] Veronica Perez-Rosas, Bennett Kleinberg, Alexandra Lefevre, and Rada Mihalcea. Automatic detection of fake news. arXiv preprint arXiv:1708.07104, 2017.
[26] Martin Potthast, Johannes Kiesel, Kevin Reinartz, Janek Bevendorff, and Benno Stein. A stylometric inquiry into hyperpartisan and fake news. arXiv preprint arXiv:1702.05638, 2017.
[27] Feng Qian, Chengyue Gong, Karishma Sharma, and Yan Liu. Neural user response generator: Fake news detection with collective user intelligence. In IJCAI, pages 3834-3840, 2018.
[28] K. Rapoza. Can 'fake news' impact the stock market? Forbes News, 2017.
[29] Yuxiang Ren, Charu Aggarwal, and Jiawei Zhang. ActiveIter: Meta diagram based active learning in social networks alignment. IEEE Transactions on Knowledge and Data Engineering, 2019.
[30] Yuxiang Ren, Charu C. Aggarwal, and Jiawei Zhang. Meta diagram based active social networks alignment. In ICDE, pages 1690-1693. IEEE, 2019.
[31] Yuxiang Ren, Bo Liu, Chao Huang, Peng Dai, Liefeng Bo, and Jiawei Zhang. Heterogeneous deep graph infomax. arXiv preprint arXiv:1911.08538, 2019.
[32] Yuxiang Ren and Jiawei Zhang. HGAT: Hierarchical graph attention network for fake news detection. arXiv preprint arXiv:2002.04397, 2020.
[33] Neil Rubens, Mehdi Elahi, Masashi Sugiyama, and Dain Kaplan. Active learning in recommender systems. In Recommender Systems Handbook, pages 809-846. Springer, 2015.
[34] Victoria L. Rubin and Tatiana Lukoianova. Truth and deception at the rhetorical structure level. Journal of the Association for Information Science and Technology, 2015.
[35] Natali Ruchansky, Sungyong Seo, and Yan Liu. CSI: A hybrid deep model for fake news detection. In CIKM, 2017.
[36] Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. The graph neural network model. IEEE Transactions on Neural Networks, 20(1), 2009.
[37] Francesc Serratosa and Xavier Cortés. Interactive graph-matching using active query strategies. Pattern Recognition, 48(4):1364-1373, 2015.
[38] Baoxu Shi and Tim Weninger. Discriminative predicate path mining for fact checking in knowledge graphs. Knowledge-Based Systems, 2014.
[39] C. Shi, Y. Li, J. Zhang, Y. Sun, and P. Yu. A survey of heterogeneous information network analysis. TKDE, 2017.
[40] Kai Shu, Suhang Wang, and Huan Liu. Beyond news contents: The role of social context for fake news detection. In WSDM, 2019.
[41] Kai Shu, Suhang Wang, and Huan Liu. Beyond news contents: The role of social context for fake news detection. In WSDM, pages 312-320, 2019.
[42] Y. Sun and J. Han. Mining heterogeneous information networks: a structural analysis approach. SIGKDD Explorations, 2012.
[43] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. LINE: Large-scale information network embedding. In WWW, 2015.
[44] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio. Graph attention networks. In ICLR, 2018.
[45] Soroush Vosoughi, Deb Roy, and Sinan Aral. The spread of true and false news online. Science, 359(6380):1146-1151, 2018.
[46] Jun Wang, Lantao Yu, Weinan Zhang, Yu Gong, Yinghui Xu, Benyou Wang, Peng Zhang, and Dell Zhang. IRGAN: A minimax game for unifying generative and discriminative information retrieval models. In SIGIR, pages 515-524. ACM, 2017.
[47] Keze Wang, Dongyu Zhang, Ya Li, Ruimao Zhang, and Liang Lin. Cost-effective active learning for deep image classification. IEEE Transactions on Circuits and Systems for Video Technology, 27(12):2591-2600, 2016.
[48] Xiao Wang, Houye Ji, Chuan Shi, Bai Wang, Peng Cui, P. Yu, and Yanfang Ye. Heterogeneous graph attention network. In WWW, 2019.
[49] K. Xu, W. Hu, J. Leskovec, and S. Jegelka. How powerful are graph neural networks? In ICLR, 2019.
[50] Z. Yang, D. Yang, C. Dyer, X. He, A. Smola, and E. Hovy. Hierarchical attention networks for document classification. In NAACL, 2016.
[51] Jiawei Zhang, Limeng Cui, Yanjie Fu, and Fisher B. Gouza. Fake news detection with deep diffusive network model. arXiv preprint arXiv:1805.08751, 2018.
[52] Xinyi Zhou and Reza Zafarani. Fake news: A survey of research, detection methods, and opportunities. arXiv preprint arXiv:1812.00315, 2018.
[53] Xiaojin Zhu and Zoubin Ghahramani. Learning from labeled and unlabeled data with label propagation. Technical report, 2002.