Adversarial Active Learning based Heterogeneous Graph Neural Network for Fake News Detection
Yuxiang Ren∗, Bo Wang†, Jiawei Zhang∗ and Yi Chang†
∗IFM Lab, Department of Computer Science, Florida State University, FL, USA
†Artificial Intelligence, Jilin University, Jilin, China
Email: [email protected], [email protected], [email protected], [email protected]
Abstract—The explosive growth of fake news, along with its destructive effects on politics, the economy, and public safety, has increased the demand for fake news detection. Fake news on social media does not exist independently in the form of an article: many other entities, such as news creators and news subjects, exist on social media and have relationships with news articles. These different entities and relationships can be modeled as a heterogeneous information network (HIN). In this paper, we attempt to solve the fake news detection problem with the support of a news-oriented HIN. We propose a novel fake news detection framework, namely the Adversarial Active Learning-based Heterogeneous Graph Neural Network (AA-HGNN), which employs a novel hierarchical attention mechanism to perform node representation learning in the HIN. AA-HGNN utilizes an active learning framework to enhance learning performance, especially when facing a paucity of labeled data. An adversarial selector is trained to query high-value candidates for the active learning framework. When the adversarial active learning is completed, AA-HGNN detects fake news by classifying news article nodes. Experiments with two real-world fake news datasets show that, benefiting from the adversarial active learning, our model can outperform text-based models and other graph-based models while using less labeled data. As a generalizable model, AA-HGNN can also be widely used in other node classification-related applications on heterogeneous graphs.
Index Terms—Heterogeneous Network, Graph Neural Network, Fake News Detection, Data Mining
I. INTRODUCTION
With the widespread use of social networks, fake news has become a serious social problem that cannot be ignored. In politics, fake news biases people's judgments about major issues like Brexit [4] and the 2016 US presidential election [2]. A large amount of fake news was spread on various social platforms during the 2016 US presidential election; on Facebook, for example, 115 pro-Trump fake stories were observed to be shared a total of 30 million times, and 41 pro-Clinton fake stories a total of 7.6 million times [2]. In the economic field, the extreme sensitivity of the capital market has caused it to suffer from fake news. For instance, $130 billion in stock value was wiped out after a piece of fake news claimed that then-president Barack Obama was injured in an explosion [28]. In public safety affairs, people's responses to emergencies, from natural disasters to terrorist attacks, have been disrupted by the spread of false news online [23], [14], [45]. In view of this, the detection and mitigation of fake news is imperative. However, detecting fake news on social media is particularly challenging. First, fake news is written and published
Fig. 1. An illustrative example of a heterogeneous information network based on PolitiFact data (News-HIN). (a) A News-HIN consisting of three types of nodes and two types of links. (b) Three types of nodes (i.e., Creator, News article, Subject). (c) Network schema of the News-HIN.

intentionally, so the content is carefully camouflaged. Fake content may account for only 1% of a news article, but it is sufficient for the purpose. This makes it difficult to detect fake news based on news articles alone. Secondly, fake news spreads much faster than real news. According to the research in [45], many more people retweeted falsehoods than the truth on Twitter. Therefore, the detection of fake news has high requirements for timeliness: once a large number of users have received false information, the destructive effects have already been caused. What is more, it is expensive and time-consuming for experts to check and label the credibility of news articles manually. Fake news detection methods requiring a large number of labels are thus not practical in the real world.

On social media, focusing on news articles alone is not comprehensive, because news does not exist independently in the form of articles. In fact, there are many entities related to news articles, such as news creators and news subjects. These different types of entities and their relationships provide a more comprehensive perspective for identifying news articles. A heterogeneous information network (HIN for short) [42], [39] can be utilized to represent these entities and relationships. An illustration of such a news-oriented heterogeneous information network (News-HIN) based on
PolitiFact data is presented in Figure 1. In addition to the information provided in the news article, we are able to collect profile information of news creators from social networks and other supplementary knowledge libraries. For the news subjects, background and auxiliary knowledge can be collected to support fake news detection. With the support of a News-HIN, the fake news detection task can be formulated as a node classification problem. In this way, more sufficient information and knowledge can be used to check the credibility of news articles.

The main challenges of the fake news detection problem in a News-HIN lie in the following points:
• Paucity of training data: Fake news appears and spreads very quickly, and the real-time nature of news makes outdated labels worthless. Therefore, fake news detection often faces the challenge of lacking valuable training data. This requires models that can effectively detect potential fake news with the support of a small amount of training data.
• Heterogeneity: Multiple types of heterogeneous information exist in a News-HIN, which can provide key signals for identifying fake news article nodes. At the same time, learning effective node representations in a News-HIN that consider both structural and type information is non-trivial.
• Generalizability: In order to ensure the applicability of the proposed model to diverse and possibly changing News-HINs, we need to provide a general detection model that can handle News-HINs containing any types of nodes and different schemas.

To solve the aforementioned challenges, we propose a novel Adversarial Active Learning-based Heterogeneous Graph Neural Network (AA-HGNN) to detect fake news in the News-HIN. For the first challenge, the proposed framework is built on an active learning framework, which includes a classifier and a selector. By continuously querying high-value candidate nodes for classifier training and tuning, excellent performance can be achieved with a small amount of labeled data. For the second challenge, a heterogeneous graph neural network with a novel Hierarchical Graph Attention (HGAT) mechanism is utilized in both the classifier and the selector. Based on the two-level attention mechanism (node-level & schema-level), HGAT can obtain the optimal combination of different types of neighbors in a hierarchical manner. The HGAT-based classifier is responsible for conducting classification on news article nodes. The HGAT-based selector is used to evaluate the predicted labels from the classifier for high-value selection. The selected candidate nodes become part of the training set after expert labeling. The classifier and the selector are trained based on adversarial learning: as the classifier improves the quality of the predicted labels, the evaluation ability of the selector improves, so that it continuously selects better candidates. The overall architecture of the proposed framework is shown in Figure 2. AA-HGNN places no limitation on the structure of News-HINs, so it has good generalizability and solves the third challenge well.
Fig. 2. Overall Framework.
We focus on applying AA-HGNN to the fake news detection domain in this paper, but AA-HGNN is also applicable to the more general problem of node classification on heterogeneous graphs.

The contributions of our work are summarized as follows:
• We are the first to apply adversarial active learning to fake news detection, which can achieve excellent detection performance with much less training data. This is of great significance for fake news detection, because the urgent timeliness of fake news detection makes sufficient training data impossible.
• We propose a novel adversarial active learning-based framework, AA-HGNN, which can handle the heterogeneity of News-HINs effectively through a two-level attention mechanism. AA-HGNN is applicable to HINs with different schemas.
• We conduct extensive experiments on two real-world datasets to demonstrate the effectiveness of AA-HGNN. The results show the superiority of AA-HGNN compared with state-of-the-art models in detecting fake news, especially when facing a paucity of training data.

II. RELATED WORK
A. Fake News Detection
As fake news detection is an emerging topic, a number of research works have been proposed. Content-based fake news detection relies primarily on deep mining of the news content. [6], [38] extract knowledge, a set of (Subject, Predicate, Object) triples [10], from the news content and assess the authenticity of news by comparing them with real knowledge. However, the timeliness and integrity of the knowledge map still limit their application [52]. Some methods extract the writing style and use it to measure the credibility of news. [34] employs rhetorical structure theory to evaluate authenticity at the discourse level. [25], [26] capture the sentiment and readability of the news content to assess the extent of falsehood. But these methods based on writing style can hardly work in the face of carefully camouflaged fake news.

Some methods use not only the news content but also other information related to the news. Guo et al. [13] utilize LSTM and a hierarchical attention mechanism to detect rumors, making use of social information through a proposed social feature. Shu et al. [7] study the explainable detection of fake news with the support of both news contents and user comments. Jin et al. evaluate news credibility within a graph optimization framework [18]. Methods based on matrix factorization [40], tensor factorization [15], and recurrent neural networks (RNNs) [35], [51], [32] have been proposed to work on news-oriented networks.

In this paper, we model the news content and related entities as a News-HIN. Both the structural information and the node content of the News-HIN are utilized by AA-HGNN to identify fake news.
B. Graph Neural Network
Graph Neural Networks (GNNs) learn new feature vectors for nodes through a recursive neighborhood aggregation scheme [12], [36], [49]. A propagation model incorporating gated recurrent units to propagate information across all nodes is proposed in [22]. Recently, there has been a surge of work generalizing the convolution operation to graph-structured data. Joan Bruna et al. [5] extend convolution to general graphs via a novel Fourier transformation on graphs. Kipf et al. [20] propose the Graph Convolutional Network (GCN). Hamilton et al. [16] introduce GraphSAGE, which generates embeddings by directly aggregating features from a node's local neighborhood. The Graph Attention Network (GAT) [44] first imports the attention mechanism into graphs; it learns the importance of neighbors and aggregates them to learn node representations. However, the above graph neural networks are designed for homogeneous graphs. Wang et al. [48], [31] consider the attention mechanism in heterogeneous graph learning through the model HAN, where information from multiple meta-path-defined connections can be learned effectively. However, meta-paths, as handcrafted features, limit HAN. In addition, HAN only considers the different types of connections between target nodes through meta-paths, and ignores the node contents carried by different types of nodes.
C. Adversarial and Active Learning
The principle of adversarial learning was introduced in generative adversarial networks (GANs) by Goodfellow et al. [11]. The adversarial learning principle has achieved excellent performance on many different topics, such as text classification [21], information retrieval [46], and network embedding [17], [8]. The adversarial learning method on heterogeneous network embeddings [17] can be used to learn a more efficient representation of news nodes in a News-HIN. However, in order to detect fake news, HeGAN [17] still requires a large amount of labeled data to train a classifier. Active learning is an effective way to train a model with less labeled data, because not all training samples are equally important [1]. The number of labels needed for active learning can be logarithmic in the usual sample complexity of passive learning [9]. Active learning has also proved its value and robustness on different topics, including recommendation systems [33], social network alignment [30], [29], image classification [47] and graph matching [37].

In this paper, AA-HGNN combines adversarial learning and active learning. The selector, trained in an adversarial manner, can continuously select high-value candidates for active learning. The high-value candidates further improve the performance of the classifier.

III. CONCEPT AND PROBLEM DEFINITION
A. Terminology Definition
In order to make related concepts easier to understand, we use the PolitiFact data as a running example. The PolitiFact data contain News articles, Subjects, and Creators, which can be modeled as three types of nodes in a heterogeneous network, with different types of links constructed from the connections among them. We define News Oriented Heterogeneous Information Networks (News-HIN) formally as follows:

DEFINITION 1 (News Oriented Heterogeneous Information Network (News-HIN)): A news oriented heterogeneous information network (News-HIN) can be defined as $G = (V, E)$, where the node set $V = C \cup N \cup S$. Here $C$, $N$ and $S$ represent Creators, News articles and Subjects respectively; we define the different types of nodes in detail below. The link set $E = E_{c,n} \cup E_{n,s}$ involves the "Write" links between creators and news articles, and the "Belongs to" links between news articles and subjects.

News articles refer to the news content posted on social media or public platforms. We can define news articles formally as:

DEFINITION 2 (News Articles): The set of news articles can be represented as $N = \{n_1, n_2, \cdots, n_m\}$. Each news article $n_i \in N$ contains its textual contents. The credibility label of $n_i$ takes its value from the label set $Y = \{Fake, Real\}$. In this paper, the original label set contains 6 different class labels (True, Mostly True, Half True, Mostly False, False, Pants on Fire). We group the labels Pants on Fire, False, and Mostly False as fake news, and group True, Mostly True, and Half True as real news.

Subjects denote the central ideas of news articles, which normally are the main objectives of writing the news articles.

DEFINITION 3 (Subjects): The set of subjects can be denoted as $S = \{s_1, s_2, \cdots, s_k\}$. Each subject $s_i \in S$ contains a textual description.

Creators denote the people who write news articles. We can also define this concept formally.

DEFINITION 4 (Creators): The set of creators can be represented as $C = \{c_1, c_2, \cdots, c_n\}$. Each creator $c_i \in C$ contains profile information. In the PolitiFact dataset, a creator's profile contains their title, political party membership, and geographical residential location. The profile information can be described by a sequence of words.

In order to better understand the News-HIN and utilize the type information, it is necessary to define a schema-level description. The schema of a News-HIN serves for learning the importance of nodes and links of different types.

DEFINITION 5 (News-HIN Schema): Formally, the schema of a given News-HIN $G = (V, E)$ can be represented as $S_G = (V_T, E_T)$, where $V_T$ and $E_T$ denote the set of node types and link types in the network respectively. Here, $V_T = \{\phi_n, \phi_c, \phi_s\}$ and $E_T = \{$Write, Belongs to$\}$.

An illustration of the News-HIN schema based on the PolitiFact data is shown in Figure 1(c).
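To make the definitions concrete, the sketch below encodes a News-HIN with plain Python containers. The class and field names are illustrative assumptions, not part of the paper; only the node types, link types, and label grouping come from the definitions above.

```python
from dataclasses import dataclass, field

# A minimal sketch of the News-HIN G = (V, E) from Definitions 1-5;
# the container layout is an assumption made for illustration.
@dataclass
class NewsHIN:
    creators: dict = field(default_factory=dict)   # c_i -> profile text
    articles: dict = field(default_factory=dict)   # n_i -> article text
    subjects: dict = field(default_factory=dict)   # s_i -> description
    write: set = field(default_factory=set)        # (c_i, n_i) "Write" links
    belongs_to: set = field(default_factory=set)   # (n_i, s_i) "Belongs to" links

FAKE = {"Pants on Fire", "False", "Mostly False"}  # grouped as Fake
REAL = {"True", "Mostly True", "Half True"}        # grouped as Real

def credibility(label: str) -> str:
    """Map a PolitiFact fact-checking result to Y = {Fake, Real}."""
    return "Fake" if label in FAKE else "Real"
```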
B. Problem Formulation

Given a News-HIN $G = (V, E)$, the fake news detection problem aims at learning a classification function $f: N \rightarrow Y$ that classifies the news article nodes in the set $N$ into the correct class with a credibility label in $Y$. The news article nodes with labels are grouped into a labeled set $L$, and the remaining news article nodes form the unlabeled set $U = N \setminus L$. Under the active learning setting, we are also allowed to query for labels of news article nodes in $U$, up to an upper-limit query budget $b$. We further want to propose a mechanism that achieves an optimal query set $U_q$ to improve the classification function $f: N \rightarrow Y$. To resolve the above fake news detection problem, we introduce the proposed adversarial active learning-based heterogeneous graph neural network AA-HGNN in Section IV.

IV. PROPOSED METHOD
In this section, we present the proposed Adversarial Active Learning-based Heterogeneous Graph Neural Network (AA-HGNN) for detecting fake news. As shown in Figure 2, AA-HGNN consists of two major components: (1) the HGAT-based classifier, and (2) the HGAT-based selector. We begin with an overview of the model, followed by a detailed description of the hierarchical graph attention neural network (HGAT). Then we illustrate the HGAT-based classifier and the HGAT-based selector respectively. At last, we elaborate on the optimization of AA-HGNN.

A. Model Overview
The architecture of AA-HGNN is shown in Figure 2. The News-HIN $G$ is the input of the HGAT-based classifier. $h_L$ and $h_U$ denote the initial features of a labeled node and an unlabeled node respectively. The HGAT-based classifier is trained with both labeled and unlabeled data to predict labels $\{\hat{y}\}$ for unlabeled news article nodes. The HGAT-based selector evaluates the quality of the predicted labels and selects high-value candidates from them based on a query strategy. We take the pairs of labeled nodes and their ground-truth labels $\{y\}$ as positive samples, and the pairs of unlabeled nodes and their predicted labels $\{\hat{y}\}$ as negative samples. A portion of the positive and negative pairs are sampled to train the HGAT-based selector. After being trained, the selector outputs the confidence $P$ of the pairs in the test set. Based on this confidence, the proposed selection strategy selects a set of $k$ high-value unlabeled nodes as candidates. These candidates are labeled by experts; in our experiments, they are moved to the training set before the next round of optimization. A query budget $b$ is pre-specified for AA-HGNN. When the query budget $b$ is exceeded, the adversarial active learning stops.

Since the Hierarchical Graph Attention Neural Network (HGAT) is the basis of the classifier and the selector, and is the key to handling the heterogeneity, we first introduce HGAT in detail in the next section.

Fig. 3. Hierarchical Graph Attention Neural Network.
B. Hierarchical Graph Attention Neural Network (HGAT)
The novel HGAT employs a two-level attention mechanism consisting of node-level attention and schema-level attention. The structure of HGAT is shown in Figure 3. Node-level attention is responsible for learning the weights of neighbors belonging to the same type and aggregating them to get a type-specific neighbor representation. Schema-level attention enables HGAT to learn the information of node types and obtain the optimal weighted combination of the type-specific neighbor representations. Through the two-level attention mechanism, the representations of news article nodes contain both structural and node content information.
1) Node-level attention:
The node-level attention learns the importance of neighbors belonging to the same type, separately for each news article node $n_i \in N$, and then aggregates the representations of the same-type neighbors to form an integrated representation, which we define as a schema node. The inputs of the node-level attention layer are the initial node feature vectors $\{h\}$. Because multiple types of nodes exist in the News-HIN, the initial feature vectors belong to feature spaces with different dimensions. In order to enable the attention mechanism to output comparable and meaningful weights between different types of nodes, we first utilize a type-specific transformation matrix to project features with different dimensions into the same feature space. Take the news article node $n_i \in N$ as an example. The transformation matrix for type $\phi_n$ is $M_{\phi_n} \in \mathbb{R}^{F \times F_{\phi_n}}$, where $F_{\phi_n}$ is the dimension of the initial feature $h_{n_i} \in \mathbb{R}^{F_{\phi_n}}$ of the news article node $n_i$, and $F$ is the dimension of the feature space mapped to. The projection process is as follows:

$$h'_{n_i} = M_{\phi_n} \cdot h_{n_i} \qquad (1)$$

Fig. 4. Explanation of the aggregating process at the node level ((a) node-level aggregation) and the schema level ((b) schema-level aggregation).
Here $h'_{n_i}$ is the projected feature of node $n_i$, and $F$ is the same for all type-specific transformation matrices. Through the type-specific projection operation, the feature spaces of nodes with different types are unified, so that the self-attention mechanism can learn weights among various kinds of nodes.

For fake news detection, the target node is the news article node $n_i \in N$, whose neighbors belong to $N \cup S \cup C$. Note that we also regard the target node itself as a neighbor node, to cooperate with the self-attention mechanism. Let $T \in \{N, S, C\}$, where nodes in $T$ have the type $\phi_t$. For $n_i$'s neighbor nodes in $T$, the node-level attention learns the importance $e^{\phi_t}_{ij}$, which indicates how important node $t_j \in T$ is for $n_i$. The importance of the node pair $(n_i, t_j)$ can be formulated as:

$$e^{\phi_t}_{ij} = att(h'_{n_i}, h'_{t_j}; \phi_t) \qquad (2)$$

Here, the node-level attention $att$ denotes the same deep neural network as in [44], and $att$ is shared by all neighbor nodes of the same type $\phi_t$. The masked attention captures the network structure information: only for nodes $t_j \in neighbor_{n_i}$ (the neighbors of node $n_i$) is the importance calculated and recorded as $e^{\phi_t}_{ij}$; otherwise, the attention weight is 0. We normalize these values to get the weight coefficients $\alpha^{\phi_t}_{ij}$ via the softmax function:

$$\alpha^{\phi_t}_{ij} = \mathrm{softmax}_j(e^{\phi_t}_{ij}) = \frac{\exp(e^{\phi_t}_{ij})}{\sum_{t_k \in neighbor_{n_i}} \exp(e^{\phi_t}_{ik})} \qquad (3)$$

Then, the schema node $T_{n_i}$ can be aggregated from the neighbors' projected features with the corresponding weights:

$$T_{n_i} = \sigma\Big(\sum_{t_j \in neighbor_{n_i}} \alpha^{\phi_t}_{ij} \cdot h'_{t_j}\Big) \qquad (4)$$

Similar to the Graph Attention Network (GAT) [44], a multi-head attention mechanism can be used to stabilize the learning process of the self-attention in node-level attention. In detail, $K$ independent node-level attentions execute the transformation of Equation (4), and the features obtained by the $K$ heads are concatenated, resulting in the output representation of the schema node:

$$T_{n_i} = \Big\Vert_{k=1}^{K} \sigma\Big(\sum_{t_j \in neighbor_{n_i}} \alpha^{\phi_t}_{ij} \cdot h'_{t_j}\Big) \qquad (5)$$
where $\Vert$ represents concatenation. In our problem, every target node $n_i$ has 3 schema nodes corresponding to the 3 different types of neighbors (including itself), based on Definition 5. They are denoted as $N_{n_i}$, $C_{n_i}$, $S_{n_i}$.

Fig. 5. HGAT-based Classifier and HGAT-based Selector.
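As a concrete illustration of Eqs. (1)-(4), the sketch below implements one head of node-level attention for a single neighbor type in plain Python. The GAT-style scoring (a LeakyReLU over a learned vector applied to the concatenated projected pair) is an assumption carried over from [44], which the paper cites for $att$; all parameter names are illustrative.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def node_level_attention(h_ni, neighbors, M_n, M_phi, a, sigma=np.tanh):
    """One attention head for one neighbor type phi_t (Eqs. 1-4).

    h_ni      : (F_n,)     initial feature of the target news article node n_i
    neighbors : (m, F_phi) initial features of n_i's neighbors of type phi_t
    M_n, M_phi: type-specific projection matrices into the shared space R^F
    a         : (2F,)      attention vector; GAT-style scoring is assumed [44]
    """
    h_i = M_n @ h_ni                    # Eq. (1): project the target node
    h_j = neighbors @ M_phi.T           # Eq. (1): project each neighbor
    pairs = np.concatenate([np.repeat(h_i[None, :], len(h_j), axis=0), h_j], axis=1)
    e = leaky_relu(pairs @ a)           # Eq. (2): importance scores e_ij
    alpha = softmax(e)                  # Eq. (3): softmax over the neighbors
    return sigma(alpha @ h_j)           # Eq. (4): the schema node T_ni
```

Running $K$ such heads with independent parameters and concatenating their outputs, e.g. `np.concatenate([node_level_attention(...) for _ in range(K)])`, yields the multi-head schema node of Eq. (5).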
2) Schema-level attention:
Through the node-level attention, we fuse information from neighbor nodes of the same type into the representation of a schema node. Now, HGAT needs to learn the representation of a news article node from all of its schema nodes. Different schema nodes contain type-specific information, which requires us to learn the importance of the different node types. Hence, schema-level attention is proposed to learn the importance of the different schema nodes, and the learned coefficients are finally used for a weighted combination.

In order to obtain sufficient expressive power to calculate the attention weights between schema nodes, a learnable linear transformation is applied to the schema nodes. The linear transformation is parametrized by a weight matrix $W \in \mathbb{R}^{F' \times KF}$, where $K$ is the number of heads in node-level attention. The schema-level attention $schema$ is a single-layer feedforward neural network with a Sigmoid activation and output dimension $F'$. For the schema node $T_{n_i}$, its importance is denoted as $w^{\phi_t}_i$:

$$w^{\phi_t}_i = schema(W \cdot T_{n_i}, W \cdot N_{n_i}) \qquad (6)$$

We normalize the importance of each schema node through a softmax function. The coefficients of the final fusion are then denoted as $\beta^{\phi_t}_i$, calculated as follows:

$$\beta^{\phi_t}_i = \mathrm{softmax}_t(w^{\phi_t}_i) = \frac{\exp(w^{\phi_t}_i)}{\sum_{\phi \in V_T} \exp(w^{\phi}_i)} \qquad (7)$$

Based on the learned coefficients, we can fuse all schema nodes to get the final representation $r_{n_i} \in \mathbb{R}^{F'}$ of the target node $n_i$:

$$r_{n_i} = \sum_{\phi_t \in V_T} \beta^{\phi_t}_i \cdot T_{n_i} \qquad (8)$$

We also illustrate the two-level aggregating process in Figure 4 for reference.
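The following sketch mirrors Eqs. (6)-(8) for one target node. Since the paper only names the layer type of $schema$, the concrete scoring (a sigmoid of the dot product between the transformed schema node and the transformed news schema node) is an assumption, as is fusing the transformed vectors $W \cdot T$ so that $r_{n_i}$ lands in $\mathbb{R}^{F'}$.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def schema_level_fusion(schema_nodes, W, news_type="N"):
    """Fuse the type-specific schema nodes into r_ni (Eqs. 6-8).

    schema_nodes: dict type -> schema node of size K*F,
                  e.g. {"N": N_ni, "C": C_ni, "S": S_ni}
    W           : (F_prime, K*F) shared linear transformation
    """
    ref = W @ schema_nodes[news_type]                # W . N_ni in Eq. (6)
    types = list(schema_nodes)
    # Eq. (6): sigmoid scoring against the news schema node (an assumption)
    w = np.array([sigmoid((W @ schema_nodes[t]) @ ref) for t in types])
    beta = softmax(w)                                # Eq. (7)
    # Eq. (8): weighted combination over all node types in V_T
    return sum(b * (W @ schema_nodes[t]) for b, t in zip(beta, types))
```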
C. HGAT-based Classifier

As shown on the left side of Figure 5, HGAT and a classification layer constitute the HGAT-based classifier. The input of the HGAT-based classifier is the same as that of HGAT, i.e., the initial feature vectors of the nodes. The classification layer outputs the predicted labels $\{\hat{y}\}$ of unlabeled news article nodes. In our experiments, a logistic regression layer works as the classification layer.

For the fake news detection task, the optimization objective of the HGAT-based classifier is cross-entropy loss minimization, and the classifier can be optimized in an end-to-end manner by backpropagation. We define the set of labeled news article nodes as $N_L$ and the set of unlabeled news article nodes as $N_U$; the cross-entropy loss can then be written as:

$$Loss_{classifier} = -\sum_{n_i \in N_L} \big(y_{n_i} \log(p_{n_i}) + (1 - y_{n_i}) \log(1 - p_{n_i})\big) \qquad (9)$$

Here, $y_{n_i}$ is a binary indicator (0 or 1) of whether the binary class label is the correct classification for the news article node representation $r_{n_i}$, and $p_{n_i}$ is the predicted probability for the labeled news article node $n_i$.

When the optimization is completed, the predicted probabilities of the unlabeled news article nodes in $N_U$ are rounded and cast into predicted labels $\{\hat{y}\}$. The predicted labels $\{\hat{y}\}$ are then evaluated by the HGAT-based selector, described in the next section.

D. HGAT-based Selector

The structure of the HGAT-based selector is shown on the right side of Figure 5. The inputs of the HGAT layers are the initial feature vectors $\{h\}$. Based on the learned representation $r_{n_i}$, we concatenate $r_{n_i}$ with the predicted label $\hat{y}$ (or the ground-truth label $y$ for a labeled node). We denote this concatenated vector as $z_{n_i} \in \mathbb{R}^{(F'+1)}$:

$$z_{n_i} = [r_{n_i}, \hat{y}] \qquad (10)$$

The purpose of the HGAT-based selector is to evaluate the probability that $z_{n_i}$ comes from the set of labeled news article nodes $N_L$. A higher probability indicates that a news article node matches its predicted label better; conversely, if a node does not match its predicted label, the predicted label is likely to be wrong. The output layer is responsible for predicting the probability $P(\hat{y}; r_{n_i})$; here, we use a logistic regression layer as the output layer. We sample $z_{n_j}, n_j \in N_L$ as positive samples, and the same number of $z_{n_k}, n_k \in N_U$ as negative samples. These positive and negative samples constitute the training set for the HGAT-based selector. The loss function used by the HGAT-based selector is a cross-entropy loss:

$$Loss_{selector} = -\sum \big(y \log(P) + (1 - y) \log(1 - P)\big) \qquad (11)$$

Here $y \in \{0, 1\}$ denotes the negative/positive label of a concatenated vector in the training set, and $P$ is the predicted probability of the label being positive. This loss function can be optimized by backpropagation.

The remaining concatenated vectors of unlabeled news article nodes form the testing set. After training, the HGAT-based selector outputs the probability $P$ for the testing samples. Based on this probability, we propose a query strategy to select high-value candidates for active learning. As mentioned before, a lower probability $P$ indicates that the unlabeled news article node and its predicted label do not match, and hence that the predicted label is likely wrong. Obviously, if a queried news article node could not be classified correctly by the HGAT-based classifier, it is more "informative" than the nodes that have been correctly classified. Moreover, after expert labeling, it becomes part of the training set in the next round of training, thereby helping correct similarly misclassified nodes in the test set. The query strategy is therefore:

DEFINITION 6 (Query Strategy): All samples in the test set are sorted in ascending order according to the predicted probability $P$, and the top $k$ candidates are added to $U_q$. Here, $k$ denotes the query batch size.
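A minimal sketch of the selector's input construction (Eq. 10) and the query strategy of Definition 6; the function names are illustrative.

```python
import numpy as np

def build_selector_input(r_ni, y_hat):
    """Eq. (10): concatenate the node representation with its label."""
    return np.concatenate([r_ni, [float(y_hat)]])   # z_ni in R^(F'+1)

def query_candidates(node_ids, P, k):
    """Definition 6: sort the test samples by the selector's confidence P
    in ascending order and return the k least-confident nodes, i.e., the
    nodes whose predicted labels are most likely wrong."""
    order = np.argsort(P)                           # lowest P first
    return [node_ids[i] for i in order[:k]]
```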
E. Adversarial Active Optimization

In AA-HGNN, the HGAT-based classifier and the HGAT-based selector cooperate in an adversarial active manner. We adopt iterative optimization to train these components: in each iteration, the HGAT-based classifier and the HGAT-based selector are trained alternately. Specifically, we first train the HGAT-based classifier to output the predicted labels. Then the HGAT-based selector is trained with the predicted labels from the classifier. Based on the optimized selector, $k$ candidates are queried in each iteration and added to $U_q$, to be used as training data in the next iteration. Each time $k$ candidates are obtained, the classification performance of the HGAT-based classifier can be improved in the next iteration, and consequently the credibility of the predicted labels increases. Better predicted labels further improve the evaluation performance of the HGAT-based selector. We repeat this iteration until the size of $U_q$ exceeds the query budget $b$. The adversarial active optimization of AA-HGNN is described in Algorithm 1.

Algorithm 1: Adversarial Active Optimization of AA-HGNN
Input: The News-HIN $G = (V, E)$; the set of labeled news article nodes $N_L$; the set of unlabeled news article nodes $N_U$; the query budget $b$; the query batch size $k$; the number of samples $m$.
$U_q = \emptyset$;
while $|U_q| < b$ do
    // Optimization of the HGAT-based classifier
    Train the HGAT-based classifier on $N_L$ via Eq. (9);
    Predict the labels of the nodes in $N_U$ and update the set of predicted labels $\{\hat{y}\}$;
    // Optimization of the HGAT-based selector
    Sample $m$ nodes from $N_L$ to construct positive samples via Eq. (10), i.e., $z_{n_j}, n_j \in N_L$;
    Sample $m$ nodes from $N_U$ to construct negative samples via Eq. (10), i.e., $z_{n_k}, n_k \in N_U$;
    Train the HGAT-based selector on the positive and negative samples;
    Predict the probability $P$ via Eq. (11);
    Query $k$ candidates based on Definition 6; $U_q = U_q \cup \{candidates\}$;
    Label the $k$ candidates by experts;
    $N_L = N_L \cup \{candidates\}$; $N_U = N_U \setminus \{candidates\}$;
return the set of predicted labels $\{\hat{y}\}$
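For reference, a compact Python driver for Algorithm 1 is sketched below. The `classifier`/`selector` objects, their methods, and `expert_label` are hypothetical interfaces standing in for the HGAT-based components; only the control flow comes from Algorithm 1.

```python
import random

def adversarial_active_learning(graph, labeled, unlabeled,
                                classifier, selector, expert_label,
                                budget_b, batch_k, samples_m):
    """labeled: dict node -> ground-truth label; unlabeled: set of nodes."""
    queried = set()
    while len(queried) < budget_b:
        classifier.fit(graph, labeled)                    # minimize Eq. (9)
        y_hat = classifier.predict(graph, unlabeled)      # predicted labels
        # positive pairs (h_L, y) and negative pairs (h_U, y_hat), m of each
        pos = {n: labeled[n] for n in random.sample(list(labeled), samples_m)}
        neg = {n: y_hat[n] for n in random.sample(list(unlabeled), samples_m)}
        selector.fit(graph, pos, neg)                     # minimize Eq. (11)
        P = {n: selector.confidence(graph, n, y_hat[n]) for n in unlabeled}
        candidates = sorted(unlabeled, key=P.get)[:batch_k]  # Definition 6
        for n in candidates:                              # expert labeling
            labeled[n] = expert_label(n)
            unlabeled.remove(n)
        queried.update(candidates)
    return classifier
```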
V. EXPERIMENTS

To test the effectiveness of AA-HGNN, extensive experiments are designed and conducted on two real-world fake news datasets. We first introduce the datasets, then describe the experimental settings. Based on the experimental results and detailed analysis, we aim to answer the following evaluation questions:
• Question 1: Can AA-HGNN improve fake news detection performance by modeling the data as a News-HIN?
• Question 2: Can the Hierarchical Graph Attention (HGAT) mechanism handle the heterogeneity of the News-HIN effectively?
• Question 3: Can the active learning setting of AA-HGNN overcome the paucity of training data?
• Question 4: Does the adversarial learning between the classifier and the selector significantly help improve the performance?
A. Dataset Description
TABLE I: Properties of the Heterogeneous Networks (PolitiFact Network and BuzzFeed Network).
We use two datasets to verify our model in the experiments. The main dataset is collected from the fact-checking platform PolitiFact, which is operated by the Tampa Bay Times. The fact-checked news items from PolitiFact are mainly statements or news articles posted by politicians (Congress members, White House staff, lobbyists) and political groups; these are the creators of news articles in our experiments. For these news articles, PolitiFact provides the original contents, the fact-checking results, and comprehensive fact-checking reports on its website. When presenting these news articles, the platform categorizes them into different subjects based on contents and topics, and a brief description of each subject is provided as well. The fact-checking results indicate the credibility of the corresponding news articles and take values from {True, Mostly True, Half True, Mostly False, False, Pants on Fire!}; the PolitiFact dataset contains news articles spanning all six labels. We group the labels {Pants on Fire, False, Mostly False} as fake news and {True, Mostly True, Half True} as real news. The fact-checking results are used as the ground truth in our experiments; we do not make use of the comprehensive fact-checking reports in this paper. We have established a heterogeneous information network based on the PolitiFact dataset. The HIN includes three types of nodes (article, creator, and subject) and two types of links: Write (between article and creator) and Belongs to (between article and subject). In order to verify the generalization and stability of AA-HGNN, we also use a public dataset, BuzzFeed, from Shu et al. [41] (https://github.com/KaiDMML/FakeNewsNet/tree/old-version). BuzzFeed contains both real and fake news articles, and we construct a HIN based on it as well, with three types of nodes: article, Twitter user, and publisher. The key statistics describing the two HINs can be found in Table I.
B. Experimental Settings

1) Experimental Setup: In the experiments, we acquire the set of news article nodes, which are the target nodes for classification. For the PolitiFact dataset, the fact-checking results corresponding to the news articles are used as the ground truth for model learning and evaluation. We group the fact-checking results {Pants on Fire, False, Mostly False} into a Fake class and {True, Mostly True, Half True} into a Real class. Because our target is to detect fake news, we treat the Fake class as the positive class and the Real class as the negative class. For all comparison methods, we use 20% of the news article nodes as the training set and 10% of the nodes as the validation set. In addition, the testing ratio is fixed at 10%. For AA-HGNN, we use 1000 nodes to initialize the active learning; the query budget $b$ is 1800 and the query batch size $k$ is 200. In this way, 2800 nodes (20% of the news article nodes) are ultimately utilized to train AA-HGNN. The BuzzFeed dataset has only two types of labels, real and fake, so we can use it directly; the remaining settings are the same as for the PolitiFact dataset. We run the experiments on a Dell PowerEdge T630 server with two 20-core Intel CPUs and 256GB memory, and another server with three GTX-1080 Ti GPUs. Code is available at the link.
2) Data Preprocessing:
Both datasets contain textual data of varying lengths. In order to fit the non-sequential models, we transform the input features of each type of node into a vector of fixed length. To this end, we use the TfidfVectorizer from the scikit-learn package to extract features. For the PolitiFact dataset, the dimensions of the initial features of news articles, creators, and subjects are 3000, 3109, and 191 respectively. For the BuzzFeed dataset, the parameter max_features for the news article nodes is set to 3000.
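For illustration, the snippet below shows the kind of preprocessing described above; the toy corpus is a stand-in for the actual article texts.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus standing in for the news article texts.
articles = [
    "Senator claims the bill cuts taxes for every household.",
    "Fact check: the statement about job numbers is misleading.",
]

# Cap the vocabulary at 3000 terms, as done for the article nodes.
vectorizer = TfidfVectorizer(max_features=3000)
X = vectorizer.fit_transform(articles)   # sparse (n_articles, <=3000) matrix
h_articles = X.toarray()                 # fixed-length initial features {h}
```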
3) Comparison Methods: We classify the comparison methods into three categories: graph neural network methods, text classification methods, and network embedding methods.

Graph neural network methods (GNNs)
• AA-HGNN: AA-HGNN is the proposed model.
• AA-HGNN_entropy: We keep the active learning setting of AA-HGNN but query the candidates according to entropy. Here, we define that the closer the probability of a node being fake news is to 0.5, the higher its entropy.
• AA-HGNN_random: Here, we query the candidates for active learning randomly.
• HGAT-based classifier: The classifier in the proposed AA-HGNN. We test its performance without the active learning setting.
• HAN [48]: HAN employs node-level attention and semantic-level attention to capture the information from all meta-paths. In our experiments, we utilize two meta-paths (article-creator-article, article-subject-article) in HAN.
• GAT [44]: GAT is also an attention-based graph neural network for node classification, but it is designed for homogeneous graphs. The News-HIN is treated as a homogeneous graph (ignoring the type information) when testing this model.
• GCN [20]: GCN is a semi-supervised method for node classification in homogeneous graphs. The News-HIN is treated as a homogeneous graph when testing it.
Text classification methods
• SVM: SVM is a classic supervised learning model. The feature vector used for building the SVM model is extracted merely from the news article contents with TF-IDF.
• Text-CNN [19]: Text-CNN is a text classification method based on a convolutional neural network. It utilizes convolution filters of various sizes to capture key local features in news contents.
• LIWC [24]: LIWC stands for Linguistic Inquiry and Word Count, which is widely used to extract lexicons falling into psycho-linguistic categories. It learns a feature vector from a psychology and deception perspective.
Network embedding methods (NE)
• Label Propagation (LP) [53]: LP is based merely on the network structure. The prediction scores are rounded and cast into labels.
• DeepWalk [3]: A random walk based embedding method designed for homogeneous networks. Based on the embedding results, we train a logistic regression model to perform the classification of news articles.
• LINE [43]: LINE optimizes an objective function that preserves the local and the global network structure simultaneously. We likewise learn a logistic regression model to conduct the classification based on the learned embeddings.

We have also noticed some recently proposed methods for fake news detection [7], [40], [27], but did not compare against them. The main consideration is the difference between the scenarios we face: the above works all utilize social context such as user comments, whereas AA-HGNN aims at detecting fake news at a relatively early stage with less labeled data. We do not utilize user comments about the news or a large amount of training data, because by the time many users have started to discuss a piece of fake news, its bad influence has already spread.
C. Experimental Results with Analysis

1) Assessing the Impact of the News-HIN: In order to answer Question 1, we first present the experimental results in Table II, comparing AA-HGNN with the three categories of methods introduced in Section V-B3. For the text classification methods SVM, LIWC, and Text-CNN, which use the textual information of news article nodes for classification, we see that Text-CNN > SVM & LIWC in all metrics. This result shows that Text-CNN can better capture the important textual features in news contents by utilizing multiple convolution filters. The network embedding methods, which rely on graph structure, all achieve poor recall. Recall is a rather critical metric for the fake news detection problem: a low recall means we miss a lot of fake news, which then causes undesirable social influence. A News-HIN integrates all heterogeneous available data in the form of a graph structure. Intuitively, the methods (AA-HGNN, HAN) making full use of the News-HIN as training data achieve better results. Through the comparison among the GNN methods, we verify that the heterogeneity of the network should be handled in a more effective way: if we simply treat a heterogeneous network as a homogeneous network by ignoring the types, the results (reported for GAT and GCN) are very disappointing. We continue to discuss performance with respect to heterogeneity in the next section.
Fig. 6. The advantage in training with less labeled data.
2) Method Performance on the Heterogeneous Graph: To answer Question 2, we further investigate the performance of the different GNN methods besides AA-HGNN and its variants. As we utilize a heterogeneous network as the source data, the heterogeneity should be handled in an effective manner. In Table II, we observe that HGAT achieves the best accuracy, recall, and F1. GAT and GCN obtain high precision but low recall; in particular, on the PolitiFact dataset, GCN reaches 0.9688 in precision but only 0.0246 in recall. This occurs because, on the News-HIN, these models prefer to classify a sample as real news. They are not powerful methods for the fake news problem because they were originally designed for homogeneous networks. As another method for heterogeneous graphs, the HGAT-based classifier also shows an advantage over HAN. As the basic classifier, the HGAT-based classifier can handle the heterogeneity of the News-HIN well.
TABLE II: Performance comparison of different methods. The training ratio is 20%.

                            PolitiFact                           BuzzFeed
      Methods      Accuracy Precision Recall  F1      Accuracy Precision Recall  F1
Text  SVM          0.5432   0.4975    0.32    0.3894  0.5398   0.6011    0.5109  0.5523
      LIWC         0.4544   0.4415    0.23    0.3023  0.6137   0.6459    0.5885  0.6175
      Text-CNN     0.5658   0.5873    0.2824  0.3814  0.6317   0.6415    0.6233  0.6322
NE    Label Prop.  0.5796   0.7005    0.1164  0.1996  0.5867   0.6409    0.223   0.3309
      DeepWalk     0.5297   0.4639    0.2881  0.4639  0.3721   0.3083    0.4322  0.3599
      LINE         0.5012   0.4109    0.1215  0.4109  0.5899   0.6123    0.3057  0.4077
GNNs  GAT          0.5765   0.7569    0.0453  0.0854  0.5885   0.654     0.3367  0.4445
      GCN          0.5611   —         —       —       —        —         —       —
TABLE III: Adversarial Active Learning Performance of AA-HGNN on PolitiFact.

           Number of training nodes
Metrics    1000    1200    1400    1600    1800    2000    2200    2400    2600    2800
Accuracy   0.5658  0.5878  0.6049  0.6053  0.6013  0.5984  0.597   0.597   0.5955  0.6155
Precision  0.5142  0.5246  0.5218  0.5245  0.5135  0.5115  0.516   0.5136  0.5342  0.5661
Recall     0.3241  0.4526  0.4869  0.5065  0.5277  0.5441  0.5539  0.5523  0.5688  0.5804
F1         0.3975  0.4859  0.5038  0.5154  0.5205  0.5273  0.5342  0.5323  0.5456  0.5732
Fig. 7. Performance analysis of the query strategy on PolitiFact: (a) F1, (b) Recall, (c) Precision, and (d) Accuracy versus the query budget for AA-HGNN, AA-HGNN_entropy, and AA-HGNN_random.
3) Active Learning Setting on Scarce Training Data: To answer Question 3, Figure 6 compares the performance of the HGAT-based classifier and AA-HGNN. The F1 score of the classifier shown in Figure 6 is achieved with 2800 training nodes; in comparison, AA-HGNN can outperform the classifier when trained with only 1200 labeled nodes. Moreover, the score of AA-HGNN with the active learning setting increases significantly: when the number of training nodes is 2800, the performance of AA-HGNN improves by nearly 9% over the model without the active learning setting. From Table II, we observe that AA-HGNN has an apparent advantage at a 20% training ratio, while the other methods cannot perform well due to the paucity of training data. Table III further shows that AA-HGNN reaches satisfactory results even when the training data is more scarce.
4) Adversarial Learning Impacts on Active Learning: In order to answer Question 4, we build the two variants AA-HGNN_entropy and AA-HGNN_random to demonstrate the effect of the adversarial learning setting. These two variants provide different query strategies for active learning. Based on the comparative results in Figure 7, it is obvious that AA-HGNN outperforms AA-HGNN_entropy and AA-HGNN_random in every query batch. The adversarial learning between the classifier and the selector indeed provides an effective query strategy for active learning. The queried candidates are of high value to the classifier, so the classifier's performance can be significantly improved. Moreover, the adversarial learning-based query strategy can consistently provide high-value candidates, as the performance of the selector also improves during adversarial learning.

VI. CONCLUSION
In this paper, we study the HIN-based fake news detection problem and propose a novel adversarial active learning-based graph neural network, AA-HGNN, to solve it. AA-HGNN employs a novel hierarchical attention mechanism to deal with the heterogeneity of News-HINs, learning textual and structural information simultaneously. An active learning framework is applied in AA-HGNN to enhance learning performance, especially when facing a paucity of labeled data. A selector is trained in an adversarial manner to query high-value candidates for the active learning setting. Experiments with real-world fake news data show that our model can outperform text-based models and other graph-based models while using less labeled data. The experiments also verify the effectiveness of the adversarial learning-based query strategy, which consistently queries high-value candidates to improve performance. As an adversarial active learning-based model, AA-HGNN is ideal for detecting fake news in the early stages when training data is lacking. Finally, due to its good generalizability, AA-HGNN can be widely used in other node classification-related applications on heterogeneous graphs, where there are no obstacles to the transfer.

VII. ACKNOWLEDGEMENT
This work is partially supported by NSF through grant IIS-1763365 and by FSU.

REFERENCES

[1] Charu C. Aggarwal, Xiangnan Kong, Quanquan Gu, Jiawei Han, and Philip S. Yu. Active learning: A survey. In Data Classification, pages 599-634. Chapman and Hall/CRC, 2014.
[2] H. Allcott and M. Gentzkow. Social media and fake news in the 2016 election. Journal of Economic Perspectives, 2017.
[3] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. DeepWalk: Online learning of social representations. In KDD, 2014.
[4] Marco T. Bastos and Dan Mercea. The Brexit botnet and user-generated hyperpartisan news. Social Science Computer Review, 37(1):38-54, 2019.
[5] Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203, 2013.
[6] Giovanni Luca Ciampaglia, Prashant Shiralkar, Luis M. Rocha, Johan Bollen, Filippo Menczer, and Alessandro Flammini. Computational fact checking from knowledge networks. PLoS ONE, 2015.
[7] Limeng Cui, Kai Shu, Suhang Wang, Dongwon Lee, and Huan Liu. dEFEND: A system for explainable fake news detection. In CIKM, pages 2961-2964. ACM, 2019.
[8] Quanyu Dai, Qiang Li, Jian Tang, and Dan Wang. Adversarial network embedding. In AAAI, 2018.
[9] Sanjoy Dasgupta, Adam Tauman Kalai, and Claire Monteleoni. Analysis of perceptron-based active learning. In International Conference on Computational Learning Theory, pages 249-263. Springer, 2005.
[10] Xin Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Murphy, Thomas Strohmann, Shaohua Sun, and Wei Zhang. Knowledge Vault: A web-scale approach to probabilistic knowledge fusion. In KDD, 2014.
[11] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In NIPS, pages 2672-2680, 2014.
[12] Marco Gori, Gabriele Monfardini, and Franco Scarselli. A new model for learning in graph domains. In IJCNN, 2005.
[13] Han Guo, Juan Cao, Yazi Zhang, Junbo Guo, and Jintao Li. Rumor detection with hierarchical social attention network. In CIKM, 2018.
[14] Aditi Gupta, Hemank Lamba, Ponnurangam Kumaraguru, and Anupam Joshi. Faking Sandy: characterizing and identifying fake images on Twitter during Hurricane Sandy. In WWW, pages 729-736. ACM, 2013.
[15] Shashank Gupta, Raghuveer Thirukovalluru, Manjira Sinha, and Sandya Mannarswamy. CIMTDetect: A community infused matrix-tensor coupled factorization based method for fake news detection. arXiv preprint arXiv:1809.05252, 2018.
[16] William L. Hamilton, Rex Ying, and Jure Leskovec. Inductive representation learning on large graphs. In NIPS, 2017.
[17] Binbin Hu, Yuan Fang, and Chuan Shi. Adversarial learning on heterogeneous information networks. In KDD, 2019.
[18] Zhiwei Jin, Juan Cao, Yongdong Zhang, and Jiebo Luo. News verification by exploiting conflicting social viewpoints in microblogs. In AAAI, 2016.
[19] Yoon Kim. Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882, 2014.
[20] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In ICLR, 2017.
[21] Yan Li and Jieping Ye. Learning adversarial networks for semi-supervised text classification via policy gradient. In KDD, pages 1715-1723. ACM, 2018.
[22] Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. Gated graph sequence neural networks. In ICLR, 2016.
[23] Marcelo Mendoza, Barbara Poblete, and Carlos Castillo. Twitter under crisis: Can we trust what we RT? In Proceedings of the First Workshop on Social Media Analytics, pages 71-79. ACM, 2010.
[24] J. Pennebaker, R. Boyd, K. Jordan, and K. Blackburn. The development and psychometric properties of LIWC. Technical Report, 2015.
[25] Veronica Perez-Rosas, Bennett Kleinberg, Alexandra Lefevre, and Rada Mihalcea. Automatic detection of fake news. arXiv preprint arXiv:1708.07104, 2017.
[26] Martin Potthast, Johannes Kiesel, Kevin Reinartz, Janek Bevendorff, and Benno Stein. A stylometric inquiry into hyperpartisan and fake news. arXiv preprint arXiv:1702.05638, 2017.
[27] Feng Qian, Chengyue Gong, Karishma Sharma, and Yan Liu. Neural user response generator: Fake news detection with collective user intelligence. In IJCAI, pages 3834-3840, 2018.
[28] K. Rapoza. Can 'fake news' impact the stock market? Forbes News, 2017.
[29] Yuxiang Ren, Charu Aggarwal, and Jiawei Zhang. ActiveIter: Meta diagram based active learning in social networks alignment. IEEE Transactions on Knowledge and Data Engineering, 2019.
[30] Yuxiang Ren, Charu C. Aggarwal, and Jiawei Zhang. Meta diagram based active social networks alignment. In ICDE, pages 1690-1693. IEEE, 2019.
[31] Yuxiang Ren, Bo Liu, Chao Huang, Peng Dai, Liefeng Bo, and Jiawei Zhang. Heterogeneous deep graph infomax. arXiv preprint arXiv:1911.08538, 2019.
[32] Yuxiang Ren and Jiawei Zhang. HGAT: Hierarchical graph attention network for fake news detection. arXiv preprint arXiv:2002.04397, 2020.
[33] Neil Rubens, Mehdi Elahi, Masashi Sugiyama, and Dain Kaplan. Active learning in recommender systems. In Recommender Systems Handbook, pages 809-846. Springer, 2015.
[34] Victoria L. Rubin and Tatiana Lukoianova. Truth and deception at the rhetorical structure level. Journal of the Association for Information Science and Technology, 2015.
[35] Natali Ruchansky, Sungyong Seo, and Yan Liu. CSI: A hybrid deep model for fake news detection. In CIKM, 2017.
[36] Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. The graph neural network model. IEEE Transactions on Neural Networks, 20(1), 2009.
[37] Francesc Serratosa and Xavier Cortés. Interactive graph-matching using active query strategies. Pattern Recognition, 48(4):1364-1373, 2015.
[38] Baoxu Shi and Tim Weninger. Discriminative predicate path mining for fact checking in knowledge graphs. Knowledge-Based Systems, 2014.
[39] C. Shi, Y. Li, J. Zhang, Y. Sun, and P. Yu. A survey of heterogeneous information network analysis. TKDE, 2017.
[40] Kai Shu, Suhang Wang, and Huan Liu. Beyond news contents: The role of social context for fake news detection. In WSDM, 2019.
[41] Kai Shu, Suhang Wang, and Huan Liu. Beyond news contents: The role of social context for fake news detection. In WSDM, pages 312-320, 2019.
[42] Y. Sun and J. Han. Mining heterogeneous information networks: a structural analysis approach. SIGKDD Explorations, 2012.
[43] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. LINE: Large-scale information network embedding. In WWW, 2015.
[44] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio. Graph attention networks. In ICLR, 2018.
[45] Soroush Vosoughi, Deb Roy, and Sinan Aral. The spread of true and false news online. Science, 359(6380):1146-1151, 2018.
[46] Jun Wang, Lantao Yu, Weinan Zhang, Yu Gong, Yinghui Xu, Benyou Wang, Peng Zhang, and Dell Zhang. IRGAN: A minimax game for unifying generative and discriminative information retrieval models. In SIGIR, pages 515-524. ACM, 2017.
[47] Keze Wang, Dongyu Zhang, Ya Li, Ruimao Zhang, and Liang Lin. Cost-effective active learning for deep image classification. IEEE Transactions on Circuits and Systems for Video Technology, 27(12):2591-2600, 2016.
[48] Xiao Wang, Houye Ji, Chuan Shi, Bai Wang, Peng Cui, P. Yu, and Yanfang Ye. Heterogeneous graph attention network. In WWW, 2019.
[49] K. Xu, W. Hu, J. Leskovec, and S. Jegelka. How powerful are graph neural networks? In ICLR, 2019.
[50] Z. Yang, D. Yang, C. Dyer, X. He, A. Smola, and E. Hovy. Hierarchical attention networks for document classification. In NAACL, 2016.
[51] Jiawei Zhang, Limeng Cui, Yanjie Fu, and Fisher B. Gouza. Fake news detection with deep diffusive network model. arXiv preprint arXiv:1805.08751, 2018.
[52] Xinyi Zhou and Reza Zafarani. Fake news: A survey of research, detection methods, and opportunities. arXiv preprint arXiv:1812.00315, 2018.
[53] Xiaojin Zhu and Zoubin Ghahramani. Learning from labeled and unlabeled data with label propagation. Technical report, 2002.