A Model to Measure the Spread Power of Rumors
Zoleikha Jahanbakhsh-Nagadeh, Mohammad-Reza Feizi-Derakhshi, Majid Ramezani, Taymaz Rahkar-Farshi, Meysam Asgari-Chenaghlu, Narjes Nikzad-Khasmakhi, Ali-Reza Feizi-Derakhshi, Mehrdad Ranjbar-Khadivi, Elnaz Zafarani-Moattar, Mohammad-Ali Balafar
Zoleikha Jahanbakhsh-Nagadeh a,b, Mohammad-Reza Feizi-Derakhshi a,∗, Majid Ramezani a, Taymaz Rahkar-Farshi a,c, Meysam Asgari-Chenaghlu a, Narjes Nikzad-Khasmakhi a, Ali-Reza Feizi-Derakhshi a, Mehrdad Ranjbar-Khadivi a,d, Elnaz Zafarani-Moattar a,e, Mohammad-Ali Balafar f
a Computerized Intelligence Systems Laboratory, Department of Computer Engineering, University of Tabriz, Tabriz, Iran.
b Department of Computer Engineering, Naghadeh Branch, Islamic Azad University, Naghadeh, Iran.
c Department of Software Engineering, Altınbaş University, Istanbul, Turkey.
d Department of Computer Engineering, Shabestar Branch, Islamic Azad University, Shabestar, Iran.
e Department of Computer Engineering, Tabriz Branch, Islamic Azad University, Tabriz, Iran.
f Department of Computer Engineering, University of Tabriz, Iran.
Abstract
With technologies that have democratized the production and reproduction of information, a significant portion of the posts exchanged daily on social media has been infected by rumors. Despite the extensive research on rumor detection and verification, the problem of calculating the spread power of rumors has so far not been considered. To address this research gap, the present study seeks a model to calculate the Spread Power of Rumor (SPR) as a function of content-based features in two categories: False Rumor (FR) and True Rumor (TR). For this purpose, the theory of Allport and Postman is adopted, which claims that importance and ambiguity are the key variables in rumor-mongering and the power of rumor. In total, 42 content features in two categories, "importance" (28 features) and "ambiguity" (14 features), are introduced to compute SPR. The proposed model is evaluated on two datasets, Twitter and Telegram. The results showed that (i) the spread power of FR documents is rarely more than that of TRs; (ii) there is a significant difference between the SPR means of the two groups, FR and TR; and (iii) SPR as a criterion can have a positive impact on distinguishing FR and TR.
Keywords:
Spread Power of Rumor (SPR), Ambiguity of rumor, Importance of rumor, Automatic rumor verification
1. Introduction
“You can get a face mask exemption card so you don’t need to wear a mask”.
“A vaccine to cure COVID-19 is available”.
“The new coronavirus was deliberately created or released by people”.
Nowadays, many people exchange messages through messengers and social media without knowing whether they are factually accurate. A significant portion of these messages is fake news or rumors, which have an undeniable impact on different aspects of our lives. DiFonzo and Bordia [1] define rumor as “unverified and instrumentally relevant information statements in circulation that arise in contexts of ambiguity, danger or potential threat, and that function to help people make sense and manage risk”. This unverified information may turn out to be true, or partly or entirely false; alternatively, it may remain unresolved.
∗ This is to indicate the corresponding author.
Preprint submitted to ArXiv November 19, 2020

Accordingly, we classify rumors into two categories: False Rumor (FR) and True Rumor (TR). An FR is misinformation or inaccurate information that is designed for certain goals based on special content features, while a TR is a real social fact. Users who create and spread rumors are called rumormongers.

Many rumors spread depending on social, economic, political, and cultural conditions, and they can become a cause of anxiety and frustration in society. To clarify the role of rumors in creating insecurity and disturbing the public mind, consider the rumors published during the 2016 U.S. presidential election. In that year, various rumors spread on social media (Twitter and Facebook) during the election: among all the 1,723 checked rumors from the popular rumor debunking website Snopes.com, 303 were about Donald Trump and 226 were about Hillary Clinton. These rumors could potentially have had negative impacts on their campaigns [2].

Rumor is a major problem because it can have high destructive power in society. Regardless of its validity, information spreads faster than ever, which brings unprecedented challenges in ensuring the reliability of information [3]. The issue is important enough to have drawn the attention of the largest IT companies, such as Twitter, Facebook, and Google, which try to validate messages during their publication and display the validation result to the user. It is therefore very important to identify a rumor in the early hours of its release and to prevent its harmful consequences. So far, the problem of calculating the spread power of a document has not been addressed by researchers. Despite notable works that explore information diffusion, and rumor propagation in particular, on social media during crisis events, very few studies have looked at the speed at which rumors spread.

In this study, the rumor problem is approached from a different angle than in other research. We hypothesize that the first influential factor in spreading a rumor is its textual content. In other words, it is the power of words that enables the text to influence the audience. In the initial moments of spreading a rumor, there is not yet enough information about the users and the propagation structure of the rumor; only its content can be explored. We therefore propose a model to calculate the spread power of a rumor and, for the first time, name it the Spread Power of Rumor (SPR). This model is based on the content features of the message document. The next hypothesis is that the SPR criterion can distinguish FR from TR. To test this hypothesis, we demonstrate the importance of the SPR criterion in verifying rumors through the problem of classifying rumors into two classes, FR and TR. To the best of our knowledge, there is no work in the area of SPR except Allport and Postman's theory [4] about the power of rumors. Thus, the contributions of this research are as follows:

• Computing SPR for the first time by exploring and analyzing content-based features. Determining the spread power of information available on online media is a new task in the field of rumor analysis. In this research, we compute SPR by focusing on the textual content of the source post. So far, no research has been conducted on the problem of calculating SPR, and this research is the first work to address it.
• Application of SPR in the rumor detection problem. The experimental results on Persian rumors indicate that the spread power of FRs is greater than that of TRs. Therefore, SPR is introduced as a distinctive feature between FRs and TRs.

The rest of the paper is organized as follows: Section 2 presents the baseline theory, and Section 3 presents the problem definition and objectives. Section 4 reviews related works. Section 5 describes the proposed SPR measurement model and investigates a set of effective features to compute SPR. Section 6 describes the experiments and evaluations, and Section 7 concludes the paper.
2. Baseline Theory
Every social phenomenon needs a special set of conditions for its emergence and reliability. Rumor is also a social phenomenon that requires conditions for publication and acceptance. The question, then, is which conditions and factors are needed for a rumor to spread. To answer this question, Allport and Postman [4] outlined two fundamental conditions. First, the subject of the rumor should be important to the audience. If the subject is interesting to the audience, rumors about that subject may interest them too, but this condition alone is not enough. The second condition is the existence of ambiguity in how the issue is expressed. Rumors are more infectious when little information is released through authoritative channels and uncertainty arises in society. Accordingly, Allport and Postman [4] defined the law of the power of rumor as the product of importance and ambiguity. So far, this law has been presented only as a theory and has not been studied in practice on rumors.
$$\text{Power} \approx \text{Importance} \times \text{Ambiguity} \tag{1}$$

In formula 1, the relation between "Importance" and "Ambiguity" is a product rather than a sum, because if either ambiguity or importance is zero there will be no rumor. According to this theory, whenever the importance of a rumor is high, its influence rate rises accordingly. Likewise, as ambiguity about the case increases, the penetration rate of the rumor rises. If one of these factors is zero, the influence rate of the rumor will be zero [4]. In other words, it is unlikely that an individual will attempt to spread a rumor that does not matter to him, even if it is ambiguous. Nor is the importance of the subject alone enough to spread the rumor, because importance must be accompanied by the ambiguity that the rumor addresses. For example, a rumor about which presidential candidate was chosen, circulated after the election results have been announced, concerns an important issue but will not spread because it lacks ambiguity.

As another example, a rumor about raising or lowering the percentage of banks' profits does not matter to anyone who has no money in the banks, and he will not pursue it. On the other hand, such rumors are less effective on bank employees and officials, who know the exact profit levels in the banks and therefore face no ambiguity; for ordinary people it is different.

This theory is based on two assumptions [4]: (i) people exert effort to find meaning in things and events; (ii) when people are faced with ambiguity in any important matter, they try to find some meaning by retelling related rumors. This means that the importance and ambiguity of rumors are vital variables that predict whether a rumor will be transmitted or not [4].
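The difference between a product and a sum in formula 1 can be illustrated with a minimal sketch. It assumes, purely for illustration, that importance and ambiguity are already available as scores in [0, 1]; the function names `rumor_power` and `additive_score` are hypothetical and not part of the original theory.

```python
def rumor_power(importance: float, ambiguity: float) -> float:
    """Formula 1 read as an equality: power is the product of the two factors."""
    return importance * ambiguity


def additive_score(importance: float, ambiguity: float) -> float:
    """Counter-example: a sum stays positive even when one factor is zero."""
    return importance + ambiguity


# An important but unambiguous case (e.g. an election result that has
# already been announced): the product vanishes, the sum does not.
print(rumor_power(0.9, 0.0))     # 0.0
print(additive_score(0.9, 0.0))  # 0.9
```

With the product, a rumor that is important but not ambiguous (or ambiguous but not important) gets zero power, which matches the election and bank-profit examples above; a sum would not have this property.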
3. Problem Statement
The problem of rumor detection is usually treated as a classification problem, either binary (true or false) or multi-class (true, false, or unverified) [5]. Classification of text documents involves assigning a text document to one of a set of pre-defined classes. Let $D = \{d_1, d_2, \ldots, d_n\}$ be the set of $n$ documents, each belonging to one of two classes, FR and TR. Each $d \in D$ is a document that contains a sequence of $m$ sentences (i.e., $d = S_1, S_2, \ldots, S_m$), where $S_i(d)$ is the $i$-th sentence of document $d$. Each sentence $S$ is a sequence of $k$ tokens (i.e., $S = t_1, t_2, \ldots, t_k$), including terms, punctuation, numbers, and symbols. Since every message document $d$ is either TR or FR, it follows that:

$$P(FR \mid d) = 1 - P(TR \mid d) \tag{2}$$

Each rumor has a degree of spread power. Based on our hypothesis about the difference in spread power between FRs and TRs, it can be argued that the SPR criterion has a decisive role in determining $P(FR \mid d)$, so that $P(FR \mid d)$ is proportional to SPR (equation 3):

$$P(FR \mid d) \propto \mathrm{SPR} \tag{3}$$

Given this assumption and the theory of Allport and Postman [4], the first question that arises is:

• Q1: How can the spread power of a document be calculated?

According to Allport and Postman's theory, SPR is approximately equal to the product of the importance and ambiguity (formula 1) surrounding the rumor:

$$\mathrm{SPR}(d) = \mathrm{Imp}(d) \times \mathrm{Amb}(d) \tag{4}$$

We used formula 1 as a principle and proposed equation 4 to compute SPR based on the content information of rumors. The present research therefore seeks to represent and compute the two factors "Importance" (Imp) and "Ambiguity" (Amb) of a rumor based on its content information. This requires addressing the problem in more detail, which raises two further questions:

• Q2: What content features show the importance of the rumor?
• Q3: What content features show the ambiguity of the rumor?

This study analyzes the content features of rumors to answer these questions. We show that analyzing content-based features using Natural Language Processing (NLP) methods can provide useful information, because the power of the words used in the document should not be underestimated.

In the following, we assess SPR as a function of content-based features for FRs and TRs, and investigate the role of SPR in verifying rumors. Q1 to Q3 are therefore followed by another question:

• Q4: Is there a difference between the spread power of False Rumors and True Rumors?

To answer this question, it is necessary to compute the spread power for a large set of TRs and FRs separately and then perform evaluations such as a t-test on them to determine whether the SPR criterion is effective in distinguishing FRs from TRs. This study seeks to address these questions and aims at investigating the effect of SPR on verifying rumors.
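As a hedged illustration of the kind of test Q4 calls for, the sketch below computes a two-sample t statistic for the SPR values of two groups. It uses Welch's unequal-variance variant, which is an assumption on our part (the text only says "t-test"), and the SPR values are made-up toy numbers, not results from the paper.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Toy SPR scores (hypothetical, for illustration only).
spr_fr = [0.42, 0.51, 0.47, 0.55, 0.49]
spr_tr = [0.30, 0.36, 0.33, 0.41, 0.35]

t_stat, dof = welch_t(spr_fr, spr_tr)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```

The statistic would then be compared against a t distribution with `dof` degrees of freedom to obtain a p-value; that step is omitted here because it needs a statistics library or a table.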
4. Related works
The rumor phenomenon is studied by psychologists and researchers in various research areas such as psychology, modeling, detecting, verifying, and preventing/controlling rumors. In the area of psychology, as mentioned, Allport and Postman [4] stated that rumors are the product of two factors, "importance" and "ambiguity", which together determine the power of rumors. Harsin [6] presented the idea of the "rumor bomb", which extends the definition of rumor into a political communication concept with the following features: (i) a crisis of verification; (ii) a context of public uncertainty or anxiety about a political group, figure, or cause, which the rumor bomb overcomes or transfers onto an opponent; (iii) a partisan who seeks to profit politically from the rumor bomb's diffusion; (iv) rapid diffusion via social media. In another study, Kumar and Geethakumari [7] explored the use of theories in cognitive psychology and proposed an algorithm that would use social media as a filter to separate misinformation from accurate information. The cognitive process involved in the decision to spread information involves answering four main questions, namely the consistency of the message, the coherency of the message, the credibility of the source, and the general acceptability of the message. They proposed an algorithm that uses the collaborative filtering property of social networks to measure the credibility of sources of information as well as the quality of news items.

To solve the rumor detection problem automatically, various studies have presented different methods for modeling, detecting, verifying, and preventing rumors by analyzing features at three levels (user information, content, and propagation network structure) in many languages [8]. Some of them use classical learning methods such as Naive Bayes (NB), Support Vector Machines (SVM), Decision Tree (DT), Random Forest (RF), and Logistic Regression (LR), and others are based on deep learning methods. Most classical machine learning models follow the popular two-step procedure: in the first step, hand-crafted features are extracted from the documents (or any other textual unit), and in the second step those features are fed to a classifier to make a prediction [9]. Since this study uses content features extracted by feature engineering to calculate the spread power of a message document, we discuss the machine learning-based classification systems and skip other methods, such as those based on deep neural networks. Table 1 presents a brief literature review of rumor veracity and detection classification using machine learning methods.

In classical machine learning approaches, researchers analyzed various features at three levels: user information, content, and propagation network structure [8]. For example, Castillo et al. [10] and Kwon et al. [11] proposed a combination of linguistic and structure-based features that can be used to approximate the credibility of information on Twitter. Castillo et al. [10] studied the propagation of rumors during real-world emergencies, while Kwon et al. [11] studied the propagation of urban legends (such as Bigfoot) on Twitter. Yang et al. [12] did work similar to Castillo's on Sina Weibo (a leading Chinese micro-blogging service that functions like a Facebook-Twitter hybrid). Kwon et al. [11] and Yang et al. [12] show that the most significant features for rumor detection are emoticons, opinion words, and sentiment scores (positive or negative). Qazvinian et al. [13] explored content-based features, network-based features, and microblog-specific memes to address the problem of rumor detection. Wu et al. [14] used all previous effective features, plus two new semantic features: a topic model feature and a search engine feature. Vosoughi [15] identified salient features of rumors in different periods by analyzing three aspects of information spread (linguistic, user, and network propagation dynamics) using Dynamic Time Warping (DTW) and Hidden Markov Models (HMMs). Wang and Terano [16] identified a series of short diffusion patterns, based on stance, that appear to be strongly related to rumors. Floos [17] presented a statistical method based on computing TF-IDF for each term in the tweets to detect rumors in Arabic tweets. Zhao et al. [18] tackled early detection of rumors by determining clusters of potential rumors and extracting a series of features for each cluster. These features were two types of language patterns in rumors: the correction type and the inquiry type. Liu and Xu [19] proposed a model based on the propagation patterns of rumors and credible messages and applied it to Sina Weibo. Zubiaga et al. [20] leveraged the context preceding a tweet with a sequential classifier that learns the reporting dynamics during an event to detect rumors. Their method was based on the hypothesis that a tweet alone may not suffice to know whether its underlying story is a rumor, due to the lack of context. Kwon et al. [21] examined user, linguistic, network, and temporal features over different observation time windows. They identified significant differences between rumors and non-rumors for the first 3, 7, 14, 28, and 56 days from the initiation. Zamani et al. [22] addressed the problem of rumor detection on Persian Twitter for the first time and developed a dataset of Persian Twitter rumors. They utilized a set of structural features based on tweet and user characteristics, and also used frequent Twitter unigrams as a word vector. Zarharan et al. [23] focused on stance detection for Persian rumors and developed a dataset for it. Jahanbakhsh et al. [24] proposed a dictionary-based statistical technique to identify Persian speech acts (SA). Based on the results obtained in [24], FRs are often expressed in four SA classes: threat (SA Thrt), declaration (SA Dec), question (SA Ques), and request (SA Req). They showed the positive effect of SA on rumor detection by combining content features (in four categories: Lexical, Semantic, Syntactic, and Surface) with these four speech act classes. Kumar et al. [25] first extracted three categories of features: content-based features (Part-of-Speech, Bag of Words, term frequency), pragmatic features (emoticons, sentiment, anxiety-related words, and Named Entity), and network-specific features (user and message metadata). Then, they used particle swarm optimization to select the set of features with the highest importance for the rumor veracity classification task.

Some research has also focused on modeling the distribution of rumors in the social space [8]. For example, Zeng et al. [26] modeled the speed of information transmission to compare retransmission times across content and context features. Doerr et al. [27] simulated a natural rumor spreading process on several classical network topologies. They also performed a mathematical analysis of this process in preferential attachment graphs and proved that the rumor spreading process disseminates a piece of news in sub-logarithmic time. That is, the spread of rumors is extremely fast on social networks. Based on these results, it can be argued that the spread power of FRs is greater than that of TRs.

In existing research, the rumor problem has been discussed from various perspectives, such as rumor psychology theories, automatic detection and validation, and rumor propagation modeling. However, the law of rumor [4] (i.e., the power of rumor) has not been empirically grounded in any rumor research. This study therefore examines the basic law of rumor empirically for the first time and calculates the Spread Power of Rumor (SPR) based on content features. We also show that there is a significant difference in the spread power of FRs and TRs. Therefore, SPR can be used as a new influential factor in rumor classification.

Table 1: The list of previous machine learning techniques for rumor detection based on Content (C), User (U), and Structural (S) features.
| Ref. | Lang. | Dataset | Method | Features (C, U, S) | Conclusion |
|---|---|---|---|---|---|
| [10] | EN | Twitter | DT, NB, SVM | ✓ ✓ ✓ | DT as best classifier. |
| [13] | EN | Twitter | Present the tweet with two patterns: lexical and part-of-speech | ✓ ✓ ✓ | Identify users that spread false information in online social media using their proposed features. |
| [12] | CHI | Sina Weibo | SVM-RBF kernel | ✓ ✓ ✓ | Improve in accuracy. |
| [11] | EN | Twitter | DT, RF, SVM, LR | ✓ ✓ | RF as best classifier. |
| [14] | CHI | Sina Weibo | SVM-RBF kernel | ✓ | Improve in accuracy using network-based features. |
| [16] | EN | Twitter | Analyze patterns of diffusion with linear model | ✓ | Identify influential spreaders. |
| [15] | EN | Twitter | Verify rumors in different time periods using DTW and HMMs | ✓ ✓ ✓ | HMM as best classifier. |
| [17] | AR | Twitter | TF-IDF | ✓ | The effectiveness of content features in validating Arabic tweets. |
| [18] | EN | Twitter | Searching enquiry phrases, clustering similar posts, then ranking the clusters | ✓ | Accuracy of 0.52 for their best run using J48. |
| [19] | CHI | Sina Weibo | SVM | ✓ | Differences in the propagation patterns of rumors and credible messages. |
| [20] | EN | Twitter | A sequential classifier | ✓ ✓ | |
| [21] | EN | Twitter | RF | ✓ ✓ ✓ | Identify significant features in the first 3, 7, 14, 28 and 56 days of the initiation. |
| [22] | FA | Twitter | J48, Naive Bayes, SMO, IBK | ✓ ✓ ✓ | About 70% precision just based on structural features and about 80% based on both categories of features. |
| [28] | FA | Twitter | MLP, KNN, DT, NB, Random Tree, RF, Rules.Part, SVM, etc. | ✓ ✓ ✓ | RF and meta.RandomSubSpace as best classifiers. |
| [25] | EN | Twitter | Implement SVM, DT, KNN, NB, NN using Particle Swarm Optimization (PSO) to select optimal features | ✓ ✓ | Improve in accuracy using PSO. |

5. Procedure to compute SPR
SPR has not been examined empirically in any previous research on rumors. As mentioned in Section 2, the spread of a rumor depends on the presence of both importance and ambiguity in the rumor. In the proposed model, therefore, a set of content features in two categories is introduced to compute the importance and ambiguity of a message document, with the aim of calculating SPR. The general structure of the proposed model for computing SPR is shown in Figure 1. As Figure 1 shows, the model consists of the following steps:
1. Pre-processing: Converting the message document into a form that is analyzable for this task.
2. Feature extraction: Analyzing and extracting the features that indicate the importance and ambiguity of a rumor document.
3. Learning: Weighting content features and determining the degree of importance of each feature in predicting classes.
4. SPR calculation: Computing the Spread Power of Rumor based on the two criteria of importance and ambiguity.
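The four steps above can be sketched end to end as follows. This is a skeleton under strong assumptions, not the authors' implementation: the sentence splitter, the single lexicon-based feature per category, the function names, and in particular the weighted-mean aggregation of features into Imp and Amb are all hypothetical placeholders. Only the final product follows equation 4.

```python
import re


def preprocess(text):
    """Step 1 (placeholder): split into sentences, then whitespace tokens."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    return [s.split() for s in sentences]


def extract_features(sentences, imp_lexicon, amb_lexicon):
    """Step 2 (placeholder): one sentence-ratio feature per category."""
    n = len(sentences) or 1

    def ratio(lexicon):
        return sum(any(t.lower() in lexicon for t in s) for s in sentences) / n

    return {"imp": [ratio(imp_lexicon)], "amb": [ratio(amb_lexicon)]}


def weighted_mean(values, weights):
    """Step 3 would learn the weights; here they are simply passed in (assumption)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def spr(text, imp_lexicon, amb_lexicon, imp_w=(1.0,), amb_w=(1.0,)):
    """Step 4: SPR(d) = Imp(d) x Amb(d), as in equation 4."""
    feats = extract_features(preprocess(text), imp_lexicon, amb_lexicon)
    return weighted_mean(feats["imp"], imp_w) * weighted_mean(feats["amb"], amb_w)


# Toy English example with hypothetical one-word lexicons.
doc = "Warning everyone. Maybe the shops close tonight. Nothing else"
print(spr(doc, imp_lexicon={"warning"}, amb_lexicon={"maybe"}))
```

In this toy document one of three sentences hits each lexicon, so both factors are 1/3 and their product is 1/9. A real pipeline would replace each placeholder with the Persian pre-processing, the full feature set, and the learned weights.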
The proposed model performs the SPR calculation on Persian rumors from the Twitter and Telegram datasets. These online texts usually contain a lot of noise and uninformative parts (such as symbols and special characters). Therefore, five pre-processing steps (tokenization, normalization, Part Of Speech (POS) tagging, stemming, and lemmatization) are performed on the message documents to bring them into a form that is usable and analyzable for our task. Pre-processing in the Persian language has its own complexities and challenges, which are addressed by normalization. Some of these challenges include:
1. Multiform words: some words may be written in several different forms. For example, "مساله", "مسئله", and "مسأله" are all forms of writing the word 'problem'. This is solved by a spell checker that normalizes the text into a standard form and unifies those words.
2. Different spacing: some words may be written with a space, a short space, or no space; for example, "میگفت", "می گفت", and "می‌گفت" are all forms of writing the word 'was saying'. This is solved by adding short spaces between the different parts of a word.
3. Letters with two Unicodes: some letters have two Unicode code points, one for Persian and one for Arabic, such as "ی", "ي" (i) and "و", "ؤ" (v). This is solved by replacing Arabic letters with their Persian equivalents.
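Challenges 2 and 3 above can be illustrated with a small normalization sketch. It is a simplified assumption of what such a step might look like, not the authors' normalizer: the character map covers only two well-known Arabic/Persian pairs (the Kaf pair is added here as a common example and is not mentioned in the text), and the spacing rule handles only the detached verbal prefix "می" written with a regular space. Code points are written as escapes to keep the right-to-left text unambiguous.

```python
import re

# Arabic code points mapped to their Persian counterparts (partial, illustrative).
ARABIC_TO_PERSIAN = str.maketrans({
    "\u064a": "\u06cc",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06a9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
})

ZWNJ = "\u200c"             # zero-width non-joiner, the Persian "short space"
MI_PREFIX = "\u0645\u06cc"  # the verbal prefix "mi"

# A standalone "mi" followed by whitespace and another word.
DETACHED_MI = re.compile(r"(?<!\S)" + MI_PREFIX + r"\s+(?=\S)")


def normalize(text: str) -> str:
    """Unify Arabic/Persian letter variants, then attach a detached 'mi' prefix."""
    text = text.translate(ARABIC_TO_PERSIAN)
    return DETACHED_MI.sub(MI_PREFIX + ZWNJ, text)


# "mi goft" typed with an Arabic Yeh and a regular space.
raw = "\u0645\u064a \u06af\u0641\u062a"
print(normalize(raw) == "\u0645\u06cc\u200c\u06af\u0641\u062a")
```

The fully fused spelling (no space at all) is not handled by this rule; splitting it correctly needs a lexicon or morphological analysis, which is outside the scope of this sketch.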
The role of an FR is not out of two modes; either it is expressed based on imagination, lies, and slander,or it is published an event that its acceptance depends on the state of the audience’s public opinions andits publication time. Accordingly, rumormongers use the power of words in expressing FRs to captivate theaudience and gain their trust. Thereby, FRs make a sense similar to the truth for the audience so that theaudience accepts it and propagates it, even if its validity is doubtful. Therefore FRs are quickly acceptedand propagated by audiences without any review of its accuracy. Accordingly, we focused on the content ofrumor document as an informative source and extracted the content features that increase two influentialfactors in the spread of rumors, i.e., importance and ambiguity.We introduced 42 features to compute SPR: 28 features to determine the importance of the messagedocument and 14 features to compute the ambiguity of the document. Table 2, 4, 3 show these featuresalong with a brief description of each. 7 igure 1: Proposed structure to compute spread power of rumor. able 2: A summary of Emotional features along with a brief description of each (The new features are marked with a ”*”). Abbr. Feature Description Emotional features
ETag Emotiveness[29] The ratio of adjectives plus adverbs to nouns plus verbs.Fr Fear* The ratio of the number of sentences containing fear-based words to the totalnumber of sentences in the document.Su Surprise* The ratio of the number of sentences containing surprise-based words to thetotal number of sentences in the document.Dsg Disgust* The ratio of the number of sentences containing disgust-based words to thetotal number of sentences in the document.Sad Sadness* The ratio of the number of sentences containing sadness-based words to thetotal number of sentences in the text.An Anger* The ratio of the number of sentences containing anger-based words to the totalnumber of sentences in the document.Aff Affective* The ratio of the number of sentences containing affective-based words (Wordsthat cause emotion or feeling, such as, (cid:34) (cid:80)(cid:65)(cid:162) (cid:9)(cid:107)(cid:64)(cid:34) ”/”ekhtar”/” Warning”)to thetotal number of sentences in the document.MV MotionVerbs* The ratio of the number of sentences containing motion verbs (such as, jump,dilatory, rotation and so on.) to the total number of sentences in the document.CW ConsecutiveWords* The ratio of the number of sentences containing consecutive repeated words(such as, (cid:34) (cid:233)(cid:107)(cid:46) (cid:241)(cid:16)(cid:75) (cid:233)(cid:107)(cid:46) (cid:241)(cid:16)(cid:75)(cid:34) / ”tavajjoh tavajjoh”/”Attention Attention” and soon.) to the total number of sentences in the document.CC ConsecutiveChars* The ratio of the number of sentences containing consecutive repeated charac-ters in a word (such as, (cid:21)(cid:208)(cid:64)(cid:64)(cid:64)(cid:64)(cid:64)(cid:64)(cid:67)(cid:131)(cid:22)(cid:64) / ”salˆaˆaˆaˆaˆaˆaˆam”/”helllllloooo”) to the totalnumber of sentences in the document.PS Positive Sen-timent [10] The ratio the number of positive words in the document to the sum of positiveand negative words. 
If the number of positive and negative words is zero, thenPS is zero.NS NegativeSentiment[10] The ratio of the number of negative words in the document to the sum ofpositive and negative words. If the number of positive and negative words iszero, then NS is zero.SA Thrt SpeechAct Threat[24] The SA Thrt of a document is determined by SA classifier provided by Jahan-bakhsh et al. [24]. By this SA, we can promise for hurting somebody or doingsomething if hearer does not do what we want.SA Req SpeechAct Request[24] The SA Req (Ie, politely asks from somebody to do or stop doing something)of a document is determined by the SA classifier [24].Adj Sup SuperlativeAdjective* The ratio of the number of sentences containing Adj Sup (Simple Adjective+ Suffixes (cid:34) (cid:9)(cid:225)(cid:75)(cid:10)(cid:81)(cid:16)(cid:75)(cid:34) /tarin/ Number + ( (cid:34) (cid:248)(cid:10) (cid:20)(cid:64)(cid:34) (cid:43) (cid:34) (cid:9)(cid:225)(cid:30)(cid:10)(cid:211)(cid:34) /[o]min/) to the totalnumber of sentences in the document.Adj Cmp ComparativeAdjective* The ratio of the number of sentences containing Adj Cmp (Simple Adjective +Suffixes + (cid:34) (cid:81)(cid:16)(cid:75)(cid:34) (cid:89)(cid:9)(cid:75)(cid:241)(cid:130)(cid:29)(cid:18) /tar/)to the total number of sentences in the document.Strt Start sen-tence* It analyzes whether the first sentence of the document contains emotion-basedwords. This feature have a boolean value for each document.End End sen-tence* It analyzes whether the last sentence of the document contains emotion-basedwords and words associated with the request. This feature have a boolean valuefor each document. 9 able 3: A summary of Newsworthy features along with a brief description of each (The new features are marked with a ”*”).
Abbr. Feature Description Newsworthy features
RT RelativeTime* The ratio of the number of sentences containing RT-based words (such as, (cid:34) (cid:73)(cid:46) (cid:17)(cid:130)(cid:211)(cid:64)(cid:34) / ”emˇsab” /) to the total number of sentences in the document.SI StatisticalInformation[10, 29] The ratio of the number of sentences containing SI-based words (such as, nu-meral characteres and (cid:34) (cid:250) (cid:9)(cid:107)(cid:81)(cid:75)(cid:46)(cid:34) / ”barkhi”) to the total number of sentences inthe document.NE Named En-tity [30] The ratio of the number of sentences containing NE (In three classes, theperson’s name, the organization, the location) to the total number of sentencesin the document.LD LexicalDiversity[29] The ratio of vocabulary to the total number of terms in the document [29].Cer Certainty[29] The ratio of certainty-based words to the sum of certainty and uncertainty-based words in the document. If certainty and uncertainty word are zero thencertainty score is zero.SA Dec SA Declarative[24] The SA Dec (Ie, Transfer information to hearer) of a document is determinedby the SA classifier [24].SA Quot SA Quotations[24] The SA Quot (Ie, speech acts that another person said or wrote before) of adocument is determined by the SA classifier [24].Adj Ord Ordinal Ad-jective* The ratio of the number of sentences containing Adj Ord (Number + ( (cid:34) (cid:208)(cid:64)(cid:34) , (cid:34) (cid:208)(cid:34) ) /om/) to the total number of sentences in the document.SM Spelling Mis-take [29] The ratio of misspelled words based on typographical errors to total numberof words in the document.10 able 4: A summary of Ambiguity features along with a brief description of each (The new features are marked with a ”*”). Abbr. Feature Description
Ucer | Uncertainty [29] | The ratio of uncertainty-based words to the sum of certainty-based and uncertainty-based words in the document. If the number of certainty and uncertainty words is zero, then the uncertainty score is zero.
SV | Sensory Verb [30] | The ratio of the number of sentences containing an SV (such as /šenidan/ "hear" and /didan/ "see") to the total number of sentences in the document.
QW | Question Word [10] | The ratio of the number of sentences containing a QW (such as what, when, where, and who) to the total number of sentences in the document.
QM | Question Mark [10] | The ratio of the number of sentences containing the question mark "?" or multiple question marks "?????" to the total number of sentences of the document.
EM | Exclamation Mark [10] | The ratio of the number of sentences containing the exclamation mark to the total number of sentences of the document.
SA Ques | Speech Act Question [24] | Determined by the SA classifier [24]. This SA covers usual questions asked for information or confirmation.
Pro | Pronoun [10] | The ratio of the number of sentences containing a pronoun (a personal pronoun in 1st, 2nd, or 3rd person) to the total number of sentences of the document.
Tntv | Tentative [30] | The ratio of the number of sentences containing a tentative adjective (one that describes something uncertain and unsure) to the total number of sentences of the document.
Neg | Negation [21] | The ratio of the number of sentences containing negation (units of language including words (e.g., not, no, never, incredible) and affixes (e.g., -n't, un-, any-)) to the total number of sentences of the document.
Antcpnt | Anticipation [21] | The ratio of the number of sentences containing anticipation-based words to the total number of sentences of the document.
Adv Exm | Example Words* | The ratio of the number of sentences containing example-based words (such as /hamčon/, /hamanand/, and /masalan/, all meaning "for example") to the total number of sentences of the document.
If | Conditional words* | The ratio of the number of sentences containing conditional conjunctions (such as "if") to the total number of sentences of the document.
GT | General Terms [10] | The ratio of the number of sentences containing general terms (terms that refer to a person or object as a class of persons or objects) to the total number of sentences of the document.
UT | Un Trust* | The ratio of the number of sentences containing un trust words (such as lack of trust, distrust, and suspicion) to the total number of sentences of the document.

5.2.1. Computing the importance of a rumor
In this section, the importance of a message document is evaluated based on two factors: its emotional content and its newsworthiness. We first introduce two categories of features to compute the emotional score ($f^{Imp}_{Emo}$) and the newsworthy score ($f^{Imp}_{Nws}$). The importance of the message document is then obtained as the sum of these two scores (formula 5). The following explains how each of the two scores is calculated.

$$Imp(d) = f^{Imp}_{Emo}(d) + f^{Imp}_{Nws}(d) \tag{5}$$

• Computing the emotional score of a rumor
We introduce a set of content features in various categories, such as adjectives, adverbs, and emotion words, to determine the emotional score of a text document. The reason for introducing these features is that rumormongers strengthen the emotional aspect of a message by using power words. Power words are persuasive, descriptive words that trigger an emotional response: they make us feel scared, encouraged, aroused, angry, and so on. The goal of using power words in FRs is to motivate a person to spread the message. This set of emotional features is described below:
Emotiveness: Adjectives (Adj) and adverbs (Adv) describe things and modify other words, and so change our understanding of things.

$$f^{Emo}_{ETag}(d) = \frac{|Adj(d)| + |Adv(d)|}{|Noun(d)| + |Verb(d)|} \tag{6}$$

where $f^{Emo}_{ETag}(d)$ is the ratio of adjectives ($|Adj(d)|$) plus adverbs ($|Adv(d)|$) to nouns ($|Noun(d)|$) plus verbs ($|Verb(d)|$) in the document $d$, which is taken as an indication of the expressivity of language [29].

Word-emotion:
Rumormongers use the power of word emotion to cause fear, concern, and hatred in the audience. In this study, the National Research Council Canada (NRC) [31] emotion lexicon is used to obtain the emotional score of words in the content of Persian rumors for eight basic emotions: anger, fear, anticipation, trust, surprise, sadness, joy, and disgust. Saif et al. [31] also provided versions of the lexicon in over one hundred languages, including Persian. We manually reviewed and corrected each of these words with the help of two linguists. Five of these Word-Emotion (WE) categories, namely Fear (Fr), Surprise (Su), Disgust (Dsg), Sadness (Sad), and Anger (An), are evaluated on the input documents. We also introduce affective-based words (Aff), such as /delkharash/ "irritant", which can increase the emotional impact of a document. Additionally, we consider Motion Verbs (MV) as a new feature for reviewing the content of rumors. MVs fall into two categories: (1) transitional motion (i.e., top to down, down to top, left to right, right to left, and multi-directional), and (2) self-contained motion (such as oscillation, dilatory, rotation, wiggle, wander, rest). For this purpose, we use the set of Persian MVs collected by Golfam et al. [32], who extracted and analyzed 126 MVs from the Dadegan site, the Persian Language Database, and other written sources. We narrowed down this list by selecting the MVs that often appear in FRs. The score of each of the Word Emotion Features $WEF = \{Fr, Su, Dsg, Sad, An, Aff, MV\}$ is calculated by formula 7.

$$\forall j \in WEF: \quad f^{Emo}_{j}(d) = \frac{\sum_{i=1}^{|S(d)|} |WEF_j(S_i(d))|}{|S(d)|} \tag{7}$$

where $|WEF_j(S_i(d))|$ indicates whether sentence $S_i$ of the document $d$ contains feature $j$ of the set $WEF$, and $|S(d)|$ is the number of sentences in document $d$.

Consecutive Words or Characters (CW & CC): Repeating words or characters consecutively within a sentence is syntactically incorrect in any language. However, a rumormonger tries to emphasize the main subject of a rumor by repeating words in the FR. For example, "Attention Attention" is a CW, and words like /salâââââââm/ "Helllllllloooo" and /Hošdââââââr/ "Alaaaaarm" are words containing CC.

$$f^{Emo}_{CW}(d) = \frac{\sum_{i=1}^{|S(d)|} |CW(S_i(d))|}{|S(d)|} \tag{8}$$

$$f^{Emo}_{CC}(d) = \frac{\sum_{i=1}^{|S(d)|} |CC(S_i(d))|}{|S(d)|} \tag{9}$$

Formulas 8 and 9 compute the fraction of the sentences of document $d$ that contain CW and CC, respectively. $|CW(S_i(d))|$ and $|CC(S_i(d))|$ take a boolean value for each sentence $S_i$ and indicate whether sentence $S_i$ of the document $d$ contains CW and CC, respectively.

Sentiment (PS & NS):
News sentences usually do not convey any sentiment. For example, the sentence "The Coaches of the Persepolis and Oil teams of Tehran, Branko and Ali Daei planted a tree seedling in the league organization on the occasion of the arbor day." carries no excitement. Rumors, by contrast, contain several characteristic sentiments (e.g., anger) compared to other types of information [33]. There is also a general lay belief that FRs are dominated by negative sentiment and polarity [34]. Although rumors carry negative polarity, they are often expressed with positive polarity. The NRC Emotion Lexicon [31] is used to assign each Persian word one of the polarities positive (1), negative (-1), or neutral (0). We use the concept of sentiment polarity of Zhang and Skiena [35] and calculate the sentiment score of the rumor document with the NRC lexicon (formulas 10 and 11).

$$f^{Emo}_{PS}(d) = \begin{cases} 0 & \text{if } |PSntm(d)| = 0 \text{ and } |NSntm(d)| = 0 \\ \dfrac{|PSntm(d)|}{|PSntm(d)| + |NSntm(d)|} & \text{otherwise} \end{cases} \tag{10}$$

$$f^{Emo}_{NS}(d) = \begin{cases} 0 & \text{if } |PSntm(d)| = 0 \text{ and } |NSntm(d)| = 0 \\ \dfrac{|NSntm(d)|}{|PSntm(d)| + |NSntm(d)|} & \text{otherwise} \end{cases} \tag{11}$$

where $|PSntm(d)|$ and $|NSntm(d)|$ are the numbers of positive and negative terms in document $d$.

Threat and Request Speech Acts (SA Thrt & SA Req):
From a person's point of view, the importance of a message increases when it transmits critical news and provokes fear in the audience. Hence, individuals spread rumors when they feel anxiety or threat. For example, to speed up the spread of a post on social networks, the rumormonger asks the audience to pass the message on to their relatives as soon as possible and warns that a bad event may happen if they do not inform others. We used the SA classifier provided by Jahanbakhsh et al. [24].
Superlative and Comparative adjectives (Adj Sup, Adj Cmp): Some adjectives increase the excitement and importance of a subject when they appear in a document. We consider two kinds: superlative adjectives (Adj Sup, such as /behtarin/ "best" and /dovvomin/ "second") and comparative adjectives (Adj Cmp, such as /tarsnaktar/ "scarier"). These adjectives upgrade (i.e., present one thing or person as superior to another) or diminish the main element of the rumor. The score of each of the features $SCA = \{Adj\_Sup, Adj\_Cmp\}$ is calculated by formula 12.

$$\forall j \in SCA: \quad f^{Emo}_{j}(d) = \frac{\sum_{i=1}^{|S(d)|} |SCA_j(S_i(d))|}{|S(d)|} \tag{12}$$

In formula 12, $|SCA_j(S_i(d))|$ takes a boolean value for each sentence $S_i$ and indicates whether sentence $S_i$ of the document $d$ contains feature $j$ of the set $SCA$.
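The sentence-level scores of formulas 7, 8, 9, and 12 share one shape: the fraction of sentences for which a boolean indicator is true. The Python sketch below illustrates that shape only. The sentence splitter, the whitespace tokenizer, the two indicator functions (`has_consecutive_words`, `has_consecutive_chars`), and the threshold of three repeated characters are simplifying assumptions made for illustration, not the implementation used in this study.

```python
import re
from typing import Callable, List


def split_sentences(document: str) -> List[str]:
    """Naive splitter on '.', '!', '?' and the Arabic-script question mark (assumption)."""
    parts = re.split(r"[.!?\u061F]+", document)
    return [p.strip() for p in parts if p.strip()]


def sentence_ratio(document: str, indicator: Callable[[str], bool]) -> float:
    """Fraction of sentences S_i(d) for which the boolean indicator is true."""
    sentences = split_sentences(document)
    if not sentences:
        return 0.0
    return sum(1 for s in sentences if indicator(s)) / len(sentences)


def has_consecutive_words(sentence: str) -> bool:
    """CW indicator: the same token appears twice in a row."""
    tokens = sentence.lower().split()
    return any(a == b for a, b in zip(tokens, tokens[1:]))


def has_consecutive_chars(sentence: str, min_run: int = 3) -> bool:
    """CC indicator: some character repeats at least min_run times in a row."""
    pattern = r"(\w)\1{%d,}" % (min_run - 1)
    return re.search(pattern, sentence) is not None


# Toy English document standing in for a Persian post.
doc = "Attention attention everyone. Helllllo there. This is a calm sentence."
cw = sentence_ratio(doc, has_consecutive_words)   # shape of formula 8
cc = sentence_ratio(doc, has_consecutive_chars)   # shape of formula 9
print(cw, cc)
```

Any other sentence-level feature (for example Adj Sup or Adj Cmp) would plug into `sentence_ratio` by supplying its own indicator function.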
Start and End of document (Strt & End): The start and end sentences of a document are very important in conveying its message because the author expresses the main purpose in these sentences. They can therefore have a significant effect on attracting the attention of the audience. We introduce two new features: one analyzes the start sentence of the document based on word-emotion, and the other analyzes the end sentence based on both word-emotion and words associated with the request. Both features have a binary value, which indicates whether the start and/or the end sentence of the message document contains word-emotion and/or words associated with the request.

Each of these emotional features has a different impact on the emotional score of a document. It is therefore necessary to consider the influence coefficient of each feature in distinguishing FRs from TRs. For this purpose, a feature weighting technique (section ??) is used to obtain a coefficient for each of these features.

$$f^{Imp}_{Emo}(d) = \frac{\sum_{j=1}^{|F_{Emo}|} wf_j \cdot f^{Emo}_{j}(d)}{|F_{Emo}|} \tag{13}$$

In formula 13, $wf_j$ is the weight of feature $f^{Emo}_{j}$ and $f^{Emo}_{j}(d)$ is the value of feature $j$ in $F_{Emo} = \{ETag, Fr, Su, Dsg, Sad, An, Aff, MV, PS, NS, CW, CC, SA\_Thrt, SA\_Req, Adj\_Sup, Adj\_Cmp, Strt, End\}$ for document $d$.

• Computing the newsworthy of a rumor
The main reason a person accepts a message document as credible news is that it is newsworthy. A news item can be defined as "newsworthy information about recent events or happenings, especially as reported by news media." [36]. In this study, a set of content features is introduced to evaluate the newsworthiness of the document: relative time, statistical information, named entity, lexical diversity, certainty, two SAs (Declarative and Quotations), ordinal adjective, and spelling mistakes. These factors are detailed below.
Relative Time (RT): If a story happens today, it is news, but when the same thing happened last week, it is no longer interesting. The novelty of news is therefore particularly important. For example, journalists revisit past events from a new angle in the news of the day. Rumormongers also use RT-based words (such as "tonight", /emroz/ "today", /akhiran/ "recently", and so on) to make a story appear new or to pretend that an important event will happen soon. For example, the rumor "Recently, Google has put an Internet voting to change the name of the Persian Gulf" has been circulating on social networks for over three years. Words with the Adv Time tag are extracted as RT and used to calculate the RT score of document $d$ by formula 14.

$$f^{Nws}_{RT}(d) = \frac{\sum_{i=1}^{|S(d)|} |RT(S_i(d))|}{|S(d)|} \tag{14}$$

In formula 14, $|RT(S_i(d))|$ indicates whether sentence $S_i$ of the document $d$ contains RT-based words.

Statistical Information (SI):
SI-based words are divided into two categories: (i) numeral characters, for example, the numeral "56" has two digits: 5 and 6; (ii) words such as /barkhi/ "some", /hame/ "all", and so on. Zhang et al. [37] found that rumors that are short and contain numbers are more likely to be true than those that are long and do not contain any quantitative details. This feature is calculated by formula 15 for the message document $d$.

$$f^{Nws}_{SI}(d) = \frac{\sum_{i=1}^{|S(d)|} |SI(S_i(d))|}{|S(d)|} \tag{15}$$

In formula 15, $|SI(S_i(d))|$ indicates whether sentence $S_i$ of the document $d$ contains SI-based words.

Named Entity (NE):
Most people follow topics that involve celebrities. As NEs, celebrities are high-interest items for individuals, who spread news about famous people out of admiration or dislike. Rumormongers therefore use the names of famous people, scientists, philosophers, organizations, or institutions to increase the newsworthiness of a rumor. In this study, NEs are extracted using a Hidden Markov Model (HMM)-based model [38] in three classes: person name, organization, and location.

$$f^{Nws}_{NE}(d) = \frac{\sum_{i=1}^{|S(d)|} |NE(S_i(d))|}{|S(d)|} \tag{16}$$

In formula 16, $|NE(S_i(d))|$ indicates whether sentence $S_i$ of the document $d$ contains NE phrases.

Lexical Diversity (LD):
Rumormongers try to draw the audience's attention to the subject of a rumor, so they repeat important and emotional words related to it. Repeated words reduce the LD score of a document, so FRs have a low LD due to the high repetition of tokens.

$$f^{Nws}_{LD}(d) = \frac{|V(d)|}{|T(d)|} \tag{17}$$

The LD score is the ratio of the vocabulary size $|V(d)|$ to the total number of terms $|T(d)|$ in the document $d$ [29].

Certainty (Cer):
Rumormongers use certainty-related words to hide their lies about the FR issue and to increase the audience's trust in the subject. The certainty score is calculated as a factor influencing the newsworthiness of the rumor by formula 18.

$$f^{Nws}_{Cer}(d) = \begin{cases} 0 & \text{if } |V_{Cer}(d)| = 0 \text{ and } |V_{Ucer}(d)| = 0 \\ \dfrac{|V_{Cer}(d)|}{|V_{Ucer}(d)| + |V_{Cer}(d)|} & \text{otherwise} \end{cases} \tag{18}$$

In formula 18, $|V_{Ucer}(d)|$ and $|V_{Cer}(d)|$ are the numbers of uncertainty-based and certainty-based vocabularies in document $d$, respectively.

Declarative and Quotations Speech Acts (SA Dec & SA Quot):
These two types of SA give a document a formal tone. FRs are sometimes expressed formally and refer to reliable sources to gain the audience's trust. We determined these SAs in the document using an SA classifier [24].
Ordinal adjectives (Adj Ord):
These adjectives denote order: first, second, third, fourth, and so on. This feature has a boolean value for each sentence, meaning it is checked whether the sentence contains Adj Ord or not. The ratio of the number of sentences containing Adj Ord to the total number of sentences in the document is calculated by formula 19.

$$f^{Nws}_{Adj\_Ord}(d) = \frac{\sum_{i=1}^{|S(d)|} |Adj\_Ord(S_i(d))|}{|S(d)|} \tag{19}$$

In formula 19, $|Adj\_Ord(S_i(d))|$ indicates whether sentence $S_i$ of the document $d$ contains Adj Ord.
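The word-level newsworthy scores above, lexical diversity (formula 17) and certainty (formula 18), reduce to simple counts. A minimal Python sketch follows. The whitespace tokenizer and the tiny English certainty and uncertainty word lists are placeholders chosen for illustration, since the Persian lexicons used in this study are not reproduced here. Counting word occurrences rather than distinct words in `certainty_score` is also an assumption.

```python
from typing import List, Set

# Placeholder lexicons (assumptions for illustration only).
CERTAINTY_WORDS: Set[str] = {"definitely", "certainly", "always"}
UNCERTAINTY_WORDS: Set[str] = {"maybe", "perhaps", "possibly"}


def tokenize(document: str) -> List[str]:
    """Whitespace tokenizer (assumption; a real pipeline would normalize Persian text)."""
    return document.lower().split()


def lexical_diversity(tokens: List[str]) -> float:
    """Formula 17: distinct terms |V(d)| over all terms |T(d)|."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def guarded_ratio(count: int, other: int) -> float:
    """count / (count + other), defined as 0 when both counts are 0 (formulas 10, 11, 18)."""
    total = count + other
    return count / total if total else 0.0


def certainty_score(tokens: List[str]) -> float:
    """Formula 18, counting word occurrences (assumption)."""
    cer = sum(1 for t in tokens if t in CERTAINTY_WORDS)
    ucer = sum(1 for t in tokens if t in UNCERTAINTY_WORDS)
    return guarded_ratio(cer, ucer)


tokens = tokenize("maybe it will definitely definitely rain")
ld = lexical_diversity(tokens)   # 5 distinct terms out of 6 tokens
cer = certainty_score(tokens)    # 2 certainty words, 1 uncertainty word
print(ld, cer)
```

The same `guarded_ratio` helper covers the sentiment scores PS and NS, which use the identical zero guard.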
Spelling Mistake (SM): The presence of misspellings in a text reduces its newsworthiness. We used Virastyar [58] as a spell checker to find misspelled words based on typographical errors in the Persian language and calculated the ratio of Spelling Mistakes (SM) to the total number of terms (T) in document $d$ (formula 20).

$$f^{Nws}_{SM}(d) = \frac{|SM(d)|}{|T(d)|} \tag{20}$$

Finally, the Newsworthy score of document $d$ is computed by formula 21, where $wf_j$ is the weight of feature $f^{Nws}_{j}$ and $f^{Nws}_{j}(d)$ is the value of feature $j$ in $F_{Nws} = \{RT, SI, NE, LD, Cer, SA\_Dec, SA\_Quot, Adj\_Ord\}$ for document $d$.

$$f^{Imp}_{Nws}(d) = \frac{\sum_{j=1}^{|F_{Nws}|} wf_j \cdot f^{Nws}_{j}(d)}{|F_{Nws}|} \tag{21}$$

The essence of rumors lies in their ambiguity: ambiguous evidence makes rumors spread more widely [39]. The ambiguous expression of news challenges the audience. Ambiguity arises when the news is received in a distorted form, when the person receives contradictory news, or when the person cannot understand the news. In the following, a set of ambiguity-based features is introduced whose presence in a document makes the subject ambiguous.
Uncertainty (Ucer): These are words that indicate a lack of sureness about someone or something. A rumormonger tries to challenge the audience's mind by creating a sense of uncertainty about the issue. A collection of uncertainty-based words in the Persian language is therefore extracted to measure the uncertainty score of the document $d$.

$$f^{Amb}_{Ucer}(d) = \begin{cases} 0 & \text{if } |V_{Cer}(d)| = 0 \text{ and } |V_{Ucer}(d)| = 0 \\ \dfrac{|V_{Ucer}(d)|}{|V_{Ucer}(d)| + |V_{Cer}(d)|} & \text{otherwise} \end{cases} \tag{22}$$

Sensory Verbs (SV): These verbs describe one of the five senses of sight, hearing, smell, touch, and taste, for example /šenidan/ "hear", /hes kardan/ "feel", /didan/ "see", and so on. When a rumormonger creates a rumor, the sentence often carries a clear sign that he has personally seen or heard what he speaks about, or that it is the result of his reasoning and speculation. These signs are SVs, which create evidentiality in rumors. Evidentiality is a grammatical category whose role is to show the source of information. These verbs appear where rumormongers want to increase the rumor's credibility, so they use them as a means of emphasis.

$$f^{Amb}_{SV}(d) = \frac{\sum_{i=1}^{|S(d)|} |SV(S_i(d))|}{|S(d)|} \tag{23}$$

$|SV(S_i(d))|$ in formula 23 has a boolean value and indicates whether the sentence $S_i$ of the document $d$ contains a sensory verb.

Question Speech Act and Tokens (SA Ques, QW and QM):
A Question Word (QW) is a function word used to ask a question. QW, Question Mark (QM), and Speech Act Question (SA Ques) are therefore considered factors that raise questions in the mind of the audience, create ambiguity in the rumor, and disturb the reader's mind. QW and QM are calculated by formulas 24 and 25, respectively.

$$f^{Amb}_{QW}(d) = \frac{\sum_{i=1}^{|S(d)|} |QW(S_i(d))|}{|S(d)|} \tag{24}$$

$$f^{Amb}_{QM}(d) = \frac{\sum_{i=1}^{|S(d)|} |QM(S_i(d))|}{|S(d)|} \tag{25}$$

$|QW(S_i(d))|$ and $|QM(S_i(d))|$ indicate whether the sentence $S_i$ of document $d$ contains at least one question word or at least one question mark, respectively.

Exclamation Mark (EM): This punctuation mark is usually used after an interjection or exclamation to indicate strong feelings or high volume (shouting), or to show emphasis. It is also used for other purposes: to draw attention to an obvious mistake, in road warning signs, in chess commentaries beside the notation of a move considered a good one, in mathematics as the symbol of the factorial function, and in logic together with an existential quantifier. As listed in Table 4, the EM score of a document is the ratio of the number of sentences containing an exclamation mark to the total number of sentences.
Pronouns (Pro): Rumormongers use less self-reference (first-person singular pronouns), more group-reference (first-person plural pronouns), and more other-reference (third-person pronouns) to create non-immediacy and uncertainty in their rumors.

$$f^{Amb}_{Pro}(d) = \frac{\sum_{i=1}^{|S(d)|} |Pro(S_i(d))|}{|S(d)|} \tag{26}$$

$|Pro(S_i(d))|$ indicates whether sentence $S_i$ of the document $d$ contains a pronoun (i.e., a third-person or first-person plural pronoun).

Tentative (Tntv):
A tentative adjective describes something that is unclear. Rumormongers use such words to create a sense of hesitation in the audience and to engage their minds.

$$f^{Amb}_{Tntv}(d) = \frac{\sum_{i=1}^{|S(d)|} |Tntv(S_i(d))|}{|S(d)|} \tag{27}$$

Formula 27 thus gives the fraction of the sentences $S_i$ of the document $d$ that contain tentative-based words, where $|Tntv(S_i(d))|$ indicates whether sentence $S_i$ contains such a word.

Negation (Neg):
In rumors, negative words serve two purposes: (1) creating negative emotions, and (2) expressing the news event in an unusual way. In the Persian language, seven negative prefixes are used to build words with a negative or contrastive meaning. These prefixes are: (1) 'bi-' / im- (e.g., impolite), (2) un-, in-, no- (e.g., injustice), (3) 'zed-' / anti- (e.g., anti-security), (4) 'gheir-' / un- (e.g., unnecessary), (5) 'ne-', 'na-' / not (e.g., nemidanam "I do not know", nasalem "unhealthy"), (6) 'hich-' / no- (e.g., nobody), (7) 'la-' / without.

$$f^{Amb}_{Neg}(d) = \frac{\sum_{i=1}^{|S(d)|} |Neg(S_i(d))|}{|S(d)|} \tag{28}$$

In formula 28, $|Neg(S_i(d))|$ indicates whether sentence $S_i$ of the document $d$ contains one of the negative prefixes.

Anticipation (Antcpnt):
Anticipation-based words are words that express (1) coming or acting in advance (for example, clouds anticipant of a storm), or (2) expectation (for example, a team anticipant of victory). Many people are interested in predicting events, so they try to anticipate the most likely problems, although it is impossible to be prepared for every eventuality. The rumormonger likewise intends to create fear and turmoil in the community by anticipating unpopular events that have not yet happened.

$$f^{Amb}_{Antcpnt}(d) = \frac{\sum_{i=1}^{|S(d)|} |Antcpnt(S_i(d))|}{|S(d)|} \tag{29}$$

$|Antcpnt(S_i(d))|$ is a binary value that indicates whether the sentence $S_i$ of the document $d$ contains anticipation-related words.

Example Words (EW):
These words can provide more context in a sentence and help the reader understand proper usage. Rumormongers use them to generalize the issue and get the audience's attention.

$$f^{Amb}_{EW}(d) = \frac{\sum_{i=1}^{|S(d)|} |EW(S_i(d))|}{|S(d)|} \tag{30}$$

$|EW(S_i(d))|$ is a binary value that indicates whether the sentence $S_i$ of the document $d$ contains example words.

Conditional words (If):
Conditional conjunctions can be a single word like "if" or several words like "as long as". Rumormongers use different conditional conjunctions to describe the necessary condition for the occurrence of an issue. The use of different conditional conjunctions can have a major impact on changing the audience's attitude towards something.

$$f^{Amb}_{If}(d) = \frac{\sum_{i=1}^{|S(d)|} |If(S_i(d))|}{|S(d)|} \tag{31}$$

$|If(S_i(d))|$ is a binary value that indicates whether the sentence $S_i$ of the document $d$ contains conditional words.

General Terms (GT):
A general term is the name of a group or category of things, people, ideas, and the like. Rumormongers usually use these terms to discuss an issue as a whole. Examples of general words include furniture, money, equipment, seasoning, and shoes.

$$f^{Amb}_{GT}(d) = \frac{\sum_{i=1}^{|S(d)|} |GT(S_i(d))|}{|S(d)|} \tag{32}$$

$|GT(S_i(d))|$ is a binary value that indicates whether the sentence $S_i$ of the document $d$ contains general terms.

Un Trust (UT):
Words that express un trust in news about famous people or important factors of society raise doubts about the subject in the mind of the audience. Un Trust words include lack of trust, distrust, suspicion, mistrust, doubt, disbelief, dubiety, wariness, and so on.

$$f^{Amb}_{UT}(d) = \frac{\sum_{i=1}^{|S(d)|} |UT(S_i(d))|}{|S(d)|} \tag{33}$$

$|UT(S_i(d))|$ is a binary value that indicates whether the sentence $S_i$ of the document $d$ contains UT-based words.

Finally, the ambiguity score of a document is calculated by formula 34, where $wf_j$ is the weight of feature $f^{Amb}_{j}$ and $f^{Amb}_{j}(d)$ is the value of feature $j$ in $F_{Amb} = \{Ucer, SV, QW, QM, EM, SA\_Ques, Pro, Tntv, Neg, Antcpnt, Adv\_Exm, If, GT, UT\}$ for the document $d$.

$$Amb(d) = \frac{\sum_{j=1}^{|F_{Amb}|} wf_j \times f^{Amb}_{j}(d)}{|F_{Amb}|} \tag{34}$$

Different features can have different levels of importance for prediction in classification problems. The purpose of feature weighting is to determine the degree of importance of each feature in predicting the two classes FR and TR, so the weight of each feature also affects the calculation of SPR. In this step, of two candidate optimization algorithms, Particle Swarm Optimization (PSO) [40] and the Forest Optimization Algorithm (FOA) [41], PSO is selected to find the optimal weight for each feature. High-weight features will therefore be more effective in the classification results. The feature weighting algorithm is as follows:
Algorithm 1 Feature weighting algorithm
1. Feed the algorithm with the features extracted in 5.2.
2. Use the cross-validation method to separate the dataset into training and testing sets.
3. Set up the parameters of PSO for each training set: randomly generate all particles' positions and velocities, and set the learning parameters, the inertia weight, and the maximum number of iterations.
4. Update the velocities of all particles at each iteration.
5. Train the SVM classifier according to the particles' values.
6. Calculate the corresponding fitness function for each particle.
7. Obtain the best gene weights and the best kernel parameter values.
8. Train the SVM classifier with the obtained parameters.
9. Update the inertia weight and return to step 4.

Table 5: Distribution of Persian rumors datasets.
Dataset | FR | TR | Description
Twitter [22] | 783 | 783 | Twitter rumors crawled from two Iranian websites that publish Persian rumors, Gomaneh.com and Shayeaat.ir, and annotated by Zamani et al.
Telegram [24] | 882 | 882 | Telegram rumors crawled from three Telegram channels of Iranian websites: Gomaneh.com, Wikihoax.org, and Shayeaat.ir. Several Telegram channels (i.e., Fars News Agency, Iranian Students' News Agency (ISNA), Tasnim News Agency, Tabnak, Nasim News Agency (NNA), Mehr News Agency (MNA), Islamic Republic News Agency (IRNA)) have also been crawled to extract non-rumors.
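The weight-search loop of Algorithm 1 can be sketched in Python as a basic PSO over weight vectors in [0, 1]. This is a hedged illustration, not the configuration used in this study: the function name `pso_weights`, the parameter defaults, the linearly decreasing inertia weight, and the clamping of weights are assumptions. In Algorithm 1 the fitness of a particle comes from an SVM classifier trained with cross-validation, whereas the toy `fitness` below simply rewards closeness to a known target vector so that the block stays self-contained.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_weights(fitness: Callable[[Sequence[float]], float], dim: int,
                n_particles: int = 20, n_iter: int = 50,
                c1: float = 1.5, c2: float = 1.5,
                w_start: float = 0.9, w_end: float = 0.4,
                seed: int = 0) -> Tuple[List[float], float]:
    """Maximize `fitness` over weight vectors in [0, 1]^dim with a basic PSO."""
    rng = random.Random(seed)
    # Step 3: random initial positions and velocities.
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda k: pbest_fit[k])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for it in range(n_iter):
        # Step 9: update the inertia weight (linear decrease is an assumption).
        w = w_start - (w_start - w_end) * it / max(1, n_iter - 1)
        for k in range(n_particles):
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Step 4: velocity update from personal and global bests.
                vel[k][j] = (w * vel[k][j]
                             + c1 * r1 * (pbest[k][j] - pos[k][j])
                             + c2 * r2 * (gbest[j] - pos[k][j]))
                # Keep every weight inside [0, 1].
                pos[k][j] = min(1.0, max(0.0, pos[k][j] + vel[k][j]))
            # Steps 5 and 6: evaluate the particle (an SVM would be trained here).
            fit = fitness(pos[k])
            if fit > pbest_fit[k]:
                pbest[k], pbest_fit[k] = pos[k][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[k][:], fit
    return gbest, gbest_fit


# Toy stand-in fitness: negative squared distance to a hypothetical target.
TARGET = [0.8, 0.1, 0.5]


def fitness(weights: Sequence[float]) -> float:
    return -sum((w - t) ** 2 for w, t in zip(weights, TARGET))


best_weights, best_fit = pso_weights(fitness, dim=len(TARGET))
print(best_weights, best_fit)
```

Replacing `fitness` with a function that trains a classifier on the weighted features and returns its validation accuracy would bring the sketch closer to steps 5 to 8 of Algorithm 1.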
Finally, the SPR score of document d is calculated as the product of the two scores, importance and ambiguity (eq. 4), that were computed in the previous sections.
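To make the aggregation concrete, the following Python sketch combines formulas 5, 13, 21, and 34 with the product described above. The feature values and weights are invented for illustration, and the helper names `weighted_mean` and `spread_power` are hypothetical. Only a handful of the 42 features are shown.

```python
from typing import Dict


def weighted_mean(values: Dict[str, float], weights: Dict[str, float]) -> float:
    """Formulas 13, 21, 34: sum of weight * value divided by the number of features."""
    if not values:
        return 0.0
    return sum(weights[name] * v for name, v in values.items()) / len(values)


def spread_power(emo: Dict[str, float], nws: Dict[str, float],
                 amb: Dict[str, float], weights: Dict[str, float]) -> float:
    importance = weighted_mean(emo, weights) + weighted_mean(nws, weights)  # formula 5
    ambiguity = weighted_mean(amb, weights)                                 # formula 34
    return importance * ambiguity                                           # product (eq. 4)


# Invented feature values and weights (assumptions for illustration only).
emo = {"Fr": 0.5, "CW": 0.25}
nws = {"RT": 0.5, "LD": 1.0}
amb = {"QM": 0.5, "Ucer": 0.5}
weights = {"Fr": 1.0, "CW": 0.0, "RT": 1.0, "LD": 0.5, "QM": 1.0, "Ucer": 1.0}

spr = spread_power(emo, nws, amb, weights)
print(spr)
```

With these toy numbers the emotional score is 0.25, the newsworthy score is 0.5, the importance is 0.75, the ambiguity is 0.5, and the resulting SPR is 0.375.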
6. Experiments and Results
In this section, three experiments are performed on the Twitter and Telegram datasets to evaluate SPR as a newly proposed factor and to answer the research questions: (1) investigating whether each feature differs significantly between the two categories FR and TR, (2) measuring SPR on Twitter and Telegram rumors, and (3) evaluating the performance of SPR in validating rumors. All experiments use 10-fold cross-validation. Accuracy, Precision, Recall, and F1 measure are used to evaluate the performance of the Random Forest (RF) classifier in identifying rumors in the two classes FR and TR. The experimental details are described below.
This study evaluates SPR on Persian rumors from two different sources: Twitter and Telegram. The details of these two datasets are described in Table 5.
Twitter is a micro-blogging social network service where users can publish and exchange short messages of up to 280 characters; these messages are called tweets. Accessibility, speed, and ease of use have made Twitter a valuable social medium for a variety of purposes, and its use is growing exponentially. We used the Persian Twitter dataset introduced by Zamani et al. [22] for rumor detection.
Telegram is an instant messaging service. Because of the popularity of Telegram in Iran and the volume of messages disseminated through it, we chose it to evaluate our work. Jahanbakhsh et al. [24] provided a Persian Telegram dataset for rumor detection. This dataset consists of a few thousand Persian Telegram posts in the two classes TR and FR on various topics, which were crawled and extracted using the API provided by the ComInSys lab of the University of Tabriz and is available in [42].

The statistical analysis is performed using the T-test on each feature to understand the impact of the 42 features on calculating SPR. The distribution of the features in both the TR and FR categories is also represented by boxplots.

Table 6: The result of the t-test for the 42 proposed features (values greater than 0.05 are italicized).
ETag Fr Su Dsg Sad An Aff MV
P-value 0.952
CW CC PS NS SA Thrt SA Req Adj Sup Adj Cmp
P-value
Strt End RT SI NE LD Cer SA Dec
P-value
SA Quot Adj Ord SM Ucer SV QW QM EM
P-value
SA Ques Pro Tntv Neg Antcpnt Adv Exm If GT
P-value
UT Amb Imp
P-value
Table 7: Independent T-test values for SPR
Levene's Test for Equality of Variances: F | Sig.
t-test for Equality of Means: t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% Confidence Interval of the Difference (Lower, Upper)
SPR, equal variances assumed | F = 15.188 | Sig. = 0.000 | t = 4.835 | df = 1233
Since our samples are independent, an independent samples T-test is run on 42 features. An independentsamples T-test compares the means and P-value of each feature for two groups FR and TR. NULL hypothesisis rejected if
P < .
05. In this study, the null hypothesis is defined as follows:
Null hypothesis:
The spread power of FRs is equal to the TRs.With the hypothesis that each feature appears with a different frequency in FRs and TRs and candiscriminate between them, the P-value is calculated for each feature listed in Tables 2, 3, and 4. TheP-value results ( < = 0 .
05) demonstrate that most of these features reveal statistically significant differences between FR and TR documents. Questions 2 & 3 in section 2 are answered by the p-value results for "Amb" and "Imp", which indicate that the features introduced for computing "Amb" and "Imp" are effective. Table 7 reports the result of the T-test for the SPR feature. Since p-value ≤ 0.05, the null hypothesis is rejected for SPR, which shows that there is a significant difference between the spread power of the two classes TR and FR. SPR can therefore be used as a feature in the rumor identification task. This result is the answer to question 4 of section ?? on the ability of SPR to distinguish between TRs and FRs.

In this section, the distribution of the features introduced for computing the SPR is displayed using boxplots for the two categories TR and FR. The boxplot is a standardized way of displaying a data distribution based on a five-number summary: minimum, first quartile, median, third quartile, and maximum. The distribution of the features in the three categories "Emotional", "Newsworthy", and "Ambiguity" is shown in Figures 2, 3 and 4, respectively. In addition, Figure 5 illustrates the discriminative capacity of the five factors "Emotional", "Newsworthy", "Importance", "Ambiguity", and "SPR" in the two classes FR and TR. As the boxplots show, the ambiguity, emotional, and SPR features take higher values in FRs than in TRs.

Figure 2: The illustration of the distribution of emotional features by boxplots in two classes of FR (0) and TR (1).
Figure 3: The illustration of the distribution of newsworthy-based features by boxplots in two classes of FR (0) and TR (1).
Figure 4: The illustration of the distribution of ambiguity-based features by boxplots in two classes of FR (0) and TR (1).
Figure 5: The illustration of the distribution of five features (Emo, Nws, Imp, Amb, and SPR) by boxplots in two classes of FR (0) and TR (1).

Table 8: Comparing the average values of importance (Imp.), ambiguity (Amb.) and spread power of rumors (SPR) in two categories FRs and TRs on Twitter and Telegram.

Dataset        Category  Avg. Imp.  Avg. Amb.  Avg. SPR
Twitter [22]   FR        0.217      0.137      0.135
Twitter [22]   TR        0.274      0.114      0.103
Telegram [24]  FR        0.326      0.274      0.242
Telegram [24]  TR        0.361      0.269      0.218
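The hypothesis test on SPR described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes Welch's unequal-variance form of the two-sample t-test (the text does not say which variant was used), it approximates the two-sided p-value with a normal distribution, which is only adequate for large samples, and the SPR scores are made up. The names `welch_t` and `ALPHA` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

ALPHA = 0.05  # significance threshold used in the text


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate two-sided p-value).

    Assumption: Welch's unequal-variance t-test. The p-value uses a
    normal approximation, so it is only indicative for small samples.
    """
    standard_error = sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    t_stat = (mean(sample_a) - mean(sample_b)) / standard_error
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))
    return t_stat, p_value


# Made-up SPR scores for illustration only (not data from the paper).
spr_fr = [0.26, 0.22, 0.25, 0.23, 0.27, 0.24, 0.21, 0.26]
spr_tr = [0.20, 0.22, 0.19, 0.23, 0.21, 0.18, 0.22, 0.20]

t_stat, p_value = welch_t(spr_fr, spr_tr)
reject_null = p_value <= ALPHA  # True means the class means differ significantly
print(f"t = {t_stat:.3f}, p ~ {p_value:.4f}, reject H0: {reject_null}")
```

With samples as large as the datasets used here, the normal approximation and the exact t distribution give nearly the same p-value; for small samples a proper t distribution should be used instead.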
Table 9: Result of precision, recall and F-score measures of RF classifier based on proposed features to compute the SPR (with and without feature weighting by PSO).

Dataset   Category  Precision (with/without)  Recall (with/without)  F-measure (with/without)
Twitter   FR        0.772 / 0.750             0.746 / 0.712          0.759 / 0.730
Twitter   TR        0.754 / 0.726             0.780 / 0.763          0.766 / 0.743
Twitter   Avg.      0.763 / 0.738             0.763 / 0.738          0.763 / 0.737
Telegram  FR        0.791 / 0.742             0.814 / 0.781          0.802 / 0.760
Telegram  TR        0.825 / 0.768             0.803 / 0.729          0.814 / 0.751
Telegram  Avg.      0.808 / 0.755             0.808 / 0.755          0.808 / 0.755
We also investigated the average SPR of FRs and TRs in the Twitter and Telegram datasets. First, the spread power of 1566 FR and TR documents on Twitter and 1764 FR and TR documents on Telegram is calculated. Then, the average SPR is obtained for each dataset in the two categories TR and FR. The statistical analysis of these two datasets (Table 8) shows that the average spread power of FRs is higher than that of TRs in both datasets. It can therefore be concluded that FRs exhibit more of the characteristics of a fast-spreading rumor than TRs do. As shown in Table 8, the average spread power in the Twitter dataset is lower than in the Telegram dataset. The reason is that the length of tweets is limited, so little content information can be extracted from them.
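The per-class averaging described above amounts to a simple group-by. The sketch below assumes each document has already been assigned an SPR score; the records, their values, and the helper name `average_spr` are hypothetical and serve only to show the computation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (dataset, category, SPR score) records; values are made up.
records = [
    ("Twitter", "FR", 0.14), ("Twitter", "FR", 0.12),
    ("Twitter", "TR", 0.10), ("Twitter", "TR", 0.11),
    ("Telegram", "FR", 0.25), ("Telegram", "FR", 0.23),
    ("Telegram", "TR", 0.22), ("Telegram", "TR", 0.21),
]


def average_spr(rows):
    """Group SPR scores by (dataset, category) and return the group means."""
    grouped = defaultdict(list)
    for dataset, category, score in rows:
        grouped[(dataset, category)].append(score)
    return {key: mean(scores) for key, scores in grouped.items()}


averages = average_spr(records)
for (dataset, category), value in sorted(averages.items()):
    print(f"{dataset:8s} {category}  avg SPR = {value:.3f}")
```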
In this section, two experiments are performed to show the importance of feature weighting in the rumor detection task: (1) rumor detection based on the features presented in Tables 2, 3, and 4; (2) rumor detection based on the features of Tables 2, 3, and 4 weighted by PSO. Table 9 shows the results of these two experiments. The experimental results showed that some of these features cannot distinguish FR from TR, and the weights assigned by the PSO to these features are also low.
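To make the idea of PSO-based feature weighting concrete, here is a small global-best PSO sketch. It is an illustration under assumptions, not the configuration used in the experiments: the fitness function (gap between the class means of a normalized weighted score), the swarm parameters, and the toy data are all hypothetical. The first particle is the unweighted baseline, so the returned fitness can never be worse than equal weights.

```python
import random
from statistics import mean

# Toy feature vectors, made up for illustration.
# Feature 0 separates the classes; feature 1 has the same mean in both.
FR_ROWS = [[0.9, 0.5], [0.8, 0.4], [0.7, 0.6]]
TR_ROWS = [[0.3, 0.5], [0.2, 0.6], [0.4, 0.4]]


def fitness(weights):
    """Hypothetical objective: gap between class means of the weighted score."""
    total = sum(weights)
    if total == 0:
        return 0.0

    def score(row):
        return sum(w * x for w, x in zip(weights, row)) / total

    fr_mean = mean([score(row) for row in FR_ROWS])
    tr_mean = mean([score(row) for row in TR_ROWS])
    return abs(fr_mean - tr_mean)


def pso(n_particles=12, n_iters=40, inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO over weights in [0, 1]; returns (weights, fitness)."""
    rng = random.Random(seed)
    dim = len(FR_ROWS[0])
    # First particle is the equal-weight baseline, the rest are random.
    positions = [[0.5] * dim] + [
        [rng.random() for _ in range(dim)] for _ in range(n_particles - 1)
    ]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_fit = [fitness(p) for p in positions]
    best_index = max(range(n_particles), key=personal_fit.__getitem__)
    global_best = personal_best[best_index][:]
    global_fit = personal_fit[best_index]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c1 * r1 * (personal_best[i][d] - positions[i][d])
                    + c2 * r2 * (global_best[d] - positions[i][d])
                )
                # Clamp every weight to the interval [0, 1].
                moved = positions[i][d] + velocities[i][d]
                positions[i][d] = min(1.0, max(0.0, moved))
            fit = fitness(positions[i])
            if fit > personal_fit[i]:
                personal_best[i], personal_fit[i] = positions[i][:], fit
                if fit > global_fit:
                    global_best, global_fit = positions[i][:], fit
    return global_best, global_fit


baseline_fit = fitness([0.5, 0.5])
best_weights, best_fit = pso()
print("weights:", [round(w, 3) for w in best_weights])
print("fitness:", round(best_fit, 3), "baseline:", round(baseline_fit, 3))
```

In this toy setup the uninformative feature contributes nothing to the class gap, so a search of this kind is expected to push its weight down, which mirrors the observation that weakly discriminative features receive low weights.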
This experiment aims to answer Question 4 in section 3. For this purpose, two experiments are carried out to assess the effect of the SPR score on rumor classification. In the first experiment, the two classes FR and TR are classified based on the set of content-based features. These features include a set of features used in previous studies and a set of new features proposed in this study (Tables 2, 3, and 4). In the second experiment, the SPR factor is added to this feature set, and rumor classification is performed using the RF classifier.

Table 10 shows the evaluation metrics Precision (P), Recall (R), and F-measure (F1) used to assess the SPR and its impact on rumor detection. As shown in Table 10, the SPR factor as a new feature has been instrumental in classifying rumors, and the F-measure has improved from 0.762 to 0.828.

Table 10: The effect of SPR in rumor detection on Telegram dataset using RF classifier.

                            TP Rate  FP Rate  P      R      F1
(1) Content features
    FR                      0.753    0.230    0.766  0.753  0.759
    TR                      0.770    0.247    0.757  0.770  0.764
    Avg.                    0.762    0.238    0.762  0.762  0.762
(2) Content features + SPR
    FR                      0.802    0.145    0.802  0.846  0.824
    TR                      0.855    0.198    0.855  0.812  0.833
    Avg.                    0.828    0.172    0.828  0.829
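For reference, the metrics reported in Table 10 follow the standard definitions, which can be sketched as below. The confusion-matrix counts are hypothetical (the paper does not list them), and the helper names `f_measure` and `binary_metrics` are illustrative.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def binary_metrics(tp, fp, fn, tn):
    """Metrics for the positive class from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # same quantity as the TP rate
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    return {
        "TP Rate": recall,
        "FP Rate": fp_rate,
        "P": precision,
        "R": recall,
        "F1": f_measure(precision, recall),
    }


# Hypothetical counts with FR as the positive class (not taken from the paper).
metrics = binary_metrics(tp=80, fp=20, fn=20, tn=80)
print({name: round(value, 3) for name, value in metrics.items()})

# Cross-check: P and R of the FR row in experiment (1) of Table 10.
f1_fr_content = round(f_measure(0.766, 0.753), 3)
print(f1_fr_content)
```

The cross-check combines the reported precision and recall of the FR row in experiment (1) and reproduces the reported F1 of 0.759.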
Table 11: Comparison of the proposed method with previous methods to detect Persian rumors based on content-based features analysis.

Method  Twitter (TR, FR, Avg)  Telegram (TR, FR, Avg)
Zamani et al. [22]  Pr  0.568
                    Re  0.780  0.750
                    F1

The performance of the proposed model is compared with two previous works on identifying Persian rumors, by Zamani et al. [22] and Jahanbakhsh et al. [24], on the two datasets Twitter and Telegram. The method proposed by Zamani et al. [22] is based on three sets of content, user, and structural features. To compare our model with Zamani et al. [22], we re-implemented their work based solely on content features (i.e., about 50000 frequent Twitter unigrams). However, when the number of frequent words is taken to be more than 500, the rumors of the FR class, unlike those of the TR class, are not properly classified.

Jahanbakhsh et al. [24] used a set of common content features (such as negative and positive sentiment, negation, uncertainty, and certainty-related words, lexical diversity, pronoun, depth of dependency tree, word and sentence length, punctuation, number of words and sentences, adjective, adverb, and verb) along with the SA feature to detect Persian Telegram rumors. We also evaluated this work on the Persian Twitter rumor dataset [22]. Table 11 shows the results of comparing the proposed model with Zamani et al. [22] and Jahanbakhsh et al. [24] on both available datasets (Table 5). The average F-measure of our model in recognizing Twitter and Telegram rumors is 0.763 and 0.828, respectively. These results are satisfactory compared to the works of both Zamani et al. [22] and Jahanbakhsh et al. [24]. Based on these results, it can be claimed that the SPR criterion has a favorable effect on the classification of rumors when used along with the content and SA features.
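A content-only baseline built on frequent unigrams, as in the re-implementation mentioned above, can be sketched as follows. This is an assumption-laden illustration: tokenization is plain whitespace splitting (real Persian text would need a proper tokenizer and normalizer), the posts are made-up English placeholders, and the helper names are hypothetical.

```python
from collections import Counter


def top_unigrams(documents, k):
    """Return the k most frequent whitespace-separated tokens."""
    counts = Counter(token for doc in documents for token in doc.lower().split())
    return [token for token, _ in counts.most_common(k)]


def to_feature_vector(document, vocabulary):
    """Count how often each vocabulary unigram occurs in the document."""
    counts = Counter(document.lower().split())
    return [counts[token] for token in vocabulary]


# Made-up English placeholder posts; the datasets in the paper are Persian.
posts = [
    "breaking news the bridge is closed",
    "the bridge is open again",
    "is the news true",
]
vocabulary = top_unigrams(posts, k=3)
vectors = [to_feature_vector(post, vocabulary) for post in posts]
print(vocabulary)
print(vectors)
```

The vocabulary size `k` plays the role of the number of frequent words in the comparison; the resulting count vectors would then be passed to a classifier.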
7. Conclusion
The use of strong, emotional, and affective expressions in the content of a document has a significant impact on the power of its publication, especially on social media. Determining the spread power of information available on online media is a new and previously unaddressed task in the field of rumor analysis. The "basic law of rumor" theory about rumor power was proposed by Allport and Postman. This study proposed, for the first time, a content-based model to compute the Spread Power of Rumor (SPR). For this purpose, a set of features was introduced to measure the Ambiguity and Importance of documents. T-test results indicated that these features are effective in distinguishing TRs and FRs. The result of the T-test on SPR was also satisfactory and showed that SPR differs significantly between TRs and FRs. Thereby, the results obtained for the defined problem confirm the basic law of rumor. One of the purposes of evaluating SPR is to use it as a feature in distinguishing rumors. The experimental results showed that the SPR score, along with other content features, can be effective in distinguishing TRs and FRs.
8. Acknowledgements
This project is supported by a research grant of the University of Tabriz (number S/806).
References

[1] N. DiFonzo, P. Bordia, Rumor psychology: Social and organizational approaches, American Psychological Association, Washington, 2007. doi:10.1037/11503-000. URL http://content.apa.org/books/11503-000
[2] Z. Jin, J. Cao, H. Guo, Y. Zhang, Y. Wang, J. Luo, Detection and analysis of 2016 us presidential election related rumors on twitter, in: D. Lee, Y.-R. Lin, N. Osgood, R. Thomson (Eds.), Social, Cultural, and Behavioral Modeling, Springer International Publishing, 2017, pp. 14–24. doi:10.1007/978-3-319-60240-0_2.
[3] S. M. Alzanin, A. M. Azmi, Detecting rumors in social media: A survey, Procedia Computer Science 142 (2018) 294–300, Arabic Computational Linguistics. doi:10.1016/j.procs.2018.10.495.
[4] G. W. Allport, L. Postman, The psychology of rumor, Vol. 257, John Wiley & Sons, Ltd, 1947. doi:10.1002/1097-4679(194710)3:4<402::AID-JCLP2270030421>3.0.CO;2-T. URL https://doi.org/10.1002/1097-4679(194710)3:4%3C402::AID-JCLP2270030421%3E3.0.CO;2-T
[5] Q. Li, Q. Zhang, L. Si, Y. Liu, Rumor detection on social media: Datasets, methods and opportunities (2019). arXiv:1911.07199.
[6] J. Harsin, The rumour bomb: Theorising the convergence of new and old trends in mediated us politics, Southern Review: Communication, Politics & Culture 39 (1) (2006) 84–110. URL https://trove.nla.gov.au/work/39395353?q&versionId=52241152
[7] K. P. K. Kumar, G. Geethakumari, Detecting misinformation in online social networks using cognitive psychology, Human-centric Computing and Information Sciences 4 (1) (2014) 14. doi:10.1186/s13673-014-0014-x. URL https://hcis-journal.springeropen.com/articles/10.1186/s13673-014-0014-x
[8] X. Zhang, A. A. Ghorbani, An overview of online fake news: Characterization, detection, and discussion, Information Processing & Management (2019) 102025. doi:10.1016/j.ipm.2019.03.004.
[9] S. Minaee, N. Kalchbrenner, E. Cambria, N. Nikzad, M. Chenaghlu, J. Gao, Deep learning based text classification: A comprehensive review (2020). arXiv:2004.03705.
[10] C. Castillo, M. Mendoza, B. Poblete, Information credibility on twitter, in: Proceedings of the 20th international conference on World wide web - WWW ’11, ACM Press, New York, New York, USA, 2011, p. 675. doi:10.1145/1963405.1963500. URL http://portal.acm.org/citation.cfm?doid=1963405.1963500
[11] S. Kwon, M. Cha, K. Jung, W. Chen, Y. Wang, Prominent features of rumor propagation in online social media, in: 13th IEEE International Conference on Data Mining (ICDM’2013), Dallas, Texas, U.S.A, IEEE, 2013.
[12] F. Yang, Y. Liu, X. Yu, M. Yang, Automatic detection of rumor on sina weibo, in: Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics - MDS ’12, ACM Press, New York, New York, USA, 2012, pp. 1–7. doi:10.1145/2350190.2350203. URL http://dl.acm.org/citation.cfm?doid=2350190.2350203
[13] V. Qazvinian, E. Rosengren, D. R. Radev, Q. Mei, Rumor has it: identifying misinformation in microblogs, in: Proceedings of the Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2011, pp. 1589–1599. URL https://dl.acm.org/citation.cfm?id=2145602
[14] K. Wu, S. Yang, K. Q. Zhu, False rumors detection on Sina Weibo by propagation structures, in: 2015 IEEE 31st International Conference on Data Engineering, IEEE, 2015, pp. 651–662. doi:10.1109/ICDE.2015.7113322. URL http://ieeexplore.ieee.org/document/7113322/
[15] S. Vosoughi, Automatic detection and verification of rumors on twitter, Ph.D. thesis, Massachusetts Institute of Technology (2015). URL https://dspace.mit.edu/handle/1721.1/98553
[16] S. Wang, T. Terano, Detecting rumor patterns in streaming social media, in: 2015 IEEE International Conference on Big Data (Big Data), 2015, pp. 2709–2715. doi:10.1109/BigData.2015.7364071.
[17] A. Y. M. Floos, Arabic rumours identification by measuring the credibility of arabic tweet content, International Journal of Knowledge Society Research 7 (2) (2016) 72–83. doi:10.4018/ijksr.2016040105.
[18] Z. Zhao, P. Resnick, Q. Mei, Enquiring minds: Early detection of rumors in social media from enquiry posts, in: Proceedings of the 24th International Conference on World Wide Web, WWW ’15, International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, 2015, pp. 1395–1405. doi:10.1145/2736277.2741637. URL https://doi.org/10.1145/2736277.2741637
[19] Y. Liu, S. Xu, Detecting rumors through modeling information propagation networks in a social media environment, IEEE Transactions on Computational Social Systems 3 (2) (2016) 46–62. doi:10.1109/TCSS.2016.2612980.
[20] A. Zubiaga, M. Liakata, R. Procter, Exploiting context for rumour detection in social media, in: International Conference on Social Informatics, Springer, 2017, pp. 109–123.
[21] S. Kwon, M. Cha, K. Jung, Rumor detection over varying time windows, PLOS ONE 12 (1) (2017) 1–19. doi:10.1371/journal.pone.0168344. URL https://doi.org/10.1371/journal.pone.0168344
[22] S. Zamani, M. Asadpour, D. Moazzami, Rumor detection for persian tweets, in: 2017 Iranian Conference on Electrical Engineering (ICEE), IEEE, 2017, pp. 1532–1536. doi:10.1109/IranianCEE.2017.7985287. URL http://ieeexplore.ieee.org/document/7985287/
[23] M. Zarharan, S. Ahangar, F. S. Rezvaninejad, M. L. Bidhendi, S. S. Jalali, S. Eetemadi, M. T. Pilehvar, B. Minaei-Bidgoli, Persian stance classification dataset.
[24] Z. Jahanbakhsh-Nagadeh, M.-R. Feizi-Derakhshi, A. Sharifi, A speech act classifier for persian texts and its application in identifying rumors, Journal of Soft Computing and Information Technology (JSCIT) 9 (1) (2020).
[25] A. Kumar, S. R. Sangwan, A. Nayyar, Rumour veracity detection on twitter using particle swarm optimized shallow classifiers, Multimedia Tools and Applications 78 (17) (2019) 24083–24101. doi:10.1007/s11042-019-7398-6. URL https://link.springer.com/article/10.1007/s11042-019-7398-6
[26] L. Zeng, K. Starbird, E. S. Spiro, Rumors at the speed of light? modeling the rate of rumor transmission during crisis, in: 2016 49th Hawaii International Conference on System Sciences (HICSS), IEEE, 2016, pp. 1969–1978. doi:10.1109/HICSS.2016.248. URL http://ieeexplore.ieee.org/document/7427429/
[27] B. Doerr, M. Fouz, T. Friedrich, Why rumors spread so quickly in social networks, Communications of the ACM 55 (6) (2012) 70. doi:10.1145/2184319.2184338. URL http://dl.acm.org/citation.cfm?doid=2184319.2184338
[28] S. D. Mahmoodabad, S. Farzi, D. B. Bakhtiarvand, Persian rumor detection on twitter, in: 2018 9th International Symposium on Telecommunications (IST), 2018, pp. 597–602.
[29] L. Zhou, J. K. Burgoon, J. F. Nunamaker, D. Twitchell, Automating linguistics-based cues for detecting deception in text-based asynchronous computer-mediated communications, Group Decision and Negotiation 13 (1) (2004) 81–106. doi:10.1023/B:GRUP.0000011944.62889.6f. URL http://link.springer.com/10.1023/B:GRUP.0000011944.62889.6f
[30] S. Hamidian, M. T. Diab, Rumor detection and classification for twitter data (2019). arXiv:1912.08926.
[31] S. M. Mohammad, P. D. Turney, Crowdsourcing a word-emotion association lexicon, Computational Intelligence 29 (3) (2013) 436–465. doi:10.1111/j.1467-8640.2012.00460.x. URL http://doi.wiley.com/10.1111/j.1467-8640.2012.00460.x
[32] A. Golfam, A. Afrashi, G. Moghadam, Conceptualization of the persian simple verbs of motion: a cognitive approach, Journal of Language and Western Iranian Dialects 1 (3) (2014) 103–122. URL http://jlw.razi.ac.ir/article_275_en.html
[33] S. Kwon, M. Cha, K. Jung, W. Chen, Y. Wang, Aspects of rumor spreading on a microblog network, in: International Conference on Social Informatics, Springer, Cham, 2013, pp. 299–308. doi:10.1007/978-3-319-03260-3_26. URL http://link.springer.com/10.1007/978-3-319-03260-3_26
[34] C. R. Sunstein, On rumors: how falsehoods spread, why we believe them, and what can be done, Princeton University Press, 2014.
[35] W. Zhang, S. Skiena, Trading Strategies to Exploit Blog and News Sentiment, in: Fourth International Conference on Weblogs and Social Media, ICWSM 2010, Washington, DC, USA, 2010.
[36] What Makes a Story Newsworthy?
[37] Z. Zhang, Z. Zhang, H. Li, Predictors of the authenticity of internet health rumours, Health Information & Libraries Journal 32 (3) (2015) 195–205. doi:10.1111/hir.12115. URL http://doi.wiley.com/10.1111/hir.12115
[38] H. Moradi, F. Ahmadi, M.-R. Feizi-Derakhshi, A hybrid approach for persian named entity recognition, Iranian Journal of Science and Technology, Transactions A: Science 41 (1) (2017) 215–222. doi:10.1007/s40995-017-0209-x. URL http://link.springer.com/10.1007/s40995-017-0209-x
[39] N. K. U. Nkpa, Rumor mongering in war time, The Journal of Social Psychology 96 (1) (1975) 27–35. doi:10.1080/00224545.1975.9923258.
[40] J. Kennedy, Particle swarm optimization, in: Encyclopedia of Machine Learning, Springer US, Boston, MA, 2011, pp. 760–766. doi:10.1007/978-0-387-30164-8_630.
[41] M. Ghaemi, M.-R. Feizi-Derakhshi, Forest optimization algorithm, Expert Systems with Applications 41 (15) (2014) 6676–. doi:10.1016/J.ESWA.2014.05.009.
[42] A.-R. Feizi-Derakhshi, M.-R. Feizi-Derakhshi, M. Ranjbar-Khadivi, N. Nikzad–Khasmakhi, M. Ramezani, T. Rahkar Farshi, E. Zafarani-Moattar, M. Asgari-Chenaghlu, Z. Jahanbakhsh-Nagadeh, Sepehr RumTel01 (jan 2019). doi:10.17632/JW3ZWF8RDP.2. URL https://data.mendeley.com/datasets/jw3zwf8rdp/2