Uncritical polarized groups: The impact of spreading fake news as fact in social networks

Abstract

The spread of ideas in online social networks is crucial to understanding the current proliferation of fake news and its impact on democracies. This makes it necessary to use models that mimic the circulation of rumors. The law of large numbers as well as the probability distribution of contact groups allow us to construct a model with a minimum number of hypotheses. Moreover, we can analyze with this model the presence of very polarized groups of individuals (humans or bots) who spread a rumor as soon as they know about it. Given only the initial number of individuals who know any news, in a population connected by an instant messaging application, we first deduce from our model a simple function of time to study the rumor propagation. We then prove that the polarized groups can be detected and quantified from empirical data. Finally, we also predict the time required by any rumor to reach a fixed percentage of the population.


Jesús San Martín (a), Fátima Drubi (b), Daniel Rodríguez Pérez (c)

(a) Dept. Matemáticas del Área Industrial, ETSIDI, Universidad Politécnica de Madrid, Madrid, Spain
(b) Dept. Matemáticas, Facultad de Ciencias, Universidad de Oviedo, Oviedo, Spain
(c) Dept. Física Matemática y de Fluidos, Facultad de Ciencias, Universidad Nacional de Educación a Distancia, Madrid, Spain


Keywords: online social network, fake news, rumor propagation, uncritical senders group

Email address: [email protected] (Jesús San Martín)

Preprint submitted to Elsevier, October 18, 2019. arXiv [cs.SI]

1. Introduction

The extraordinary global increase in online social network usage, and the ease of sending messages ubiquitously and almost instantaneously, have provided a fertile ground for the dissemination of fake news [1, 2]. In addition, the use of these platforms can inevitably have deep social and political consequences. For instance, a disinformation campaign became a topic of wide public concern during the Brexit referendum in the UK, as well as during the US presidential election, both in 2016 [3, 1, 4].

The seriousness of the situation due to the current and growing phenomenon of fake news lies not only in the speed and ease of reaching broad and very different strata of society but also in the negligible cost of this new form of worldwide interactive communication. This is a novel and simultaneously complicated situation that makes it necessary to study rumor propagation models in order to design appropriate countermeasures and avoid, for example, their potential impact on the destabilization of liberal democracies [2].

Nevertheless, online social networks cannot be considered solely as mere media responsible for the propagation of fake news, since they also work in the opposite way, spreading real news. For this reason, understanding how ideas spread in social networks, as an element of both economic and political marketing, becomes a fundamental task. Companies, as well as political parties, that are interested in promoting their products or disseminating their ideas to the population, can benefit from the use of new and cheaper communication channels based on mobile applications such as WhatsApp (with millions of users worldwide), which are much more profitable than traditional marketing strategies, publicity campaigns, etc.

In order to avoid the destabilizing effects due to the spread of fake news, it is necessary to provide a suitable propagation model of rumors. In this sense, it is essential to first identify the fundamental variables that characterize this phenomenon.
Given that there are great similarities between the spread of news and rumors and the spread of infectious diseases, a classic approach to study how information is disseminated is to define epidemiological models of populations [5, 6]. Additionally, other studies focused on identifying the most efficient spreaders in a network [7]. In [8], two models were presented with the assumption that either spreaders are not always active or an ignorant is not interested in spreading the rumor. The effect of homogeneity and polarization, i.e. echo chambers, in the spreading of misinformation online was studied in [9]. Finally, why rumors spread so quickly in social networks was analyzed in [10]. It should be noted that an element common to all these different approaches is that it is necessary to assume certain more or less believable hypotheses.

In this study, we present a news dissemination model based on a probabilistic approach that allows us to use the law of large numbers to determine the probability function of sending messages. To the best of our knowledge, there is no similar result for rumor spread models and we can only refer to the discussion in [11]. One of our main contributions is that additional unrealistic hypotheses are not necessary. Therefore, we provide a more general rumor propagation model, which is also robust, without any limitation due to simplifications in the hypotheses.

In addition to reducing the number of necessary hypotheses, another major issue when modelling is to obtain a faithful description of the reality that, in the particular case of social networks, is manifested by its heterogeneity. In our model, this heterogeneity is reflected in the fact that people are related through groups, with different sizes and not with the same sensitivity to the propagation of a message. In particular, the strongly polarized like-minded population groups are one of the key elements to amplify the spread of rumors.
People belonging to this type of group will forward all kinds of news (false or not) with the sole criterion that they are related to their own ideological line.

It is interesting to note that news dissemination models hardly take into account the different reactions that people show when faced with true and false news. In this sense, there are two underlying problems for any model: the first refers to the different reaction of people to the veracity/falsity of the news and the second is related to how the certainty of the news is validated. In order to give an answer to these problems with our model, we simply focus on the propagation of the message. Thus, the authenticity or falsity of the message will be determined solely by the probability of propagation. Note that, except in very specific cases such as "the president has been killed", the information is judged according to the sender and is considered true based only on the idiosyncrasy of the person who receives it (as it will be by those people who belong to the same ideological line). On the contrary, that information will be considered false by those who have a different ideology, opposed to the sender's. The continuous variations of the ideological spectrum in the population imply the need to constantly evaluate the truth/falsity of a message.
This phenomenon can be perfectly described by means of the different probabilities of propagation of the message: the more inclined the population is towards the conservatives, the greater the probability of spreading news that damages the image of the liberals and vice versa.

In fact, the probability of news propagation is a fundamental parameter when modeling the dissemination of ideas, since, if most of the individuals in a network propagate a message with low probability but a group, which we call the uncritical senders group (USG), does so uncritically with probability 100%, the USG will be identified as a very polarized group.

The USG label is surely well defined in the sense that humans, not robots, accelerate the spread of false news more than the truth [12]. The fact that people react differently when they receive the news, mainly depending on whether or not they have an ideological affinity, makes the probability of news propagation a key parameter to present and analyze our results.

We must be aware that social networks are dynamic, that is, they are constantly changing. Because of this, the proposed models can only be useful if they can show how the spread of news changes when the relevant parameters of those models vary continuously. That information must be provided, as we will do, with an analytic formulation. Therefore, we present a predictive model that can be used to eliminate or mitigate the dangerous consequences of spreading fake news such as, for instance, biasing the vote in government elections, misallocating resources after natural disasters or terrorist actions, misguiding investment measures after a stock market crash, etc.
All this has a great political, social and economic impact, due to the fact that the number of people who are currently only informed of the news through their social networks is increasing.

Finally, given that it is a well-known fact that news does not necessarily have to be disseminated from a single source, but that there may be different sources from which the same news spreads in cascade to the entire social network, we work with our propagation model of rumors but using different initial conditions, that is, different seeds are considered in the population. These seeds represent different groups of individuals that initiate the propagation of the same rumor in the network.

Our results show how the distributions of the propagation probability of news in social networks change over time as a function of polarized groups of uncritical senders. Particularly, we found that the probability of spreading rumors varies from an exponential evolution to a logistic one. As a result, simply by observing how fake news is spread in a social network, we can detect with our model the presence of a polarized group of uncritical individuals, which becomes a very useful tool to design countermeasures to deactivate such groups, if necessary.

The rest of this paper is structured as follows. In section 2, we present a model of rumor propagation in a social network based on WhatsApp. This is a numerical model defined from empirical data. In particular, the standard distributions of the number of person-to-person contacts and of group sizes were estimated experimentally. In addition, the key parameters that allow us to analyze the phenomena of spreading rumors are identified and their values are also estimated. In section 3, analytic expressions that fit the numerical simulations are deduced. Based on these expressions, a theoretical model is proposed that captures the observed behavior and sets the basis to interpret the dynamics that led to those results.
In section 4, we discuss the potential applications of our model and its predictive capabilities. We conclude the paper in section 5.

2. Methodology

Taking into account that WhatsApp is one of the most popular messaging applications all over the world (e.g. it reaches 70% of the total population in Spain), we simulate our rumor propagation model in a social network based on this application. In order to properly simulate message spreading over this network, we must distinguish two types of contacts, person-to-person and groups. We also need to know the standard distributions of the number of person-to-person contacts and of group sizes. Both were estimated experimentally as described below.

A sample of 150 college students (age range 18-20 years) was used to obtain the statistical distribution of person-to-person contacts and groups in the WhatsApp network. Individuals were asked about the number of contacts in their cell phone contact lists, the number of WhatsApp groups with three or more members and the sizes of those groups (number of members including themselves). Moreover, they were asked about the number of frequent individual WhatsApp contacts, defined as those to whom they would text a controversial message they learned from any source.

In any contact list, there are many contacts that do not belong to any group (meaning by group a set of persons linked by a common interest). Hence, we posed the following question: in case they learned about a particularly "hot news", to which of their contacts would they expressly text to comment on it? Obviously, that "hot news" would not be sent to their attorney or plumber and, depending on the topic, neither to a relative. Thus, we define the groups of size two as those contacts in the contact list that, although they may be members of any WhatsApp group, would receive a message directly from the user about a news item considered especially relevant. Only these size-two groups constitute the set of person-to-person links of each WhatsApp user.

From our sample data, we find that the number $m$ of person-to-person contacts an individual has follows approximately a normal distribution

$$N(m, \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(m-\mu)^2}{2\sigma^2}} \tag{1}$$

where $\mu = 7.35$ and $\sigma = 4.[...]$. The distribution of the size $N$ of WhatsApp groups with $3 \le N \le 30$ members (that included the majority of our samples) fits an exponential distribution

$$E(N) = 1 - \exp(-\lambda (N - a)) \tag{2}$$

where coefficients are adjusted from the data as $\lambda = 0.[...] \pm [...]$ and $a = 1.[...] \pm [...].12$ ($r = 0.[...]$).
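As a rough illustration of how equations (1) and (2) can be evaluated, the following Python sketch computes the normal density for the number of person-to-person contacts and the share of groups of each size obtained from the exponential fit. Only $\mu = 7.35$ comes from the text; `SIGMA`, `LAMBDA` and `A_SHIFT` are placeholder values assumed purely for illustration, and all function names are hypothetical.

```python
import math

MU = 7.35       # mean number of person-to-person contacts (from the text)
SIGMA = 4.0     # ASSUMED placeholder: the fitted decimals are not available here
LAMBDA = 0.1    # ASSUMED placeholder for the rate in equation (2)
A_SHIFT = 1.5   # ASSUMED placeholder for the offset a in equation (2)


def normal_pdf(m, mu=MU, sigma=SIGMA):
    """Equation (1): normal density of the number m of person-to-person contacts."""
    return math.exp(-((m - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


def group_size_cdf(n, lam=LAMBDA, a=A_SHIFT):
    """Equation (2): fraction of groups whose size is <= n."""
    return 1.0 - math.exp(-lam * (n - a))


def group_size_shares(n_min=3, n_max=30):
    """Share of groups of each size i, taken as the difference E(i) - E(i - 1)."""
    return {i: group_size_cdf(i) - group_size_cdf(i - 1) for i in range(n_min, n_max + 1)}


shares = group_size_shares()
```

With an exponential fit of this kind, small groups dominate: the share assigned to each size decreases as the size grows.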

Figure 1: WhatsApp empirical group size distribution (left) and the goodness of fit of the cumulative distribution function (right) to an exponential ($r = 0.[...]$). $N$ represents the number of individuals in a group ($3 \le N \le 30$), $E(N)$ represents the empirical distribution function, i.e. the fraction of groups of size $\le N$, and $E_{est}(N)$ is the estimation of $E(N)$.

In this paper, we introduce a novel model where individuals are linked either by person-to-person relations or by belonging to the same WhatsApp group. These links fit the distributions obtained in section 2.1. It is worth noticing that, initially, a given fraction of the population, named the seed, knows a rumor. This rumor may spread to other linked individuals at each iteration during the numerical simulations. The rumor propagation proceeds iteratively taking into account two main rules. First, the individual propagation probability at each iteration is given by a uniform distribution. Second, there may also be a group of individuals, the so-called uncritical senders group, which always propagates the rumor at each iteration, i.e. with an individual propagation probability of 100%.

As the numerical simulations were repeated independently a large number of times and the results were then averaged, the law of large numbers guarantees that the final law of distribution followed by the rumor propagation is determined as well as the time required for a rumor to reach a given fraction of the population.
An important feature of the model is that it shows when a population is polarized regarding a given rumor.

For the sake of enabling a better understanding of the model that is developed in detail below, we provide here some useful definitions:

Burned: A person who knows the rumor is said to be a burned individual. Observe that if an individual is burned, that individual remains in that state until the end of the rumor propagation process.

Sender: A person who knows the message (a burned individual) and texts it to any of his contacts is called a sender.

Receiver: Any individual (burned or not) in the population that is reached by the message is said to be a receiver.

Seed: All those individuals who know the rumor at the beginning of its propagation constitute the seed, i.e. the set of burned individuals at the initial time $t = 0$.

Person-to-person relation: A direct link between two individuals of the population is called a person-to-person relation. This relationship is singled out from the group relationship because it represents a one-way link. If an individual $i$ is connected with another individual $j$, then $j$ may receive text messages sent by $i$, but this does not imply that individual $j$ could also send messages to $i$.

Group: The set of three or more individuals in the population that are interconnected, meaning that what one of the individuals sends to the group is received by all others in the group simultaneously.

Individual propagation probability ($P_{IP}$): The probability of the rumor being sent by a burned individual, who knows the rumor, to one of its contacts (person-to-person probability) or groups in the WhatsApp network.

Individual initial probability ($P_{II}$): The probability that individuals are part of the seed. That is, the probability that an individual knows the message at the beginning of the process of spreading the rumor.

Uncritical senders group (USG): The set of individuals who automatically send the message to all of their contacts when they receive it is named the uncritical senders group. Hence, $P_{IP}$ is always equal to 100% for individuals in the USG.

USG membership probability ($P_{USG}$): The probability that individuals belong to the USG. It represents the fraction of the population that forms the USG. The size of the USG is fixed as an initial condition; once it is fixed and its members have been randomly chosen, it does not change.

The model consists of two main steps. First, an algorithm generates populations of connected individuals as described in section 2.1. Second, another algorithm simulates the spreading of rumors among these populations.

2.3.1. Populations

In order to simulate a network with the statistical properties described in section 2.1, we proceeded to implement an algorithm that reproduces that structure. That is, given a population of $N_p$ individuals, we establish connections between pairs of individuals following the normal distribution (1) and also generate groups of 3 to 30 interconnected individuals with group sizes given by (2).

When it comes to group construction, we will separate groups formed by just 2 individuals from groups formed by 3 or more individuals; the reason is that in two-person groups connections are not bidirectional (as we explained above), whereas in groups with 3 or more individuals they are.

We will assume a normal distribution (1) for the number of 2-groups an individual belongs to, having the mean and standard deviation fitted from our sample. From that distribution, the number of individuals $n_m$ that are expected to have a number $m$ of person-to-person connections was computed, for $m = 0, 1, \ldots, 30$. That is, first the number of individuals in the population having zero person-to-person contacts, $n_0 = N(0, \mu, \sigma)$, was computed and then, recursively, the number of individuals having $m$ person-to-person contacts, $n_m = N(m, \mu, \sigma) - N(m-1, \mu, \sigma)$ for $m = 1, \ldots, 30$. The $n_m$ individuals having $m$ contacts were chosen randomly from the population, linking them to other individuals also randomly chosen. Notice that these relationships, by their very nature, are not bidirectional given that they are not proper WhatsApp groups: the initial individual will send messages to the destination individual, but this does not have to work in the reverse direction.

To compute the number of groups of size 3 or more an individual belongs to, we will use the exponential distribution in (2) with the parameters fitted from our sample. Specifically, the number of groups of size $i$ is calculated as $E(i) - E(i-1)$ for $i = 3, 4, \ldots, N$. Once the number of groups of each size (from 3 to 30 individuals) is known, the individuals of the population belonging to each group are randomly chosen.

Thirty populations with 10000 individuals each were simulated following the previous procedure, taking into account that only 70% of the individuals (the penetration of WhatsApp in the Spanish population) should be connected either person-to-person or to groups of sizes 3 to 30. Each of these simulations provided a network of connections between individuals of the population.

Once the population has been simulated (see above) the rumor spreads among its individuals according to the following rules:

1. The fraction of the population that knows the rumor, that is, the initial seed, is chosen initially from the connected population according to $P_{II}$.
2. One individual belonging to the seed is randomly chosen.
3. The groups to which that individual belongs are identified.
4. That individual will pass the rumor to each of these groups with a probability $P_{IP}$; for this, a uniformly distributed random value in $[0, 1]$ is compared with $P_{IP}$.

Steps 2 to 4 are repeated until all seed individuals are exhausted.
All the individuals that know the rumor become part of the seed, and the propagation process is repeated 100 times; this number of iterations is chosen because in most simulations the rumor reaches the entire population before that time.

Once the initial seed and the USG have been chosen, according to the individual initial probability ($P_{II}$), the uncritical senders group membership probability ($P_{USG}$), and the individual propagation probability ($P_{IP}$) that characterize the initiation and propagation of the messages on the social network, the algorithm proceeds according to the following rules:

i) The rumor is propagated among the individuals in the population for 100 iterations (the simulation unit of time).
ii) At each iteration, every burned individual in the population, that is, every individual who knows the rumor, will communicate it to each of its contacts with probability $P_{IP}$.
iii) If an individual belongs to the USG and is burned, it will propagate the message to all of its contacts (that is, $P_{IP}$ equals 1 for individuals in the USG).
iv) After every iteration, burned individuals (and WhatsApp groups) are recorded and counted.

Given that message transmission is a random process, the message spreading algorithm is run 50 times on each population, always using the same initial seed for that population. The result of each run is the series of numbers of burned individuals at each iteration. The final result is summarized as the average of these 50 series. Along the same lines, given that each population is computed as a random sample from a probability distribution, in section 3 we will work with the results obtained by averaging over the 30 populations, in order to characterize the temporal evolution of the average number of burned individuals for each set of parameters $P_{IP}$, $P_{II}$ and $P_{USG}$.
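The procedure described in this section can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses a small population, draws the number of contacts per person directly from a rounded normal variate, ignores the 70% connectivity constraint, and performs a single run instead of averaging over 50 runs and 30 populations. The values of `SIGMA`, `LAMBDA` and `A_SHIFT` are placeholders, and the function names `build_population` and `spread` are hypothetical.

```python
import math
import random

MU, SIGMA = 7.35, 4.0        # SIGMA is an ASSUMED placeholder value
LAMBDA, A_SHIFT = 0.1, 1.5   # ASSUMED placeholders for the fit in equation (2)


def build_population(n_people, n_groups, rng):
    """Return (contacts, groups, member_of) for one random network."""
    # Directed person-to-person links: i can text j, not necessarily the reverse.
    contacts = []
    for i in range(n_people):
        m = min(30, max(0, round(rng.gauss(MU, SIGMA))))
        others = [j for j in rng.sample(range(n_people), m + 1) if j != i][:m]
        contacts.append(others)

    # Groups of 3 to 30 members, with sizes weighted by E(s) - E(s - 1).
    sizes = list(range(3, 31))
    cdf = lambda s: 1.0 - math.exp(-LAMBDA * (s - A_SHIFT))
    weights = [cdf(s) - cdf(s - 1) for s in sizes]
    groups = [rng.sample(range(n_people), rng.choices(sizes, weights)[0])
              for _ in range(n_groups)]

    member_of = [[] for _ in range(n_people)]
    for g, members in enumerate(groups):
        for person in members:
            member_of[person].append(g)
    return contacts, groups, member_of


def spread(contacts, groups, member_of, p_ii, p_ip, p_usg, n_iter, rng):
    """Return the fraction of burned individuals after each iteration."""
    n = len(contacts)
    burned = [rng.random() < p_ii for _ in range(n)]       # initial seed
    uncritical = [rng.random() < p_usg for _ in range(n)]  # USG membership, fixed
    history = [sum(burned) / n]
    for _ in range(n_iter):
        reached = set()
        for i in range(n):
            if not burned[i]:
                continue
            p = 1.0 if uncritical[i] else p_ip  # USG members always forward
            for j in contacts[i]:
                if rng.random() < p:
                    reached.add(j)
            for g in member_of[i]:
                if rng.random() < p:
                    reached.update(groups[g])   # everyone in the group receives it
        for j in reached:
            burned[j] = True                    # burned individuals stay burned
        history.append(sum(burned) / n)
    return history


rng = random.Random(0)
network = build_population(500, 100, rng)
history = spread(*network, p_ii=0.03, p_ip=0.02, p_usg=0.05, n_iter=100, rng=rng)
```

Averaging `history` over repeated runs and over several generated networks would approximate the averaged series that the paper works with in section 3.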

3. Numerical results

To proceed with the simulations, the parameter space $P_{II} \times P_{IP} \times P_{USG}$ is sampled at 500 points given by the tuples $(u, v, w)$ where $u, v \in \{k \cdot 10^{-2} : k = 1, 2, \ldots, 10\}$ and $w \in \{k \cdot 10^{-2} : k = 0, 3, 5, 7, 10\}$. Each of these 500 points characterizes individuals and groups in a simulated population. For each of the 500 points, 30 such populations with 10000 individuals each were randomly generated following the algorithm described in 2.3. Notice that although these populations are different, they are statistically the same because they were generated using the same parameters at the same point of the parameter space. For each of these 30 populations, a news item was spread over the network during 100 iterations, and this random process was repeatedly simulated 50 times, choosing a different initial condition for each of the 30 populations (the seed individuals who know the information at the beginning of the propagation). The propagation time series was computed as the average of the 50 simulations for a given population, resulting in 30 averaged spreading evolutions corresponding to the 30 statistically identical populations. Finally, the average over the 30 populations was computed as the propagation time series corresponding to one of the 500 points in the parameter space.

The goal of this numeric modeling is to fit an analytic expression that summarizes the spreading of the news corresponding to the model described above.

3.1. $P_{USG} = 0$

To see the effect polarized groups have on news spreading, it was first necessary to know how news spreads without those groups ($P_{USG} = 0$).
In this section these results are shown and an analytic expression is fitted to them.

The analysis is based on data from simulations obtained as the average number of burned individuals at the $n$-th iteration taking into account all populations (30) and all 50 simulations performed on each population. That is, the behavior of the function corresponding to the average number of burned individuals at each iteration is studied:

$$f(n) = \frac{\text{Total of burned individuals at the } n\text{-th iteration}}{\text{no. of populations} \times \text{no. of simulations} \times N_p}$$

where $N_p$ is the number of individuals of the population connected by WhatsApp, and $n = 1, \ldots, 100$. This function is studied for values of $P_{II} \in [0, 0.1]$ and, for each of these values, the behavior for different values of $P_{IP} \in [0, 0.1]$ is also studied.

The simulation results $f(n)$ (see Figure 2) picture the average evolution of the fraction of burned individuals after successive iterations (indexed by $n$) that approach a continuous function $F(t)$ dependent on time $t$.

For values of $P_{II}$ around 10%, function $F(t)$ is well approximated by an exponential of the form

$$F_{exp}(t) = 1 - A \exp(-t/a)$$

whereas for smaller values of $P_{II}$ a better fit is obtained using a logistic function

$$F_{log}(t) = \frac{1}{1 + B \exp(-t/a)}$$

Therefore, we construct a function that fits the data in the entire range by changing from one form to the other through a parameter $\varepsilon$:

$$F(t) = \frac{C \exp[(1+\varepsilon)(t-b)/a] - 1}{\frac{2\varepsilon}{1-\varepsilon} + C \exp[(1+\varepsilon)(t-b)/a]} \tag{3}$$

where $a$, $b$, $\varepsilon$ and $C$ are the fitted coefficients whose values depend on simulation parameters $P_{II}$ and $P_{IP}$ (how, will be shown in what follows). Coefficient $a$ represents a characteristic time scale. Coefficient $b$ has the meaning of a time origin, and thus can be taken as zero.

Figure 2: Average number of burned individuals $f(n)$ at each iteration $n$ for different values of $P_{IP}$ (left: $P_{IP}$ = 1, 3, 5, 7, 9%, with $P_{II}$ = 3%) and $P_{II}$ (right: $P_{II}$ = 1, 3, 5, 7, 9%, with $P_{IP}$ = 2%). Fixed parameter: $P_{USG} = 0$. The blue lines correspond to the fitting expression (3) in each case.

To fulfill the initial condition $F(t = 0) = P_{II}$ (the initial fraction is just the seed), $C$ had to be taken as a function of $a$, $b$, $\varepsilon$ and $P_{II}$ given by:

$$C(a, b, \varepsilon) = \frac{1 + \frac{2\varepsilon}{1-\varepsilon} P_{II}}{1 - P_{II}} \exp[(1+\varepsilon)\, b/a] \tag{4}$$

From the fitting of the simulation data for different $P_{II}$ and $P_{IP}$, an expression is given for $\varepsilon$ in the form:

$$\varepsilon = 1 + aa\, P_{II}^{bb}\, (1/a) \tag{5}$$

where $aa$ is less than zero, in order to produce an $\varepsilon$ between 0 and 1.
Likewise, $a$ is fitted to:

$$1/a = cc\, P_{IP}^{ee} \tag{6}$$

which leads to a characteristic time $a$ tending to infinity when the probability $P_{IP}$ tends to zero (that is, if the propagation probability is very small, the time a rumor will take to spread over the entire network will become very large). The values of $aa$, $bb$, $cc$ and $ee$ can be found in table 1 (for $P_{USG} = 0$).

The power law (6) did not provide a good collapse of all the points towards the fitting curve (see Fig. 5a). Thus, in order to achieve that collapse (see Fig. 5b), a factor $(1 + gg\, P_{II})$ is considered to correct for the small dependence that $a$ showed on $P_{II}$, resulting in an expression of the form

$$1/a = cc\, P_{IP}^{ee}\, (1 + gg\, P_{II}) \tag{7}$$

Inserting into (3) the expressions for $C$, $\varepsilon$ and $a$ given by (4), (5) and (7), an expression for $F(t)$ is obtained.

It is remarkable that all the resulting expressions depend exclusively on $P_{IP}$, $P_{II}$ and $P_{USG}$, so that the number of burned individuals at a given iteration $n$ is given by (3) in terms of just these parameters. Hence, the number of burned individuals can be computed at any time, given a point ($P_{USG}$, $P_{IP}$, $P_{II}$) in the parameter space, or vice versa, that point in the parameter space can be computed once the time required to burn a given fraction of the connected population has been fixed.

Figure 3: Average number of burned individuals $f(n)$ at each iteration $n$ for different values of $P_{USG}$ (1, 3, 5, 7, 10%). Fixed parameters: $P_{II}$ = 2% and $P_{IP}$ = 1%. The results of fitting the simulation data to the analytic expression (3) are also shown (blue lines).

3.2. $P_{USG} \neq 0$: Effect of the uncritical senders group

Assuming the presence of very polarized groups among the population, what we call the uncritical senders group, it is necessary to see how their existence modifies the spread of news just described. The introduction of the USG affects the evolution of the fraction of the population a rumor reaches at a given time, as the simulations show. Some results of these numerical simulations are shown in Figure 3, together with their respective fitting to expressions formally identical to (3), which shows its validity also for $P_{USG} \neq 0$.

The relationship (5) between $\varepsilon$ and $1/a$ obtained above for $P_{USG} = 0$ is still valid for $P_{USG} \neq 0$, according to the numerical results (see Figure 4 and table 1). Also expression (7) for $1/a$ in terms of $P_{IP}$ and $P_{II}$ remains valid (see Figure 5 and table 1).

Figure 4: Fitted values of the coefficient $\varepsilon$ with respect to $1/a$ for different values of parameter $P_{II}$ (1, 3, 5, 7, 9%). Fixed parameter: $P_{USG}$ = 3%. The graph of the fitted expression relating both coefficients with $P_{IP}$ is also shown (blue lines).

Nevertheless, considering a USG in the network makes the coefficients $aa$, $bb$, $cc$, $ee$ and $gg$ in equations (5) and (7) become functions of $P_{USG}$. The functional expressions of $aa$, $bb$, $cc$, $ee$ and $gg$ were derived from the numerical results (for $P_{USG} \in [0, 0.1]$):

$$\begin{aligned}
aa &= aa_0 + aa_1 P_{USG} + aa_2 P_{USG}^2 \\
bb &= bb_0 + bb_1 P_{USG} / (1 + bb_2 P_{USG}) \\
cc &= cc_0 + cc_1 P_{USG} + cc_2 P_{USG}^2 \\
ee &= ee_0 + ee_1 P_{USG} + ee_2 P_{USG}^2
\end{aligned} \tag{8}$$

where the values of the new coefficients are given in Table 2. Substituting expressions (8) in equations (5) and (7), and these in equations (4) and (3), the general equation for the evolution of $F(t)$ is obtained, exclusively in terms of the parameters $P_{IP}$, $P_{II}$ and $P_{USG}$, as was our goal.

Figure 5: Left: Fitted values of the coefficient $1/a$ for different values of $P_{IP}$ and $P_{USG}$ (0, 3, 5, 7, 10%) (and of $P_{II}$). The graph of the power law expression (6) is also shown (blue lines). Right: Values of $h(P_{II}) = \frac{1/a}{1 + gg\, P_{II}}$ for different values of $P_{IP}$ and $P_{USG}$ (0, 3, 5, 7, 10%). Taking into account the effect of $P_{II}$ on $1/a$, the correction given by (7) is shown (green lines). Notice how the data now collapse to the modified power law.
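To make the chain of substitutions concrete, the following Python sketch evaluates $F(t)$ from a characteristic time $a$, a shape parameter $\varepsilon$ and a seed fraction $P_{II}$. It assumes that expression (3) reads $F(t) = (C e^{(1+\varepsilon)(t-b)/a} - 1) / (\tfrac{2\varepsilon}{1-\varepsilon} + C e^{(1+\varepsilon)(t-b)/a})$ with $C$ fixed by $F(0) = P_{II}$ as in (4). The fitted values of $aa$, $bb$, $cc$, $ee$ and $gg$ belong to tables that are not reproduced here, so the numbers passed to `fitted_coefficients` below are invented placeholders; both function names are hypothetical.

```python
import math


def growth_curve(t, a, eps, p_ii, b=0.0):
    """Fraction of burned individuals F(t), following expressions (3) and (4).

    Valid for 0 <= eps < 1; eps = 0 reduces to the exponential form
    1 - (1 - p_ii) * exp(-t / a).
    """
    beta = 2.0 * eps / (1.0 - eps)
    # Expression (4): C chosen so that F(0) equals the seed fraction p_ii.
    c = (1.0 + beta * p_ii) / (1.0 - p_ii) * math.exp((1.0 + eps) * b / a)
    growth = c * math.exp((1.0 + eps) * (t - b) / a)
    return (growth - 1.0) / (beta + growth)


def fitted_coefficients(p_ii, p_ip, aa, bb, cc, ee, gg):
    """Expressions (7) and (5): return the pair (a, eps)."""
    inv_a = cc * p_ip ** ee * (1.0 + gg * p_ii)
    eps = 1.0 + aa * p_ii ** bb * inv_a
    return 1.0 / inv_a, eps


# Placeholder coefficients (NOT the fitted values from the paper's tables).
a, eps = fitted_coefficients(p_ii=0.03, p_ip=0.02, aa=-1.0, bb=0.5, cc=2.0, ee=1.0, gg=1.0)
curve = [growth_curve(t, a, eps, p_ii=0.03) for t in range(101)]
```

The curve starts at the seed fraction and increases monotonically towards 1; values of $\varepsilon$ close to 1 give a logistic shape, while values close to 0 give an exponential approach to saturation.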

In this section we present a theoretical model that captures the behaviors described above. To that end, we interpret our discrete time model in terms of a continuous time model that may be derived from a differential equation. It is enough to note that $P_{IP}$ is a probability per unit time of a message being propagated by one individual to another or to its group, the unit of time being the time separating one iteration from the next. Expression (3), describing the fraction of burned individuals as a function of time, turns out to be a solution of a differential equation of the form

$$\frac{dF}{dt} = A\,[\,G + F\,]\,(1 - F) \qquad (9)$$

where $A$ is a constant fixed by $\varepsilon$ and $a$, and $G$ is a constant fixed by $\varepsilon$. If this equation is interpreted as a law of mass action, the coefficient $A$ would stand for the spreading velocity towards the unburned population $(1 - F)$ of news originating in the burned population $F$ plus an “invisible” population given by the additional term $G$.

In order to find out the meaning of this “invisible” population (which is constant along the spreading process), let us consider the limiting case for $P_{II}$ in which $\varepsilon \to 1$. Then, employing (5), equation (9) keeps the same form, with a prefactor fixed by $1/a$ and with $G$ expressed through $aa$, $bb$, $P_{II}$ and $1/a$; that is, the “invisible” population is proportional to the seed population that knows the rumor at $t = 0$.
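The mass-action reading can be checked numerically. The following is a minimal sketch, assuming only the generic form $dF/dt = A\,(G + F)(1 - F)$ with constant $A$ and $G$; the function names `closed_form` and `rk4` and every numeric value are illustrative assumptions, not coefficients fitted in this work. The substitution $u = (G + F)/(1 - F)$ turns the equation into $du/dt = A(1 + G)\,u$, which yields the closed form used below; the sketch compares it with a direct Runge-Kutta integration.

```python
import math


def closed_form(t, A, G, F0):
    """Solution of dF/dt = A*(G + F)*(1 - F) with F(0) = F0.

    Uses u = (G + F)/(1 - F), which grows as exp(A*(1 + G)*t).
    """
    u = (G + F0) / (1.0 - F0) * math.exp(A * (1.0 + G) * t)
    return (u - G) / (1.0 + u)


def rk4(A, G, F0, t_end, steps):
    """Integrate the same equation with a fixed-step fourth-order Runge-Kutta."""
    def rhs(F):
        return A * (G + F) * (1.0 - F)

    h = t_end / steps
    F = F0
    for _ in range(steps):
        k1 = rhs(F)
        k2 = rhs(F + 0.5 * h * k1)
        k3 = rhs(F + 0.5 * h * k2)
        k4 = rhs(F + h * k3)
        F += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return F


# Hypothetical values: spreading velocity A, invisible population G, seed F0.
A, G, F0 = 0.3, 0.05, 0.02
for t in (0.0, 5.0, 10.0, 20.0):
    print(t, closed_form(t, A, G, F0), rk4(A, G, F0, t, 2000) if t > 0 else F0)
```

In this sketch the term $G$ acts as a constant source: even for a very small seed, the product $A\,G\,(1 - F)$ keeps the spreading rate positive, which is the role the text assigns to the “invisible” population.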

4. Discussion

For further clarification, we first summarize the procedure developed above to obtain the function $F(t)$, which depends only on parameters $P_{IP}$, $P_{II}$ and $P_{USG}$. Next, we discuss our model and its capability to make predictions about the phenomenon of spreading rumors.

We outline below the steps necessary to adjust the experimental data and find the function $F(t)$.

First step: A partition of the parameter space is defined. For each of the 500 points in the partition, numerical simulations are performed and the results obtained are analyzed as indicated below.

Second step: Coefficients $a$ and $\varepsilon$ are computed using (3). $C$ is estimated using (4), so that the initial conditions are fulfilled.

Third step: We study the dependence of these coefficients on the parameters $P_{IP}$ and $P_{II}$ (for $P_{USG}$ fixed), using (5) and (7).

Fourth step: Using the data set that was fitted in the first steps, we deduce a general expression of the function $F(t)$ and also analyze its dependence on $P_{USG}$ (see (8)).

In section 3.2, the distribution of burned individuals, $F(t)$, was deduced as a function dependent only on the parameters inherent to the network, $P_{IP}$, $P_{II}$ and $P_{USG}$. This function becomes a fundamental tool for predicting how long fake news will take to spread and cause problems, or for estimating the time necessary for messages, for instance a marketing campaign or specific information, to be disseminated among the target audience.

One of the main results of this study is that the function $F(t)$ can be used to estimate the number of burned individuals at any time. An immediate consequence, no less important, is that the time necessary for a rumor to reach a given fraction of the population can also be calculated from this function. More formally, the number of iterations $t_X$ (or $n_X$, for discrete time in our model) necessary for a rumor to reach a fraction $X$ of the population, assuming that the message begins to spread at time $t = 0$, can be calculated by inverting (3):

$$t_X = F^{-1}(X) \qquad (10)$$

The inverse is available in closed form: it equals $b$ plus a logarithmic term whose prefactor depends on $a$ and $\varepsilon$ and whose argument depends on $\varepsilon$, $X$, $1 - X$ and $C$. Therefore, the equation $X = F(t_X)$ provides an implicit relation between the parameters inherent in the network, $P_{II}$, $P_{IP}$ and $P_{USG}$, and the variables that define the necessary time $t_X$ and the fraction of the population $X$. As a result, we can estimate the size of the seed (that is, the value of $P_{II}$) such that a given fraction $X$ of the population is reached at a fixed time $t_X$, in a network characterized by $P_{IP}$ and $P_{USG}$. In particular, the effect of the seed ($P_{II}$) and the size of the uncritical senders group ($P_{USG}$) can be studied to obtain the result sought in a given population.
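The inversion can be illustrated on the generic mass-action equation $dF/dt = A\,(G + F)(1 - F)$, whose solution is invertible in closed form. The sketch below is written under that assumption only: it uses the hypothetical constants `A`, `G` and seed `F0` rather than the fitted coefficients $a$, $b$, $\varepsilon$ and $C$, and the function names are illustrative.

```python
import math


def fraction_at(t, A, G, F0):
    """Burned fraction at time t for dF/dt = A*(G + F)*(1 - F), F(0) = F0."""
    u = (G + F0) / (1.0 - F0) * math.exp(A * (1.0 + G) * t)
    return (u - G) / (1.0 + u)


def time_to_fraction(X, A, G, F0):
    """Time at which the burned fraction first equals X (requires F0 < X < 1).

    Since u = (G + F)/(1 - F) grows exponentially at rate A*(1 + G),
    the time is the logarithm of the ratio u(X)/u(F0) divided by that rate.
    """
    if not (F0 < X < 1.0):
        raise ValueError("X must lie strictly between F0 and 1")
    rate = A * (1.0 + G)
    ratio = (G + X) * (1.0 - F0) / ((1.0 - X) * (G + F0))
    return math.log(ratio) / rate


# Hypothetical values; compare the time to reach 50% for two seed sizes.
A, G = 0.3, 0.05
for seed in (0.01, 0.05):
    print(seed, time_to_fraction(0.5, A, G, seed))
```

Used in the other direction, the same relation can be scanned over seed sizes to find the smallest seed that reaches a target fraction before a given deadline.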

The function that models the propagation of a rumor in a network depends, obviously, on the parameters associated with that particular rumor. Our model can be used to estimate these parameters from empirical observations of how the rumor spreads. In fact, only four observations of the number of burned individuals, taken at four different times since the beginning of the propagation of a rumor, are sufficient to determine the coefficients that fit the function $F(t)$. We can then use this adjusted expression to calculate the number of burned individuals at any time in the future.

As shown in Figure 6, the fitted function (blue line) closely follows the numerical simulation data (green dots). These approximations work for different values of $P_{II}$. It should be noted that the coefficients are adjusted while the burned fraction of the population is below 20%, so the fitted function can be used to predict when a catastrophe will occur (i.e., a large-scale spreading of an idea). Most importantly, the adjusted coefficients are obtained not only from small numbers of burned individuals but also from values that remain almost unchanged over time (see Figure 6).

Figure 6: Approximations of $(t, F(t))$ based on the evolution of the early times. From left to right, the values of $P_{USG}$ are 0%, 3% and 7%. Fixed parameters: $P_{II}$ = 5% and $P_{IP}$ = 1%. The dotted line (green) corresponds to numerical simulation data $(n, f(n))$. The four points used to approximate the coefficients of $F(t)$ are highlighted in red.

Knowing the number of burned individuals as a function of spreading time allows us to infer the very structure of the network in which the news is being disseminated. Once the propagation of a rumor on a given topic is known, its current evolution $f(n)$ is used to estimate the coefficients $a$, $\varepsilon$ and $C$ in (3), and from them the parameters of the network, $P_{II}$, $P_{IP}$ and $P_{USG}$, can be calculated.

From equations (5) and (8), the value of the parameter $P_{USG}$ is calculated. Next, entering in the function $F(t)$ the initial value of $P_{II}$, the estimated value of $P_{USG}$ and the number of burned individuals $X$ at time $t_X$, we obtain the value of parameter $P_{IP}$.

Note that the calculation of $P_{II}$ and $P_{USG}$ has to be performed for each type of news, because each topic will have its own values for these parameters. Once they are determined, the function $F(t)$ can be used to predict the propagation of news on a similar topic, and that prediction can be used to design countermeasures, either to avoid further propagation, e.g., from a political point of view, or to improve the dissemination of news, e.g., commercial or public information such as the annual flu vaccination campaign.
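The idea that a handful of early observations fixes the coefficients can be sketched as follows. This is not the fitting procedure used in this work: it assumes the generic mass-action solution of $dF/dt = A\,(G + F)(1 - F)$ in place of expression (3), and the names `fit_mass_action`, `line_fit` and the sample values are hypothetical. For the right value of $G$, the quantity $\log[(G + F)/(1 - F)]$ is a straight line in $t$, so the sketch searches for the $G$ that makes four early observations most linear and then reads $A$ and the seed from the line.

```python
import math


def closed_form(t, A, G, F0):
    """Solution of dF/dt = A*(G + F)*(1 - F) with F(0) = F0."""
    u = (G + F0) / (1.0 - F0) * math.exp(A * (1.0 + G) * t)
    return (u - G) / (1.0 + u)


def line_fit(ts, ys):
    """Least-squares line y = c + s*t; returns (c, s, sum of squared residuals)."""
    n = len(ts)
    tm, ym = sum(ts) / n, sum(ys) / n
    s = sum((t - tm) * (y - ym) for t, y in zip(ts, ys)) / sum((t - tm) ** 2 for t in ts)
    c = ym - s * tm
    return c, s, sum((y - (c + s * t)) ** 2 for t, y in zip(ts, ys))


def transformed(G, Fs):
    """log[(G + F)/(1 - F)], linear in t when G is the true constant."""
    return [math.log((G + F) / (1.0 - F)) for F in Fs]


def fit_mass_action(ts, Fs):
    """Estimate (A, G, F0) from a few early observations (ts, Fs)."""
    def cost(G):
        return line_fit(ts, transformed(G, Fs))[2]

    # Coarse search on a logarithmic grid of candidate G values.
    grid = [10.0 ** (-4.0 + 5.0 * k / 199.0) for k in range(200)]
    best = min(range(len(grid)), key=lambda k: cost(grid[k]))
    lo, hi = grid[max(best - 1, 0)], grid[min(best + 1, len(grid) - 1)]

    # Golden-section refinement inside the bracketing interval.
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(80):
        x1, x2 = hi - phi * (hi - lo), lo + phi * (hi - lo)
        if cost(x1) < cost(x2):
            hi = x2
        else:
            lo = x1
    G = 0.5 * (lo + hi)

    c, s, _ = line_fit(ts, transformed(G, Fs))
    u0 = math.exp(c)                      # value of (G + F)/(1 - F) at t = 0
    return s / (1.0 + G), G, (u0 - G) / (1.0 + u0)


# Four synthetic early observations (burned fraction below 20%).
ts = [0.0, 1.0, 2.0, 3.0]
Fs = [closed_form(t, 0.3, 0.05, 0.02) for t in ts]
A, G, F0 = fit_mass_action(ts, Fs)
print(A, G, F0, closed_form(20.0, A, G, F0))
```

With noisy empirical data the minimum is shallower, so in practice more observations or a proper nonlinear least-squares routine would be preferable to this one-dimensional search.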

5. Conclusions

As far as we know, we present here a new approach to study the crucial phenomenon of rumor propagation through WhatsApp. It is noteworthy that the results obtained in this paper, as well as our algorithms and the main techniques used, can be naturally extended to any similar instant messaging application.

One of our main contributions is to provide a manageable function of time that describes the evolution of rumor spreading, taking into account only person-to-person relationships and contact groups. To be more precise, once the initial and propagation probabilities for each individual ($P_{II}$ and $P_{IP}$) are fixed, the function in (3) gives the fraction of the population that knows the rumor at different times in terms of the parameters that characterize the structure of the network and the dynamics of propagation. The values of these parameters were estimated from the data obtained by numerical simulations for different scenarios, depending on the size of the uncritical senders group ($P_{USG}$).

Conversely, we show that, given the temporal evolution of a rumor by a function as in (3), it can be predicted how the message propagates throughout the network and how much time it takes to burn a fraction of the population. Since a small set of parameters ($P_{II}$, $P_{IP}$ and $P_{USG}$) typifies the structure of the network, how a rumor spreads in any social network can be simulated simply by substituting different values of these parameters in the analytical expressions we provide in this paper. This information can therefore be used to assess how long it will take a rumor to become dangerous and, consequently, to make decisions during that time in order to avoid the damage caused by the disinformation. Moreover, the number of individuals that should be burned to ensure that the rumor reaches a fixed fraction of the population in a given time can be calculated.

Finally, we also study the impact of the so-called uncritical senders group, i.e., a group of individuals who automatically forward a message as soon as they receive it. To our knowledge, this is a novel idea that allows us to simulate the behavior of groups of highly polarized humans who disseminate fake news with the vile purpose of influencing and causing major changes in society. From our model, we can detect and estimate the size of a group of uncritical senders in a social network by analyzing how the presence of this group changes the temporal evolution of the propagation of a rumor.

In summary, we present a model that shows how a few key parameters influence the spread of a rumor and determine the speed with which a rumor may reach a large part of the population. Furthermore, we study how this rumor propagation may be manipulated both through information (or disinformation) campaigns and through a group of uncritical senders that actively disseminates some types of news to the entire population.

[Table: fitted values, uncertainties and relative errors of the coefficients (among them $cc$, $ee$ and $gg$) for $P_{USG}$ = 0%, 3%, 5%, 7% and 10%.]

References