Why Do Cascade Sizes Follow a Power-Law?
Karol Węgrzycki, Piotr Sankowski, Andrzej Pacuk, Piotr Wygocki
Karol Węgrzycki
Institute of Informatics, University of Warsaw, Poland
[email protected]
Piotr Sankowski
Institute of Informatics, University of Warsaw, Poland
[email protected]
Andrzej Pacuk
Institute of Informatics, University of Warsaw, Poland
[email protected]
Piotr Wygocki
Institute of Informatics, University of Warsaw, Poland
[email protected]
ABSTRACT
We introduce a random directed acyclic graph and use it to model the information diffusion network. Subsequently, we analyze the cascade generation model (CGM) introduced by Leskovec et al. [19]. Until now, only empirical studies of this model have been conducted. In this paper, we present the first theoretical proof that the sizes of cascades generated by the CGM follow a power-law distribution, which is consistent with multiple empirical analyses of large social networks. We compared the assumptions of our model with the Twitter social network and tested the goodness of the approximation.
Keywords
Social networks; Information Diffusion; Modelling and Validation; Twitter
1. INTRODUCTION
Every day, billions of instant messages, comments, articles, blog posts, emails, tweets, and other forms of communication are exchanged through reciprocal social relations. The study of information propagation through networks is increasingly in demand. Models of propagation are used to minimize transmission costs, enhance security, prevent information leaks, or predict the propagation of malicious software among users [13].

State-of-the-art models of information diffusion construct the underlying transmission network from known connections (e.g., the graph of followers in the Twitter network). Here, we show that the graph of information dissemination has noteworthy features that were left unexploited in previous work. It is well known that more active individuals in a network have more acquaintances [27]. We conducted experiments confirming this observation in the information diffusion network and uncovered its underlying structure.
WWW 2017, April 3–7, 2017, Perth, Australia. ACM 978-1-4503-4913-0/17/04. http://dx.doi.org/10.1145/3038912.3052565.
Our model of the information diffusion network explains the power-law (or Pareto) distribution of the number of informed nodes (the cascade size). This is a major improvement over state-of-the-art models, whose predictions are inconsistent with real data [8]. Random power-law graphs may adequately describe the follower-followee relation in a social network [5], but they do not necessarily characterize the medium of information propagation. The results of our study will allow researchers to enhance their models of information transmission and to develop a framework for validating these models.
To the best of our knowledge, this work presents the first theoretical analysis of the cascade size distribution that uses cascade generation to model the spread of information in social networks. The first experimental analysis was conducted by Leskovec et al. [19], who proposed a cascade generation model and simulated it on a dataset of blog links. Since then, research on cascade sizes has become a fruitful field (for more references, see [23]).

Drawing on studies in epidemiology and solid-state physics, different models such as SIR (susceptible-infectious-recovered) or SIS (susceptible-infectious-susceptible) have been employed to model the dynamics of information spreading. However, all of these models assume that everyone in the population is in contact with everyone else [3], which is unrealistic in large social networks. A classical example of a modified spreading process incorporates the effect of a stifler [2].
Stiflers never spread the information, even if they were exposed to it multiple times. Nevertheless, stiflers can actively convert other spreaders or susceptible nodes into stiflers. This more complicated logic may eliminate the epidemic threshold (the threshold that determines whether a global epidemic occurs or the disease simply dies out), and it has been actively developed [6].

In 2002, Watts [26] proposed an exact solution for global cascade sizes on an arbitrary random graph. However, this process of information propagation, called the threshold model, is entirely different from the cascade generation of Leskovec et al. [19] and does not fully explain the dynamics of modern social networks such as Twitter. Iribarren et al. [13] developed a similar model based on integro-differential equations. These equations describe the cascade sizes when the number of messages sent by a node follows the Harris discrete distribution. However, the general solution to their equations is not known, and only solutions for specific nontrivial cases (e.g., superexponential processes [13]) have been considered. Our findings provide a much simpler method and clearly show that the underlying social graph is far more complex than just the graph of followers.

Results similar to ours were also obtained in the study of the bias of traceroute sampling. Achlioptas et al. [1] characterize the degree distribution of a BFS tree for a random graph with a given degree distribution. Their explanation of why the degree distribution under traceroute sampling exhibits a power law motivated researchers to study bias in P2P systems [25] and network discovery [4]. Their research also led to new tools for social network sampling [16].

In the seminal paper of Leskovec et al. [18], the cascade size distribution in a recommendation network was analyzed. Leskovec et al. [18] showed that product purchases follow a “long tail” distribution, in which a significant fraction of sold items were rarely sold items. They fit the data to a power-law distribution and discovered that the parameters may differ between networks (remarkably, the power-law exponent was close to −
2. MODELING INFORMATION CASCADES
Intuitively, information cascades are generated by the following process: one individual passes the information to all of its acquaintances. Then, in each round, newly informed nodes randomly decide whether to pass it on to their acquaintances. This process continues until no new individuals are informed. The graph generated by the spread of the information is called the cascade.

The cascade generation model (CGM) established in [19] introduces a single parameter α that measures how infectious the passed information is. More precisely, α is the probability that the information will be passed to a given acquaintance. According to Leskovec et al. [19], a cascade is generated as follows:
1. Pick a starting point of the cascade uniformly at random and add it to the set of newly informed nodes.
2. Every newly informed node informs each of its direct neighbors independently with probability α.
3. Let newly informed be the set of nodes that were informed for the first time in step 2, and add them to the generated cascade.
4. Repeat steps 2 and 3 until the newly informed set is empty.

In this model, we assume that all nodes have an identical impact (α = const) on their neighbors, and all generated cascades are trees, since we pick a single initial node. This is not a major limitation, since most cascades are trees [19].

Figure 1: The log-log plot of the cascade size distribution on the Twitter dataset (x-axis: cascade size; y-axis: probability). It follows a power law with an exponent of −….

Figure 2: The log-log plot of the cascade size distribution predicted by the simulation of the CGM using the Twitter followers graph (x-axis: cascade size; y-axis: probability). Note the phase transition, which is absent in the real data.
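The four CGM steps above can be sketched in Python. This is a minimal illustration under our own assumptions, not the code published at [21]: the function name `cgm_cascade`, the adjacency-dictionary input, and the choice to also return the tree edges are hypothetical. Each informed node flips one coin per neighbor that is not yet informed, so every newly informed node has exactly one parent and the cascade is a tree.

```python
import random


def cgm_cascade(neighbors, alpha, rng, start=None):
    """Simulate one cascade of the cascade generation model (CGM).

    neighbors: dict mapping every node to an iterable of its direct neighbors
               (hypothetical input format chosen for this sketch)
    alpha:     probability that the information is passed over a single edge
    rng:       a random.Random instance, so runs are reproducible
    start:     optional seed node; if None it is chosen uniformly at random
    Returns the set of informed nodes and the edges of the cascade tree.
    """
    if start is None:
        start = rng.choice(list(neighbors))        # step 1: uniform seed
    informed = {start}
    newly_informed = [start]
    tree_edges = []
    while newly_informed:                          # step 4: stop when nothing new
        next_round = []
        for u in newly_informed:
            for v in neighbors[u]:                 # step 2: each neighbor independently
                if v not in informed and rng.random() < alpha:
                    informed.add(v)                # step 3: first-time informed nodes
                    next_round.append(v)
                    tree_edges.append((u, v))      # one parent per node: a tree
        newly_informed = next_round
    return informed, tree_edges


# Usage: one cascade on a small star graph with alpha = 0.5.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
informed, edges = cgm_cascade(star, 0.5, random.Random(42), start=0)
print(len(informed), edges)
```

Passing an explicit `random.Random` instance keeps repeated runs reproducible, which is convenient when estimating a cascade size distribution from many simulated cascades.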
Recently, information diffusion models have been evaluated on large social networks, and it has been observed [24, 8] that the actual cascade size distribution is inconsistent with the distribution predicted by the model. First, the simulations exhibit an obvious phase transition: cascades are either extremely large or smaller than 100 (see Figure 2). No such gap is observed in the real data (see Figure 1). Second, the probability of a large cascade is unrealistically high in simulations.

There have been many attempts to readjust the cascade generation model [11, 8, 17], but, as discussed above, those attempts oversimplify the process or incorrectly describe the distribution of cascade sizes. We introduce novel observations concerning the underlying social network and, based on them, a theoretical model of information propagation.
The underlying network of rumor spreading is unknown. Even if we had the network of all social contacts, a rumor could propagate through mass media or through random interactions, or the network could even evolve in time. Therefore, when analyzing social media, we focus on the network of information propagation. The same technique was used by Leskovec et al. [19], but their network was generated by links in blogs. To observe a non-trivial structure of the social network, we considered actions generated by replying to messages. Such replies in the Twitter microblogging network are called retweets. We analyzed a set of over 500 million tweets and retweets from a 10% sample of all tweets from May 19 to May 30, 2013. Each retweet contains the identifiers of the cited and the replying users. Based on that, we generate a directed graph of information transmission (as in [27, 8]). We use data published in [20, 22] and publish our code and experimental results at [21].

It is well known that the degree distribution of the generated graph follows a power law [5]. However, this characterization does not necessarily describe the network of information transmission. The intuition is that nodes with greater degree are more active. Hence, when information spreads through the underlying network, the distribution of spreading nodes should favor nodes with higher degree, which inflates the probability for these nodes. Moreover, we observed a hierarchical structure in the graph of retweets: the probability that a popular blogger replies to a message of an unpopular one is extremely small.

To confirm our intuition, we determined the distribution of neighbors' degrees for all nodes with a given degree. As shown in Figure 3, each degree has a distinct distribution of neighbors' degrees (the implementation is available at [21]). This means that a node is more likely to be followed by popular nodes when the node itself is popular.
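A minimal sketch of how a per-degree neighbor distribution like the one in Figure 3 could be computed from retweet pairs. The input format (an iterable of `(cited_user, replying_user)` pairs), the function name `neighbor_degree_distribution`, and the definition of degree as the number of distinct retweet partners (ignoring direction) are assumptions made for illustration; the authors' own implementation is the one published at [21].

```python
from collections import Counter, defaultdict


def neighbor_degree_distribution(retweets):
    """For each degree d of a cited (followed) user, return the empirical
    distribution of the degrees of the users who retweeted them.

    retweets: iterable of (cited_user, replying_user) pairs (assumed format).
    Degree = number of distinct retweet partners, ignoring edge direction
    (an assumption of this sketch, not necessarily the paper's definition).
    """
    # Deduplicate repeated retweets and drop self-retweets.
    edges = {(cited, replier) for cited, replier in retweets if cited != replier}
    partners = defaultdict(set)
    for cited, replier in edges:
        partners[cited].add(replier)
        partners[replier].add(cited)
    degree = {user: len(users) for user, users in partners.items()}

    # counts[d][d2]: edges whose followed endpoint has degree d
    # and whose follower endpoint has degree d2
    counts = defaultdict(Counter)
    for cited, replier in edges:
        counts[degree[cited]][degree[replier]] += 1

    # Normalize each histogram into a probability distribution.
    return {
        d: {d2: c / sum(hist.values()) for d2, c in sorted(hist.items())}
        for d, hist in sorted(counts.items())
    }


# Usage: a hub retweeted by three users, one of whom is retweeted by "x".
sample = [("hub", "u1"), ("hub", "u2"), ("hub", "u3"), ("u1", "x")]
print(neighbor_degree_distribution(sample))
```

Normalizing each histogram separately makes the distributions for different followed degrees directly comparable, which is what an aggregated plot such as Figure 3 needs.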
Moreover, there is a pattern: the probability decreases with degree (for followers whose degree is greater than that of the followed node). Based on this observation, we model the aforementioned distribution as an approximate step function. This observation is consistent with the state-of-the-art analysis [19], according to which most cascades are “tree-like” or “stars”. To our knowledge, no previous research has examined the distributions of neighbors' degrees for nodes with a given degree (only the cumulative degree for every cascade has been studied [19]).

Undoubtedly, the process of information transmission in social networks is too complex to be modeled by simulations on random power-law graphs alone. Because of the hierarchical structure and the relation between activity and node degree, we propose a model in which information spreads only to nodes with lower degree, with an approximately uniform distribution.

According to [8], the probability that a cascade has a size greater than 2 500 is 0.…

Figure 3: The degree distribution of the followers aggregated by the followed degree (y-axis: probability). Note that the inflection point increases with the followed degree.
One of the basic methods for generating random graphs was introduced in 1960 by Erdős and Rényi [9]. In a nutshell: for a given set of vertices, every possible edge is present independently with the same probability. This graph model is not suited for modeling social networks because its degree distribution does not follow a power law: the degree distribution of the Erdős-Rényi model is binomial.
Figure 4: A diffusion network generated by the random DAG algorithm (see Algorithm 1).

This model is used in theoretical research for modeling interactions between networks and the propagation of catastrophes [12]. Even though the Erdős-Rényi model does not characterize connections between nodes, we believe it can represent the process of information spreading. We propose an intuitive variation of the Erdős-Rényi model for directed graphs. According to Leskovec et al. [19], cascades very rarely contain cycles and can be modeled as tree-like structures. Even though these graphs do not model relationships in the social network, directed acyclic graphs (DAGs) are an appropriate structure for information propagation in the social network.

We introduce a procedure that generates random directed acyclic graphs (DAGs) and prove that propagating information in the CGM regime results in cascade sizes obeying a power law. Let us denote by rdag(n, p) a random graph generated by RandomDAG(n, p) (see Algorithm 1).

Algorithm 1: Generation of a random DAG
procedure RandomDAG(n, p)
    G ← empty graph with vertices 1, 2, ..., n
    for i ← 1 to n − 1 do
        for j ← i + 1 to n do
            with probability p add directed edge (j, i) to G
        end for
    end for
    return G
end procedure

The final n-vertex graph is acyclic, since all edges (j, i) obey j > i (see Figure 5). Any DAG can be generated by RandomDAG(n, p) when p ∈ (0, 1): if we label the vertices from n to 1 in topological order, the graph consists only of edges (j, i) with j > i, and every edge obeying j > i is present independently with probability p, so every such edge set appears with positive probability.

Figure 5: Sample DAG generated by the procedure RandomDAG(5, p). Dotted edges are not chosen.

Leskovec et al. [19] suggested that in real information diffusion networks, small and simple graphs occur more often than complex, nontrivial DAGs. This is exactly the case in RandomDAG(n, p). The distribution of in-degrees in rdag(n, p) satisfies
$$P[\mathrm{indeg}(v) = k] = \frac{1}{n}\sum_{i=k}^{n-1}\binom{i}{k}p^k(1-p)^{i-k}. \qquad (1)$$
Similarly to the Erdős-Rényi graph, the in-degree distribution of a given vertex is binomial, but with different parameters for each node. As a consequence, the overall distribution has a step-function-like shape (see Figure 6).

Figure 6: The in-degree distribution of rdag(n, p) graphs according to Formula (1), for n = 100, p = 0.3; n = 200, p = 0.2; n = 100, p = 0.5; and n = 200, p = 0.4 (y-axis: probability).

The in-degree distribution is almost uniform when the in-degree is lower than np. Naturally, this is merely a simple approximation of the true network of probable information transmission. Still, based on the observations in Section 2.1, rdag(n, p) is a more accurate model than the follower-followee graph. Unlike the standard Erdős-Rényi graph, the random DAG is a directed graph, which means that it can model one-way communication and the distribution of degrees. Another key difference is the hierarchical structure (i.e., node n will not follow a node with a lower label). These differences enable the random DAG model to produce cascades with a power-law distribution of sizes.
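Algorithm 1 and Formula (1) translate directly into Python. The sketch below is illustrative rather than the authors' code: `random_dag` and `indegree_pmf` are hypothetical names, and the graph is returned as a plain list of edges (j, i) instead of a graph object. The usage lines compare Formula (1) with the in-degrees of one sampled graph.

```python
import random
from collections import Counter
from math import comb


def random_dag(n, p, rng):
    """Python sketch of RandomDAG(n, p) (Algorithm 1).

    Vertices are 1..n; every directed edge (j, i) with j > i is added
    independently with probability p, so the result is always acyclic.
    """
    edges = []
    for i in range(1, n):                  # i = 1, ..., n - 1
        for j in range(i + 1, n + 1):      # j = i + 1, ..., n
            if rng.random() < p:
                edges.append((j, i))
    return edges


def indegree_pmf(n, p, k):
    """Formula (1): P[indeg(v) = k] for a uniformly random vertex v of rdag(n, p).

    Vertex i can only receive edges from the n - i vertices j > i, so its
    in-degree is Binomial(n - i, p); averaging over all n vertices gives (1).
    """
    return sum(comb(m, k) * p**k * (1 - p) ** (m - k) for m in range(k, n)) / n


# Usage: Formula (1) versus the in-degrees of one sampled rdag(200, 0.2).
n, p = 200, 0.2
edges = random_dag(n, p, random.Random(7))
indeg = Counter(i for _, i in edges)       # in-degree of vertex i = edges (j, i)
empirical = [sum(1 for v in range(1, n + 1) if indeg[v] == k) / n for k in range(n)]
print([round(indegree_pmf(n, p, k), 4) for k in (0, 10, 20, 30, 40)])
print([empirical[k] for k in (0, 10, 20, 30, 40)])
```

Summing Formula (1) over k gives 1, and its mean is p(n − 1)/2, the same average in-degree a sampled graph should have up to random fluctuation.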
3. ANALYSIS
In this section, we formally analyze the introduced model. Subsequently, we quantitatively describe the process of information diffusion on a random DAG by determining the cascade size distribution.

Recall that p is the probability of an edge in rdag(n, p) and α is the average infectiousness of information. For simplicity, set $\beta = 1 - p\alpha$. Hence, β is the probability that an informed user will not spread the information to a given later vertex (either because the edge is absent or because the transmission fails).

Now we determine the probability $P_{n,k}$ that the cascade reaches k distinct vertices in a graph with n nodes, starting from vertex 1. Clearly, $P_{1,1} = 1$, and $P_{n,k} = 0$ when $k > n$ or $k \le 0$. For the remaining cases, assume that the cascade has size k and consider two distinct states of the n-th node:
• Node n was informed. Then at least one of the other k − 1 informed nodes passed the information to the n-th node, so the probability is $(1 - \beta^{k-1}) \cdot P_{n-1,k-1}$.
• Node n was not informed. This can happen only when none of the other k informed nodes passed the information to the n-th node. The probability of this event equals $\beta^k \cdot P_{n-1,k}$.
Hence, for $1 \le k \le n$ we obtain the formula
$$P_{n,k} = \beta^k \cdot P_{n-1,k} + (1 - \beta^{k-1}) \cdot P_{n-1,k-1}.$$
To determine the distribution of the cascade size, we assume that information starts at any node with equal probability. Because propagating information in rdag(n, p) starting from node 1 is identical to propagating it from node i in rdag(n + i − 1, p), the distribution is
$$P[|\mathrm{Informed}| = k] = S_{n,k} = \frac{1}{n}\sum_{i=1}^{n} P_{i,k}.$$
Now we have an exact equation for the cascade size distribution. This equation does not have a simple form. However, we can ask what happens when the number of nodes in the graph is large. Recall that two sequences $x_n$, $y_n$ are asymptotically equivalent when
$$x_n \sim y_n \iff \lim_{n\to\infty}\frac{x_n}{y_n} = 1.$$
The cascade size distribution satisfies Theorem 1.
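The recursion for $P_{n,k}$ and the average $S_{n,k}$ can be evaluated exactly with a small dynamic program and compared with a direct simulation of the CGM on a sampled rdag(n, p). This is a sketch under stated assumptions: the function names are hypothetical, and the table is truncated at a chosen `k_max`, which is exact because $P_{n,k}$ only depends on $P_{n-1,k}$ and $P_{n-1,k-1}$.

```python
import random


def cascade_probabilities(n_max, beta, k_max):
    """P[m][k] = P_{m,k} for 1 <= m <= n_max and 0 <= k <= k_max (k_max >= 1).

    P_{m,k} is the probability that a cascade started at vertex 1 of rdag(m, p)
    informs exactly k vertices, with beta = 1 - p * alpha.
    """
    P = [[0.0] * (k_max + 1) for _ in range(n_max + 1)]
    P[1][1] = 1.0                                   # P_{1,1} = 1
    for m in range(2, n_max + 1):
        for k in range(1, min(m, k_max) + 1):
            P[m][k] = beta**k * P[m - 1][k] + (1 - beta ** (k - 1)) * P[m - 1][k - 1]
    return P


def cascade_size_distribution(n, beta, k_max):
    """S_{n,k} = (1/n) * sum_{i=1}^{n} P_{i,k} for k = 0..k_max (uniform seed)."""
    P = cascade_probabilities(n, beta, k_max)
    return [sum(P[i][k] for i in range(1, n + 1)) / n for k in range(k_max + 1)]


def simulate_from_vertex_1(n, p, alpha, rng):
    """Monte Carlo counterpart: sample rdag(n, p), then run the CGM from vertex 1.

    Information can only travel from a vertex i to a later vertex j > i.
    """
    later = {i: [j for j in range(i + 1, n + 1) if rng.random() < p] for i in range(1, n + 1)}
    informed, frontier = {1}, [1]
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in later[u]:
                if v not in informed and rng.random() < alpha:
                    informed.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(informed)


# Usage: exact P_{6,k} versus 20 000 simulated cascades, with p * alpha = 0.3.
p, alpha = 0.5, 0.6
exact = cascade_probabilities(6, 1 - p * alpha, 6)[6]
rng = random.Random(0)
sizes = [simulate_from_vertex_1(6, p, alpha, rng) for _ in range(20000)]
for k in range(1, 7):
    print(k, round(exact[k], 4), sizes.count(k) / len(sizes))
```

The simulation flips one edge coin (probability p) and one transmission coin (probability α) per pair of vertices, which is why the exact recursion only needs their product through β.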
Theorem 1. $S_{n,k} \sim \frac{1}{n(1-\beta^k)}$.
Proof.
Let us denote $\tilde S_{n,k} = n S_{n,k}$. We need to prove that
$$A_k := \lim_{n\to\infty}\tilde S_{n,k} = \frac{1}{1-\beta^k}. \qquad (2)$$
We proceed by induction. For k = 1:
$$\tilde S_{n,1} = \sum_{i=1}^{n} P_{i,1} = P_{1,1} + \sum_{i=1}^{n-1} P_{i+1,1} = 1 + \sum_{i=1}^{n-1}\beta P_{i,1} = 1 + \beta\tilde S_{n-1,1} = 1 + \beta + \beta^2 + \dots + \beta^{n-1}.$$
Hence, $\tilde S_{n,1}$ is the sum of a geometric series:
$$\tilde S_{n,1} = \frac{1-\beta^n}{1-\beta} \to \frac{1}{1-\beta}.$$
For k > 1, using the convention $P_{0,k} = 0$ (so that the recursion also gives $P_{1,k} = 0$):
$$\tilde S_{n,k} = \sum_{i=1}^{n} P_{i,k} = \sum_{i=1}^{n}\beta^k P_{i-1,k} + \sum_{i=1}^{n}(1-\beta^{k-1}) P_{i-1,k-1}.$$
So $\tilde S_{n,k}$ obeys the recursive formula
$$\tilde S_{n,k} = \beta^k\tilde S_{n-1,k} + (1-\beta^{k-1})\tilde S_{n-1,k-1}. \qquad (3)$$
A technical induction shows that $\tilde S_{n,k}$ is bounded and increasing with respect to n, so $A_k = \lim_{n\to\infty}\tilde S_{n,k}$ exists. Hence, we can take the limit on both sides of Equation (3) and obtain
$$A_k = \beta^k A_k + (1-\beta^{k-1})A_{k-1}, \qquad A_k = \frac{1-\beta^{k-1}}{1-\beta^k}A_{k-1}. \qquad (4)$$
Finally, by unwinding the recursive Formula (4) for each k > 1 (recall that $A_1 = \frac{1}{1-\beta}$), the product telescopes:
$$A_k = \frac{1-\beta^{k-1}}{1-\beta^k}\cdot\frac{1-\beta^{k-2}}{1-\beta^{k-1}}\cdots\frac{1-\beta}{1-\beta^2}\cdot\frac{1}{1-\beta} = \frac{1}{1-\beta^k}.$$
Hence, we have proved the asymptotic Formula (2) for the cascade size distribution.
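Theorem 1 can be checked numerically by iterating the recursion for $\tilde S_{n,k}$ (the geometric series for k = 1 and Equation (3) for k > 1) and comparing the result with $\frac{1}{1-\beta^k}$. The function name is hypothetical, and β = 0.9 is an arbitrary illustrative choice, far from the β ≈ 1 regime of real networks, picked so that the limit is reached after a few thousand steps.

```python
def scaled_distribution(n, beta, k_max):
    """Return [S~_{n,1}, ..., S~_{n,k_max}], where S~_{n,k} = n * S_{n,k}.

    Uses S~_{m,1} = 1 + beta * S~_{m-1,1} (the geometric series) and
    Equation (3) for k > 1, starting from S~_{0,k} = 0.
    """
    s = [0.0] * (k_max + 1)            # s[k] = S~_{m,k}; initially m = 0
    for _ in range(n):                 # advance m -> m + 1
        new = [0.0] * (k_max + 1)
        new[1] = 1.0 + beta * s[1]
        for k in range(2, k_max + 1):
            new[k] = beta**k * s[k] + (1 - beta ** (k - 1)) * s[k - 1]
        s = new
    return s[1:]


beta, k_max = 0.9, 5
limits = [1 / (1 - beta**k) for k in range(1, k_max + 1)]   # A_k from Theorem 1
for n in (10, 100, 1000):
    ratios = [a / b for a, b in zip(scaled_distribution(n, beta, k_max), limits)]
    print(n, [round(r, 6) for r in ratios])
```

The printed ratios $\tilde S_{n,k}/A_k$ should approach 1 as n grows, which is exactly the statement $S_{n,k} \sim \frac{1}{n(1-\beta^k)}$.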
Recall that $\beta = 1 - p\alpha = 1 - \varepsilon$. Because p and α are extremely small, β is close to 1. Taking the Laurent series of our function in ε, we get
$$\frac{1}{1-(1-\varepsilon)^k} = \frac{1}{k\varepsilon} + \frac{k-1}{2k} + O(\varepsilon).$$
Social networks have an extremely large number of nodes (e.g., the Twitter network has about 300 million distinct users [5]). On the other hand, new information reaches only a few nodes (the cascade size distribution is believed to be a power law for k smaller than 10 000) [8, 19]. Then, because the term $\frac{k-1}{2kn}$ is insignificant when n is that large, for $k \ll \frac{1}{p\alpha} \ll n$ we get
$$S_{n,k} \sim \frac{1}{n\left(1-(1-p\alpha)^k\right)} = \frac{1}{knp\alpha} + \frac{k-1}{2kn} + O\!\left(\frac{p\alpha}{n}\right).$$
Hence, the distribution of cascade sizes,
$$P[|\mathrm{Informed}| = k] \approx \frac{1}{np\alpha}\,k^{-1} + \mathrm{const}, \qquad (5)$$
follows a power law in the first-order perturbation. Due to the low number of large cascades, the distribution of sizes is unknown for k close to $\frac{1}{p\alpha}$; in that case, one should use the exact formula.

In Figure 7, we present a comparison between the approximation and the exact formula for the distribution of cascade sizes. For relatively small cascade sizes k, the slopes match very well.

How large are the cascades that we can model under the aforementioned assumptions? The number of nodes n in the Twitter network is approximately 300 million. According to [27], the average infectiousness α of information on Twitter is of the order of 0.01. According to Leskovec et al. [19], the number of edges in a cascade is proportional to $n^{…}$. So the parameter p should be approximately $p \propto n^{…}/n^2$, since the number of possible edges is of order $n^2$. The largest rumor in our dataset has roughly 70 000 informed nodes; hence, the lower bound for the parameter p is of the order of $(7\cdot 10^4)^{…}/(7\cdot 10^4)^2 \approx …$. Still, $\frac{1}{\alpha p} \approx … \ll 3\cdot 10^8 \approx n$. So, for the Twitter network, we can model cascades with sizes $k \ll …$. Remarkably, this is enough, since only approximately $10^{-…}$ of Twitter rumors have a size greater than $10^{…}$. The value of the K-S test comparing the power-law distribution and the real cascade size distribution [22] is 0.…

Figure 7: The log-log plot of the exact formula $S_{n,k}$, the asymptotic bound $\frac{1}{n(1-\beta^k)}$, and the power-law distribution with exponent −… (x-axis: cascade size; y-axis: probability).

One could argue that the cascade generation model is counterintuitive for microblogging services such as Twitter. In fact, it is more intuitive that every follower of a spreader eventually gets informed. Then every follower, after being informed for the first time, makes exactly one decision (with probability α): whether to pass the information to all of its acquaintances simultaneously. (Previously, the information was passed to each follower independently, and each follower could have multiple opportunities to become a spreader.) In such a case, Formula (5) is exactly the same (for the proof, see Appendix A).
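The Laurent expansion and the resulting $k^{-1}$ behavior can be illustrated numerically. The sketch assumes an illustrative value $\varepsilon = p\alpha = 10^{-4}$ (not a parameter fitted to Twitter), works with the limit $\frac{1}{1-(1-\varepsilon)^k}$ of $nS_{n,k}$ from Theorem 1, and uses hypothetical helper names. On a log-log scale, the slope stays close to −1 while $k \ll 1/\varepsilon$ and flattens once k approaches $1/\varepsilon$, where the exact formula has to be used.

```python
import math


def scaled_limit(k, eps):
    """Limit of n * S_{n,k} from Theorem 1: 1 / (1 - (1 - eps)^k), eps = p * alpha."""
    return 1.0 / (1.0 - (1.0 - eps) ** k)


def laurent_two_terms(k, eps):
    """First two terms of the Laurent series in eps: 1/(k*eps) + (k - 1)/(2k)."""
    return 1.0 / (k * eps) + (k - 1) / (2 * k)


def loglog_slope(k, eps):
    """Slope of log(scaled_limit) between k and 2k; exactly -1 for a pure k^(-1) law."""
    return math.log(scaled_limit(2 * k, eps) / scaled_limit(k, eps)) / math.log(2)


eps = 1e-4   # illustrative value of p * alpha, not a fitted parameter
for k in (1, 10, 100, 1000, 20000):
    print(k, scaled_limit(k, eps), laurent_two_terms(k, eps), round(loglog_slope(k, eps), 3))
```

The gap between the two columns is the O(ε) remainder of the expansion, so it stays small only while kε is small.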
4. CONCLUSION AND FUTURE WORK
The graph of information diffusion is entirely different from the global network of social connections. In contrast to multiple previous approaches, we model the cascade of information propagation as a random directed acyclic graph, and we show that, in the CGM scheme, the distribution of information popularity is asymptotically equivalent to
$$P[|\mathrm{Informed}| = k] \sim \frac{1}{n(1-\beta^k)},$$
where n is the number of nodes and β is a parameter that depends on both the infectiousness of average information and the density of the cascade. We show that, for a sufficiently large number of nodes, this distribution follows a power law, which is consistent with real-world observations. We hope that this framework will inspire theoretical efforts to model and describe information diffusion in large social networks.

The analysis of the rdag(n, p) graph showed that the cascade size distribution follows the power law $P[|\mathrm{Informed}| = k] \propto k^{\gamma}$ for $\gamma = -1$. Leskovec et al. [18] suggest that information propagation in the network of recommendations may produce cascades with the desired exponent. However, in real data, the parameter γ can be completely different (e.g., see Figure 1 for the Twitter cascade size distribution, with power-law exponent γ = −…). To adapt the rdag(n, p) model to those cases, one can customize the distribution of random cascades (we assumed a fairly simple method to generate them). We believe that adapting the cascade shape distribution to experimental data will readjust the γ parameter (as a starting point, one can use the shape distribution provided by Leskovec et al. [19]). Further enhancements might also be achieved by adapting the information diffusion scheme to a particular society (similarly to Appendix A). To encourage other researchers to apply our model in practice, we publish the code used to generate all figures, together with the results, at [21].

Still, more experiments and research are needed to determine which types of social networks random directed acyclic graphs model well, and how a richer set of features (e.g., spatio-temporal features) influences the cascade size distribution.
5. ACKNOWLEDGMENTS
This work was partially supported by NCN grant UMO-2014/13/B/ST6/01811, ERC project PAAl-POC 680912, ERC project TOTAL 677651, FET IP project MULTIPLEX 317532, and Polish funds for the years 2013-2016 for co-financed international projects.
6. REFERENCES
[1] D. Achlioptas, A. Clauset, D. Kempe, and C. Moore. On the bias of traceroute sampling: Or, power-law degree distributions in regular graphs. J. ACM, 56(4):21:1–21:28, 2009.
[2] M. Barthelemy, A. Barrat, and A. Vespignani. The role of geography and traffic in the structure of complex networks. Advances in Complex Systems, 10(1):5–28, 2007.
[3] N. Bayley. The mathematical theory of epidemics. Griffin, London, 1975.
[4] Z. Beerliova, F. Eberhard, T. Erlebach, A. Hall, M. Hoffmann, M. Mihalák, and L. S. Ram. Network discovery and verification. IEEE Journal on Selected Areas in Communications, 24(12):2168–2181, 2006.
[5] P. Brach, M. Cygan, J. Lacki, and P. Sankowski. Algorithmic complexity of power law networks. In R. Krauthgamer, editor, Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2016, Arlington, VA, USA, January 10-12, 2016, pages 1306–1325. SIAM, 2016.
[6] P. Brach, A. Epasto, A. Panconesi, and P. Sankowski. Spreading rumours without the network. In A. Sala, A. Goel, and K. P. Gummadi, editors, Proceedings of the Second ACM Conference on Online Social Networks, COSN 2014, Dublin, Ireland, October 1-2, 2014, pages 107–118. ACM, 2014.
[7] J. Cheng, L. A. Adamic, J. M. Kleinberg, and J. Leskovec. Do cascades recur? In J. Bourdeau, J. Hendler, R. Nkambou, I. Horrocks, and B. Y. Zhao, editors, Proceedings of the 25th International Conference on World Wide Web, WWW 2016, Montreal, Canada, April 11-15, 2016, pages 671–681. ACM, 2016.
[8] B. Cui, S. J. Yang, and C. Homan. Non-independent cascade formation: Temporal and spatial effects. In J. Li, X. S. Wang, M. N. Garofalakis, I. Soboroff, T. Suel, and M. Wang, editors, Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, CIKM 2014, Shanghai, China, November 3-7, 2014, pages 1923–1926. ACM, 2014.
[9] P. Erdős and A. Rényi. On the evolution of random graphs. In Publication of the Mathematical Institute of the Hungarian Academy of Sciences, pages 17–61, 1960.
[10] A. Gaba, S. Voulgaris, K. Iwanicki, and M. van Steen. Revisiting gossip-based ad-hoc routing. In WiMAN 2012: Proceedings of the 6th International Workshop on Wireless Mesh and Ad Hoc Networks, Munich, Germany, July 2012. IEEE.
[11] R. Ghosh and B. A. Huberman. Ultrametricity of information cascades. CoRR, abs/1310.2619, 2013.
[12] S. Havlin, N. A. M. Araujo, S. V. Buldyrev, C. S. Dias, R. Parshani, G. Paul, and H. E. Stanley. Catastrophic cascade of failures in interdependent networks. CoRR, abs/1012.0206, 2010.
[13] J. L. Iribarren and E. Moro. Branching dynamics of viral information spreading. CoRR, abs/1110.1884, 2011.
[14] D. Kempe, J. M. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social network. In L. Getoor, T. E. Senator, P. M. Domingos, and C. Faloutsos, editors, Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 24-27, 2003, pages 137–146. ACM, 2003.
[15] S. Krishnan, P. Butler, R. Tandon, J. Leskovec, and N. Ramakrishnan. Seeing the forest for the trees: new approaches to forecasting cascades. In W. Nejdl, W. Hall, P. Parigi, and S. Staab, editors, Proceedings of the 8th ACM Conference on Web Science, WebSci 2016, Hannover, Germany, May 22-25, 2016, pages 249–258. ACM, 2016.
[16] M. Kurant, A. Markopoulou, and P. Thiran. On the bias of BFS (breadth first search). In Teletraffic Congress (ITC), 2010 22nd International, pages 1–8. IEEE, 2010.
[17] K. Lerman and R. Ghosh. Information contagion: An empirical study of the spread of news on Digg and Twitter social networks. In W. W. Cohen and S. Gosling, editors, Proceedings of the Fourth International Conference on Weblogs and Social Media, ICWSM 2010, Washington, DC, USA, May 23-26, 2010. The AAAI Press, 2010.
[18] J. Leskovec, L. A. Adamic, and B. A. Huberman. The dynamics of viral marketing. TWEB, 1(1), 2007.
[19] J. Leskovec, M. McGlohon, C. Faloutsos, N. S. Glance, and M. Hurst. Patterns of cascading behavior in large blog graphs. In Proceedings of the Seventh SIAM International Conference on Data Mining, April 26-28, 2007, Minneapolis, Minnesota, USA, pages 551–556. SIAM, 2007.
[20] A. Pacuk, P. Sankowski, K. Węgrzycki, and P. Wygocki. There is something beyond the twitter network. In J. Blustein, E. Herder, J. Rubart, and H. Ashman, editors, Proceedings of the 27th ACM Conference on Hypertext and Social Media, HT 2016, Halifax, NS, Canada, July 10-13, 2016, pages 279–284. ACM, 2016.
[21] A. Pacuk, P. Sankowski, K. Węgrzycki, and P. Wygocki. Python code of experiments and results. http://social-networks.mimuw.edu.pl/
[22] … Węgrzycki, and P. Wygocki. Twitter anonymised graph. http://social-networks.mimuw.edu.pl/
[23] … Diffusion of innovations. Free Press, 2010.
[24] G. V. Steeg, R. Ghosh, and K. Lerman. What stops social epidemics? In L. A. Adamic, R. A. Baeza-Yates, and S. Counts, editors, Proceedings of the Fifth International Conference on Weblogs and Social Media, Barcelona, Catalonia, Spain, July 17-21, 2011. The AAAI Press, 2011.
[25] D. Stutzbach, R. Rejaie, N. Duffield, S. Sen, and W. Willinger. On unbiased sampling for unstructured peer-to-peer networks. IEEE/ACM Transactions on Networking (TON), 17(2):377–390, 2009.
[26] D. J. Watts. A simple model of global cascades on random networks. In Proceedings of the National Academy of Sciences of the United States of America …
[27] … In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, August 10-13, 2015, pages 1513–1522. ACM, 2015.
APPENDIX
A. DEPENDENT PASSING OF THE INFORMATION
In this model, we count spreaders, who pass the information to all of their followers with probability α. These followers receive the information and might become new spreaders. For clarity, we use the notation from Section 3 and from the proof of Theorem 1.

Analogously to the previous analysis, we obtain the recursive formula:
$$P_{1,1} = \alpha, \qquad P_{n,k} = 0 \text{ when } k > n,$$
$$P_{n,k} = \bigl(1 - (1-\beta^k)\alpha\bigr)\cdot P_{n-1,k} + \alpha(1-\beta^{k-1})\cdot P_{n-1,k-1}.$$
The rest of the proof is almost identical to the proof of Theorem 1. For k = 1, we have
$$\lim_{n\to\infty}\tilde S_{n,1} = \lim_{n\to\infty}\frac{1-\bigl(1-(1-\beta)\alpha\bigr)^n}{1-\beta} = \frac{1}{1-\beta}.$$
For k > 1, when n → ∞:
$$\lim_{n\to\infty}\tilde S_{n,k} = A_k = \bigl(1-(1-\beta^k)\alpha\bigr)A_k + \alpha(1-\beta^{k-1})A_{k-1}.$$
By subtracting $\bigl(1-(1-\beta^k)\alpha\bigr)A_k$ from both sides:
$$A_k - A_k\bigl(1-(1-\beta^k)\alpha\bigr) = A_{k-1}\,\alpha(1-\beta^{k-1}).$$
After simplification, we get
$$A_k(1-\beta^k)\alpha = A_{k-1}(1-\beta^{k-1})\alpha.$$
Hence, we obtain the same formula as in Theorem 1,
$$A_k = A_{k-1}\,\frac{1-\beta^{k-1}}{1-\beta^k},$$
and finally
$$\lim_{n\to\infty}\tilde S_{n,k} = \frac{1}{1-\beta^k}.$$
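The conclusion of Appendix A can be checked the same way as Theorem 1: iterate the recursion above, accumulate $\tilde S_{n,k} = \sum_{i \le n} P_{i,k}$, and compare with $\frac{1}{1-\beta^k}$. This sketch only evaluates the recursion (it does not simulate the dependent-passing process itself), treats α and β as given numbers with illustrative values α = 0.5 and β = 0.9, and uses a hypothetical function name. The check also shows that the limit does not depend on α.

```python
def scaled_distribution_dependent(n, alpha, beta, k_max):
    """S~_{n,k} = sum_{i=1}^{n} P_{i,k} for the Appendix A recursion, k = 1..k_max.

    P_{1,1} = alpha and, for i >= 2,
    P_{i,k} = (1 - (1 - beta^k) * alpha) * P_{i-1,k}
              + alpha * (1 - beta^(k-1)) * P_{i-1,k-1}.
    """
    prev = [0.0] * (k_max + 1)       # row P_{i-1,.}; row 0 is identically zero
    totals = [0.0] * (k_max + 1)     # running sums S~_{i,k}
    for i in range(1, n + 1):
        cur = [0.0] * (k_max + 1)
        if i == 1:
            cur[1] = alpha           # P_{1,1} = alpha
        else:
            for k in range(1, min(i, k_max) + 1):
                cur[k] = ((1 - (1 - beta**k) * alpha) * prev[k]
                          + alpha * (1 - beta ** (k - 1)) * prev[k - 1])
        totals = [t + c for t, c in zip(totals, cur)]
        prev = cur
    return totals[1:]


alpha, beta, k_max = 0.5, 0.9, 5
approx = scaled_distribution_dependent(3000, alpha, beta, k_max)
# Each product S~_{n,k} * (1 - beta^k) should be close to 1.
print([round(a * (1 - beta**k), 8) for k, a in enumerate(approx, start=1)])
```

Truncating the table at `k_max` is exact for the printed values, since each $P_{i,k}$ depends only on $P_{i-1,k}$ and $P_{i-1,k-1}$.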