Small But Slow World: How Network Topology and Burstiness Slow Down Spreading
M. Karsai, M. Kivelä, R. K. Pan, K. Kaski, J. Kertész, A.-L. Barabási, J. Saramäki
arXiv [physics.soc-ph]

BECS, School of Science and Technology, Aalto University, P.O. Box 12200, FI-00076
Institute of Physics and BME-HAS Cond. Mat. Group, BME, Budapest, Budafoki út 8., H-1111
Center for Complex Networks Research, Northeastern University, Boston, MA 02115
(Dated: August 24, 2010)

While communication networks show the small-world property of short paths, the spreading dynamics in them turns out slow. Here, the time evolution of information propagation is followed through communication networks by using empirical data on contact sequences and the SI model. Introducing null models where event sequences are appropriately shuffled, we are able to distinguish between the contributions of different impeding effects. The slowing down of spreading is found to be caused mainly by weight-topology correlations and the bursty activity patterns of individuals.
PACS numbers: 89.75.-k, 05.45.Tp
Most complex physical, biological and social networks show the small-world property, where the average shortest path length is strikingly short when compared to the network size [1]. This means that there is at least one short path between any two nodes, which should give rise to rapid transmission of influence. However, dynamic phenomena on networks [2], such as spreading of pandemics, electronic viruses, and information, follow their own pathways, which are not necessarily topologically efficient [3]. Spreading on real small-world networks turns out to be surprisingly slow, e.g., new infections by a computer virus are reported years after its emergence or the introduction of an anti-virus [4]. Here we aim at resolving this puzzle. For issues such as strategies and timing of vaccinations, improvement of information diffusion, and the slow decay of prevalence of computer viruses, it is crucial to understand the role of the underlying network and temporal activity patterns in the dynamics of spreading.

The dynamics of spreading is commonly studied with SI, SIR, or SIS models [5] on static lattices or in mean field, where the dynamics is defined by state changes of individuals between (S)usceptible, (I)nfectious, and (R)ecovered. These models lead to a rapid, exponential growth of prevalence at early stages of spreading, while the dynamics at later stages depend on the model and lattice. For the SI process, the prevalence grows until the whole system reachable from initial conditions is infected, with exponential slowing down towards the end. For the SIR process, competing effects set in and the spreading may remain local or percolate through the system, while the SIS process has more complex dynamics.

While these results capture some of the qualitative features of real-world processes, the heterogeneity of the systems limits their applicability.
First, the interactions of real-world systems span networks with broad distributions of node connections and mesoscopic features in the form of communities with dense internal and sparse external connectivity. Second, interaction intensities vary and are closely coupled to network topology. Third, the daily cycle and bursty character of interaction events give rise to important temporal inhomogeneities.

Some aspects of these features have already been studied. For static networks, it is known that spatial structure has an effect on epidemics (see, e.g., [6, 7]), and community structure slows down information diffusion due to trapping in dense regions [8–10]. There is an intimate relation between inhomogeneous link weights and network topology in social and communication networks [11, 12]: links within communities are strong, while links between them are weak. This Granovetter-type structure enhances the trapping effect of the communities, leading to additional slowing down of spreading [12].

The bursty nature of human interactions has received particular interest and it has turned out that the corresponding activity patterns are usually non-Poissonian, often power-law correlated (see [13]). The effect of bursty dynamics on spreading has been approached using empirical data together with approximate analytical models [14, 15]. In Ref. [14], computer worm spreading was studied using email logs and the SI model, and it was found that the non-Poissonian inter-event time distribution leads to slow spreading in the late stages of the process. Slow spreading was also observed in Ref. [15], where an Internet viral marketing experiment was carried out and modeled as a branching process in the non-percolating regime.
It was also argued that on the contrary, in the percolating regime, broad inter-event time distributions should give rise to faster spreading.

In this Letter, we study the problem of spreading dynamics in its full complexity, using time-stamped event data on human communication networks and the SI model. We apply proper null models on the event sequences and show that spreading is slowed down due to simultaneous effects of structural and temporal correlations.

For the event sequences, we have used the following data: a) Mobile phone data from a European operator (national market share ∼ …) with ∼ 325 million time-stamped voice call records over a period of 120 days. We have only retained links with bidirectional calls within the largest connected component (LCC) of the aggregated call network (MCN), yielding $N = 4.\ldots \times 10^{\ldots}$ nodes, $L = 9 \times 10^{\ldots}$ links, and $306 \times 10^{\ldots}$ calls. We define link weights as the number of calls between two users. The network is sparse (average degree $\langle k \rangle = 3.96$), showing the small-world property with an average shortest path length of $\langle l \rangle = 12.31$; b) Mobile call data from the Reality Mining project [18] (RM), where the LCC consists of 59 users and 93 edges with 2293 calls over ∼ … $t$, if there is an event between them.

For the events, we use records of the times and participants of calls, and the times and addresses of emails. Calls are one-to-one communication and enable bidirectional exchange of information, while emails may have multiple addresses and the information flow is directed. Hence for calls, if either participant is infected he/she infects the susceptible one, whereas for emails, transmission is from the sender to the recipient(s). We initiate simulations by infecting a randomly chosen node at a randomly chosen event with the spreading quantity (information, rumor, or virus) and set all other nodes susceptible. Then the spreading dynamics is simulated by using temporally periodic boundary conditions (i.e., repeating the event sequence) until the set of reachable nodes is exhausted. We record the prevalence, i.e., the fraction of infected nodes $\langle I(t) \rangle / N$ as a function of time, averaging over 10 initial conditions, and the time to full prevalence $t_f$. For the email network, we start the spreading process from a node in IN or SCC and iterate the process until all nodes in SCC and OUT are infected.

To gain insight into the effects of different correlations, we employ null models where the original event sequences are randomized. These are defined so that in each null model, some of the correlations are separately destroyed: community structure (C), weight-topology correlations (W), bursty event dynamics on single links (B), and event-event correlations between links (E). In addition, the overall event frequencies follow a daily pattern (D), with decreased night-time activity and some day-time peaks (see inset in Fig. 3). The null models are as follows, with the letters indicating retained correlations (Table I):

– DCWB (equal-weight link-sequence shuffled): Whole single-link event sequences are randomly exchanged between links having the same number of events. Temporal correlations between links are destroyed. (For large weights we did binning with 2-3 weight values.)
– DCB (link-sequence shuffled): Whole single-link event sequences are randomly exchanged between randomly chosen links. Event-event and weight-topology correlations are destroyed.
– DCW (time-shuffled): Time stamps of the whole original event sequence are randomly reshuffled. Temporal correlations are destroyed.
– D (configuration model): The original aggregated network is rewired according to the configuration model, where the degree distribution of the nodes and connectedness are maintained but the topology is uncorrelated. Then, original single-link event sequences are randomly placed on the links, and time shuffling as above is performed. All correlations except seasonalities like the daily cycle are destroyed.

EVENT SEQUENCE                         D  C  W  B  E
Original                               X  X  X  X  X
Equal-weight link-sequence shuffled    X  X  X  X
Link-sequence shuffled                 X  X     X
Time shuffled                          X  X  X
Configuration model                    X

TABLE I: Correlations retained in different null models. D: daily pattern, C: community structure, W: weight-topology correlations, B: bursty single-edge dynamics, E: event-event correlations between edges.

Fig. 1 displays the results for the MCN. In all cases the spreading is slow, with full prevalence times $t_f$ of the order of several hundred days. It is clear that both topological and temporal correlations slow down the spreading. It is the fastest when all correlations except the daily patterns are destroyed (configuration model, D).
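The simulation and two of the shuffling procedures described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the `(t, u, v)` event format, the choice of period for the periodic boundary condition, and the reading of "randomly exchanged" as a random permutation of whole per-link sequences are all assumptions made for illustration, and the binning of large weights is omitted.

```python
import random
from collections import defaultdict

def si_spread(events, seed_node, start_index, n_nodes, max_cycles=1000):
    """Deterministic SI on a time-ordered list of undirected events (t, u, v).

    The seed is infected at events[start_index]; the sequence is repeated
    periodically until n_nodes nodes are infected (or max_cycles is reached).
    Returns {node: infection time measured from the starting event}.
    """
    period = events[-1][0] - events[0][0] + 1  # assumed period of one repetition
    t0 = events[start_index][0]
    infected = {seed_node: 0}
    i, offset = start_index, 0
    for _ in range(max_cycles * len(events)):
        if len(infected) == n_nodes:
            break
        t, u, v = events[i]
        if (u in infected) != (v in infected):  # exactly one endpoint infected
            newly = v if u in infected else u
            infected[newly] = t + offset - t0
        i += 1
        if i == len(events):                    # wrap around: periodic boundary
            i, offset = 0, offset + period
    return infected

def time_shuffled(events, rng):
    """DCW-like: permute time stamps over all events (daily pattern and weights kept)."""
    times = [t for t, _, _ in events]
    rng.shuffle(times)
    return sorted((t, u, v) for t, (_, u, v) in zip(times, events))

def link_sequence_shuffled(events, rng, equal_weight=False):
    """DCB-like (DCWB-like if equal_weight): permute whole per-link time sequences."""
    seqs = defaultdict(list)
    for t, u, v in events:
        seqs[tuple(sorted((u, v)))].append(t)
    groups = defaultdict(list)                  # links allowed to exchange sequences
    for link, times in seqs.items():
        groups[len(times) if equal_weight else 0].append(link)
    out = []
    for group in groups.values():
        donors = group[:]
        rng.shuffle(donors)
        for link, donor in zip(group, donors):
            out.extend((t, link[0], link[1]) for t in seqs[donor])
    return sorted(out)
```

Running `si_spread` on the original list and on each shuffled list, and averaging the infection curves over many seeds and starting events, would give curves of the kind compared in Fig. 1.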
Switching on the community structure and associated weight-topology correlations (DCW) slows down the spreading strongly, as expected because of the bottleneck caused by weak links between communities and the broad distribution of link weights [12, 17]. However, comparing this with the DCB null model indicates that bursty single-edge dynamics (B) has an even stronger slowing-down effect than weight-topology correlations (W). Finally, including all except event-event correlations (DCWB) gives rise to spreading dynamics very close to the original event sequence (DCWBE). Here, for early times, DCWB spreading is slightly slower than the original one. The left panel inset shows quantitative differences in the times to 20% prevalence. It also indicates that temporal correlations (E) between adjacent edges have initially a minor accelerating effect. This can be attributed to the easy reachability of the members within the community where the spreading begins. However, for long times, bottlenecks appear, and event-event correlations slow the process down. Note that the initial conditions have an effect on the duration of the process, reflected in the distributions in the right panel of Fig. 1 (the SI process itself is deterministic).

FIG. 1: (color online) (Left) Fraction of infected nodes $\langle I(t) \rangle / N$ as a function of time for the original event sequence (◦) and null models: equal-weight link-sequence shuffled DCWB (♦), link-sequence shuffled DCB (△), time-shuffled DCW (□) and configuration model D (▽). Inset: $\langle I(t) \rangle / N$ for the early stages, illustrating differences in the times to reach $\langle I(t) \rangle / N = 20\%$. (Right) Distribution of full prevalence times $P(t_f)$ due to randomness in initial conditions.

FIG. 2: (color online) Spreading dynamics in the Reality Mining (left) and email networks (right), for the original event sequence (◦) and null models: DCW (□) and DCWB (♦). In the email network, the spreading process is directed. The maximum prevalence is limited to the total fraction of the SCC and the OUT component (∼ …).
However, the overall shape of the dynamics and the effects of correlations are consistent for individual runs too.

Results for the Reality Mining mobile call network and for the email logs are shown in Fig. 2, with the DCW and DCWB null models; the outcome is qualitatively similar to that of the MCN. However, there are certain differences. In the small and sparse RM network, successive calls to many people within a short time period by a hub give rise to a steep prevalence rise. Such behavior is a one-off event and the effect is destroyed in the null models. In the email network, very high-degree hubs sending frequent emails give rise to rapid spreading once they are reached. This effect is conserved in the null models.

FIG. 3: (color online) Spreading dynamics as obtained from a Poissonian event-generating model on the aggregated MCN, with daily pattern (□) and without (▽). Link weights were taken into account and the curve with the daily pattern is comparable with the DCW null model. Inset: the average daily pattern as observed for the MCN event sequence with binning by the hour. The continuous line is to guide the eye.

The daily activity pattern, i.e., variation in overall communication frequency by the hour, is retained in every null model that is based on randomizing the original event sequence. In [20], it was suggested that natural periodicities, such as the daily cycle, are responsible for the fat-tailed waiting time distributions. In order to evaluate the impact of the daily pattern on the spreading speed, we carried out simulations where the aggregated MCN was used as the lattice. Events were generated on its links by two Poisson processes that conserve link weights: a homogeneous Poisson process, and a process whose instantaneous rate follows the daily pattern as calculated from the call statistics on an hourly basis (see inset in Fig. 3). The SI dynamics for both cases are shown in Fig. 3.
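One way to generate the two kinds of Poissonian event sequences is thinning (rejection sampling) of a homogeneous process, sketched below. This is an illustrative assumption, not the procedure used for Fig. 3: the function name `poisson_events`, the dictionary of link weights, and the 24-value `hourly_profile` are hypothetical, and link weights are conserved only in expectation (exactly so when `duration` is a whole number of days).

```python
import random

def poisson_events(weights, duration, hourly_profile=None, rng=None):
    """Generate sorted (t, u, v) events with expected count per link = its weight.

    weights: {(u, v): expected number of events during `duration` seconds}
    hourly_profile: 24 non-negative relative rates (one per hour of the day);
                    None gives a homogeneous Poisson process.
    """
    if rng is None:
        rng = random.Random()
    profile = hourly_profile or [1.0] * 24
    mean_p, max_p = sum(profile) / 24.0, max(profile)
    events = []
    for (u, v), w in weights.items():
        if w <= 0:
            continue
        # Upper bound on the instantaneous rate of this link (events per second).
        lam_max = (w / duration) * max_p / mean_p
        t = rng.expovariate(lam_max)
        while t < duration:
            hour = int(t // 3600) % 24
            # Thinning: keep a candidate with probability rate(t) / lam_max.
            if rng.random() < profile[hour] / max_p:
                events.append((t, u, v))
            t += rng.expovariate(lam_max)
    events.sort()
    return events
```

The homogeneous case corresponds to calling the function without `hourly_profile`; feeding both outputs to the same SI simulation allows the two curves to be compared.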
The difference between the two curves is negligible, demonstrating that the daily pattern has only a minor impact on the spreading speed. This, together with the observation that temporal correlations do have a significant decelerating effect on spreading, strongly indicates that there are important non-Poissonian correlations in the system besides the daily cycle.

The non-Poissonian, bursty character of event sequences is clearly demonstrated by the fat-tailed distribution of single-link inter-event times for the MCN, as seen in Fig. 4. In order to exclude the possibility that the fat tail in the inter-event time distribution is only due to the broad weight distribution as suggested in [20], we calculated the distributions for binned weights and obtained a satisfactory scaling with the average inter-event time, similarly to [16]. We find that the distribution can be fitted by a power law with an exponent 0.…; a peak at ∼ 20 seconds is found. This peak is due to event correlations between links. The power law indicates the non-Poissonian, bursty character of the events. Both characteristics vanish for the time-shuffled null model DCW, and the inter-event time is well described by an exponential function (see inset of Fig. 4), i.e., the process is Poissonian.

FIG. 4: (color online) Scaled inter-event time distributions for the MCN. Edges were log-binned by weight and for every second bin the inter-event time distribution of the events occurring in the corresponding bin is shown, scaled by the average inter-event time of that bin $\tau^*$. Inset: scaled inter-event time distributions for the original (◦) and for the time-shuffled events (□). An exponential density distribution with average value of 1 is shown as a light (yellow) line.

The effect of burstiness on the spreading speed can be easily demonstrated with the following single-link calculation. Let us denote the average time for the infection to spread through a link (the residual waiting time) by $\langle \tau_R \rangle$, and assume that one of the nodes gets infected at a uniformly chosen random time. Similarly to Iribarren et al. [15] and Vazquez et al. [14], we calculate $\langle \tau_R \rangle$ for a given inter-event time distribution $P(\tau)$. For simplicity, we consider how the burstiness introduced by a continuous power-law distribution of inter-event times $P(\tau) \sim \tau^{-\alpha}$ affects the average infection times when compared to a Poisson process. If we fix the average inter-event time (and thus the number of events for a long observation period), the ratio of average infection times is

$$ r = \frac{\langle \tau_{R,\mathrm{powerlaw}} \rangle}{\langle \tau_{R,\mathrm{poisson}} \rangle} = \frac{(\alpha - 2)^2}{2(\alpha - 3)(\alpha - 1)} \quad \text{for } \alpha > 3. $$

Here $r$ is decreasing with $\alpha$, $r < 1$ when $\alpha > 2 + \sqrt{2} \approx 3.41$, and $r$ goes to infinity at $\alpha = 3$. This indicates that the burstiness characterized by power-law distributions with slow decay has a decelerating effect on spreading with respect to the Poisson process with the same mean. However, if the decay is fast enough, i.e., the second moment of the power-law distribution is smaller than that of the Poisson distribution, we see acceleration. This mean-field type of reasoning has its limitations.
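The single-link argument can be checked numerically with a short sketch. It rests on two assumptions that are not spelled out in the text: the power law is taken to be a Pareto density $P(\tau) = (\alpha - 1)\,\tau_{\min}^{\alpha - 1}\,\tau^{-\alpha}$ for $\tau \ge \tau_{\min}$, and the mean residual waiting time is taken from the standard renewal-theory result $\langle \tau_R \rangle = \langle \tau^2 \rangle / (2 \langle \tau \rangle)$, which equals $\langle \tau \rangle$ for a Poisson process. Under these assumptions the ratio is $r = (\alpha - 2)^2 / [2(\alpha - 3)(\alpha - 1)]$; the function names are hypothetical.

```python
import math

def residual_ratio(alpha):
    """Closed-form r = <tau_R>_powerlaw / <tau_R>_Poisson at equal mean inter-event time."""
    if alpha <= 3:
        return math.inf  # the second moment of the power law diverges
    return (alpha - 2) ** 2 / (2 * (alpha - 3) * (alpha - 1))

def residual_ratio_from_moments(alpha, tau_min=1.0):
    """Same ratio computed from the Pareto moments (valid for alpha > 3)."""
    m1 = tau_min * (alpha - 1) / (alpha - 2)        # <tau>
    m2 = tau_min ** 2 * (alpha - 1) / (alpha - 3)   # <tau^2>
    tau_r_powerlaw = m2 / (2 * m1)                  # renewal-theory residual time
    tau_r_poisson = m1                              # exponential with the same mean
    return tau_r_powerlaw / tau_r_poisson

if __name__ == "__main__":
    for a in (3.1, 2 + math.sqrt(2), 4.0, 10.0):
        print(a, residual_ratio(a), residual_ratio_from_moments(a))
```

The ratio crosses 1 at $\alpha = 2 + \sqrt{2}$, diverges as $\alpha \to 3$ from above, and tends to $1/2$ for large $\alpha$, where the intervals become nearly regular.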
Nevertheless it illustrates the mechanisms of slowing down because of bursts: the residual waiting time increases because the chance for long waiting times after getting infected increases.

In conclusion, we have studied the effects of different topological and temporal correlations on spreading in complex communication networks. Using time-stamped event data and appropriately prepared null models we have managed to quantitatively distinguish between different contributions to the slowing down of spreading. We have shown that the main contributions are (i) the community structure and its correlation with link weights and (ii) the inhomogeneous and bursty activity patterns on the links. Somewhat surprisingly, the daily pattern and event correlations between links seem to play only a minor role in the overall spreading speed. Finally, we believe that our null models can be generally applied to investigate the effects of temporal and structural correlations on dynamic processes on networks.

Acknowledgement
Financial support from EU's 7th Framework Program's FET-Open to ICTeCollective project no. 238597 and by the Academy of Finland, the Finnish Center of Excellence program 2006-2011, project no. 129670, as well as by OTKA K60456 and TEKES (FiDiPro) are gratefully acknowledged.

[1] M. Newman, A.-L. Barabási and D. J. Watts, The Structure and Dynamics of Networks (Princeton UP, 2006); M. Newman, Networks: An Introduction (Oxford UP, 2010).
[2] A. Barrat, M. Barthélemy and A. Vespignani, Dynamical Processes on Complex Networks (Cambridge UP, 2008).
[3] P. Holme, Phys. Rev. E, 046119 (2005).
[4] R. Pastor-Satorras and A. Vespignani, Evolution and Structure of the Internet (Oxford UP, 2004).
[5] R. M. Anderson and R. M. May, Infectious Diseases of Humans: Dynamics and Control (Oxford Science Publications, 1992).
[6] M. J. Keeling, Proc. R. Soc. Lond. B, 859 (1999).
[7] K. T. D. Eames, Theor. Popul. Biol., 104 (2008).
[8] R. Lambiotte, J.-C. Delvenne and M. Barahona, (arxiv.org/abs/0812.1770) (2008).
[9] R. Toivonen et al., Phys. Rev. E, 016109 (2009).
[10] P. J. Mucha et al., Science, 876 (2010).
[11] M. Granovetter, Am. J. Sociol., 1360 (1973).
[12] J.-P. Onnela et al., Proc. Natl. Acad. Sci. (USA), 7332 (2007).
[13] A.-L. Barabási, Bursts: The Hidden Pattern Behind Everything We Do (Dutton Books, 2010).
[14] A. Vazquez, B. Rácz, A. Lukács and A.-L. Barabási, Phys. Rev. Lett., 158702 (2007).
[15] J. L. Iribarren and E. Moro, Phys. Rev. Lett., 038702 (2009).
[16] J. Candia et al., J. Phys. A: Math. Theor., 224015 (2008).
[17] J.-P. Onnela et al., New J. Phys., 179 (2007).
[18] N. Eagle, A. Pentland, and D. Lazer, Proc. Natl. Acad. Sci. (USA), 15274 (2009).
[19] J. Eckmann, E. Moses, and D. Sergi, Proc. Natl. Acad. Sci. (USA) 101, 14333 (2004).
[20] R. D. Malmgren et al., Proc. Natl. Acad. Sci. (USA) 105, 18153 (2008); R. D. Malmgren et al., Science 325