Dynamics of cascades on burstiness-controlled temporal networks
Samuel Unicomb, Gerardo Iñiguez, James P. Gleeson, Márton Karsai
Université de Lyon, ENS de Lyon, INRIA, CNRS, UMR 5668, IXXI, 69364 Lyon, France
Department of Network and Data Science, Central European University, A-1100 Vienna, Austria
Department of Computer Science, Aalto University School of Science, FI-00076 Aalto, Finland
IIMAS, Universidad Nacional Autónoma de México, 01000 Ciudad de México, Mexico
MACSI and Insight Centre for Data Analytics, University of Limerick, Limerick V94 T9PX, Ireland
Burstiness, the tendency of interaction events to be heterogeneously distributed in time, is critical to information diffusion in physical and social systems. However, an analytical framework capturing the effect of burstiness on generic dynamics is lacking. We develop a master equation formalism to study cascades on temporal networks with burstiness modelled by renewal processes. Supported by numerical and data-driven simulations, we describe the interplay between heterogeneous temporal interactions and models of threshold-driven and epidemic spreading. We find that increasing interevent time variance can both accelerate and decelerate spreading for threshold models, but can only decelerate epidemic spreading. When accounting for the skewness of different interevent time distributions, spreading times collapse onto a universal curve. Our framework uncovers a deep yet subtle connection between generic diffusion mechanisms and underlying temporal network structures that impacts on a broad class of networked phenomena, from spin interactions to epidemic contagion and language dynamics.
Temporal networks provide a representation of real-world complex systems where interactions between components vary in time [1–3]. Although they were initially modelled as Poisson processes, where independent events are homogeneously distributed in time, real-world network interactions have been found to be heterogeneously distributed and to exhibit temporal correlations [4–6]. In particular, interaction events in real systems concentrate within short periods of intense activity followed by long intervals of inactivity, an effect known as burstiness. Bursty dynamics appear in diverse physical phenomena including earthquakes [7] and solar flares [8], biological processes like neuron firing [9], and even the dynamics of human social interaction [5, 10].

Burstiness in temporal interactions has profound implications for the diffusion of information over temporal networks, as demonstrated in a growing number of works [11–17]. This is true in the case of epidemic processes, often referred to as simple contagion, where the probability of infection of an uninfected node depends linearly on the number of exposures, i.e., temporal interactions with infected neighbours in the network [18]. Epidemic models successfully describe the spread of biological disease [19], and have been shown to critically depend on burstiness and other patterns of temporal interactions [12, 20–23]. Epidemic spreading over temporal networks appears to be slowed due to burstiness in some cases [11, 24–26], while accelerated in others [27]. Threshold mechanisms provide another class of phenomena where bursty temporal networks play a crucial role. Threshold dynamics, also known as complex contagion, are used to model the spread of information where infection requires the reinforced influence of at least a certain fraction of neighbours in the network [28]. Threshold-driven dynamics over static networks have been extensively studied both empirically [29] and theoretically [29–33], but analysis of their behaviour on temporal networks has been limited to a small number of empirical studies [34–37]. Here we propose an analytical framework to systematically describe the relationship between the diffusion of information and bursty temporal interactions, thus providing the theoretical foundation necessary to uncover the role of burstiness in generic diffusion processes, including simple and complex contagion models of physical, biological and social phenomena.

We incorporate the most widely documented features of temporal interactions into a framework of binary-state dynamics and benchmark its behaviour with standard models of threshold-driven and epidemic spreading. Although stochastic bursty interactions are likely emergent phenomena [4, 38], their dynamics are well approximated by renewal processes [39]. Temporal heterogeneity in network interactions can then be characterised by the variability in interevent times τ (the time between consecutive events on a given edge), parameterised by the interevent time distribution ψ(τ), while other features of the temporal network are considered maximally random. Renewal processes represent the simplest model of bursty, non-Markovian dynamics, and a departure from the memoryless assumption implicit in Poisson processes. Nevertheless, we are able to show that such a system can be accurately captured by a master equation formalism, which is essentially memoryless, implying the existence of a purely Markovian system with almost identical behaviour. We show both analytically and numerically that bursty temporal interactions give rise to a percolation transition in the connectivity of the temporal network, separating phases of slow and rapid dynamics for both epidemic and threshold models of information diffusion.

∗ [email protected]
We find that diffusion dynamics are sensitive to the choice of interevent time distribution, particularly in regard to its skewness, and we demonstrate a data collapse across distributions when controlling for this effect.

Temporal network model.
To model a temporal network, we consider an undirected, unweighted static network of N nodes as the underlying structure, which acts as a skeleton on top of which temporal interactions take place. The degree of a node (how many neighbours it has) takes discrete values k = 0, . . . , N − 1, following the degree distribution p_k. Pairwise temporal interactions, or events, occur independently at random on each static edge via a renewal process with interevent time distribution ψ(τ). Time is continuous and events are instantaneous, while consecutive interevent times are uncorrelated. We also assume that the renewal process is stationary (for further details see Methods and Supplementary Note 4). By using a static underlying network, we assume the time scales of edge formation and node addition or removal are far longer and thus negligible relative to the time scale of event dynamics over existing edges.

In its simplest form, information diffusion is a binary-state process where each node occupies one of two mutually exclusive states, which we term uninfected and infected. The probability of a node changing state is a function of the state of its neighbours, as well as the strength of their interactions. Interaction strength, also referred to as mutual influence, is a non-negative scalar that we consider to be a function of the elapsed renewal process time series. We desire that the mean of the emergent distribution of interaction strengths be stationary, and invariant to the underlying burstiness of the system. This is achieved when the contribution of a single event to interaction strength (i) goes to zero as the event ages, and (ii) is additive, meaning a spike in edge activity leads to a spike in the interaction strength between neighbours. Under these assumptions, the simplest such coupling is a step function, i.e., the contribution of an event to interaction strength is constant for a duration η, after which it goes to zero.
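A minimal sketch of these edge dynamics, assuming a unit-mean Weibull ψ(τ) (as used later in the text); function names are our own, and for simplicity the process starts at t = 0 rather than from the stationary residual-time distribution, a difference that vanishes far from the origin:

```python
import math
import random

def renewal_events(sample_tau, t_max, rng):
    """Event times of a renewal process on (0, t_max]: each interevent
    time tau is an independent draw from psi(tau)."""
    events, t = [], 0.0
    while True:
        t += sample_tau(rng)
        if t > t_max:
            return events
        events.append(t)

def window_count(events, t, eta):
    """Number of events in the trailing window (t - eta, t]; with a
    step-function coupling, exactly these events contribute influence."""
    return sum(1 for s in events if t - eta < s <= t)

rng = random.Random(42)
shape = 0.7                                  # shape < 1 gives bursty event trains
scale = 1.0 / math.gamma(1.0 + 1.0 / shape)  # rescale the Weibull to unit mean
sample_tau = lambda r: r.weibullvariate(scale, shape)

events = renewal_events(sample_tau, t_max=1000.0, rng=rng)
state = window_count(events, t=500.0, eta=1.0)  # events still "alive" at t = 500
```

With unit-mean interevent times, roughly t_max events accumulate on the edge, and `state` counts those falling inside the memory window η at the probe time.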
As such we define the interaction strength w_j of an edge at time t (or state j for short) as the number j of events having occurred in the preceding time window of width η.

It follows that the local configuration of a node is determined by the number k_j of its neighbours connected via edges in state j, with the degree k of the node related to its k_j values by k = Σ_j k_j at any time t. We introduce m_j as the number of infected neighbours of a node connected via edges in state j. Consequently, 0 ≤ m_j ≤ k_j, with m = Σ_j m_j the total number of infected neighbours. For each node, we store k_j and m_j for all j in vectors k and m, providing a description of edge and node states in the local neighbourhood of a node. Nodes in class (k, m) become infected at a rate F_{k,m}, and are statistically identical in this sense. We also store the interaction strength w_j = j in the vector w for all j. The dynamics of the influence received by a node is thus fully determined by (k, m) and w.

Models of information diffusion. To examine the effect of temporal interactions on information diffusion,
we explore three widely known models of transmission. We consider both relative (RT) and absolute (AT) variants of a threshold mechanism [28, 30, 40], as well as the Susceptible-Infected (SI) model of epidemic spreading [41] (see Table I for details). All models are non-recovery, meaning the uninfected state cannot be re-entered, and we consider infection due to external noise at a low, but nonzero rate p.

The study of threshold dynamics focuses on the conditions leading to cascades, or large avalanches of infections that sweep through the network. In the simplest implementation of threshold dynamics, infection occurs when the number m of infected neighbours of an uninfected node exceeds a fraction φ of its degree k [28, 30]. Generalising this rule to the case of arbitrary interaction strength [32, 42], in the RT model infection occurs when the influence of infected neighbours, m · w, exceeds a fraction φ of all potential influence, k · w. The RT model captures instances of real-world diffusion where interactions between elements affect the probability of infection only in aggregate, similar to the response of individuals to new behavioural patterns or transmission in biological neural networks [43, 44]. When considering the RT model over temporal networks, the probability of infection may increase during bursts of interaction events with infected neighbours or, conversely, bursts of activity with uninfected neighbours may temporarily maintain a node in the uninfected state.

TABLE I. Transmission rate F_{k,m} for nodes in configuration (k, m) with interaction strength w and infection rate p due to external noise. In complex contagion models with relative (RT) and absolute (AT) thresholds, infection is regulated by parameters φ and M_φ, respectively. In the Susceptible-Infected (SI) model, infection is determined by the rate λ.

  RT:  F_{k,m} = 1 if m · w ≥ φ k · w,  p otherwise
  AT:  F_{k,m} = 1 if m · w ≥ M_φ,      p otherwise
  SI:  F_{k,m} = max(p, m · λ)
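The transmission rules of Table I can be written down directly. A minimal sketch (function and variable names are our own; edge states are truncated at j = 2 purely for illustration):

```python
def influence(x, w):
    """Scalar product m.w (or k.w): sum over edge states j of x_j * w_j."""
    return sum(x_j * w_j for x_j, w_j in zip(x, w))

def rate_rt(k, m, w, phi, p):
    """Relative threshold: rate 1 once infected influence m.w reaches a
    fraction phi of all potential influence k.w; otherwise noise rate p."""
    return 1.0 if influence(m, w) >= phi * influence(k, w) else p

def rate_at(m, w, m_phi, p):
    """Absolute threshold: m.w is compared to the absolute value M_phi."""
    return 1.0 if influence(m, w) >= m_phi else p

def rate_si(m, w, lam, p):
    """SI: with lambda_j = lam * w_j, the total infection rate is m.lambda,
    never below the noise rate p."""
    return max(p, lam * influence(m, w))

w = [0, 1, 2]   # interaction strengths w_j = j for states j = 0, 1, 2
k = [2, 2, 0]   # a degree-4 node: two neighbours on state-0 edges, two on state-1
m = [0, 1, 0]   # one infected neighbour, reached through a state-1 edge

r_rt = rate_rt(k, m, w, phi=0.15, p=2e-4)   # 1 >= 0.15 * 2, so rate 1.0
r_at = rate_at(m, w, m_phi=2, p=2e-4)       # 1 < 2, so noise rate p only
r_si = rate_si(m, w, lam=0.02, p=2e-4)      # max(2e-4, 0.02 * 1) = 0.02
```

Note that the same neighbourhood infects at rate 1 under RT but only at the noise rate under AT, illustrating how the two thresholds respond differently to identical local configurations.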
In the AT model, influence from infected neighbours is not normalised, but compared to some absolute value M_φ [40]. In contrast to the RT model, infection is not hindered by interaction activity with uninfected neighbours, and bursts can only increase the probability of infection. In the SI model, finally, each interaction event with an infected neighbour triggers infection at a rate λ. In our framework of temporal networks, infected neighbours trigger infection via edges in state j at a rate λj. Writing λ = λw, the infection rate for a node with a neighbourhood of infected nodes described by m is m · λ. Similar to the AT model, bursts can only increase the probability of infection in the SI model.

Binary dynamics over temporal networks.
We extend a master equation formalism [32, 45] to account for network temporality. We introduce the state space of all configurations (k, m) allowed by the underlying degree distribution p(k), under the condition that each edge is in one of a finite number of possible edge states (see Methods, and Supplementary Note 1 for lattice diagrams of this space). We introduce the state vector s(t) containing the probability that a randomly selected node with underlying degree k is uninfected and in class (k, m) at time t. The time evolution of s is governed by the matrix W(s, t), containing the transition rate W_ij from the i-th to the j-th configuration (k, m) at time t. Transitions arise from three mechanisms. First, ego transitions, contained in the matrix W_ego, describe the loss to configuration (k, m) due to its nodes becoming infected. This occurs at a rate F_{k,m}, as per Table I, so the diagonal terms of W_ego are given by −F_{k,m} and off-diagonals are zero. Second, neighbour transitions, contained in matrix W_neigh, describe the gain or loss to configuration (k, m) due to the infection of neighbours of nodes in this class. This transition is determined by β_j dt, the probability of an uninfected neighbour in configuration j becoming infected over an interval dt (see Methods for an explicit calculation). Taken together, W_ego and W_neigh accurately describe diffusion dynamics over static and heterogeneously distributed edges, such as weighted and multiplex networks [32, 33].

Temporal networks require a third component, edge transitions, contained in the matrix W_edge, describing the gain or loss to configuration (k, m) due to changes in an edge's state j. This applies to any temporal network model that can be formulated in terms of discrete, dynamic edge states. We denote by µ_j dt and ν_j dt the probabilities that a randomly selected edge in state j undergoes a positive or negative transition and enters state j + 1 or j − 1, respectively, over an interval dt. Combining these terms gives the master equation

  d/dt s = (W_ego + W_neigh + W_edge) s = W(s, t) s.   (1)

Modelling temporal network dynamics amounts to solving Eq. (1), which, along with the initial condition s(0), determines the evolution of the system.

To apply this formalism we derive the edge transition rates µ_j and ν_j in the case of renewal processes. We first note that microscopically, on the scale of a single edge, transitions from state j to j ± 1 are non-Markovian, with an explicit time dependence at this scale. A renewal process may then seem at odds with a Markovian master equation [where s(t + dt) depends only on s(t), as per Eq. (1)]. Macroscopically however, on the scale of large ensembles of edges, the renewal process exhibits effective j-dependent rates that are constant in time. We can calculate the probability E_j that a randomly selected edge is in state j, and the probability that it transitions to state j ± 1 over an interval dt, giving µ_j and ν_j [see Methods for explicit expressions for j > 0, with the j = 0 case of E_j and µ_j comprising a special case that we define in Eqs. (2) and (3) below].

Since the rates µ_j and ν_j are heterogeneous in terms of j, they can be viewed as a signature of the model parameters ψ(τ) and η, and of the non-Markovianity inherent at the scale of a single edge. On a macroscopic scale, µ_j, ν_j, and E_j are constant in time, meaning our system is indistinguishable from a continuous-time Markov chain model of edge state. That is, a random walk on the non-negative integers, with transition rates given by µ_j and ν_j, and a stationary distribution of walkers given by E_j [see Supplementary Note 2 for an illustration of µ_j and ν_j in the case of a gamma distribution ψ(τ)]. Applying the system-wide rates µ_j and ν_j at the finer-grained level of configurations (k, m) amounts to a mean field approximation. Monte Carlo simulations (see Supplementary Fig. 8) demonstrate that the actual edge transition rates deviate slightly from µ_j and ν_j for each configuration (k, m), even if they are exact for the network as a whole, in the limit of large N. The accuracy of the master equation solution provides a measure of the remarkable similarity between a renewal process, where Eq. (1) is an approximation, and the biased random walk interpretation of edge state, where Eq. (1) is exact.

FIG. 1. Normalised density of noise-induced infections ρ_f as a function of interevent time standard deviation σ_τ and memory η. Normalised diffusion time t_f produces an almost identical effect [see (a), inset]. The small σ_τ limit, leading to regular patterns in τ, comprises the quenched limit in (a) where η = 1 and the network is effectively static. The large σ_τ limit produces large bursts in activity, comprising the annealed regime where the network is effectively sparsified and plays no role in information diffusion (ρ_f = 1). Mirroring results are achieved by varying memory η for fixed σ_τ in generated (b) and empirical temporal networks: Email (c) and Forum (d). Analytic solution is denoted by dashed lines, and Monte Carlo results by solid lines. Generated networks have lognormal degree distribution with mean ⟨k⟩ = 7 and standard deviation σ_k = 2. We use Weibull-distributed interevent times with mean ⟨τ⟩ = 1. Plot (b) uses σ_τ = 1. For empirical data description see Methods. Node dynamics correspond to the RT model with threshold φ = 0.15 and external noise p = 2 × 10⁻⁴. Cutoff density is ρ_c = 0.4. Monte Carlo simulations are averaged over 10 realisations. Network size is 10 in (a), and 5 × 10 in (b).

Burstiness and information diffusion.
We validate our analytical framework with Monte Carlo simulations of diffusion dynamics over temporal networks. Simulations use an underlying static, configuration-model network with lognormal degree distribution of mean ⟨k⟩ and standard deviation σ_k. We measure the time t_c required to reach an arbitrary density ρ_c of infected nodes, in the presence of background noise at rate p. We also measure ρ_f, the relative frequency of infections due to external noise, such that 0 < ρ_f ≤ 1, with 1/ρ_f the ratio of all to noise-induced infections, measuring the catalytic effect of external noise (for a detailed description of ρ_f see Methods). We normalise t_c by the time taken to reach the desired density by noise only, providing t_f, such that 0 < t_f ≤ 1. Remarkably, ρ_f and t_f are almost equivalent, with a value of ρ_f = t_f = 1 indicating slow diffusion with complete reliance on external noise, and small ρ_f and t_f representing rapid diffusion with external noise producing a substantial catalytic effect. Together, they measure the extent to which the temporal network, rather than external noise, drives the diffusion of information.

FIG. 2. Monte Carlo simulation of the normalised diffusion time t_f as a function of interevent time standard deviation σ_τ and memory η. Diffusion dynamics correspond to the RT model with φ = 0.15 (a), the AT model with M_φ = 2 (b), and the SI model with λ = 0.02 (c). Red dashed lines indicate the σ_τ value producing the minimum diffusion time t_f, for a given η. This demonstrates, for example at large η in (a), that small values of burstiness may be accelerative with respect to the Poisson case. This is evidenced by the minimum extending beyond σ_τ = 1. Oscillations in the line at intermediate η relate to the density of j = 0 edges [identical but not shown in (a) and (b)]. Degree distribution is lognormal with ⟨k⟩ = 7 and σ_k = 0.5. Results correspond to a single realisation of each (σ_τ, η) value in networks of size N = 10. The interevent time distribution ψ(τ) is Weibull with mean ⟨τ⟩ = 1 (see Supplementary Note 6 for corresponding ρ_f values).

We first examine the effect of varying interevent time standard deviation σ_τ for fixed memory η = ⟨τ⟩ = 1 [Fig. 1(a)]. We choose a Weibull interevent time distribution ψ(τ), used widely to model behavioural bursts in both human [46] and animal [47] dynamics. A Weibull distribution reduces to the exponential distribution for σ_τ = ⟨τ⟩ = 1. Node dynamics follow the RT model for threshold φ = 0.15 and background noise p = 2 × 10⁻⁴. Approaching the small σ_τ limit from above, events arrive in an increasingly regular pattern, and an increasing fraction of edges are frozen in the mean state η/⟨τ⟩ = 1. We refer to this as the quenched regime, whereby edges converge to a single state and the network is effectively static. In the opposing limit of large σ_τ, burstiness means that at any given time, edge activity is concentrated among an arbitrarily small fraction of edges that undergo large spikes in activity, with the remainder in state j = 0. We refer to this as the annealed regime, where the network is maximally sparse and has a vanishingly small role in information diffusion (ρ_f and t_f approach one).

Both quenched and annealed regimes lead to slow, noise-reliant diffusion, where the expected edge state η/⟨τ⟩ is preserved [Fig. 1(a)]. For intermediate values of σ_τ there is a well-mixed regime where relatively rapid diffusion is due to edge state fluctuations that are ultimately favourable to transmission. In the RT model this implies a spike of activity on an infected neighbour overcoming a node's threshold, or decreased activity on uninfected edges lowering the relative influence to be overcome. The decelerative effect of quenching is increased for narrower underlying degree distributions, since an increasing fraction of nodes are frozen in a state unfavourable to transmission, a static network effect already reported in [30].

A mirroring effect can be obtained by varying memory η for constant σ_τ = ⟨τ⟩ = 1 [Fig. 1(b)]. The quenched limit is recovered for large η, as large samples of events on each edge result in edges converging to a mean state, η/⟨τ⟩, with an increasingly narrow distribution, due to the central limit theorem. As for the case of fixed η, quenching may be decelerative if cascades on the corresponding static network are noise dependent.
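The quenched-to-annealed crossover described above can be probed numerically. The sketch below (our own construction, standard library only) estimates the stationary fraction of edges frozen in state j = 0, i.e. the probability of observing no event in a window of width η = 1, for unit-mean Weibull interevent times of increasing σ_τ:

```python
import bisect
import math
import random

def weibull_sigma(a):
    """Standard deviation of a Weibull with shape a, rescaled to unit mean."""
    g1, g2 = math.gamma(1 + 1 / a), math.gamma(1 + 2 / a)
    return math.sqrt(g2 / g1**2 - 1)

def frac_state_zero(a, eta=1.0, t_max=20000.0, n_probe=4000, seed=1):
    """Monte Carlo estimate of the probability that a window of width eta,
    placed at a random time, contains no event of the renewal process."""
    rng = random.Random(seed)
    scale = 1.0 / math.gamma(1 + 1 / a)   # unit-mean Weibull
    events, t = [], 0.0
    while t < t_max:
        t += rng.weibullvariate(scale, a)
        events.append(t)
    hits = 0
    for _ in range(n_probe):
        u = rng.uniform(0.1 * t_max, 0.9 * t_max)  # avoid boundary effects
        # no event in (u - eta, u] iff both bisections agree
        if bisect.bisect_right(events, u) == bisect.bisect_right(events, u - eta):
            hits += 1
    return hits / n_probe

for a in (3.0, 1.0, 0.4):   # regular -> Poisson -> bursty
    print(round(weibull_sigma(a), 2), round(frac_state_zero(a), 2))
```

For the exponential case (shape a = 1, σ_τ = 1) the estimate sits near e⁻¹ ≈ 0.37, and the frozen fraction grows toward one as σ_τ increases, mirroring the effective sparsification of the annealed regime.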
For example, increasing φ can cause slower diffusion in the quenched limit [Fig. 1(b)]. The annealed (noise-driven) regime is effectively recovered when η is vanishingly small, meaning almost all edges are in state j = 0 and the role of the network in information diffusion vanishes (ρ_f = t_f = 1). The correspondence between σ_τ and η suggests data-driven experiments that allow an indirect inference of the effects of varying σ_τ in real systems, an open problem in the study of information diffusion. We simulate the RT model on two empirical temporal networks and vary only the memory η, recovering qualitatively the effects observed on synthetic networks [Fig. 1(c-d), see Methods for data description]. This suggests the accelerative and decelerative effects of burstiness may well be a feature of real-world information diffusion.

We systematically explore the RT, AT, and SI models with Monte Carlo simulations over (σ_τ, η)-space (Fig. 2). The underlying degree distribution is lognormal with ⟨k⟩ = 7 and σ_k = 0.5, and interevent times are Weibull-distributed with ⟨τ⟩ = 1. Our aim is to understand how the temporal connectivity evolves over (σ_τ, η)-space. As previously observed, the quenched regime appears either in the small σ_τ limit for constant (but sufficiently large) η, or in the large η limit for constant σ_τ. The temporal network enters the annealed regime in two ways, either by taking the small η limit for constant σ_τ, or the large σ_τ limit for constant η. The two regimes are separated by a percolation transition, i.e., the emergence of a giant connected component in the subgraph formed by edges in state j > 0. The transition is governed by the quantities

  ξ_E = ∫_η^∞ Ψ(τ) dτ   (2)

and

  ξ_µ = ∫_η^∞ ψ(τ) dτ / ∫_η^∞ Ψ(τ) dτ,   (3)

where ξ_E equals E_0, the density of edges in state zero, and ξ_µ equals µ_0, the probability that a randomly selected edge in state zero enters state one over an interval dt. We may refer to ξ_E as the effective sparsification, or alternatively, the effective annealing. Here Ψ(τ) is the complementary cumulative distribution relating to ψ(τ). We denote by q(k) the degree distribution obtained by randomly removing a fraction ξ_E of edges in a static configuration model network with degree distribution p(k), which is identical to the expected subgraph formed by removing state zero edges in the stochastic temporal network. The percolation transition for q(k) can be computed analytically (see Supplementary Note 3), and despite the static assumption, provides an excellent estimate of the boundary between quenched and annealed regimes (Fig. 2), indicating the onset of slow, noise-dependent diffusion for all diffusion dynamics considered.

FIG. 3. Relative density of noise-driven infections ρ_f as a function of interevent time standard deviation σ_τ for the AT (a-c) and the SI (d-f) models. Analytic solution is denoted by dashed lines, and Monte Carlo results by solid lines. (a) Effect of absolute threshold M_φ. (b) Dependence of ρ_f on choice of ψ(τ). (c) Data collapse of (b) after controlling for effective sparsity ξ_E. Left inset is a closeup of the main plot in linear scale, revealing differences in ρ_f remain after controlling for ξ_E. This is explained by the differing mixing rates ξ_µ (inset right). (d) Effect of the rate λ. (e-f) Similar results for the SI model. Cutoff node density is ρ_c = 0.4, and η = ⟨τ⟩ = 1. Degree distribution is lognormal with ⟨k⟩ = 7 and σ_k = 1. We use a threshold M_φ = 2 for the AT model, and external noise p = 2 × 10⁻⁴. Monte Carlo simulations involve 10 realisations on networks of size N = 10.

Even if the outcome of information diffusion (as measured by t_f) is qualitatively similar across diffusion models with respect to the features of the temporal network (σ_τ and η), we can identify differences due to node dynamics by measuring the values of σ_τ that produce a minimum diffusion time for given η (see dashed lines in Fig. 2). In the RT model, both quenched and annealed regimes produce relatively slow diffusion. Between the two regimes, the minimum diffusion time shifts to larger σ_τ for increasing memory η, and eventually exceeds σ_τ = 1, meaning burstiness is accelerative. As η increases, larger and larger fluctuations in τ are required to exit the quenched regime and enter the mixing phase where rapid diffusion occurs, a consequence of the central limit theorem [Fig. 2(a)]. Increasing σ_τ also produces an accelerative effect in the AT model [Fig. 2(b)]. In contrast to the RT model, burstiness is accelerative only for small values of memory η. Since in the AT model we do not normalise infectious influence by total influence, increasing η is always favourable to transmission, and quenching never slows down diffusion [see Fig. 3(a)]. In the SI model burstiness does not have an accelerative effect [Fig. 2(c) and Fig. 3(d)], since its infection rate is unbounded (as opposed to threshold models, as per Table I). The annealing effect of burstiness overwhelms the increased rate of local diffusion afforded by unbounded transmission rates, which increases proportionally to the size of the burst. As such, acceleration due to burstiness appears to be a hallmark of threshold mechanisms, whether relative or absolute.

Finally, we examine the effect of the choice of interevent time distribution ψ(τ) (Fig. 3). We measure the noise dependence ρ_f for lognormal, Weibull and gamma distributions, controlling for both ⟨τ⟩ and σ_τ. Consider
Considerfirst the AT model with M φ = 2 and η = (cid:104) τ (cid:105) = 1[Fig. 3(b)]. Here, we observe a striking dependence on ψ ( τ ), with the lognormal distribution leading to the mostrapid diffusion, outpacing the gamma distribution in dif-fusion speed and relative noise dependence by up to afactor of 83, and the Weibull distribution by up to afactor of 14. These differences can be accounted for bycomparing the rate of onset of annealing in terms of ξ E as we increase σ τ . The gamma distribution rapidly an-neals the network, yielding the largest ξ E values of allchoices of distribution, meaning the most edges in state j = 0. As a result, it exhibits the slowest, most noisereliant diffusion. In terms of the value of ξ E induced, thegamma is followed by the Weibull distribution, then thelognormal distribution. In fact, the lognormal requiresorder-of-magnitude larger σ τ to produce equal values of ξ E as the Weibull and gamma distributions. By plot-ting ρ f against ξ E we observe the data to collapse ap-proximately onto a single curve, revealing ξ E to be a farbetter predictor of dynamics than σ τ [see Fig. 3(c) in con-trast to Fig. 3(b)]. Some disagreement persists, however[Fig. 3(c), left inset], which can be explained by notingthat increased rates of mixing ξ µ [Fig. 3(c), right inset]ensure that the small number of active edges redistributeabout the network at a greater rate, thus mediating cas-cades more effectively. An identical effect is observed forthe SI model [Fig. 3(e-f)].The data collapse in Fig. 3(c) and (f) confirms thatabove all it is ξ E , the density of edges in state j = 0,that ultimately determines the diffusion dynamics in ourframework. It remains to determine why the value of ξ E is so sensitive to the choice of interevent time distribu-tion ψ , and in particular, what the properties are of agiven distribution ψ that most contribute to the value of ξ E , beyond its mean and standard deviation. 
We havefound two properties that correlate with our observationsin Fig. 3(b) and (e), at least qualitatively, as shown inSupplementary Fig. 9. They are the third raw moment, (cid:104) τ (cid:105) , which is closely related to skewness, and differen-tial entropy (as defined in Supplementary Note 8). Thesemeasures provide a rule of thumb such that, for instance,given two distributions ψ with equal mean and variance,it is the one with the greater skewness that produces thelowest ξ E , and the most rapid diffusion. Discussion . Our study shows that generic dynamicsof information diffusion are closely tied to the level of burstiness in the underlying temporal network. By con-sidering three binary-state models of transmission, wehave demonstrated that they differ in their response toburstiness only in their details. For instance, while hav-ing a purely decelerative effect on SI models, increasingburstiness at intermediate values can be accelerative forthreshold models. Nevertheless, the prevailing trend isthat increasing burstiness is strongly decelerative overall,with the onset of the decelerative phase heavily depen-dent on the choice of interevent time distribution. Thekey assumptions here are that the underlying network isfixed, and that due to a memory mechanism, a fractionof edges enter a non-interacting state due to long waitingtimes. These assumptions result in a temporal networktopology that has profound implications for many dy-namical processes. It is likely that structural featuresof the temporal network, such as the percolation transi-tion separating slow and fast diffusion, and the data col-lapse observed when controlling for the effective sparsity,will also be critical for the more general class of binary-state dynamics, including not only threshold models andmodels of disease, but language, voter, and Ising models,among others.Our master equation formalism can be extended to abroad class of temporal network models. 
In particular, any model that can be formulated in terms of discrete, dynamic edge states is a candidate for our approach. This includes growing, decaying and adaptive networks, as well as models of rewiring. In line with our use of renewal processes, a large family of point processes have natural descriptions in terms of discrete edge states, such as cascading Poisson and Cox processes. Extensions of the Poisson process in general suggest promising applications of our approach. In particular, our treatment of non-Markovianity could be applied to other systems. That is, while a single component in a large system may be strongly non-Markovian, as was the case for our renewal process, stationary statistics may emerge at the ensemble level that act as a signature of the non-Markovianity occurring microscopically. Our biased random walk interpretation of the renewal process model shows that strikingly similar Markovian counterparts may be available for analysis. Incidentally, biased random walk models of edge state suggest a broad class of Markovian models to which our master equation applies exactly. These may be extended, for example, to Lévy flights, and used as a probe of various complex systems where memory is critical.
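As a rough numerical illustration of the skewness rule of thumb above, the three interevent time distributions can be matched in mean and standard deviation and their third raw moments ⟨τ³⟩ compared directly. The sketch below is not taken from the paper's code; the example values ⟨τ⟩ = 1 and σ_τ = 2, and the helper names, are chosen purely for illustration. With these values the lognormal indeed has the largest ⟨τ³⟩ (i.e., is the most skewed), followed by the Weibull, then the gamma, matching the ordering of diffusion speeds reported above.

```python
import math

def gamma_m3(mean, sd):
    """Third raw moment of a gamma distribution matched to (mean, sd)."""
    a = (mean / sd) ** 2           # shape
    theta = sd ** 2 / mean         # scale
    return theta ** 3 * a * (a + 1) * (a + 2)

def lognormal_m3(mean, sd):
    """Third raw moment of a lognormal distribution matched to (mean, sd)."""
    s2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - s2 / 2.0
    return math.exp(3.0 * mu + 4.5 * s2)

def weibull_m3(mean, sd):
    """Third raw moment of a Weibull distribution matched to (mean, sd).
    The shape k is found by bisection on the squared coefficient of
    variation, which is decreasing in k."""
    target = (sd / mean) ** 2
    cv2 = lambda k: math.gamma(1 + 2 / k) / math.gamma(1 + 1 / k) ** 2 - 1
    lo, hi = 0.05, 20.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if cv2(mid) > target else (lo, mid)
    k = 0.5 * (lo + hi)
    lam = mean / math.gamma(1 + 1 / k)  # scale
    return lam ** 3 * math.gamma(1 + 3 / k)

# Match mean = 1 and standard deviation = 2 (illustrative values).
for name, f in [("gamma", gamma_m3), ("weibull", weibull_m3),
                ("lognormal", lognormal_m3)]:
    print(name, f(1.0, 2.0))
```

For these parameters the closed forms give ⟨τ³⟩ = 45 for the gamma and ⟨τ³⟩ = 125 for the lognormal, with the Weibull in between.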
Acknowledgements
S.U. acknowledges the Pôle Scientifique de Modélisation Numérique of ENS Lyon for their computing support, as well as L. Taulelle and V. Lefèvre for useful technical advice. G.I. acknowledges partial funding by the European Commission through the H2020 project HumanE AI under G.A. No. 761758. J.P.G. is supported by Science Foundation Ireland (grant numbers 16/IA/4470, 16/RC/3918, 12/RC/2289 P2 and 18/CRT/6049) with co-funding from the European Regional Development Fund. M.K. is supported by the DataRedux (ANR-19-CE46-0008) and SoSweet (ANR-15-CE38-0011) projects funded by ANR, and the SoBigData++ (H2020-871042) project.
Author contributions
S.U. derived the analytical solution, and performed all related calculations and numerical experiments. S.U., G.I., J.P.G., and M.K. designed the research and contributed to the manuscript.
Competing interests
The authors declare no competing interests.
Methods
Master equation configuration space. We provide here a minimal description of the master equation formalism, with a focus on class transition rates; a complete description is provided in Supplementary Note 1. We introduce C_{k,m}, the set of all nodes in the network with local configuration (k, m), such that 0 ≤ m ≤ k. Whereas C_{k,m} is a set of nodes, we define C_k as the set of all sets C_{k,m} with total degree k. This can be written C_k = {C_{k,m} | Σ_j k_j = k}. Then, we refer to the configuration space C as the set of all possible sets C_{k,m}. Given a degree distribution p_k, we define

C = {(k, m) | k ∈ supp p_k and 0 ≤ m ≤ k},    (4)

which partitions the network at any given time. Written this way, C is potentially infinite. To ensure that it be finite in numerical constructions, we assume an upper cutoff in the degree distribution p_k, and the set of edge states to be of a finite size n. Note that C includes any set for which C_{k,m} is empty at a given time. The cardinality |C| of configuration space is thus determined entirely by the support of p_k, along with n. Since (k, m) does not convey ego state, just edge and neighbour configuration, we partition C_{k,m} into sets of uninfected and infected nodes, such that C_{k,m} = S_{k,m} ∪ I_{k,m}. Similar definitions allow us to introduce S_k and I_k, for both the degree vector k and the total degree k, as well as S and I. Although in general |S_{k,m}| ≠ |I_{k,m}|, the structure of the uninfected and infected configuration spaces is identical, such that |C| = |S| = |I|, both for the full spaces and for the subsets of fixed degree vector k or total degree k.

The evolution of a dynamical process over a network amounts to a flow of nodes through the sets S_{k,m} and I_{k,m} over time. Since the number of nodes N in the network is conserved, it is their distribution over the sets S_{k,m} and I_{k,m} that evolves in time. These distributions provide the state of the network at time t.
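The configuration space defined above can be enumerated explicitly. The following is a minimal sketch (hypothetical helper name, not the authors' implementation), assuming a small degree support and n = 2 edge states; it also builds the lexicographic index used to order the tuples of C.

```python
from itertools import product

def configuration_space(degrees, n):
    """Enumerate the configuration space C for edge-state dimension n:
    all tuples (k, m) with sum(k) in `degrees` and 0 <= m_j <= k_j,
    indexed lexicographically."""
    C = []
    for k_total in degrees:
        # all degree vectors k with n non-negative entries summing to k_total
        for k in product(range(k_total + 1), repeat=n):
            if sum(k) != k_total:
                continue
            # all infected-neighbour vectors m with 0 <= m_j <= k_j
            for m in product(*(range(kj + 1) for kj in k)):
                C.append((k, m))
    C.sort()  # lexicographic ordering defines the mapping (k, m) -> i
    return {km: i for i, km in enumerate(C)}

# A node of total degree k = 2 with n = 2 edge states admits |C| = 10
# configurations, as quoted for the lattice in Supplementary Fig. 2.
index = configuration_space([2], 2)
print(len(index))  # 10
```

The same count for k = 4 and n = 2 gives |C| = 35, consistent with the rapid polynomial growth of the space discussed above.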
Since our formalism is independent of network size, we deal with densities of nodes rather than the absolute sizes of these sets. To this end we introduce

‖C_k‖ ≡ Σ_{C_{k,m} ∈ C_k} |C_{k,m}|    (5)

as shorthand for the number of nodes with underlying degree k. This is in contrast to |C_k| and |C_k|, which give the number of configurations with degree vector k and total degree k, respectively. To convert from absolute node counts to densities of nodes, we need to normalise S_{k,m} and I_{k,m} by some non-zero quantity that is conserved over the course of a dynamical process. Since our temporal network models assume a static underlying network, a node's underlying degree k is preserved, and as a result, so is ‖C_k‖, defined in Eq. (5). The density of uninfected nodes in class (k, m) in this case is given by

s_{k,m} = |S_{k,m}| / ‖C_k‖,    (6)

with i_{k,m} defined analogously. The node conservation principle leads to the condition Σ_{C_k} (s_{k,m} + i_{k,m}) = 1, which is to say that the sum of all densities s_{k,m} and i_{k,m} with underlying degree k is one. We then have

ρ_k = 1 − Σ_{C_k} s_{k,m}    (7)

and

ρ = Σ_k p_k ρ_k,    (8)

where the sum in the first expression is over all configurations (k, m) that satisfy Σ_j k_j = k. The term ρ_k gives the probability that a randomly selected node with underlying degree k will be infected, and ρ the probability that any randomly selected node will be infected.

As discussed in the main text, s is the |C|-dimensional vector storing the densities s_{k,m}. In practice, we use lexicographic ordering of the tuples in C to define a one-to-one mapping (k, m) ↦ i, for some i ∈ {1, . . . , |C|}, defining the i-th element s_i of s. Finally, it is possible to show that for fixed n and limiting k, the size of C behaves like Θ(k^{2n}). Now that we have defined the space of allowed configurations, we turn to its dynamics.

Master equation transition rates.
Ego transitions occur at rates F_{k,m}, and involve the flow of nodes from set S_{k,m} to I_{k,m}. As such, no change to the ego's local neighbourhood (k, m) takes place, and the transition represents a type of self-edge, or loop, in the lattice representation of configuration space, illustrated in Supplementary Note 1. The rates F_{k,m} are encoded in transmission functions such as those shown in Table I. Flux measurements of these transitions, such as those in Supplementary Note 7, are expected to be exact, and are an important benchmark for the verification of experiments. The rates F_{k,m} are contained in the matrix W_ego.

Neighbour transitions are based on the probability β_j dt that an uninfected neighbour of an uninfected node becomes infected over an interval dt. To calculate β_j we use a straightforward ensemble average over S. To obtain the expected fraction of neighbours undergoing transitions, we observe the number of nodes undergoing ego transitions at time t, and count the number of neighbour transitions produced as a result. That is, when an uninfected node in class (k, m) becomes infected, which occurs with probability F_{k,m} dt, it has k_j − m_j uninfected neighbours that observe this transition, or k_j − m_j nodes undergoing neighbour transitions. The number of such edges across the entire network is given by Σ_S p_k (k_j − m_j) F_{k,m} s_{k,m}, where the sum is over all uninfected classes. We compare this to the total number of uninfected-uninfected edges, Σ_S p_k (k_j − m_j) s_{k,m}, giving the neighbour transition rate

β_j dt = [Σ_S p_k (k_j − m_j) F_{k,m} s_{k,m} / Σ_S p_k (k_j − m_j) s_{k,m}] dt,    (9)

which has previously been used in master equation solutions of binary-state dynamics on static networks.
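The ensemble average in Eq. (9) is a weighted ratio over the uninfected classes, which is simple to compute once the class densities are in hand. The sketch below uses a scalar-degree simplification (n = 1, so k and m are integers) and entirely illustrative values of p_k, s_{k,m} and F_{k,m}; it is a toy instance of Eq. (9), not the paper's implementation.

```python
# Hypothetical toy state: three uninfected classes keyed by (k, m),
# the degree and infected-neighbour count, for n = 1 edge state.
p_k = {2: 0.5, 3: 0.5}                       # degree distribution (illustrative)
s = {(2, 0): 0.6, (2, 1): 0.3, (3, 1): 0.7}  # densities s_{k,m} (illustrative)
F = {(2, 0): 0.0, (2, 1): 0.8, (3, 1): 0.2}  # infection rates F_{k,m} (illustrative)

def beta(p_k, s, F):
    """Neighbour transition rate of Eq. (9): the rate at which an uninfected
    neighbour of an uninfected node becomes infected, as a ratio of the flux
    along uninfected-uninfected edges to their total density."""
    num = sum(p_k[k] * (k - m) * F[(k, m)] * s[(k, m)] for (k, m) in s)
    den = sum(p_k[k] * (k - m) * s[(k, m)] for (k, m) in s)
    return num / den

print(beta(p_k, s, F))
```

Because β_j is a convex combination of the rates F_{k,m}, it always lies between the smallest and largest infection rate present among the uninfected classes.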
The rates β_j are contained in the matrix W_neigh, weighted by the values k_j and m_j of the relevant classes (k, m), as detailed in Supplementary Note 1.

Edge transitions occur at rates μ_j and ν_j, which give the probabilities of edges in state j transitioning to state j + 1 or j − 1, respectively, over an interval dt. Their values depend on the temporal network model in question. In this work, edge transition rates are determined by renewal processes following interevent time distributions ψ(τ), with complementary cumulative distribution Ψ. If the state of an edge is determined by the number of events j having occurred in the preceding time window of duration η due to a renewal process, the edge transition rates are

μ_j dt = [(Ψ ∗ ψ^{∗j}) / (Ψ ∗ ψ^{∗(j−1)} ∗ Ψ)] dt    (10)

and

ν_j dt = [(Ψ ∗ ψ^{∗(j−1)}) / (Ψ ∗ ψ^{∗(j−1)} ∗ Ψ)] dt,    (11)

with

E_j = Ψ ∗ ψ^{∗(j−1)} ∗ Ψ    (12)

giving the probability that a randomly selected edge is in state j; each convolution is evaluated at the window length η. It is this quantity that provides the normalising constant for the rates μ_j and ν_j. Here, ψ^{∗j} is the j-th convolution power of ψ. A complete derivation is given in Supplementary Note 1. The Gaver-Stehfest algorithm is used to compute the inverse Laplace transforms, and an efficient numerical procedure reducing μ_j and ν_j to a matrix-vector product is developed in Supplementary Note 5. These expressions hold for j > 0, with Eqs. (2) and (3) in the main text giving the special case j = 0 for E_j and μ_j, respectively. Regardless of the form of ψ, the mean edge state η/⟨τ⟩ is always conserved at the network-wide level. Applying Eqs. (10) and (11) at the level of class transitions amounts to a mean field approximation, since flux measurements of Monte Carlo simulation show edge transition rates to deviate slightly from μ_j and ν_j at the class level (k, m), even if exact for the network as a whole, as shown in Supplementary Note 7.

Simulation. We simulate networks G = (V, E) composed of a node set V of size N, and an underlying edge set E. The edge set is produced by a desired degree distribution, wired according to the configuration model. Overlying temporal network activity is initialised to the steady state, such that at time t = 0, the time to the first event follows exactly the residual distribution Ψ, in the limit of large networks. Specifically, we set the time to t = −η, and draw |E| residual times from Ψ, one for each edge. Subsequent interevent times are drawn from ψ. Advancing in time from −η ensures that a stationary distribution of edge states E_j is achieved exactly at t = 0, when we begin to allow node dynamics to evolve. Due to the large values of interevent time standard deviation studied in this work, out-of-the-box sampling routines were either inefficient or broke down for large σ_τ. As such, we develop a simple yet efficient routine in Supplementary Note 4, based on approximate inverse transform sampling of ψ and Ψ using a bisection method. This is performed on a numerical grid of Ψ values, with the relevant details of the probability distributions outlined in Supplementary Note 9.
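A stripped-down version of such inverse transform sampling can be sketched as follows: rather than the precomputed grid with spline interpolation used in the paper, this toy solves Ψ(τ) = u directly by bisection, and the Weibull form of Ψ is chosen only because its inverse is also available in closed form, so the result can be checked. All parameter values are illustrative.

```python
import math
import random

# Survival function Psi(tau) = P(interevent time > tau) for a Weibull
# distribution with shape K and scale LAM (illustrative values).
K, LAM = 0.5, 1.0
def Psi(tau):
    return math.exp(-(tau / LAM) ** K)

def sample_interevent(u, tol=1e-12):
    """Draw an interevent time by solving Psi(tau) = u with bisection,
    where u is uniform on (0, 1). Psi is decreasing, so the root is unique."""
    lo, hi = 0.0, 1.0
    while Psi(hi) > u:          # grow the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol * max(hi, 1.0):
        mid = 0.5 * (lo + hi)
        if Psi(mid) > u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

random.seed(42)
u = random.random()
tau = sample_interevent(u)
exact = LAM * (-math.log(u)) ** (1.0 / K)  # closed-form inverse for Weibull
assert abs(tau - exact) / exact < 1e-6
```

For heavy-tailed ψ with very large σ_τ, the appeal of this approach is that it only ever evaluates Ψ, which is numerically stable even where naive samplers fail.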
A third-order spline interpolation on a logarithmic scale provides intermediate values of the grid, such that the resulting sampled distribution is close to exact.

Node dynamics are implemented via a Gillespie algorithm, which uses the fact that the waiting time to infection of an uninfected node in class (k, m) follows an exponential distribution with mean 1/F_{k,m}. Initially all nodes are in the uninfected state, and the diffusion process is triggered by low-level background noise at rate p. To simulate the temporal network itself, a time-ordered sequence of edge events is implemented in parallel with the node update sequence. This amounts to two separate time-ordered sequences of events executed simultaneously. The algorithms are described in detail in Supplementary Note 4, with pseudocode.

We use the normalised density of noise-induced infections, ρ_f, and the normalised diffusion time, t_f, as measures of the diffusion process. We define these quantities as follows. The probability that a randomly selected node has been infected as a result of external noise is

ρ̃(t) = p ∫_0^t (1 − ρ(τ)) dτ,    (13)

meaning 0 < ρ̃ ≤ ρ. We define ρ_f as the fraction of infections that are due to noise, ρ_f = ρ̃/ρ, such that 0 < ρ_f ≤ 1. This value cannot equal zero, since there must be at least one noise-induced infection, namely the first infection in the diffusion process. A value approaching ρ_f = 1 means almost all infection is due to external noise. This occurs in the annealed limit, when almost all edges are in state j = 0, and network interactions play a vanishingly small role in the diffusion process. As a consequence, the time evolution of the diffusion process is governed by

ρ̇ = p (1 − ρ),    (14)

whose solution ρ = 1 − e^{−pt} can be inverted to give the time required to achieve a given density ρ of infections relying solely on noise, that is,

t = −ln(1 − ρ)/p.    (15)

If t_c is the time required in the general case to reach a cutoff density of infections ρ_c, normalising t_c by Eq. (15) evaluated at ρ_c defines t_f, such that 0 < t_f ≤ 1. A value of t_f = 1 means the system is driven entirely by noise, and a value approaching 0 a rapid diffusion process. An important feature of this work is that t_f and ρ_f appear to be interchangeable, as per the inset of Fig. 1(a); any result shown in terms of ρ_f produces an identical picture in t_f.

Data description. In this work we use two empirical temporal networks used by [48] and references therein, which we describe below. To simulate diffusion processes on these networks we use periodic boundary conditions, starting at a randomly selected point in time.

The first dataset is a temporal network of email exchange [48, 49], extracted from the log files of a university email server. The sender, recipient and timestamp are used to form the network. The dataset consists of N = 3188 nodes and |E| = 31857 underlying edges, such that the average degree is 19.99. A total of 308730 events were recorded, with a resolution of one second, over a period of 81 days. An average of 9.691 events occur per edge. We determine the interevent time distribution by taking the subset of edges observing more than one event, of which there are 21199. The mean interevent time is then calculated to be ⟨τ⟩ = 3.125 days, with standard deviation σ_τ = 6.620 days. This yields a coefficient of variation σ_τ/⟨τ⟩ = 2.118.

The second dataset [48, 50] consists of N = 7083 nodes and |E| = 138144 underlying edges, such that the average degree is 39.01. A total of 1428493 events were recorded, with a resolution of one second, over a period of 3133 days. An average of 10.34 events occur per edge. We determine the interevent time distribution by taking the subset of edges observing more than one event, of which there are 70902. The mean interevent time is then calculated to be ⟨τ⟩ = 16.60 days, with standard deviation σ_τ = 76.53 days. This yields a coefficient of variation σ_τ/⟨τ⟩ = 4.610.

[1] Holme, P. & Saramäki, J. Temporal networks. Phys. Rep., 97–125 (2012).
[2] Masuda, N. & Lambiotte, R. A Guide to Temporal Networks (World Scientific, 2016).
[3] Holme, P. Modern temporal network theory: a colloquium. Eur. Phys. J. B, 234 (2015).
[4] Barabási, A.-L. The origin of bursts and heavy tails in human dynamics. Nature, 207 (2005).
[5] Goh, K.-I. & Barabási, A.-L. Burstiness and memory in complex systems. EPL, 48002 (2008).
[6] Karsai, M., Kaski, K., Barabási, A.-L. & Kertész, J. Universal features of correlated bursty behaviour. Sci. Rep., 1–7 (2012).
[7] Davidsen, J. & Kwiatek, G. Earthquake interevent time distribution for induced micro-, nano-, and picoseismicity. Phys. Rev. Lett., 068501 (2013).
[8] de Arcangelis, L., Godano, C., Lippiello, E. & Nicodemi, M. Universality in solar flare and earthquake occurrence. Phys. Rev. Lett., 051102 (2006).
[9] Turnbull, L., Dian, E. & Gross, G. The string method of burst identification in neuronal spike trains. J. Neurosci. Methods, 23–35 (2005).
[10] Karsai, M., Jo, H.-H. & Kaski, K. Bursty Human Dynamics (Springer, 2018).
[11] Karsai, M. et al. Small but slow world: how network topology and burstiness slow down spreading. Phys. Rev. E, 025102 (2011).
[12] Lambiotte, R., Tabourier, L. & Delvenne, J.-C. Burstiness and spreading on temporal networks. Eur. Phys. J. B, 320 (2013).
[13] Jo, H.-H., Perotti, J. I., Kaski, K. & Kertész, J. Analytically solvable model of spreading dynamics with non-Poissonian processes. Phys. Rev. X, 011041 (2014).
[14] Horváth, D. X. & Kertész, J. Spreading dynamics on networks: the role of burstiness, topology and non-stationarity. New J. Phys., 073037 (2014).
[15] Williams, O. E., Lillo, F. & Latora, V. Effects of memory on spreading processes in non-Markovian temporal networks. New J. Phys., 043028 (2019).
[16] Vazquez, A., Rácz, B., Lukács, A. & Barabási, A.-L. Impact of non-Poissonian activity patterns on spreading processes. Phys. Rev. Lett., 158702 (2007).
[17] Mancastroppa, M., Vezzani, A., Muñoz, M. A. & Burioni, R. Burstiness in activity-driven networks and the epidemic threshold. J. Stat. Mech.: Theory Exp., 053502 (2019).
[18] Pastor-Satorras, R., Castellano, C., Van Mieghem, P. & Vespignani, A. Epidemic processes in complex networks. Rev. Mod. Phys., 925–979 (2015).
[19] Vespignani, A. et al. Modelling COVID-19. Nat. Rev. Phys.
[20] Phys. Rev. Lett., 128301 (2017).
[21] Liu, S., Perra, N., Karsai, M. & Vespignani, A. Controlling contagion processes in activity driven networks. Phys. Rev. Lett., 118702 (2014).
[22] Masuda, N. & Holme, P. Temporal Network Epidemiology (Springer, 2017).
[23] Masuda, N. & Holme, P. Small inter-event times govern epidemic spreading on networks. Phys. Rev. Res., 023163 (2020).
[24] Miritello, G., Moro, E. & Lara, R. Dynamical strength of social ties in information spreading. Phys. Rev. E, 045102 (2011).
[25] Hiraoka, T. & Jo, H.-H. Correlated bursts in temporal networks slow down spreading. Sci. Rep., 1–12 (2018).
[26] Min, B., Goh, K.-I. & Vazquez, A. Spreading dynamics following bursty human activity patterns. Phys. Rev. E, 036102 (2011).
[27] Rocha, L. E., Liljeros, F. & Holme, P. Simulated epidemics in an empirical spatiotemporal network of 50,185 sexual contacts. PLoS Comput. Biol., e1001109 (2011).
[28] Granovetter, M. Threshold models of collective behavior. Am. J. Sociol., 1420–1443 (1978).
[29] Karsai, M., Iñiguez, G., Kikas, R., Kaski, K. & Kertész, J. Local cascades induced global contagion: how heterogeneous thresholds, exogenous effects, and unconcerned behaviour govern online adoption spreading. Sci. Rep., 27178 (2016).
[30] Watts, D. J. A simple model of global cascades on random networks. Proc. Natl. Acad. Sci. U.S.A., 5766–5771 (2002).
[31] Gleeson, J. P. Cascades on correlated and modular random networks. Phys. Rev. E, 046117 (2008).
[32] Unicomb, S., Iñiguez, G. & Karsai, M. Threshold driven contagion on weighted networks. Sci. Rep., 3094 (2018).
[33] Unicomb, S., Iñiguez, G., Kertész, J. & Karsai, M. Reentrant phase transitions in threshold driven contagion on multiplex networks. Phys. Rev. E, 040301 (2019).
[34] Karimi, F. & Holme, P. Threshold model of cascades in empirical temporal networks. Physica A, 3476–3483 (2013).
[35] Karimi, F. & Holme, P. A temporal network version of Watts's cascade model. In Holme, P. & Saramäki, J. (eds.) Temporal Networks, 315–329 (Springer Berlin Heidelberg, 2013).
[36] Takaguchi, T., Masuda, N. & Holme, P. Bursty communication patterns facilitate spreading in a threshold-based epidemic dynamics. PLoS One (2013).
[37] Backlund, V.-P., Saramäki, J. & Pan, R. K. Effects of temporal correlations on cascades: threshold models on temporal networks. Phys. Rev. E, 062815 (2014).
[38] Vázquez, A. et al. Modeling bursts and heavy tails in human dynamics. Phys. Rev. E, 036127 (2006).
[39] Whitt, W. Approximating a point process by a renewal process, I: two basic methods. Oper. Res., 125–147 (1982).
[40] Centola, D. & Macy, M. Complex contagions and the weakness of long ties. Am. J. Sociol., 702–734 (2007).
[41] Valdano, E., Ferreri, L., Poletto, C. & Colizza, V. Analytical computation of the epidemic threshold on temporal networks. Phys. Rev. X, 021005 (2015).
[42] Yağan, O. & Gligor, V. Analysis of complex contagions in random multiplex networks. Phys. Rev. E, 036103 (2012).
[43] Gerstner, W., Kistler, W. M., Naud, R. & Paninski, L. Neuronal Dynamics: from Single Neurons to Networks and Models of Cognition (Cambridge University Press, 2014).
[44] Iyer, R., Menon, V., Buice, M., Koch, C. & Mihalas, S. The influence of synaptic weight distribution on neuronal population dynamics. PLoS Comput. Biol., e1003248 (2013).
[45] Gleeson, J. P. High-accuracy approximation of binary-state dynamics on networks. Phys. Rev. Lett., 068701 (2011).
[46] Jiang, Z.-Q. et al. Calling patterns in human communication dynamics. Proc. Natl. Acad. Sci. U.S.A., 1600–1605 (2013).
[47] Sorribes, A., Armendariz, B. G., Lopez-Pigozzi, D., Murga, C. & de Polavieja, G. G. The origin of behavioral bursts in decision-making circuitry. PLoS Comput. Biol. (2011).
[48] Saramäki, J. & Holme, P. Exploring temporal networks with greedy walks. Eur. Phys. J. B, 334 (2015).
[49] Eckmann, J.-P., Moses, E. & Sergi, D. Entropy of dialogues creates coherent structures in e-mail traffic. Proc. Natl. Acad. Sci. U.S.A., 14333–14337 (2004).
[50] Karimi, F., Ramenzoni, V. C. & Holme, P. Structural differences between open and direct communication in an online community. Physica A, 263–273 (2014).
[51] Abate, J. & Whitt, W. A unified framework for numerically inverting Laplace transforms. INFORMS Journal on Computing, 408–421 (2006).

Contents
Supplementary Note 1. Master equation solution
Supplementary Note 2. Random walk equivalence
Supplementary Note 3. Edge-state distribution
Supplementary Note 4. Monte Carlo simulation
Supplementary Note 5. Laplace transform inversion
Supplementary Note 6. Diffusion speed and noise
Supplementary Note 7. Mean field approximation
Supplementary Note 8. Skewness and entropy
Supplementary Note 9. Probability distributions
Supplementary Note 1. Master equation solution
In this Supplementary Note we detail the master equation solution used to solve for binary-state dynamics on our temporal network model. The general approach is to assign one of a finite number of types to each edge, and to allow this quantity to evolve over time. To formulate a master equation solution, one first defines a state space of allowed node configurations, which we term configuration space. The second step is to define the allowed transitions between node configurations. The time evolution of a probability density over this state space amounts to a set of first-order differential equations, or rate equations, that, among other things, provides the total density of infected nodes at a given time.
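The overall structure just described, a density over a finite state space evolved by linear rate equations, can be sketched in miniature. The toy below (a generic three-state system with illustrative rates, not the model's actual transition matrix) integrates d/dt x = W x with forward Euler; since each column of W sums to zero, the total density is conserved, mirroring the node conservation used throughout this note.

```python
# Toy rate equations d/dt x = W x over a three-state space. Columns of W
# sum to zero, so total density is conserved (rates purely illustrative).
W = [[-0.5,  0.2,  0.0],
     [ 0.5, -0.3,  0.1],
     [ 0.0,  0.1, -0.1]]

def euler_step(x, W, dt):
    """One forward Euler step of the linear rate equations."""
    n = len(x)
    return [x[i] + dt * sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]

x = [1.0, 0.0, 0.0]      # all density starts in the first state
for _ in range(20000):   # integrate to t = 20
    x = euler_step(x, W, 1e-3)

assert abs(sum(x) - 1.0) < 1e-9   # conservation of total density
```

In the full formalism the role of x is played by the |C|-dimensional density vector s, and W itself depends on s and t, but the conservation check above remains a useful sanity test of any implementation.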
Configuration space
As discussed in the main text, a network can be partitioned by the configurations (k, m), where each node is assigned exactly one configuration at any point in time. As a reminder, k and m are n-dimensional vectors storing k_j and m_j, the number of neighbours along edges of type j, and the number of infected neighbours along edges of type j, respectively. As a consequence, we have 0 ≤ m_j ≤ k_j, with 1 ≤ j ≤ n. Now consider a network of size N, following a degree distribution p_k, where k = Σ_j k_j is the total degree, in a system allowing a maximum of n edge states. We introduce C_{k,m}, the set of all nodes in the network with local configuration (k, m). Whereas C_{k,m} is a set of nodes, we introduce C_k to define the set of all sets C_{k,m} with total degree k, that is, C_k = {C_{k,m} | Σ_j k_j = k}. Finally, C is the set of all possible sets C_{k,m}. Provided a distribution of total degrees p_k, and edge dimension n, we define

C = {(k, m) | k ∈ supp(p_k) and 0 ≤ m ≤ k},    (1)

which partitions the network at any given time. This includes sets for which C_{k,m} = ∅ at a given time. The cardinality of this universal set, |C|, is determined by the support of p_k, in addition to n. Since (k, m) does not convey ego state, just edge and neighbour configuration, we partition C_{k,m} into uninfected and infected nodes, such that C_{k,m} = S_{k,m} ∪ I_{k,m}. Similar definitions allow us to introduce S_k and I_k, for both the degree vector k and the total degree k, as well as S and I. Although in general |S_{k,m}| ≠ |I_{k,m}|, the structure of the uninfected and infected configuration spaces is identical, such that |C| = |S| = |I|, both for the full spaces and for the subsets of fixed degree vector k or total degree k.

The evolution of a dynamical process over a network amounts to a flow of nodes through the sets S_{k,m} and I_{k,m} over time.
Since the number of nodes N in the network is conserved, it is just their distribution over the sets S_{k,m} and I_{k,m} that evolves in time. These distributions provide the state of the network at time t. Since our formalism is independent of network size, we deal with the densities of nodes rather than the absolute sizes of these sets. As such, we define

‖C_k‖ ≡ Σ_{C_{k,m} ∈ C_k} |C_{k,m}|,    (2)

which gives the number of nodes with total degree k. This is in contrast to |C_k| and |C_k|, which give the number of configurations with degree vector k and total degree k. To convert from absolute node counts to densities of nodes, we need to normalise S_{k,m} and I_{k,m} by some non-zero quantity that is conserved over the course of a dynamical process. For the temporal network models in question, the desired quantity is ‖C_k‖, defined above. The density of uninfected nodes in class (k, m) in this case is given by

s_{k,m} = |S_{k,m}| / ‖C_k‖,    (3)

with i_{k,m} defined analogously. In the case of temporal networks, the node conservation principle leads to the normalisation condition Σ_{k,m|k} (s_{k,m} + i_{k,m}) = 1, for a given k class. We then have

ρ_k = 1 − Σ_{k,m|k} s_{k,m}    (4)

and

ρ = Σ_k p_k ρ_k,    (5)

where the sum in the first expression is over all configurations (k, m) that satisfy Σ_j k_j = k. The time-dependent term ρ_k gives the probability that a randomly selected node with total degree k will be infected, and ρ the probability that any randomly selected node will be infected. Finally, we define s as the |C|-dimensional vector storing the densities s_{k,m}. In practice, we use lexicographic ordering of the tuples in C to define a one-to-one mapping (k, m) ↦ i, for some i ∈ {1, . . . , |C|}, defining the i-th element s_i of s. It is also possible to show that for fixed n and limiting k, the size of C behaves like Θ(k^{2n}). Now that we have defined the space of allowed configurations, we turn to its dynamics.

Supplementary Figure 1. Lattice diagram of temporal network configuration space. Temporal network configuration space for a node of total degree k = 4, with n = 2 allowed edge states. Right-diagonal transitions indicate neighbour infection, and left-diagonals the increments and decrements between edge states j = 0 and 1 that may occur in models of temporal networks. Nodes in this lattice are labelled by their infected degree vector m^T, with the corresponding k shown. Right-diagonal transitions preserve the degree vector k, indicated at the bottom of these diagonals. Left-diagonal transitions preserve the total infected neighbour count m, shown at the top of these diagonals.

Configuration transitions
As outlined in the preceding section, the state of the system at time t is given by the |C|-dimensional vector s(t). After providing an initial condition s(0), the evolution of the system can be approximated with the matrix W(s, t), such that

(d/dt) s = W(s, t) s = (W_ego + W_neigh + W_edge) s,    (6)

where W can be decomposed into separate |C|-dimensional square matrices corresponding to flows driven by ego, neighbour, and edge transitions, respectively. We outline these transitions in the following paragraphs, and assume that besides the transitions specified, the entries of W_ego, W_neigh and W_edge are zero. Although W is sparse, and the numerical implementation is ultimately in the form of dictionaries, we prefer the matrix form for exposition.

Ego transitions. In a non-recovery node dynamics, ego transitions of a node in class (k, m) involve an uninfected node becoming infected, thereby exiting class S and entering class I. These transitions drive the actual node dynamics that overlie the temporal network substrate. The transitions between these two classes are defined by infection and recovery rates F_{k,m} and R_{k,m} that depend on the dynamical model of interest. Examples of F_{k,m}, the relative and absolute threshold rules, are given in the main text. We set R_{k,m} = 0, as we are interested in non-recovery dynamics here, where infected nodes cannot reenter the uninfected state, although this is straightforward to generalise. We assume that node transitions occur homogeneously in time regardless of the underlying dynamics. In other words, a node in the uninfected state becomes infected over an interval [t, t + dt] with probability F_{k,m} dt. If W(A, B) is the rate of transition from class A to B, ego transitions in a non-recovery system are given by

W(S_{k,m}, I_{k,m}) = F_{k,m}.    (7)

Since a node's egocentric network (k, m) does not change during such a transition, the off-diagonal terms of W_ego are zero, with the i-th diagonal term being −F_i, ego transitions being a net loss to the S set. In contrast, the following transitions correspond to off-diagonal matrices, since nodes undergoing these transitions remain in the S class, and are compensated for elsewhere in W. These transitions appear as a type of self-loop in lattice diagrams of configuration space, Supplementary Figs. 1 and 2. Further, flux measurements of these transitions in Monte Carlo simulation ought to be exact, in the limit of large networks, and therefore act as a useful benchmark in transition rate studies. See Supplementary Fig. 8(a) for an illustration.

Supplementary Figure 2. Comparison of static and temporal configuration space. Allowed transitions in static networks, (a), and temporal networks, (b), for a node of degree k = 2 with an n = 2 level edge-state set. See the caption of Supplementary Fig. 1 for interpretation. Note that the number of configurations (k, m) is identical in each case, |C| = 10; it is just the number of allowed transitions that differs.

Neighbour transitions. Neighbour transitions refer to the change in a node's class due to the change in state of one of its neighbours, reflected in the value of the partial degree vector m. We distinguish neighbour transitions by the type j of the corresponding edge. The rates at which nodes leave the class (k, m) due to neighbour infection are given by

W(S_{k,m}, S_{k,m+e_j}) = β_j (k_j − m_j).    (8)

The coefficient β_j gives the rate at which uninfected neighbours of uninfected nodes become infected. This quantity is derived below. Influx to (k, m) from the class (k, m − e_j) due to the same mechanism is

W(S_{k,m−e_j}, S_{k,m}) = β_j (k_j − m_j + 1).    (9)

To calculate β_j we use a straightforward ensemble average, or mean-field approximation, over the set of all uninfected nodes. To obtain the expected fraction of neighbours undergoing transitions, we observe the number of egos undergoing transitions at time t, and count the number of neighbour transitions thus produced. That is, when an uninfected node in class (k, m) becomes infected, which occurs with probability F_{k,m} dt, it produces k_j − m_j uninfected nodes that observe neighbour transitions. The number of such edges across the entire network is given by Σ_S p_k (k_j − m_j) F_{k,m} s_{k,m}, where the sum is over all uninfected classes.
We compare this to the total number of uninfected-uninfected edges, Σ_S p_k (k_j − m_j) s_{k,m}, giving the neighbour transition rate

β_j dt = [Σ_S p_k (k_j − m_j) F_{k,m} s_{k,m} / Σ_S p_k (k_j − m_j) s_{k,m}] dt, (10)

which has previously been used in master equation solutions of binary-state dynamics on static networks; see the main text for references. At this point in the derivation, we could stop and write W = W_ego + W_neigh in order to recover the static network transition matrix. These transitions appear as the right diagonals in lattice diagrams of configuration space, Supplementary Figs. 1 and 2, where they preserve the degree vector k. Further, flux measurements of these transitions in Monte Carlo simulation appear to show that Supplementary Eq. (10) is exact in the limit of large temporal networks. See Supplementary Fig. 8(b) and (c) for an illustration.

Positive edge transitions. The temporal nature of the underlying network is implemented using changes to edge states in the network. A positive edge transition refers to the change in a node's configuration due to an increment in the state of one of its edges over an interval dt. Regardless of the interpretation of edge state and the mechanism driving the transition, the probability of this occurring on a randomly selected edge of type j is µ_j dt. In fact, we delay until the next section our discussion of models of temporal networks; for now it suffices to assume that they can be represented by networks with dynamic edge state. For a configuration (k, m), a positive edge transition on a j-type edge means losing an edge of that type, and gaining an edge of type j + 1. For brevity, we introduce the term ∆±_j = −e_j + e_{j±1}, corresponding to the change in the degree vector k imposed by such a transition. That is, an adjacent node loses a j-type edge, and gains a (j ± 1)-type edge.
The symmetry relations ∆⁺_j = −∆⁻_{j+1} and ∆⁻_j = −∆⁺_{j−1} clearly hold. The configuration that a (k, m) node enters when undergoing a positive transition on a j-type edge is (k + ∆⁺_j, m), or (k + ∆⁺_j, m + ∆⁺_j), depending on whether the neighbouring node was uninfected or infected, so that we have

W(S_{k,m}, S_{k+∆⁺_j, m}) = µ_j (k_j − m_j) (11)

and

W(S_{k,m}, S_{k+∆⁺_j, m+∆⁺_j}) = µ_j m_j, (12)

respectively. Similarly, nodes may enter the configuration (k, m) through a positive transition on a j-type edge, from the classes (k − ∆⁺_j, m) and (k − ∆⁺_j, m − ∆⁺_j). We have

W(S_{k−∆⁺_j, m}, S_{k,m}) = µ_j (k_j − m_j + 1) (13)

and

W(S_{k−∆⁺_j, m−∆⁺_j}, S_{k,m}) = µ_j (m_j + 1), (14)

if the neighbour is uninfected or infected, respectively. Combining these terms gives the flow through the configuration (k, m) due to positive transitions on j-type edges. Note that we typically impose boundary conditions: if not because sharp cutoffs arise naturally in many temporal network models, then because we require the configuration space to remain finite. If n remains the number of allowed edge states, we impose the condition that one cannot observe a positive edge transition on an n-type edge. As such, a node cannot lose an n-type edge through a positive edge transition, and we write µ_n = 0.

Negative edge transitions. A negative edge transition refers to the change in a node's configuration when an adjacent event is forgotten over an interval dt, causing a decrease in the number of memorable events on that edge. As defined above, this occurs with probability ν_j dt. When an event terminates on an edge of type j, the adjacent node gains an edge of type j −
1, and loses an edge of type j, preserving the total degree k. The relations between node classes due to negative edge transitions mirror their positive counterparts, and we include them here for completeness. A node in configuration (k, m) moves to class (k + ∆⁻_j, m) or (k + ∆⁻_j, m + ∆⁻_j) if an event on a j-type edge terminates while connected to an uninfected or infected neighbour, respectively. This occurs at rates

W(S_{k,m}, S_{k+∆⁻_j, m}) = ν_j (k_j − m_j) (15)

and

W(S_{k,m}, S_{k+∆⁻_j, m+∆⁻_j}) = ν_j m_j. (16)

Similarly, nodes may enter the configuration (k, m) with a negative edge transition on a j-type edge, from the classes (k − ∆⁻_j, m) and (k − ∆⁻_j, m − ∆⁻_j), as follows. If the event terminates between a node and an uninfected neighbour, we have

W(S_{k−∆⁻_j, m}, S_{k,m}) = ν_j (k_j − m_j + 1), (17)

and

W(S_{k−∆⁻_j, m−∆⁻_j}, S_{k,m}) = ν_j (m_j + 1) (18)

if the neighbour is infected. Combining these terms gives the flow through the configuration (k, m) due to negative edge transitions. Note the boundary condition: a j = 0 edge is the case where no events have taken place in the last η interval. As such, a node cannot lose a 0-type edge through a negative edge transition, and we reflect this by writing ν_0 = 0. These transitions appear as left diagonals in lattice diagrams of configuration space, Supplementary Figs. 1 and 2, and preserve the total number of infected neighbours m. Further, flux measurements of these transitions in Monte Carlo simulation show that constant µ_j and ν_j can be excellent approximations of non-Markovian systems. See Supplementary Fig. 8(d) to (f) for an illustration.
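As a quick consistency check on the bookkeeping above, the size of the configuration space can be enumerated directly. The sketch below is our own illustration (the function name is not from the paper's code): it lists all classes (k, m) for a node of given total degree with n edge states, and recovers the count |C| = 10 quoted in Supplementary Fig. 2 for a degree-2 node with n = 2.

```python
from itertools import product

def configuration_space(k_total, n):
    """All classes (k, m): k partitions the degree over n edge states,
    and m_j counts infected neighbours on j-type edges, 0 <= m_j <= k_j."""
    classes = []
    for k in product(range(k_total + 1), repeat=n):
        if sum(k) != k_total:
            continue
        for m in product(*(range(kj + 1) for kj in k)):
            classes.append((k, m))
    return classes

print(len(configuration_space(2, 2)))  # 10 classes, matching |C| = 10
```

Each degree composition k contributes Π_j (k_j + 1) classes, which makes the Θ(k^n) growth of the configuration space (discussed later in this supplement) explicit.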
Calculating µ_j and ν_j for renewal processes

In this section, we calculate the rates of positive and negative edge transitions µ_j and ν_j in the stochastic temporal network model discussed in the previous section. Here, µ_j dt gives the probability that at time t, for a renewal process having already produced j events in the preceding time window of duration η, a (j + 1)-th event is observed between time t and t + dt. Conversely, ν_j dt gives the probability of an event exiting the η window. This is illustrated in Supplementary Fig. 3. At the outset, describing such a process with constant rates µ_j and ν_j seems inappropriate, as it is memoryless only in so-called event space. In this representation, a renewal process is nothing other than a sequence of trials {τ_1, τ_2, . . .}, or the random sampling of a value τ from a distribution ψ(τ). In this sense, the process is memoryless. However, in the resulting time series for continuous t, the process is non-Markovian, as the time of the next event was decided at the time of the previous event, and the probability of an event occurring at a time between these points is zero. This is in contrast to a Poisson process, where the probability of a change in edge state due to the occurrence of an event is constant in time. Thus, on a microscopic level, where we observe a stochastic process on a single edge, a rate description is nonsensical.

We note, however, that in our master equation formalism, classes (k, m) really represent ensembles of nodes, and although constant rates cannot be identified on a microscopic level, useful quantities do exist on a network-wide macroscopic level. That is, the fraction of j-type edges that change state over an interval dt turns out to be strongly heterogeneous for varying j, which is clearly not the case for a Poisson process. The heterogeneity of the distribution ψ is reflected in the heterogeneity of µ, ν and E.

Supplementary Figure 3. Enumeration of edge state configurations. Configurations of j events occurring over an interval of length η, where η amounts to the memory of observed events. In plot (a) we enumerate all configurations of edges in state j at time t, with a (j + 1)-th event occurring over the interval t < t_{j+1} < t + dt. In plot (b) we enumerate all configurations of edges in state j. We use these to calculate the rate of positive edge transition µ_j, and the edge state distribution E_j, respectively. Interevent times τ_1, . . . , τ_j are drawn from ψ, whereas τ′ and τ′′ are drawn from Ψ, defined in the text.

The rate of positive edge transition µ_j dt is calculated by finding the probability of a (j + 1)-th event occurring over a given interval dt, on the condition that j events have already been produced in the preceding time interval of duration η. This is illustrated in Supplementary Fig. 3(a), where we set t = η for convenience. In Supplementary Fig. 3, the interevent times τ_1, . . . , τ_j are drawn from the distribution ψ, and the times τ′ and τ′′ from its complementary cumulative distribution Ψ, also known as the residual time distribution. It is defined as

Ψ(τ) = ∫_τ^∞ ψ(t) dt, (19)

and gives the probability that the time between events is of duration at least τ. We introduce the domain T of times spanned by the configurations allowed in Supplementary Fig. 3, that is, the ordered times t_1 < t_2 < . . . < t_j falling in an interval of duration η. Note that consecutive event times t_j and t_{j+1} cannot coincide, since interevent times are drawn from the distributions ψ and Ψ, which are defined over positive τ. We write

T = (0, η] × (0, η − t_1] × (0, η − t_2] × . . . × (0, η − t_{j−1}] ⊆ ℝ^j_+, (20)

where ℝ^j_+ is the j-dimensional space of positive real numbers. The probability of observing the configuration in Supplementary Fig.
3(a) is Ψ(τ′) ψ(τ_1) . . . ψ(τ_j), which is the same as Ψ(t_1) ψ(t_2 − t_1) . . . ψ(η − t_j), given that we have set the time t to η for simplicity. Similarly, the configuration in Supplementary Fig. 3(b) is observed with probability Ψ(τ′) ψ(τ_1) . . . ψ(τ_{j−1}) Ψ(τ′′), which is the same as Ψ(t_1) ψ(t_2 − t_1) . . . ψ(t_j − t_{j−1}) Ψ(η − t_j). The weighted sum of all such configurations yields the probability of observing a j-type edge undergoing a transition to state j + 1 over an interval dt, and the probability of randomly selecting an edge in state j, as shown in Supplementary Fig. 3(a) and (b), respectively. The only difference is the final term, which is drawn either from ψ or from Ψ. With respect to the domain T, these sums can be written

∫_T dt_1 . . . dt_j Ψ(t_1) ψ(t_2 − t_1) ψ(t_3 − t_2) × . . . × ψ(t_j − t_{j−1}) ψ(η − t_j) = Ψ ∗ ψ^{∗j} (21)

and

∫_T dt_1 . . . dt_j Ψ(t_1) ψ(t_2 − t_1) ψ(t_3 − t_2) × . . . × ψ(t_j − t_{j−1}) Ψ(η − t_j) = Ψ ∗ ψ^{∗(j−1)} ∗ Ψ, (22)

respectively, with the convolutions evaluated at η. Here, ψ^{∗j} is the j-th convolution power of ψ, and is discussed at length in following sections. To obtain the rate µ_j at which edges in state j transition to state j + 1, Supplementary Eq. (21) must be normalised by Supplementary Eq. (22), the probability E_j that a randomly selected edge is in state j. Schematically, this corresponds to normalising the transition in Supplementary Fig. 3(a) by those in Supplementary Fig. 3(b). Note that ν_j dt, the probability that an edge in state j forgets an event over an interval dt, is the same as µ_{j−1} dt under time reversal, up to the normalising constant. As such, the rates µ_j and ν_j, along with the distribution E_j, can be written compactly as

µ_j dt = [Ψ ∗ ψ^{∗j} / Ψ ∗ ψ^{∗(j−1)} ∗ Ψ] dt (23)

and

ν_j dt = [Ψ ∗ ψ^{∗(j−1)} / Ψ ∗ ψ^{∗(j−1)} ∗ Ψ] dt, (24)

with

E_j = Ψ ∗ ψ^{∗(j−1)} ∗ Ψ.
(25)

These quantities can be calculated either numerically or analytically, depending on the tractability of the chosen distribution ψ. In general, if ψ(τ) is locally integrable, then the Laplace transforms of ψ and Ψ exist and allow us to calculate the convolutions as products in the frequency domain, which is useful especially if j is large. Since we do not impose any cutoffs on ψ and Ψ in the text, j can indeed grow arbitrarily large under bursty dynamics. If we denote the Laplace transform of ψ(τ) by

L{ψ(τ)} = ψ̂(s) = ∫_0^∞ ψ(τ) e^{−sτ} dτ, (26)

then the transform of Supplementary Eq. (19) can be written as

Ψ̂(s) = [1 − ψ̂(s)] / s. (27)

Finally, we can argue by induction from j = 0, using the fact that the distribution E is constant at stationarity of the renewal process, that µ_j E_j = ν_{j+1} E_{j+1}, meaning that along with E_j, the rates µ_j and ν_j are stationary. The observed experimental rates µ_j, ν_j and E_j match the predicted values exactly for increasingly large networks.

Illustration using the exponential distribution
In this section we explicitly calculate the edge transition rates µ_j and ν_j for an exponential interevent time distribution ψ. This is an exercise to illustrate the Laplace inversion procedure; in general we calculate these quantities numerically, as described in Supplementary Note 5. The exponential distribution is an important benchmark in this work, and we consider two alternative generalisations, namely the gamma and Weibull distributions. Both reduce to the exponential distribution when σ_τ = ⟨τ⟩ = 1, and recover the rates given here. Consider such a distribution with average ⟨τ⟩, defined by

ψ(τ) = (1/⟨τ⟩) e^{−τ/⟨τ⟩} (28)

with

Ψ(τ) = e^{−τ/⟨τ⟩}, (29)

having transforms

ψ̂(s) = 1 / (⟨τ⟩s + 1) (30)

and

Ψ̂(s) = [1 − ψ̂(s)] / s = ⟨τ⟩ / (⟨τ⟩s + 1), (31)

respectively. Substituting these transforms into Supplementary Eq. (25) and applying the convolution theorem allows us to calculate the expected size of the set of edges in state j, which is also the normalising constant in the rates µ_j and ν_j, as the expression for L{E_j} simplifies to

Ψ̂ · ψ̂^{j−1} · Ψ̂ = ⟨τ⟩² / (⟨τ⟩s + 1)^{j+1}. (32)

We use the fact that L^{−1}{1/s^{j+1}}(η) = η^j / j! for integer j, a known Laplace transform relating to the gamma function. We use also the translation property L^{−1}{ψ̂(s + a)} = e^{−aτ} ψ(τ), which directly results from the definition in Supplementary Eq. (26). The inverse Laplace transform of the above expression can then be written explicitly as

Ψ ∗ ψ^{∗(j−1)} ∗ Ψ = [⟨τ⟩ e^{−η/⟨τ⟩} / j!] (η/⟨τ⟩)^j, (33)

meaning E is simply the Poisson distribution with mean
This is expected, due to our construction of thememory window, and the fact that an exponential in-terevent time distribution recovers a Poisson process.Similarly, the Laplace transform of the numerator in Sup-plementary Eqs. (23) and (24) can be used to calculateΨ ∗ ψ ∗ j = e − η/ (cid:104) τ (cid:105) j ! (cid:18) η (cid:104) τ (cid:105) (cid:19) j (34)and Ψ ∗ ψ ∗ ( j − = e − η/ (cid:104) τ (cid:105) ( j − (cid:18) η (cid:104) τ (cid:105) (cid:19) j − , (35)yielding µ j dt = 1 (cid:104) τ (cid:105) dt (36)and ν j dt = jη dt, (37)after normalising by Ψ ∗ ψ ∗ ( j − ∗ Ψ. In this special case ofexponentially distributed τ , we are able to derive theserates using much simpler arguments, namely with thedefinition of the Poisson process, and the Poisson distri-bution. Crucially, µ j has no j dependence here, whichclearly expresses the memoryless property of the Poissonprocess. These rates may be verified by simulating an en-semble of independent, stationary renewal processes, andobserving the flux in the system over an interval ∆ t . Theflow through the set E j over that interval, scaled by thesize of that set and the size of the measurement window,give the rates µ j and ν j . Alternatively, by simulating asingle renewal process for a sufficiently long time, andmeasuring its change in behaviour over each interval ∆ t ,one obtains rates µ j and ν j that are identical to thosecalculated in the ensemble. Convolution powers, an aside
In order to calculate the distribution of edge states E_j, as well as the mean-field edge transition rates µ_j and ν_j, we need an efficient method for computing convolution powers.

Supplementary Figure 4. Random walk interpretation of edge state. Positive and negative edge transition rates, µ_j and ν_j, act as a signature of the non-Markovianity in the renewal process model. On a macroscopic level, this model is indistinguishable from a random walk over the states . . . , j − 1, j, j + 1, . . ., where the transition rates are provided by µ_j and ν_j by construction. In contrast, this equivalence is broken when taking into account node dynamics on the level of classes (k, m), as we show in Supplementary Fig. 8. See also Supplementary Fig. 5 for illustrative values of the rates µ_j and ν_j for varying σ_τ.

In general, a convolution is defined for two real-valued functions f and g over the domain f, g : (−∞, ∞) → ℝ. However, in the case where f and g are supported on the non-negative reals, as is the case in our study, the convolution reduces to

(f ∗ g)(t) = ∫_{−∞}^{∞} f(τ) g(t − τ) dτ = ∫_0^t f(τ) g(t − τ) dτ. (38)

We use this to express the edge-state properties, Supplementary Eqs. (21) and (22), as convolutions over the non-negative reals. It is worthwhile noting the convention that if ψ^{∗j} is the j-th convolution power, or

ψ^{∗j} = ψ ∗ ψ ∗ . . . ∗ ψ (j terms), (39)

then the zeroth-order convolution power, ψ^{∗0} = δ, is simply the Dirac delta function, the identity of convolution. This is used in the j = 1 case of Supplementary Eqs. (23), (24) and (25).

Supplementary Note 2. Random walk equivalence
In this Supplementary Note, we discuss the expected steady-state network topology that emerges from our stochastic temporal model. In particular, we describe the existence of a purely Markovian system with identical macroscopic dynamics to our non-Markovian renewal process model. We consider the dependence of the edge transition rates µ_j and ν_j on our choice of interevent time distribution ψ, and in particular, on its parameterisation in terms of the standard deviation σ_τ.

We assume an arbitrarily large network consisting of independent, stationary renewal processes, such that the time to the next event at any time t follows the residual distribution Ψ(τ). We illustrate in Supplementary Fig. 5 the edge transition rates µ_j and ν_j that emerge from a gamma interevent time distribution ψ(τ), for increasing values of the standard deviation σ_τ. Since µ_j and ν_j are heterogeneous, they can be interpreted as providing a signature of the non-Markovianity inherent to the renewal process microscopically. For instance, when σ_τ = ⟨τ⟩ = 1, the gamma distribution reduces to an exponential distribution, corresponding to a Poisson process. That this process is memoryless is reflected in the edge transition rate µ_j = 1/⟨τ⟩ being homogeneous in j. The greater the departure from the Poisson process, the greater the heterogeneity in j, as illustrated in Supplementary Fig. 5(a).

A central result in this work is to note that for an uncorrelated ensemble of edges, the renewal process model is indistinguishable from a continuous-time Markov chain, namely a one-dimensional biased random walk, as illustrated in Supplementary Fig. 4. In other words, the probability of a randomly selected edge undergoing a transition over an interval dt in the renewal process model is trivially identical to that in a Markov chain where, by construction, transitions occur at rates µ_j and ν_j.
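This equivalence is easy to check numerically. The sketch below is our own illustration (not the paper's code): it runs a Gillespie simulation of the one-dimensional walk with the exponential-case rates µ_j = 1/⟨τ⟩ and ν_j = j/η of Supplementary Eqs. (36) and (37), whose time-averaged occupancy should recover the Poisson edge-state distribution E_j.

```python
import random

def simulate_walk(mu, nu, t_max=1e5, j0=1, seed=7):
    """Time-averaged occupancy of the edge-state walk, with up rate mu(j)
    and down rate nu(j), simulated with the Gillespie algorithm."""
    rng = random.Random(seed)
    j, t = j0, 0.0
    occupancy = {}
    while t < t_max:
        up, down = mu(j), nu(j)
        dwell = rng.expovariate(up + down)     # time spent in state j
        occupancy[j] = occupancy.get(j, 0.0) + dwell
        t += dwell
        j += 1 if rng.random() < up / (up + down) else -1
    total = sum(occupancy.values())
    return {state: dt / total for state, dt in occupancy.items()}

# exponential case with eta = <tau> = 1: mu_j = 1 and nu_j = j, cf. Eqs. (36)-(37)
E = simulate_walk(lambda j: 1.0, lambda j: float(j))
```

Since ν_0 = 0, the walk never leaves the non-negative states, mirroring the boundary condition on negative edge transitions.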
Indeed, since all temporal network information is stored in µ_j, ν_j and E_j in the master equation, any class of system producing a given set of µ_j, ν_j and E_j values has the same predicted dynamics. Since in the special case of a Markov chain the transition rates are exact at all scales (both locally on the scale of a single edge, and globally on the scale of an uncorrelated ensemble), the master equation solution is exact there. Since Supplementary Eqs. (23) and (24) represent a mean-field approximation on the scale of classes (k, m) in the renewal process model, deviations in the master equation solution emerge in that case. These errors provide a measure of the extent to which non-Markovian dynamics can be captured by the heterogeneity of a simple Markov chain.

Supplementary Note 3. Edge-state distribution
In this Supplementary Note we mention some basic properties of the edge state distribution. First, for all choices of ψ(τ) and η in this work, a useful conserved quantity is the expected edge state, or the expected number of events per edge across the entire network. If ⟨τ⟩ is the mean of the interevent time distribution, and η is the observer memory, the expected edge state is ⟨E⟩ = η/⟨τ⟩, which is useful for monitoring the accuracy of the implementation. Further, note that if E is the set of underlying edges in the network, the superposition of |E| renewal processes converges to a Poisson process, with interevent times following an exponential distribution with mean ⟨τ⟩/|E|, for large |E|. As such, we expect |E| dt/⟨τ⟩ events per time window dt in simulation.

Temporal percolation transition
As discussed in the main text, the probability that a randomly selected edge is in state zero is ξ_E. One can use ξ_E to determine whether active edges in the network, the set of all edges in state one or higher, are expected to form a giant component at any given time. If a giant component exists, we say that percolation has taken place.

Supplementary Figure 5. Illustration of edge transition rates. Positive and negative edge transition rates, µ_j and ν_j respectively, for a gamma-distributed interevent time with mean ⟨τ⟩ = 1, shown as log µ_j and log ν_j versus log j for varying σ_τ/⟨τ⟩. On a macroscopic level, our renewal process model is indistinguishable from a random walk model of edge state, where the state j increments and decrements at rates µ_j and ν_j. The legend in (b) applies also to (a). Observer memory and mean interevent time are given by η = ⟨τ⟩ = 1, according to our stochastic temporal network model. The exponential distribution is recovered in the case of σ_τ = ⟨τ⟩ = 1, producing a memoryless system, as indicated.

If percolation does not occur, nodes form finite active clusters whose size goes to zero in the limit of large networks. This is true even if the underlying network itself consists of a giant component. The percolation transition helps to explain the sharp transition in dynamics in the upper right corners of Fig. 2 in the main text, separating regimes of fast and slow information diffusion. The percolation condition for a configuration model network with degree distribution q(k) is given by

Σ_{k=0}^∞ k(k − 2) q(k) = 0, (40)

where

q(k) = Σ_{l≥k} C(l, k) p(l) (1 − ξ_E)^k ξ_E^{l−k}, (41)

with C(l, k) the binomial coefficient, is the degree distribution obtained by randomly removing a fraction ξ_E of edges from a network with degree distribution p(k), which is effectively the network induced when removing edges in state zero in our temporal model. For example, for a z-regular underlying network the condition reduces to 1 − ξ_E = 1/(z − 1), so that for z = 5 the active edges percolate whenever ξ_E < 3/4.

Maximum edge state
In our analytic solution, we introduce n to denote the maximum edge state j. In experiment, no such restrictions are imposed when sampling the interevent time distributions ψ and Ψ, in contrast to related work [13] where it is common to introduce upper and lower bounds on τ. As a consequence, bursts in activity can lead to arbitrarily large edge states j. However, as we see in the structure of configuration space, if n is the maximum edge state allowed in the system, the number of equations grows like Θ(k^n). Clearly, in the interest of the numerical implementation of the master equation solution, n cannot be arbitrarily large.

It is noteworthy that despite edge state being unrestricted in simulation, the resultant diffusion dynamics can be accurately solved using relatively small values of n. Consider that large bursts are most common when the interevent time standard deviation σ_τ is large. In this regime, the fraction of edges in state j = 0 is significant. As a consequence, nodes observing bursts of activity on some edges frequently observe no activity on others. Indeed, for large enough σ_τ, it is rare for a node to observe more than one active neighbour, with that active neighbour generally being in a very large state j. For the relative threshold (RT) model, such a node is infected with high probability if the neighbouring node is active, since the threshold φ is guaranteed to be overcome here. Similarly, for the absolute threshold (AT) model, any node with an infected neighbour in state j = ⌈Mφ⌉ or higher is likely to adopt. Finally, for the susceptible-infected (SI) model, consider that the duration of a spike in activity is on the order of η, i.e., the duration of observer memory. Since the infection rate increases proportionally to the size of the burst, a large burst leads to infection soon after its observation, and long before its events have left the time window. Since the local neighbourhood of the newly infected node is likely sparse in this setting, a much smaller burst would lead to an identical diffusion outcome. Effects such as these mean that surprisingly low values of n are sufficient to accurately model the diffusion process.

Supplementary Note 4. Monte Carlo simulation
In this Supplementary Note we discuss the Monte Carlo simulation methods used in this work. Since node dynamics do not feed back into edge dynamics, we assume a steady-state renewal process is in place at t = 0. This involves initialising each edge with one τ drawn from the tail distribution Ψ. We start the simulation at t = −η, so that at time t = 0, the system is at steady state.

Parallel event sequences
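To fix ideas before describing the event sequences, here is a minimal sketch of an event-driven node-update loop. It is our own illustration, not the paper's implementation: the SI-like rate function and all names are hypothetical, temporal edge events are omitted, and pending node events are lazily invalidated (rather than reordered in place) whenever a rate changes.

```python
import heapq, random

def simulate(neighbours, rate, t_max=100.0, seed=3):
    """Event-driven (Gillespie-style) node updates: infection times are drawn
    whenever rates change; outdated heap entries are lazily discarded."""
    rng = random.Random(seed)
    state = {v: 0 for v in neighbours}       # 0 = susceptible, 1 = infected
    version = {v: 0 for v in neighbours}     # bumping this invalidates old events
    queue = []
    def schedule(v, t_now):
        r = rate(v, state)                   # hypothetical transmission rate F_v
        if r > 0:
            heapq.heappush(queue, (t_now + rng.expovariate(r), version[v], v))
    for v in neighbours:
        schedule(v, 0.0)
    while queue:
        t, ver, v = heapq.heappop(queue)
        if t > t_max:
            break
        if state[v] == 1 or ver != version[v]:
            continue                         # stale event, ignore
        state[v] = 1                         # node event: infect v
        for u in neighbours[v]:              # neighbour rates change, so their
            if state[u] == 0:                # pending events are redrawn
                version[u] += 1
                schedule(u, t)
    return state

# SI-like rates with baseline noise p = 0.05, on a path graph of 50 nodes
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 50] for i in range(50)}
infected = simulate(path, lambda v, s: 0.05 + sum(s[u] for u in path[v]))
```

Redrawing a node's infection time when its neighbourhood changes plays the same role as the reordering of the node event sequence described below.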
Our model of node dynamics is Markovian, since an uninfected node v becomes infected at a constant rate F_v. In the case of threshold models, the transmission rate is a step function, whose upper value is conventionally set to one. A straightforward Monte Carlo simulation in this case is to advance in time with fixed intervals ∆t = 1/N, where N is the network size, randomly selecting a node v to trigger with probability F_v at each step. This approach is unsuitable when transmission rates are unbounded, since the time step ∆t cannot be rescaled a priori to preserve the random node selection approach. This is the case for the SI rule in our model, where we define the transmission rate to be proportional to edge state, F_{k,m} = max(p, m · λ), with λ = λ_w, for a node in class (k, m). Since m and λ are unbounded in our simulations, as we impose no restriction on ψ and Ψ, bursts of activity due to our renewal process model can result in arbitrarily high F.

For this reason, we prefer an event-based Gillespie algorithm, which is equivalent, but advances in time by jumping to the next event, rather than by uniform increments ∆t. In the remainder of this section we discuss the implementation of sequences of these events.

Algorithm 1.
Static network event sequence

  procedure MonteCarloStatic(Ξ_v)
    while Ξ_v not empty do
      apply head of Ξ_v
    end while
  end procedure

For a graph G(V, E), we define two types of events: node events, defined over the node set V, and edge events, defined over the edge set E. A node event is implemented as the tuple {t_v, v}, and an edge event as {t_e, e_uv, s}. Node events amount to the infection of node v at a time t_v, and edge events to the positive or negative change in the state of edge e_uv, depending on the sign of the indicator s = ±1, according to our stochastic temporal network model. Monte Carlo simulation is implemented as two time-ordered, dynamic sequences of events, Ξ_v and Ξ_e, for node and edge event types, respectively. While Ξ_e is independent of node dynamics, both node and edge events feed back and cause a potential reordering of Ξ_v, as explained below. The node event sequence is initially of size N = |V|, since we have one event for each node in the network, with the leading event being removed when all preceding edge activity has been carried out, i.e., when t_v < t_e for the leading terms in each sequence. The action carried out is best illustrated by considering a static network, Algorithm 1. Here, the apply head instruction means to trigger the node in question, remove it from Ξ_v, and update the transmission rates of its neighbours, and in turn, their positions in Ξ_v. In our model, the edge event sequence has approximately constant size, apart from some fluctuations due to the fact that η/⟨τ⟩ is only the expected edge state; the absolute number of events in η-memory can go up and down. As networks increase in size, the size of the edge event sequence Ξ_e converges to (2 + η/⟨τ⟩)|E|. Assuming that a non-zero level of noise is present, p > 0, Ξ_v will eventually be emptied.

Algorithm 2.
Temporal network event sequences

  procedure MonteCarloTemporal(Ξ_v, Ξ_e)
    t_v, t_e ← dequeue Ξ_v, Ξ_e
    while Ξ_v not empty do
      while t_e < t_v do
        apply head of Ξ_e
        t_v, t_e ← dequeue Ξ_v, Ξ_e
      end while
      apply head of Ξ_v
      t_v ← dequeue Ξ_v
    end while
  end procedure

Algorithm 2 is the temporal extension of Algorithm 1. In each, the time of infection of every node v is determined at t = 0. This is done by drawing from an exponential distribution with rate F_v, i.e., taking the natural logarithm of a uniform random variable on (0, 1) and dividing by −F_v. A node event is permanently erased from the sequence if t_v < t_e, as mentioned. At this time, neighbours u of v have their transmission rates F_u recalculated, as in Algorithm 1, as they now have an additional infected neighbour. Additionally, the infection time of a node is recalculated if a change in its local neighbourhood takes place, such as activity on an adjacent edge, or the infection of a neighbouring node. Since event sequences are time-ordered, a node event is first erased from its position in the sequence, and then reinserted when such a recalculation takes place. No such reordering takes place for the edge event sequence: once an event is inserted here, it is only removed when its time t_e is at the front of the queue. While t_e < t_v, where the events in question are the leading events of each queue, there are two possible actions for the apply head instruction for Ξ_e in Algorithm 2. First, if s = +1, the leading edge event is erased and replaced with two new events on the same edge e_uv: an s = +1 event occurring at time t_e + τ, where τ is drawn from the interevent time distribution ψ, and an s = −1 event at time t_e + τ + η, corresponding to the decrementing of that same edge η time steps later. Second, if an s = −1 event has t_e < t_v, then the event is removed from the sequence without replacement.

In the case of static networks, Gillespie algorithms allow massive speedup relative to the random selection approach. This is not the case for the node dynamics in our temporal network model, where a Gillespie event sequence affords very little speedup. Here, the edge activity sequence is a bottleneck, since it still requires |E| dt/⟨τ⟩ edge updates per dt on average. As such, it is not for the benefits to speed that we use a Gillespie-type event sequence for node updates; it is to account for arbitrarily large spikes in node transmission rates, particularly under the SI rule for temporal networks.

Inverse transform sampling of ψ and Ψ

In this section we discuss the numerical pipeline used to simulate renewal processes, a procedure that amounts to accurately sampling from the interevent time distributions ψ(τ) and Ψ(τ). Due to the large values of the standard deviation σ_τ to be examined in this work, currently available software could not be used to sample values of τ.
Sampling probability distributions. Numerical construction of Ψ, a probability density function that can be accurately evaluated at any point τ, at some cost, for all distributions used in this work. The grid results from iterating outwards from τ = ⟨τ⟩ = 1, for increasing τ until Ψ < ε (blue grid), and for decreasing τ until Ψ > 1 − ε (red grid). Inverse transform sampling is then performed on the interval (ε, 1 − ε) to provide samples of ψ, using a bisection method on the grid to find the corresponding τ. A third-order spline interpolation on a logarithmic scale provides intermediate values of τ. Additionally, the grid can be cumulatively summed, and inverse transform sampling carried out on the spline of the resulting grid, to provide samples of Ψ.

Extended precision is required to accurately determine Ψ, and we prefer GNU MPFR, a multiple-precision binary floating-point library with correct rounding. See Supplementary Note 9 for details regarding the distributions themselves. With such a grid accurately computed, we carry out third-order spline interpolation on a logarithmic scale to allow rapid evaluation at arbitrary points τ within the grid. As a first application, the spline can be used to rapidly solve Ψ(τ) = x using a bisection technique. This provides samples τ of ψ. The grid can then be cumulatively summed to provide the cdf of Ψ over the same domain. Performing approximate inverse transform sampling on the resulting grid returns samples τ of Ψ.

It is worthwhile mentioning that rejection sampling techniques appear to be out of the question here. This is due to the extreme bounding values of τ necessary for the standard deviations σ_τ studied in our choice of heavy-tailed distributions. Even for clever choices of enveloping functions for ψ and Ψ, it is likely that the acceptance rates will be prohibitively low.
Finding such functions remains an interesting challenge, but since our bisection approach converges exponentially quickly, it is hard to imagine that a choice of envelope exists that makes rejection sampling faster.

Finally, we comment on experiments involving very large standard deviations σ_τ. Here, not even large networks running for a long time provide an unbiased sample of ψ. Following the central limit theorem, the standard deviation of the sampled mean interevent time equals σ_τ/√n_s, where n_s is the number of samples in question, which for the sake of argument is on the order of the number of edges in the network. High quality experimental results can be obtained by simulating large networks, or by averaging over realisations. Note further that large-σ_τ experiments in our model happen to coincide with very long simulation times. As a result, even for large σ_τ there is little noise in our results, due to the substantial runtime. Finally, ξ_E plays a very important role, and amounts to the fraction of samples of Ψ that are larger than η. A consequence of this is that for large σ_τ, most edges do not participate at all, having drawn residual times at t = 0 that are longer than the duration of the experiment, determined by ρ = 1 − e^{−pt}.

Supplementary Note 5. Laplace transform inversion
In the following sections we describe the numerical pipeline for obtaining the rates µ_j and ν_j, as well as the edge-state distribution E_j. Although these values could be calculated by hand for the case of the exponential distribution, and perhaps even the gamma distribution since its Laplace transform is known, we would like to be able to do this numerically for arbitrary ψ(τ), including those for which the Laplace transforms of ψ and Ψ are not known.

These rates are expressed in terms of convolutions. Since we are interested in j-th order convolutions, for arbitrary positive integers j, we prefer to perform products in frequency space, as permitted by the Laplace transform. A j-th order convolution for large j increases exponentially in complexity, and can only be performed directly for very small j. Further, E_j can be broad when σ_τ is large. The Laplace transform is defined as

\hat{f}(s) = \mathcal{L}\{f\}(s) = \int_0^\infty e^{-st} f(t)\, dt,   (42)

where f is a real-valued function of time t, and \hat{f} a complex-valued function of the complex variable s = σ + iω. In general we may write

\hat{f}(s) = \int_0^\infty e^{-\sigma t} \cos(\omega t) f(t)\, dt - i \int_0^\infty e^{-\sigma t} \sin(\omega t) f(t)\, dt.   (43)

This form is necessary when the Laplace transform of f is not known in closed form, and must be evaluated numerically, as is the case for the lognormal and Weibull distributions used throughout this work. For the Gaver-Stehfest algorithm in the following section, s is always real, so we can set ω = 0, leaving only the first integral. There exists a useful framework that unifies these different algorithms [51], expressing each approach in terms of a weighted sum of \hat{f} values. We shall see that the computation of the forward Laplace transform is the bottleneck, as opposed to finding the weights. As such, the preferred method depends on how many numerical evaluations of \hat{f} are required. This is 2M in the Gaver-Stehfest algorithm, 2M + 1 in the Euler algorithm and M in the Talbot algorithm, with M controlling the desired accuracy (see [51] for details).
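When the transform is not known in closed form, as for the lognormal and Weibull densities, the defining integral must be evaluated by quadrature. A sketch for real s > 0 (the ω = 0 case relevant here); double precision, the substitution t = e^u, and the truncation bounds are illustrative assumptions, standing in for the MPFR-based evaluation used in this work:

```python
import math

def laplace_real(f, s, u_lo=-30.0, u_hi=30.0, n=4000):
    """Evaluate f_hat(s) = integral_0^inf e^{-st} f(t) dt for real s > 0.
    Substituting t = e^u maps (0, inf) to the real line; the transformed
    integrand e^{-s e^u} f(e^u) e^u decays at both ends, so a truncated
    composite Simpson rule suffices at double precision."""
    if n % 2:           # Simpson's rule needs an even number of intervals
        n += 1
    h = (u_hi - u_lo) / n
    total = 0.0
    for i in range(n + 1):
        u = u_lo + i * h
        t = math.exp(u)
        val = math.exp(-s * t) * f(t) * t   # factor t is the Jacobian dt/du
        w = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += w * val
    return total * h / 3.0

# Check against a known pair: f(t) = e^{-t} has transform 1 / (s + 1).
fhat = laplace_real(lambda t: math.exp(-t), 2.0)
```

Against the closed-form transform of the exponential density, this quadrature agrees to several digits, which is the role MPFR plays at much higher precision in the actual pipeline.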
However, only the real integral need be found in the Gaver-Stehfest algorithm, which we prefer for this reason.

Gaver-Stehfest algorithm
We assume a Laplace transform \hat{f} is known, and that we wish to recover the original unknown function f. In our case, f corresponds to convolutions of ψ and Ψ. For any t > 0, so-called Salzer summation yields the Gaver-Stehfest inversion formula

f(t, M) = \frac{\ln 2}{t} \sum_{k=1}^{2M} \zeta_k \, \hat{f}\!\left( \frac{k \ln 2}{t} \right),   (44)

where the weights ζ_k are given by

\zeta_k = \frac{(-1)^{M+k}}{M!} \sum_{j=\lfloor (k+1)/2 \rfloor}^{\min(k, M)} j^{M+1} \binom{M}{j} \binom{2j}{j} \binom{j}{k-j}.   (45)

Conveniently, the weights ζ_k are independent of the transform being inverted. This means that after a precision M is chosen, the weights can be stored in a vector ζ of dimension 2M and reused for any number of transforms \hat{f}. We discuss the numerical implementation further in the following sections, but mention here that we use the GNU Multiple Precision arithmetic library GMP for ζ_k, in particular its integer summand, and the related MPFR library for the floating-point arithmetic manipulating \hat{f}. Regarding the weights ζ_k, a useful property for benchmarking is the fact that, for all M ≥ 1,

\sum_{k=1}^{2M} \zeta_k = 0;   (46)

due to the (−1)^{M+k} factor in the definition of ζ_k, these weights oscillate around zero. The Gaver-Stehfest algorithm requires a system precision of approximately 2.2M significant digits, which we input to the constructors of variables in MPFR.

Numerical pipeline
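Before detailing the pipeline, the weights (45) and inversion formula (44) can be checked in a few lines. The sketch below uses Python's exact rational arithmetic in place of GMP/MPFR; the choice M = 7 and the test transform 1/(s + 1) are illustrative assumptions:

```python
import math
from fractions import Fraction

def gs_weights(M):
    """Exact Gaver-Stehfest weights zeta_k, k = 1..2M, as in Eq. (45).
    Rational arithmetic sidesteps the cancellation that the paper's C++
    implementation handles with GMP/MPFR."""
    zeta = []
    for k in range(1, 2 * M + 1):
        s = Fraction(0)
        for j in range((k + 1) // 2, min(k, M) + 1):
            s += (Fraction(j ** (M + 1), math.factorial(M))
                  * math.comb(M, j) * math.comb(2 * j, j) * math.comb(j, k - j))
        zeta.append((-1) ** (M + k) * s)
    return zeta

def gs_invert(fhat, t, M=7):
    """Approximate f(t) from its transform fhat via Eq. (44)."""
    zeta = gs_weights(M)
    return (math.log(2) / t) * sum(
        float(z) * fhat(k * math.log(2) / t) for k, z in enumerate(zeta, 1))

# Example: invert 1/(s + 1), whose original is f(t) = e^{-t}.
approx = gs_invert(lambda s: 1.0 / (s + 1.0), 1.0)
```

The sum of the exact weights is identically zero, as in Eq. (46), and the inversion of a smooth transform is accurate to several digits even in double precision at this modest M.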
We now describe specifically how the inversion procedure of the previous section is used to determine the values E_j, µ_j and ν_j described in preceding sections. For simplicity, we refer to these quantities using the vectors E, µ and ν, whose dimension is determined by the size of the edge state space n in our master equation, which need not be determined a priori. Since E_0 and µ_0 are special cases defined in the Methods section of the main text, we write E = (E_1, E_2, ..., E_n)^T for the edge state distribution, and µ = (µ_1, µ_2, ..., µ_n)^T and ν = (ν_1, ν_2, ..., ν_n)^T for positive and negative edge transitions. The function f(t, M) in the Gaver-Stehfest algorithm corresponds to E_j(η, M), µ_j(η, M) and ν_j(η, M), where M as before is a parameter of the Gaver-Stehfest algorithm that tunes the accuracy of the approximation.

Calculating the distribution E and the rate vectors µ and ν amounts to three separate matrix-vector products. We require the vector ζ of dimension 2M, as per the Gaver-Stehfest algorithm, whose k-th element is given by Supplementary Eq. (45), and the Laplace transforms of ψ and Ψ evaluated at s = k ln(2)/η, denoted \hat{ψ}_k and \hat{Ψ}_k, as per Supplementary Eq. (27). We define n as the maximum edge state, which can be as large as one likes here; it is limited only by M, which must be increased for increasing n. These quantities are then used to populate the n × 2M dimensional matrices \hat{E}, \hat{µ} and \hat{ν}, whose jk-th elements are given by

[\hat{E}]_{jk} = \hat{\Psi}_k \, \hat{\psi}_k^{\,j-1} \, \hat{\Psi}_k,   (47a)
[\hat{\mu}]_{jk} = \hat{\Psi}_k \, \hat{\psi}_k^{\,j},   (47b)
[\hat{\nu}]_{jk} = \hat{\Psi}_k \, \hat{\psi}_k^{\,j-1},   (47c)

with 1 ≤ j ≤ n and 1 ≤ k ≤ 2M. Determining the distribution of edge states E, and the edge transition rates µ and ν, then amounts to the matrix-vector product of \hat{E}, \hat{µ} and \hat{ν}, respectively, with ζ. As an illustration, the edge state distribution is given by

\begin{pmatrix} E_1 \\ E_2 \\ \vdots \\ E_n \end{pmatrix} =
\begin{pmatrix}
\hat{\Psi}_1 \hat{\Psi}_1 & \hat{\Psi}_2 \hat{\Psi}_2 & \cdots & \hat{\Psi}_{2M} \hat{\Psi}_{2M} \\
\hat{\Psi}_1 \hat{\psi}_1 \hat{\Psi}_1 & \hat{\Psi}_2 \hat{\psi}_2 \hat{\Psi}_2 & \cdots & \hat{\Psi}_{2M} \hat{\psi}_{2M} \hat{\Psi}_{2M} \\
\vdots & \vdots & \ddots & \vdots \\
\hat{\Psi}_1 \hat{\psi}_1^{\,n-1} \hat{\Psi}_1 & \hat{\Psi}_2 \hat{\psi}_2^{\,n-1} \hat{\Psi}_2 & \cdots & \hat{\Psi}_{2M} \hat{\psi}_{2M}^{\,n-1} \hat{\Psi}_{2M}
\end{pmatrix}
\begin{pmatrix} \zeta_1 \\ \zeta_2 \\ \vdots \\ \zeta_{2M} \end{pmatrix}.   (48)

These quantities are calculated once at the start of the Runge-Kutta solution of the corresponding system, and do not change thereafter. Note that results have to be scaled by ln(2)/η as per the definition of f(t, M) in the Gaver-Stehfest algorithm, and µ_j and ν_j normalised by E_j, to adhere to their definitions. The complete system to be solved is

E = \hat{E} ζ,   (49a)
µ = \hat{µ} ζ,   (49b)
ν = \hat{ν} ζ.   (49c)

Examples of µ_j and ν_j values are provided in Supplementary Fig. 5 for the case of a gamma distribution ψ.

Implementation of the Gaver-Stehfest algorithm in C++ created confusion for some time, not because of the calculation of the weights ζ_k, but in their product with \hat{f}. The authors of [51] state that while it is clearly required for the weights ζ_k, arbitrary precision is not required for handling \hat{f}. However, we find that double precision is not sufficient in general for \hat{f}, which can be confirmed by using double values in C++, or equivalently, setting a precision of 53 bits in an arbitrary-precision library. We use a combination of the GMP library for the binomial coefficients, factorial, and exponentiation in Supplementary Eq. (45). Division is performed after conversion to MPFR. Similarly, \hat{f} in Supplementary Eq. (44) is determined using MPFR, whether the Laplace transform is known in closed form, or calculated from its integral definition.

Supplementary Note 6. Diffusion speed and noise
In the Methods section of the main text, we define the fraction of infections that are due to noise as ρ_f, and the spreading time relative to the pure-noise case as t_f. As shown in the inset of Fig. 1(a) of the main text, these quantities are very close to interchangeable. That is, any result expressed in terms of t_f produces an almost identical picture in ρ_f, and vice versa.

As a concrete illustration, we replot Fig. 2 of the main text in terms of ρ_f, expressed there in terms of the relative spreading time t_f. Results are shown in Supplementary Fig. 7. Upon inspection we note that the landscape of ρ_f is almost identical to that of t_f, albeit with slightly more variance in the relative noise measurement compared to spreading time, as seen in the reduced sharpness of the colours. In particular, the percolation transition, indicated by the dashed white line in Supplementary Fig. 7(c), remains accurate. As expected, the slowest diffusion implies a complete dependence on external noise, as seen in the annealed regime of each plot, where ρ_f ≈ 1. Conversely, 1/ρ_f measures the multiplicative effect of external noise; this gives the number of total infections for every noise-induced infection, and reaches several orders of magnitude in the quenched regime of Supplementary Fig. 7.

Supplementary Note 7. Mean field approximation
In this Supplementary Note we examine the effectiveness of the mean field assumption of edge state transitions. The edge transition rates calculated in this work assume an infinitely large ensemble of stationary, uncorrelated renewal processes. We expect a carefully implemented stochastic temporal network to agree precisely with theory in the limit of large networks, which is what we confirm. In particular, by viewing the network as a single ensemble of edges, flux measurements indeed show theoretical results for E_j, µ_j and ν_j to be exact on a network-wide scale. However, our analysis involves partitioning the network into 2 × |C| classes, namely, an infected and an uninfected variant for each of the |C| classes (k, m) allowed by the system. Clearly, the densities of nodes over these classes, and the transitions between them, are highly dynamic, emptying and filling over the course of a spreading process. Further, transition rates are heterogeneous, particularly with respect to the transmission rate F_{k,m}, which varies from class to class. The mean field approximation is to assume that edge transition rates for individual classes are the same as for a completely uncorrelated ensemble.

The reason that we expect differences in edge transition rates over these scales is as follows. The emergent rates µ_j and ν_j result from certain assumptions regarding edge statistics at a microscopic level. This is formulated in terms of the history distribution, or distribution of tuples (τ_1, τ_2, ..., τ_j) within each η window, on each edge across a given set, as illustrated in Supplementary Fig. 3. Expressed in these terms, the mean field approach is to assume that the history distribution within a particular class (k, m) is completely uncorrelated, and is equivalent to that of any randomly sampled subset of edges, or indeed the network edge set as a whole.
To see how this assumption may break down, consider the flow of uninfected nodes through C as a survival process, whereby a node enters an uninfected class (k, m), only exiting and reentering an adjacent uninfected class if it does not become infected in the meantime. To further simplify this picture, consider (k, m) to have F_{k,m} = 1, with neighbouring uninfected classes having F_{k,m} = 0, as may occur with a threshold model of infection. In such a survival process, it is nodes whose histories (τ_1, ..., τ_j) produce short intervals in (k, m) that are favourable to survival, exiting to adjacent uninfected classes. History distributions leading to long waiting times in (k, m) are more likely to become infected, with survival times following an exponential distribution with mean 1/F_{k,m}. As such, the history distribution of nodes exiting the class via infection differs from that of nodes that survive, and exit via edge or neighbour transitions. Mechanisms like this, and the related effect due to neighbour transitions, gradually transform the history distribution of individual classes, resulting in deviations in the emergent edge transition rates µ and ν. Although the assumption is broken on the scale of individual classes, it is of course preserved when taking all classes together.

Supplementary Figure 7. Relative density heat maps. Same experiment as for Fig. 2 in the main text, but plotting the relative density of infections that are due to external noise, ρ_f, rather than the normalised spreading time, t_f. Panels show (a) RT, (b) AT and (c) SI dynamics, with annealed and quenched regimes indicated; axes are log σ_τ/⟨τ⟩ and log η/⟨τ⟩, coloured by log ρ_f. Results are almost identical, with the percolation transition preserved for both ρ_f and t_f. See the caption of Fig. 2 of the main text for simulation details.

We confirm this effect in an experiment whose results are shown in Supplementary Fig. 8. Here, we simulate a spreading process, and carefully record the densities of nodes in each class (k, m) at all times t, as well as the flows between classes over measurement windows of length ∆t = 0.02. To allow ∆t to be as small as possible, and approximate the dt in our analytics, we constrain the system to the smallest non-trivial configuration space C. To this end, the degree distribution is 3-regular random, and ψ is given by a power law with α = 2.5, with lower and upper cutoffs of τ = 1.01 and 50, respectively. By choosing η = 1 < 1.01, we ensure a two-level edge state space, where edges are in state j = 0 or 1. The resulting configuration space has size |C| = 20, is connected, and resembles a smaller version of Supplementary Fig. 1. By setting the network size N as large as possible, the node set is diluted as little as possible over C. We plot the flux measurements of the uninfected class with degree vectors k = (2, 1)^T and m = (1, 0)^T. Node dynamics follow a relative threshold rule with φ = 0.4, with background noise causing infection at a rate p. As such, the transmission rate for the class in question is F_{k,m} = p.

Ego transition measurements are shown in Supplementary Fig. 8(a). Since the class initially has density s_{k,m} = 0, with the network initialised to ρ = 0 at t = 0, initial measurements show only a handful of ego transitions per ∆t up to around t = 10, visible here thanks to the logscale. As expected, the rate of ego transitions closely agrees with the input rate p, verified by scaling the total set density as F_{k,m} s_{k,m} ∆t, given by the solid black curve. This transition is guaranteed to agree with theory if the Gillespie algorithm is correctly implemented, and serves as a useful benchmark in flux measurement experiments. Further, neighbour transition rates are verified in Supplementary Fig. 8(b) and (c), for edges of type j = 0 and 1 respectively. Time-dependent rates β_j are calculated as per Supplementary Eq. (10), using the set of |C| = 20 empirical densities s_{k,m}. Scaling s_{k,m} for the class in question by the corresponding products β_j (k − m)_j shows remarkable agreement with the measured fluxes.

Supplementary Figure 8. Examination of mean field assumption for class transition rates. Flux measurement for the set S_{k,m} with k = (2, 1)^T and m = (1, 0)^T during Monte Carlo simulation, on a 3-regular random degree distribution with relative threshold φ = 0.4 and background noise rate p. The interevent time distribution ψ is a power law with α = 2.5 and cutoffs τ = 1.01 and 50; the memory duration is η = 1. Panels show (a) ego, (b, c) neighbour, (d, e) positive edge and (f) negative edge transitions; flux is measured over intervals of ∆t = 0.02, and compared with s_{k,m} scaled by the indicated theoretical coefficients, shown as solid lines.

Flux measurements of positive edge transitions for uninfected and infected neighbours are shown in Supplementary Fig. 8(d) and (e), respectively. While agreement is excellent in (d), a clear deviation emerges in (e). Although the disagreement appears minor on a logarithmic scale, the theoretical µ is off by roughly 20%. The rate of negative edge transitions in (f) is in good agreement with theory. A study of the complete state space shows that deviations as in (e) are common, but not systematic, with the mean field estimates µ and ν sometimes overestimating, and sometimes underestimating, the measured fluxes. The master equation solution of ρ for this experiment, not shown here, miscalculated the overall spreading speed by about 15%. It is likely that the small configuration space here contributed to the error. In much larger systems, like those in the main text, this effect is likely diluted, especially since µ and ν do not systematically over- or underestimate the class-level edge transition rates. That is, a cancellation effect might emerge.

We confirm that the mean field assumption breaks down due to heterogeneities in transmission rates by studying a purely noise-driven variant of the above experiments. That is, F_{k,m} = p for all classes (k, m). The same flux measurements, not shown here, are in perfect agreement with theory in such a setting. Further, the relative set sizes for constant-m rows of configuration space are exactly what one would expect given a degree distribution p_k, and a probability distribution E_j that a randomly selected edge is in state j.

Supplementary Note 8. Skewness and entropy
In this Supplementary Note we examine the skewness and differential entropy of the interevent time distributions used in this work, namely the lognormal, Weibull and gamma distributions. Our goal is to understand how these distributions differ after carefully controlling for their mean and standard deviation, µ and σ respectively, and to this end we propose skewness and entropy as measures that can be understood as ranking various distributions ψ by the value of their effective sparsity ξ_E, discussed in the main text.

We control for both the mean and standard deviation when comparing distributions in this work. For a random variable X, this is equivalent to controlling the first and second raw moments, E[X] = µ and E[X²] = µ² + σ². We wish to examine the differences that remain between our chosen distributions after imposing these constraints. To this end, it is logical to examine the third raw moment E[X³]. This is usually done by calculating the skewness γ, typically defined as the third standardised moment, or E[(X − µ)³] normalised by σ³. The normalisation renders the moment scale invariant, meaning the information encoded in the skewness relates only to the "shape" of the distribution in question, and not its variance. Since we are interested in comparing skewness for constant mean and standard deviation, we use the expression

\gamma = \frac{E[(X - \mu)^3]}{\sigma^3} = \frac{E[X^3] - 3\mu\sigma^2 - \mu^3}{\sigma^3}.   (50)

Clearly, skewness γ differs from one distribution to another only in the third raw moment E[X³], since we keep µ and σ constant. Plotting skewness γ as a function of σ for constant µ = 1 leads to the plot in Supplementary Fig. 9, for the lognormal, Weibull and gamma distributions.

Supplementary Figure 9. Higher order moments of probability distributions. Comparing skewness and entropy of probability distributions as a function of the standard deviation, with the mean fixed to µ = 1. (a) The third raw moment E[X³], and (b) skewness γ, for the lognormal, Weibull and gamma distributions. The third raw moment is the only term that differs in the calculation of skewness. (c) Entropy h follows the same trend as (a) and (b). (d) We are interested in the connection between these quantities and ξ_E, defined in the main text, with η = 1.

The large differences in skewness here correspond to observations regarding effective sparsity ξ_E in earlier parts of this work. As such, the skewness of a distribution provides a good rule-of-thumb for comparing ξ_E for different distributions.

In addition, we calculate the differential, or information, entropy for each distribution, defined as

h = -\int_0^\infty f(x) \ln f(x)\, dx,   (51)

for distributions defined for x ∈ (0, ∞), as is the case for the lognormal, Weibull and gamma distributions. The motivation for considering the differential entropy is the observation that it was the lognormal distribution, the maximum entropy distribution for which the mean and variance of ln(X) are specified, that produced the fastest diffusion times in the main text.

As shown in Supplementary Fig. 9, the relative values of skewness and entropy agree qualitatively with expectation. Simply put, the relative values of skewness γ, shown in plot (b), and differential entropy h, shown in plot (c), agree qualitatively with the relative values of effective sparsity ξ_E, shown in plot (d). This supports the rule-of-thumb that the greater the skewness and entropy, the lower the effective sparsity.

Supplementary Note 9. Probability distributions
In this Supplementary Note we detail the probability distributions used in this work, in particular those used for the interevent time distribution ψ and its tail distribution Ψ. For simplicity, the notation used in this Supplementary Note is entirely self-contained, and we associate pdfs f(x) with ψ(τ), and their cdfs F(x) with 1 − Ψ(τ).

Lognormal distribution. The lognormal distribution is defined with respect to an underlying normal distribution with mean µ̃ and variance σ̃². These are related to the lognormal mean and variance µ and σ² by the relations given in Supplementary Table I, which can be straightforwardly inverted. Provided µ and σ, it has the largest skewness γ of all distributions studied here, as well as the largest differential entropy h. This is not surprising, given that the lognormal can be derived using maximum entropy principles, as discussed in the previous section, and illustrated in Supplementary Fig. 9.

Gamma distribution. The gamma distribution is related to the gamma function Γ(s), described below. The expressions for its mean and variance in Supplementary Table I can be easily inverted, resulting in expressions for α and β that provide the desired moments. Note that we set µ = 1 for the temporal network experiments in this work, meaning that when the shape parameter α = β = 1, coinciding with σ = 1, we recover the exponential distribution. For values α < 1, meaning σ > 1, the qualitative shape of the exponential is maintained, with an increasingly heavy tail. In contrast, when α > 1, meaning σ < 1, f goes to zero in the small-x limit. For large α, meaning small σ, the gamma distribution tends to the Dirac delta function. In Supplementary Fig. 9 we observe that the gamma distribution is the least skewed of the distributions considered here, for comparable µ and σ. Entropy is given using ψ, the digamma function, defined in the following section under the Weibull distribution.

Variants of the gamma function. We describe here the necessary approximations in order to numerically construct the cdf of the gamma distribution. Unfortunately, the cdf as stated here is circular, with no closed form expression available. In this section, we discuss a number of series expansions that are necessary to numerically evaluate γ(α, xβ), the lower incomplete gamma function. The gamma function Γ(s) is defined, and related to its incomplete variants, as

\Gamma(s) = \int_0^\infty t^{s-1} e^{-t}\, dt   (52a)
          = \int_0^x t^{s-1} e^{-t}\, dt + \int_x^\infty t^{s-1} e^{-t}\, dt   (52b)
          = \gamma(s, x) + \Gamma(s, x).   (52c)

Supplementary Table I.
Properties of probability distributions.
Properties of the two-parameter probability distributions used to model interevent times in this work. For each we note the probability density function f_X(x), the cumulative distribution function F_X(x), the mean µ and variance σ², as well as the third raw moment E[X³], skewness γ, and differential entropy h.

f_X(x):
  lognormal†:  \frac{1}{x \tilde{\sigma} \sqrt{2\pi}} \exp\!\left[ -\frac{(\ln x - \tilde{\mu})^2}{2 \tilde{\sigma}^2} \right]
  Weibull‡:  \frac{k}{\lambda} \left( \frac{x}{\lambda} \right)^{k-1} e^{-(x/\lambda)^k}
  gamma:  \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha - 1} e^{-x\beta}

F_X(x):
  lognormal:  \frac{1}{2} + \frac{1}{2} \operatorname{erf}\!\left( \frac{\ln x - \tilde{\mu}}{\sqrt{2}\, \tilde{\sigma}} \right)
  Weibull:  1 - e^{-(x/\lambda)^k}
  gamma:  \gamma(\alpha, x\beta) / \Gamma(\alpha)

µ:
  lognormal†:  \tilde{\mu} = \ln\!\left( \mu^2 / \sqrt{\mu^2 + \sigma^2} \right)
  Weibull:  \lambda \Gamma(1 + 1/k)
  gamma:  \alpha / \beta

σ²:
  lognormal†:  \tilde{\sigma}^2 = \ln\!\left( (\mu^2 + \sigma^2) / \mu^2 \right)
  Weibull:  \lambda^2 \left[ \Gamma(1 + 2/k) - \Gamma^2(1 + 1/k) \right]
  gamma:  \alpha / \beta^2

E[X³]:
  lognormal:  (\mu^2 + \sigma^2)^3 / \mu^3
  Weibull:  \lambda^3 \Gamma(1 + 3/k)
  gamma:  \mu^3 + 3\mu\sigma^2 + 2\sigma^4/\mu

γ:
  lognormal:  3\sigma/\mu + \sigma^3/\mu^3
  Weibull:  omitted‡
  gamma:  2\sigma/\mu

h:
  lognormal:  \tilde{\mu} + \frac{1}{2} \ln(2\pi e \tilde{\sigma}^2)
  Weibull:  \gamma_e (1 - 1/k) + \ln(\lambda/k) + 1
  gamma:  \alpha - \ln\beta + \ln\Gamma(\alpha) + (1 - \alpha)\psi(\alpha)

† The provided mean and variance rows give the parameters of the underlying normal distribution, with mean µ̃ and variance σ̃².
‡ We omit the Weibull skewness since it does not simplify like the other distributions. Further, γ_e is the Euler-Mascheroni constant.

In other words, the upper and lower incomplete gamma functions are defined by partitioning the integral according to (0, ∞) = (0, x] ∪ [x, ∞), giving γ(s, x) and Γ(s, x), respectively. Sampling from the lower incomplete gamma function will be crucial when initialising our temporal network system at steady state. The choice of series approximation for the lower incomplete gamma function depends on the shape parameter α, and in turn the standard deviation σ. For small α, meaning large σ, we use

\gamma(s, x) = e^{-x} x^s \Gamma(s) \sum_{n=0}^{\infty} \frac{x^n}{\Gamma(s + n + 1)}.   (53)

This well known approximation, although efficient for large values of the standard deviation, becomes prohibitively slow for very large values of α, meaning very small values of σ. At this scale, specifically when α > 1, meaning σ < 1, we approximate the lower incomplete gamma function as

\gamma(s, x) = e^{-x} x^s \sum_{n=0}^{\infty} \frac{x^n}{(s)_{n+1}},   (54)

which can be computed recursively using a small number of multiplication and division operations at each step. Here, (s)_{n+1} is the Pochhammer symbol. Unfortunately this approximation fails for large σ, and must only be used in the small-σ regime, which we do purely for efficiency, since the preceding approximation converges everywhere.

Weibull distribution. Closely related to the gamma distribution is the Weibull distribution. As can be seen in Supplementary Table I, the Weibull distribution cannot be easily parameterised by its mean and standard deviation, as was the case for the lognormal and gamma distributions. The tuple (k, λ) providing the desired mean and standard deviation can be found using gradient descent. Note that, like the gamma distribution, when k = λ = 1 we recover the exponential distribution. Like α in the gamma distribution, k controls the shape, with large k tending towards the Dirac delta function, and small k giving an increasingly right-skewed, heavy-tailed form.

Tuning parameters using gradient descent. We could not find a reference addressing the problem of parameterising the Weibull distribution by its mean µ and standard deviation σ. Not finding an existing solution to this problem, we find the values of k and λ giving the desired µ and σ using gradient descent. In the following calculations we use the variance, denoted ν = σ², for simplicity, given the form of the variance in Supplementary Table I. We find k and λ by locating the minimum of the loss surface defined by the function

l(\mu_i, \nu_i) = \left( \frac{\mu_i - \mu}{\mu} \right)^2 + \left( \frac{\nu_i - \nu}{\nu} \right)^2,   (55)

where µ_i = µ_i(k_i, λ_i) and ν_i = ν_i(k_i, λ_i), as per Supplementary Table I, are the values of the mean and variance at the i-th step of the procedure, and are continuous variables here. The values µ and ν are the target, and are considered constant in the following.
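A sketch of this descent, with finite-difference gradients standing in for the closed-form derivatives derived in the remainder of this note, and a capped, backtracked step in (log k, log λ) standing in for the adaptive step size; the step cap, iteration count and double precision are illustrative assumptions:

```python
import math

def weibull_moments(k, lam):
    """Mean and variance of Weibull(k, lambda), as in Supplementary Table I."""
    m = lam * math.gamma(1.0 + 1.0 / k)
    v = lam * lam * (math.gamma(1.0 + 2.0 / k) - math.gamma(1.0 + 1.0 / k) ** 2)
    return m, v

def fit_weibull(mu, nu, steps=2000, h=1e-6):
    """Find (k, lambda) matching a target mean mu and variance nu by
    descending the relative-error loss (55) on logarithmic scales."""
    def loss(lk, ll):
        m, v = weibull_moments(math.exp(lk), math.exp(ll))
        return ((m - mu) / mu) ** 2 + ((v - nu) / nu) ** 2
    lk = ll = 0.0                   # start from the exponential, k = lambda = 1
    cur = loss(lk, ll)
    for _ in range(steps):
        gk = (loss(lk + h, ll) - loss(lk - h, ll)) / (2 * h)
        gl = (loss(lk, ll + h) - loss(lk, ll - h)) / (2 * h)
        norm = math.hypot(gk, gl)
        if norm < 1e-12:            # already at a stationary point
            break
        t = min(0.1, 0.2 / norm)    # cap the displacement in log space
        while loss(lk - t * gk, ll - t * gl) >= cur and t > 1e-14:
            t *= 0.5                # backtrack until the loss decreases
        lk, ll = lk - t * gk, ll - t * gl
        cur = loss(lk, ll)
    return math.exp(lk), math.exp(ll)

# Example: target mean 1 and standard deviation 0.5 (variance 0.25).
k, lam = fit_weibull(1.0, 0.25)
```

As in the procedure described here, the optimisation would be run once per (µ, σ) pair and the resulting tuple stored for reuse.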
Due to the nature of the experiments in the main text, it is crucial to normalise the relative error in each term by µ and ν respectively. This is because ν varies over orders of magnitude, while µ remains fixed. For the same reason, the displacement at each step of the algorithm is determined on logarithmic scales. We perform this optimisation once, and store the resulting (µ, ν, k, λ) tuple in a table for reuse.

The gradient of l(µ_i, ν_i) is

\nabla l(\mu_i, \nu_i) = \partial_k l(\mu_i, \nu_i)\, \hat{k} + \partial_\lambda l(\mu_i, \nu_i)\, \hat{\lambda},   (56)

with k and λ components given by

\partial_k l(\mu_i, \nu_i) = 2 \left( \frac{\mu_i - \mu}{\mu^2} \right) \partial_k \mu_i + 2 \left( \frac{\nu_i - \nu}{\nu^2} \right) \partial_k \nu_i   (57)

and

\partial_\lambda l(\mu_i, \nu_i) = 2 \left( \frac{\mu_i - \mu}{\mu^2} \right) \partial_\lambda \mu_i + 2 \left( \frac{\nu_i - \nu}{\nu^2} \right) \partial_\lambda \nu_i,   (58)

with partial derivatives

\partial_\lambda \mu_i = \Gamma(1 + 1/k),   (59)

\partial_k \mu_i = -\frac{\lambda}{k^2} \Gamma(1 + 1/k)\, \psi(1 + 1/k),   (60)

and

\partial_\lambda \nu_i = 2\lambda \left[ \Gamma(1 + 2/k) - \Gamma^2(1 + 1/k) \right],   (61)

\partial_k \nu_i = -\frac{2\lambda^2}{k^2} \left[ \Gamma(1 + 2/k)\, \psi(1 + 2/k) - \Gamma^2(1 + 1/k)\, \psi(1 + 1/k) \right],   (62)

respectively. Here ψ is the so-called digamma function, which was previously required for the calculation of the entropy of the gamma distribution. It is defined as the logarithmic derivative of the gamma function, or

\psi(z) = \frac{d}{dz} \ln \Gamma(z) = \frac{\Gamma'(z)}{\Gamma(z)}.   (63)

It can be found using a simple series approximation,

\psi(z) = -\gamma_e + \sum_{n=0}^{\infty} \left( \frac{1}{n+1} - \frac{1}{n+z} \right),   (64)

for z ≠ 0, −1, −2, ..., where γ_e here denotes the Euler-Mascheroni constant. Although the series is infinite, in practice it converges quite rapidly, even when applying strict thresholds. Along with an inputted step size, this provides the next point (µ_i, ν_i, k_i, λ_i) in the procedure. Adaptive step size is incorporated in gradient descent, where the step size h_i = l_i if l_i < 0.1, and h_i = 0.1 otherwise, for the updates of k_i and λ_i. Stochastic gradient descent techniques may be incorporated to improve the convergence rate.

Note that a multiple-precision library appears to be necessary even when the desired precision in the loss function l is not beyond the bounds of default machine precision, such as double and long double. As always, this induces a cost in terms of computation time, so we calculate all the tuples (µ, σ, k, λ