Dynamics of cascades on burstiness-controlled temporal networks
Samuel Unicomb, Gerardo Iñiguez, James P. Gleeson, Márton Karsai
Université de Lyon, ENS de Lyon, INRIA, CNRS, UMR 5668, IXXI, 69364 Lyon, France
Department of Network and Data Science, Central European University, A-1100 Vienna, Austria
Department of Computer Science, Aalto University School of Science, FI-00076 Aalto, Finland
IIMAS, Universidad Nacional Autónoma de México, 01000 Ciudad de México, Mexico
MACSI and Insight Centre for Data Analytics, University of Limerick, Limerick V94 T9PX, Ireland
Burstiness, the tendency of interaction events to be heterogeneously distributed in time, is critical to information diffusion in physical and social systems. However, an analytical framework capturing the effect of burstiness on generic dynamics is lacking. We develop a master equation formalism to study cascades on temporal networks with burstiness modelled by renewal processes. Supported by numerical and data-driven simulations, we describe the interplay between heterogeneous temporal interactions and models of threshold-driven and epidemic spreading. We find that increasing interevent time variance can both accelerate and decelerate spreading for threshold models, but can only decelerate epidemic spreading. When accounting for the skewness of different interevent time distributions, spreading times collapse onto a universal curve. Our framework uncovers a deep yet subtle connection between generic diffusion mechanisms and underlying temporal network structures that impacts on a broad class of networked phenomena, from spin interactions to epidemic contagion and language dynamics.
Temporal networks provide a representation of real-world complex systems where interactions between components vary in time [1–3]. Although they were initially modelled as Poisson processes, where independent events are homogeneously distributed in time, real-world network interactions have been found to be heterogeneously distributed and to exhibit temporal correlations [4–6]. In particular, interaction events in real systems concentrate within short periods of intense activity followed by long intervals of inactivity, an effect known as burstiness. Bursty dynamics appear in diverse physical phenomena including earthquakes [7] and solar flares [8], biological processes like neuron firing [9], and even the dynamics of human social interaction [5, 10].

Burstiness in temporal interactions has profound implications for the diffusion of information over temporal networks, as demonstrated in a growing number of works [11–17]. This is true in the case of epidemic processes, often referred to as simple contagion, where the probability of infection of an uninfected node depends linearly on the number of exposures, i.e., temporal interactions with infected neighbours in the network [18]. Epidemic models successfully describe the spread of biological disease [19], and have been shown to critically depend on burstiness and other patterns of temporal interactions [12, 20–23]. Epidemic spreading over temporal networks appears to be slowed due to burstiness in some cases [11, 24–26], while accelerated in others [27]. Threshold mechanisms provide another class of phenomena where bursty temporal networks play a crucial role. Threshold dynamics, also known as complex contagion, are used to model the spread of information where infection requires the reinforced influence of at least a certain fraction of neighbours in the network [28]. Threshold-driven dynamics over static networks have been extensively studied both empirically [29] and theoretically [29–33], but analysis of their behaviour on temporal networks has been limited to a small number of empirical studies [34–37]. Here we propose an analytical framework to systematically describe the relationship between the diffusion of information and bursty temporal interactions, thus providing the theoretical foundation necessary to uncover the role of burstiness in generic diffusion processes, including simple and complex contagion models of physical, biological and social phenomena.

We incorporate the most widely documented features of temporal interactions into a framework of binary-state dynamics and benchmark its behaviour with standard models of threshold-driven and epidemic spreading. Although stochastic bursty interactions are likely emergent phenomena [4, 38], their dynamics are well approximated by renewal processes [39]. Temporal heterogeneity in network interactions can then be characterised by the variability in interevent times τ (the time between consecutive events on a given edge), parameterised by the interevent time distribution ψ(τ), while other features of the temporal network are considered maximally random. Renewal processes represent the simplest model of bursty, non-Markovian dynamics, and a departure from the memoryless assumption implicit in Poisson processes. Nevertheless, we are able to show that such a system can be accurately captured by a master equation formalism, which is essentially memoryless, implying the existence of a purely Markovian system with almost identical behaviour. We show both analytically and numerically that bursty temporal interactions give rise to a percolation transition in the connectivity of the temporal network, separating phases of slow and rapid dynamics for both epidemic and threshold models of information diffusion.

∗ [email protected]
We find that diffusion dynamics are sensitive to the choice of interevent time distribution, particularly in regard to its skewness, and we demonstrate a data collapse across distributions when controlling for this effect.

Temporal network model.
To model a temporal network, we consider an undirected, unweighted static network of N nodes as the underlying structure, which acts as a skeleton on top of which temporal interactions take place. The degree of a node (how many neighbours it has) takes discrete values k = 0, . . . , N − 1, following the degree distribution p_k. Pairwise temporal interactions, or events, occur independently at random on each static edge via a renewal process with interevent time distribution ψ(τ). Time is continuous and events are instantaneous, while consecutive interevent times are uncorrelated. We also assume that the renewal process is stationary (for further details see Methods and Supplementary Note 4). By using a static underlying network, we assume the time scales of edge formation and node addition or removal are far longer and thus negligible relative to the time scale of event dynamics over existing edges.

In its simplest form, information diffusion is a binary-state process where each node occupies one of two mutually exclusive states, which we term uninfected and infected. The probability of a node changing state is a function of the state of its neighbours, as well as the strength of their interactions. Interaction strength, also referred to as mutual influence, is a non-negative scalar that we consider to be a function of the elapsed renewal process time series. We desire that the mean of the emergent distribution of interaction strengths be stationary, and invariant to the underlying burstiness of the system. This is achieved when the contribution of a single event to interaction strength (i) goes to zero as the event ages, and (ii) is additive, meaning a spike in edge activity leads to a spike in the interaction strength between neighbours. Under these assumptions, the simplest such coupling is a step function, i.e., the contribution of an event to interaction strength is constant for a duration η, after which it goes to zero.
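A minimal sketch of these edge dynamics, assuming a unit-mean Weibull ψ(τ) (as used later in the text); function names are our own, and for simplicity the process starts at t = 0 rather than from the stationary residual-time distribution, a difference that vanishes far from the origin:

```python
import math
import random

def renewal_events(sample_tau, t_max, rng):
    """Event times of a renewal process on (0, t_max]: each interevent
    time tau is an independent draw from psi(tau)."""
    events, t = [], 0.0
    while True:
        t += sample_tau(rng)
        if t > t_max:
            return events
        events.append(t)

def window_count(events, t, eta):
    """Number of events in the trailing window (t - eta, t]; with a
    step-function coupling, exactly these events contribute influence."""
    return sum(1 for s in events if t - eta < s <= t)

rng = random.Random(42)
shape = 0.7                                  # shape < 1 gives bursty event trains
scale = 1.0 / math.gamma(1.0 + 1.0 / shape)  # rescale the Weibull to unit mean
sample_tau = lambda r: r.weibullvariate(scale, shape)

events = renewal_events(sample_tau, t_max=1000.0, rng=rng)
state = window_count(events, t=500.0, eta=1.0)  # events still "alive" at t = 500
```

With unit-mean interevent times, roughly t_max events accumulate on the edge, and `state` counts those falling inside the memory window η at the probe time.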
As such we define the interaction strength w_j of an edge at time t (or state j for short) as the number j of events having occurred in the preceding time window of width η.

It follows that the local configuration of a node is determined by the number k_j of its neighbours connected via edges in state j, with the degree k of the node related to its k_j values by k = Σ_j k_j at any time t. We introduce m_j as the number of infected neighbours of a node connected via edges in state j. Consequently, 0 ≤ m_j ≤ k_j, with m = Σ_j m_j the total number of infected neighbours. For each node, we store k_j and m_j for all j in vectors k and m, providing a description of edge and node states in the local neighbourhood of a node. Nodes in class (k, m) become infected at a rate F_{k,m}, and are statistically identical in this sense. We also store the interaction strength w_j = j in the vector w for all j. The dynamics of the influence received by a node is thus fully determined by (k, m) and w.

Models of information diffusion. To examine the effect of temporal interactions on information diffusion,
we explore three widely known models of transmission. We consider both relative (RT) and absolute (AT) variants of a threshold mechanism [28, 30, 40], as well as the Susceptible-Infected (SI) model of epidemic spreading [41] (see Table I for details). All models are non-recovery, meaning the uninfected state cannot be re-entered, and we consider infection due to external noise at a low, but nonzero rate p.

The study of threshold dynamics focuses on the conditions leading to cascades, or large avalanches of infections that sweep through the network. In the simplest implementation of threshold dynamics, infection occurs when the number m of infected neighbours of an uninfected node exceeds a fraction φ of its degree k [28, 30]. Generalising this rule to the case of arbitrary interaction strength [32, 42], in the RT model infection occurs when the influence of infected neighbours, m · w, exceeds a fraction φ of all potential influence, k · w. The RT model captures instances of real-world diffusion where interactions between elements affect the probability of infection only in aggregate, similar to the response of individuals to new behavioural patterns or transmission in biological neural networks [43, 44]. When considering the RT model over temporal networks, the probability of infection may increase during bursts of interaction events with infected neighbours or, conversely, bursts of activity with uninfected neighbours may temporarily maintain a node in the uninfected state.

TABLE I. Transmission rate F_{k,m} for nodes in configuration (k, m) with interaction strength w and infection rate p due to external noise. In complex contagion models with relative (RT) and absolute (AT) thresholds, infection is regulated by parameters φ and M_φ, respectively. In the Susceptible-Infected (SI) model, infection is determined by the rate λ.

  RT:  F_{k,m} = 1 if m · w ≥ φ k · w,  p otherwise
  AT:  F_{k,m} = 1 if m · w ≥ M_φ,      p otherwise
  SI:  F_{k,m} = max(p, m · λ)
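The transmission rules of Table I can be written down directly. A minimal sketch (function and variable names are our own; edge states are truncated at j = 2 purely for illustration):

```python
def influence(x, w):
    """Scalar product m.w (or k.w): sum over edge states j of x_j * w_j."""
    return sum(x_j * w_j for x_j, w_j in zip(x, w))

def rate_rt(k, m, w, phi, p):
    """Relative threshold: rate 1 once infected influence m.w reaches a
    fraction phi of all potential influence k.w; otherwise noise rate p."""
    return 1.0 if influence(m, w) >= phi * influence(k, w) else p

def rate_at(m, w, m_phi, p):
    """Absolute threshold: m.w is compared to the absolute value M_phi."""
    return 1.0 if influence(m, w) >= m_phi else p

def rate_si(m, w, lam, p):
    """SI: with lambda_j = lam * w_j, the total infection rate is m.lambda,
    never below the noise rate p."""
    return max(p, lam * influence(m, w))

w = [0, 1, 2]   # interaction strengths w_j = j for states j = 0, 1, 2
k = [2, 2, 0]   # a degree-4 node: two neighbours on state-0 edges, two on state-1
m = [0, 1, 0]   # one infected neighbour, reached through a state-1 edge

r_rt = rate_rt(k, m, w, phi=0.15, p=2e-4)   # 1 >= 0.15 * 2, so rate 1.0
r_at = rate_at(m, w, m_phi=2, p=2e-4)       # 1 < 2, so noise rate p only
r_si = rate_si(m, w, lam=0.02, p=2e-4)      # max(2e-4, 0.02 * 1) = 0.02
```

Note that the same neighbourhood infects at rate 1 under RT but only at the noise rate under AT, illustrating how the two thresholds respond differently to identical local configurations.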
In the AT model, influence from infected neighbours is not normalised, but compared to some absolute value M_φ [40]. In contrast to the RT model, infection is not hindered by interaction activity with uninfected neighbours, and bursts can only increase the probability of infection. In the SI model, finally, each interaction event with an infected neighbour triggers infection at a rate λ. In our framework of temporal networks, infected neighbours trigger infection via edges in state j at a rate λj. Writing λ = λw, the infection rate for a node with a neighbourhood of infected nodes described by m is m · λ. Similar to the AT model, bursts can only increase the probability of infection in the SI model.

Binary dynamics over temporal networks.
We extend a master equation formalism [32, 45] to account for network temporality. We introduce the state space of all configurations (k, m) allowed by the underlying degree distribution p(k), under the condition that each edge is in one of a finite number of possible edge states (see Methods, and Supplementary Note 1 for lattice diagrams of this space). We introduce the state vector s(t) containing the probability that a randomly selected node with underlying degree k is uninfected and in class (k, m) at time t. The time evolution of s is governed by the matrix W(s, t), containing the transition rate W_ij from the i-th to the j-th configuration (k, m) at time t. Transitions arise from three mechanisms. First, ego transitions, contained in the matrix W_ego, describe the loss to configuration (k, m) due to its nodes becoming infected. This occurs at a rate F_{k,m}, as per Table I, so the diagonal terms of W_ego are given by −F_{k,m} and off-diagonals are zero. Second, neighbour transitions, contained in matrix W_neigh, describe the gain or loss to configuration (k, m) due to the infection of neighbours of nodes in this class. This transition is determined by β_j dt, the probability of an uninfected neighbour in configuration j becoming infected over an interval dt (see Methods for an explicit calculation). Taken together, W_ego and W_neigh accurately describe diffusion dynamics over static and heterogeneously distributed edges, such as weighted and multiplex networks [32, 33].

Temporal networks require a third component, edge transitions, contained in the matrix W_edge, describing the gain or loss to configuration (k, m) due to changes in an edge's state j. This applies to any temporal network model that can be formulated in terms of discrete, dynamic edge states. We denote by µ_j dt and ν_j dt the probabilities that a randomly selected edge in state j undergoes a positive or negative transition and enters state j + 1 or j − 1, respectively, over an interval dt. Combining these terms gives the master equation

  d/dt s = (W_ego + W_neigh + W_edge) s = W(s, t) s.   (1)

Modelling temporal network dynamics amounts to solving Eq. (1), which, along with the initial condition s(0), determines the evolution of the system.

To apply this formalism we derive the edge transition rates µ_j and ν_j in the case of renewal processes. We first note that microscopically, on the scale of a single edge, transitions from state j to j ± 1 are non-Markovian, with an explicit time dependence at this scale. A renewal process may then seem at odds with a Markovian master equation [where s(t + dt) depends only on s(t), as per Eq. (1)]. Macroscopically however, on the scale of large ensembles of edges, the renewal process exhibits effective j-dependent rates that are constant in time. We can calculate the probability E_j that a randomly selected edge is in state j, and the probability that it transitions to state j ± 1 over an interval dt, giving µ_j and ν_j [see Methods for explicit expressions for j > 0, with the j = 0 case of E_j and µ_j comprising a special case that we define in Eqs. (2) and (3) below].

Since the rates µ_j and ν_j are heterogeneous in terms of j, they can be viewed as a signature of the model parameters ψ(τ) and η, and of the non-Markovianity inherent at the scale of a single edge. On a macroscopic scale, µ_j, ν_j, and E_j are constant in time, meaning our system is indistinguishable from a continuous-time Markov chain model of edge state. That is, a random walk on the non-negative integers, with transition rates given by µ_j and ν_j, and a stationary distribution of walkers given by E_j [see Supplementary Note 2 for an illustration of µ_j and ν_j in the case of a gamma distribution ψ(τ)]. Applying the system-wide rates µ_j and ν_j at the finer-grained level of configurations (k, m) amounts to a mean field approximation. Monte Carlo simulations (see Supplementary Fig. 8) demonstrate that the actual edge transition rates deviate slightly from µ_j and ν_j for each configuration (k, m), even if they are exact for the network as a whole, in the limit of large N. The accuracy of the master equation solution provides a measure of the remarkable similarity between a renewal process, where Eq. (1) is an approximation, and the biased random walk interpretation of edge state, where Eq. (1) is exact.

FIG. 1. Normalised density of noise-induced infections ρ_f as a function of interevent time standard deviation σ_τ and memory η. Normalised diffusion time t_f produces an almost identical effect [see (a), inset]. The small σ_τ limit, leading to regular patterns in τ, comprises the quenched limit in (a) where η = 1 and the network is effectively static. The large σ_τ limit produces large bursts in activity, comprising the annealed regime where the network is effectively sparsified and plays no role in information diffusion (ρ_f = 1). Mirroring results are achieved by varying memory η for fixed σ_τ in generated (b) and empirical temporal networks: Email (c) and Forum (d). Analytic solution is denoted by dashed lines, and Monte Carlo results by solid lines. Generated networks have lognormal degree distribution with mean ⟨k⟩ = 7 and standard deviation σ_k = 2. We use Weibull-distributed interevent times with mean ⟨τ⟩ = 1. Plot (b) uses σ_τ = 1. For empirical data description see Methods. Node dynamics correspond to the RT model with threshold φ = 0.15 and external noise p = 2 × 10⁻⁴. Cutoff density is ρ_c = 0.4. Monte Carlo simulations are averaged over 10 realisations. Network size is 10 in (a), and 5 × 10 in (b).

Burstiness and information diffusion.
We validate our analytical framework with Monte Carlo simulations of diffusion dynamics over temporal networks. Simulations use an underlying static, configuration-model network with lognormal degree distribution of mean ⟨k⟩ and standard deviation σ_k. We measure the time t_c required to reach an arbitrary density ρ_c of infected nodes, in the presence of background noise at rate p. We also measure ρ_f, the relative frequency of infections due to external noise, such that 0 < ρ_f ≤ 1, with 1/ρ_f the ratio of all to noise-induced infections, measuring the catalytic effect of external noise (for a detailed description of ρ_f see Methods). We normalise t_c by the time taken to reach the desired density by noise only, providing t_f, such that 0 < t_f ≤ 1. Remarkably, ρ_f and t_f are almost equivalent, with a value of ρ_f = t_f = 1 indicating slow diffusion with complete reliance on external noise, and small ρ_f and t_f representing rapid diffusion with external noise producing a substantial catalytic effect. Together, they measure the extent to which the temporal network, rather than external noise, drives the diffusion of information.

FIG. 2. Monte Carlo simulation of the normalised diffusion time t_f as a function of interevent time standard deviation σ_τ and memory η. Diffusion dynamics correspond to the RT model with φ = 0.15 (a), the AT model with M_φ = 2 (b), and the SI model with λ = 0.02 (c). Red dashed lines indicate the σ_τ value producing the minimum diffusion time t_f, for a given η. This demonstrates, for example at large η in (a), that small values of burstiness may be accelerative with respect to the Poisson case. This is evidenced by the minimum extending beyond σ_τ = 1. Oscillations in the line at intermediate η relate to the density of j = 0 edges [identical but not shown in (a) and (b)]. Degree distribution is lognormal with ⟨k⟩ = 7 and σ_k = 0.5. Results correspond to a single realisation of each (σ_τ, η) value in networks of size N = 10. The interevent time distribution ψ(τ) is Weibull with mean ⟨τ⟩ = 1 (see Supplementary Note 6 for corresponding ρ_f values).

We first examine the effect of varying interevent time standard deviation σ_τ for fixed memory η = ⟨τ⟩ = 1 [Fig. 1(a)]. We choose a Weibull interevent time distribution ψ(τ), used widely to model behavioural bursts in both human [46] and animal [47] dynamics. A Weibull distribution reduces to the exponential distribution for σ_τ = ⟨τ⟩ = 1. Node dynamics follow the RT model for threshold φ = 0.15 and background noise p = 2 × 10⁻⁴. Approaching the small σ_τ limit from above, events arrive in an increasingly regular pattern, and an increasing fraction of edges are frozen in the mean state η/⟨τ⟩ = 1. We refer to this as the quenched regime, whereby edges converge to a single state and the network is effectively static. In the opposing limit of large σ_τ, burstiness means that at any given time, edge activity is concentrated among an arbitrarily small fraction of edges that undergo large spikes in activity, with the remainder in state j = 0. We refer to this as the annealed regime, where the network is maximally sparse and has a vanishingly small role in information diffusion (ρ_f and t_f approach one).

Both quenched and annealed regimes lead to slow, noise-reliant diffusion, where the expected edge state η/⟨τ⟩ is preserved [Fig. 1(a)]. For intermediate values of σ_τ there is a well-mixed regime where relatively rapid diffusion is due to edge state fluctuations that are ultimately favourable to transmission. In the RT model this implies a spike of activity on an infected neighbour overcoming a node's threshold, or decreased activity on uninfected edges lowering the relative influence to be overcome. The decelerative effect of quenching is increased for narrower underlying degree distributions, since an increasing fraction of nodes are frozen in a state unfavourable to transmission, a static network effect already reported in [30].

A mirroring effect can be obtained by varying memory η for constant σ_τ = ⟨τ⟩ = 1 [Fig. 1(b)]. The quenched limit is recovered for large η, as large samples of events on each edge result in edges converging to a mean state, η/⟨τ⟩, with an increasingly narrow distribution, due to the central limit theorem. As for the case of fixed η, quenching may be decelerative if cascades on the corresponding static network are noise dependent.
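The quenched-to-annealed crossover described above can be probed numerically. The sketch below (our own construction, standard library only) estimates the stationary fraction of edges frozen in state j = 0, i.e. the probability of observing no event in a window of width η = 1, for unit-mean Weibull interevent times of increasing σ_τ:

```python
import bisect
import math
import random

def weibull_sigma(a):
    """Standard deviation of a Weibull with shape a, rescaled to unit mean."""
    g1, g2 = math.gamma(1 + 1 / a), math.gamma(1 + 2 / a)
    return math.sqrt(g2 / g1**2 - 1)

def frac_state_zero(a, eta=1.0, t_max=20000.0, n_probe=4000, seed=1):
    """Monte Carlo estimate of the probability that a window of width eta,
    placed at a random time, contains no event of the renewal process."""
    rng = random.Random(seed)
    scale = 1.0 / math.gamma(1 + 1 / a)   # unit-mean Weibull
    events, t = [], 0.0
    while t < t_max:
        t += rng.weibullvariate(scale, a)
        events.append(t)
    hits = 0
    for _ in range(n_probe):
        u = rng.uniform(0.1 * t_max, 0.9 * t_max)  # avoid boundary effects
        # no event in (u - eta, u] iff both bisections agree
        if bisect.bisect_right(events, u) == bisect.bisect_right(events, u - eta):
            hits += 1
    return hits / n_probe

for a in (3.0, 1.0, 0.4):   # regular -> Poisson -> bursty
    print(round(weibull_sigma(a), 2), round(frac_state_zero(a), 2))
```

For the exponential case (shape a = 1, σ_τ = 1) the estimate sits near e⁻¹ ≈ 0.37, and the frozen fraction grows toward one as σ_τ increases, mirroring the effective sparsification of the annealed regime.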
For example, increasing φ can cause slower diffusion in the quenched limit [Fig. 1(b)]. The annealed (noise-driven) regime is effectively recovered when η is vanishingly small, meaning almost all edges are in state j = 0 and the role of the network in information diffusion vanishes (ρ_f = t_f = 1). The correspondence between σ_τ and η suggests data-driven experiments that allow an indirect inference of the effects of varying σ_τ in real systems, an open problem in the study of information diffusion. We simulate the RT model on two empirical temporal networks and vary only the memory η, recovering qualitatively the effects observed on synthetic networks [Fig. 1(c-d), see Methods for data description]. This suggests the accelerative and decelerative effects of burstiness may well be a feature of real-world information diffusion.

We systematically explore the RT, AT, and SI models with Monte Carlo simulations over (σ_τ, η)-space (Fig. 2). The underlying degree distribution is lognormal with ⟨k⟩ = 7 and σ_k = 0.5, and interevent times are Weibull-distributed with ⟨τ⟩ = 1. Our aim is to understand how the temporal connectivity evolves over (σ_τ, η)-space. As previously observed, the quenched regime appears either in the small σ_τ limit for constant (but sufficiently large) η, or in the large η limit for constant σ_τ. The temporal network enters the annealed regime in two ways, either by taking the small η limit for constant σ_τ, or the large σ_τ limit for constant η. The two regimes are separated by a percolation transition, i.e., the emergence of a giant connected component in the subgraph formed by edges in state j > 0. The transition is governed by the quantities

  ξ_E = ∫_η^∞ Ψ(τ) dτ   (2)

and

  ξ_µ = ∫_η^∞ ψ(τ) dτ / ∫_η^∞ Ψ(τ) dτ,   (3)

where ξ_E equals E_0, the density of edges in state zero, and ξ_µ equals µ_0, the probability that a randomly selected edge in state zero enters state one over an interval dt. We may refer to ξ_E as the effective sparsification, or alternatively, the effective annealing. Here Ψ(τ) is the complementary cumulative distribution relating to ψ(τ). We denote by q(k) the degree distribution obtained by randomly removing a fraction ξ_E of edges in a static configuration model network with degree distribution p(k), which is identical to the expected subgraph formed by removing state zero edges in the stochastic temporal network. The percolation transition for q(k) can be computed analytically (see Supplementary Note 3), and despite the static assumption, provides an excellent estimate of the boundary between quenched and annealed regimes (Fig. 2), indicating the onset of slow, noise-dependent diffusion for all diffusion dynamics considered.

FIG. 3. Relative density of noise-driven infections ρ_f as a function of interevent time standard deviation σ_τ for the AT (a-c) and the SI (d-f) models. Analytic solution is denoted by dashed lines, and Monte Carlo results by solid lines. (a) Effect of absolute threshold M_φ. (b) Dependence of ρ_f on choice of ψ(τ). (c) Data collapse of (b) after controlling for effective sparsity ξ_E. Left inset is a closeup of the main plot in linear scale, revealing differences in ρ_f remain after controlling for ξ_E. This is explained by the differing mixing rates ξ_µ (inset right). (d) Effect of the rate λ. (e-f) Similar results for the SI model. Cutoff node density is ρ_c = 0.4, and η = ⟨τ⟩ = 1. Degree distribution is lognormal with ⟨k⟩ = 7 and σ_k = 1. We use a threshold M_φ = 2 for the AT model, and external noise p = 2 × 10⁻⁴. Monte Carlo simulations involve 10 realisations on networks of size N = 10.

Even if the outcome of information diffusion (as measured by t_f) is qualitatively similar across diffusion models with respect to the features of the temporal network (σ_τ and η), we can identify differences due to node dynamics by measuring the values of σ_τ that produce a minimum diffusion time for given η (see dashed lines in Fig. 2). In the RT model, both quenched and annealed regimes produce relatively slow diffusion. Between the two regimes, the minimum diffusion time shifts to larger σ_τ for increasing memory η, and eventually exceeds σ_τ = 1, meaning burstiness is accelerative. As η increases, larger and larger fluctuations in τ are required to exit the quenched regime and enter the mixing phase where rapid diffusion occurs, a consequence of the central limit theorem [Fig. 2(a)]. Increasing σ_τ also produces an accelerative effect in the AT model [Fig. 2(b)]. In contrast to the RT model, burstiness is accelerative only for small values of memory η. Since in the AT model we do not normalise infectious influence by total influence, increasing η is always favourable to transmission, and quenching never slows down diffusion [see Fig. 3(a)]. In the SI model burstiness does not have an accelerative effect [Fig. 2(c) and Fig. 3(d)], since its infection rate is unbounded (as opposed to threshold models, as per Table I). The annealing effect of burstiness overwhelms the increased rate of local diffusion afforded by unbounded transmission rates, which increases proportionally to the size of the burst. As such, acceleration due to burstiness appears to be a hallmark of threshold mechanisms, whether relative or absolute.

Finally, we examine the effect of the choice of interevent time distribution ψ(τ) (Fig. 3). We measure the noise dependence ρ_f for lognormal, Weibull and gamma distributions, controlling for both ⟨τ⟩ and σ_τ. Consider
Considerfirst the AT model with M φ = 2 and η = (cid:104) τ (cid:105) = 1[Fig. 3(b)]. Here, we observe a striking dependence on ψ ( τ ), with the lognormal distribution leading to the mostrapid diffusion, outpacing the gamma distribution in dif-fusion speed and relative noise dependence by up to afactor of 83, and the Weibull distribution by up to afactor of 14. These differences can be accounted for bycomparing the rate of onset of annealing in terms of ξ E as we increase σ τ . The gamma distribution rapidly an-neals the network, yielding the largest ξ E values of allchoices of distribution, meaning the most edges in state j = 0. As a result, it exhibits the slowest, most noisereliant diffusion. In terms of the value of ξ E induced, thegamma is followed by the Weibull distribution, then thelognormal distribution. In fact, the lognormal requiresorder-of-magnitude larger σ τ to produce equal values of ξ E as the Weibull and gamma distributions. By plot-ting ρ f against ξ E we observe the data to collapse ap-proximately onto a single curve, revealing ξ E to be a farbetter predictor of dynamics than σ τ [see Fig. 3(c) in con-trast to Fig. 3(b)]. Some disagreement persists, however[Fig. 3(c), left inset], which can be explained by notingthat increased rates of mixing ξ µ [Fig. 3(c), right inset]ensure that the small number of active edges redistributeabout the network at a greater rate, thus mediating cas-cades more effectively. An identical effect is observed forthe SI model [Fig. 3(e-f)].The data collapse in Fig. 3(c) and (f) confirms thatabove all it is ξ E , the density of edges in state j = 0,that ultimately determines the diffusion dynamics in ourframework. It remains to determine why the value of ξ E is so sensitive to the choice of interevent time distribu-tion ψ , and in particular, what the properties are of agiven distribution ψ that most contribute to the value of ξ E , beyond its mean and standard deviation. 
We havefound two properties that correlate with our observationsin Fig. 3(b) and (e), at least qualitatively, as shown inSupplementary Fig. 9. They are the third raw moment, (cid:104) τ (cid:105) , which is closely related to skewness, and differen-tial entropy (as defined in Supplementary Note 8). Thesemeasures provide a rule of thumb such that, for instance,given two distributions ψ with equal mean and variance,it is the one with the greater skewness that produces thelowest ξ E , and the most rapid diffusion. Discussion . Our study shows that generic dynamicsof information diffusion are closely tied to the level of burstiness in the underlying temporal network. By con-sidering three binary-state models of transmission, wehave demonstrated that they differ in their response toburstiness only in their details. For instance, while hav-ing a purely decelerative effect on SI models, increasingburstiness at intermediate values can be accelerative forthreshold models. Nevertheless, the prevailing trend isthat increasing burstiness is strongly decelerative overall,with the onset of the decelerative phase heavily depen-dent on the choice of interevent time distribution. Thekey assumptions here are that the underlying network isfixed, and that due to a memory mechanism, a fractionof edges enter a non-interacting state due to long waitingtimes. These assumptions result in a temporal networktopology that has profound implications for many dy-namical processes. It is likely that structural featuresof the temporal network, such as the percolation transi-tion separating slow and fast diffusion, and the data col-lapse observed when controlling for the effective sparsity,will also be critical for the more general class of binary-state dynamics, including not only threshold models andmodels of disease, but language, voter, and Ising models,among others.Our master equation formalism can be extended to abroad class of temporal network models. 
In particular, any model that can be formulated in terms of discrete, dynamic edge states is a candidate for our approach. This includes growing, decaying and adaptive networks, as well as models of rewiring. In line with our use of renewal processes, a large family of point processes have natural descriptions in terms of discrete edge states, such as cascading Poisson and Cox processes. Extensions of the Poisson process in general suggest promising applications of our approach. In particular, our treatment of non-Markovianity could be applied to other systems. That is, while a single component in a large system may be strongly non-Markovian, as was the case for our renewal process, stationary statistics may emerge at the ensemble level that act as a signature of the non-Markovianity occurring microscopically. Our biased random walk interpretation of the renewal process model shows that strikingly similar Markovian counterparts may be available for analysis. Incidentally, biased random walk models of edge state suggest a broad class of Markovian models to which our master equation applies exactly. These may be extended, for example, to Lévy flights, and used as a probe of various complex systems where memory is critical.
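As a rough numerical illustration of the skewness rule of thumb above, the three interevent time distributions can be matched in mean and standard deviation and their third raw moments ⟨τ³⟩ compared directly. The sketch below is not taken from the paper's code; the example values ⟨τ⟩ = 1 and σ_τ = 2, and the helper names, are chosen purely for illustration. With these values the lognormal indeed has the largest ⟨τ³⟩ (i.e., is the most skewed), followed by the Weibull, then the gamma, matching the ordering of diffusion speeds reported above.

```python
import math

def gamma_m3(mean, sd):
    """Third raw moment of a gamma distribution matched to (mean, sd)."""
    a = (mean / sd) ** 2           # shape
    theta = sd ** 2 / mean         # scale
    return theta ** 3 * a * (a + 1) * (a + 2)

def lognormal_m3(mean, sd):
    """Third raw moment of a lognormal distribution matched to (mean, sd)."""
    s2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - s2 / 2.0
    return math.exp(3.0 * mu + 4.5 * s2)

def weibull_m3(mean, sd):
    """Third raw moment of a Weibull distribution matched to (mean, sd).
    The shape k is found by bisection on the squared coefficient of
    variation, which is decreasing in k."""
    target = (sd / mean) ** 2
    cv2 = lambda k: math.gamma(1 + 2 / k) / math.gamma(1 + 1 / k) ** 2 - 1
    lo, hi = 0.05, 20.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if cv2(mid) > target else (lo, mid)
    k = 0.5 * (lo + hi)
    lam = mean / math.gamma(1 + 1 / k)  # scale
    return lam ** 3 * math.gamma(1 + 3 / k)

# Match mean = 1 and standard deviation = 2 (illustrative values).
for name, f in [("gamma", gamma_m3), ("weibull", weibull_m3),
                ("lognormal", lognormal_m3)]:
    print(name, f(1.0, 2.0))
```

For these parameters the closed forms give ⟨τ³⟩ = 45 for the gamma and ⟨τ³⟩ = 125 for the lognormal, with the Weibull in between.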
Acknowledgements
S.U. acknowledges the Pôle Scientifique de Modélisation Numérique of ENS Lyon for their computing support, as well as L. Taulelle and V. Lefèvre for useful technical advice. G.I. acknowledges partial funding by the European Commission through the H2020 project HumanE AI under G.A. No. 761758. J.P.G. is supported by Science Foundation Ireland (grant numbers 16/IA/4470, 16/RC/3918, 12/RC/2289 P2 and 18/CRT/6049) with co-funding from the European Regional Development Fund. M.K. is supported by the DataRedux (ANR-19-CE46-0008) and SoSweet (ANR-15-CE38-0011) projects funded by ANR, and the SoBigData++ (H2020-871042) project.
Author contributions
S.U. derived the analytical solution, and performed all related calculations and numerical experiments. S.U., G.I., J.P.G., and M.K. designed the research and contributed to the manuscript.
Competing interests
The authors declare no competing interests.
Methods
Master equation configuration space. We provide here a minimal description of the master equation formalism, with a focus on class transition rates; a complete description is provided in Supplementary Note 1. We introduce C_{k,m}, the set of all nodes in the network with local configuration (k, m), such that 0 ≤ m ≤ k. Whereas C_{k,m} is a set of nodes, we define C_k as the set of all sets C_{k,m} with total degree k. This can be written C_k = {C_{k,m} | Σ_j k_j = k}. Then, we refer to the configuration space C as the set of all possible sets C_{k,m}. Given a degree distribution p_k, we define

C = {(k, m) | k ∈ supp p_k and 0 ≤ m ≤ k},    (4)

which partitions the network at any given time. Written this way, C is potentially infinite. To ensure that it be finite in numerical constructions, we assume an upper cutoff in the degree distribution p_k, and the set of edge states to be of a finite size n. Note that C includes any set for which C_{k,m} is empty at a given time. The cardinality |C| of configuration space is thus determined entirely by the support of p_k, along with n. Since (k, m) does not convey ego state, just edge and neighbour configuration, we partition C_{k,m} into sets of uninfected and infected nodes, such that C_{k,m} = S_{k,m} ∪ I_{k,m}. Similar definitions allow us to introduce S_k and I_k, for both the degree vector k and the total degree k, as well as S and I. Although in general |S_{k,m}| ≠ |I_{k,m}|, the structure of the uninfected and infected configuration spaces is identical, such that |C| = |S| = |I|, both for the full spaces and for the subsets of fixed degree vector k or total degree k.

The evolution of a dynamical process over a network amounts to a flow of nodes through the sets S_{k,m} and I_{k,m} over time. Since the number of nodes N in the network is conserved, it is their distribution over the sets S_{k,m} and I_{k,m} that evolves in time. These distributions provide the state of the network at time t.
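The configuration space defined above can be enumerated explicitly. The following is a minimal sketch (hypothetical helper name, not the authors' implementation), assuming a small degree support and n = 2 edge states; it also builds the lexicographic index used to order the tuples of C.

```python
from itertools import product

def configuration_space(degrees, n):
    """Enumerate the configuration space C for edge-state dimension n:
    all tuples (k, m) with sum(k) in `degrees` and 0 <= m_j <= k_j,
    indexed lexicographically."""
    C = []
    for k_total in degrees:
        # all degree vectors k with n non-negative entries summing to k_total
        for k in product(range(k_total + 1), repeat=n):
            if sum(k) != k_total:
                continue
            # all infected-neighbour vectors m with 0 <= m_j <= k_j
            for m in product(*(range(kj + 1) for kj in k)):
                C.append((k, m))
    C.sort()  # lexicographic ordering defines the mapping (k, m) -> i
    return {km: i for i, km in enumerate(C)}

# A node of total degree k = 2 with n = 2 edge states admits |C| = 10
# configurations, as quoted for the lattice in Supplementary Fig. 2.
index = configuration_space([2], 2)
print(len(index))  # 10
```

The same count for k = 4 and n = 2 gives |C| = 35, consistent with the rapid polynomial growth of the space discussed above.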
Since our formalism is independent of network size, we deal with densities of nodes rather than the absolute sizes of these sets. To this end we introduce

‖C_k‖ ≡ Σ_{C_{k,m} ∈ C_k} |C_{k,m}|    (5)

as shorthand for the number of nodes with underlying degree k. This is in contrast to |C_k| and |C_k|, which give the number of configurations with degree vector k and total degree k, respectively. To convert from absolute node counts to densities of nodes, we need to normalise S_{k,m} and I_{k,m} by some non-zero quantity that is conserved over the course of a dynamical process. Since our temporal network models assume a static underlying network, a node's underlying degree k is preserved, and as a result, so is ‖C_k‖, defined in Eq. (5). The density of uninfected nodes in class (k, m) in this case is given by

s_{k,m} = |S_{k,m}| / ‖C_k‖,    (6)

with i_{k,m} defined analogously. The node conservation principle leads to the condition Σ_{C_k} (s_{k,m} + i_{k,m}) = 1, which is to say that the sum of all densities s_{k,m} and i_{k,m} with underlying degree k is one. We then have

ρ_k = 1 − Σ_{C_k} s_{k,m}    (7)

and

ρ = Σ_k p_k ρ_k,    (8)

where the sum in the first expression is over all configurations (k, m) that satisfy Σ_j k_j = k. The term ρ_k gives the probability that a randomly selected node with underlying degree k will be infected, and ρ the probability that any randomly selected node will be infected.

As discussed in the main text, s is the |C|-dimensional vector storing the densities s_{k,m}. In practice, we use lexicographic ordering of the tuples in C to define a one-to-one mapping (k, m) ↦ i, for some i ∈ {1, . . . , |C|}, defining the i-th element s_i of s. Finally, it is possible to show that for fixed n and limiting k, the size of C behaves like Θ(k^{2n}). Now that we have defined the space of allowed configurations, we turn to its dynamics.

Master equation transition rates.
Ego transitions occur at rates F_{k,m}, and involve the flow of nodes from set S_{k,m} to I_{k,m}. As such, no change to the ego's local neighbourhood (k, m) takes place, and the transition represents a type of self-edge, or loop, in the lattice representation of configuration space, illustrated in Supplementary Note 1. The rates F_{k,m} are encoded in transmission functions such as those shown in Table I. Flux measurements of these transitions, such as those in Supplementary Note 7, are expected to be exact, and are an important benchmark for the verification of experiments. The rates F_{k,m} are contained in the matrix W_ego.

Neighbour transitions are based on the probability β_j dt that an uninfected neighbour of an uninfected node becomes infected over an interval dt. To calculate β_j we use a straightforward ensemble average over S. To obtain the expected fraction of neighbours undergoing transitions, we observe the number of nodes undergoing ego transitions at time t, and count the number of neighbour transitions produced as a result. That is, when an uninfected node in class (k, m) becomes infected, which occurs with probability F_{k,m} dt, it has k_j − m_j uninfected neighbours that observe this transition, or k_j − m_j nodes undergoing neighbour transitions. The number of such edges across the entire network is given by Σ_S p_k (k_j − m_j) F_{k,m} s_{k,m}, where the sum is over all uninfected classes. We compare this to the total number of uninfected-uninfected edges, Σ_S p_k (k_j − m_j) s_{k,m}, giving the neighbour transition rate

β_j dt = [Σ_S p_k (k_j − m_j) F_{k,m} s_{k,m} / Σ_S p_k (k_j − m_j) s_{k,m}] dt,    (9)

which has previously been used in master equation solutions of binary-state dynamics on static networks.
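The ensemble average in Eq. (9) is a weighted ratio over the uninfected classes, which is simple to compute once the class densities are in hand. The sketch below uses a scalar-degree simplification (n = 1, so k and m are integers) and entirely illustrative values of p_k, s_{k,m} and F_{k,m}; it is a toy instance of Eq. (9), not the paper's implementation.

```python
# Hypothetical toy state: three uninfected classes keyed by (k, m),
# the degree and infected-neighbour count, for n = 1 edge state.
p_k = {2: 0.5, 3: 0.5}                       # degree distribution (illustrative)
s = {(2, 0): 0.6, (2, 1): 0.3, (3, 1): 0.7}  # densities s_{k,m} (illustrative)
F = {(2, 0): 0.0, (2, 1): 0.8, (3, 1): 0.2}  # infection rates F_{k,m} (illustrative)

def beta(p_k, s, F):
    """Neighbour transition rate of Eq. (9): the rate at which an uninfected
    neighbour of an uninfected node becomes infected, as a ratio of the flux
    along uninfected-uninfected edges to their total density."""
    num = sum(p_k[k] * (k - m) * F[(k, m)] * s[(k, m)] for (k, m) in s)
    den = sum(p_k[k] * (k - m) * s[(k, m)] for (k, m) in s)
    return num / den

print(beta(p_k, s, F))
```

Because β_j is a convex combination of the rates F_{k,m}, it always lies between the smallest and largest infection rate present among the uninfected classes.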
The rates β_j are contained in the matrix W_neigh, weighted by the values k_j and m_j of the relevant classes (k, m), as detailed in Supplementary Note 1.

Edge transitions occur at rates μ_j and ν_j, which give the probabilities of edges in state j transitioning to state j + 1 or j − 1, respectively, over an interval dt. Their values depend on the temporal network model in question. In this work, edge transition rates are determined by renewal processes following interevent time distributions ψ(τ), with complementary cumulative distribution Ψ. If the state of an edge is determined by the number of events j having occurred in the preceding time window of duration η due to a renewal process, the edge transition rates are

μ_j dt = [(Ψ ∗ ψ^{∗j}) / (Ψ ∗ ψ^{∗(j−1)} ∗ Ψ)] dt    (10)

and

ν_j dt = [(Ψ ∗ ψ^{∗(j−1)}) / (Ψ ∗ ψ^{∗(j−1)} ∗ Ψ)] dt,    (11)

with

E_j = Ψ ∗ ψ^{∗(j−1)} ∗ Ψ    (12)

giving the probability that a randomly selected edge is in state j; each convolution is evaluated at the window length η. It is this quantity that provides the normalising constant for the rates μ_j and ν_j. Here, ψ^{∗j} is the j-th convolution power of ψ. A complete derivation is given in Supplementary Note 1. The Gaver-Stehfest algorithm is used to compute the inverse Laplace transforms, and an efficient numerical procedure reducing μ_j and ν_j to a matrix-vector product is developed in Supplementary Note 5. These expressions hold for j > 0, with Eqs. (2) and (3) in the main text giving the special case j = 0 for E_j and μ_j, respectively. Regardless of the form of ψ, the mean edge state η/⟨τ⟩ is always conserved at the network-wide level. Applying Eqs. (10) and (11) at the level of class transitions amounts to a mean field approximation, since flux measurements of Monte Carlo simulation show edge transition rates to deviate slightly from μ_j and ν_j at the class level (k, m), even if exact for the network as a whole, as shown in Supplementary Note 7.

Simulation. We simulate networks G = (V, E) composed of a node set V of size N, and an underlying edge set E. The edge set is produced by a desired degree distribution, wired according to the configuration model. Overlying temporal network activity is initialised to the steady state, such that at time t = 0, the time to the first event follows exactly the residual distribution Ψ, in the limit of large networks. Specifically, we set the time to t = −η, and draw |E| residual times from Ψ, one for each edge. Subsequent interevent times are drawn from ψ. Advancing in time from −η ensures that a stationary distribution of edge states E_j is achieved exactly at t = 0, when we begin to allow node dynamics to evolve. Due to the large values of interevent time standard deviation studied in this work, out-of-the-box sampling routines were either inefficient or broke down for large σ_τ. As such, we develop a simple yet efficient routine in Supplementary Note 4, based on approximate inverse transform sampling of ψ and Ψ using a bisection method. This is performed on a numerical grid of Ψ values, with the relevant details of the probability distributions outlined in Supplementary Note 9.
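A stripped-down version of such inverse transform sampling can be sketched as follows: rather than the precomputed grid with spline interpolation used in the paper, this toy solves Ψ(τ) = u directly by bisection, and the Weibull form of Ψ is chosen only because its inverse is also available in closed form, so the result can be checked. All parameter values are illustrative.

```python
import math
import random

# Survival function Psi(tau) = P(interevent time > tau) for a Weibull
# distribution with shape K and scale LAM (illustrative values).
K, LAM = 0.5, 1.0
def Psi(tau):
    return math.exp(-(tau / LAM) ** K)

def sample_interevent(u, tol=1e-12):
    """Draw an interevent time by solving Psi(tau) = u with bisection,
    where u is uniform on (0, 1). Psi is decreasing, so the root is unique."""
    lo, hi = 0.0, 1.0
    while Psi(hi) > u:          # grow the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol * max(hi, 1.0):
        mid = 0.5 * (lo + hi)
        if Psi(mid) > u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

random.seed(42)
u = random.random()
tau = sample_interevent(u)
exact = LAM * (-math.log(u)) ** (1.0 / K)  # closed-form inverse for Weibull
assert abs(tau - exact) / exact < 1e-6
```

For heavy-tailed ψ with very large σ_τ, the appeal of this approach is that it only ever evaluates Ψ, which is numerically stable even where naive samplers fail.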
A third-order spline interpolation on a logarithmic scale provides intermediate values of the grid, such that the resulting sampled distribution is close to exact.

Node dynamics are implemented via a Gillespie algorithm, which uses the fact that the waiting time to infection of an uninfected node in class (k, m) follows an exponential distribution with mean 1/F_{k,m}. Initially all nodes are in the uninfected state, and the diffusion process is triggered by low-level background noise at rate p. To simulate the temporal network itself, a time-ordered sequence of edge events is implemented in parallel with the node update sequence. This amounts to two separate time-ordered sequences of events executed simultaneously. The algorithms are described in detail in Supplementary Note 4, with pseudocode.

We use the normalised density of noise-induced infections, ρ_f, and the normalised diffusion time, t_f, as measures of the diffusion process. We define these quantities as follows. The probability that a randomly selected node has been infected as a result of external noise is

ρ̃(t) = p ∫_0^t (1 − ρ(τ)) dτ,    (13)

meaning 0 < ρ̃ ≤ ρ. We define ρ_f as the fraction of infections that are due to noise, ρ_f = ρ̃/ρ, such that 0 < ρ_f ≤ 1. This value cannot equal zero, since there must be at least one noise-induced infection, namely the first infection in the diffusion process. A value approaching ρ_f = 1 means almost all infection is due to external noise. This occurs in the annealed limit, when almost all edges are in state j = 0, and network interactions play a vanishingly small role in the diffusion process. As a consequence, the time evolution of the diffusion process is governed by

ρ̇ = p (1 − ρ),    (14)

whose solution ρ = 1 − e^{−pt} can be inverted to give the time required to achieve a given density ρ of infections relying solely on noise, that is,

t = −ln(1 − ρ)/p.    (15)

If t_c is the time required in the general case to reach a cutoff density of infections ρ_c, normalising t_c by Eq. (15) evaluated at ρ_c defines t_f, such that 0 < t_f ≤ 1. A value of t_f = 1 means the system is driven entirely by noise, and a value approaching 0 a rapid diffusion process. An important feature of this work is that t_f and ρ_f appear to be interchangeable, as per the inset of Fig. 1(a); any result shown in terms of ρ_f produces an identical picture in t_f.

Data description. In this work we use two empirical temporal networks used by [48] and references therein, which we describe below. To simulate diffusion processes on these networks we use periodic boundary conditions, starting at a randomly selected point in time.

The first dataset is a temporal network of email exchange [48, 49], extracted from the log files of a university email server. The sender, recipient and timestamp are used to form the network. The dataset consists of N = 3188 nodes and |E| = 31857 underlying edges, such that the average degree is 19.99. A total of 308730 events were recorded, with a resolution of one second, over a period of 81 days. An average of 9.691 events occur per edge. We determine the interevent time distribution by taking the subset of edges observing more than one event, of which there are 21199. The mean interevent time is then calculated to be ⟨τ⟩ = 3.125 days, with standard deviation σ_τ = 6.620 days. This yields a coefficient of variation σ_τ/⟨τ⟩ = 2.118.

The second dataset [48, 50] consists of N = 7083 nodes and |E| = 138144 underlying edges, such that the average degree is 39.01. A total of 1428493 events were recorded, with a resolution of one second, over a period of 3133 days. An average of 10.34 events occur per edge. We determine the interevent time distribution by taking the subset of edges observing more than one event, of which there are 70902. The mean interevent time is then calculated to be ⟨τ⟩ = 16.60 days, with standard deviation σ_τ = 76.53 days. This yields a coefficient of variation σ_τ/⟨τ⟩ = 4.610.

[1] Holme, P. & Saramäki, J. Temporal networks. Phys. Rep., 97–125 (2012).
[2] Masuda, N. & Lambiotte, R. A Guide to Temporal Networks (World Scientific, 2016).
[3] Holme, P. Modern temporal network theory: a colloquium. Eur. Phys. J. B, 234 (2015).
[4] Barabási, A.-L. The origin of bursts and heavy tails in human dynamics. Nature, 207 (2005).
[5] Goh, K.-I. & Barabási, A.-L. Burstiness and memory in complex systems. EPL, 48002 (2008).
[6] Karsai, M., Kaski, K., Barabási, A.-L. & Kertész, J. Universal features of correlated bursty behaviour. Sci. Rep., 1–7 (2012).
[7] Davidsen, J. & Kwiatek, G. Earthquake interevent time distribution for induced micro-, nano-, and picoseismicity. Phys. Rev. Lett., 068501 (2013).
[8] de Arcangelis, L., Godano, C., Lippiello, E. & Nicodemi, M. Universality in solar flare and earthquake occurrence. Phys. Rev. Lett., 051102 (2006).
[9] Turnbull, L., Dian, E. & Gross, G. The string method of burst identification in neuronal spike trains. J. Neurosci. Methods, 23–35 (2005).
[10] Karsai, M., Jo, H.-H. & Kaski, K. Bursty Human Dynamics (Springer, 2018).
[11] Karsai, M. et al. Small but slow world: how network topology and burstiness slow down spreading. Phys. Rev. E, 025102 (2011).
[12] Lambiotte, R., Tabourier, L. & Delvenne, J.-C. Burstiness and spreading on temporal networks. Eur. Phys. J. B, 320 (2013).
[13] Jo, H.-H., Perotti, J. I., Kaski, K. & Kertész, J. Analytically solvable model of spreading dynamics with non-Poissonian processes. Phys. Rev. X, 011041 (2014).
[14] Horváth, D. X. & Kertész, J. Spreading dynamics on networks: the role of burstiness, topology and non-stationarity. New J. Phys., 073037 (2014).
[15] Williams, O. E., Lillo, F. & Latora, V. Effects of memory on spreading processes in non-Markovian temporal networks. New J. Phys., 043028 (2019).
[16] Vazquez, A., Rácz, B., Lukács, A. & Barabási, A.-L. Impact of non-Poissonian activity patterns on spreading processes. Phys. Rev. Lett., 158702 (2007).
[17] Mancastroppa, M., Vezzani, A., Muñoz, M. A. & Burioni, R. Burstiness in activity-driven networks and the epidemic threshold. J. Stat. Mech.: Theory Exp., 053502 (2019).
[18] Pastor-Satorras, R., Castellano, C., Van Mieghem, P. & Vespignani, A. Epidemic processes in complex networks. Rev. Mod. Phys., 925–979 (2015).
[19] Vespignani, A. et al. Modelling COVID-19. Nat. Rev. Phys.
[20] Phys. Rev. Lett., 128301 (2017).
[21] Liu, S., Perra, N., Karsai, M. & Vespignani, A. Controlling contagion processes in activity driven networks. Phys. Rev. Lett., 118702 (2014).
[22] Masuda, N. & Holme, P. Temporal Network Epidemiology (Springer, 2017).
[23] Masuda, N. & Holme, P. Small inter-event times govern epidemic spreading on networks. Phys. Rev. Res., 023163 (2020).
[24] Miritello, G., Moro, E. & Lara, R. Dynamical strength of social ties in information spreading. Phys. Rev. E, 045102 (2011).
[25] Hiraoka, T. & Jo, H.-H. Correlated bursts in temporal networks slow down spreading. Sci. Rep., 1–12 (2018).
[26] Min, B., Goh, K.-I. & Vazquez, A. Spreading dynamics following bursty human activity patterns. Phys. Rev. E, 036102 (2011).
[27] Rocha, L. E., Liljeros, F. & Holme, P. Simulated epidemics in an empirical spatiotemporal network of 50,185 sexual contacts. PLoS Comput. Biol., e1001109 (2011).
[28] Granovetter, M. Threshold models of collective behavior. Am. J. Sociol., 1420–1443 (1978).
[29] Karsai, M., Iñiguez, G., Kikas, R., Kaski, K. & Kertész, J. Local cascades induced global contagion: how heterogeneous thresholds, exogenous effects, and unconcerned behaviour govern online adoption spreading. Sci. Rep., 27178 (2016).
[30] Watts, D. J. A simple model of global cascades on random networks. Proc. Natl. Acad. Sci. U.S.A., 5766–5771 (2002).
[31] Gleeson, J. P. Cascades on correlated and modular random networks. Phys. Rev. E, 046117 (2008).
[32] Unicomb, S., Iñiguez, G. & Karsai, M. Threshold driven contagion on weighted networks. Sci. Rep., 3094 (2018).
[33] Unicomb, S., Iñiguez, G., Kertész, J. & Karsai, M. Reentrant phase transitions in threshold driven contagion on multiplex networks. Phys. Rev. E, 040301 (2019).
[34] Karimi, F. & Holme, P. Threshold model of cascades in empirical temporal networks. Physica A, 3476–3483 (2013).
[35] Karimi, F. & Holme, P. A temporal network version of Watts's cascade model. In Holme, P. & Saramäki, J. (eds.) Temporal Networks, 315–329 (Springer Berlin Heidelberg, 2013).
[36] Takaguchi, T., Masuda, N. & Holme, P. Bursty communication patterns facilitate spreading in a threshold-based epidemic dynamics. PLoS One (2013).
[37] Backlund, V.-P., Saramäki, J. & Pan, R. K. Effects of temporal correlations on cascades: threshold models on temporal networks. Phys. Rev. E, 062815 (2014).
[38] Vázquez, A. et al. Modeling bursts and heavy tails in human dynamics. Phys. Rev. E, 036127 (2006).
[39] Whitt, W. Approximating a point process by a renewal process, I: two basic methods. Oper. Res., 125–147 (1982).
[40] Centola, D. & Macy, M. Complex contagions and the weakness of long ties. Am. J. Sociol., 702–734 (2007).
[41] Valdano, E., Ferreri, L., Poletto, C. & Colizza, V. Analytical computation of the epidemic threshold on temporal networks. Phys. Rev. X, 021005 (2015).
[42] Yağan, O. & Gligor, V. Analysis of complex contagions in random multiplex networks. Phys. Rev. E, 036103 (2012).
[43] Gerstner, W., Kistler, W. M., Naud, R. & Paninski, L. Neuronal Dynamics: from Single Neurons to Networks and Models of Cognition (Cambridge University Press, 2014).
[44] Iyer, R., Menon, V., Buice, M., Koch, C. & Mihalas, S. The influence of synaptic weight distribution on neuronal population dynamics. PLoS Comput. Biol., e1003248 (2013).
[45] Gleeson, J. P. High-accuracy approximation of binary-state dynamics on networks. Phys. Rev. Lett., 068701 (2011).
[46] Jiang, Z.-Q. et al. Calling patterns in human communication dynamics. Proc. Natl. Acad. Sci. U.S.A., 1600–1605 (2013).
[47] Sorribes, A., Armendariz, B. G., Lopez-Pigozzi, D., Murga, C. & de Polavieja, G. G. The origin of behavioral bursts in decision-making circuitry. PLoS Comput. Biol. (2011).
[48] Saramäki, J. & Holme, P. Exploring temporal networks with greedy walks. Eur. Phys. J. B, 334 (2015).
[49] Eckmann, J.-P., Moses, E. & Sergi, D. Entropy of dialogues creates coherent structures in e-mail traffic. Proc. Natl. Acad. Sci. U.S.A., 14333–14337 (2004).
[50] Karimi, F., Ramenzoni, V. C. & Holme, P. Structural differences between open and direct communication in an online community. Physica A, 263–273 (2014).
[51] Abate, J. & Whitt, W. A unified framework for numerically inverting Laplace transforms. INFORMS Journal on Computing, 408–421 (2006).

Contents
Supplementary Note 1. Master equation solution
Supplementary Note 2. Random walk equivalence
Supplementary Note 3. Edge-state distribution
Supplementary Note 4. Monte Carlo simulation
Supplementary Note 5. Laplace transform inversion
Supplementary Note 6. Diffusion speed and noise
Supplementary Note 7. Mean field approximation
Supplementary Note 8. Skewness and entropy
Supplementary Note 9. Probability distributions
Supplementary Note 1. Master equation solution
In this Supplementary Note we detail the master equation solution used to solve for binary-state dynamics on our temporal network model. The general approach is to assign one of a finite number of types to each edge, and to allow this quantity to evolve over time. To formulate a master equation solution, one first defines a state space of allowed node configurations, which we term configuration space. The second step is to define the allowed transitions between node configurations. The time evolution of a probability density over this state space amounts to a set of first-order differential equations, or rate equations, that, among other things, provides the total density of infected nodes at a given time.
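The overall structure just described, a density over a finite state space evolved by linear rate equations, can be sketched in miniature. The toy below (a generic three-state system with illustrative rates, not the model's actual transition matrix) integrates d/dt x = W x with forward Euler; since each column of W sums to zero, the total density is conserved, mirroring the node conservation used throughout this note.

```python
# Toy rate equations d/dt x = W x over a three-state space. Columns of W
# sum to zero, so total density is conserved (rates purely illustrative).
W = [[-0.5,  0.2,  0.0],
     [ 0.5, -0.3,  0.1],
     [ 0.0,  0.1, -0.1]]

def euler_step(x, W, dt):
    """One forward Euler step of the linear rate equations."""
    n = len(x)
    return [x[i] + dt * sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]

x = [1.0, 0.0, 0.0]      # all density starts in the first state
for _ in range(20000):   # integrate to t = 20
    x = euler_step(x, W, 1e-3)

assert abs(sum(x) - 1.0) < 1e-9   # conservation of total density
```

In the full formalism the role of x is played by the |C|-dimensional density vector s, and W itself depends on s and t, but the conservation check above remains a useful sanity test of any implementation.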
Configuration space
As discussed in the main text, a network can be partitioned by the configurations (k, m), where each node is assigned exactly one configuration at any point in time. As a reminder, k and m are n-dimensional vectors storing k_j and m_j, the number of neighbours along edges of type j, and the number of infected neighbours along edges of type j, respectively. As a consequence, we have 0 ≤ m_j ≤ k_j, with 1 ≤ j ≤ n. Now consider a network of size N, following a degree distribution p_k, where k = Σ_j k_j is the total degree, in a system allowing a maximum of n edge states. We introduce C_{k,m}, the set of all nodes in the network with local configuration (k, m). Whereas C_{k,m} is a set of nodes, we introduce C_k to define the set of all sets C_{k,m} with total degree k, that is, C_k = {C_{k,m} | Σ_j k_j = k}. Finally, C is the set of all possible sets C_{k,m}. Provided a distribution of total degrees p_k, and edge dimension n, we define

C = {(k, m) | k ∈ supp(p_k) and 0 ≤ m ≤ k},    (1)

which partitions the network at any given time. This includes sets for which C_{k,m} = ∅ at a given time. The cardinality of this universal set, |C|, is determined by the support of p_k, in addition to n. Since (k, m) does not convey ego state, just edge and neighbour configuration, we partition C_{k,m} into uninfected and infected nodes, such that C_{k,m} = S_{k,m} ∪ I_{k,m}. Similar definitions allow us to introduce S_k and I_k, for both the degree vector k and the total degree k, as well as S and I. Although in general |S_{k,m}| ≠ |I_{k,m}|, the structure of the uninfected and infected configuration spaces is identical, such that |C| = |S| = |I|, both for the full spaces and for the subsets of fixed degree vector k or total degree k.

The evolution of a dynamical process over a network amounts to a flow of nodes through the sets S_{k,m} and I_{k,m} over time.
Since the number of nodes N in the network is conserved, it is just their distribution over the sets S_{k,m} and I_{k,m} that evolves in time. These distributions provide the state of the network at time t. Since our formalism is independent of network size, we deal with the densities of nodes rather than the absolute sizes of these sets. As such, we define

‖C_k‖ ≡ Σ_{C_{k,m} ∈ C_k} |C_{k,m}|,    (2)

which gives the number of nodes with total degree k. This is in contrast to |C_k| and |C_k|, which give the number of configurations with degree vector k and total degree k. To convert from absolute node counts to densities of nodes, we need to normalise S_{k,m} and I_{k,m} by some non-zero quantity that is conserved over the course of a dynamical process. For the temporal network models in question, the desired quantity is ‖C_k‖, defined above. The density of uninfected nodes in class (k, m) in this case is given by

s_{k,m} = |S_{k,m}| / ‖C_k‖,    (3)

with i_{k,m} defined analogously. In the case of temporal networks, the node conservation principle leads to the normalisation condition Σ_{k,m|k} (s_{k,m} + i_{k,m}) = 1, for a given k class. We then have

ρ_k = 1 − Σ_{k,m|k} s_{k,m}    (4)

and

ρ = Σ_k p_k ρ_k,    (5)

where the sum in the first expression is over all configurations (k, m) that satisfy Σ_j k_j = k. The time-dependent term ρ_k gives the probability that a randomly selected node with total degree k will be infected, and ρ the probability that any randomly selected node will be infected. Finally, we define s as the |C|-dimensional vector storing the densities s_{k,m}. In practice, we use lexicographic ordering of the tuples in C to define a one-to-one mapping (k, m) ↦ i, for some i ∈ {1, . . . , |C|}, defining the i-th element s_i of s. It is also possible to show that for fixed n and limiting k, the size of C behaves like Θ(k^{2n}). Now that we have defined the space of allowed configurations, we turn to its dynamics.

Supplementary Figure 1. Lattice diagram of temporal network configuration space. Temporal network configuration space for a node of total degree k = 4, with n = 2 allowed edge states. Right-diagonal transitions indicate neighbour infection, and left-diagonals the increments and decrements between edge states j = 0 and 1 that may occur in models of temporal networks. Nodes in this lattice are labelled by their infected degree vector m^T, with the corresponding k shown. Right-diagonal transitions preserve the degree vector k, indicated at the bottom of these diagonals. Left-diagonal transitions preserve the total infected neighbour count m, shown at the top of these diagonals.

Configuration transitions
As outlined in the preceding section, the state of the system at time t is given by the |C|-dimensional vector s(t). After providing an initial condition s(0), the evolution of the system can be approximated with the matrix W(s, t), such that

(d/dt) s = W(s, t) s = (W_ego + W_neigh + W_edge) s,    (6)

where W can be decomposed into separate |C|-dimensional square matrices corresponding to flows driven by ego, neighbour, and edge transitions, respectively. We outline these transitions in the following paragraphs, and assume that besides the transitions specified, the entries of W_ego, W_neigh and W_edge are zero. Although W is sparse, and the numerical implementation is ultimately in the form of dictionaries, we prefer the matrix form for exposition.

Ego transitions. In a non-recovery node dynamics, ego transitions of a node in class (k, m) involve an uninfected node becoming infected, thereby exiting class S and entering class I. These transitions drive the actual node dynamics that overlie the temporal network substrate. The transitions between these two classes are defined by infection and recovery rates F_{k,m} and R_{k,m} that depend on the dynamical model of interest. Examples of F_{k,m}, the relative and absolute threshold rules, are given in the main text. We set R_{k,m} = 0, as we are interested in non-recovery dynamics here, where infected nodes cannot reenter the uninfected state, although this is straightforward to generalise. We assume that node transitions occur homogeneously in time regardless of the underlying dynamics. In other words, a node in the uninfected state becomes infected over an interval [t, t + dt] with probability F_{k,m} dt. If W(A, B) is the rate of transition from class A to B, ego transitions in a non-recovery system are given by

W(S_{k,m}, I_{k,m}) = F_{k,m}.    (7)

Since a node's egocentric network (k, m) does not change during such a transition, the off-diagonal terms of W_ego are zero, with the i-th diagonal term being −F_i, ego transitions being a net loss to the S set. In contrast, the following transitions correspond to off-diagonal matrices, since nodes undergoing these transitions remain in the S class, and are compensated for elsewhere in W. These transitions appear as a type of self-loop in lattice diagrams of configuration space, Supplementary Figs. 1 and 2. Further, flux measurements of these transitions in Monte Carlo simulation ought to be exact, in the limit of large networks, and therefore act as a useful benchmark in transition rate studies. See Supplementary Fig. 8(a) for an illustration.

Supplementary Figure 2. Comparison of static and temporal configuration space. Allowed transitions in static networks, (a), and temporal networks, (b), for a node of degree k = 2 with an n = 2 level edge-state set. See the caption of Supplementary Fig. 1 for interpretation. Note that the number of configurations (k, m) is identical in each case, |C| = 10; it is just the number of allowed transitions that differs.

Neighbour transitions. Neighbour transitions refer to the change in a node's class due to the change in state of one of its neighbours, reflected in the value of the partial degree vector m. We distinguish neighbour transitions by the type j of the corresponding edge. The rates at which nodes leave the class (k, m) due to neighbour infection are given by

W(S_{k,m}, S_{k,m+e_j}) = β_j (k_j − m_j).    (8)

The coefficient β_j gives the rate at which uninfected neighbours of uninfected nodes become infected. This quantity is derived below. Influx to (k, m) from the class (k, m − e_j) due to the same mechanism is

W(S_{k,m−e_j}, S_{k,m}) = β_j (k_j − m_j + 1).    (9)

To calculate β_j we use a straightforward ensemble average, or mean-field approximation, over the set of all uninfected nodes. To obtain the expected fraction of neighbours undergoing transitions, we observe the number of egos undergoing transitions at time t, and count the number of neighbour transitions thus produced. That is, when an uninfected node in class (k, m) becomes infected, which occurs with probability F_{k,m} dt, it produces k_j − m_j uninfected nodes that observe neighbour transitions. The number of such edges across the entire network is given by Σ_S p_k (k_j − m_j) F_{k,m} s_{k,m}, where the sum is over all uninfected classes.
We compare this to the total number of uninfected-uninfected edges, Σ_S p_k (k_j − m_j) s_{k,m}, giving the neighbour transition rate

β_j dt = [Σ_S p_k (k_j − m_j) F_{k,m} s_{k,m} / Σ_S p_k (k_j − m_j) s_{k,m}] dt, (10)

which has previously been used in master equation solutions of binary-state dynamics on static networks; see the main text for references. At this point in the derivation, we could stop and write W = W_ego + W_neigh in order to recover the static network transition matrix. These transitions appear as the right diagonals in lattice diagrams of configuration space, Supplementary Figs. 1 and 2, where they preserve the degree vector k. Further, flux measurements of these transitions in Monte Carlo simulation appear to show that Supplementary Eq. (10) is exact in the limit of large temporal networks. See Supplementary Fig. 8(b) and (c) for an illustration.

Positive edge transitions. The temporal nature of the underlying network is implemented using changes to edge states in the network. A positive edge transition refers to the change in a node's configuration due to an increment in the state of one of its edges over an interval dt. Regardless of the interpretation of edge state and the mechanism driving the transition, the probability of this occurring on a randomly selected edge of type j is µ_j dt. In fact, we delay until the next section our discussion of models of temporal networks; for now it suffices to assume that they can be represented by networks with dynamic edge state. For a configuration (k, m), a positive edge transition on a j-type edge means losing an edge of that type, and gaining an edge of type j + 1. For brevity, we introduce the term ∆±_j = −e_j + e_{j±1}, corresponding to the change in the degree vector k imposed by such a transition. That is, an adjacent node loses a j-type edge, and gains a (j ± 1)-type edge.
The symmetry relations ∆⁺_j = −∆⁻_{j+1} and ∆⁻_j = −∆⁺_{j−1} clearly hold. The configuration that a (k, m) node enters when undergoing a positive transition on a j-type edge is (k + ∆⁺_j, m), or (k + ∆⁺_j, m + ∆⁺_j), depending on whether the neighbouring node was uninfected or infected, so that we have

W(S_{k,m}, S_{k+∆⁺_j, m}) = µ_j (k_j − m_j) (11)

and

W(S_{k,m}, S_{k+∆⁺_j, m+∆⁺_j}) = µ_j m_j, (12)

respectively. Similarly, nodes may enter the configuration (k, m) through a positive transition on a j-type edge, from the classes (k − ∆⁺_j, m) and (k − ∆⁺_j, m − ∆⁺_j). We have

W(S_{k−∆⁺_j, m}, S_{k,m}) = µ_j (k_j − m_j + 1) (13)

and

W(S_{k−∆⁺_j, m−∆⁺_j}, S_{k,m}) = µ_j (m_j + 1), (14)

if the neighbour is uninfected or infected, respectively. Combining these terms gives the flow through the configuration (k, m) due to positive transitions on j-type edges. Note that we typically impose boundary conditions: if not because sharp cutoffs arise naturally in many temporal network models, then because we require the configuration space to remain finite. If n remains the number of allowed edge states, we impose the condition that one cannot observe a positive edge transition on an n-type edge. As such, a node cannot lose an n-type edge through a positive edge transition, and we write µ_n = 0.

Negative edge transitions. A negative edge transition refers to the change in a node's configuration when an adjacent event is forgotten over an interval dt, causing a decrease in the number of memorable events on that edge. As defined above, this occurs with probability ν_j dt. When an event terminates on an edge of type j, the adjacent node gains an edge of type j −
1, and loses an edge of type j, preserving the total degree k. The relations between node classes due to negative edge transitions mirror their positive counterparts, and we include them here for completeness. A node in configuration (k, m) moves to class (k + ∆⁻_j, m) or (k + ∆⁻_j, m + ∆⁻_j) if an event on a j-type edge terminates while connected to an uninfected or infected neighbour, respectively. This occurs at rates

W(S_{k,m}, S_{k+∆⁻_j, m}) = ν_j (k_j − m_j) (15)

and

W(S_{k,m}, S_{k+∆⁻_j, m+∆⁻_j}) = ν_j m_j. (16)

Similarly, nodes may enter the configuration (k, m) with a negative edge transition on a j-type edge, from the classes (k − ∆⁻_j, m) and (k − ∆⁻_j, m − ∆⁻_j), as follows. If the event terminates between a node and an uninfected neighbour, we have

W(S_{k−∆⁻_j, m}, S_{k,m}) = ν_j (k_j − m_j + 1), (17)

and

W(S_{k−∆⁻_j, m−∆⁻_j}, S_{k,m}) = ν_j (m_j + 1) (18)

if the neighbour is infected. Combining these terms gives the flow through the configuration (k, m) due to negative edge transitions. Note the boundary condition: a j = 0 edge is the case where no events have taken place in the last η interval. As such, a node cannot lose a 0-type edge through a negative edge transition, and we reflect this by writing ν_0 = 0. These transitions appear as left diagonals in lattice diagrams of configuration space, Supplementary Figs. 1 and 2, and preserve the total number of infected neighbours m. Further, flux measurements of these transitions in Monte Carlo simulation show that constant µ_j and ν_j can be excellent approximations of non-Markovian systems. See Supplementary Fig. 8(d) to (f) for an illustration.
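As a quick consistency check on the bookkeeping above, the size of the configuration space can be enumerated directly. The sketch below is our own illustration (the function name is not from the paper's code): it lists all classes (k, m) for a node of given total degree with n edge states, and recovers the count |C| = 10 quoted in Supplementary Fig. 2 for a degree-2 node with n = 2.

```python
from itertools import product

def configuration_space(k_total, n):
    """All classes (k, m): k partitions the degree over n edge states,
    and m_j counts infected neighbours on j-type edges, 0 <= m_j <= k_j."""
    classes = []
    for k in product(range(k_total + 1), repeat=n):
        if sum(k) != k_total:
            continue
        for m in product(*(range(kj + 1) for kj in k)):
            classes.append((k, m))
    return classes

print(len(configuration_space(2, 2)))  # 10 classes, matching |C| = 10
```

Each degree composition k contributes Π_j (k_j + 1) classes, which makes the Θ(k^n) growth of the configuration space (discussed later in this supplement) explicit.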
Calculating µ_j and ν_j for renewal processes

In this section, we calculate the rates of positive and negative edge transitions µ_j and ν_j in the stochastic temporal network model discussed in the previous section. Here, µ_j dt gives the probability that at time t, for a renewal process having already produced j events in the preceding time window of duration η, a (j + 1)-th event is observed between time t and t + dt. Conversely, ν_j dt gives the probability of an event exiting the η window. This is illustrated in Supplementary Fig. 3. At the outset, describing such a process with constant rates µ_j and ν_j seems inappropriate, as it is memoryless only in so-called event space. In this representation, a renewal process is nothing other than a sequence of trials {τ_1, τ_2, . . .}, or the random sampling of a value τ from a distribution ψ(τ). In this sense, the process is memoryless. However, in the resulting time series for continuous t, the process is non-Markovian, as the time of the next event was decided at the time of the previous event, and the probability of an event occurring at a time between these points is zero. This is in contrast to a Poisson process, where the probability of a change in edge state due to the occurrence of an event is constant in time. Thus, on a microscopic level, where we observe a stochastic process on a single edge, a rate description is nonsensical.

We note, however, that in our master equation formalism, classes (k, m) really represent ensembles of nodes, and although constant rates cannot be identified on a microscopic level, useful quantities do exist on a network-wide macroscopic level. That is, the fraction of j-type edges that change state over an interval dt turns out to be strongly heterogeneous for varying j, which is clearly not the case for a Poisson process. The heterogeneity of the distribution ψ is reflected in the heterogeneity of µ, ν and E.

Supplementary Figure 3. Enumeration of edge state configurations. Configurations of j events occurring over an interval of length η, where η amounts to the memory of observed events. In plot (a) we enumerate all configurations of edges in state j at time t, with a (j + 1)-th event occurring over the interval t < t_{j+1} < t + dt. In plot (b) we enumerate all configurations of edges in state j. We use these to calculate the rate of positive edge transition µ_j, and the edge state distribution E_j, respectively. Interevent times τ_1, . . . , τ_j are drawn from ψ, whereas τ′ and τ′′ are drawn from Ψ, defined in the text.

The rate of positive edge transition µ_j dt is calculated by finding the probability of a (j + 1)-th event occurring over a given interval dt, on the condition that j events have already been produced in the preceding time interval of duration η. This is illustrated in Supplementary Fig. 3(a), where we set t = η for convenience. In Supplementary Fig. 3, the interevent times τ_1, . . . , τ_j are drawn from the distribution ψ, and the times τ′ and τ′′ from its complementary cumulative distribution Ψ, also known as the residual time distribution. It is defined as

Ψ(τ) = ∫_τ^∞ ψ(t) dt, (19)

and gives the probability that the time between events is of duration at least τ. We introduce the domain T of times spanned by the configurations allowed in Supplementary Fig. 3, that is, the ordered times t_1 < t_2 < . . . < t_j falling in an interval of duration η. Note that consecutive event times t_j and t_{j+1} cannot coincide, since interevent times are drawn from the distributions ψ and Ψ, which are defined over positive τ. We write

T = (0, η] × (0, η − t_1] × (0, η − t_2] × . . . × (0, η − t_{j−1}] ⊆ ℝ^j_+, (20)

where ℝ^j_+ is the j-dimensional space of positive real numbers. The probability of observing the configuration in Supplementary Fig.
3(a) is Ψ(τ′) ψ(τ_1) . . . ψ(τ_j), which is the same as Ψ(t_1) ψ(t_2 − t_1) . . . ψ(η − t_j), given that we have set the time t to η for simplicity. Similarly, the configuration in Supplementary Fig. 3(b) is observed with probability Ψ(τ′) ψ(τ_1) . . . ψ(τ_{j−1}) Ψ(τ′′), which is the same as Ψ(t_1) ψ(t_2 − t_1) . . . ψ(t_j − t_{j−1}) Ψ(η − t_j). The weighted sum of all such configurations yields the probability of observing a j-type edge undergoing a transition to state j + 1 over an interval dt, and the probability of randomly selecting an edge in state j, as shown in Supplementary Fig. 3(a) and (b), respectively. The only difference is the final term, which is drawn either from ψ or from Ψ. With respect to the domain T, these sums can be written

∫_T dt_1 . . . dt_j Ψ(t_1) ψ(t_2 − t_1) ψ(t_3 − t_2) × . . . × ψ(t_j − t_{j−1}) ψ(η − t_j) = Ψ ∗ ψ^{∗j} (21)

and

∫_T dt_1 . . . dt_j Ψ(t_1) ψ(t_2 − t_1) ψ(t_3 − t_2) × . . . × ψ(t_j − t_{j−1}) Ψ(η − t_j) = Ψ ∗ ψ^{∗(j−1)} ∗ Ψ, (22)

respectively, with the convolutions evaluated at η. Here, ψ^{∗j} is the j-th convolution power of ψ, and is discussed at length in following sections. To obtain the rate µ_j at which edges in state j transition to state j + 1, Supplementary Eq. (21) must be normalised by Supplementary Eq. (22), the probability E_j that a randomly selected edge is in state j. Schematically, this corresponds to normalising the transition in Supplementary Fig. 3(a) by those in Supplementary Fig. 3(b). Note that ν_j dt, the probability that an edge in state j forgets an event over an interval dt, is the same as µ_{j−1} dt under time reversal, up to the normalising constant. As such, the rates µ_j and ν_j, along with the distribution E_j, can be written compactly as

µ_j dt = [Ψ ∗ ψ^{∗j} / Ψ ∗ ψ^{∗(j−1)} ∗ Ψ] dt (23)

and

ν_j dt = [Ψ ∗ ψ^{∗(j−1)} / Ψ ∗ ψ^{∗(j−1)} ∗ Ψ] dt, (24)

with

E_j = Ψ ∗ ψ^{∗(j−1)} ∗ Ψ.
(25)

These quantities can be calculated either numerically or analytically, depending on the tractability of the chosen distribution ψ. In general, if ψ(τ) is locally integrable, then the Laplace transforms of ψ and Ψ exist and allow us to calculate the convolutions as products in the frequency domain, which is useful especially if j is large. Since we do not impose any cutoffs on ψ and Ψ in the text, j can indeed grow arbitrarily large under bursty dynamics. If we denote the Laplace transform of ψ(τ) by

L{ψ(τ)} = ψ̂(s) = ∫_0^∞ ψ(τ) e^{−sτ} dτ, (26)

then the transform of Supplementary Eq. (19) can be written as

Ψ̂(s) = [1 − ψ̂(s)] / s. (27)

Finally, we can argue by induction from j = 0, using the fact that the distribution E is constant at stationarity of the renewal process, that µ_j E_j = ν_{j+1} E_{j+1}, meaning that along with E_j, the rates µ_j and ν_j are stationary. The observed experimental rates µ_j, ν_j and E_j match the predicted values exactly for increasingly large networks.

Illustration using the exponential distribution
In this section we explicitly calculate the edge transition rates µ_j and ν_j for an exponential interevent time distribution ψ. This is an exercise to illustrate the Laplace inversion procedure; in general we calculate these quantities numerically, as described in Supplementary Note 5. The exponential distribution is an important benchmark in this work, and we consider two alternative generalisations, namely the gamma and Weibull distributions. Both reduce to the exponential distribution when σ_τ = ⟨τ⟩ = 1, and recover the rates given here. Consider such a distribution with average ⟨τ⟩, defined by

ψ(τ) = (1/⟨τ⟩) e^{−τ/⟨τ⟩} (28)

with

Ψ(τ) = e^{−τ/⟨τ⟩}, (29)

having transforms

ψ̂(s) = 1 / (⟨τ⟩s + 1) (30)

and

Ψ̂(s) = [1 − ψ̂(s)] / s = ⟨τ⟩ / (⟨τ⟩s + 1), (31)

respectively. Substituting these transforms into Supplementary Eq. (25) and applying the convolution theorem allows us to calculate the expected size of the set of edges in state j, which is also the normalising constant in the rates µ_j and ν_j, as the expression for L{E_j} simplifies to

Ψ̂ · ψ̂^{j−1} · Ψ̂ = ⟨τ⟩² / (⟨τ⟩s + 1)^{j+1}. (32)

We use the fact that L^{−1}{1/s^{j+1}}(η) = η^j / j! for integer j, a known Laplace transform relating to the gamma function. We use also the translation property L^{−1}{ψ̂(s + a)} = e^{−aτ} ψ(τ), which directly results from the definition in Supplementary Eq. (26). The inverse Laplace transform of the above expression can then be written explicitly as

Ψ ∗ ψ^{∗(j−1)} ∗ Ψ = [⟨τ⟩ e^{−η/⟨τ⟩} / j!] (η/⟨τ⟩)^j, (33)

meaning E is simply the Poisson distribution with mean
This is expected, due to our construction of thememory window, and the fact that an exponential in-terevent time distribution recovers a Poisson process.Similarly, the Laplace transform of the numerator in Sup-plementary Eqs. (23) and (24) can be used to calculateΨ ∗ ψ ∗ j = e − η/ (cid:104) τ (cid:105) j ! (cid:18) η (cid:104) τ (cid:105) (cid:19) j (34)and Ψ ∗ ψ ∗ ( j − = e − η/ (cid:104) τ (cid:105) ( j − (cid:18) η (cid:104) τ (cid:105) (cid:19) j − , (35)yielding µ j dt = 1 (cid:104) τ (cid:105) dt (36)and ν j dt = jη dt, (37)after normalising by Ψ ∗ ψ ∗ ( j − ∗ Ψ. In this special case ofexponentially distributed τ , we are able to derive theserates using much simpler arguments, namely with thedefinition of the Poisson process, and the Poisson distri-bution. Crucially, µ j has no j dependence here, whichclearly expresses the memoryless property of the Poissonprocess. These rates may be verified by simulating an en-semble of independent, stationary renewal processes, andobserving the flux in the system over an interval ∆ t . Theflow through the set E j over that interval, scaled by thesize of that set and the size of the measurement window,give the rates µ j and ν j . Alternatively, by simulating asingle renewal process for a sufficiently long time, andmeasuring its change in behaviour over each interval ∆ t ,one obtains rates µ j and ν j that are identical to thosecalculated in the ensemble. Convolution powers, an aside
In order to calculate the distribution of edge states E_j, as well as the mean-field edge transition rates µ_j and ν_j, we need an efficient method for computing convolution powers.

Supplementary Figure 4. Random walk interpretation of edge state. Positive and negative edge transition rates, µ_j and ν_j, act as a signature of the non-Markovianity in the renewal process model. On a macroscopic level, this model is indistinguishable from a random walk over the states . . . , j − 1, j, j + 1, . . ., where the transition rates are provided by µ_j and ν_j by construction. In contrast, this equivalence is broken when taking into account node dynamics on the level of classes (k, m), as we show in Supplementary Fig. 8. See also Supplementary Fig. 5 for illustrative values of the rates µ_j and ν_j for varying σ_τ.

In general, a convolution is defined for two real-valued functions f and g over the domain f, g : (−∞, ∞) → ℝ. However, in the case where f and g are supported on the non-negative reals, as is the case in our study, the convolution reduces to

(f ∗ g)(t) = ∫_{−∞}^{∞} f(τ) g(t − τ) dτ = ∫_0^t f(τ) g(t − τ) dτ. (38)

We use this to express the edge-state properties, Supplementary Eqs. (21) and (22), as convolutions over the non-negative reals. It is worthwhile noting the convention that if ψ^{∗j} is the j-th convolution power, or

ψ^{∗j} = ψ ∗ ψ ∗ . . . ∗ ψ (j terms), (39)

then the zeroth-order convolution power, ψ^{∗0} = δ, is simply the Dirac delta function, the identity of convolution. This is used in the j = 1 case of Supplementary Eqs. (23), (24) and (25).

Supplementary Note 2. Random walk equivalence
In this Supplementary Note, we discuss the expected steady-state network topology that emerges from our stochastic temporal model. In particular, we describe the existence of a purely Markovian system with identical macroscopic dynamics to our non-Markovian renewal process model. We consider the dependence of the edge transition rates µ_j and ν_j on our choice of interevent time distribution ψ, and in particular, on its parameterisation in terms of the standard deviation σ_τ.

We assume an arbitrarily large network consisting of independent, stationary renewal processes, such that the time to the next event at any time t follows the residual distribution Ψ(τ). We illustrate in Supplementary Fig. 5 the edge transition rates µ_j and ν_j that emerge from a gamma interevent time distribution ψ(τ), for increasing values of the standard deviation σ_τ. Since µ_j and ν_j are heterogeneous, they can be interpreted as providing a signature of the non-Markovianity inherent to the renewal process microscopically. For instance, when σ_τ = ⟨τ⟩ = 1, the gamma distribution reduces to an exponential distribution, corresponding to a Poisson process. That this process is memoryless is reflected in the edge transition rate µ_j = 1/⟨τ⟩ being homogeneous in j. The greater the departure from the Poisson process, the greater the heterogeneity in j, as illustrated in Supplementary Fig. 5(a).

A central result in this work is to note that for an uncorrelated ensemble of edges, the renewal process model is indistinguishable from a continuous-time Markov chain, namely a one-dimensional biased random walk, as illustrated in Supplementary Fig. 4. In other words, the probability of a randomly selected edge undergoing a transition over an interval dt in the renewal process model is trivially identical to that in a Markov chain where, by construction, transitions occur at rates µ_j and ν_j.
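This equivalence is easy to check numerically. The sketch below is our own illustration (not the paper's code): it runs a Gillespie simulation of the one-dimensional walk with the exponential-case rates µ_j = 1/⟨τ⟩ and ν_j = j/η of Supplementary Eqs. (36) and (37), whose time-averaged occupancy should recover the Poisson edge-state distribution E_j.

```python
import random

def simulate_walk(mu, nu, t_max=1e5, j0=1, seed=7):
    """Time-averaged occupancy of the edge-state walk, with up rate mu(j)
    and down rate nu(j), simulated with the Gillespie algorithm."""
    rng = random.Random(seed)
    j, t = j0, 0.0
    occupancy = {}
    while t < t_max:
        up, down = mu(j), nu(j)
        dwell = rng.expovariate(up + down)     # time spent in state j
        occupancy[j] = occupancy.get(j, 0.0) + dwell
        t += dwell
        j += 1 if rng.random() < up / (up + down) else -1
    total = sum(occupancy.values())
    return {state: dt / total for state, dt in occupancy.items()}

# exponential case with eta = <tau> = 1: mu_j = 1 and nu_j = j, cf. Eqs. (36)-(37)
E = simulate_walk(lambda j: 1.0, lambda j: float(j))
```

Since ν_0 = 0, the walk never leaves the non-negative states, mirroring the boundary condition on negative edge transitions.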
Indeed, since all temporal network information is stored in µ_j, ν_j and E_j in the master equation, any class of system producing a given set of µ_j, ν_j and E_j values has the same predicted dynamics. Since in the special case of a Markov chain the transition rates are exact at all scales (both locally on the scale of a single edge, and globally on the scale of an uncorrelated ensemble), the master equation solution is exact there. Since Supplementary Eqs. (23) and (24) represent a mean-field approximation on the scale of classes (k, m) in the renewal process model, deviations in the master equation solution emerge in that case. These errors provide a measure of the extent to which non-Markovian dynamics can be captured by the heterogeneity of a simple Markov chain.

Supplementary Note 3. Edge-state distribution
In this Supplementary Note we mention some basic properties of the edge state distribution. First, for all choices of ψ(τ) and η in this work, a useful conserved quantity is the expected edge state, or the expected number of events per edge across the entire network. If ⟨τ⟩ is the mean of the interevent time distribution, and η is the observer memory, the expected edge state is ⟨E⟩ = η/⟨τ⟩, which is useful for monitoring the accuracy of the implementation. Further, note that if E is the set of underlying edges in the network, the superposition of |E| renewal processes converges to a Poisson process, with interevent times following an exponential distribution with mean ⟨τ⟩/|E|, for large |E|. As such, we expect |E| dt/⟨τ⟩ events per time window dt in simulation.

Temporal percolation transition
As discussed in the main text, the probability that a randomly selected edge is in state zero is ξ_E. One can use ξ_E to determine whether active edges in the network, the set of all edges in state one or higher, are expected to form a giant component at any given time. If a giant component exists, we say that percolation has taken place.

Supplementary Figure 5. Illustration of edge transition rates. Positive and negative edge transition rates, µ_j and ν_j respectively, for a gamma-distributed interevent time with mean ⟨τ⟩ = 1, shown as log µ_j and log ν_j versus log j for varying σ_τ/⟨τ⟩. On a macroscopic level, our renewal process model is indistinguishable from a random walk model of edge state, where the state j increments and decrements at rates µ_j and ν_j. The legend in (b) applies also to (a). Observer memory and mean interevent time are given by η = ⟨τ⟩ = 1, according to our stochastic temporal network model. The exponential distribution is recovered in the case of σ_τ = ⟨τ⟩ = 1, producing a memoryless system, as indicated.

If percolation does not occur, nodes form finite active clusters whose size goes to zero in the limit of large networks. This is true even if the underlying network itself consists of a giant component. The percolation transition helps to explain the sharp transition in dynamics in the upper right corners of Fig. 2 in the main text, separating regimes of fast and slow information diffusion. The percolation condition for a configuration model network with degree distribution q(k) is given by

Σ_{k=0}^∞ k(k − 2) q(k) = 0, (40)

where

q(k) = Σ_{l≥k} C(l, k) p(l) (1 − ξ_E)^k ξ_E^{l−k}, (41)

with C(l, k) the binomial coefficient, is the degree distribution obtained by randomly removing a fraction ξ_E of edges from a network with degree distribution p(k), which is effectively the network induced when removing edges in state zero in our temporal model. For example, for a z-regular underlying network the condition reduces to 1 − ξ_E = 1/(z − 1), so that for z = 5 the active edges percolate whenever ξ_E < 3/4.

Maximum edge state
In our analytic solution, we introduce n to denote the maximum edge state j. In experiment, no such restrictions are imposed when sampling the interevent time distributions ψ and Ψ, in contrast to related work [13] where it is common to introduce upper and lower bounds on τ. As a consequence, bursts in activity can lead to arbitrarily large edge states j. However, as we see in the structure of configuration space, if n is the maximum edge state allowed in the system, the number of equations grows like Θ(k^n). Clearly, in the interest of the numerical implementation of the master equation solution, n cannot be arbitrarily large.

It is noteworthy that despite edge state being unrestricted in simulation, the resultant diffusion dynamics can be accurately solved using relatively small values of n. Consider that large bursts are most common when the interevent time standard deviation σ_τ is large. In this regime, the fraction of edges in state j = 0 is significant. As a consequence, nodes observing bursts of activity on some edges frequently observe no activity on others. Indeed, for large enough σ_τ, it is rare for a node to observe more than one active neighbour, with that active neighbour generally being in a very large state j. For the relative threshold (RT) model, such a node is infected with high probability if the neighbouring node is active, since the threshold φ is guaranteed to be overcome here. Similarly, for the absolute threshold (AT) model, any node with an infected neighbour in state j = ⌈Mφ⌉ or higher is likely to adopt. Finally, for the susceptible-infected (SI) model, consider that the duration of a spike in activity is on the order of η, i.e., the duration of observer memory. Since the infection rate increases proportionally to the size of the burst, a large burst leads to infection soon after its observation, and long before its events have left the time window. Since the local neighbourhood of the newly infected node is likely sparse in this setting, a much smaller burst would lead to an identical diffusion outcome. Effects such as these mean that surprisingly low values of n are sufficient to accurately model the diffusion process.

Supplementary Note 4. Monte Carlo simulation
In this Supplementary Note we discuss the Monte Carlo simulation methods used in this work. Since node dynamics do not feed back into edge dynamics, we assume a steady-state renewal process is in place at t = 0. This involves initialising each edge with one τ drawn from the tail distribution Ψ. We start the simulation at t = −η, so that at time t = 0, the system is at steady state.

Parallel event sequences
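To fix ideas before describing the event sequences, here is a minimal sketch of an event-driven node-update loop. It is our own illustration, not the paper's implementation: the SI-like rate function and all names are hypothetical, temporal edge events are omitted, and pending node events are lazily invalidated (rather than reordered in place) whenever a rate changes.

```python
import heapq, random

def simulate(neighbours, rate, t_max=100.0, seed=3):
    """Event-driven (Gillespie-style) node updates: infection times are drawn
    whenever rates change; outdated heap entries are lazily discarded."""
    rng = random.Random(seed)
    state = {v: 0 for v in neighbours}       # 0 = susceptible, 1 = infected
    version = {v: 0 for v in neighbours}     # bumping this invalidates old events
    queue = []
    def schedule(v, t_now):
        r = rate(v, state)                   # hypothetical transmission rate F_v
        if r > 0:
            heapq.heappush(queue, (t_now + rng.expovariate(r), version[v], v))
    for v in neighbours:
        schedule(v, 0.0)
    while queue:
        t, ver, v = heapq.heappop(queue)
        if t > t_max:
            break
        if state[v] == 1 or ver != version[v]:
            continue                         # stale event, ignore
        state[v] = 1                         # node event: infect v
        for u in neighbours[v]:              # neighbour rates change, so their
            if state[u] == 0:                # pending events are redrawn
                version[u] += 1
                schedule(u, t)
    return state

# SI-like rates with baseline noise p = 0.05, on a path graph of 50 nodes
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 50] for i in range(50)}
infected = simulate(path, lambda v, s: 0.05 + sum(s[u] for u in path[v]))
```

Redrawing a node's infection time when its neighbourhood changes plays the same role as the reordering of the node event sequence described below.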
Our model of node dynamics is Markovian, since an uninfected node v becomes infected at a constant rate F_v. In the case of threshold models, the transmission rate is a step function, whose upper value is conventionally set to one. A straightforward Monte Carlo simulation in this case is to advance in time with fixed intervals ∆t = 1/N, where N is the network size, randomly selecting a node v to trigger with probability F_v at each step. This approach is unsuitable when transmission rates are unbounded, since the time step ∆t cannot be rescaled a priori to preserve the random node selection approach. This is the case for the SI rule in our model, where we define the transmission rate to be proportional to edge state, F_{k,m} = max(p, m · λ), with λ = λ_w, for a node in class (k, m). Since m and λ are unbounded in our simulations, as we impose no restriction on ψ and Ψ, bursts of activity due to our renewal process model can result in arbitrarily high F.

For this reason, we prefer an event-based Gillespie algorithm, which is equivalent, but advances in time by jumping to the next event, rather than by uniform increments ∆t. In the remainder of this section we discuss the implementation of sequences of these events.

Algorithm 1.
Static network event sequence

  procedure MonteCarloStatic(Ξ_v)
    while Ξ_v not empty do
      apply head of Ξ_v
    end while
  end procedure

For a graph G(V, E), we define two types of events: node events, defined over the node set V, and edge events, defined over the edge set E. A node event is implemented as the tuple {t_v, v}, and an edge event as {t_e, e_uv, s}. Node events amount to the infection of node v at a time t_v, and edge events to the positive or negative change in the state of edge e_uv, depending on the sign of the indicator s = ±1, according to our stochastic temporal network model. Monte Carlo simulation is implemented as two time-ordered, dynamic sequences of events, Ξ_v and Ξ_e, for node and edge event types, respectively. While Ξ_e is independent of node dynamics, both node and edge events feed back and cause a potential reordering of Ξ_v, as explained below. The node event sequence is initially of size N = |V|, since we have one event for each node in the network, with the leading event being removed when all preceding edge activity has been carried out, i.e., when t_v < t_e for the leading terms in each sequence. The action carried out is best illustrated by considering a static network, Algorithm 1. Here, the apply head instruction means to trigger the node in question, remove it from Ξ_v, and update the transmission rates of its neighbours, and in turn, their positions in Ξ_v. In our model, the edge event sequence has approximately constant size, apart from some fluctuations due to the fact that η/⟨τ⟩ is only the expected edge state; the absolute number of events in η-memory can go up and down. As networks increase in size, the size of the edge event sequence Ξ_e converges to (2 + η/⟨τ⟩)|E|. Assuming that a non-zero level of noise is present, p > 0, Ξ_v will eventually be emptied.

Algorithm 2.
Temporal network event sequences

  procedure MonteCarloTemporal(Ξ_v, Ξ_e)
    t_v, t_e ← dequeue Ξ_v, Ξ_e
    while Ξ_v not empty do
      while t_e < t_v do
        apply head of Ξ_e
        t_v, t_e ← dequeue Ξ_v, Ξ_e
      end while
      apply head of Ξ_v
      t_v ← dequeue Ξ_v
    end while
  end procedure

Algorithm 2 is the temporal extension of Algorithm 1. In each, the time of infection of every node v is determined at t = 0. This is done by drawing from an exponential distribution with rate F_v, i.e., taking the natural logarithm of a uniform random variable on (0, 1) and dividing by −F_v. A node event is permanently erased from the sequence if t_v < t_e, as mentioned. At this time, neighbours u of v have their transmission rates F_u recalculated, as in Algorithm 1, as they now have an additional infected neighbour. Additionally, the infection time of a node is recalculated if a change in its local neighbourhood takes place, such as activity on an adjacent edge, or the infection of a neighbouring node. Since event sequences are time-ordered, a node event is first erased from its position in the sequence, and then reinserted when such a recalculation takes place. No such reordering takes place for the edge event sequence: once an event is inserted here, it is only removed when its time t_e is at the front of the queue. While t_e < t_v, where the events in question are the leading events of each queue, there are two possible actions for the apply head instruction for Ξ_e in Algorithm 2. First, if s = +1, the leading edge event is erased and replaced with two new events on the same edge e_uv: an s = +1 event occurring at time t_e + τ, where τ is drawn from the interevent time distribution ψ, and an s = −1 event at time t_e + τ + η, corresponding to the decrementing of that same edge η time steps later. Second, if an s = −1 event has t_e < t_v, then the event is removed from the sequence without replacement.

In the case of static networks, Gillespie algorithms allow massive speedup relative to the random selection approach. This is not the case for the node dynamics in our temporal network model, where a Gillespie event sequence affords very little speedup. Here, the edge activity sequence is a bottleneck, since it still requires |E| dt/⟨τ⟩ edge updates per dt on average. As such, it is not for the benefits to speed that we use a Gillespie-type event sequence for node updates; it is to account for arbitrarily large spikes in node transmission rates, particularly under the SI rule for temporal networks.

Inverse transform sampling of ψ and Ψ

In this section we discuss the numerical pipeline used to simulate renewal processes, a procedure that amounts to accurately sampling from the interevent time distributions ψ(τ) and Ψ(τ). Due to the large values of the standard deviation σ_τ to be examined in this work, currently available software could not be used to sample values of τ.
Sampling probability distributions. Numerical construction of Ψ, a probability density function that can be accurately evaluated at any point τ, at some cost, for all distributions used in this work. The grid results from iterating outwards from τ = ⟨τ⟩ = 1, for increasing τ until Ψ < ε (blue grid), and for decreasing τ until Ψ > 1 − ε (red grid). Inverse transform sampling is then performed on the interval (ε, 1 − ε) to provide samples of ψ, using a bisection method on the grid to find the corresponding τ. A third-order spline interpolation on a logarithmic scale provides intermediate values of τ. Additionally, the grid can be cumulatively summed, and inverse transform sampling carried out on the spline of the resulting grid, to provide samples of Ψ.

Extended precision is required to accurately determine Ψ, and we prefer GNU MPFR, a multiple-precision binary floating-point library with correct rounding. See Supplementary Note 9 for details regarding the distributions themselves. With such a grid accurately computed, we carry out third-order spline interpolation on a logarithmic scale to allow rapid evaluation at arbitrary points τ within the grid. As a first application, the spline can be used to rapidly solve Ψ(τ) = x using a bisection technique. This provides samples τ of ψ. The grid can then be cumulatively summed to provide the cdf of Ψ over the same domain. Performing approximate inverse transform sampling on the resulting grid returns samples τ of Ψ.

It is worthwhile mentioning that rejection sampling techniques appear to be out of the question here. This is due to the extreme bounding values of τ necessary for the standard deviations σ_τ studied in our choice of heavy-tailed distributions. Even for clever choices of enveloping functions for ψ and Ψ, it is likely that the acceptance rates will be prohibitively low.
Finding such functions remains an interesting challenge, but since our bisection approach converges exponentially quickly, it is hard to imagine that a choice of envelope exists that makes rejection sampling faster.

Finally, we comment on experiments involving very large standard deviations σ_τ. Here, not even large networks running for a long time provide an unbiased sample of ψ. Following the central limit theorem, the standard deviation of the sampled mean interevent time equals σ_τ/√n_s, where n_s is the number of samples in question, which for the sake of argument is on the order of the number of edges in the network. High quality experimental results can be obtained by simulating large networks, or by averaging over realisations. Note further that large-σ_τ experiments in our model happen to coincide with very long simulation times. As a result, even for large σ_τ there is little noise in our results, due to the substantial runtime. Finally, ξ_E plays a very important role, and amounts to the fraction of samples of Ψ that are larger than η. A consequence of this is that for large σ_τ, most edges do not participate at all, having drawn residual times at t = 0 that are longer than the duration of the experiment, determined by ρ = 1 − e^{−pt}.

Supplementary Note 5. Laplace transform inversion
In the following sections we describe the numerical pipeline for obtaining the rates µ_j and ν_j, as well as the edge-state distribution E_j. Although these values could be calculated by hand for the case of the exponential distribution, and perhaps even the gamma distribution since its Laplace transform is known, we would like to be able to do this numerically for arbitrary ψ(τ), including those for which the Laplace transforms of ψ and Ψ are not known.

These rates are expressed in terms of convolutions. Since we are interested in j-th order convolutions, for arbitrary positive integers j, we prefer to perform products in frequency space, as permitted by the Laplace transform. A j-th order convolution for large j increases exponentially in complexity, and can only be performed directly for very small j. Further, E_j can be broad when σ_τ is large. The Laplace transform is defined as

\hat{f}(s) = \mathcal{L}\{f\}(s) = \int_0^\infty e^{-st} f(t)\, dt,   (42)

where f is a real-valued function of time t, and \hat{f} a complex-valued function of the complex variable s = σ + iω. In general we may write

\hat{f}(s) = \int_0^\infty e^{-\sigma t} \cos(\omega t) f(t)\, dt - i \int_0^\infty e^{-\sigma t} \sin(\omega t) f(t)\, dt.   (43)

This form is necessary when the Laplace transform of f is not known in closed form, and must be evaluated numerically, as is the case for the lognormal and Weibull distributions used throughout this work. For the Gaver-Stehfest algorithm in the following section, s is always real, so we can set ω = 0, leaving only the first integral. There exists a useful framework that unifies these different algorithms [51], expressing each approach in terms of a weighted sum of \hat{f} values. We shall see that the computation of the forward Laplace transform is the bottleneck, as opposed to finding the weights. As such, the preferred method depends on how many numerical evaluations of \hat{f} are required. This is 2M in the Gaver-Stehfest algorithm, 2M + 1 in the Euler algorithm and M in the Talbot algorithm, with M controlling the desired accuracy (see [51] for details).
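When the transform is not known in closed form, as for the lognormal and Weibull densities, the defining integral must be evaluated by quadrature. A sketch for real s > 0 (the ω = 0 case relevant here); double precision, the substitution t = e^u, and the truncation bounds are illustrative assumptions, standing in for the MPFR-based evaluation used in this work:

```python
import math

def laplace_real(f, s, u_lo=-30.0, u_hi=30.0, n=4000):
    """Evaluate f_hat(s) = integral_0^inf e^{-st} f(t) dt for real s > 0.
    Substituting t = e^u maps (0, inf) to the real line; the transformed
    integrand e^{-s e^u} f(e^u) e^u decays at both ends, so a truncated
    composite Simpson rule suffices at double precision."""
    if n % 2:           # Simpson's rule needs an even number of intervals
        n += 1
    h = (u_hi - u_lo) / n
    total = 0.0
    for i in range(n + 1):
        u = u_lo + i * h
        t = math.exp(u)
        val = math.exp(-s * t) * f(t) * t   # factor t is the Jacobian dt/du
        w = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += w * val
    return total * h / 3.0

# Check against a known pair: f(t) = e^{-t} has transform 1 / (s + 1).
fhat = laplace_real(lambda t: math.exp(-t), 2.0)
```

Against the closed-form transform of the exponential density, this quadrature agrees to several digits, which is the role MPFR plays at much higher precision in the actual pipeline.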
However, only the real integral need be found in the Gaver-Stehfest algorithm, which we prefer for this reason.

Gaver-Stehfest algorithm
We assume a Laplace transform \hat{f} is known, and that we wish to recover the original unknown function f. In our case, f corresponds to convolutions of ψ and Ψ. For any t > 0, so-called Salzer summation yields the Gaver-Stehfest inversion formula

f(t, M) = \frac{\ln 2}{t} \sum_{k=1}^{2M} \zeta_k \, \hat{f}\!\left( \frac{k \ln 2}{t} \right),   (44)

where the weights ζ_k are given by

\zeta_k = \frac{(-1)^{M+k}}{M!} \sum_{j=\lfloor (k+1)/2 \rfloor}^{\min(k, M)} j^{M+1} \binom{M}{j} \binom{2j}{j} \binom{j}{k-j}.   (45)

Conveniently, the weights ζ_k are independent of the transform being inverted. This means that after a precision M is chosen, the weights can be stored in a vector ζ of dimension 2M and reused for any number of transforms \hat{f}. We discuss the numerical implementation further in the following sections, but mention here that we use the GNU Multiple Precision arithmetic library GMP for ζ_k, in particular its integer summand, and the related MPFR library for the floating-point arithmetic manipulating \hat{f}. Regarding the weights ζ_k, a useful property for benchmarking is the fact that, for all M ≥ 1,

\sum_{k=1}^{2M} \zeta_k = 0;   (46)

due to the (−1)^{M+k} factor in the definition of ζ_k, these weights oscillate around zero. The Gaver-Stehfest algorithm requires a system precision of approximately 2.2M significant digits, which we input to the constructors of variables in MPFR.

Numerical pipeline
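Before detailing the pipeline, the weights (45) and inversion formula (44) can be checked in a few lines. The sketch below uses Python's exact rational arithmetic in place of GMP/MPFR; the choice M = 7 and the test transform 1/(s + 1) are illustrative assumptions:

```python
import math
from fractions import Fraction

def gs_weights(M):
    """Exact Gaver-Stehfest weights zeta_k, k = 1..2M, as in Eq. (45).
    Rational arithmetic sidesteps the cancellation that the paper's C++
    implementation handles with GMP/MPFR."""
    zeta = []
    for k in range(1, 2 * M + 1):
        s = Fraction(0)
        for j in range((k + 1) // 2, min(k, M) + 1):
            s += (Fraction(j ** (M + 1), math.factorial(M))
                  * math.comb(M, j) * math.comb(2 * j, j) * math.comb(j, k - j))
        zeta.append((-1) ** (M + k) * s)
    return zeta

def gs_invert(fhat, t, M=7):
    """Approximate f(t) from its transform fhat via Eq. (44)."""
    zeta = gs_weights(M)
    return (math.log(2) / t) * sum(
        float(z) * fhat(k * math.log(2) / t) for k, z in enumerate(zeta, 1))

# Example: invert 1/(s + 1), whose original is f(t) = e^{-t}.
approx = gs_invert(lambda s: 1.0 / (s + 1.0), 1.0)
```

The sum of the exact weights is identically zero, as in Eq. (46), and the inversion of a smooth transform is accurate to several digits even in double precision at this modest M.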
We now describe specifically how the inversion procedure of the previous section is used to determine the values E_j, µ_j and ν_j described in preceding sections. For simplicity, we refer to these quantities using the vectors E, µ and ν, whose dimension is determined by the size of the edge state space n in our master equation, which need not be determined a priori. Since E_0 and µ_0 are special cases defined in the Methods section of the main text, we write E = (E_1, E_2, ..., E_n)^T for the edge state distribution, and µ = (µ_1, µ_2, ..., µ_n)^T and ν = (ν_1, ν_2, ..., ν_n)^T for positive and negative edge transitions. The function f(t, M) in the Gaver-Stehfest algorithm corresponds to E_j(η, M), µ_j(η, M) and ν_j(η, M), where M as before is a parameter of the Gaver-Stehfest algorithm that tunes the accuracy of the approximation.

Calculating the distribution E and the rate vectors µ and ν amounts to three separate matrix-vector products. We require the vector ζ of dimension 2M, as per the Gaver-Stehfest algorithm, whose k-th element is given by Supplementary Eq. (45), and the Laplace transforms of ψ and Ψ evaluated at s = k ln(2)/η, denoted \hat{ψ}_k and \hat{Ψ}_k, as per Supplementary Eq. (27). We define n as the maximum edge state, which can be as large as one likes here; it is limited only by M, which must be increased for increasing n. These quantities are then used to populate the n × 2M dimensional matrices \hat{E}, \hat{µ} and \hat{ν}, whose jk-th elements are given by

[\hat{E}]_{jk} = \hat{\Psi}_k \, \hat{\psi}_k^{\,j-1} \, \hat{\Psi}_k,   (47a)
[\hat{\mu}]_{jk} = \hat{\Psi}_k \, \hat{\psi}_k^{\,j},   (47b)
[\hat{\nu}]_{jk} = \hat{\Psi}_k \, \hat{\psi}_k^{\,j-1},   (47c)

with 1 ≤ j ≤ n and 1 ≤ k ≤ 2M. Determining the distribution of edge states E, and the edge transition rates µ and ν, then amounts to the matrix-vector product of \hat{E}, \hat{µ} and \hat{ν}, respectively, with ζ. As an illustration, the edge state distribution is given by

\begin{pmatrix} E_1 \\ E_2 \\ \vdots \\ E_n \end{pmatrix} =
\begin{pmatrix}
\hat{\Psi}_1 \hat{\Psi}_1 & \hat{\Psi}_2 \hat{\Psi}_2 & \cdots & \hat{\Psi}_{2M} \hat{\Psi}_{2M} \\
\hat{\Psi}_1 \hat{\psi}_1 \hat{\Psi}_1 & \hat{\Psi}_2 \hat{\psi}_2 \hat{\Psi}_2 & \cdots & \hat{\Psi}_{2M} \hat{\psi}_{2M} \hat{\Psi}_{2M} \\
\vdots & \vdots & \ddots & \vdots \\
\hat{\Psi}_1 \hat{\psi}_1^{\,n-1} \hat{\Psi}_1 & \hat{\Psi}_2 \hat{\psi}_2^{\,n-1} \hat{\Psi}_2 & \cdots & \hat{\Psi}_{2M} \hat{\psi}_{2M}^{\,n-1} \hat{\Psi}_{2M}
\end{pmatrix}
\begin{pmatrix} \zeta_1 \\ \zeta_2 \\ \vdots \\ \zeta_{2M} \end{pmatrix}.   (48)

These quantities are calculated once at the start of the Runge-Kutta solution of the corresponding system, and do not change thereafter. Note that results have to be scaled by ln(2)/η as per the definition of f(t, M) in the Gaver-Stehfest algorithm, and µ_j and ν_j normalised by E_j, to adhere to their definitions. The complete system to be solved is

E = \hat{E} ζ,   (49a)
µ = \hat{µ} ζ,   (49b)
ν = \hat{ν} ζ.   (49c)

Examples of µ_j and ν_j values are provided in Supplementary Fig. 5 for the case of a gamma distribution ψ.

Implementation of the Gaver-Stehfest algorithm in C++ created confusion for some time, not because of the calculation of the weights ζ_k, but in their product with \hat{f}. The authors of [51] state that while it is clearly required for the weights ζ_k, arbitrary precision is not required for handling \hat{f}. However, we find that double precision is not sufficient in general for \hat{f}, which can be confirmed by using double values in C++, or equivalently, setting a precision of 53 bits in an arbitrary-precision library. We use a combination of the GMP library for the binomial coefficients, factorial, and exponentiation in Supplementary Eq. (45). Division is performed after conversion to MPFR. Similarly, \hat{f} in Supplementary Eq. (44) is determined using MPFR, whether the Laplace transform is known in closed form, or calculated from its integral definition.

Supplementary Note 6. Diffusion speed and noise
In the Methods section of the main text, we define the fraction of infections that are due to noise as ρ_f, and the spreading time relative to the pure-noise case as t_f. As shown in the inset of Fig. 1(a) of the main text, these quantities are very close to interchangeable. That is, any result expressed in terms of t_f produces an almost identical picture in ρ_f, and vice versa.

As a concrete illustration, we replot Fig. 2 of the main text in terms of ρ_f, expressed there in terms of the relative spreading time t_f. Results are shown in Supplementary Fig. 7. Upon inspection we note that the landscape of ρ_f is almost identical to that of t_f, albeit with slightly more variance in the relative noise measurement compared to spreading time, as seen in the reduced sharpness of the colours. In particular, the percolation transition, indicated by the dashed white line in Supplementary Fig. 7(c), remains accurate. As expected, the slowest diffusion implies a complete dependence on external noise, as seen in the annealed regime of each plot, where ρ_f ≈ 1. Conversely, 1/ρ_f measures the multiplicative effect of external noise; this gives the number of total infections for every noise-induced infection, and reaches several orders of magnitude in the quenched regime of Supplementary Fig. 7.

Supplementary Note 7. Mean field approximation
In this Supplementary Note we examine the effectiveness of the mean field assumption of edge state transitions. The edge transition rates calculated in this work assume an infinitely large ensemble of stationary, uncorrelated renewal processes. We expect a carefully implemented stochastic temporal network to agree precisely with theory in the limit of large networks, which is what we confirm. In particular, by viewing the network as a single ensemble of edges, flux measurements indeed show theoretical results for E_j, µ_j and ν_j to be exact on a network-wide scale. However, our analysis involves partitioning the network into 2 × |C| classes, namely, an infected and an uninfected variant for each of the |C| classes (k, m) allowed by the system. Clearly, the densities of nodes over these classes, and the transitions between them, are highly dynamic, emptying and filling over the course of a spreading process. Further, transition rates are heterogeneous, particularly with respect to the transmission rate F_{k,m}, which varies from class to class. The mean field approximation is to assume that edge transition rates for individual classes are the same as for a completely uncorrelated ensemble.

The reason that we expect differences in edge transition rates over these scales is as follows. The emergent rates µ_j and ν_j result from certain assumptions regarding edge statistics at a microscopic level. This is formulated in terms of the history distribution, or distribution of tuples (τ_1, τ_2, ..., τ_j) within each η window, on each edge across a given set, as illustrated in Supplementary Fig. 3. Expressed in these terms, the mean field approach is to assume that the history distribution within a particular class (k, m) is completely uncorrelated, and is equivalent to that of any randomly sampled subset of edges, or indeed the network edge set as a whole.
To see how this assumption may break down, consider the flow of uninfected nodes through C as a survival process, whereby a node enters an uninfected class (k, m), only exiting and reentering an adjacent uninfected class if it does not become infected in the meantime. To further simplify this picture, consider (k, m) to have F_{k,m} = 1, with neighbouring uninfected classes having F_{k,m} = 0, as may occur with a threshold model of infection. In such a survival process, it is nodes whose histories (τ_1, ..., τ_j) produce short intervals in (k, m) that are favourable to survival, exiting to adjacent uninfected classes. History distributions leading to long waiting times in (k, m) are more likely to become infected, with survival times following an exponential distribution with mean 1/F_{k,m}. As such, the history distribution of nodes exiting the class via infection differs from that of nodes that survive, and exit via edge or neighbour transitions. Mechanisms like this, and the related effect due to neighbour transitions, gradually transform the history distribution of individual classes, resulting in deviations in the emergent edge transition rates µ and ν. Although the assumption is broken on the scale of individual classes, it is of course preserved when taking all classes together.

Supplementary Figure 7. Relative density heat maps. Same experiment as for Fig. 2 in the main text, but plotting the relative density of infections that are due to external noise, ρ_f, rather than the normalised spreading time, t_f. Panels show (a) RT, (b) AT and (c) SI dynamics, with annealed and quenched regimes indicated; axes are log σ_τ/⟨τ⟩ and log η/⟨τ⟩, coloured by log ρ_f. Results are almost identical, with the percolation transition preserved for both ρ_f and t_f. See the caption of Fig. 2 of the main text for simulation details.

We confirm this effect in an experiment whose results are shown in Supplementary Fig. 8. Here, we simulate a spreading process, and carefully record the densities of nodes in each class (k, m) at all times t, as well as the flows between classes over measurement windows of length ∆t = 0.02. To allow ∆t to be as small as possible, and approximate the dt in our analytics, we constrain the system to the smallest non-trivial configuration space C. To this end, the degree distribution is 3-regular random, and ψ is given by a power law with α = 2.5, with lower and upper cutoffs of τ = 1.01 and 50, respectively. By choosing η = 1 < 1.01, we ensure a two-level edge state space, where edges are in state j = 0 or 1. The resulting configuration space has size |C| = 20, is connected, and resembles a smaller version of Supplementary Fig. 1. By setting the network size N as large as possible, the node set is diluted as little as possible over C. We plot the flux measurements of the uninfected class with degree vectors k = (2, 1)^T and m = (1, 0)^T. Node dynamics follow a relative threshold rule with φ = 0.4, with background noise causing infection at a rate p. As such, the transmission rate for the class in question is F_{k,m} = p.

Ego transition measurements are shown in Supplementary Fig. 8(a). Since the class initially has density s_{k,m} = 0, with the network initialised to ρ = 0 at t = 0, initial measurements show only a handful of ego transitions per ∆t up to around t = 10, visible here thanks to the logscale. As expected, the rate of ego transitions closely agrees with the input rate p, verified by scaling the total set density as F_{k,m} s_{k,m} ∆t, given by the solid black curve. This transition is guaranteed to agree with theory if the Gillespie algorithm is correctly implemented, and serves as a useful benchmark in flux measurement experiments. Further, neighbour transition rates are verified in Supplementary Fig. 8(b) and (c), for edges of type j = 0 and 1 respectively. Time-dependent rates β_j are calculated as per Supplementary Eq. (10), using the set of |C| = 20 empirical densities s_{k,m}. Scaling s_{k,m} for the class in question by the corresponding products β_j (k − m)_j shows remarkable agreement with the measured fluxes.

Supplementary Figure 8. Examination of mean field assumption for class transition rates. Flux measurement for the set S_{k,m} with k = (2, 1)^T and m = (1, 0)^T during Monte Carlo simulation, on a 3-regular random degree distribution with relative threshold φ = 0.4 and background noise rate p. The interevent time distribution ψ is a power law with α = 2.5 and cutoffs τ = 1.01 and 50; the memory duration is η = 1. Panels show (a) ego, (b, c) neighbour, (d, e) positive edge and (f) negative edge transitions; flux is measured over intervals of ∆t = 0.02, and compared with s_{k,m} scaled by the indicated theoretical coefficients, shown as solid lines.

Flux measurements of positive edge transitions for uninfected and infected neighbours are shown in Supplementary Fig. 8(d) and (e), respectively. While agreement is excellent in (d), a clear deviation emerges in (e). Although the disagreement appears minor on a logarithmic scale, the theoretical µ is off by roughly 20%. The rate of negative edge transitions in (f) is in good agreement with theory. A study of the complete state space shows that deviations as in (e) are common, but not systematic, with the mean field estimates µ and ν sometimes overestimating, and sometimes underestimating, the measured fluxes. The master equation solution of ρ for this experiment, not shown here, miscalculated the overall spreading speed by about 15%. It is likely that the small configuration space here contributed to the error. In much larger systems, like those in the main text, this effect is likely diluted, especially since µ and ν do not systematically over- or underestimate the class-level edge transition rates. That is, a cancellation effect might emerge.

We confirm that the mean field assumption breaks down due to heterogeneities in transmission rates by studying a purely noise-driven variant of the above experiments. That is, F_{k,m} = p for all classes (k, m). The same flux measurements, not shown here, are in perfect agreement with theory in such a setting. Further, the relative set sizes for constant-m rows of configuration space are exactly what one would expect given a degree distribution p_k, and a probability distribution E_j that a randomly selected edge is in state j.

Supplementary Note 8. Skewness and entropy
In this Supplementary Note we examine the skewness and differential entropy of the interevent time distributions used in this work, namely the lognormal, Weibull and gamma distributions. Our goal is to understand how these distributions differ after carefully controlling for their mean and standard deviation, µ and σ respectively, and to this end we propose skewness and entropy as measures that can be understood as ranking various distributions ψ by the value of their effective sparsity ξ_E, discussed in the main text.

We control for both the mean and standard deviation when comparing distributions in this work. For a random variable X, this is equivalent to controlling the first and second raw moments, E[X] = µ and E[X²] = µ² + σ². We wish to examine the differences that remain between our chosen distributions after imposing these constraints. To this end, it is logical to examine the third raw moment E[X³]. This is usually done by calculating the skewness γ, typically defined as the third standardised moment, or E[(X − µ)³] normalised by σ³. The normalisation renders the moment scale invariant, meaning the information encoded in the skewness relates only to the "shape" of the distribution in question, and not its variance. Since we are interested in comparing skewness for constant mean and standard deviation, we use the expression

\gamma = \frac{E[(X - \mu)^3]}{\sigma^3} = \frac{E[X^3] - 3\mu\sigma^2 - \mu^3}{\sigma^3}.   (50)

Clearly, skewness γ differs from one distribution to another only in the third raw moment E[X³], since we keep µ and σ constant. Plotting skewness γ as a function of σ for constant µ = 1 leads to the plot in Supplementary Fig. 9, for the lognormal, Weibull and gamma distributions.

Supplementary Figure 9. Higher order moments of probability distributions. Comparing skewness and entropy of probability distributions as a function of the standard deviation, with the mean fixed to µ = 1. (a) The third raw moment E[X³], and (b) skewness γ, for the lognormal, Weibull and gamma distributions. The third raw moment is the only term that differs in the calculation of skewness. (c) Entropy h follows the same trend as (a) and (b). (d) We are interested in the connection between these quantities and ξ_E, defined in the main text, with η = 1.

The large differences in skewness here correspond to observations regarding effective sparsity ξ_E in earlier parts of this work. As such, the skewness of a distribution provides a good rule-of-thumb for comparing ξ_E for different distributions.

In addition, we calculate the differential, or information, entropy for each distribution, defined as

h = -\int_0^\infty f(x) \ln f(x)\, dx,   (51)

for distributions defined for x ∈ (0, ∞), as is the case for the lognormal, Weibull and gamma distributions. The motivation for considering the differential entropy is the observation that it was the lognormal distribution, the maximum entropy distribution for which the mean and variance of ln(X) are specified, that produced the fastest diffusion times in the main text.

As shown in Supplementary Fig. 9, the relative values of skewness and entropy agree qualitatively with expectation. Simply put, the relative values of skewness γ, shown in plot (b), and differential entropy h, shown in plot (c), agree qualitatively with the relative values of effective sparsity ξ_E, shown in plot (d). This supports the rule-of-thumb that the greater the skewness and entropy, the lower the effective sparsity.

Supplementary Note 9. Probability distributions
In this Supplementary Note we detail the probability distributions used in this work, in particular those used for the interevent time distribution ψ and its tail distribution Ψ. For simplicity, the notation used in this Supplementary Note is entirely self-contained, and we associate pdfs f(x) with ψ(τ), and their cdfs F(x) with 1 − Ψ(τ).

Lognormal distribution. The lognormal distribution is defined with respect to an underlying normal distribution with mean µ̃ and variance σ̃². These are related to the lognormal mean and variance µ and σ² by the relations given in Supplementary Table I, which can be straightforwardly inverted. Provided µ and σ, it has the largest skewness γ of all distributions studied here, as well as the largest differential entropy h. This is not surprising, given that the lognormal can be derived using maximum entropy principles, as discussed in the previous section, and illustrated in Supplementary Fig. 9.

Gamma distribution. The gamma distribution is related to the gamma function Γ(s), described below. The expressions for its mean and variance in Supplementary Table I can be easily inverted, resulting in expressions for α and β that provide the desired moments. Note that we set µ = 1 for the temporal network experiments in this work, meaning that when the shape parameter α = β = 1, coinciding with σ = 1, we recover the exponential distribution. For values α < 1, meaning σ > 1, the qualitative shape of the exponential is maintained, with an increasingly heavy tail. In contrast, when α > 1, meaning σ < 1, f goes to zero in the small-x limit. For large α, meaning small σ, the gamma distribution tends to the Dirac delta function. In Supplementary Fig. 9 we observe that the gamma distribution is the least skewed of the distributions considered here, for comparable µ and σ. Entropy is given using ψ, the digamma function, defined in the following section under the Weibull distribution.

Variants of the gamma function. We describe here the necessary approximations in order to numerically construct the cdf of the gamma distribution. Unfortunately, the cdf as stated here is circular, with no closed form expression available. In this section, we discuss a number of series expansions that are necessary to numerically evaluate γ(α, xβ), the lower incomplete gamma function. The gamma function Γ(s) is defined, and related to its incomplete variants, as

\Gamma(s) = \int_0^\infty t^{s-1} e^{-t}\, dt   (52a)
          = \int_0^x t^{s-1} e^{-t}\, dt + \int_x^\infty t^{s-1} e^{-t}\, dt   (52b)
          = \gamma(s, x) + \Gamma(s, x).   (52c)

Supplementary Table I.
Properties of probability distributions.
Properties of the two-parameter probability distributions used to model interevent times in this work. For each we note the probability density function f_X(x), the cumulative distribution function F_X(x), the mean µ and variance σ², as well as the third raw moment E[X³], skewness γ, and differential entropy h.

f_X(x):
  lognormal†:  \frac{1}{x \tilde{\sigma} \sqrt{2\pi}} \exp\!\left[ -\frac{(\ln x - \tilde{\mu})^2}{2 \tilde{\sigma}^2} \right]
  Weibull‡:  \frac{k}{\lambda} \left( \frac{x}{\lambda} \right)^{k-1} e^{-(x/\lambda)^k}
  gamma:  \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha - 1} e^{-x\beta}

F_X(x):
  lognormal:  \frac{1}{2} + \frac{1}{2} \operatorname{erf}\!\left( \frac{\ln x - \tilde{\mu}}{\sqrt{2}\, \tilde{\sigma}} \right)
  Weibull:  1 - e^{-(x/\lambda)^k}
  gamma:  \gamma(\alpha, x\beta) / \Gamma(\alpha)

µ:
  lognormal†:  \tilde{\mu} = \ln\!\left( \mu^2 / \sqrt{\mu^2 + \sigma^2} \right)
  Weibull:  \lambda \Gamma(1 + 1/k)
  gamma:  \alpha / \beta

σ²:
  lognormal†:  \tilde{\sigma}^2 = \ln\!\left( (\mu^2 + \sigma^2) / \mu^2 \right)
  Weibull:  \lambda^2 \left[ \Gamma(1 + 2/k) - \Gamma^2(1 + 1/k) \right]
  gamma:  \alpha / \beta^2

E[X³]:
  lognormal:  (\mu^2 + \sigma^2)^3 / \mu^3
  Weibull:  \lambda^3 \Gamma(1 + 3/k)
  gamma:  \mu^3 + 3\mu\sigma^2 + 2\sigma^4/\mu

γ:
  lognormal:  3\sigma/\mu + \sigma^3/\mu^3
  Weibull:  omitted‡
  gamma:  2\sigma/\mu

h:
  lognormal:  \tilde{\mu} + \frac{1}{2} \ln(2\pi e \tilde{\sigma}^2)
  Weibull:  \gamma_e (1 - 1/k) + \ln(\lambda/k) + 1
  gamma:  \alpha - \ln\beta + \ln\Gamma(\alpha) + (1 - \alpha)\psi(\alpha)

† The provided mean and variance rows give the parameters of the underlying normal distribution, with mean µ̃ and variance σ̃².
‡ We omit the Weibull skewness since it does not simplify like the other distributions. Further, γ_e is the Euler-Mascheroni constant.

In other words, the upper and lower incomplete gamma functions are defined by partitioning the integral according to (0, ∞) = (0, x] ∪ [x, ∞), giving γ(s, x) and Γ(s, x), respectively. Sampling from the lower incomplete gamma function will be crucial when initialising our temporal network system at steady state. The choice of series approximation for the lower incomplete gamma function depends on the shape parameter α, and in turn the standard deviation σ. For small α, meaning large σ, we use

\gamma(s, x) = e^{-x} x^s \Gamma(s) \sum_{n=0}^{\infty} \frac{x^n}{\Gamma(s + n + 1)}.   (53)

This well known approximation, although efficient for large values of the standard deviation, becomes prohibitively slow for very large values of α, meaning very small values of σ. At this scale, specifically when α > 1, meaning σ < 1, we approximate the lower incomplete gamma function as

\gamma(s, x) = e^{-x} x^s \sum_{n=0}^{\infty} \frac{x^n}{(s)_{n+1}},   (54)

which can be computed recursively using a small number of multiplication and division operations at each step. Here, (s)_{n+1} is the Pochhammer symbol. Unfortunately this approximation fails for large σ, and must only be used in the small-σ regime, which we do purely for efficiency, since the preceding approximation converges everywhere.

Weibull distribution. Closely related to the gamma distribution is the Weibull distribution. As can be seen in Supplementary Table I, the Weibull distribution cannot be easily parameterised by its mean and standard deviation, as was the case for the lognormal and gamma distributions. The tuple (k, λ) providing the desired mean and standard deviation can be found using gradient descent. Note that, like the gamma distribution, when k = λ = 1 we recover the exponential distribution. Like α in the gamma distribution, k controls the shape, with large k tending towards the Dirac delta function, and small k giving an increasingly right-skewed, heavy-tailed form.

Tuning parameters using gradient descent. We could not find a reference addressing the problem of parameterising the Weibull distribution by its mean µ and standard deviation σ. Not finding an existing solution to this problem, we find the values of k and λ giving the desired µ and σ using gradient descent. In the following calculations we use the variance, denoted ν = σ², for simplicity, given the form of the variance in Supplementary Table I. We find k and λ by locating the minimum of the loss surface defined by the function

l(\mu_i, \nu_i) = \left( \frac{\mu_i - \mu}{\mu} \right)^2 + \left( \frac{\nu_i - \nu}{\nu} \right)^2,   (55)

where µ_i = µ_i(k_i, λ_i) and ν_i = ν_i(k_i, λ_i), as per Supplementary Table I, are the values of the mean and variance at the i-th step of the procedure, and are continuous variables here. The values µ and ν are the target, and are considered constant in the following.
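A sketch of this descent, with finite-difference gradients standing in for the closed-form derivatives derived in the remainder of this note, and a capped, backtracked step in (log k, log λ) standing in for the adaptive step size; the step cap, iteration count and double precision are illustrative assumptions:

```python
import math

def weibull_moments(k, lam):
    """Mean and variance of Weibull(k, lambda), as in Supplementary Table I."""
    m = lam * math.gamma(1.0 + 1.0 / k)
    v = lam * lam * (math.gamma(1.0 + 2.0 / k) - math.gamma(1.0 + 1.0 / k) ** 2)
    return m, v

def fit_weibull(mu, nu, steps=2000, h=1e-6):
    """Find (k, lambda) matching a target mean mu and variance nu by
    descending the relative-error loss (55) on logarithmic scales."""
    def loss(lk, ll):
        m, v = weibull_moments(math.exp(lk), math.exp(ll))
        return ((m - mu) / mu) ** 2 + ((v - nu) / nu) ** 2
    lk = ll = 0.0                   # start from the exponential, k = lambda = 1
    cur = loss(lk, ll)
    for _ in range(steps):
        gk = (loss(lk + h, ll) - loss(lk - h, ll)) / (2 * h)
        gl = (loss(lk, ll + h) - loss(lk, ll - h)) / (2 * h)
        norm = math.hypot(gk, gl)
        if norm < 1e-12:            # already at a stationary point
            break
        t = min(0.1, 0.2 / norm)    # cap the displacement in log space
        while loss(lk - t * gk, ll - t * gl) >= cur and t > 1e-14:
            t *= 0.5                # backtrack until the loss decreases
        lk, ll = lk - t * gk, ll - t * gl
        cur = loss(lk, ll)
    return math.exp(lk), math.exp(ll)

# Example: target mean 1 and standard deviation 0.5 (variance 0.25).
k, lam = fit_weibull(1.0, 0.25)
```

As in the procedure described here, the optimisation would be run once per (µ, σ) pair and the resulting tuple stored for reuse.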
Due to the nature of the experiments in the main text, it is crucial to normalise the relative error in each term by µ and ν respectively. This is because ν varies over orders of magnitude, while µ remains fixed. For the same reason, the displacement at each step of the algorithm is determined on logarithmic scales. We perform this optimisation once, and store the resulting (µ, ν, k, λ) tuple in a table for reuse.

The gradient of l(µ_i, ν_i) is

\nabla l(\mu_i, \nu_i) = \partial_k l(\mu_i, \nu_i)\, \hat{k} + \partial_\lambda l(\mu_i, \nu_i)\, \hat{\lambda},   (56)

with k and λ components given by

\partial_k l(\mu_i, \nu_i) = 2 \left( \frac{\mu_i - \mu}{\mu^2} \right) \partial_k \mu_i + 2 \left( \frac{\nu_i - \nu}{\nu^2} \right) \partial_k \nu_i   (57)

and

\partial_\lambda l(\mu_i, \nu_i) = 2 \left( \frac{\mu_i - \mu}{\mu^2} \right) \partial_\lambda \mu_i + 2 \left( \frac{\nu_i - \nu}{\nu^2} \right) \partial_\lambda \nu_i,   (58)

with partial derivatives

\partial_\lambda \mu_i = \Gamma(1 + 1/k),   (59)

\partial_k \mu_i = -\frac{\lambda}{k^2} \Gamma(1 + 1/k)\, \psi(1 + 1/k),   (60)

and

\partial_\lambda \nu_i = 2\lambda \left[ \Gamma(1 + 2/k) - \Gamma^2(1 + 1/k) \right],   (61)

\partial_k \nu_i = -\frac{2\lambda^2}{k^2} \left[ \Gamma(1 + 2/k)\, \psi(1 + 2/k) - \Gamma^2(1 + 1/k)\, \psi(1 + 1/k) \right],   (62)

respectively. Here ψ is the so-called digamma function, which was previously required for the calculation of the entropy of the gamma distribution. It is defined as the logarithmic derivative of the gamma function, or

\psi(z) = \frac{d}{dz} \ln \Gamma(z) = \frac{\Gamma'(z)}{\Gamma(z)}.   (63)

It can be found using a simple series approximation,

\psi(z) = -\gamma_e + \sum_{n=0}^{\infty} \left( \frac{1}{n+1} - \frac{1}{n+z} \right),   (64)

for z ≠ 0, −1, −2, ..., where γ_e here denotes the Euler-Mascheroni constant. Although the series is infinite, in practice it converges quite rapidly, even when applying strict thresholds. Along with an inputted step size, this provides the next point (µ_i, ν_i, k_i, λ_i) in the procedure. Adaptive step size is incorporated in gradient descent, where the step size h_i = l_i if l_i < 0.1, and h_i = 0.1 otherwise, for the updates of k_i and λ_i. Stochastic gradient descent techniques may be incorporated to improve the convergence rate.

Note that a multiple-precision library appears to be necessary even when the desired precision in the loss function l is not beyond the bounds of default machine precision, such as double and long double. As always, this induces a cost in terms of computation time, so we calculate all the tuples (µ, σ, k, λ