Combinatorial approach to spreading processes on networks

Abstract

Stochastic spreading models defined on complex network topologies are used to mimic the diffusion of diseases, information, and opinions in real-world systems. Existing theoretical approaches to the characterization of the models in terms of microscopic configurations rely on some approximation of independence among dynamical variables, thus introducing a systematic bias in the prediction of the ground-truth dynamics. Here, we develop a combinatorial framework based on the approximation that spreading may occur only along the shortest paths connecting pairs of nodes. The approximation overestimates dynamical correlations among node states and leads to biased predictions. The systematic bias, however, points in the opposite direction to that of existing approximations. We show that the combination of the two biased approaches generates predictions of the ground-truth dynamics that are more accurate than those given by either approximation used in isolation. We further take advantage of the combinatorial approximation to characterize theoretical properties of some inference problems, and show that the reconstruction of microscopic configurations is very sensitive to both the place where and the time when partial knowledge of the system is acquired.



Dario Mazzilli and Filippo Radicchi
Center for Complex Networks and Systems Research, Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, Indiana 47408, USA

I. INTRODUCTION

Stochastic spreading models running on top of network topologies have been used to study a large variety of real-world dynamical processes [1–4]. Examples include the spread of diseases [5, 6], the diffusion of information and opinions [7–10], the propagation of bank failures [11], blackout cascades [12], the development of countries [13], and avalanche dynamics in neural networks [14].

In spite of their simplicity, many spreading models can be solved exactly only on very specific network topologies; extensive simulations and/or theoretical approximations are generally required to characterize their properties on arbitrary networks [1]. A complete solution of a spreading model on a network consists of associating a probability with every possible microscopic configuration of the system at each instant of time. Such detailed knowledge may not be required in applications where the interest is centered on the macroscopic behavior of the system, e.g., outbreak size and/or duration [15–18]. It is, however, required in many other applications of central importance for spreading processes on networks, for example, the problems of inferring the identity of patient zero [19, 20], optimal sampling [21], and influence maximization [22]. Existing theoretical approaches to the microscopic description of spreading processes on networks rely on some approximation of independence among the state variables of the individual nodes. For example, the individual-based mean-field approximation assumes complete dynamical independence among the state variables of the individual nodes [1, 23, 24]. Message-passing approximations, such as those considered in Refs. [25–28], improve over the mean-field approximation by relying on conditional independence among pairs of variables, providing exact predictions on tree-structured networks and excellent predictive power on arbitrary networks.
These approximations share the bias of neglecting, to some extent, dynamical correlations that are essential for the exact description of unidirectional spreading processes. As a consequence, their predictions are systematically biased towards overestimating the infection probability of individual nodes. From the computational point of view, these approximations allow for the quick computation of marginal probabilities. However, they lack flexibility with respect to changes in the dynamical and/or topological details of the process: if the initial conditions of the dynamics, the parameter values of the spreading model, or the topology of the network are changed, the approximations must be solved afresh by iteration.

In this paper, we introduce an approximation based on a combinatorial calculation of the spreading probability along the shortest path between pairs of nodes. The approximation neglects that an infection may propagate along paths longer than the shortest one. We derive closed-form expressions of the approximation for the Susceptible-Infected and Susceptible-Infected-Recovered models started from a single source of infection. On trees, our approach is exact, providing a geometric interpretation of correlations and joint probabilities between pairs of nodes. We leverage this intuitive interpretation to study properties of the patient-zero problem and to compare different strategies of acquiring information about network configurations from partial observations. In networks with loops, the approach systematically underestimates the probability of infection of individual nodes. The bias goes in the opposite direction to that of existing approximations for spreading processes on networks. The simultaneous use of the two types of approximations allows us to define a region where the true value of the probabilities lies, and their combination improves the accuracy of each individual approximation.
We stress that the computationally demanding component of our approximation consists in finding the shortest paths between pairs of nodes. Probabilities of node states are then determined solely on the basis of such geometric knowledge. As a result, exploring the parameter space of a spreading model is very efficient. We are aware of existing approaches that approximate spreading as happening only on the shortest paths among pairs of nodes [29]. However, we are not aware of a full theoretical development of the approximation, consisting of closed-form combinatorial solutions, such as the one presented here.

The paper is organized as follows. In section II, we introduce the spreading models considered in the paper. We further describe the individual-based mean-field approximation and our combinatorial approximation, and compare the accuracy of the approximations in predicting ground-truth spreading dynamics. In the comparisons, we also include the dynamic message-passing approximation. In section III, we apply our approximation to trees, and characterize some properties of inference problems associated with spreading, including the identification of patient zero and the maximization of system information from the local observation of the states of some nodes. In section IV, we summarize our results and indicate viable extensions of our work.

II. THEORETICAL APPROXIMATIONS FOR SPREADING DYNAMICS ON NETWORKS

We assume that spreading occurs on a quenched, undirected, and unweighted network composed of $N$ nodes. The topology of the network is fully specified by the $N \times N$ adjacency matrix $A$, whose generic element $A_{ij} = A_{ji} = 1$ if nodes $i$ and $j$ are connected, whereas $A_{ij} = A_{ji} = 0$ otherwise. We assume that the network does not contain any self-connection, so that $A_{ii} = 0$ $\forall i$. Without loss of generality, we further assume that the network is composed of a single connected component, so that every node is reachable from any other node, and spreading from any single initial source node has the potential to involve the entire system. We indicate with $\ell_{ij}$ the distance between nodes $i$ and $j$ in the network, equal to the minimal number of edges that separate the two nodes. Note that the symmetry of the network implies $\ell_{ij} = \ell_{ji}$.

We consider the discrete-time version of two very popular models of spreading dynamics: the Susceptible-Infected (SI) and the Susceptible-Infected-Recovered (SIR) models [1, 30, 31].

In the SI model, every node $i$ at time $t$ can be found in one of two states, either the susceptible state $\sigma_i^{(t)} = S$ or the infected state $\sigma_i^{(t)} = I$. At each discrete stage of the dynamics $t > 0$, every node $i$ such that $\sigma_i^{(t-1)} = I$ tries to infect every neighbor $j$, i.e., $A_{ij} = 1$, in the susceptible state. The infection is successfully transmitted with spreading probability $0 \leq \beta \leq 1$. A successful spreading event consists in changing the state of node $j$ as $\sigma_j^{(t-1)} = S \to \sigma_j^{(t)} = I$. This means that the newly infected node $j$ can attempt to further spread the infection from time $t + 1$ on. After all spreading attempts have been considered, time increases as $t \to t + 1$. The dynamics is such that, as long as at least one node is initially infected and $\beta > 0$, all nodes will, sooner or later, end up in the infected state.

The SIR model is slightly more sophisticated than the SI model, as the generic node $i$ may also be found in the recovered state $\sigma_i^{(t)} = R$. Spreading events happen in exactly the same way as for the SI model. However, after all spreading attempts of stage $t$ have been considered, every node $i$ such that $\sigma_i^{(t-1)} = I$ may recover with probability $0 \leq \gamma \leq 1$. Recovery of node $i$ consists in the change of state $\sigma_i^{(t-1)} = I \to \sigma_i^{(t)} = R$. After all recovery attempts have been considered, time increases as $t \to t + 1$. Recovered nodes do not participate in the spreading dynamics, in the sense that they can neither infect their susceptible neighbors nor be re-infected by their infected neighbors. Potentially many different final configurations are reachable, depending on the choice of the parameters $\beta$ and $\gamma$ and on the initial configuration of the system.

A. The susceptible-infected model

In this section, we derive analytical expressions for the SI model on arbitrary network topologies. We will first consider the individual-based mean-field approximation (IBMFA) [1, 24, 32]. Then, we will derive a novel approximation based on combinatorial arguments. The novel approximation is exact on trees, and is expected to perform well on sparse tree-like networks. We name the method the shortest-path combinatorial approximation (SPCA). We will characterize some properties of SPCA, and compare the prediction accuracy of the novel approximation against IBMFA and the so-called dynamic message-passing approximation (DMPA) [26]. DMPA is the most accurate available approximation for the prediction of marginal probabilities in the SI (and SIR) model. However, given its complicated form, we will not report the DMPA equations below. The interested reader can find the equations, and their derivation, in Ref. [26].
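The ground-truth values against which IBMFA, SPCA, and DMPA are compared come from numerical simulations of the discrete-time dynamics defined above. The following is a minimal Monte Carlo sketch of those rules, assuming an adjacency-list representation of the network; the function names `simulate_infection_times` and `estimate_Q` are hypothetical, and this is not the authors' simulation code. Setting `gamma = 0` recovers the SI model.

```python
import random


def simulate_infection_times(adj, source, beta, gamma=0.0, t_max=50, rng=None):
    """One realization of the discrete-time SI (gamma = 0) or SIR model.

    adj maps each node to the list of its neighbours (undirected graph).
    Returns {node: stage at which it was infected}; never-infected nodes are absent.
    """
    if rng is None:
        rng = random.Random()
    state = {v: "S" for v in adj}
    state[source] = "I"
    t_inf = {source: 0}
    for t in range(1, t_max + 1):
        # Nodes with sigma^(t-1) = I attempt to infect their susceptible neighbours.
        infected_prev = [v for v in adj if state[v] == "I"]
        if not infected_prev:
            break  # SIR only: no infected nodes left, the outbreak is over
        newly = set()
        for v in infected_prev:
            for u in adj[v]:
                # Skipping a node already infected at this stage does not change the outcome.
                if state[u] == "S" and u not in newly and rng.random() < beta:
                    newly.add(u)
        for u in newly:
            state[u] = "I"
            t_inf[u] = t
        # After all spreading attempts of stage t, nodes infected at t-1 may recover.
        for v in infected_prev:
            if rng.random() < gamma:
                state[v] = "R"
    return t_inf


def estimate_Q(adj, source, beta, gamma=0.0, t_max=50, runs=2000, seed=0):
    """Monte Carlo estimate of Q^(t)_{s->i} for every node i and t = 0..t_max."""
    rng = random.Random(seed)
    counts = {v: [0] * (t_max + 1) for v in adj}
    for _ in range(runs):
        for v, t in simulate_infection_times(adj, source, beta, gamma, t_max, rng).items():
            counts[v][t] += 1
    Q = {}
    for v, row in counts.items():
        total, cumulative = 0, []
        for c in row:
            total += c  # cumulative sum over stages, as in Eq. (2)
            cumulative.append(total / runs)
        Q[v] = cumulative
    return Q
```

For example, `estimate_Q({0: [1], 1: [0, 2], 2: [1]}, 0, beta=0.5)` estimates $Q^{(t)}_{0 \to i}$ on a three-node path. The statistical error decreases as $1/\sqrt{\text{runs}}$.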

Problem setting

We assume that spreading is initiated by a single infected node $s$, i.e., $\sigma_s^{(0)} = I$. Node $s$ is the source of the infection, or patient zero. All other nodes $j$ are initially in the susceptible state, i.e., $\sigma_j^{(0)} = S$ for all $j \neq s$. Our goal is to fully characterize the probability of the microscopic state of every individual node at each stage of the dynamics. The main quantity that we focus on is

$$P^{(t)}_{s \to i} = \mathrm{Prob.}\left[ \sigma_i^{(t-1)} = S \to \sigma_i^{(t)} = I \;\middle|\; \sigma_j^{(0)} = S \;\forall j \neq s,\ \sigma_s^{(0)} = I \right], \tag{1}$$

i.e., the probability that the infection, started from the source node $s$ at time $t = 0$, reaches node $i$ after exactly $t$ stages of the dynamics. We will consider different expressions for $P^{(t)}_{s \to i}$ on the basis of the above-mentioned approximations. Once $P^{(t)}_{s \to i}$ is given, several other quantities useful in the characterization of the model dynamics can be immediately computed. For example, to obtain the probability $Q^{(t)}_{s \to i}$ that node $i$ has been infected at time $t$ or earlier, we simply perform the sum

$$Q^{(t)}_{s \to i} = \sum_{r=0}^{t} P^{(r)}_{s \to i}. \tag{2}$$

Based on our assumption on the initial configuration, we automatically have $P^{(0)}_{s \to s} = Q^{(0)}_{s \to s} = 1$, and $P^{(0)}_{s \to i} = Q^{(0)}_{s \to i} = 0$ for all $i \neq s$.

$P^{(t)}_{s \to i}$ and $Q^{(t)}_{s \to i}$ are probabilities conditioned on spreading being initiated by node $s$. We can relax this condition and consider arbitrary initial configurations consisting of one unknown source. If we indicate with

$$z_s = \mathrm{Prob.}\left[ \sigma_j^{(0)} = S \;\forall j \neq s,\ \sigma_s^{(0)} = I \right] \tag{3}$$

the probability that node $s$ is the initial spreader, then the probability $P^{(t)}_{\to i}$ that node $i$ receives the infection, from an arbitrary initial spreader, at exactly stage $t$ of the dynamics is estimated as

$$P^{(t)}_{\to i} = \sum_{s=1}^{N} z_s P^{(t)}_{s \to i}. \tag{4}$$

Similarly, the probability $Q^{(t)}_{\to i}$ that node $i$ receives the infection from an arbitrary single source at time $t$ or earlier is given by

$$Q^{(t)}_{\to i} = \sum_{s=1}^{N} z_s Q^{(t)}_{s \to i}. \tag{5}$$

Individual-based mean-field approximation

The individual-based mean-field approximation (IBMFA) consists in neglecting dynamical correlations among variables, so that every node $i$ feels the average behavior of its neighbors, where the average is taken over an infinite number of independent realizations of the spreading process [1, 24, 32]. Under IBMFA, we can write

$$P^{(t)}_{s \to i} = \left[ 1 - Q^{(t-1)}_{s \to i} \right] \left[ 1 - \prod_{j=1}^{N} \left( 1 - A_{ji}\, \beta\, Q^{(t-1)}_{s \to j} \right) \right]. \tag{6}$$

Eq. (6) is derived as follows. The probability for node $i$ to be infected at exactly time $t$ is given by the product of the probability that the node has not been infected at any time earlier than $t$, i.e., $1 - Q^{(t-1)}_{s \to i}$, and the probability that it receives the infection from at least one of its infected neighbors, i.e., $1 - \prod_{j=1}^{N} \left( 1 - A_{ji} \beta Q^{(t-1)}_{s \to j} \right)$. The latter term is computed as the product of the individual contributions of the node's neighbors, thus assuming complete independence of their states.

Eq. (6), together with Eq. (2), defines a system of $N$ equations, one for every node $i$. Solutions are obtained by iteration, starting from the imposed initial conditions.

Limitations of IBMFA are apparent. Neglecting dynamical correlations leads to the possibility for the infection to spread in opposite directions along the same edge, a situation that is impossible in SI dynamics. Because of this, the cumulative probability $Q^{(t)}_{s \to i}$ always overestimates the true probability value, thus providing a consistent upper bound for the ground truth. Approximations more precise than IBMFA can be obtained by accounting for dynamical correlations, tracing back the evolution of the system over a given number of time steps, or considering additional variables representing the states of pairs, triplets, etc. of nodes [6, 33]. Better approximations require fully accounting for the unidirectional motion of the infection along network edges.
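The iteration of Eq. (6) together with Eq. (2) can be sketched in a few lines. This is a minimal illustration, assuming adjacency lists instead of the matrix $A$ (the product in Eq. (6) effectively runs over neighbors only, since every term with $A_{ji} = 0$ contributes a factor 1); the name `ibmfa_si` is hypothetical.

```python
def ibmfa_si(adj, source, beta, t_max):
    """Iterate Eq. (6) with Eq. (2) for the SI model.

    adj maps each node to the list of its neighbours.
    Returns {node: [Q^(0), Q^(1), ..., Q^(t_max)]} for the given source.
    """
    Q = {i: [0.0] * (t_max + 1) for i in adj}
    Q[source][0] = 1.0  # initial condition P^(0)_{s->s} = Q^(0)_{s->s} = 1
    for t in range(1, t_max + 1):
        for i in adj:
            # Probability that no neighbour transmits, with neighbours treated as independent.
            no_transmission = 1.0
            for j in adj[i]:
                no_transmission *= 1.0 - beta * Q[j][t - 1]
            p_t = (1.0 - Q[i][t - 1]) * (1.0 - no_transmission)  # Eq. (6)
            Q[i][t] = Q[i][t - 1] + p_t                          # Eq. (2)
    return Q
```

On the three-node path 0-1-2 with source 0 and $\beta = 0.5$, the sketch gives $Q^{(3)}_{0 \to 1} = 0.890625$, whereas the exact SI value on this tree is $1 - 0.5^3 = 0.875$. The excess comes from the spurious back-flow of infection from node 2 to node 1 described above.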
Lokhov and collaborators [26] account for the unidirectional motion of the infection by relying on conditional independence among variables and writing equations for messages spreading along individual edges, in the same spirit as approximations used in the study of percolation models on networks [21, 34–37]. Their dynamic message-passing approximation (DMPA) is exact on trees. In networks with loops, DMPA still overestimates the true $Q^{(t)}_{s \to i}$, as the infection is allowed to travel in opposite directions on the same edge, although not immediately.

Shortest-path combinatorial approximation

The exact computation of $P^{(t)}_{s \to i}$ requires the enumeration of all possible ways in which the infection starting from node $s$ reaches node $i$ in exactly $t$ time steps. Such an enumeration includes all possible paths between the two nodes, and all possible combinations for the propagation of the infection along these paths. For an arbitrary network, the number of possibilities grows exponentially with the system size, making the exact computation of $P^{(t)}_{s \to i}$ infeasible. Here, we propose a way to approximate $P^{(t)}_{s \to i}$ from below by simply assuming that the spread of the infection may happen only along the shortest path connecting nodes $s$ and $i$, and then enumerating all possible ways in which the infection can propagate along such a path. We name the approximation the shortest-path combinatorial approximation (SPCA). We stress that SPCA is exact on trees, where each pair of nodes is connected by a unique path. We expect SPCA to provide a tight lower bound for the ground-truth value of $P^{(t)}_{s \to i}$ in sparse loopy networks. Under SPCA, $P^{(t)}_{s \to i}$ is uniquely determined by the distance $\ell_{si} > 0$ between nodes $s$ and $i$. We can write

$$P^{(t)}_{s \to i} = \binom{t-1}{\ell_{si}-1} \beta^{\ell_{si}} (1-\beta)^{t-\ell_{si}}. \tag{7}$$

Eq. (7) is derived as follows. We can think of the path $s \to i$ as composed of two pieces, $s \to j$ and $j \to i$, where $j$ is the nearest neighbor of node $i$ along the path $s \to i$, so that the distance between $s$ and $j$ is $\ell_{sj} = \ell_{si} - 1$, as in Figure 1. At stage $t$ of the dynamics, the final spreading attempt $j \to i$ must be, by definition of $P^{(t)}_{s \to i}$, successful. This elementary event happens with probability $\beta$. However, before the final step, the infection must have reached node $j$ and not moved further than node $j$ in the preceding $t - 1$ stages. This happens with probability $\binom{t-1}{\ell_{si}-1} \beta^{\ell_{si}-1} (1-\beta)^{t-\ell_{si}}$, corresponding to the binomial probability of observing exactly $\ell_{si} - 1$ successful spreading events in $t - 1$ attempts. Eq. (7) is obtained as the product of the probabilities associated with the two pieces $s \to j$ and $j \to i$ of the path $s \to i$.

$Q^{(t)}_{s \to i}$ is obtained relying on Eq. (2). It is easy to check that $\lim_{t \to \infty} Q^{(t)}_{s \to i} = 1$ for all nodes $i$ and for any source node $s$, as expected for the SI model. Regardless of its distance from the source, every node will eventually be infected.

Figure 1. The shortest-path combinatorial approximation. The figure illustrates the rationale behind Eq. (7). Here, we represent a specific sequence of spreading attempts that allows the infection to spread from the source node $s$ to node $i$ at distance $\ell_{si} = …$ at time $t = …$. Any such specific sequence happens with probability $\beta^{\ell_{si}} (1-\beta)^{t-\ell_{si}}$.

To get a sense of the magnitude of the approximation error introduced by SPCA, we consider the case where nodes $s$ and $i$ are connected by two independent paths of lengths $\ell_{si}$ and $\ell_{si} + d\ell$, respectively. We focus our attention on the probability $Q^{(t)}_{s \to i}$ that the infection reaches node $i$ at time $t$ or earlier. This probability is given by the likelihood that the infection propagates along at least one of the two independent paths, and its ground-truth value can be calculated exactly relying on a proper combination of Eqs. (2) and (7); see the appendix for details. We compare the ground-truth value with the one obtained using SPCA, and quantify the relative error of the approximation with respect to the true value. Results for some combinations of the parameter values $\ell_{si}$ and $d\ell$ are plotted in Figure 2. The relative error is a decreasing function of $d\ell$. It is worth noting that the relative error behaves non-monotonically as a function of time $t$, reaching a maximum at intermediate $t$ values.

Predictions in real-world networks
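The comparisons in this subsection pit SPCA, a lower bound, against upper-bound methods such as IBMFA and DMPA. Evaluating SPCA only requires shortest-path distances: one breadth-first search from the source, followed by Eq. (7) and the cumulative sum of Eq. (2). The sketch below uses hypothetical function names. It also includes the arithmetic average used to combine SPCA with an upper-bound method, and a two-path reference value of the kind used for Figure 2. That last formula is our own assumption and is not taken from the appendix: if the two paths share only $s$ and $i$, arrival times along them are independent, so $Q_{\text{true}} = 1 - (1 - Q_{\ell_{si}})(1 - Q_{\ell_{si}+d\ell})$.

```python
from collections import deque
from math import comb


def bfs_distances(adj, source):
    """Shortest-path distances ell_{si} from the source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def spca_P(ell, t, beta):
    """Eq. (7): infection first reaches a node at distance ell at exactly stage t."""
    if ell == 0:
        return 1.0 if t == 0 else 0.0  # the source itself
    if t < ell:
        return 0.0  # not enough stages to cover the distance
    return comb(t - 1, ell - 1) * beta**ell * (1 - beta) ** (t - ell)


def spca_Q(adj, source, beta, t_max):
    """Cumulative Q^(t)_{s->i} via Eq. (2); only BFS distances are needed."""
    Q = {}
    for i, ell in bfs_distances(adj, source).items():
        total, row = 0.0, []
        for t in range(t_max + 1):
            total += spca_P(ell, t, beta)
            row.append(total)
        Q[i] = row
    return Q


def two_path_truth(q_short, q_long):
    """SI value when s and i are joined by two paths sharing only s and i.

    Assumption (ours): arrival times along the two paths are independent,
    so i is reached by time t unless both paths fail.
    """
    return 1.0 - (1.0 - q_short) * (1.0 - q_long)


def average_estimate(q_lower, q_upper):
    """Arithmetic mean of a lower bound (SPCA) and an upper bound (e.g., IBMFA)."""
    return {i: [(a + b) / 2 for a, b in zip(q_lower[i], q_upper[i])] for i in q_lower}
```

On the 4-cycle 0-1-2-3 with $\beta = 0.5$, SPCA gives $Q^{(2)}_{0 \to 2} = 0.25$ for the node opposite the source, while `two_path_truth` gives $1 - 0.75^2 = 0.4375$. This illustrates the underestimation that SPCA incurs on loopy topologies. Because distances do not depend on $\beta$, scanning the spreading probability only requires re-evaluating Eq. (7).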

A useful property of SPCA is that it provides a consistent lower bound for ground-truth probabilities, since it neglects that spreading may occur along longer paths. The simultaneous use of IBMFA (or similar approaches that provide upper bounds for true probabilities, e.g., DMPA) and SPCA is very useful, as it allows us to delineate the region of possible outcomes for the SI model. In Figure 3, for example, we compare predictions of IBMFA, DMPA, and SPCA with estimates of the ground-truth probabilities obtained via numerical simulations of the SI process on a real network. We use the US air transportation network of Ref. [38]. The network has $N = …$ nodes, density $\rho \simeq …$, and $d = …$. We set $z_s = 1/N$ for every node $s$. While taking into account only the shortest path is an oversimplification of the problem, combining SPCA with IBMFA (or DMPA), for example by taking the arithmetic average of the two approximations, can increase the accuracy of the individual methods. This is especially true for sources of the process with low degree.

Figure 2. Error committed by the shortest-path combinatorial approximation. Relative error of SPCA compared to the ground-truth value for the probability $Q^{(t)}_{s \to i}$ that node $i$ is infected at time $t$ or earlier by an infection starting from node $s$. Spreading obeys SI dynamics with spreading probability $\beta = 0.01$. Ground-truth values are estimated under the hypothesis that nodes $s$ and $i$ are connected by two independent paths of lengths $\ell_{si}$ and $\ell_{si} + d\ell$, respectively. SPCA approximates true probabilities relying on the shortest path only. (a) We set $\ell_{si} = …$ and consider different values of $d\ell$. The relative error is plotted as a function of time. (b) We set $d\ell = 5$ and consider different $\ell_{si}$ values.

B. The susceptible-infected-recovered model

Problem setting

For SIR dynamics, we consider the same initial configuration as in the SI model, where spreading is initiated by a single infected node $s$, i.e., $\sigma_s^{(0)} = I$, while all other nodes $j$ are initially in the susceptible state, i.e., $\sigma_j^{(0)} = S$ for all $j \neq s$. The probability that node $i$ receives the infection at exactly time $t$ is defined in exactly the same way as for the SI model; see Eq. (1). We can further apply the same definition as in Eq. (2) to quantify the probability $Q^{(t)}_{s \to i}$ that the infection reaches node $i$ at time $t$ or earlier. The additional recovered state allowed in the SIR model requires the definition of other probabilities not defined for the SI model. For instance, the probability that node $i$ recovers exactly at time $t$, given that the infection started from node $s$, is denoted by $R^{(t)}_{s \to i}$. The probability that node $i$ recovers at time $t$ or earlier is given by

$$T^{(t)}_{s \to i} = \sum_{r=0}^{t} R^{(r)}_{s \to i}. \tag{8}$$

All other probabilities of interest can be immediately derived. For example, the probability that node $i$ is still in the susceptible state at time $t$ is given by $1 - Q^{(t)}_{s \to i}$. Also, the probability that node $i$ is found infected at time $t$ is given by $Q^{(t)}_{s \to i} - T^{(t)}_{s \to i}$.

Figure 3. Accuracy of the approximations in predicting ground-truth infection probabilities in real-world networks. We consider the SI model on the air transportation network of Ref. [38]. The spreading probability is set to $\beta = 0.25$. We run 4,000 numerical simulations of the process where a given node is the source of the spreading. Ground-truth values are compared with predictions from IBMFA (green), SPCA (blue), DMPA (red), the average SPCA-IBMFA (purple), and the average SPCA-DMPA (black). a) Mean absolute error over all possible sources as a function of time. b) Absolute error averaged over time as a function of the degree of the source node.

Individual-based mean-field approximation

Under IBMFA [1, 24, 32], we can write

$$P^{(t)}_{s \to i} = \left[ 1 - Q^{(t-1)}_{s \to i} \right] \left\{ 1 - \prod_{j=1}^{N} \left[ 1 - A_{ji}\, \beta \left( Q^{(t-1)}_{s \to j} - T^{(t-1)}_{s \to j} \right) \right] \right\}. \tag{9}$$

Eq. (9) is a direct generalization of Eq. (6). The meaning of the various terms is exactly the same as in Eq. (6), with the only difference that here we need to account for the possibility that infected nodes recover. To become infected, node $i$ must still be in the susceptible state at time $t - 1$, i.e., $1 - Q^{(t-1)}_{s \to i}$, and the infection must arrive from at least one of its neighbors that is still in the infected state, i.e., $1 - \prod_{j=1}^{N} \left[ 1 - A_{ji} \beta \left( Q^{(t-1)}_{s \to j} - T^{(t-1)}_{s \to j} \right) \right]$.

Properties and limitations of IBMFA for the SIR model are very similar to those already illustrated for the SI model. Limitations of Eq. (9) in capturing the true probability are due to the assumption of dynamical independence among variables, which is in contrast with the unidirectional nature of spreading. Still, IBMFA can be improved by imposing only conditional independence instead of full independence among variables, as done in DMPA [26]. Also, IBMFA and DMPA continue to provide effective upper bounds for the marginal probabilities of infection of the true SIR model.

Shortest-path combinatorial approximation

In this section, we extend SPCA to the SIR model. The probability that node $i$ is infected exactly at time $t$, given that the infection started from the source node $s$, is

$$P^{(t)}_{s \to i} = (1-\gamma)^{t-\ell_{si}} \binom{t-1}{\ell_{si}-1} \beta^{\ell_{si}} (1-\beta)^{t-\ell_{si}}. \tag{10}$$

Eq. (10) simply generalizes Eq. (7) with the inclusion of the multiplicative factor $(1-\gamma)^{t-\ell_{si}}$. This factor accounts for possible recovery events that may prevent the infection from reaching node $i$. The infection can continue along its trajectory as long as the latest infected node does not recover before passing on the infection. The condition that allows spreading to occur is therefore that recovery does not happen in $t - \ell_{si}$ independent attempts, one after each failed spreading attempt, leading to the factor $(1-\gamma)^{t-\ell_{si}}$.

Summing Eq. (10) over $t$ and using the negative binomial series $\sum_{k \geq 0} \binom{k+\ell-1}{\ell-1} x^k = (1-x)^{-\ell}$ with $x = (1-\gamma)(1-\beta)$, we immediately derive

$$\lim_{t \to \infty} Q^{(t)}_{s \to i} = \left( 1 - \gamma + \frac{\gamma}{\beta} \right)^{-\ell_{si}}, \tag{11}$$

thus, in the long-term limit, the probability of infection decreases exponentially towards zero as the distance from the source increases.

Although Eq. (11) is valid for individual nodes only, we find that it is useful for predicting the outbreak size of the entire system. In Figure 4, we compare the average value of the outbreak size estimated from numerical simulations with predictions quantified as

$$O = \left( 1 - \gamma + \frac{\gamma}{\beta} \right)^{-\langle \ell \rangle}, \tag{12}$$

where $\langle \ell \rangle = \frac{2}{N(N-1)} \sum_{i > j} \ell_{ij}$ is the average distance between nodes in the network. While predictions do not match true values, their functional similarity is apparent, as shown by the magnitude of the mismatch between ground-truth values and predictions.

Figure 4. Prediction of the phase diagram under the shortest-path combinatorial approximation. a) We consider a tree with $N = …$ nodes … $O = …$ … $O = 0.01$ (dashed). b) Difference between ground-truth values of the outbreak size and predictions from Eq. (12) as a function of the SIR model parameters.

The probability to recover at time $t$ is given by

$$R^{(t)}_{s \to i} = \gamma (1-\gamma)^{t-1} \sum_{r=0}^{t-1} P^{(r)}_{s \to i} (1-\gamma)^{-r}. \tag{13}$$

Eq. (13) is easily obtained by considering that node $i$ can recover only if previously infected, say at time $r$. Then, the probability that recovery happens at time $t$, and not in any of the previous stages, is $\gamma (1-\gamma)^{t-r-1}$. Summing over all possible times $r$ at which node $i$ could have been infected, one obtains Eq. (13).

In Figure 5, we repeat the same exercise as in Figure 2, estimating the relative error committed by SPCA when the ground-truth topology is such that nodes $s$ and $i$ are connected by two independent paths of lengths $\ell_{si}$ and $\ell_{si} + d\ell$, respectively. The behavior of the relative error in the early stages is quite similar to the SI case. Results for the SIR model differ from those of the SI model in the late stages of the dynamics, when the finite limit in Eq. (11) gives a non-vanishing asymptotic value. For the SIR model, the possible presence of multiple paths connecting two nodes plays a much more important role than in the SI model.

Figure 5. Error committed by the shortest-path combinatorial approximation. Same as in Figure 2 but for the SIR model. The spreading probability is $\beta = 0.5$, while the recovery probability is $\gamma = …$.

Predictions in real-world networks
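The SIR comparison in this subsection relies on the SPCA quantities of Eqs. (10), (2), (13), and (8). Their long-time limit, Eq. (11), and the outbreak-size proxy of Eq. (12) follow from the same distances. The sketch below uses hypothetical function names. Eq. (13) is evaluated in the algebraically identical form $\sum_r P^{(r)}_{s \to i}\, \gamma (1-\gamma)^{t-r-1}$, which avoids dividing by $(1-\gamma)^{r}$ when $\gamma = 1$.

```python
from collections import deque
from math import comb


def spca_sir_P(ell, t, beta, gamma):
    """Eq. (10): SPCA probability of infection at exactly stage t under SIR."""
    if ell == 0:
        return 1.0 if t == 0 else 0.0
    if t < ell:
        return 0.0
    return (1 - gamma) ** (t - ell) * comb(t - 1, ell - 1) * beta**ell * (1 - beta) ** (t - ell)


def spca_sir_curves(ell, beta, gamma, t_max):
    """Lists P, Q, R, T for t = 0..t_max, for a node at distance ell from the source."""
    P = [spca_sir_P(ell, t, beta, gamma) for t in range(t_max + 1)]
    # Eq. (13): infected at r < t, no recovery during t - r - 1 stages, recovery at t.
    R = [sum(P[r] * gamma * (1 - gamma) ** (t - r - 1) for r in range(t))
         for t in range(t_max + 1)]
    Q, T, q, tr = [], [], 0.0, 0.0
    for p, r in zip(P, R):
        q += p   # Eq. (2)
        tr += r  # Eq. (8)
        Q.append(q)
        T.append(tr)
    return P, Q, R, T


def spca_sir_limit(ell, beta, gamma):
    """Eq. (11): long-time limit of Q for a node at distance ell."""
    return (1 - gamma + gamma / beta) ** (-ell)


def mean_distance(adj):
    """<ell> = 2 / (N (N - 1)) * sum_{i > j} ell_ij, via one BFS per node."""
    total = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        total += sum(dist.values())  # every unordered pair ends up counted twice
    n = len(adj)
    return total / (n * (n - 1))


def outbreak_prediction(adj, beta, gamma):
    """Eq. (12): outbreak-size proxy built from the average distance."""
    return (1 - gamma + gamma / beta) ** (-mean_distance(adj))
```

With $\gamma = 0$, `spca_sir_P` reduces to Eq. (7). For $\gamma > 0$, both $Q^{(t)}$ and $T^{(t)}$ approach the value of Eq. (11) at long times, because every infected node eventually recovers.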

The considerations made for the SI model remain valid for the SIR model. In networks with loops, SPCA underestimates the ground-truth probability $Q^{(t)}_{s \to i}$, while DMPA overestimates it. The combination of SPCA and DMPA defines the region where the true values are located. Also, it is possible to improve the accuracy of the individual approximations by simply taking their arithmetic average; see Figure 6. Improvements are especially apparent in the supercritical regime of the dynamics, where the SIR model behaves most similarly to the SI model.

III. APPLICATIONS

We now turn our attention to specific applications of the theoretical framework in inference problems. We remark that we are not leveraging the framework to actually perform inference. Rather, we are using it to provide insights into the properties of the inference problems, for example, how the ability of an observer to perform inference is affected by the time of the observations and by the position of the observer in the system.

Figure 6. Accuracy of the approximations in predicting ground-truth infection probabilities in real-world networks. a) We consider the SIR model on the air transportation network of Ref. [38]. We set $\beta = …$ and $\gamma = 0.1$. This combination of parameter values corresponds to the supercritical regime of the dynamics. We run 8,000 numerical simulations of the SIR process for each node being the source of the infection to obtain a single estimate of the ground-truth values of $Q^{(t)}_{\to i}$ for all $i$. Ground-truth values are compared with predictions from IBMFA (green), SPCA (blue), DMPA (red), the average SPCA-IBMFA (purple), and the average SPCA-DMPA (black). The panel displays the relative error, averaged over all nodes, committed by the various approximations as a function of time. b) Absolute error, averaged over time, of the various approximations as a function of the degree of the source node. Data are the same as in panel a. c) Same as in panel a, but for $\beta = 0.01$, corresponding to the subcritical regime of spreading. d) Same as in panel b, but obtained for the same parameter setting as in panel c.

A. Identiﬁcation of the source of spreading

As a first application, we consider the classical inference problem of identifying the initial spreader, i.e., the so-called patient-zero problem [41–43]. We note that the problem has already been studied with a combinatorial approach similar to ours in Ref. [44]. The patient-zero problem is typically framed under the assumption of limited information, where only partial knowledge of the microscopic properties of the system is available to the observer. In this paper, we focus on two different settings often considered in the literature on this subject. In both cases, we assume that the observer has full and exact knowledge of the network topology. In addition, the observer is fully aware of the stage of the dynamics as well as of the exact values of the spreading and recovery probabilities.

Susceptible-infected model

First, we consider the case where the observer is allowed to constantly monitor the state of node $i$. Suppose that, at time $t$, node $i$ gets infected, and the observer wants to infer the identity of patient zero. The probability $V^{(t)}_{s \to i}$ that node $s$ is the initial spreader is given by Bayes' theorem and can be written as

$$V^{(t)}_{s \to i} = \frac{z_s P^{(t)}_{s \to i}}{P^{(t)}_{\to i}}, \tag{14}$$

where $P^{(t)}_{s \to i}$ and $P^{(t)}_{\to i}$ are the same probabilities as defined in Eqs. (7) and (4), respectively. $z_s$, defined in Eq. (3), is the probability that node $s$ is the source of the infection prior to any measurement made on the system.

Second, we consider the case where the observer performs a single measurement of node $i$ at time $t$ and finds it infected. Infection may have occurred at time $t$ or earlier. The probability $W^{(t)}_{s \to i}$ that node $s$ is the source of the infection under this condition is given by

$$W^{(t)}_{s \to i} = \frac{z_s Q^{(t)}_{s \to i}}{Q^{(t)}_{\to i}}. \tag{15}$$

Here, $z_s$ is still our prior on node $s$ acting as the source of the infection; $Q^{(t)}_{s \to i}$ and $Q^{(t)}_{\to i}$ are the same probabilities as defined in Eqs. (2) and (5), respectively.

A comparison between $V^{(t)}_{s \to i}$ and $W^{(t)}_{s \to i}$ is illustrated in Figure 7. The two strategies of performing local observations of the network generally lead to different predictions about the location of patient zero, and the difference between the two strategies strongly depends on the time when the measurements are performed.

Figure 7. Comparison of observation strategies in the patient-zero identification problem. We display the inferred probabilities on the location of the source $s$ obtained under the hypothesis of continuous observation of the state of nodes, i.e., $V^{(t)}_{s \to i}$ as defined in Eq. (14), and under the hypothesis of instantaneous observation of their state, i.e., $W^{(t)}_{s \to i}$ as defined in Eq. (15).
Each curve corresponds to a singlemeasured node i at time t ; the curve is obtained by connecting pairsof contiguous points ( V ( t ) s → i , W ( t ) s → i ) for all s (cid:44) i . The two panels showresults valid for two di ﬀ erent values of the time of measurement,namely t =

12 in panel a and t =

90 in panel b. Results are obtainedon a tree with N =

100 nodes and uniformly distributed randomPr¨ufer sequence [39, 40]. Spreading is happening according to the SImodel with spreading probability β = . In the early stages of the spreading process, the values of V ( t ) s → i or W ( t ) s → i are highly heterogeneous, and the two strate-gies of observation lead to almost identical inferred proba-bilities regardless of the point of observation i . This fact iseasily explained as the most likely source of infection shouldbe located in the vicinity of the node where the system is ob-served from. At later stages of the dynamics, the probabilityvalues V ( t ) s → i or W ( t ) s → i are less heterogeneous, the two inferredprobabilities are negatively correlated, and their discrepancystrongly depends on the node i where the system is observedfrom. The negative correlation in the ﬁnal stages of the dy-namics seems surprising but can be intuitively explained. Ifa node gets infected after a very long time, it is very unlikelythat the infection started from one of its neighbors. However,if one ﬁnds the node infected but does not know when the in-fection happened, still the nearest nodes are the most likelysources of infection.The quantity that best characterizes the correlation betweenthe inferred probabilities V ( t ) s → i and W ( t ) s → i is Q ( t ) → i , i.e., the prob-ability to ﬁnd node i infected at time t or earlier. In Figure 8, we display the Spearman correlation coe ﬃ cient between V ( t ) s → i and W ( t ) s → i as a function of Q ( t ) → i . Correlation is only mildly de-pendent on the node where the system is observed from. Thenon-perfect correlation found at very low values of Q ( t ) → i is stillsurprising, but can be easily understood. If one ﬁnds the ob-served node infected at the very ﬁrst stages of the dynamicswithout knowing exactly the time when the node was infected,then it is very likely that the node itself is inferred to be the pa-tient zero. 
However, when one knows the exact time of infection, one can properly distinguish cases when the observed node was or was not the actual source of spreading. There are two remarkable aspects emerging from Figure 8. First, the change in sign of the correlation coefficient happens at $Q^{(t)}_{\to i} \simeq 0.5$. Second, the correlation coefficient becomes maximally negative well before $Q^{(t)}_{\to i} \simeq 1$. This is due to the fact that, at very late stages of the spreading when all nodes are likely to be found infected, knowing that the infection happened at a very late stage makes it likely that the source of infection was far from the observation point; not knowing the exact time of infection leads instead to an almost flat probability distribution for the location of the patient zero, but still with a very weak preference for nodes close to the observation point, including the observed node itself.
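The two posteriors of Eqs. (14) and (15) can be illustrated with a minimal sketch on a toy tree. It assumes a path graph, a uniform prior $z_s = 1/N$, and the negative-binomial law $\binom{t-1}{\ell-1}\beta^{\ell}(1-\beta)^{t-\ell}$ for the SI probability of reaching a node at distance $\ell$ in exactly $t$ steps (the single-path expression of Appendix A, before summing over times). The denominators are taken as the prior-weighted sums over candidate sources, as Bayes' theorem requires. The function names are hypothetical and not taken from the paper.

```python
from math import comb


def p_exact(l, t, beta):
    """SI probability that an infection started at distance l reaches the
    target in exactly t steps (negative binomial law along a single path)."""
    if l == 0:
        return 1.0 if t == 0 else 0.0
    if t < l:
        return 0.0
    return comb(t - 1, l - 1) * beta**l * (1 - beta) ** (t - l)


def q_cumulative(l, t, beta):
    """Probability of reaching the target at time t or earlier."""
    return sum(p_exact(l, r, beta) for r in range(t + 1))


def source_posteriors(distances, t, beta):
    """Return (V, W), the posteriors over candidate sources given that the
    observed node was found infected exactly at t (V, continuous monitoring)
    or at t or earlier (W, single measurement).

    distances[s] is the hop distance between candidate s and the observed
    node; the prior z_s is assumed uniform (hypothetical setting)."""
    z = 1.0 / len(distances)
    v_num = [z * p_exact(l, t, beta) for l in distances]
    w_num = [z * q_cumulative(l, t, beta) for l in distances]
    v_norm, w_norm = sum(v_num), sum(w_num)  # stand-ins for P_{->i} and Q_{->i}
    return [x / v_norm for x in v_num], [x / w_norm for x in w_num]


if __name__ == "__main__":
    # Path of 10 nodes observed from one end (node 0), measured at t = 2.
    V, W = source_posteriors(list(range(10)), t=2, beta=0.5)
    print([round(x, 3) for x in V[:4]])  # [0.0, 0.5, 0.5, 0.0]
    print([round(x, 3) for x in W[:4]])  # [0.5, 0.375, 0.125, 0.0]
```

On this toy example, continuous monitoring at $t = 2$ rules out the observed node itself (V[0] = 0) and splits the posterior between its two nearest candidates, whereas a single measurement makes the observed node the most likely source (W[0] = 0.5), mirroring the early-time argument given above.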

Figure 8. Comparison of observation strategies in the patient-zero identification problem. Spearman correlation coefficient obtained by ranking nodes according to the inferred probabilities of being the source of spreading under instantaneous and continuous observation of the state of a node. The correlation coefficient is plotted against the probability to find the observed node $i$ infected. Different curves correspond to different observed nodes, and different values of the probability to find the observed node infected map to different stages of the dynamics. Results are obtained in the same experimental setting as in Figure 7.

Susceptible-infected-recovered model

In the SIR model, three outcomes are possible when the state of a node is measured. As a consequence, four different conditional probabilities are potentially relevant for the patient-zero identification problem. However, finding the observed node in the infected vs. recovered state does not generate a significant difference in our ability to predict the identity of the patient zero.

In Figure 9, we measure the correlation between the inferred probabilities $V^{(t)}_{s \to i}$ and $W^{(t)}_{s \to i}$ on the location of the patient zero. We notice that the two strategies of observation may lead to very different outcomes depending on both the time when the observation is made and the values of the parameters of the spreading model. Clearly, for $\gamma = 0$ the SIR model reduces to the SI model, and correlation behaves as in the SI case. For $\gamma > 0$, correlation is instead a non-monotonic function of time. At early stages of the dynamics, correlation decreases for the same reasons as in the SI model. At late stages, however, it increases. The reason is quite intuitive. Finding a node infected but not yet recovered at a late stage of the dynamics means that it is unlikely that the node got infected at the very beginning of the process, otherwise it would have had plenty of time to recover. Thus, knowing or not knowing the exact time of infection is irrelevant for the patient-zero inference problem, as one can exclude that the source of infection is very close to the observed node in either case.
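The Spearman coefficient used in Figures 8 and 9 compares the rankings of candidate sources induced by $V^{(t)}_{s \to i}$ and $W^{(t)}_{s \to i}$. A minimal, dependency-free sketch follows; tied values receive their average rank (the usual convention), and the function names and input vectors are illustrative assumptions rather than the paper's code or data.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied entries the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # ranks are 1-based
        for k in order[pos:end + 1]:
            ranks[k] = mean_rank
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one ranking is constant
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    # Illustrative posteriors over four candidate sources s != i.
    V = [0.5, 0.5, 0.0, 0.0]
    W = [0.375, 0.125, 0.0, 0.0]
    print(round(spearman(V, W), 4))  # 0.9428
```

Ties matter in this setting because, under SI dynamics, $V^{(t)}_{s \to i}$ is exactly zero for every candidate farther than $t$ hops from the observed node; average ranks keep such blocks from introducing an arbitrary ordering.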

Figure 9. Comparison of observation strategies in the patient-zero identification problem in the SIR model. a) Spearman correlation coefficient obtained by ranking nodes according to the inferred probabilities of being the source of spreading under instantaneous and continuous observation of the state of a node. The correlation coefficient is plotted against time. Different curves correspond to different observed nodes. Results are obtained in the same configuration as in Figure 7. The spreading probability is set to $\beta = 0.2$, while the recovery probability is set to $\gamma = 0.1$. b) Same as in panel a, but for different $\gamma$ values. Irrespective of the specific value of the recovery probability, the system is always observed from the same node.

B. Gain of information from local measurements

What is the amount of information about the microscopic configuration of the system that we can gain by observing the state of a specific node? Clearly, as the states of different nodes in the network are correlated, the measurement of the state of one node provides us with some knowledge about the state of the other nodes. However, the gain of information will not be the same for all choices of the observed node; further, the gain of information may vary dramatically, even if we decide to observe the same node, depending on the stage of the dynamics when the measurement is performed.

As a second application of our framework, we study spreading processes from an information-theoretic perspective, providing indications about the amount of information that each node carries about the whole network.

To properly quantify the gain of information, we should calculate the mutual information between network configurations and the state of the observed node. This calculation would require estimating the probability of every network configuration conditioned on the state of the observed node, and then summing over all the possible conditional probabilities. Due to the huge number of possible configurations, however, the exact computation of the information gain is infeasible. Here, we approximate it as the sum of the pairwise mutual information of all pairs of nodes. We compute the mutual information between pairs of nodes $i$ and $j$ using their joint probability of getting infected at time $t$ or earlier, namely $Q^{(t)}_{s \to i,j}$. The geometric framework of SPCA easily adapts to such a computation. Specifically, the computation of the joint probability still relies on the definition of marginal probabilities, but properly accounts for the possible paths between the source $s$ and the two target nodes $i$ and $j$ we are interested in (see Appendix B for details). The expected information gained by observing node $i$ is then quantified as the sum of the pairwise mutual information of the node with respect to all other nodes in the network.

Susceptible-infected model

Naively, we should expect nodes occupying central positions in the network to be optimal observation points. Such an intuition is generally correct, but with some caveats. In Figure 10, we show the information gained by measuring a single node as a function of its degree, i.e., a simple metric of network centrality. We considered metrics of network centrality more complicated than degree, but the results of the analysis are qualitatively similar to those reported here. In the early stages of the dynamics, the information gain is correlated with node degree. At late stages, observing the network from nodes with large degrees becomes sub-optimal.

The time of the measurement plays a fundamental role in the amount of information that can actually be gained. At the beginning of the dynamics, all nodes are in the susceptible state, thus we do not expect any measurement to be informative. Similar conclusions are valid for the late stages of the dynamics, when all nodes are likely to be in the infected state. We expect, however, measurements to be informative when uncertainty about the system configuration is maximal. This fact is apparent from Figure 10. We see that there is an intermediate stage of the dynamics where the information content of the network reaches a peak value. At that point in time, the information gained by observing a node is not strongly dependent on the centrality of the node where the observation is performed.
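The pairwise approximation of the information gain can be sketched on a very small tree. The sketch below assumes a path graph, a uniform prior over the source, SI dynamics with the negative-binomial single-path law, and mutual information measured in bits. Following Appendix B, the joint probability equals the marginal of the farther node when both targets lie on the same side of the source, and factorizes into a product of marginals when the source sits between them (Eq. (B1) with $k = s$). All function names are hypothetical.

```python
from math import comb, log2


def q_cumulative(l, t, beta):
    """SI probability that a node at distance l from the source is infected
    at time t or earlier (negative binomial CDF along a single path)."""
    if l == 0:
        return 1.0
    return sum(comb(r - 1, l - 1) * beta**l * (1 - beta) ** (r - l)
               for r in range(l, t + 1))


def marginal_and_joint(n, t, beta):
    """Average single-node and pairwise infection probabilities over a
    uniform source on the path 0-1-...-(n-1)."""
    q = [[q_cumulative(abs(i - s), t, beta) for i in range(n)] for s in range(n)]
    marg = [sum(q[s][i] for s in range(n)) / n for i in range(n)]
    joint = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = 0.0
            for s in range(n):
                if (i - s) * (j - s) > 0:  # same side: farther node implies nearer
                    total += min(q[s][i], q[s][j])
                else:  # source between (or at) i and j: independent branches
                    total += q[s][i] * q[s][j]
            joint[i][j] = total / n
    return marg, joint


def pair_mutual_information(qi, qj, qij):
    """Mutual information (bits) between two binary 'infected by t' states."""
    cells = {(1, 1): qij, (1, 0): qi - qij, (0, 1): qj - qij,
             (0, 0): 1.0 - qi - qj + qij}
    px, py = {1: qi, 0: 1.0 - qi}, {1: qj, 0: 1.0 - qj}
    mi = 0.0
    for (a, b), p in cells.items():
        if p > 0 and px[a] > 0 and py[b] > 0:  # convention: 0 log 0 = 0
            mi += p * log2(p / (px[a] * py[b]))
    return mi


def information_gain(n, t, beta):
    """Pairwise approximation: gain of node i = sum over j != i of MI(i, j)."""
    marg, joint = marginal_and_joint(n, t, beta)
    return [sum(pair_mutual_information(marg[i], marg[j], joint[i][j])
                for j in range(n) if j != i) for i in range(n)]


def total_entropy(marg):
    """Sum of single-node binary entropies (bits), i.e., the network's
    information content as a function of the marginals."""
    return -sum(p * log2(p) + (1 - p) * log2(1 - p) for p in marg if 0 < p < 1)


if __name__ == "__main__":
    for t in (1, 3, 10):
        gains = information_gain(9, t, beta=0.5)
        print(t, [round(g, 3) for g in gains])
```

Comparing the end nodes (degree 1) with interior nodes (degree 2) at different times gives a toy version of the degree dependence discussed above; a path graph is, however, far too homogeneous to reproduce the results of Figure 10.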

Figure 10. Gain of information from local network measurements in the SI model. a) Gain of information obtained from the observation of a single node. Information gain is plotted against the degree of the observed node. Each point in the plot corresponds to a different node used to observe the system. Different colors and symbols stand for different stages of the dynamical process when the observation is performed. The experimental setting is the same as in Figure 7. b) Total information content of the network as a function of time. Information content is measured as the sum of the individual-node entropies, i.e., $I = -\sum_i \left[ Q^{(t)}_{\to i} \log Q^{(t)}_{\to i} + \left(1 - Q^{(t)}_{\to i}\right) \log\left(1 - Q^{(t)}_{\to i}\right) \right]$.

Susceptible-infected-recovered model

In Figure 11, we perform an analysis similar to that of Figure 10, but for the SIR model. The information content as a function of time behaves differently depending on the choice of the parameters $\beta$ and $\gamma$. If the recovery probability $\gamma$ is low, then the information content of the SIR model is almost identical to the one we just described for the SI model. If $\gamma$ values are large enough, instead, the information content no longer decreases as time increases, reflecting the non-null uncertainty of the final configurations reached by SIR spreading.

Figure 11. Gain of information from local network measurements in the SIR model. Total information content of the network as a function of time for three different values of $\gamma$. We set $\beta =$ .

IV. CONCLUSIONS

In this paper, we presented a combinatorial approach to calculate the spreading probability along the shortest path between pairs of nodes in a network, for the Susceptible-Infected (SI) and the Susceptible-Infected-Recovered (SIR) models. We named it the shortest-path combinatorial approximation (SPCA). The approach is exact in the absence of loops and gives a lower bound for the infection probability on arbitrary networks. The approximation can in principle be extended to include the effect of other paths (i.e., the second shortest path, the third shortest path, etc.) on the computation of the spreading probability. However, adding more paths exponentially increases the complexity of the algorithm, since the procedure requires disentangling independent vs. shared parts (i.e., nodes and edges) among the various paths. We showed that the arithmetic average between the novel approximation and other existing approximations, e.g., individual-node mean-field and dynamic message-passing approximations, can be used to obtain predictions that are more accurate than those obtained by each approximation if used in isolation. The amount of improvement strongly depends on the degree of the source and, in the SIR model, on the regime of the process, hinting that the importance of the shortest path depends on the network's connectivity as well as on the process's parameters. Potential follow-up studies could explore the predictive power of more sophisticated ways than the arithmetic average to combine SPCA with other approximations. On a tree network, we used SPCA to evaluate joint probabilities among pairs of node states, and applied it to study general properties of standard inference problems. Specifically, we characterized two different strategies of single-node observation in the patient-zero identification problem. We showed that the inference problem is highly sensitive to the modality in which the observation is performed.
Measuring a node at a given time or monitoring it throughout the process may lead to opposite conclusions on the identity of the patient zero. Also, we analyzed the entropy of the processes and quantified the information gained about the rest of the network when the state of a node is measured. The most informative node is not the same throughout the entire process, and knowledge of the dynamical stage is crucial to optimize the information gained by a measurement. These results can be extended by considering the measurements, contemporaneous or sequential, of two nodes. By calculating a three-node joint probability, one could identify the most informative pair of nodes and study different strategies for node control. While we focused only on the SI and SIR models, other spreading processes in discrete time can be studied with a similar theoretical approach.

ACKNOWLEDGMENTS

DM and FR acknowledge support from the US Army Research Office (W911NF-16-1-0104). FR acknowledges support from the National Science Foundation (CMMI-1552487).

Appendix A: Magnitude of the error associated with the shortest-path combinatorial approximation

In Figures 2 and 5, we considered a hypothetical setting where the generic node $i$ is connected to the source node $s$ by two independent paths of length $\ell_{si}$ and $\ell_{si} + d\ell$, with $d\ell \geq 0$. The paths are independent in the sense that they do not share any node except for $s$ and $i$. This fact allows us to easily compute the exact probabilities for the ground-truth scenario by simply combining the probabilities of the individual paths. The setting is useful to understand the magnitude of the error that we should expect when using SPCA in a non-tree network, where multiple paths between nodes may exist. For simplicity of notation, but without loss of generality, we will use $\ell = \ell_{si}$ in the following description.

Susceptible-infected model

For the SI model, the probability that the infection reaches a certain node along a path of length $\ell$ in $t$ time steps or less is given by

$$q(\ell, t) = \sum_{r=\ell}^{t} \binom{r-1}{\ell-1} \beta^{\ell} (1-\beta)^{r-\ell} .$$

The previous expression is simply a combination of Eqs. (2) and (7) of the main text; we omit the explicit dependence on the source and target nodes to simplify the notation. In the presence of two independent paths, the probability that the infection reaches the target node is given by

$$q(\ell, \ell + d\ell, t) = 1 - [1 - q(\ell, t)]\,[1 - q(\ell + d\ell, t)] ,$$

that is, the probability that spreading occurs along at least one of the two independent paths. The relative error of Figure 2 is finally quantified as

$$\epsilon(\ell, d\ell, t) = 1 - \frac{q(\ell, t)}{q(\ell, \ell + d\ell, t)} .$$

Susceptible-infected-recovered model

For the SIR model, the calculation is somewhat more cumbersome than for the SI model. Suppose node $s$ is initially in the infected state, and suppose that two independent paths of length $\ell$ and $\ell + d\ell$ connect node $i$ to node $s$. The probability $q(\ell, \ell + d\ell, t)$ that node $i$ becomes infected at time $t$ or earlier is given by the probability that the infection spreads along at least one of these paths. We remark that we know the analytical form of the probability $q(\ell, t)$ that the infection spreads along a single path of length $\ell$ in $t$ time steps or less (see main text). However, this expression can be used to combine the contributions of the two paths only provided that the paths are dynamically independent. The latter condition is satisfied only once the infection has performed at least one step towards the target along at least one of the paths.

Denote by $v$ the neighbor of node $s$ along the path of length $\ell$ towards $i$, and by $w$ the neighbor of node $s$ along the path of length $\ell + d\ell$ towards $i$. The initial configuration at time $t = 0$ is $\sigma^{(0)}_s = I$ and $\sigma^{(0)}_j = S$ for all $j \neq s$. At time $t = 1$, the states of nodes may change as a result of spreading and recovery events. The only nodes that can change their states are $s$, $v$, and $w$. For example, we can go to the configuration $\sigma^{(1)} = (I, S, I, S, \ldots, S)$, i.e., such that $\sigma^{(1)}_v = I$, $\sigma^{(1)}_w = S$, and $\sigma^{(1)}_s = I$, with probability

$$\mathrm{Prob.}[\sigma^{(1)} = (I, S, I, S, \ldots, S)] = \beta (1-\beta)(1-\gamma) .$$

After this first step, the spreading of the infection happens independently along the two paths, thus we can write

$$q[\ell, \ell + d\ell, t \mid \sigma^{(1)} = (I, S, I, S, \ldots, S)] = 1 - [1 - q(\ell - 1, t - 1)]\,[1 - q(\ell + d\ell, t - 1)] .$$

Conditioning on the configuration reached at $t = 1$, we obtain

$$q(\ell, \ell + d\ell, t) = \sum_{\sigma} q(\ell, \ell + d\ell, t \mid \sigma)\, \mathrm{Prob.}(\sigma) , \tag{A1}$$

where the sum runs over all eight configurations $\sigma$ of Table I. The expressions of the probabilities appearing in Table I are then used to solve Eq. (A1) by iteration, starting from the initial condition $q(\ell, \ell + d\ell, t = 0) = 0$.

Appendix B: Joint probability of infection from a single source

Susceptible-infected model

Here, we illustrate how to compute the joint probability $Q^{(t)}_{s \to i,j}$ that nodes $i$ and $j$ are infected at time $t$ or earlier given that the source of spreading is node $s$. The computation still takes advantage of Eqs. (2) and (7), by properly accounting for the position of the source node $s$ relative to the positions of the target nodes $i$ and $j$ (see Figure 12). If node $j$ sits between nodes $s$ and $i$, then the infection can reach node $i$ only by passing first through node $j$. Thus, we can safely write $Q^{(t)}_{s \to i,j} = Q^{(t)}_{s \to i}$. The same argument leads us to write $Q^{(t)}_{s \to i,j} = Q^{(t)}_{s \to j}$ if node $i$ sits between nodes $j$ and $s$.

Table I. Configurations $\sigma = (\sigma_v, \sigma_w, \sigma_s, S, \ldots, S)$ reachable after one dynamical step, assuming that the configuration at the preceding time is such that node $s$ is infected and all other nodes are susceptible. We provide the value of the probability $\mathrm{Prob.}(\sigma)$ for each of these configurations together with the conditional probability $q(\ell, \ell + d\ell, t \mid \sigma)$ that the infection will reach node $i$ along one of the two paths of length $\ell$ and $\ell + d\ell$, respectively. The latter probability is given by appropriate combinations of the known probabilities $q$ for the single independent paths.

$$
\begin{array}{ccc|c|c}
\sigma_v & \sigma_w & \sigma_s & \mathrm{Prob.}(\sigma) & q(\ell, \ell + d\ell, t \mid \sigma) \\
\hline
S & S & I & (1-\beta)^2 (1-\gamma) & q(\ell, \ell + d\ell, t-1) \\
S & S & R & (1-\beta)^2 \gamma & 0 \\
S & I & I & \beta (1-\beta)(1-\gamma) & 1 - [1 - q(\ell, t-1)]\,[1 - q(\ell + d\ell - 1, t-1)] \\
S & I & R & \beta (1-\beta) \gamma & q(\ell + d\ell - 1, t-1) \\
I & S & I & \beta (1-\beta)(1-\gamma) & 1 - [1 - q(\ell - 1, t-1)]\,[1 - q(\ell + d\ell, t-1)] \\
I & S & R & \beta (1-\beta) \gamma & q(\ell - 1, t-1) \\
I & I & I & \beta^2 (1-\gamma) & 1 - [1 - q(\ell - 1, t-1)]\,[1 - q(\ell + d\ell - 1, t-1)] \\
I & I & R & \beta^2 \gamma & 1 - [1 - q(\ell - 1, t-1)]\,[1 - q(\ell + d\ell - 1, t-1)]
\end{array}
$$

Figure 12. Schematic illustration for the computation of the joint probability. The shaded areas highlight different parts of the network where the source node can be located, relative to the positions of the target nodes $i$ and $j$. Red areas denote regions where one of the two paths of spreading is dependent on the other. The blue shaded area indicates locations of the source node leading to paths of spreading that are partially independent.

A less straightforward computation is required when the source node $s$ is connected to nodes $i$ and $j$ by partially independent paths. Part of the spreading path can be shared by the two trajectories, say up to node $k$ as indicated in Figure 12. However, after this node, the two paths are dynamically independent of each other, and the two contributions are computed separately. Specifically, we can write

$$Q^{(t)}_{s \to i,j} = \sum_{r=0}^{t - \max(\ell_{ki}, \ell_{kj})} P^{(r)}_{s \to k}\, Q^{(t-r)}_{k \to i}\, Q^{(t-r)}_{k \to j} , \tag{B1}$$

where $P^{(r)}_{s \to k}$ is the usual probability that the infection reaches node $k$ in exactly $r$ steps of the dynamics. The sum on the r.h.s. of Eq. (B1) runs over all values of $r$ compatible with the quantity that we want to estimate.

Susceptible-infected-recovered model

In the SIR model, we can compute $Q^{(t)}_{s \to i,j}$ using the very same method as for the SI model, with the only caveat that Eq. (A1) and Table I must be taken into account whenever the source is between $i$ and $j$ or the two shortest paths become independent.

[1] R. Pastor-Satorras, C. Castellano, P. Van Mieghem, and A. Vespignani, Reviews of Modern Physics, 925 (2015).
[2] C. T. Butts, Science, 414 (2009).
[3] M. O. Jackson, Social and Economic Networks (Princeton University Press, 2010).
[4] A. Vespignani, Nature Physics, 32 (2012).
[5] A. L. Lloyd and R. M. May, Science, 1316 (2001).
[6] K. T. Eames and M. J. Keeling, Proceedings of the National Academy of Sciences, 13330 (2002).
[7] L. Weng, F. Menczer, and Y.-Y. Ahn, in Eighth International AAAI Conference on Weblogs and Social Media (2014).
[8] C. Castellano, S. Fortunato, and V. Loreto, Reviews of Modern Physics, 591 (2009).
[9] Y. Moreno, M. Nekovee, and A. F. Pacheco, Physical Review E, 066130 (2004).
[10] L. Dall'Asta, A. Baronchelli, A. Barrat, and V. Loreto, Physical Review E, 036105 (2006).
[11] G. Brandi, R. Di Clemente, and G. Cimini, Physica A: Statistical Mechanics and its Applications, 255 (2018).
[12] I. Dobson, B. A. Carreras, D. E. Newman, and J. M. Reynolds-Barredo, IEEE Transactions on Power Systems, 4831 (2016).
[13] C. A. Hidalgo, B. Klinger, A.-L. Barabási, and R. Hausmann, Science, 482 (2007).
[14] T. P. Vogels, K. Rajan, and L. F. Abbott, Annu. Rev. Neurosci., 357 (2005).
[15] Y. Moreno, R. Pastor-Satorras, and A. Vespignani, The European Physical Journal B - Condensed Matter and Complex Systems, 521 (2002).
[16] J. L. Payne, K. D. Harris, and P. S. Dodds, Physical Review E, 016110 (2011).
[17] C. Castellano and R. Pastor-Satorras, Physical Review Letters, 218701 (2010).
[18] L. Buzna, K. Peters, and D. Helbing, Physica A: Statistical Mechanics and its Applications, 132 (2006).
[19] F. Altarelli, A. Braunstein, L. Dall'Asta, A. Lage-Castellanos, and R. Zecchina, Physical Review Letters, 118701 (2014).
[20] A. Y. Lokhov, M. Mézard, H. Ohta, and L. Zdeborová, Physical Review E, 012801 (2014).
[21] F. Radicchi and C. Castellano, Physical Review Letters, 198301 (2018).
[22] D. Kempe, J. Kleinberg, and É. Tardos, in Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2003), pp. 137–146.
[23] Y. Wang, D. Chakrabarti, C. Wang, and C. Faloutsos, in (IEEE, 2003), pp. 25–34.
[24] D. Chakrabarti, Y. Wang, C. Wang, J. Leskovec, and C. Faloutsos, ACM Transactions on Information and System Security, 1 (2008), ISSN 1094-9224, URL http://dx.doi.org/10.1145/1284680.1284681.
[25] B. Karrer and M. E. Newman, Physical Review E, 016101 (2010).
[26] A. Y. Lokhov, M. Mézard, and L. Zdeborová, Physical Review E, 012811 (2015).
[27] E. Cator and P. Van Mieghem, Physical Review E, 052802 (2014).
[28] J. P. Gleeson, Physical Review X, 021004 (2013).
[29] D. Brockmann and D. Helbing, Science, 1337 (2013).
[30] M. E. Newman, Physical Review E, 016128 (2002).
[31] R. M. Anderson, B. Anderson, and R. M. May, Infectious Diseases of Humans: Dynamics and Control (Oxford University Press, 1992).
[32] Y. Wang, D. Chakrabarti, C. Wang, and C. Faloutsos (2003).
[33] J. P. Gleeson, Physical Review Letters, 068701 (2011).
[34] K. E. Hamilton and L. P. Pryadko, Physical Review Letters, 208701 (2014).
[35] B. Karrer, M. E. Newman, and L. Zdeborová, Physical Review Letters, 208702 (2014).
[36] F. Radicchi, Nature Physics, 597 (2015).
[37] F. Radicchi and C. Castellano, Nature Communications, 1 (2015).
[38] V. Colizza, R. Pastor-Satorras, and A. Vespignani, Nature Physics, 276 (2007).
[39] H. Prüfer, Arch. Math. Phys., 742 (1918).
[40] S. Pemmaraju and S. Skiena, Computational Discrete Mathematics: Combinatorics and Graph Theory with Mathematica® (Cambridge University Press, 2003).
[41] D. Shah and T. Zaman, in Proceedings of the ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (2010), pp. 203–214.
[42] D. Shah and T. Zaman, IEEE Transactions on Information Theory, 5163 (2011).
[43] W. Luo, W. P. Tay, and M. Leng, IEEE Transactions on Signal Processing, 2850 (2013).
[44] K. Zhu and L. Ying, IEEE/ACM Transactions on Networking 24