A message-passing approach for recurrent-state epidemic models on networks

Abstract

Epidemic processes are common out-of-equilibrium phenomena of broad interdisciplinary interest. Recently, dynamic message-passing (DMP) has been proposed as an efficient algorithm for simulating epidemic models on networks, and in particular for estimating the probability that a given node will become infectious at a particular time. To date, DMP has been applied exclusively to models with one-way state changes, as opposed to models like SIS (susceptible-infectious-susceptible) and SIRS (susceptible-infectious-recovered-susceptible) where nodes can return to previously inhabited states. Because many real-world epidemics can exhibit such recurrent dynamics, we propose a DMP algorithm for complex, recurrent epidemic models on networks. Our approach takes correlations between neighboring nodes into account while preventing causal signals from backtracking to their immediate source, and thus avoids "echo chamber effects" where a pair of adjacent nodes each amplify the probability that the other is infectious. We demonstrate that this approach well approximates results obtained from Monte Carlo simulation and that its accuracy is often superior to the pair approximation (which also takes second-order correlations into account). Moreover, our approach is more computationally efficient than the pair approximation, especially for complex epidemic models: the number of variables in our DMP approach grows as $2mk$ where $m$ is the number of edges and $k$ is the number of states, as opposed to $mk^2$ for the pair approximation. We suspect that the resulting reduction in computational effort, as well as the conceptual simplicity of DMP, will make it a useful tool in epidemic modeling, especially for inference tasks where there is a large parameter space to explore.


A message-passing approach for recurrent-state epidemic models on networks

Munik Shrestha

University of New Mexico, Albuquerque, NM 87131, USA and Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA

Samuel V. Scarpino and Cristopher Moore

Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA
(Dated: September 6, 2018)

I. INTRODUCTION

Mathematical models of epidemic processes are intrinsically non-linear and multiplicative. These models include the spread of disease [6, 7], transmission of social behaviors [8–11], cascades of banking failures [12, 13], forest fires [14–16], the propagation of marginal probabilities in constraint satisfaction problems [17, 18] and the dynamics of magnetic and glassy systems [19].

The classical approach to modeling epidemics, such as the SIR model where each node is Susceptible, Infectious, or Recovered, assumes that at any given time each individual exists in a single state or "compartment" [6, 7]. To make these models analytically tractable, it is often assumed that the population is well mixed, so that interaction between any two individuals is equally likely; in physical terms, we assume the model is mean-field (also known as mass-action mixing in the epidemiology literature). Despite this unrealistic assumption, mean-field models capture some essential features of epidemics, such as a threshold above which we have an endemic phase with a non-zero fraction of infected individuals, and below which we have outbreaks of size $o(n)$ so that the equilibrium fraction of infected individuals is zero.

In reality, contacts between individuals in the population are often highly structured, with some pairs of individuals much more likely to interact than others due to location or demographics [11, 20]. To relax the mean-field assumption, while retaining some measure of tractability, we can assume that individuals interact on a network, whose structure captures the heterogeneity in the population [21, 22]. However, replacing the mean-field approximation with a contact network substantially increases a model's complexity.

One reasonable goal is to compute the one-point marginals, e.g., for each node $i$ the probability $I_i(t)$ that $i$ is infectious at time $t$.
In addition to being of direct interest, these marginals help us perform tasks such as inferring the originator of an epidemic, determining an optimal set of nodes to immunize in order to minimize the final size of an outbreak, or calculating the probability that an entire group of nodes will remain uninfected after a fixed time [23–27].

We can always compute these marginals by performing Monte Carlo experiments. However, since we need to perform many independent trials in order to collect good statistics, this is computationally expensive on large networks. This problem is compounded if we need to scan through parameter space, or if we want to explore many different initial conditions, vaccination strategies, etc. Therefore, it would be desirable to compute these marginals using, say, a system of differential equations, with variables that directly model the probabilities of various events.

The most naive way to do this, as we review below, uses the one-point marginals themselves as variables. However, this approach completely ignores correlations between nodes. At the other extreme, to model the system exactly, we would need to keep track of the entire joint distribution: but if there are $n$ individuals, each of which can be in one of $k$ states, this results in a coupled system with $k^n$ variables. This exponential scaling quickly renders most models computationally intractable, even on moderately sized networks.

In between these two extremes, we can approximate the joint distribution by "moment closure," assuming that higher-order marginals can be written in terms of lower-order ones. This gives a hierarchy of increasingly accurate (and computationally expensive) approximations, familiar in physics as cluster expansions.
At the first level of this hierarchy we assume that the nodes are uncorrelated, and approximate two-point marginals such as $[I_i(t) \wedge I_j(t)]$ (the probability that $i$ and $j$ are both infectious at time $t$) as $I_i(t)\, I_j(t)$. At the second level, commonly referred to in the epidemiology literature as the pair approximation, we close the hierarchy at the level of pairs $[I_i(t) \wedge I_j(t)]$ by assuming that three-point correlations can be factored in terms of two-point correlations. For a comprehensive review of these methods, see [22, 28].

In this paper, we study an alternative method, namely Dynamic Message-Passing (DMP). As in belief propagation [32, 33], here variables or "messages" are defined on a network's directed edges: for instance, $I_{j \to i}$ denotes the probability that $j$ was infected by one of its neighbors other than $i$, so that the epidemic might spread from $j$ to $i$. However, unlike belief propagation, where the posterior distributions are updated according to Bayes' rule, here we write differential equations for the messages over time.

For many epidemic models, such as SI (susceptible-infectious), SIR (susceptible-infectious-recovered) and SEIR (susceptible-exposed-infectious-recovered), only one-way state changes can occur. For example, in the SIR model, once an individual has left the Susceptible class and become Infectious, they cannot return to being Susceptible; once they become Recovered, they are immune to future infections, and might as well be Removed. For these non-recurrent models, DMP is known to be an efficient algorithm to estimate $I_i(t)$, and it is exact on trees [1]; it can also be applied to threshold models [3–5] and used for inference [23].

However, for many real-world diseases individuals can return to previously inhabited states.
In these recurrent models, such as SIS (susceptible-infectious-susceptible), SIRS (susceptible-infectious-recovered-susceptible), and SEIS (susceptible-exposed-infectious-susceptible), individuals can cycle through the states multiple times, giving multiple waves of infection traveling through the population. The most obvious examples of recurrent models are seasonal influenza, where due to the evolution of the virus individuals are repeatedly infected during their lifetime [41], vaccination where protective immunity wanes over time [42], and diseases curable by treatment which does not result in antibody-mediated immunity, such as gonorrhea [47]. In all three cases, individuals leave the Susceptible class, only to return at some point in the future (although for influenza, it is worth mentioning that if the evolutionary rate of the virus is functionally related to the number of susceptible individuals, then the recovery rate may not be independent from the state of one's neighbors). Unfortunately, the DMP approach of [1] cannot be directly extended to recurrent models, since its equations for messages only track the first time an individual makes the transition to a given state.

The purpose of this paper is to develop a novel DMP algorithm for recurrent models of epidemics on networks, which we call rDMP. We will show that rDMP gives very good approximations for marginal probabilities on networks, and is often more accurate than the pair approximation. Moreover, whereas the pair approximation requires keeping track of $mk^2$ variables, if there are $m$ edges and $k$ states per node, rDMP requires just $2mk$ variables.
For complex models where $k$ is large (for instance, for diseases with multiple stages of infection or immunity, or multiple-disease epidemics where one disease makes individuals more susceptible to another one) this gives a substantial reduction in the computational effort required.

Finally, the rDMP approach is conceptually simple, making it easy to write down the system of differential equations for a wide variety of epidemic models.

FIG. 1. We define messages on the directed edges of a network to carry causal information of the flow of contagion, e.g. $I_{j \to i}$ is the probability that $j$ is Infectious because it received the infection from a neighbor $k$ other than $i$. This prevents effects from immediately backtracking to the node they came from, and avoids "echo chamber" infections.

FIG. 2. Two simple, yet illustrative, cases of networks, where the darker node is initially Infectious. As we discuss, in these simple cases one can see the motivation for our approach of preventing infection signals from backtracking to the node they immediately came from.

II. MESSAGE-PASSING AND PREVENTING THE ECHO CHAMBER EFFECT

As shown in Fig. 1, the variables of rDMP are messages along directed edges of the network (in addition to one-point marginals). For instance, $I_{j \to i}$ is the probability that $j$ is Infectious because it was infected by one of its other neighbors $k$. The intuition behind this is the following, where we take the SIS model as an example. If $i$ is Susceptible, the rate at which $j$ will infect $i$ is proportional to the probability $I_j$ that $j$ is infected. But when computing this rate, we only include the contribution to $I_j$ that comes from neighbors other than $i$. In other words, we deliberately neglect the event that $j$ receives the infection from $i$, and immediately passes it back to $i$, even if $i$ has become Susceptible in the intervening time.

This choice avoids a kind of "echo chamber" effect, where neighboring nodes artificially amplify each other's probability of being Infectious. For instance, consider a simple but pathological case of the SI model where there are only two nodes in the graph, $i$ and $j$, with an edge between them as shown in Fig. 2. If the transmission rate is $\lambda$, and if we assume the nodes are independent (i.e., if we use first-order moment closure) we obtain the following differential equations,

$$\frac{dI_i}{dt} = \lambda S_i I_j , \qquad \frac{dI_j}{dt} = \lambda S_j I_i , \tag{1}$$

where $S_i(t) = 1 - I_i(t)$ and similarly for $j$.

Now suppose that $j$ is initially Infectious with probability $\delta$, and that $i$ is initially Susceptible, i.e., $I_j(0) = \delta$ and $I_i(0) = 0$. Since in the SI model nodes never recover, the infection will eventually spread from $j$ to $i$, but only if $j$ was Infectious in the first place. Thus the marginals $I_i(t)$ and $I_j(t)$ should tend to $\delta$ as $t \to \infty$.

However, integrating Eq. (1) gives a different result. Once $I_i$ becomes positive, $dI_j/dt$ becomes positive as well, allowing $i$ to infect $j$ with the infection that it received from $j$ in the first place. As a result, $I_j(t)$ approaches 1 as $t \to \infty$. Thus the "echo chamber" between $i$ and $j$ leads to the absurd result that $j$ eventually becomes Infectious, even though with probability $1 - \delta$ there was no initial infection in the system.

In the rDMP approach, we fix this problem by replacing $I_i$ and $I_j$ with the messages they send each other,

$$\frac{dI_i}{dt} = \lambda S_i I_{j \to i} , \qquad \frac{dI_j}{dt} = \lambda S_j I_{i \to j} ,$$

so that $i$ can only infect $j$ if $i$ received the infection from some node other than $j$. (Below we give the equations on a general network, including the time derivatives of the messages.) In this example, there are no other nodes, so if $I_{j \to i}(0) = \delta$ and $I_{i \to j}(0) = 0$, then $I_j(t) = \delta$ for all $t$ as it should be.

Note that we do not claim that rDMP is exact in this case. In particular, as in (1), $I_i(t)$ tends to 1 as $t \to \infty$. This is because, unlike the system of [1], rDMP assumes that the events that $j$ infects $i$ at different times are independent.

In this two-node example, of course, the pair approximation is exact, since it maintains separate variables such as $[S_i \wedge I_j]$ for each of the joint states of the two nodes. However, the pair approximation is subject to other forms of the echo chamber effect. Consider a network with three nodes, as in Fig. 2 (right), where $j$ is a common neighbor of $i$ and $k$. The pair approximation assumes that, conditioned on the state of $j$, the states of $i$ and $k$ are independent; however, in a recurrent epidemic model, $i$ and $k$ could be correlated, for instance if $j$ infected them both and then returned to the Susceptible state. As a result, the pair approximation is vulnerable to a distance-two echo chamber, where $i$ and $k$ infect each other through $j$. As in the two-node case, rDMP prevents this.

Preventing backtracking completely may seem like a strong assumption, and in recurrent models it is a priori possible, for instance, for a node to re-infect the neighbor it was infected by. Despite the well-documented importance of recurrent infections for diseases including (but certainly not limited to) seasonal influenza [41], Plasmodium malaria [49], and urinary tract infections [48], little is known about the source of recurrent infections. For certain sexually transmitted diseases such as gonorrhea [47] and repeated ringworm infections [50], there is evidence that backtracking plays a significant role; on the other hand, it may be that recurrent infections are caused by different strains, each of which is acting essentially without backtracking. Thus while our non-backtracking assumption is clearly invalid in some cases, we believe it is a reasonable approach for most recurrent state infections.

III. THE rDMP EQUATIONS FOR THE SIS, SIRS, AND SEIS MODELS
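Before turning to general networks, the two-node SI example of Section II can be checked numerically. The following is a minimal sketch, not the authors' code: the integrator `integrate`, the right-hand sides `first_order` and `rdmp`, and the values `lam = 1.0` and `delta = 0.1` are illustrative assumptions. It integrates Eq. (1) and the message-based system side by side, holding the two messages constant because in this example neither node has any other neighbor to draw infection from.

```python
import math

def integrate(rhs, y0, t_end, dt=0.01):
    """Classical fourth-order Runge-Kutta for a small ODE system given as lists."""
    y = list(y0)
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(y)
        k2 = rhs([a + 0.5 * dt * b for a, b in zip(y, k1)])
        k3 = rhs([a + 0.5 * dt * b for a, b in zip(y, k2)])
        k4 = rhs([a + dt * b for a, b in zip(y, k3)])
        y = [a + dt / 6.0 * (p + 2 * q + 2 * r + s)
             for a, p, q, r, s in zip(y, k1, k2, k3, k4)]
    return y

lam, delta = 1.0, 0.1  # assumed transmission rate and initial infection probability

def first_order(y):
    # Eq. (1): nodes treated as independent, so each feeds the other directly.
    I_i, I_j = y
    return [lam * (1 - I_i) * I_j, lam * (1 - I_j) * I_i]

def rdmp(y):
    # Message-based version: m_ji stands for I_{j->i}, m_ij for I_{i->j}.
    I_i, I_j, m_ji, m_ij = y
    # With only two nodes and no recovery, the messages have no source term.
    return [lam * (1 - I_i) * m_ji, lam * (1 - I_j) * m_ij, 0.0, 0.0]

naive = integrate(first_order, [0.0, delta], t_end=50.0)
fixed = integrate(rdmp, [0.0, delta, delta, 0.0], t_end=50.0)

print("first-order closure: I_i, I_j =", naive[0], naive[1])
print("message-based:       I_i, I_j =", fixed[0], fixed[1])
```

Under these assumptions the first-order closure drives $I_j$ toward 1, while the message-based system keeps $I_j$ at $\delta$ and lets $I_i$ grow as $1 - e^{-\lambda \delta t}$, matching the behavior described in Section II.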

In this section, we illustrate the rDMP approach for several recurrent epidemic models. We start with the simplest one: in the SIS model, each node is either Infectious ($I$) or Susceptible ($S$). Infectious nodes infect their Susceptible neighbors at rate $\lambda$, and their infections wane back into the Susceptible state at rate $\rho$. We denote the probability that node $i$ is Infectious or Susceptible by $I_i$ and $S_i$ respectively. The objective then is to efficiently and accurately compute these probabilities as a function of time $t$.

We define variables or "messages" that live on the directed edges $(i, j)$ of the network. The directed nature of these messages prevents infection from backtracking from an Infectious node back to its infection source, e.g., if node $i$ infects node $j$, then we prevent $j$ from re-infecting $i$. In addition to tracking the one-point marginal $I_j$, we define a message $I_{j \to i}$ from $j$ to $i$ as the probability that $j$ is in the Infectious state as a result of being infected from one of its neighbors other than $i$. Given these incoming messages, the rate at which $I_i$ evolves in time is given by

$$\frac{dI_i}{dt} = -\rho I_i + \lambda S_i \sum_{j \in \partial i} I_{j \to i} , \tag{2}$$

where $\partial i$ denotes the neighbors of $i$. Similarly, the rate at which $I_{j \to i}$ evolves in time is given by

$$\frac{dI_{j \to i}}{dt} = -\rho I_{j \to i} + \lambda S_j \sum_{k \in \partial j \setminus i} I_{k \to j} , \tag{3}$$

where $k \in \partial j \setminus i$ denotes the neighbors of $j$ excluding $i$.

For the SIRS model, we let $\rho$ and $\gamma$ denote the transition rates from Infectious to Recovered and from Recovered to Susceptible respectively. Then the rDMP system for the SIRS model is given by

$$\frac{dI_{j \to i}}{dt} = -\rho I_{j \to i} + \lambda S_j \sum_{k \in \partial j \setminus i} I_{k \to j} , \tag{4}$$

which is coupled with the one-point marginals through

$$\begin{aligned}
\frac{dS_i}{dt} &= \gamma R_i - \lambda S_i \sum_{j \in \partial i} I_{j \to i} \\
\frac{dI_i}{dt} &= -\rho I_i + \lambda S_i \sum_{j \in \partial i} I_{j \to i} \\
\frac{dR_i}{dt} &= \rho I_i - \gamma R_i .
\end{aligned} \tag{5}$$

In the SEIS model, upon becoming exposed to an infected neighbor, Susceptible nodes first go through a latent period called the Exposed state.
In this state, individuals are infected but not yet Infectious. Exposed nodes become Infectious at the rate $\varepsilon$, and Infectious nodes again wane back to Susceptible at rate $\rho$. The rDMP system for the SEIS model is

$$\begin{aligned}
\frac{dE_{j \to i}}{dt} &= -\varepsilon E_{j \to i} + \lambda S_j \sum_{k \in \partial j \setminus i} I_{k \to j} , \\
\frac{dI_{j \to i}}{dt} &= -\rho I_{j \to i} + \varepsilon E_{j \to i} ,
\end{aligned} \tag{6}$$

which is coupled with the one-point marginals as

$$\begin{aligned}
\frac{dS_i}{dt} &= \rho I_i - \lambda S_i \sum_{j \in \partial i} I_{j \to i} \\
\frac{dI_i}{dt} &= -\rho I_i + \varepsilon E_i \\
\frac{dE_i}{dt} &= -\varepsilon E_i + \lambda S_i \sum_{j \in \partial i} I_{j \to i} .
\end{aligned} \tag{7}$$

Note that here we track messages for the Exposed state, in addition to one-point marginals, since they act as precursors for the Infectious messages. There is no need to track messages for the Susceptible state, since it does not cause state changes in its neighbors.

Generalizing these equations to more complex epidemic models with $k$ different states, as opposed to three or four, is straightforward. Even in a model where every state can cause state changes in its neighbors (for instance, where having Susceptible neighbors speeds up the rate of recovery, or where Exposed nodes can also infect their neighbors at a lower rate) the total number of variables we need to track in a network with $n$ nodes and $m$ edges is at most $2mk$ in addition to the $nk$ one-point marginals. In contrast, the pair approximation requires $mk^2$ variables to keep track of the joint distribution of every neighboring pair.

IV. EXPERIMENTS IN REAL AND SYNTHETIC NETWORKS
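As a concrete starting point for experiments of this kind, the SIS system of Eqs. (2) and (3) can be integrated in a few lines. The sketch below is an illustration under stated assumptions rather than the authors' implementation: the function name `rdmp_sis`, the forward-Euler step, and the initial condition $I_{j \to i}(0) = I_j(0)$ for every message are choices made here (the last one mirrors the two-node example of Section II). The graph is assumed to be simple, given as a list of undirected edges over nodes `0..n-1`.

```python
import numpy as np

def rdmp_sis(edges, n, lam, rho, I0, t_end, dt=0.01):
    """Forward-Euler integration of the rDMP SIS equations (2)-(3)."""
    # Each undirected edge carries two messages, one per direction.
    directed = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    index = {e: a for a, e in enumerate(directed)}
    src = np.array([j for j, i in directed])               # sender j of j -> i
    dst = np.array([i for j, i in directed])               # receiver i of j -> i
    rev = np.array([index[(i, j)] for j, i in directed])   # position of i -> j

    I = np.array(I0, dtype=float)   # one-point marginals I_i
    msg = I[src].copy()             # assumed start: I_{j->i}(0) = I_j(0)

    for _ in range(int(round(t_end / dt))):
        S = 1.0 - I
        # incoming[i] = sum over j in the neighborhood of i of I_{j->i}
        incoming = np.bincount(dst, weights=msg, minlength=n)
        dI = -rho * I + lam * S * incoming                           # Eq. (2)
        # Sum over k in (neighbors of j) \ i equals incoming[j] minus I_{i->j}.
        dmsg = -rho * msg + lam * S[src] * (incoming[src] - msg[rev])  # Eq. (3)
        I = I + dt * dI
        msg = msg + dt * dmsg
    return I, msg

# Hypothetical usage: a three-node path with node 0 initially Infectious.
edges = [(0, 1), (1, 2)]
I, msg = rdmp_sis(edges, n=3, lam=0.5, rho=0.2, I0=[1.0, 0.0, 0.0], t_end=20.0)
print(I)
```

The state consists of $n$ marginals and $2m$ messages, which is the $2mk$ scaling quoted above with one message-carrying state. A higher-order integrator could replace the Euler step without changing the structure.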

In this section we report on numerical experiments for rDMP for the SIS and SIRS models on real and synthetic networks. As a performance metric, we use the average $L_1$ error per node between the marginals computed from rDMP and the true probabilities computed (up to sampling error) using continuous-time Monte Carlo simulations. That is,

$$L_1^{\mathrm{rDMP}}(t) = \frac{1}{n} \sum_i \left| I_i^{\mathrm{MC}}(t) - I_i^{\mathrm{rDMP}}(t) \right| . \tag{8}$$

We use this metric to compare the performance of rDMP with the independent-node approximation and the pair approximation, or equivalently first- and second-order moment closure [22, 28]. As we will see, for a wide range of parameters, rDMP is more accurate than either of these approaches, even though it is computationally easier than the pair approximation.

FIG. 3. Results on the SIS model. On the left, the marginal probability that node 29 in Zachary's Karate Club (see inset on right) is Infectious as a function of time. We compare the true marginal derived by 10 independent Monte Carlo simulations with that estimated by rDMP, the independent node approximation, and the pair approximation. On the right is the $L_1$ error, averaged over all nodes; we see that rDMP is the most accurate of the three methods. Here the transmission rate is $\lambda$ = […], $\rho$ = […] […]

FIG. 4. The steady-state marginals $I_i$ for the $n = 33$ nodes in Zachary's Karate Club, with the same parameters as in Fig. 3. The vertical axis is the true marginal computed by Monte Carlo simulations; the horizontal axis is the estimated marginals from rDMP (black $\star$) and the pair approximation (blue $\times$). Both methods overestimate the marginal, but rDMP is closer to the true value (the line $y = x$) for every node.

FIG. 5. The difference between $L_1^{\mathrm{rDMP}}$ and $L_1^{\mathrm{pair}}$ on Zachary's Karate Club for various values of the ratio $\rho/\lambda$. We rescale time so that $\lambda$ = […]. In the blue region, $L_1^{\mathrm{rDMP}} < L_1^{\mathrm{pair}}$ and rDMP is more accurate; in the red region, $L_1^{\mathrm{rDMP}} > L_1^{\mathrm{pair}}$. We see that rDMP is more accurate except at early times or when $\rho/\lambda$ is small.

In Fig. 3, we show results for the SIS model on Zachary's Karate Club [34]. On the left, we show the marginal probability that a particular node is Infectious as a function of time, estimated by rDMP and by first- and second-order moment closure, and compared with the true marginals given by Monte Carlo simulation. On the right, we show the average $L_1$ error for the three methods. Here $\lambda$ = […], $\rho$ = […] […] runs. We see that rDMP is significantly more accurate than the other two, except at some early times when the pair approximation marginally outperforms rDMP.

As a further illustration, in Fig. 4 we show the steady-state marginal $I_i$ for each node $i$ (measured by running the system until $t = 50$, at which point $I_i(t)$ is nearly constant), with the same parameters and initial condition as in Fig. 3. We show the true marginal of each node on the $y$-axis, and the marginals estimated by rDMP and the pair approximation on the $x$-axis. If the estimated marginals were perfectly accurate, the points would fall on the line $y = x$. Both methods overestimate the marginals to some extent, but rDMP is more accurate than the pair approximation on every node. Thus rDMP makes accurate estimates of the marginals on individual nodes, as opposed to just the average across the population.

To investigate how rDMP compares with the pair approximation across a broader range of parameters, in Fig. 5 we vary the ratio between the waning rate $\rho$ and the transmission rate $\lambda$. Since we can always rescale time by multiplying $\lambda$ and $\rho$ by the same constant, we do this by holding $\lambda$ = […] and varying $\rho$. We then measure the difference in the $L_1$ error of the two methods, $L_1^{\mathrm{rDMP}} - L_1^{\mathrm{pair}}$. In the blue region, rDMP is more accurate than the pair approximation; in the red region, it is less so. We see that rDMP is more accurate except at early times (as in Fig. 3) or when $\rho$ is small compared to $\lambda$, i.e., if the model is close to the SI model where Infectious nodes rarely become Susceptible again.

In Fig. 6, we simulate the SIS model on an Erdős-Rényi graph with $n = 100$ and average degree 3, with $\lambda$ = […], $\rho$ = […] […]

In Fig. 7, for the SIRS model on the Karate Club, we show on the left the marginal probability $I$ that node 29 is Infectious; on the right, we show the $L_1$ error for $I_i$ averaged over the network. In the insets, we show the marginal probability $R$ for the Recovered state and the corresponding average $L_1$ error. Here the transmission rate is $\lambda$ = […], $\rho$ = […], $\gamma$ = […] […] runs. As for the SIS model, rDMP is significantly more accurate than the independent node approximation, and is more accurate than the pair approximation except at early times.

We found similar results on many other families of networks, including random regular graphs, random geometric graphs, scale-free networks, Newman-Watts-Strogatz small world networks, and a social network of dolphins [29]. Namely, rDMP outperforms the first-order approximation where nodes are independent, and outperforms the pair approximation across a wide range of parameters and times.

FIG. 6. The fraction $f$ of Infectious nodes as a function of time in the SIS model on an Erdős-Rényi graph (inset) with $n = 100$ and average degree 3. Here $\lambda$ = […], $\rho$ = […] […] independent runs. Except at early times, rDMP tracks the true trajectory more closely.

FIG. 7. The SIRS model on the Karate Club. On the left, we show the true and estimated marginal probability that node 29 is Infectious (main figure) or Recovered (inset) as a function of time. On the right is the average $L_1$ error for the Infectious and Recovered states. The transmission rate is $\lambda$ = […], $\rho$ = […], $\gamma$ = […] […] runs. As for the SIS model, rDMP is significantly more accurate than the first-order model where nodes are independent, and is more accurate than the pair approximation except at early times.

V. LINEAR STABILITY, EPIDEMIC THRESHOLDS, AND RELATED WORK

Systems of differential equations for rDMP, such as (3), do not appear to have a closed-form analytic solution due to their nonlinearities. On the other hand, we can compute quantities such as epidemic thresholds by linearizing around a stationary point, such as $\{I^*_{j \to i} = 0\}$ where the initial outbreak is small. Given a perturbation $\epsilon_{j \to i} = I_{j \to i} - I^*_{j \to i}$, the linear stability of the system, i.e., whether or not $\epsilon_{j \to i}$ diverges in time, is governed by the eigenvalues of the Jacobian matrix $J$ of the right hand side of (3) at the stationary point $I^*_i$. The Jacobian for (3) at $\{I^*_{j \to i}\}$ is

$$J_{(j \to i),(k \to j')} = -\delta_{kj}\, \delta_{ij'}\, \rho + \lambda\, (1 - I^*_j)\, B_{(j \to i),(k \to j')} , \tag{9}$$

where

$$B_{(j \to i),(k \to j')} = \delta_{jj'}\, (1 - \delta_{ik}) . \tag{10}$$

This definition of $B$ is another way of saying that the edge $k \to j$ influences edges $j \to i$ for $i \neq k$, but does not backtrack to $k$. This corresponds to our assumption that infections, for instance, do not bounce from $k$ to $j$ and back again and create an echo chamber effect.
For this reason, $B$ is also known in the literature as the non-backtracking matrix [36] or the Hashimoto matrix [31].

Now, for a small perturbation $\vec{\epsilon}$ away from a stationary point $\{I^*_{j \to i}\}$, the linearized system of (3) becomes

$$\frac{d\vec{\epsilon}}{dt} = J \vec{\epsilon} . \tag{11}$$

If $J$ has any eigenvalues with positive real part, then $\|\vec{\epsilon}(t)\|$ grows exponentially in time. So, the fixed point $\{I_{j \to i}\}$ is stable as long as the leading eigenvalue $\lambda_J$ of $J$ has negative real part.

One trivial, but important, stationary point to test is $I^*_{j \to i} = 0$ for all directed edges, where $J$ becomes

$$J = \lambda \left( B - \frac{\rho}{\lambda} \mathbb{1} \right) , \tag{12}$$

where $\mathbb{1}$ is the $2m \times 2m$ identity matrix. So, the leading eigenvalue of $J$ becomes positive when the largest eigenvalue $\lambda_B$ of $B$ is greater than $\rho/\lambda$. In other words, if

$$R_0 = \frac{\lambda}{\rho} \lambda_B \geq 1 , \tag{13}$$

where $R_0$ is the reproductive number, even a small initial probability of infection will lead to a widespread endemic state, where the infection becomes extensive. If (13) does not hold, a small initial probability of infection will instead decay back to an infection-less state.

Since $B$ is not symmetric, not all its eigenvalues are real. However, by the Perron-Frobenius theorem, its leading eigenvalue is real; moreover, it is upper bounded by $\lambda_A$, the leading eigenvalue of the adjacency matrix $A$. Interestingly, if we examine the linear stability of the first-order approximation where nodes are independent [22], the epidemic threshold for the SIS model is given by

$$\frac{\lambda}{\rho} \lambda_A \geq 1 .$$

Since $\lambda_B \leq \lambda_A$, the threshold (13) gives a better upper bound for the true epidemic threshold than we would get from the first-order approximation. A similar threshold for the SIR model in sparse networks, or equivalently for percolation, using $B$ was recently demonstrated in [37].
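The two thresholds can be compared numerically on a small graph. The following is a minimal sketch under stated assumptions: the helper names `nonbacktracking_matrix`, `adjacency_matrix`, and `leading_eigenvalue` are hypothetical, the dense double loop is only suitable for small graphs, and the test graph (the complete graph on four nodes) is chosen here purely for illustration. It builds $B$ from Eq. (10), then computes the leading eigenvalues of $B$ and of the adjacency matrix $A$.

```python
import numpy as np

def nonbacktracking_matrix(edges):
    """Build B of Eq. (10) for a simple undirected graph given as (u, v) pairs."""
    # Each undirected edge yields two directed edges, so B is 2m x 2m.
    directed = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    B = np.zeros((len(directed), len(directed)))
    for row, (j, i) in enumerate(directed):          # row indexes the edge j -> i
        for col, (k, j2) in enumerate(directed):     # column indexes the edge k -> j2
            if j2 == j and k != i:                   # k -> j feeds j -> i unless k == i
                B[row, col] = 1.0
    return B

def adjacency_matrix(edges, n):
    """Symmetric 0/1 adjacency matrix of the same graph."""
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    return A

def leading_eigenvalue(M):
    """Largest real part among the eigenvalues (the Perron root for nonnegative M)."""
    return float(np.max(np.linalg.eigvals(M).real))

# Illustrative test graph: the complete graph on 4 nodes, which is 3-regular.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
B = nonbacktracking_matrix(edges)
lam_B = leading_eigenvalue(B)
lam_A = leading_eigenvalue(adjacency_matrix(edges, 4))

# Linear stability predicts growth from a small seed when rho/lambda is below
# lam_B (rDMP) or below lam_A (independent-node approximation).
print("lambda_B =", lam_B, " lambda_A =", lam_A)
```

For a $d$-regular graph every row of $B$ sums to $d - 1$, so the two leading eigenvalues are $d - 1$ and $d$; on this example the rDMP criterion therefore requires a smaller $\rho/\lambda$ before it predicts an endemic state than the first-order criterion does.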
(We note that when backtracking is allowed, it has important consequences for epidemic thresholds on power-law networks [38].)

Whereas the leading eigenvalue of $B$ governs the epidemic threshold, the spectral gap between $B$'s top two eigenvalues governs how quickly the epidemic converges to the leading behavior (at least until we leave the linear regime). Qualitatively, this depends on bottlenecks in the network such as those due to community structure, where an epidemic spreads quickly in one community but then takes a longer time to cross over into another. Indeed, the second eigenvector of the non-backtracking matrix $B$ was recently used to detect community structure [36].

Similarly, just as the leading eigenvector of $B$ was recently shown to be a good measure of importance or "centrality" of a node [40], it may be helpful in identifying "superspreaders": nodes where an initial infection will generate the largest outbreak, and be the most likely to lead to a widespread epidemic.

FIG. 8. Same as in Fig. 3, but with transmission rate $\lambda$ = […], $\rho$ = […] […] $\lambda_A$ of the adjacency matrix (the Jacobian matrix of the first-moment-closure approach) of a network. In other words, if $\rho/\lambda < \lambda_A$, it is known from the first-moment method that an infection-free state becomes unstable and epidemics become widespread and endemic. Here we show the results from the SIS model in Zachary's Karate Club, where $\lambda_A \approx$ […] […] $\rho/\lambda$ = […] $< \lambda_A$, which is well below the threshold from the first-moment method, the contagion fades away eventually, which is correctly captured by our DMP approach.

VI. CONCLUSION

Modern epidemiological studies often require recurrent models, where nodes can return to their previously inhabited states multiple times. For example, consider diseases such as influenza, where individuals are infected multiple times throughout their lives, or whooping cough, where vaccine effectiveness wanes over time; in both cases, individuals return to the Susceptible class. In this paper we have extended Dynamic Message-Passing (DMP) to recurrent epidemic models. Our rDMP approach defines messages on the directed edges of a network in such a way as to prevent signals, such as the spread of infection, from backtracking immediately to the node that they came from. By preventing these "echo chamber effects," rDMP obtains good estimates of the time-varying marginal probabilities on a wide variety of networks, estimating both the fraction of infectious individuals in the entire network and the probabilities that individual nodes become infected.

Like the pair approximation, rDMP takes correlations between neighboring nodes into account. However, our experiments show that rDMP is more accurate than the pair approximation for a wide variety of network structures and parameters. Moreover, rDMP is computationally less expensive than the pair approximation, especially for complex epidemic models with a large number of states, using $O(mk)$ instead of $O(mk^2)$ variables for models with $k$ states on networks with $m$ edges.

Finally, rDMP is conceptually simple, allowing the user to immediately write down the system of differential equations for a wide variety of epidemic models, such as those with multiple stages of infection or immunity [43, 44], or those with multiple interacting diseases [45, 46]. We expect that, given its simplicity and accuracy, it will be an attractive option for future epidemiological studies.

VII. ACKNOWLEDGMENTS

This work is supported by a grant from AFOSR and DARPA.

[1] B. Karrer and M. E. J. Newman, Message passing approach for general epidemic models. Phys. Rev. E, 016101 (2010).
[2] J. C. Miller, A. C. Slim, and E. M. Volz, Edge-based compartmental modelling for infectious disease spread. Journal of the Royal Society Interface.
[3] Phys. Rev. E, 022805 (2014).
[4] F. Altarelli, A. Braunstein, L. Dall'Asta, and R. Zecchina, Large deviations of cascade processes on graphs. Phys. Rev. E.
[5] Phys. Rev. E, 012811 (2015).
[6] N. T. J. Bailey, The Mathematical Theory of Infectious Diseases and its Applications. Hafner Press, New York (1975).
[7] R. M. Anderson and R. M. May, Infectious Diseases of Humans. Oxford University Press, Oxford (1991).
[8] M. Granovetter, Threshold models of collective behavior. American Journal of Sociology, 1420–1443 (1978).
[9] M. Granovetter, The strength of weak ties. American Journal of Sociology, 1360–1380 (1973).
[10] J. H. Miller and S. E. Page, The standing ovation problem. Complexity, 8–16 (2004).
[11] B. Gonçalves, N. Perra, and A. Vespignani, Modeling Users' Activity on Twitter Networks: Validation of Dunbar's Number. PLoS ONE, e22656 (2011).
[12] R. M. May and A. G. Haldane, Systemic risk in banking ecosystems. Nature, 351–355 (2011).
[13] F. Caccioli, M. Shrestha, C. Moore, and J. D. Farmer, Stability analysis of financial contagion due to overlapping portfolios. Journal of Banking & Finance, 233–245 (2014).
[14] P. Bak, K. Chen, and C. Tang, A forest-fire model and some thoughts on turbulence. Phys. Lett. A, 297–300 (1990).
[15] B. Drossel and F. Schwabl, Self-organized critical forest-fire model. Phys. Rev. Lett., 1629–1632 (1992).
[16] P. Grassberger, Critical behaviour of the Drossel-Schwabl forest fire model. New J. Phys., 17 (2002).
[17] M. Mézard and A. Montanari, Information, Physics, and Computation. Oxford University Press (2009).
[18] C. Moore and S. Mertens, The Nature of Computation. Oxford University Press (2011).
[19] R. Morris, Zero-temperature Glauber dynamics on $\mathbb{Z}^d$. Prob. Theory Rel. Fields, 3-4 (2011).
[20] R. I. M. Dunbar, Neocortex size as a constraint on group size in primates. Journal of Human Evolution 22 (6), 469–493 (1992).
[21] L. A. Meyers, Contact network epidemiology: Bond percolation applied to infectious disease prediction and control. Bulletin of the American Mathematical Society.
[22] Networks: An Introduction. Oxford University Press (2010).
[23] A. Y. Lokhov, M. Mézard, H. Ohta, and L. Zdeborová, Inferring the origin of an epidemic with dynamic message-passing algorithm. Phys. Rev. E, 012801 (2014).
[24] F. Altarelli, A. Braunstein, L. Dall'Asta, A. Ingrosso, and R. Zecchina, The zero-patient problem with noisy observations. J. Stat. Mech. P10016 (2014).
[25] F. Altarelli, A. Braunstein, L. Dall'Asta, J. R. Wakeling, and R. Zecchina, Containing epidemic outbreaks by message-passing techniques. Phys. Rev. X.
[26] J. Stat. Mech. P09011 (2013).
[27] F. Altarelli, A. Braunstein, A. Ramezanpour, and R. Zecchina, Stochastic optimization by message passing. J. Stat. Mech. P11009 (2011).
[28] M. A. Porter and J. P. Gleeson, Dynamical systems on networks: A tutorial. arXiv:1403.7663 (2014).
[29] D. Lusseau, K. Schneider, O. J. Boisseau, P. Haase, E. Slooten, and S. M. Dawson, The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations. Behavioral Ecology and Sociobiology, 396–405 (2003).
[30] P. Zhang and C. Moore, Scalable detection of statistically significant communities and hierarchies: message-passing for modularity. Proceedings of the National Academy of Sciences (51), 18144–18149.
[31] K. Hashimoto, Zeta functions of finite graphs and representations of $p$-adic groups. Advanced Studies in Pure Mathematics.
[32] AAAI Proceedings (1982).
[33] A. Decelle, F. Krzakala, C. Moore, and L. Zdeborová, Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Phys. Rev. E, 066106 (2011).
[34] W. W. Zachary, An information flow model for conflict and fission in small groups. Journal of Anthropological Research 33 (4), 452–473 (1977).
[35] M. J. Keeling and P. Rohani, Modeling Infectious Diseases in Humans and Animals. Princeton and Oxford: Princeton University Press (2008).
[36] F. Krzakala, C. Moore, E. Mossel, J. Neeman, A. Sly, L. Zdeborová, and P. Zhang, Spectral redemption in clustering sparse networks. Proceedings of the National Academy of Sciences 110 (52), 20935–20940 (2013).
[37] B. Karrer, M. E. J. Newman, and L. Zdeborová, Percolation on sparse networks. Phys. Rev. Lett., 208702 (2014).
[38] S. Chatterjee and R. Durrett, Contact processes on random graphs with power law degree distributions have critical value 0. The Annals of Probability, 2332–2356 (2009).
[39] H. W. Watson and F. Galton, On the Probability of the Extinction of Families. Journal of the Anthropological Institute of Great Britain, 138–144 (1875).
[40] T. Martin, X. Zhang, and M. E. J. Newman, Localization and centrality in networks. Phys. Rev. E, 052808 (2014).
[41] D. J. D. Earn, J. Dushoff, and S. A. Levin, Ecology and evolution of the flu. Trends in Ecology & Evolution, 334–340 (2002).
[42] M. G. M. Gomes, L. J. White, and G. F. Medley, Infection, reinfection, and vaccination under suboptimal immune protection: epidemiological perspectives. Journal of Theoretical Biology, 539–549 (2004).
[43] S. Melnik, J. A. Ward, J. P. Gleeson, and M. A. Porter, Multi-stage complex contagions. Chaos, 013124 (2013).
[44] J. C. Miller and E. M. Volz, Incorporating Disease and Population Structure into Models of SIR Disease in Contact Networks. PLoS ONE (8), e69162 (2013).
[45] B. Karrer and M. E. J. Newman, Competing epidemics on complex networks. Phys. Rev. E, 036106 (2011).
[46] J. C. Miller, Cocirculation of infectious diseases on networks. Phys. Rev. E, 060801 (2013).
[47] M. R. Golden, W. L. H. Whittington, H. H. Handsfield, J. P. Hughes, W. E. Stamm, M. Hogben, A. Clark, C. Malinski, J. R. L. Helmers, K. K. Thomas, and K. K. Holmes, Effect of expedited treatment of sex partners on recurrent or persistent gonorrhea or chlamydial infection. New England Journal of Medicine, 676–685 (2005).
[48] P. H. Conway, A. Cnaan, T. Zaoutis, B. V. Henry, R. W. Grundmeier, and R. Keren, Recurrent urinary tract infections in children: risk factors and association with prophylactic antimicrobials. Journal of the American Medical Association, 179–186 (2007).
[49] G. M. Jeffery, Epidemiological significance of repeated infections with homologous and heterologous strains and species of Plasmodium. Bulletin of the World Health Organization, 873 (1966).
[50] L. M. Drusin, B. G. Ross, K. H. Rhodes, A. N. Krauss, and R. A. Scott, Nosocomial Ringworm in a Neonatal Intensive Care Unit: A Nurse and Her Cat. Infection Control, 605–607 (2000).