Epidemic mitigation by statistical inference from contact tracing data
Antoine Baker, Indaco Biazzo, Alfredo Braunstein, Giovanni Catania, Luca Dall'Asta, Alessandro Ingrosso, Florent Krzakala, Fabio Mazza, Marc Mézard, Anna Paola Muntoni, Maria Refinetti, Stefano Sarao Mannelli, Lenka Zdeborová
Laboratoire de Physique de l'École Normale Supérieure, Université PSL, CNRS, Sorbonne Université, Université Paris-Diderot, Sorbonne Paris Cité, Paris, France
Politecnico di Torino, Corso Duca degli Abruzzi 24, Torino, Italy
Italian Institute for Genomic Medicine, Via Nizza 52, Torino, Italy
Collegio Carlo Alberto, Via Real Collegio 30, Moncalieri, Italy & INFN Sezione di Torino, Via P. Giuria 1, Torino, Italy
The Abdus Salam International Centre for Theoretical Physics, Strada Costiera 11, 34151 Trieste, Italy
Université Paris-Saclay, CNRS & CEA, Institut de physique théorique, 91191, Gif-sur-Yvette, France
École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
Contact tracing is an essential tool for mitigating the impact of a pandemic such as COVID-19. Digital devices can play an important role in achieving efficient and scalable contact tracing in real time. While a lot of attention has been paid to analyzing the privacy and ethical risks of the associated mobile applications, so far much less research has been devoted to optimizing their performance and assessing their impact on the mitigation of the epidemic. We develop Bayesian inference methods to estimate the risk that an individual is infected. This inference is based on the list of the individual's recent contacts and their own risk levels, as well as personal information such as results of tests or presence of symptoms. We propose to use probabilistic risk estimation in order to optimize testing and quarantining strategies for the control of an epidemic.
Our results show that in some range of epidemic spreading (typically when the manual tracing of all contacts of infected people becomes practically impossible, but before the fraction of infected people reaches the scale where a lock-down becomes unavoidable), this inference of individuals at risk could be an efficient way to mitigate the epidemic. Our approaches translate into fully distributed algorithms that only require communication between individuals who have recently been in contact. Such communication may be encrypted and anonymized and is thus compatible with privacy-preserving standards. We conclude that probabilistic risk estimation can enhance the performance of digital contact tracing and should be considered in the currently developed mobile applications.
arXiv [q-bio.PE]
Motivation
One of the main tools public health authorities use to mitigate the spread of a pandemic, such as COVID-19, is the trace-test-isolate strategy. Identifying, calling, testing, and if needed quarantining the recent contacts of an individual who has just tested positive is the standard route for limiting the transmission of a highly contagious virus. This standard strategy proves its efficacy at early stages of the epidemic, when the number of newly infected individuals is small enough to be manageable by reasonable-scale manual contact tracing infrastructures. However, it cannot be applied as such when the epidemic starts to spread faster, because the average number of contacts of a typical individual in the few days before testing positive can be large, not all contacts are with people known to the individual, and manual tracing incurs delays during which infected contacts keep on spreading the virus.
For these reasons, and taking into account the properties and parameters of the COVID-19 epidemic, digital contact tracing was convincingly argued to be a viable route to mitigation of COVID-19 and other similar epidemics [1].
Current mobile phone technology indeed enables automated, real-time proximity tracing between individuals, and many works in this direction were initiated and deployed in past months [2, 3, 4, 5, 6]. With currently developed mobile applications, the distance and duration of a contact between two individuals can be estimated. Furthermore, contextual or health information about individuals can be included as well. This tracing can be used while preserving the privacy of each individual's information, the level of privacy protection depending on the protocol. While many works have been devoted, justifiably, to the compatibility of privacy and tracing, see e.g. [2, 4, 6, 7, 8], much less work is available concerning the assessment of the efficiency of tracing in mitigation of the pandemic. Most considered systems use the tracing data simply as a fast and scalable device to identify all recent contacts, in order to notify and perhaps isolate all of them.
In this paper we show that approximate Bayesian probabilistic inference techniques make it possible to use the data exchanged by the tracing applications to provide highly accurate estimates of the probability that any given individual is infected. This estimate can then be used to focus the tests and other interventions on the group of individuals who have the largest probabilities of being infected, even if they do not show symptoms. The contact-tracing protocols that we propose require individuals that have been in contact in the recent past to be able to exchange messages about their risk level. When two individuals meet, they exchange a small amount of information (typically through Bluetooth). Later on, these individuals exchange messages carrying information about their current status, e.g. an increased risk due to the presence of symptoms associated with the illness or due to their history of past contacts.
Probabilistic inference then combines this information from all past contacts locally on the individual's phone and sends updates of the status to their contacts.
We shall describe hereafter two concrete algorithms to perform the inference: one which is more accurate, based on Belief Propagation [9], and a second one which is a simpler approximation based on the so-called Mean-Field method. The latter requires smaller communication bandwidth between devices and could potentially be more privacy-friendly. Both algorithms are inspired by our previous work [10, 11, 12, 13, 14, 15, 16, 17], and adapted to the present contact tracing problem. For a given testing capacity we show, through extensive simulation on realistic models for COVID-19 diffusion, that both methods significantly reduce the impact of an outbreak and eventually contain the epidemic in many cases where standard tracing protocols fail to do so. We additionally evaluate the robustness of the methods to the presence of false negative tests as well as to partial adoption of the contact-tracing mobile applications.
Related work
While most considered systems use the tracing data simply as a fast and scalable device to identify all recent contacts, works aiming at estimating the risk of infection have appeared recently. These include a machine-learning-based risk estimation proposed in [5], which provides only limited validation of the approach even on data specific to the privacy-preserving protocol, and is thus difficult to compare directly to our approach. A preliminary version of our work was first presented at the Ellis COVID-19 workshop [18]. An approach similar to our MF algorithm was proposed in [19], using Monte Carlo sampling to estimate the corresponding probabilities. That work does not provide validation of the approach involving the control of the epidemic. Another recent work estimating the risks from the tracing data is [20], which is similar in spirit to our work and uses Monte Carlo sampling to estimate the risks. The authors of [20] evaluate their approach only on data that come from the model that is assumed in the inference algorithm. A key aspect of our work is that we test on data coming from a much more complex model than the one assumed when designing the inference algorithm [21]. We believe that this is crucially important for eventual validation on real-world contact data, which are not available to us at this point. We note that the authors of [20] evaluate their approach on networks of up to 10k individuals, compared to 500k used in our simulations. As is common with Monte Carlo schemes, convergence properties could significantly deteriorate with system size. The lack of separation between the epidemic generating model and the inference procedure in the implementation of [20] makes it difficult for us to compare directly to our approach, and we thus leave it for future work. The Python code used for our simulations [22] is modular, and comparisons with other inference procedures can be performed by adding new modules.
Scheme of propagation
A convincing validation of any individual-level intervention policy requires extensive simulations by means of sufficiently detailed agent-based models. In such models, at each time, a given individual is in a state that belongs to a finite set of possible states, for instance susceptible, exposed, infected-asymptomatic, infected-symptomatic, in ICU, recovered, or dead. The most accurate mathematical descriptions of COVID-19 epidemic propagation are based on complex multi-state compartment models, in which infected individuals are not immediately contagious upon infection, may be asymptomatic or develop mild/severe symptoms with some delay, and the ages, households and workplaces are also taken into account. Even though the long-term effects of SARS-CoV-2 infection are still under study, it seems reasonable to assume that some level of immunity is developed with recovery, so that the individual progression through the epidemic compartments is not recurrent (a recovered person typically does not become infected again). The observation of non-trivial distributions of incubation and recovery times, as well as that of time-dependent viral transmission capacity [23, 24, 25, 26], indicates that the most realistic models for SARS-CoV-2 infection clearly depart from the simplest, and largely adopted, Markovian epidemic models. In addition, such models also provide representations of the time-varying contact network over which viral transmissions occur, some including real-world mobility data [27] or computer-generated synthetic surrogates [21, 28]. In particular, the model in Ref. [21] simulates the spread of COVID-19 in urban age-stratified populations with a multi-layer contact network (see also Supplemental Material A for details).
Bayesian Epidemic Tracing
Information regarding the status of tested or symptomatic individuals can be used in different ways within a contact tracing procedure. In the simplest situation, the observation of an infected individual involves tracing (a fraction of) that individual's recent contacts in order to prevent or contain further transmissions of the infection. It is also possible to infer transmission chains and detect the parent cases which are the origin of the infection detected in an individual. In a Bayesian approach, this is made possible by assuming as a prior a particular probabilistic model of epidemic propagation and using it to define a likelihood function conditioned on the evidence coming from observational data, i.e. tests (PCR and/or serology) and self-reported symptoms from a fraction of the individuals. As we shall show, the adopted prior inferential model provides the mathematical framework for developing risk assessment, but it does not need to reflect the real epidemic spread in all its details in order to allow for valuable inference. Indeed, for epidemic propagation generated with complex agent-based models along the lines described above, we show that the approximate computation of local probability marginals can be effectively obtained using as a prior much simpler inferential models. In practice, in this paper we use a simple agent-based Susceptible-Infected-Recovered (SIR) model. We emphasize that, depending on the approximate inference technique used for computing such probability marginals, some of the ingredients of realistic epidemic propagation can be reintroduced, such as non-Markovian evolution between states, time-dependent infectiousness or more compartments.
Here we propose two distributed algorithms for risk estimation from contact tracing data, which are both derived within a Bayesian framework and based on a message-passing principle: the Mean-Field (MF) algorithm and the Belief Propagation (BP) algorithm.
They both provide estimates of the time-dependent local probability marginals of being infected, which can be used to estimate the risk level of each individual; this risk of being infected can in turn be used to implement a sanitary protocol, such as suggesting that higher-risk individuals be tested and/or quarantined. The main differences between these methods, which are presented in the next section, are in the accuracy of the approximation of the epidemic dynamics and in the way in which observational data are incorporated into the probabilistic model.
Methods
We use a Bayesian approach based on a prior description of the epidemic process and a method to include the information from observations. We present here, and use in this paper, a prior description based on a discrete-time Markov chain corresponding to the SIR model. It can be generalized to non-Markovian, continuous-time, and other model settings.
Let $x_i^t$ be the state of individual $i$ at time $t$ (it is convenient to think of $t$ as a number of days), with $x_i^t \in X$ and $X$ a finite set of epidemic states. We use $X = \{S, I, R\}$ for the case in which the susceptible ($S$), infected ($I$) and recovered ($R$) individual states are considered. The state of individual $i$ at time $t$, $x_i^t$, depends on her state at the previous time, and on the states at time $t-1$ of the individuals $j$ that she has met between times $t-1$ and $t$. We denote by $\partial i(t)$ this set of individuals, and by $x_{\partial i}^{t-1} = \{x_j^{t-1}\}$ for $j \in \partial i(t)$. Then $p(x_i^t \mid x_{\partial i}^{t-1}, x_i^{t-1})$ is the probability of individual transitions for $i$ occurring between time $t-1$ and $t$. For the SIR model, this probability depends on the following parameters:
• the recovery rate $\mu_i$, defining the daily probability that the infected individual $i$ moves to the recovered state 'R';
• the transmission rates $\{\lambda_{k \to i}(t)\}_{k \in \partial i(t)}$, which are the probabilities of infection from an infected $k$ to a susceptible $i$ on day $t$.
Let $x = \{x_i^t\}_{i=1,\dots,N}^{t=0,\dots,T}$ be a collective time-trajectory generated by the epidemic. The prior probability associated with this trajectory is defined by
$$p(x) = \prod_i p(x_i^0) \prod_{t=1}^{T} p(x_i^t \mid x_{\partial i}^{t-1}, x_i^{t-1}), \tag{1}$$
where we assumed a factorized probability of the initial state $x^0 = \{x_i^0\}_{i=1,\dots,N}$, i.e. $p(x^0) = \prod_i p(x_i^0)$.
We can now include the effects of observations.
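Before doing so, it may help to make the single-individual transition probability entering the prior (1) concrete. The following is a minimal sketch, not the authors' implementation: it assumes that infection attempts from distinct infected contacts are independent, and the function name `sir_transition_probs` and the integer state labels are hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical integer labels for the states of X = {S, I, R}.
S, I, R = 0, 1, 2


def sir_transition_probs(state_i, neighbor_states, lambdas, mu_i):
    """Return [p(S), p(I), p(R)] for x_i^t given x_i^{t-1} and x_{di}^{t-1}.

    state_i         : state of individual i at time t-1
    neighbor_states : states at time t-1 of the contacts in di(t)
    lambdas         : transmission probabilities lambda_{k->i}(t), same order
    mu_i            : daily recovery probability of individual i

    Assumption (not stated in this form in the text): infection attempts
    from different infected contacts are independent events.
    """
    if state_i == S:
        # i stays susceptible only if every infected contact fails to infect.
        p_escape = 1.0
        for x_k, lam in zip(neighbor_states, lambdas):
            if x_k == I:
                p_escape *= 1.0 - lam
        return np.array([p_escape, 1.0 - p_escape, 0.0])
    if state_i == I:
        # An infected individual recovers with probability mu_i.
        return np.array([0.0, 1.0 - mu_i, mu_i])
    # Recovered is absorbing in the SIR model.
    return np.array([0.0, 0.0, 1.0])
```

Multiplying such factors over all individuals and days, together with the initial-state probabilities, gives the prior weight of a full trajectory as in (1).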
Given a set of observations $\mathcal{O} = \{\mathcal{O}_r\}_{r \in O}$, where each observation $r$ provides some information on the state of an individual at a given time (as the result of tests, or of individual symptoms), and assuming that these observations are statistically independent, the posterior probability of the trajectory $x$ can be expressed using Bayes' theorem as
$$p(x \mid \mathcal{O}) = \frac{1}{p(\mathcal{O})}\, p(x) \prod_r p(\mathcal{O}_r \mid x) = \frac{1}{p(\mathcal{O})} \prod_i p(x_i^0) \prod_{t=1}^{T} p(x_i^t \mid x_{\partial i}^{t-1}, x_i^{t-1}) \prod_r p(\mathcal{O}_r \mid x). \tag{2}$$
For the BP approach, we start from the posterior (2) and remark that it can be written as
$$p(x \mid \mathcal{O}) = \frac{1}{Z} \prod_i \psi_i(x_i, x_{\partial i}). \tag{3}$$
Belief Propagation [9] can then be used to estimate marginal posterior probabilities from (3). However, a straightforward factor graph representation [29, 30] of (3), with $\{x_i\}$ as variable nodes and the compatibility functions $\{\psi_i(x_i, x_{\partial i})\}$ as factor nodes, contains many short loops, so that the corresponding BP equations would not be exact even when the underlying contact network is acyclic. We instead construct a factor graph representation that closely reflects the topological structure of the contact network by associating the individual trajectories of a pair of individuals in contact, and involves BP messages $m_{ij}(x_i, x_j)$ for pairs of trajectories. The corresponding BP fixed-point system for $\{m_{ij}(x_i, x_j)\}_{(ij) \in E}$ is solved by iteration. This formalism has been employed for large-deviation analyses of a class of dynamical processes including applications to epidemics [12, 13, 15, 31, 32], in particular regarding the patient zero problem and the inference of causality chains of infection.
We extend here previous works to deal with non-Markovian processes and make the approach computationally efficient through a limited time-window approximation (see Supplemental Material).
Restricted to Markovian epidemic models, a simpler, Mean-Field approximation can be devised starting from (1). It is based on assuming that $p(x^t) \approx \prod_i p(x_i^t)$, so that
$$p(x_i^{t+1}) \approx \sum_{x_i^t,\, x_{\partial i}^t} p(x_i^{t+1} \mid x_{\partial i}^t, x_i^t)\, p(x_i^t) \prod_{j \in \partial i} p(x_j^t). \tag{4}$$
Thanks to this factorization, one can write closed equations for the evolution of the individual probabilities $p(x_i^t)$ for the simple prior model (1), in the same spirit as presented in [10, 11] (see Supplemental Material).
For risk inference, we need to estimate the $p(x_i^t \mid \mathcal{O})$ in the full model (2) that includes the observations. We show in the Supplemental Material a heuristic that incorporates the presence of observations made at a time $t_{\mathrm{obs}}$ into the Mean-Field equations. The algorithm propagates the information on the population from $t_{\mathrm{MF}}$ days before the current time and it simply takes into account the following facts:
• If an individual is tested 'S' at time $t_{\mathrm{obs}}$, they have been 'S' at all previous times.
• If an individual is tested 'R' at time $t_{\mathrm{obs}}$, they will be 'R' at all following times.
• If an individual is tested 'I' at time $t_{\mathrm{obs}}$, we assume that they have been 'I' at times $[t_{\mathrm{obs}} - \tau, t_{\mathrm{obs}}]$, where $\tau$, the typical time between infection and observation, is a parameter of the algorithm.
Results
We test the inference of risk on two types of epidemic spreading and contact networks.
• SIR spreading model on proximity-based random network: This is a simple SIR-model-based propagation in a population of $N$ individuals, where the graph of contacts is updated dynamically at each step as follows. The individuals are distributed uniformly in a square of side $\sqrt{N}$, and at each time step a contact can be established between two individuals $i$ and $j$ with probability $e^{-d_{ij}/\ell}$, where $d_{ij}$ is the Euclidean distance between the points and $\ell$ is a parameter that controls the density of the contact graph. We shall call this the geometric contact model.
• Oxford OpenABM model: The second model is a much more realistic epidemic spread model [1], which is aimed at capturing essential features of the contacts in real populations as well as the real epidemiology of COVID-19; we call it the OpenABM model. In the absence of sufficiently detailed real-world data, we view the data from the OpenABM model as realistic, and our main point is to demonstrate that even though the proposed inference procedures do not capture most of the details and complexity of this model, they still work and provide a large improvement over competing current contact tracing methods.
For the OpenABM model, we use in MF inference an extremely simplified hypothesis of equal recovery rates, $\mu_i = \mu$, and transmission probabilities divided into only two classes (inter-household contacts, with $\lambda_{k \to i}(t) = \lambda$, and intra-household contacts, with $\lambda_{k \to i}(t) = 2\lambda$). The values of the parameters $\mu$ and $\lambda$ are chosen on the basis of population averages; they could also be inferred from data. Note that arbitrarily heterogeneous parameters can be used if more information is available, such as the duration of a contact.
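To illustrate how small the resulting MF prior model is, the factorized SIR update with the two-class transmission probabilities can be sketched as below. This is a hedged illustration rather than the code of [22]: it implements only the prior dynamics under the factorization assumption (no observations), and the function name `mean_field_step`, the `(j, i, same_household)` contact format and the array layout are assumptions made for this example.

```python
import numpy as np


def mean_field_step(p, contacts, lam, mu):
    """Advance the factorized SIR marginals by one day (prior dynamics only).

    p        : array of shape (N, 3); columns are p(S), p(I), p(R) at day t
    contacts : iterable of (j, i, same_household) meaning j may infect i
               on day t (hypothetical format; list both directions for an
               undirected contact)
    lam, mu  : shared transmission and recovery probabilities
    """
    # Probability that i escapes infection from all of its contacts,
    # treating the contacts' states as independent.
    p_escape = np.ones(len(p))
    for j, i, same_household in contacts:
        rate = 2.0 * lam if same_household else lam  # two classes only
        p_escape[i] *= 1.0 - rate * p[j, 1]

    new_p = np.empty_like(p)
    new_p[:, 0] = p[:, 0] * p_escape           # remain susceptible
    new_p[:, 2] = p[:, 2] + mu * p[:, 1]       # recovered accumulates
    new_p[:, 1] = 1.0 - new_p[:, 0] - new_p[:, 2]  # infected is the rest
    return new_p
```

Heterogeneous rates, for example depending on contact duration, would replace the two-valued `rate` by a per-contact value without changing the structure of the update.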
It is important to notice that the MF algorithms that we derive from this simple model turn out to be very efficient at predicting the risk of an individual to be infected, even in sophisticated propagation models that involve individual and time-dependent rates $\mu_i$ and $\lambda_{k \to i}(t)$. The BP inference model is slightly more complex (see Section C.4 for details), although still much simpler than the original OpenABM propagation model.
In both models, we start the simulation at time 0 with everybody in the susceptible state $S$ except a small number of infected individuals. The number of these "patients zero" will be specified in the following for each case.
We apply the following testing protocol. We observe a fraction of symptomatic individuals on the day of symptoms. After a fixed number of days ($t_{\mathrm{start}}$) the interventions start. Every day, we perform a fixed number $n_r$ of tests on the top individuals ranked as having the largest probabilities of being infected, according to the different risk estimation strategies. We assume that the result of the test is available on the same time step (day) and is included in the observations used to adjust the probabilities of risk on the next time step (day).
Besides BP and MF risk estimation, the ranking strategies considered for comparison are:
• Random Guessing (RG): The $n_r$ individuals on which the tests are performed are randomly chosen among the individuals that were not previously tested positive.
• Contact Tracing (CT): One ranks the individuals who have not been tested positive previously according to the number of contacts with confirmed positive individuals during the time interval $[t - \tau, t)$, and tests the $n_r$ individuals with the largest number of contacts. This is what would be possible to implement with the currently deployed mobile applications.
For BP and MF inference, the ranking is done as follows:
• Belief Propagation (BP) and Mean-Field (MF): One uses the algorithm (BP or MF) to estimate the probabilities $q_i^t = p(x_i^t = I)$ of being infected at time $t$. Individuals who have not been tested positive previously are ranked according to their risk $q_i^t$, and the $n_r$ individuals with the largest risk are tested. For BP, the rank is computed as the probability of infection in the last $\delta_{\mathrm{rank}}$ days. Prioritizing recent infections can be more effective as it helps contain the "boundary" of an ongoing outbreak.
We compare test-guided containment strategies based on MF, BP, RG and CT in a scenario where quarantines are put in place when tested individuals are found to be infected. We show that BP and MF-based methods are able to predict infections and control the epidemic considerably more successfully than the classic contact tracing strategy. Implementations of the MF and BP risk estimation algorithms and all the tests that follow can be found at [22].
We evaluate the proposed framework in a pessimistic regime with 200 simultaneous independent outbreaks that are discovered after ten days. We start with a simulation of the proximity-based random network. Figure 1 shows the development of the epidemic over three months in a population of 500 000 individuals, starting from 200 patients zero, and performing $n_r = 1500$ tests per day. In spite of rather large fluctuations from run to run, one sees a very clear signal indicating that the proposed inference methods, MF and BP, largely improve upon the usual CT, which is itself better than RG.
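The ranking rules listed above share one selection step and differ only in the score they assign to each individual. A minimal sketch follows, under the assumption that contacts are stored as undirected `(i, j, day)` triples; the function names `select_for_testing` and `contact_tracing_scores` are hypothetical and are not taken from [22].

```python
import numpy as np


def select_for_testing(scores, already_positive, n_r):
    """Indices of the n_r highest-scoring individuals not yet tested positive."""
    scores = np.asarray(scores, dtype=float)
    eligible = np.flatnonzero(~np.asarray(already_positive, dtype=bool))
    # Stable sort on negated scores: highest score first, ties by index.
    order = np.argsort(-scores[eligible], kind="stable")
    return eligible[order[:n_r]]


def contact_tracing_scores(n, contacts, confirmed_positive, t, tau):
    """CT score: number of contacts with confirmed positives in [t - tau, t)."""
    scores = np.zeros(n)
    for i, j, day in contacts:
        if t - tau <= day < t:
            if confirmed_positive[j]:
                scores[i] += 1
            if confirmed_positive[i]:
                scores[j] += 1
    return scores
```

For the MF and BP strategies the same `select_for_testing` call would receive the estimated risks $q_i^t$ instead of the CT counts, and for RG a vector of random scores.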
The best inference method is clearly BP, but the simpler MF, which is less demanding in terms of the amount of information exchanged between individuals, and therefore easier to protect for better privacy, is also quite successful. Even in this pessimistic regime both risk inference methods slow down the epidemic spread by more than a month compared to classic contact tracing.
[Figure 1 plot: number of infected versus days; curves for RG, CT, MF and BP.]
Figure 1: Spreading of the epidemic in a 2D geometric graph with 6 contacts on average per day (scale of the graph is 1) and 500 000 individuals. The parameters of the forward simulation of the SIR model are the same as used by the inference algorithms: $\lambda = 0.$…, $\mu = 0.02$. In the plot we show the average number (bold lines) of infected individuals versus time over three different realizations (thin lines) with 200 patients zero. The system freely evolves for the first 10 days, then interventions start. We observe 50% of the infected individuals 5 days after their infection. We perform 1500 tests every day according to the ranking given by the algorithms. The observed infected individuals are quarantined. The MF parameters are $\tau = 5$, $t_{\mathrm{MF}} = 15$.
[Figure 2 plots: number of infected for RG, CT, MF and BP; surviving panel titles are "625 obs." (top row) and "625 obs. HH" (bottom row).]
Figure 2: Effect of the control strategy on the epidemic spreading, according to the OpenABM model, in a population of 500 000 individuals. In all the panels we show the number of infected individuals in a time window of 100 days when interventions are applied starting from day 10. The number of patients zero here is set to 50. Thin lines represent the results for single instances of the epidemic, while the thick line is the average over the different realizations. We compare the effect of an increasing number of available medical tests per day (from left to right), performed on the individuals at highest risk as evaluated by the corresponding strategy (RG, CT, MF and BP). The top panels depict a scenario where only individuals who tested positive are confined, limiting their contacts to the cohabitants, while the bottom panels show how the number of infected individuals changes if the entire household is quarantined whenever an infected member is detected.
In this first test we assumed that the model used for BP and MF inference coincides exactly with the underlying epidemic propagation model (SIR), which is overly optimistic. A much more stringent test has been performed on the more realistic OpenABM COVID-19 model. In this framework, each infected individual can either be asymptomatic or show symptoms of various degree (mild or severe). As in a realistic setting, we consider an additional source of information: individuals that show severe symptoms are immediately quarantined when symptoms emerge (typically 5 days after infection) or hospitalized. In addition, half of the mildly symptomatic individuals are assumed to self-report and self-isolate as well. No direct information is available on asymptomatic (or pre-symptomatic) infected individuals; their detection is possible only through contact tracing.
We mimic a post lock-down scenario where only a small fraction of individuals is initially infected, i.e. a few tens of patients zero in a population of 500 thousand individuals that all employ a contact-tracing application.
The epidemic dynamics evolves freely according to the OpenABM model [21] for ten days, and then a number of individuals with the highest infection risk, assessed by RG, CT, MF, or BP, is tested on a daily basis. The MF algorithm assumes a Markovian SIR spreading with parameters $\lambda = 0.02$ and $\mu = 1/12$, and has parameters $\tau = 5$ and $t_{\mathrm{MF}} = 10$. See Section C.4 for details on the BP parameters. In these simulations, the result of the medical tests is assumed to be exact; errors in tests will be addressed below. The number of medical tests associated with the individuals detected by the mobile application is fixed, while there is no limitation on those performed on the fraction of symptomatic people presented above. The original contact dynamics is then modified according to two alternative strategies: either individuals who tested positive are confined and can have contacts only with their cohabitants or, whenever one person tests positive, all members of that person's household are quarantined without being tested.
Figure 2 shows the number of infected individuals in a time interval of 100 days when the number of initial infections is 50 and the intervention starts after 10 days. In the top panels, we show the results for three independent realizations of the epidemic in the case where only individuals who tested positive are quarantined, while in the bottom panels we show the results for a more restrictive intervention scenario in which entire households are confined. The number of available tests per day increases from 625 to 5000 (from the left to the right panels). The lines are colored according to the adopted ranking strategy and the thick lines show the mean number of infected individuals averaged over the three instances. The results suggest that for both inference strategies the size of the epidemic is significantly reduced compared to random testing and also to classic contact tracing, even when few tests are available. We remark on the behavior of the BP-based strategy when 1000 (1250 when the confinement is not extended to the households) daily medical tests are performed: the confinement of the people inferred by this method suffices to stop the epidemic after 75 days.
The MF-based strategy performs notably better than CT and achieves performance similar to BP when the number of daily observations is large.
In Figure 5 of the Supplemental Material we show the number of infected individuals for a time window of 100 days in a somewhat different scenario, in which the containment measures are applied earlier, one week after the beginning of the epidemic, and the initial size of the epidemic is smaller than that examined in Figure 2.
Robustness of the inference
In the previous section we investigated how several intervention protocols (differing in the treatment of the households and the number of available tests) control realistic epidemics when paired with the considered risk assessment strategies (RG, CT, MF and BP). However, some of the conditions assumed in that section are not realistic. In reality, the sensitivity of medical tests is not 100% and it is to be expected that only a fraction of the population will adopt the app, so that not all contacts are detectable. In this section, we address these two issues, focusing on the more realistic OpenABM model.
We first consider the case in which the results of the medical tests are inaccurate, so that a fraction of the tested individuals is incorrectly identified as uninfected or infected. False positive tests simply put a small additional fraction of individuals in isolation, but do not lead to deterioration of the epidemic control. We hence focus on the influence of false negatives and test how the performance depends on the false negative rate (FNR) of the medical tests. We remark that within the Bayesian framework it is possible to correctly include this information in a straightforward way, and we do so for the BP algorithm, but not for MF, as we want to keep it as simple as possible and test its robustness. In Figure 3 we show the results of several simulations (three different realizations of the dynamics), starting from the setting in Figure 2 with 2500 medical tests and the quarantine of entire households, when the FNR spans the range [0.09, 0.40].
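How a false negative rate enters a Bayesian update can be seen on a single marginal. The sketch below is only an illustration of Bayes' rule for one individual and one test, not the trajectory-level treatment used in the BP algorithm; the function name `posterior_infected` and the optional `fpr` argument are assumptions made for this example.

```python
def posterior_infected(prior, test_positive, fnr, fpr=0.0):
    """Posterior probability of being infected after one test result.

    prior         : probability of being infected before the test
    test_positive : outcome of the test
    fnr           : false negative rate, p(negative | infected)
    fpr           : false positive rate, p(positive | not infected)
    """
    if test_positive:
        like_infected, like_healthy = 1.0 - fnr, fpr
    else:
        like_infected, like_healthy = fnr, 1.0 - fpr
    evidence = prior * like_infected + (1.0 - prior) * like_healthy
    if evidence == 0.0:
        raise ValueError("test result has zero probability under the model")
    return prior * like_infected / evidence
```

With an exact test (`fnr = 0`) a negative result sets the probability to zero, whereas with a nonzero FNR a high-risk individual who tests negative keeps a residual risk that can still propagate to their contacts.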
[Figure 3 plots: number of infected for RG, CT, MF and BP; panels FNR 0.09, FNR 0.15, FNR 0.19, FNR 0.25, FNR 0.31, FNR 0.4.]
Figure 3: Effect of test inaccuracy on the evolution of the controlled epidemic. We simulate the same intervention protocol as Figure 2 for 2500 daily observations (bottom panel). We consider here the effects of an additional source of noise, namely a non-negligible false negative rate (FNR) of the results of the medical tests, from 0.09 to 0.40.
We now turn to the study of partial adoption of the mobile application. This is done in the simulation by hiding the contacts of a fraction of individuals (which are unknown to the inference algorithm): these hidden contacts represent individuals without the application or without a smartphone. Figure 4 shows the result of mitigation, in the OpenABM model, with AF (the fraction of individuals who have adopted the app) ranging between 0.6 and 0.9. It shows that the method is still effective in the presence of partially detected contacts. Although performance is severely affected, one observes that even at AF equal to 0.6 the use of inference algorithms delays the spreading of the epidemic and flattens the peak of infected individuals far more efficiently than the classical contact tracing strategy. Furthermore, it should be noted that application utilization may be positively correlated with the number of contacts of individuals. Including more detailed information about mobile application utilization, e.g. across population age classes, may greatly reduce the impact of low adoption. Similar results are presented in the Supplemental Material, Figure 7, for a smaller number of daily observations.
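The partial-adoption experiment described above amounts to filtering the contact list before it reaches the inference algorithm. A minimal sketch is given below; it assumes that a contact is recorded only when both individuals use the application and that adopters are drawn uniformly at random, neither of which is specified in the text. The function name `visible_contacts` and the `(i, j, day)` contact format are hypothetical.

```python
import random


def visible_contacts(contacts, n, adoption_fraction, seed=0):
    """Return the contacts seen by the inference algorithm and the adopters.

    contacts          : list of (i, j, day) undirected contacts
    n                 : population size
    adoption_fraction : AF, fraction of individuals using the app

    Assumptions: adopters are sampled uniformly at random, and a contact
    is visible only if both endpoints are adopters.
    """
    rng = random.Random(seed)
    n_adopters = round(adoption_fraction * n)
    adopters = set(rng.sample(range(n), n_adopters))
    kept = [(i, j, day) for i, j, day in contacts
            if i in adopters and j in adopters]
    return kept, adopters
```

Under these assumptions the fraction of visible contacts scales roughly as the square of AF, which is one way to see why performance degrades quickly as adoption drops.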
[Figure 4: four panels showing the number of infected individuals versus days for AF = 0.9, 0.8, 0.7 and 0.6; curves: random, tracing, MF, BP.]
Figure 4: Effect of a low adoption fraction of the mobile application on the number of infected individuals. We simulate the same intervention protocol as in Figure 2 for 5000 daily observations. We assume here that only a fraction of the population, from 90% to 60%, uses the mobile application for contact tracing.
Conclusion
The above results show that, in the regime where the epidemic is growing and exhaustive testing of all contacts is unfeasible, inference methods contain the epidemic more efficiently than the classical tracing of contacts, which itself is better than random testing. Both inference schemes require an exchange of information between individuals during a limited time window after they have been in contact, and could be implemented in contact-tracing smartphone applications in a distributed way. Additionally, numerical tests show that the approach is robust to false negatives in the test results as well as to partial adoption of the mobile tracing applications, although the adoption rate required for efficient control of the epidemic with the number of daily tests considered is large relative to that of the currently deployed applications.

The volume of daily exchanged messages per pair of individuals in the two proposed methods is constant with respect to both the population size and time (a total of about 1 kB for MF and 1 MB for BP per individual, assuming ∼10 daily contacts). This volume is negligible when compared with normal data usage. A privacy-preserving implementation will require an additional overhead, but the computational burden on the phone's CPU will remain negligible.

Having access to the estimated posterior probability of being infected over time, a series of threshold values could be put in place so as to suggest actions to individuals, including reduction of contacts, self-isolation and testing.

With regard to privacy, it is worth emphasizing that the proposed inference methods are in principle more protective than the usual (manual) tracing. On the one hand, both can be implemented in a fully distributed way using point-to-point cryptography, without fully centralized processing and storage of information on infections or contacts. On the other hand, by identifying the individuals who have the largest probability of being infected through a cumulative process by which information is integrated, the direct attribution of potential infection events to a given individual is made much harder. Details of such a fully privacy-preserving implementation, along the lines of [4], are left for future work.
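The threshold idea mentioned above could look like the following sketch. The threshold values, the action labels and their ordering are purely hypothetical placeholders chosen for illustration; no such values are proposed in this work.

```python
from bisect import bisect_right

# Hypothetical thresholds on the posterior probability of being infected.
THRESHOLDS = (0.05, 0.20, 0.50)
# Hypothetical actions, one more than the number of thresholds.
ACTIONS = ("no action", "reduce contacts", "self-isolate", "get tested")


def suggest_action(p_infected: float) -> str:
    """Map an estimated infection probability to a suggested action."""
    if not 0.0 <= p_infected <= 1.0:
        raise ValueError("p_infected must lie in [0, 1]")
    # bisect_right counts how many thresholds are <= p_infected.
    return ACTIONS[bisect_right(THRESHOLDS, p_infected)]


if __name__ == "__main__":
    for p in (0.01, 0.1, 0.3, 0.9):
        print(p, suggest_action(p))
```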
Acknowledgments
We would like to thank the ELLIS network for organizing a series of COVID-19 related workshops. We also want to thank Yoshua Bengio, Irina Rish, the MILA team working on the contact tracing problem, as well as Luca Ferretti and Ivan Bestvina for numerous enlightening discussions. We acknowledge computational resources provided by HPC@POLITO, as well as Google Cloud for the SIPAR grant in the COVID-19 research credits program. This work has been partially supported by the SmartData@PoliTO (http://smartdata.polito.it) center on Big Data and Data Science, the French Agence Nationale de la Recherche under grant ANR-17-CE23-0023-01 PAIL and the Chaire CFM-ENS on data science.

References

[1] Luca Ferretti, Chris Wymant, Michelle Kendall, Lele Zhao, Anel Nurtay, Lucie Abeler-Dörner, Michael Parker, David Bonsall, and Christophe Fraser. Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing. Science, 368(6491), 2020.
[2] Jason Bay, Joel Kek, Alvin Tan, Chai Sheng Hau, Lai Yongquan, Janice Tan, and Tang Anh Quy. BlueTrace: A privacy-preserving protocol for community-driven contact tracing across borders. Government Technology Agency-Singapore, Tech. Rep., 2020. https://bluetrace.io/static/bluetrace_whitepaper-938063656596c104632def383eb33b3c.pdf
[4] […] arXiv preprint arXiv:2005.12273, 2020.
[5] Hannah Alsdurf, Yoshua Bengio, Tristan Deleu, Prateek Gupta, Daphne Ippolito, Richard Janda, Max Jarvie, Tyler Kolody, Sekoul Krastev, Tegan Maharaj, et al. COVI white paper. arXiv preprint arXiv:2005.08502, 2020.
[6] Justin Chan, Shyam Gollakota, Eric Horvitz, Joseph Jaeger, Sham Kakade, Tadayoshi Kohno, John Langford, Jonathan Larson, Sudheesh Singanamalla, Jacob Sunshine, et al. PACT: Privacy sensitive protocols and mechanisms for mobile contact tracing. arXiv preprint arXiv:2004.03544, 2020.
[7] Hyunghoon Cho, Daphne Ippolito, and Yun William Yu. Contact tracing mobile apps for COVID-19: Privacy considerations and related trade-offs. arXiv preprint arXiv:2003.11511, 2020.
[8] Ramesh Raskar, Isabel Schunemann, Rachel Barbar, Kristen Vilcans, Jim Gray, Praneeth Vepakomma, Suraj Kapa, Andrea Nuzzo, Rajiv Gupta, Alex Berke, et al. Apps gone rogue: Maintaining personal privacy in an epidemic. arXiv preprint arXiv:2003.08567, 2020.
[9] Jonathan S. Yedidia, William T. Freeman, and Yair Weiss. Understanding belief propagation and its generalizations. Exploring artificial intelligence in the new millennium, 8:236–239, 2003.
[10] Andrey Y. Lokhov, Marc Mézard, Hiroki Ohta, and Lenka Zdeborová. Inferring the origin of an epidemic with a dynamic message-passing algorithm. Physical Review E, 90(1), July 2014.
[11] Andrey Y. Lokhov, Marc Mézard, and Lenka Zdeborová. Dynamic message-passing equations for models with unidirectional dynamics. Physical Review E, 91(1):012811, 2015.
[12] Fabrizio Altarelli, Alfredo Braunstein, Luca Dall'Asta, Alejandro Lage-Castellanos, and Riccardo Zecchina. Bayesian inference of epidemics on networks via belief propagation. Physical Review Letters, 112(11):118701, March 2014.
[13] Fabrizio Altarelli, Alfredo Braunstein, Luca Dall'Asta, Alessandro Ingrosso, and Riccardo Zecchina. The patient-zero problem with noisy observations. Journal of Statistical Mechanics: Theory and Experiment, 2014(10):P10016, October 2014.
[14] F. Altarelli, A. Braunstein, L. Dall'Asta, J. R. Wakeling, and R. Zecchina. Containing epidemic outbreaks by message-passing techniques. Physical Review X, 4(2):021024, May 2014.
[15] Alfredo Braunstein and Alessandro Ingrosso. Inference of causality in epidemics on temporal contact networks. Sci Rep, 6:27538, June 2016.
[16] Jacopo Bindi, Alfredo Braunstein, and Luca Dall'Asta. Predicting epidemic evolution on contact networks from partial observations. PLoS ONE, 12(4):e0176376, 2017.
[17] Alfredo Braunstein, Alessandro Ingrosso, and Anna Paola Muntoni. Network reconstruction from infection cascades. Journal of The Royal Society Interface, 16(151):20180844, February 2019.
[18] https://ellis.eu/covid-19/events/ellis-against-covid-19-06-05-2020, May 2020.
[19] https://github.com/ViraTrace/InfectionModel, April 2020.
[20] Ralf Herbrich, Rajeev Rastogi, and Roland Vollgraf. CRISP: A probabilistic model for individual-level COVID-19 infection risk estimation based on contact data. arXiv preprint arXiv:2006.04942, 2020.
[21] Hinch R, Probert W, Nurtay A, Kendall M, Wymant C, Hall M, Lythgoe K, Cruz A B, Zhao L, Stewart A, Ferretti L, Bonsall D, Abeler-Dorner L, and Fraser C. Covid-19 agent-based model with instantaneous contract tracing. Technical report, 2020.
[22] https://github.com/sibyl-team/epidemic_mitigation, 2020.
[23] Soufiane Bentout, Abdennasser Chekroun, and Toshikazu Kuniya. Parameter estimation and prediction for coronavirus disease outbreak 2019 (COVID-19) in Algeria. 7:306–318, May 2020.
[24] Nicolas Franco. COVID-19 Belgium: Extended SEIR-QD model with nursery homes and long-term scenarios-based forecasts from school opening, 2020.
[25] Jonathan Fintzi, Damon Bayer, Isaac Goldstein, Keith Lumbard, Emily Ricotta, Sarah Warner, Lindsay M. Busch, Jeffrey R. Strich, Daniel S. Chertow, Daniel M. Parker, Bernadette Boden-Albala, Alissa Dratch, Richard Chhuon, Nichole Quick, Matthew Zahn, and Vladimir N. Minin. Using multiple data streams to estimate and forecast SARS-CoV-2 transmission dynamics, with application to the virus spread in Orange County, California, 2020.
[26] Sarah Kefayati, Hu Huang, Prithwish Chakraborty, Fred Roberts, Vishrawas Gopalakrishnan, Raman Srinivasan, Sayali Pethe, Piyush Madan, Ajay Deshpande, Xuan Liu, Jianying Hu, and Gretchen Jackson. On machine learning-based short-term adjustment of epidemiological projections of COVID-19 in US. medRxiv, 2020.
[27] Lars Lorch, William Trouleau, Stratis Tsirtsis, Aron Szanto, Bernhard Schölkopf, and Manuel Gomez-Rodriguez. A Spatiotemporal Epidemic Model to Quantify the Effects of Contact Tracing, Testing, and Containment. arXiv:2004.07641 [physics, q-bio, stat], 2020.
[28] Luca Ferretti, Chris Wymant, Michelle Kendall, Lele Zhao, Anel Nurtay, Lucie Abeler-Dörner, Michael Parker, David Bonsall, and Christophe Fraser. Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing. Science, 368(6491), May 2020.
[29] Marc Mézard and Andrea Montanari. Information, physics, and computation. Oxford University Press, 2009.
[30] Judea Pearl. Reverend Bayes on inference engines: a distributed hierarchical approach. In Proceedings of the Second AAAI Conference on Artificial Intelligence, pages 133–136, 1982.
[31] Fabrizio Altarelli, Alfredo Braunstein, Luca Dall'Asta, and Riccardo Zecchina. Large deviations of cascade processes on graphs. Physical Review E, 87(6):062115, June 2013.
[32] F. Altarelli, A. Braunstein, L. Dall'Asta, and R. Zecchina. Optimizing spread dynamics on graphs by message passing. Journal of Statistical Mechanics: Theory and Experiment, 2013(09):P09011, September 2013.

A. Brief description of the OpenABM model
The OpenABM computational model by R. Hinch et al. [21] simulates epidemic propagation in an age-stratified population based on the UK national census data (see Table 1 in [21]). Each individual is represented by a node of a multi-layered network and takes part in three different subnets describing different social contexts: two static subnets represent households and workplaces, while a degree-heterogeneous random network, different every day, represents the occasional interactions which individuals have on a daily basis. Age stratification influences both the composition of households (the elderly tend to live with other elderly, children preferably live with young adults) and the social activity level of individuals (e.g. participation in the workplace network, average number of random daily interactions).

The epidemic states considered by the model are those discussed above, apart from the exposed state, which is not explicitly modeled. Hospitalization and/or death is possible only for severely symptomatic patients, while the resistant state is eventually reached in case of no or mild symptoms. The epidemic dynamics is modeled as a discrete-time stochastic process, with a temporal resolution of one day, in which infected individuals can transmit the disease with a (daily) infection rate which depends mainly on the symptomatic state of the potential infector, the age of the potential infected and the time passed since the potential infector became infected. The latter dependence, modeled through a Gamma distribution, is an attempt to describe the temporal variation of the infectiousness level of SARS-CoV-2. In particular, the incubation period (usually represented by the E state) is here implicitly taken into account by assuming an infectiousness level which does not immediately grow from zero. Finally, the infection rate depends on an intrinsic infectiousness level of the virus and on the type of network on which the contact occurs. Although the notion of duration of contacts is not present, the infectiousness rate associated with contacts inside households is larger than for the other environments, to account for the typically longer duration of domestic interactions.

With the exception of the viral transmission process just described, all other possible transitions between individual epidemic states (e.g. the transition from the mildly symptomatic infected state to the resistant state) are independent of the state of the other individuals. These events are characterized by a (discrete) waiting time, also distributed according to a Gamma distribution or, in case of dichotomous support, according to a shifted Bernoulli distribution. The parameters of these distributions have been extracted from recent SARS-CoV-2 literature and are summarized in Tables 5-6-7 of Hinch et al. [21].

The model provides the possibility of intervention in order to slow down and, if possible, contain the epidemic outbreak. In particular, it is possible to introduce interventions of increasing severity, from case-based measures (e.g. quarantine for individuals who test positive in swab tests and for their housemates) to mobility restrictions for some categories of individuals and lockdown scenarios. In this respect, the OpenABM model is very appropriate for the implementation of contact tracing strategies. Finally, the OpenABM model also allows varying the adoption fraction of the contact tracing app within the population, possibly introducing different percentages of adoption in different age groups.
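To make the Gamma-shaped dependence of infectiousness on the time since infection concrete, here is a minimal sketch. It is not OpenABM code: the function names, the use of the Gamma density evaluated at integer days, the capping at 1 and all numerical values are assumptions chosen only for illustration.

```python
import math


def gamma_pdf(x: float, shape: float, scale: float) -> float:
    """Density of a Gamma(shape, scale) distribution at x (zero for x <= 0)."""
    if x <= 0.0:
        return 0.0
    return (x ** (shape - 1.0) * math.exp(-x / scale)
            / (math.gamma(shape) * scale ** shape))


def daily_infectiousness(days_since_infection: int, shape: float, scale: float,
                         base_rate: float,
                         network_multiplier: float = 1.0) -> float:
    """Hypothetical daily transmission probability for one contact.

    base_rate stands for the intrinsic infectiousness of the virus and
    network_multiplier for the type of network (e.g. larger for households).
    """
    rate = base_rate * network_multiplier * gamma_pdf(
        float(days_since_infection), shape, scale)
    return min(1.0, rate)


if __name__ == "__main__":
    # Illustrative parameters only: infectiousness starts at zero, peaks
    # a few days after infection and then decays.
    for day in range(0, 15, 2):
        print(day, round(daily_infectiousness(day, 3.0, 2.0, 0.5), 4))
```

The profile vanishes at day 0 and grows gradually, which is how an incubation period can be represented without an explicit exposed state.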
B. Mean-Field Inference of Risk
The mean-field inference is based on a prior epidemic model, which is an agent-based SIR model. At each time, an individual $i$ is in one of three states, Susceptible, Infected or Removed: $x_i(t) \in \{S, I, R\}$ (where the "Removed" state means either dead, or recovered and having acquired immunity).

When going from time $t$ to time $t+1$, the following events can take place:
• If $x_i(t) = R$: $x_i(t+1) = R$.
• If $x_i(t) = I$: we call $\mu_i$ the probability that an infected individual $i$ recovers:
$$x_i(t+1) = \begin{cases} I & \text{with probability } 1-\mu_i \\ R & \text{with probability } \mu_i \end{cases}$$
• If $x_i(t) = S$: one looks at all the individuals $j$ which are infected and have been in contact with $i$ at time $t$ (in practice this means between day $t$ and day $t+1$). Define this set of individuals as $\partial i(t)$. Each individual $j$ in $\partial i(t)$ infects $i$ with probability $\lambda_{j\to i}(t)$. More precisely:
$$x_i(t+1) = \begin{cases} I & \text{with probability } 1-\prod_{j\in\partial i(t)}\left(1-\lambda_{j\to i}(t)\right) \\ S & \text{with probability } \prod_{j\in\partial i(t)}\left(1-\lambda_{j\to i}(t)\right) \end{cases}$$

To summarize, the parameters of the model are:
• $\mu_i$: the probability of removal of the infectious patient $i$.
• $\lambda_{i\to j}(t)$: the transmission probability given that there was a contact between an infected $i$ and a susceptible $j$ at time $t$. This depends on $i$, $j$ and $t$: the transmission probability changes with the duration and nature of the contacts between $i$ and $j$ between day $t$ and day $t+1$, as well as with sanitary measures such as wearing masks.

From these rules one obtains the propagation of the epidemic $p\left(x_i^t \mid x_{\partial i}^{t-1}, x_i^{t-1}\right)$ as defined in eq. (1).

In order to understand and monitor the propagation of the epidemic, and to develop mitigating strategies, it is important to evaluate the probabilities that individual $j$ is in state $S$, $I$ or $R$ at a given time $t$.
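The transition rules of this agent-based SIR model can be sketched as a single synchronous update. The function name `sir_step` and the representation of contacts as directed triples `(j, i, lam)` are assumptions made for this example, not the code used for the simulations.

```python
import random
from typing import Dict, Iterable, Tuple

S, I, R = "S", "I", "R"


def sir_step(states: Dict[int, str],
             contacts: Iterable[Tuple[int, int, float]],
             mu: Dict[int, float],
             rng: random.Random) -> Dict[int, str]:
    """One step t -> t+1 of the agent-based SIR model.

    contacts holds directed triples (j, i, lam), where lam plays the role
    of lambda_{j->i}(t) for a contact between j and i at time t.
    """
    # Probability that each susceptible individual escapes all infections:
    # the product over infected contacts j of (1 - lambda_{j->i}(t)).
    escape = {i: 1.0 for i, x in states.items() if x == S}
    for j, i, lam in contacts:
        if states[j] == I and states[i] == S:
            escape[i] *= 1.0 - lam

    new_states = {}
    for i, x in states.items():
        if x == R:
            new_states[i] = R                       # removed stays removed
        elif x == I:
            new_states[i] = R if rng.random() < mu[i] else I
        else:
            new_states[i] = I if rng.random() < 1.0 - escape[i] else S
    return new_states


if __name__ == "__main__":
    rng = random.Random(0)
    states = {0: I, 1: S, 2: S, 3: R}
    contacts = [(0, 1, 0.3), (0, 2, 0.05)]
    mu = {i: 0.1 for i in states}
    for _ in range(5):
        states = sir_step(states, contacts, mu, rng)
        print(states)
```

All individuals are updated from the states at time t, so an individual infected during a step cannot transmit within that same step.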
We denote the probabilities that individual $j$ is in state $S$, $I$ or $R$ at time $t$ respectively by $P_S^j(t)$, $P_I^j(t)$, $P_R^j(t)$.

These marginal probabilities are in general difficult to evaluate. A straightforward strategy for this evaluation is to simulate a large number of instances of the propagation, and estimate $P_S^j(t)$ as the fraction of instances where individual $j$ is in state $S$ at time $t$. However, this requires a centralized system and large computing power. Here, instead, we use approximate techniques from statistical physics that allow for a good estimate of $P_S^j(t)$ through a fully distributed method, using only a simple exchange of information at each contact.

The Mean-Field (MF) method computes the marginal probabilities of (1) through an iterative process. The probability of individual $j$ receiving the infection from her contact $k$ at time $t$ depends on their contact transmission $\lambda_{k\to j}(t)$ and on the joint probability of $j$ being $S$ and $k$ being $I$ at time $t$. The mean-field approximation estimates this joint probability by the product $P_S^j(t)\,P_I^k(t)$. Using this approximation, one can write the probability that individual $j$ is susceptible at time $t+1$ as
$$P_S^j(t+1) = P_S^j(t) \prod_{k\in\partial j(t)} \left(1 - P_I^k(t)\,\lambda_{k\to j}(t)\right). \qquad (5)$$
The probability of being recovered is
$$P_R^j(t+1) = P_R^j(t) + \mu_j P_I^j(t) \qquad (6)$$
and the probability of being infected is obtained using the fact that $P_S + P_R + P_I = 1$. In practice, considering that the probability of transmission is small, our MF algorithm is based on the following linearized form for $P_S$ (we have checked that this linearization is accurate in the regimes of epidemic propagation that we explore):
$$\begin{aligned}
P_S^j(t+1) &= P_S^j(t) - P_S^j(t)\sum_{k\in\partial j(t)} P_I^k(t)\,\lambda_{k\to j}(t),\\
P_R^j(t+1) &= P_R^j(t) + \mu_j P_I^j(t),\\
P_I^j(t+1) &= P_I^j(t) + P_S^j(t)\sum_{k\in\partial j(t)} P_I^k(t)\,\lambda_{k\to j}(t) - \mu_j P_I^j(t).
\end{aligned} \qquad (7)$$
The mean-field equations have an intuitive content which is easy to understand.
They basically reproduce, in an agent-dependent model, the equations used for the global monitoring of the proportions of S, I, R states in a population. They can also be derived as a limiting case of the dynamical message passing equations from [10], when the transmission probabilities are small.

This approach offers several advantages. Every individual $j$ can estimate her probabilities $P_S^j(t)$, $P_I^j(t)$, $P_R^j(t)$ every day by updating the equations (7). These probabilities can be stored in her phone. For the update, individual $j$ needs to receive, during the contact with $k$, the information on $\lambda_{k\to j}(t)$ and the information from $k$ about his estimates of $P_S^k(t)$, $P_I^k(t)$, $P_R^k(t)$. The value of $\lambda_{k\to j}(t)$ is the standard contact-tracing information, which estimates the encounter duration within a certain distance, as used in all contact tracing applications that are being developed, for instance based on Bluetooth signals between the phones of $j$ and $k$. On top of this, the phone of individual $k$ should send the values of $P_S^k(t)$ and $P_I^k(t)$ to individual $j$ during the contact, and reciprocally. The information is fully distributed: there is no need for a central system that stores the full information, and the data exchange can be encrypted.

We suppose that, at time $t_{\mathrm{obs}}$, an individual $i$ is tested or presents illness-associated symptoms. If $i$ is tested, then the state of $i$ is known: $x_i^{\mathrm{obs}}(t_{\mathrm{obs}}) \in \{S, I, R\}$ and $P_q^i(t_{\mathrm{obs}}) = \delta_{q,\,x_i^{\mathrm{obs}}(t_{\mathrm{obs}})}$. In case of symptoms at time $t_{\mathrm{obs}}$, the probability $P_q^i(t_{\mathrm{obs}})$ is updated on the basis of external medical data, namely the probability of being infected among all people presenting the same set of symptoms.

A simple inference method that turns out to be quite efficient consists in adapting the mean-field equations (7) in order to take into account the results of tests and symptoms.
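The daily update of equations (7) can be sketched as follows. This is a minimal illustration, not the implementation used for the results: the name `mf_step` and the data layout are assumptions. The decrement of the susceptible probability is taken to be the same term $P_S^j(t)\sum_k P_I^k(t)\lambda_{k\to j}(t)$ that enters the update of $P_I^j$, so that the three probabilities keep summing to one, and the clipping of the infection pressure at 1 is an additional safeguard introduced here.

```python
from typing import Dict, Iterable, Tuple

# j -> (P_S, P_I, P_R) for individual j at the current time
Marginals = Dict[int, Tuple[float, float, float]]


def mf_step(p: Marginals,
            contacts: Iterable[Tuple[int, int, float]],
            mu: Dict[int, float]) -> Marginals:
    """One mean-field update t -> t+1.

    contacts holds directed triples (k, j, lam) with lam = lambda_{k->j}(t);
    in a distributed setting, j would receive P_I^k(t) from k's phone.
    """
    # Infection pressure on j: sum over contacts k of P_I^k(t) * lambda_{k->j}(t).
    pressure = {j: 0.0 for j in p}
    for k, j, lam in contacts:
        pressure[j] += p[k][1] * lam

    new_p = {}
    for j, (p_s, p_i, p_r) in p.items():
        newly_infected = p_s * min(1.0, pressure[j])  # clipping is an assumption
        recovered = mu[j] * p_i
        new_p[j] = (p_s - newly_infected,
                    p_i + newly_infected - recovered,
                    p_r + recovered)
    return new_p


if __name__ == "__main__":
    p = {0: (0.0, 1.0, 0.0), 1: (1.0, 0.0, 0.0), 2: (1.0, 0.0, 0.0)}
    contacts = [(0, 1, 0.1), (1, 0, 0.1), (1, 2, 0.1), (2, 1, 0.1)]
    mu = {j: 0.2 for j in p}
    for _ in range(3):
        p = mf_step(p, contacts, mu)
        print({j: tuple(round(v, 4) for v in probs) for j, probs in p.items()})
```

Each individual only needs its own triple of probabilities and the values received from its contacts of the day, which is what makes a per-phone implementation possible.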
The information about tests and symptoms must be propagated back in time and be used to update the risk levels of the contacts of person $i$ in recent times. Assume that we are estimating the probabilities for each individual $i$ to be in each of the three states $q$ at a given time $t$, $P_q^i(t)$. We run the mean-field equations (7) starting at time $t - t_{\mathrm{MF}}$ with the whole population in state $S$, and imposing the constraints due to the tests done in the interval $[t - t_{\mathrm{MF}}, t]$ as follows. If $j$ is tested at a time $t_{\mathrm{obs}}$ in this interval, then:
$$\text{if } x_j(t_{\mathrm{obs}}) = S:\quad P_S^j(t') = 1 \ \text{ for } t' \in [t - t_{\mathrm{MF}},\, t_{\mathrm{obs}}] \qquad (8)$$
$$\text{if } x_j(t_{\mathrm{obs}}) = I:\quad P_I^j(t') = 1 \ \text{ for } t_{\mathrm{obs}} - \tau \le t' \le t_{\mathrm{obs}} \qquad (9)$$
$$\text{if } x_j(t_{\mathrm{obs}}) = R:\quad P_R^j(t') = 1 \ \text{ for } t' \ge t_{\mathrm{obs}} \qquad (10)$$
Our inference procedure depends on two parameters: $\tau$ is the typical time between the infection and the testing that follows the appearance of symptoms, and $t_{\mathrm{MF}}$ is the integration time of the mean-field procedure.

C. Belief Propagation Inference of Risk
C.1. Graphical model setting
Each pair $i, j$ of individuals will be in mutual contact in a finite set of instants $X_{ij} \subset \mathbb{R}_\infty = \mathbb{R} \cup \{+\infty\}$. For reasons that will become clear in the following, we will always assume $\infty \in X_{ij}$. As time advances, instantaneous contagion will happen with probability $\lambda$ at time $s_{ij} \in X_{ij}$ if $i$ is infected and $j$ is susceptible. We will allow $\lambda = \lambda_{ij}^{s_{ij},t_i}$ to depend on the specific (absolute) contact time $s_{ij}$, on the direction of the contact and on the time $t_i$ of infection of individual $i$. Individual $i$ can thus become infected in one instant in the set $X_i = \cup_{j\in\partial i} X_{ij}$. We will denote by $t_i \in X_i$, $r_i \in \mathbb{R}$ respectively the times of infection and recovery of individual $i$, with $t_i = \infty$ (resp. $r_i = \infty$) if the individual did not become infected (resp. recovered) within the time-frame.

We will assume the recovery delay $r_i - t_i$ of node $i$ to be distributed with a continuous distribution with pdf $p_{R,i}(r_i - t_i)$. We will assume a set of factorized, site-dependent observations $p_{O,i}(O_i \mid t_i, r_i)$. Model parameters will be hidden for the moment inside the functions $p_{R,i}$, $\lambda_{ij}^{s,t}$ and $p_{O,i}$, and we will include them only later to avoid cluttering the notation.

The standard SIR model can be obtained by setting $p_{R,i}(r_i - t_i) = \mu e^{-\mu(r_i - t_i)}$, $\lambda_{ij}^{s,t} \equiv \lambda$. In this setup, the model is memory-less (Markov) on the state of infection variables $x_i^t \in \{S, I, R\}$.

In the following we will always assume that $t_i \in X_i$ and $s_{ij}, s_{ji} \in X_{ij}$. Given the times of infection and recovery $t_i$ and $r_i$, the transmission delay $s_{ij}$ has a "truncated" generalized geometric distribution
$$S_{ij}(s_{ij} \mid t_i, r_i) = \mathbb{I}[t_i < s_{ij} < r_i]\;\lambda_{ij}^{s_{ij},t_i} \prod_{t_i \,\ldots}\;\ldots$$
C.2. Belief propagation equations
A naive interpretation of (14) as a graphical model would introduce many unneeded short cycles that were not present in the original contact network. For example, the pairs $(t_i, s_{ji})$, $(t_i, s_{ij})$, $(t_j, s_{ij})$, $(t_j, s_{ji})$ share respectively factors with indices $i$, $(ij)$, $j$, $(ji)$, effectively forming a small cycle. A simple solution consists in regrouping factors as in (15) and considering $(s_{ij}, s_{ji})$ as a single variable:
$$\Psi_i\left(t_i, r_i, \{s_{ki}, s_{ik}\}_{k\in\partial i}\right) = \delta\left(t_i, \min_{k\in\partial^* i} s_{ki}\right) A_i(s_{i^*})\, R_i(r_i - t_i) \prod_{j\in\partial i} S_{ij}(s_{ij} \mid t_i, r_i) \qquad (15)$$

Footnote: A particularly interesting case is $\gamma_i^0 = \gamma$, $\gamma_i^t = 0$ for $t > 0$: in this case individuals can be self-infected only at time 0, representing a closed system with a single unknown seed at time $t = 0$.

The variables $(s_{ij}, s_{ji})$ then have degree two and live in the middle of the original edges, and the variables $t_i, r_i$ have degree 1, i.e. the topology closely follows that of the original contact network:
$$p(\mathbf{t}, \mathbf{r}, \mathbf{s} \mid \mathcal{O}) = \frac{1}{Z} \prod_i \Psi_i \qquad (16)$$
The corresponding BP equations for $\Psi_i$ are
$$m_{ij}(s_{ij}, s_{ji}) \propto \sum_{t_i}\sum_{r_i} p_{O,i}(O_i \mid t_i, r_i)\, A_i(s_{i^*})\, R_i(r_i - t_i)\, S_{ij}(s_{ij} \mid t_i, r_i) \times \sum_{\{s_{ki}\}} \delta\left(t_i, \min_{k\in\partial^* i} s_{ki}\right) \prod_{k\in\partial^* i\setminus j} S_{ik}(s_{ik} \mid t_i, r_i)\, m_{ki}(s_{ki}, s_{ik}) \qquad (17)$$
and the marginals for $t_i$ are
$$b_i(t_i) \propto \sum_{r_i} p_{O,i}(O_i \mid t_i, r_i)\, A_i(s_{i^*})\, R_i(r_i - t_i)\, S_{ij}(s_{ij} \mid t_i, r_i) \times \sum_{\{s_{ki}\}} \delta\left(t_i, \min_{k\in\partial^* i} s_{ki}\right) \prod_{k\in\partial^* i} S_{ik}(s_{ik} \mid t_i, r_i)\, m_{ki}(s_{ki}, s_{ik}) \qquad (18\text{–}19)$$
and similarly for $r_i$. A more efficient computation of the equations can be achieved by defining
$$G_k^0(t_i, r_i) = \sum_{\substack{s_{ki} \ge t_i \\ s_{ik} > t_i}} S_{ik}(s_{ik} \mid t_i, r_i)\, m_{ki}(s_{ki}, s_{ik}), \qquad G_k^1(t_i, r_i) = \sum_{\substack{s_{ki} > t_i \\ s_{ik} > t_i}} S_{ik}(s_{ik} \mid t_i, r_i)\, m_{ki}(s_{ki}, s_{ik})$$
and substituting the extra neighbor message
$$m_{i^*_t i}\left(s_{i^*_t i}, s_{i i^*_t}\right) = \begin{cases} \gamma_i^t & s_{i^*_t i} = t,\ s_{i i^*_t} = \infty \\ 1 - \gamma_i^t & s_{i^*_t i} = \infty,\ s_{i i^*_t} = \infty \end{cases}$$
one obtains
$$m_{ij}(s_{ij}, s_{ji}) \propto \sum_{t_i}\sum_{r_i} p_{O,i}(O_i \mid t_i, r_i)\, A_i(s_{i^*})\, R_i(r_i - t_i)\, S_{ij}(s_{ij} \mid t_i, r_i) \times \sum_{\{s_{ki}\}} \left\{ \prod_{k\in\partial^* i} \mathbb{I}[s_{ki} \ge t_i] - \prod_{k\in\partial^* i} \mathbb{I}[s_{ki} > t_i] \right\} \prod_{k\in\partial^* i\setminus j} S_{ik}(s_{ik} \mid t_i, r_i)\, m_{ki}(s_{ki}, s_{ik}) \propto \sum_{t_i} \ldots \qquad (20)$$
In the BP-based epidemic tracing scheme, the messages exchanged between two individuals grow quadratically with the number of temporal contacts that occurred between them. However, only recent contacts are important to determine the marginal probabilities at the present time; therefore, keeping only a short time window (about two or three weeks) is sufficient to obtain quasi-optimal results. For better accuracy, information about contacts and observations at the dropped times is included approximately as simple factorized priors applied at the start of the window. This prior contains the posterior probability at the first non-dropped time, computed using only the contacts and observations at the dropped time (and the prior computed in the previous step). All simulations have been performed using a 21-day time window.
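The windowing and the quadratic growth of the messages can be illustrated with the sketch below. It covers only the pruning of old contacts, not the construction of the factorized prior at the start of the window; the function names and the tuple layout are assumptions. The table size follows from the fact that a message $m_{ij}(s_{ij}, s_{ji})$ has one entry per pair of values in $X_{ij}$, which contains the contact times plus $\infty$.

```python
from typing import Iterable, List, Tuple

Contact = Tuple[int, int, int, float]  # (i, j, day, lambda)


def contacts_in_window(contacts: Iterable[Contact], today: int,
                       window_days: int = 21) -> List[Contact]:
    """Keep only the contacts of the last window_days days, today included."""
    first_day = today - window_days + 1
    return [c for c in contacts if first_day <= c[2] <= today]


def message_table_size(n_contact_times: int) -> int:
    """Number of entries of m_ij(s_ij, s_ji) for a pair with the given
    number of contact times: both arguments range over those times plus
    the extra value 'infinity'."""
    return (n_contact_times + 1) ** 2


if __name__ == "__main__":
    history = [(0, 1, day, 0.05) for day in range(60)]  # one contact per day
    recent = contacts_in_window(history, today=59)
    print(len(history), message_table_size(len(history)))
    print(len(recent), message_table_size(len(recent)))
```

Bounding the window bounds the number of contact times per pair, and therefore the size of each message, independently of how long the epidemic has been running.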
C.4. Algorithm Parameters
For the OpenABM model, we chose to use Gamma distributions for the recovery density $p_{R,i} = p_R$ and a rescaled Gamma for the infection transmissivity $\lambda_{ij}^{s_{ij},t_i} = p_I(s_{ij} - t_i)$. The five parameters were fitted from experimental data produced by the model (parameters could in principle also be learned or adjusted online during the process through an approximate maximum likelihood procedure [13], but we leave this for future work). Note that the model used for inference with BP is still much simplified with respect to OpenABM itself, in particular having only three states (against 11 in OpenABM). As a consequence, results are only weakly sensitive to the parameters. The values used were Gamma($k = 10$, $\mu = 0.57$) for $p_R$ and Gamma($k = 5.76$, $\mu = 0.$…) with a scale of 0.25 (multiplied by 2 for intra-household contacts, as with MF inference) for $\lambda$. The self-infection probability was chosen to be $p_{\mathrm{seed}} = 1/N$ at time $t = 0$ ($k/N$, where $k$ is the number of patients zero, would bring slightly better results, but would use inaccessible information) and 0 for $t > 0$. The parameter $\delta_{\mathrm{rank}}$ used for the computation of the ranking was chosen to be 10 days.

[Figure 5: six panels showing the number of infected individuals versus days (0 to 100); panel titles include 250 obs., 500 obs., 250 obs. HH and 500 obs. HH; curves: RG, CT, MF, BP.]

Figure 5: Effect of the control strategy on the epidemic spreading. In all the panels we show the number of infected individuals in a time window of 100 days when some intervention is applied starting from day 7. The number of patients zero here is set to 20. Thin lines represent the results for single instances of the epidemics, while the thick line is the average over the different realizations. We compare the effect of an increasing number of available medical tests per day (from left to right), performed on the individuals at risk suggested by the app. The top panels depict a scenario where only individuals who tested positive are confined, limiting their contacts to their cohabitants, while the bottom panels show how the number of infected individuals changes if the entire household is quarantined whenever an infected individual is detected.

D. Additional Results
In this section, we present additional results to stress how effective the containment measures associated with the inference-based methods proposed in the main text are in limiting the epidemics. We consider a realistic spreading dynamics given by the OpenABM model [21] and, unlike the setting illustrated in the main text, we study the case in which the restrictions are applied earlier in time (i.e. one week after the beginning of the epidemic) and the size of the epidemic at the initial time is smaller (the number of patients zero here is 20), which is also consistent with an earlier intervention scenario.

In Figure 5, we display the number of infected individuals as a function of time, similarly to Figure 2 of the main text. Qualitatively, we retrieve the same behavior observed in Figure 2: inference-based ranking allows for a more effective intervention, resulting in a remarkable decrease of the number of infected individuals, and when the number of tests is sufficiently large, the epidemics are stopped in slightly more than two months. Quantitatively, we notice that a reduced number of daily medical tests is needed to control the spreading (about ten times fewer than those used for the results in Figure 2), suggesting that early intervention is equally effective with a more parsimonious usage of testing resources.

Figure 6 suggests how robust the containment measures are when the medical tests are …
[Figure 6: six panels showing the number of infected individuals versus days (0 to 100) for FNR = 0.09, 0.15, 0.19, 0.25, 0.31 and 0.4; curves: RG, CT, MF, BP.]
Figure 6: Effect of test inaccuracy on the evolution of the controlled epidemics. We simulate the same intervention protocol as in Figure 5 for 1000 daily observations (bottom panel). We consider here the effect of an additional source of noise, namely a non-negligible false negative rate (FNR) of the medical tests, from 0.09 to 0.40.

[Figure panels: number of infected individuals for AF = 0.9, 0.8, 0.7 and 0.6; curves: random, tracing, MF, BP.]