A machine learning aided global diagnostic and comparative tool to assess effect of quarantine control in Covid-19 spread

Abstract

We have developed a globally applicable diagnostic Covid-19 model by augmenting the classical SIR epidemiological model with a neural network module. Our model does not rely upon previous epidemics like SARS/MERS and all parameters are optimized via machine learning algorithms employed on publicly available Covid-19 data. The model decomposes the contributions to the infection timeseries to analyze and compare the role of quarantine control policies employed in highly affected regions of Europe, North America, South America and Asia in controlling the spread of the virus. For all continents considered, our results show a generally strong correlation between strengthening of the quarantine controls as learnt by the model and actions taken by the regions' respective governments. Finally, we have hosted our quarantine diagnosis results for the top 70 affected countries worldwide, on a public platform, which can be used for informed decision making by public health officials and researchers alike.


Raj Dandekar, Chris Rackauckas, and George Barbastathis

Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Department of Applied Mathematics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Singapore-MIT Alliance for Research and Technology (SMART) Centre, Singapore 138602

July 28, 2020

Article Summary Line:

Data-driven epidemiological model to quantify and compare quarantine control policies in controlling COVID-19 spread in Europe, North America, South America and Asia.

Running Title:

Machine Learning aided quarantine model - Covid19.

Keywords:

COVID, Machine Learning, Epidemiology


The Coronavirus respiratory disease 2019, originating from the virus "SARS-CoV-2" [1, 2], has led to a global pandemic, with more than 12 million confirmed global cases in more than 200 countries as of July 12, 2020. As the disease began to spread beyond its apparent origin in Wuhan, the responses of local and national governments varied considerably. The evolution of infections has been similarly diverse, in some cases appearing to be contained and in others reaching catastrophic proportions.

In Hubei province itself, starting at the end of January, more than 10 million residents were quarantined by shutting down public transport systems, train and airport stations, and imposing police controls on pedestrian traffic. Subsequently, similar policies were applied nation-wide in China. By the end of March, the rate of infections was reportedly receding.

By the end of February 2020, the virus began to spread in Europe, with Italy employing extraordinary quarantine measures starting 11 March 2020. France enforced a lockdown beginning 17 March, followed later by the UK on 23 March, whereas no lockdown was enforced in Sweden. South Korea, Iran and Spain experienced acute initial increases, but then adopted drastic generalized quarantine.

In the United States, the first infections were detected in Washington State as early as 20 January 2020, and it is now being reported that the virus had been circulating undetected in New York City as early as mid-February. Federal and state government responses were comparatively delayed and variable, with most states having stay at home orders declared by the end of March.

In South America, Brazil, Chile and Peru are the highest affected countries as of 12 July, and they employed differing quarantine policies. Brazil's first case was reported in the last week of February and the country went into a state of partial quarantine on 24 March. Chile declared a state of catastrophe for 90 days in the first week of March, and the military was deployed to enforce quarantine measures. In Peru, a nationwide curfew was employed much later, on March 19.
Thus, affected countries around the world enforced differing quarantine strategies in an effort to mitigate the virus spread.

Given the available Covid-19 data for the infected case count by country and worldwide, it is seen that the infection growth curve also showed significantly diverse behaviour globally. In some countries, the infected case count peaked within a month and showed a subsequent decline, while in certain other countries it was seen to increase for much longer before plateauing. In some of the highly affected countries, the infected count has not yet reached a plateau and the daily active cases continue to increase or remain stagnant as of 12 July 2020.

Given the observed spatially and temporally diverse government responses and outcomes, the role played by the varying quarantine measures in different countries in shaping the infection growth curve is still not clear. With Covid-19 data by country and worldwide now publicly available, there is an urgent need to use data-driven approaches to bridge this gap, and to quantitatively estimate and compare the role of the quarantine policy measures implemented in several countries in curtailing spread of the disease.

As of this writing, more than 100 papers have been made available, mostly in preprint form. Existing models have one or more of the following limitations:

• Lack of independent estimation: Using parameters based on prior knowledge of SARS/MERS coronavirus epidemiology rather than deriving them independently from the Covid-19 data, or fixing parameters such as the rate of detection and the nature of government response prior to running the model.
• Lack of global applicability: Not implemented on a global scale.
• Lack of interpretability: Using several free/fitting parameters, which makes the model cumbersome and complicated for policy makers to reproduce and use.
In this paper, we propose a globally scalable, interpretable model with completely independent parameter estimation through a novel approach: augmenting a first principles-derived epidemiological model with a data-driven module, implemented as a neural network. We leverage this model to quantify the quarantine strengths and to analyze and compare the role of quarantine control policies employed to control the effective reproduction number of the virus in the European, North American, South American and Asian continents.

In a classical and commonly used model, known as SEIR, the population is divided into the susceptible $S$, exposed $E$, infected $I$ and recovered $R$ groups, and their relative growths and competition are represented as a set of coupled ordinary differential equations. The simpler SIR model does not account for the exposed population $E$. These models cannot capture the large-scale effects of more granular interactions, such as the population's response to social distancing and quarantine policies. Moreover, a major assumption of these models is that the rate of transitions between population states is fixed. In our approach, we relax this assumption by estimating the time-dependent quarantine effect on virus exposure as a neural network that informs the infected variable $I$ in the SIR model. This trained model thus decomposes the effects, and the neural network encodes information about the quarantine strength function in the locale where the model is trained.

Figure 1: (a) Schematic of the augmented QSIR model considered in the present study. (b) Schematic of the neural network architecture used to learn the quarantine strength function $Q(t)$.

In general, neural networks with arbitrary activation functions are universal approximators. Unbounded activation functions in particular, such as the rectified linear unit (ReLU), have been known to be effective in approximating nonlinear functions with a finite set of parameters. Thus, a neural network solution is attractive to approximate quarantine effects in combination with analytical epidemiological models. The downside is that the internal workings of a neural network are difficult to interpret. The recently emerging field of Scientific Machine Learning exploits conservation principles within a universal differential equation, SIR in our case, to mitigate overfitting and other related machine learning risks.

In the present work, the neural network is trained from publicly available infection and population data for Covid-19 for a specific region under study; details are in the Experimental Procedures section. Thus, our proposed model is globally applicable and interpretable, with parameters learned from the current Covid-19 data, and does not rely upon data from previous epidemics like SARS/MERS.

The classic SIR epidemiological model is a standard tool for basic analysis concerning the outbreak of epidemics. In this model, the entire population is divided into three sub-populations: susceptible $S$; infected $I$; and recovered $R$. The sub-populations' evolution is governed by the following system of three coupled nonlinear ordinary differential equations

$$\frac{dS(t)}{dt} = -\frac{\beta\, S(t)\, I(t)}{N} \tag{1}$$

$$\frac{dI(t)}{dt} = \frac{\beta\, S(t)\, I(t)}{N} - \gamma I(t) \tag{2}$$

$$\frac{dR(t)}{dt} = \gamma I(t). \tag{3}$$

Here, $\beta$ is the infection rate and $\gamma$ is the recovery rate, and both are assumed to be constant in time. The total population $N = S(t) + I(t) + R(t)$ is seen to remain constant as well; that is, births and deaths are neglected. The recovered population is to be interpreted as those who can no longer infect others, so it also includes individuals deceased due to the infection. The possibility of recovered individuals becoming reinfected is accounted for by SEIS models, but we do not use this model here, as the reinfection rate for Covid-19 survivors is considered to be negligible as of now. The reproduction number $R_t$ in the SEIR and SIR models is defined as

$$R_t = \frac{\beta}{\gamma}. \tag{4}$$

An important assumption of the SIR models is homogeneous mixing among the subpopulations. Therefore, this model cannot account for social distancing or social network effects. Additionally, the model assumes uniform susceptibility and disease progress for every individual, and that no spreading occurs through animals or other non-human means. Alternatively, the SIR model may be interpreted as quantifying the statistical expectations on the respective mean populations, while deviations from the model's assumptions contribute to statistical fluctuations around the mean.
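As a minimal sketch of equations (1)-(3), the system can be integrated numerically with a fixed-step fourth-order Runge-Kutta scheme. The parameter values, the function names and the choice of integrator below are illustrative assumptions and are not taken from the paper; the sketch only shows that the total population stays constant and how the reproduction number in (4) is evaluated.

```python
def sir_rhs(state, beta, gamma, n_total):
    """Right-hand side of the SIR equations (1)-(3)."""
    s, i, r = state
    new_infections = beta * s * i / n_total
    return (-new_infections, new_infections - gamma * i, gamma * i)


def rk4_step(rhs, state, dt, *params):
    """One classical Runge-Kutta step (integrator chosen for illustration)."""
    def shifted(k, h):
        return tuple(x + h * dx for x, dx in zip(state, k))

    k1 = rhs(state, *params)
    k2 = rhs(shifted(k1, 0.5 * dt), *params)
    k3 = rhs(shifted(k2, 0.5 * dt), *params)
    k4 = rhs(shifted(k3, dt), *params)
    return tuple(
        x + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate_sir(beta, gamma, n_total, i0, days, steps_per_day=10):
    """Return the daily (S, I, R) states, starting with i0 infected."""
    state = (n_total - i0, i0, 0.0)
    trajectory = [state]
    for _ in range(days):
        for _ in range(steps_per_day):
            state = rk4_step(sir_rhs, state, 1.0 / steps_per_day,
                             beta, gamma, n_total)
        trajectory.append(state)
    return trajectory


# Hypothetical parameter values, for illustration only.
beta, gamma, n_total = 0.3, 0.1, 1_000_000.0
trajectory = simulate_sir(beta, gamma, n_total, i0=500.0, days=60)
reproduction_number = beta / gamma  # equation (4)
```

Because the three right-hand sides sum to zero, $S + I + R$ stays equal to $N$ along the computed trajectory, up to round-off.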
To study the effect of quarantine control globally, we start with the SIR epidemiological model. Figure 1a shows the schematic of the modified SIR model, the QSIR model, which we consider. We augment the SIR model by introducing a time varying quarantine strength rate term $Q(t)$ and a quarantined population $T(t)$, which is prevented from having any further contact with the susceptible population. Thus, the term $I(t)$ denotes the infected population still having contact with the susceptibles, as in the standard SIR model, while the term $T(t)$ denotes the infected population who are effectively quarantined and isolated. We can therefore write an expression for the quarantined infected population $T(t)$ as

$$T(t) = Q(t) \times I(t). \tag{5}$$

Further, we introduce an additional recovery rate $\delta$, which quantifies the rate of recovery of the quarantined population. Based on the modified model, we define a Covid spread parameter, in a similar way to the reproduction number defined in the SIR model (4), as

$$C_p(t) = \frac{\beta}{\gamma + \delta + Q(t)}. \tag{6}$$

$C_p > 1$ indicates that the infection is still spreading, whereas $C_p < 1$ indicates that the spread has been brought under control. Since $Q(t)$ does not follow from first principles and is highly dependent on local quarantine policies, we devised a neural network-based approach to approximate it.

Recently, it has been shown that neural networks can be used as function approximators to recover unknown constitutive relationships in a system of coupled ordinary differential equations [30, 32]. Following this principle, we represent $Q(t)$ as an $n$ layer-deep neural network with weights $W_1, W_2, \ldots, W_n$, activation function $r$ and the input vector $U = (S(t), I(t), R(t))$ as

$$Q(t) = r\big(W_n\, r(W_{n-1} \ldots r(W_1 U))\big) \equiv \mathrm{NN}(W, U). \tag{7}$$

For the implementation, we choose $n =$ […]. The governing equations of the QSIR model are

$$\frac{dS(t)}{dt} = -\frac{\beta\, S(t)\, I(t)}{N} \tag{8}$$

$$\frac{dI(t)}{dt} = \frac{\beta\, S(t)\, I(t)}{N} - (\gamma + Q(t))\, I(t) = \frac{\beta\, S(t)\, I(t)}{N} - (\gamma + \mathrm{NN}(W, U))\, I(t) \tag{9}$$

$$\frac{dR(t)}{dt} = \gamma I(t) + \delta T(t) \tag{10}$$

$$\frac{dT(t)}{dt} = Q(t)\, I(t) - \delta T(t) = \mathrm{NN}(W, U)\, I(t) - \delta T(t). \tag{11}$$

More details about the model initialization and parameter estimation methods are given in the Experimental Procedures section.

In all cases considered below, we trained the model using data starting from the date when the 500th infection was recorded in each region and up to June 1, 2020. In each subsequent case study, $Q(t)$ denotes the rate at which infected persons are effectively quarantined and isolated from the remaining population, and thus gives composite information about (a) the effective testing rate of the infected population as the disease progressed and (b) the intensity of the enforced quarantine as a function of time. To understand the nature of the evolution of $Q(t)$, we look at the time point when $Q(t)$ approximately shows an inflection point or a ramp up point. An inflection point in $Q(t)$ indicates the time when the rate of increase of $Q(t)$, i.e. $dQ(t)/dt$, was at its peak, while a ramp up point corresponds to a sudden intensification of quarantine policies employed in the region under consideration.

We define the quarantine efficiency, $Q_{\mathrm{eff}}$, as the increase in $Q(t)$ within a month following the detection of the 500th infected case in the region under consideration.
Thus

$$Q_{\mathrm{eff}} = Q(t_0 + \Delta) - Q(t_0), \tag{12}$$

where $t_0$ is the day on which the 500th infected case was detected and $\Delta$ is one month. The magnitude of $Q_{\mathrm{eff}}$ shows how rapidly the infected individuals were prevented from coming into contact with the susceptibles in the first month following the detection of the 500th infected case. It thus contains composite information about the quarantine and lockdown strength, and about the testing and tracing protocols used to identify and isolate infected individuals.

Figure 2 shows the comparison of the model-estimated infected and recovered case counts with actual Covid-19 data for the highest affected European countries as of 1 June 2020, namely: Russia, UK, Spain and Italy, in that order. We find that, despite the small set of optimized parameters (note that the contact rate $\beta$ and the recovery rate $\gamma$ are fixed, and not functions of time), a reasonably good match is seen in all four cases.

Figure 3 shows the evolution of the neural network learnt quarantine strength function $Q(t)$ for the considered European nations. Inflection points in $Q(t)$ are seen for UK, Spain and Italy at 14, 10 and 16 days, respectively, post detection of the 500th case, i.e. on 23 March, 15 March and 14 March, respectively. This is in good agreement with the nationwide quarantine imposed on 25 March, 14 March and 9 March in UK, Spain and Italy, respectively [5, 33, 34].
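The QSIR system (8)-(11), the network form (7), the Covid spread parameter (6) and the quarantine efficiency (12) can be sketched together as a forward simulation. This is an illustration under stated assumptions, not the authors' implementation: the weights are fixed and untrained (the paper learns them from data), the hidden width of 4, the two-layer depth, the division of the inputs by $N$, the initial value of $T$, the 30-day window and all parameter values are hypothetical choices made only so that the example runs.

```python
import random


def relu(x):
    return x if x > 0.0 else 0.0


def make_weights(n_hidden=4, seed=0):
    # Hypothetical untrained weights in [0, 1); the paper optimizes them.
    rng = random.Random(seed)
    w1 = [[rng.random() for _ in range(3)] for _ in range(n_hidden)]
    w2 = [rng.random() for _ in range(n_hidden)]
    return w1, w2


def quarantine_strength(weights, s, i, r, n_total):
    # Q = r(W_2 r(W_1 U)) with U = (S, I, R); scaling by N is an assumption.
    w1, w2 = weights
    u = (s / n_total, i / n_total, r / n_total)
    hidden = [relu(sum(w * x for w, x in zip(row, u))) for row in w1]
    return relu(sum(w * h for w, h in zip(w2, hidden)))


def qsir_rhs(state, beta, gamma, delta, n_total, weights):
    """Right-hand side of equations (8)-(11)."""
    s, i, r, t_q = state
    q = quarantine_strength(weights, s, i, r, n_total)
    infections = beta * s * i / n_total
    return (
        -infections,
        infections - (gamma + q) * i,
        gamma * i + delta * t_q,
        q * i - delta * t_q,
    )


def rk4_step(state, dt, *params):
    def shifted(k, h):
        return tuple(x + h * dx for x, dx in zip(state, k))

    k1 = qsir_rhs(state, *params)
    k2 = qsir_rhs(shifted(k1, 0.5 * dt), *params)
    k3 = qsir_rhs(shifted(k2, 0.5 * dt), *params)
    k4 = qsir_rhs(shifted(k3, dt), *params)
    return tuple(
        x + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate_qsir(beta, gamma, delta, n_total, i0, t0, days, weights,
                  steps_per_day=20):
    state = (n_total - i0 - t0, i0, 0.0, t0)
    daily_states = [state]
    for _ in range(days):
        for _ in range(steps_per_day):
            state = rk4_step(state, 1.0 / steps_per_day,
                             beta, gamma, delta, n_total, weights)
        daily_states.append(state)
    return daily_states


# Hypothetical parameter values, for illustration only.
beta, gamma, delta, n_total = 0.4, 0.05, 0.02, 1_000_000.0
weights = make_weights()
states = simulate_qsir(beta, gamma, delta, n_total,
                       i0=500.0, t0=1.0, days=30, weights=weights)
q_daily = [quarantine_strength(weights, s, i, r, n_total)
           for s, i, r, _ in states]
cp_daily = [beta / (gamma + delta + q) for q in q_daily]  # equation (6)
q_eff = q_daily[-1] - q_daily[0]  # equation (12) with a 30-day window
```

Summing the four right-hand sides gives zero, so $S + I + R + T$ stays equal to $N$ in this sketch. With trained weights, `q_daily` would play the role of the learnt $Q(t)$ curves discussed in the text.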

Figure 2: COVID-19 infected and recovered evolution compared with our neural network augmented model prediction in the highest affected European countries as of June 1, 2020. Panels: (a) Russia, (b) UK, (c) Spain, (d) Italy. Horizontal axis: days post 500 infected.

Figure 3: Quarantine strength $Q(t)$ learned by the neural network in the highest affected European countries as of June 1, 2020. Panels: (a) Russia, (b) UK, (c) Spain, (d) Italy. Horizontal axis: days post 500 infected. The transition from the red to the blue shaded region indicates the Covid spread parameter reaching $C_p < 1$. Each panel marks the date on which the government lockdown (nationwide stay at home order for Russia) was imposed, and the inflection point in the learnt $Q(t)$ is denoted by the red dashed line. For regions in which a clear inflection or ramp up point is not seen (Russia), the red dashed line is not shown.

Figure 4: Control of COVID-19 quantified by the Covid spread parameter evolution in the highest affected European countries as of June 1, 2020. Panels: (a) Russia, (b) UK, (c) Spain, (d) Italy. Horizontal axis: days post 500 infected; the line $C_p = 1$ is marked. The transition from the red to the blue shaded region indicates $C_p < 1$.

Figure 16a shows the comparison of the contact rate $\beta$, the quarantine efficiency as defined in the beginning of this subsection, and the recovery rate $\gamma$. It should be noted that the contact and recovery rates are assumed to be constant in our model in the duration spanning the detection of the 500th infected case and June 1, 2020. The average contact rate in Spain and Italy is seen to be higher than in Russia and UK over the considered duration of 2 to […]. Although the social distancing strength also varied with time, we do not focus on that aspect in the present study; it will be the subject of future studies. A higher quarantine efficiency combined with a higher recovery rate led Spain and Italy to bring down the Covid spread parameter $C_p$ (defined in (6)) from $> 1$ to $< 1$ in […] and 25 days, respectively, as compared to 32 days for UK and 42 days for Russia (figure 4).
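The "days until $C_p < 1$" comparison used above can be read directly off a daily $Q(t)$ series through equation (6). The short sketch below shows the arithmetic; the function names, the linearly increasing $Q(t)$ series and the parameter values are hypothetical and do not correspond to any country in the study.

```python
def covid_spread_parameter(beta, gamma, delta, q):
    """Equation (6): C_p = beta / (gamma + delta + Q)."""
    return beta / (gamma + delta + q)


def first_day_controlled(beta, gamma, delta, q_daily):
    """Return the first day index with C_p < 1, or None if never reached."""
    for day, q in enumerate(q_daily):
        if covid_spread_parameter(beta, gamma, delta, q) < 1.0:
            return day
    return None


# Hypothetical quarantine strength rising by 0.02 per day for 60 days.
q_daily = [0.02 * day for day in range(61)]
day = first_day_controlled(beta=0.5, gamma=0.1, delta=0.05, q_daily=q_daily)
print(day)  # 18: first day on which gamma + delta + Q exceeds beta
```

In this example $C_p$ drops below 1 once $\gamma + \delta + Q(t)$ exceeds $\beta$, which is why a steeper rise in $Q(t)$ or a higher recovery rate shortens the time to control.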

Figure 5 shows $Q_{\mathrm{eff}}$ for the 23 highest affected European countries. We can see that $Q_{\mathrm{eff}}$ in the western European regions is generally higher than in eastern Europe. This can be attributed to the strong lockdown measures implemented in western countries like Spain, Italy, Germany and France after the rise of infections seen first in Italy and Spain. Although countries like Switzerland and Turkey did not enforce as strict a lockdown as their west European counterparts, they were generally successful in halting the infection count before it reached catastrophic proportions, due to strong testing and tracing protocols [37, 38]. Consequently, these countries also managed to identify potentially infected individuals and prevent them from coming into contact with susceptibles, giving them a high $Q_{\mathrm{eff}}$ score as seen in figure 5. In contrast, our study also manages to identify countries like Sweden, which had very limited lockdown measures, with a low $Q_{\mathrm{eff}}$ score as seen in figure 5. This strengthens the validity of our model in diagnosing information about the effectiveness of quarantine and isolation protocols in different countries, which agrees well with the actual protocols seen in these countries.

Figure 5: (a) Quarantine efficiency, $Q_{\mathrm{eff}}$, defined in (12), for the 23 highest affected European countries. Note that $Q_{\mathrm{eff}}$ contains composite information about the quarantine and lockdown strength, and the testing and tracing protocols to identify and isolate infected individuals. The map also shows the demarcation between countries with a high $Q_{\mathrm{eff}}$, shown by a green dotted line, and those with a low $Q_{\mathrm{eff}}$, shown by a red dotted line.

Figure 6: COVID-19 infected and recovered evolution compared with our neural network augmented model prediction in the highest affected USA states as of June 1, 2020. Panels: (a) New York, (b) New Jersey, (c) Illinois, (d) California. Horizontal axis: days post 500 infected. Legend: infected data and prediction; recovered data and prediction.

Figure 7: Quarantine strength $Q(t)$ learned by the neural network in the highest affected USA states as of June 1, 2020. Panels: (a) New York, (b) New Jersey, (c) Illinois, (d) California. Horizontal axis: days post 500 infected. The transition from the red to the blue shaded region indicates the Covid spread parameter reaching $C_p < 1$. Each panel marks the date on which the stay at home order was imposed, and the inflection point (ramp up point for California) in the learnt $Q(t)$ is denoted by the red dashed line.

Figure 8: Control of COVID-19 quantified by the Covid spread parameter evolution in the highest affected USA states as of June 1, 2020. Panels: (a) New York, (b) New Jersey, (c) Illinois, (d) California. Horizontal axis: days post 500 infected; the line $C_p = 1$ is marked. The transition from the red to the blue shaded region indicates $C_p < 1$.

USA

Figure 6 shows a reasonably good match between the model-estimated infected and recovered case counts and the actual Covid-19 data for the highest affected North American states (including states from Mexico, the United States, and Canada) as of 1 June 2020, namely: New York, New Jersey, Illinois and California. $Q(t)$ for New York and New Jersey shows a ramp up point immediately in the week following the detection of the 500th case in these regions, i.e. on 19 March for New York and on 24 March for New Jersey (figure 7). This matches well with the actual dates, 22 March in New York and 21 March in New Jersey, when stay at home orders and isolation measures were enforced in these states. A relatively slower rise of $Q(t)$ is seen for Illinois, while California shows a ramp up more than a week after detection of the 500th case. Although no significant difference is seen in the mean contact and recovery rates between the different US states, the quarantine efficiency in New York and New Jersey is seen to be significantly higher than that of Illinois and California (figure 16b), indicating the effectiveness of the rapidly deployed quarantine interventions in New York and New Jersey.
Owing to the high quarantine efficiency in New York and New Jersey, these states were able to bring down the Covid spread parameter, $C_p$, to less than 1 in 19 days (figure 8). On the other hand, although Illinois and California reached close to $C_p = 1$, $C_p$ still remained greater than 1 (figure 8), indicating that these states were still in the danger zone as of June 1, 2020. An important caveat to this result is the reporting of the recovered data.

Compared with Europe, the recovery rates seen in North America are significantly lower (figures 16a,b). It should be noted that accurate reporting of recovery rates is likely to play a major role in this apparent difference. In our study, the recovered data include individuals who cannot further transmit infection, and thus include treated patients who are currently in a healthy state and also individuals who died due to the virus. Since quantification of deaths can be done in a robust manner, the death data are generally reported more accurately. However, there is no clear definition for quantifying the number of people who transitioned from infected to healthy. As a result, accurate and timely reporting of recovered data is seen to vary significantly between countries, with under-reporting of the recovered data being a common practice. Since the effective reproduction number calculation depends on the recovered case count, accurate data regarding the recovered count are vital to assess whether the infection has been curtailed in a particular region or not. Thus, our results strongly indicate the need for each country to follow a particular metric for estimating the recovered count robustly, which is vital for data driven assessment of the pandemic spread.
Figure 9a shows the quarantine efficiency for 20 major US states spanning the whole country. Figure 9b shows the comparison between a report published in the Wall Street Journal on May 21, highlighting USA states based on their lockdown conditions, and the quarantine efficiency magnitude in our study. The size of the circles represents the magnitude of the quarantine efficiency. The blue color indicates the states for which the quarantine efficiency was greater than the mean quarantine efficiency across all US states, while red indicates the opposite. Our results indicate that the north-eastern and western states were much more responsive in implementing rapid quarantine measures in the month following early detection, as compared to the southern and central states. This matches the on-ground situation: a generally strong correlation is seen between the red circles in our study (states with lower quarantine efficiency) and the yellow regions in the Wall Street Journal report (states with reduced imposition of restrictions), and between the blue circles in our study (states with higher quarantine efficiency) and the blue regions in the Wall Street Journal report (states with a generally higher level of restrictions). This strengthens the validity of our approach, in which the quarantine efficiency is recovered through a trained neural network rooted in fundamental epidemiological equations.

Figure 9: (a) Quarantine efficiency, $Q_{\mathrm{eff}}$, defined in (12), for 20 major USA states (NY, NJ, MI, PA, FL, GA, CA, MA, TX, IL, MD, OK, UT, AZ, NE, WA, OH, OR, CO, SD). Note that $Q_{\mathrm{eff}}$ contains composite information about the quarantine and lockdown strength, and the testing and tracing protocols to identify and isolate infected individuals. (b) Comparison between a report published in the Wall Street Journal on May 21 and the quarantine efficiency magnitude in our study. A generally strong correlation is seen between the magnitude of quarantine efficiency in our study and the level of restrictions actually imposed in different USA states.

Figure 10: COVID-19 infected and recovered evolution compared with our neural network augmented model prediction in the highest affected Asian countries as of June 1, 2020. Panels: (a) India, (b) China, (c) South Korea. Horizontal axis: days post 500 infected.

Figure 11: Quarantine strength $Q(t)$ learnt by the neural network in the highest affected Asian countries as of June 1, 2020. Panels: (a) India (second phase of government lockdown marked), (b) China (government lockdown marked), (c) South Korea (widespread testing, isolation and tracing marked). Horizontal axis: days post 500 infected. The transition from the red to the blue shaded region indicates the Covid spread parameter reaching $C_p < 1$. The ramp up point in the learnt $Q(t)$ is denoted by the red dashed line. For regions in which a clear inflection or ramp up point is not seen (India), the red dashed line is not shown.

Figure 12: Control of COVID-19 quantified by the Covid spread parameter evolution in the highest affected Asian countries as of June 1, 2020. Panels: (a) India, (b) China, (c) South Korea. Horizontal axis: days post 500 infected; the line $C_p = 1$ is marked. The transition from the red to the blue shaded region indicates $C_p < 1$.

Figure 10 shows a reasonably good match between the model-estimated infected and recovered case counts and the actual Covid-19 data for the highest affected Asian countries as of 1 June 2020, namely: India, China and South Korea. $Q(t)$ shows a rapid ramp up in China and South Korea (figure 11), which agrees well with cusps in government interventions that took place in the weeks leading to and after the end of January and February for China and South Korea, respectively. On the other hand, a slow build up of $Q(t)$ is seen for India, with no significant ramp up. This is reflected in the quarantine efficiency comparison (figure 16c), which is much higher for China and South Korea compared to India. South Korea shows a significantly lower contact rate than its Asian counterparts, indicating strongly enforced and followed social distancing protocols. No significant difference in the recovery rate is observed between the Asian countries. Owing to the high quarantine efficiency in China, and a high quarantine efficiency coupled with strongly enforced social distancing in South Korea, these countries were able to bring down the Covid spread parameter $C_p$ from $> 1$ to $< 1$ […].

Figure 13 shows a reasonably good match between the model-estimated infected and recovered case counts and the actual Covid-19 data for the highest affected South American countries as of 1 June 2020, namely: Brazil, Chile and Peru. For Brazil, $Q(t)$ is seen to be approximately constant, $\approx$ […]; $Q(t)$ is seen to stagnate (figure 14a).
The key difference between the Covid progression in Brazil and that in other nations is that the infected and the recovered (recovered healthy + dead in our study) counts are very close to one another as the disease progressed (figure 13). Owing to this, as the disease progressed, the newly infected people introduced in the population were balanced by the infected people removed from the population, either by becoming healthy or by being deceased. This higher recovery rate, combined with a generally low quarantine efficiency and contact rate (figure 16d), manifests itself in the Covid spread parameter for Brazil being $< 1$ […]. […] $Q(t)$ is almost constant for the entire duration considered (figure 14b). Thus, although government regulations were imposed swiftly following the initial detection of the virus, leading to a high initial magnitude of $Q(t)$, the government imposition became subsequently relaxed. This may be attributed to several political and social factors outside the scope of the present study. Even for Chile, the infected and recovered counts remain close to each other compared to other nations. A generally high quarantine magnitude coupled with a moderate recovery rate (figure 16d) leads to $C_p$ being $< 1$ […]. […] $Q(t)$ shows a very slow build up (figure 14c) with a very low magnitude. Also, the recovered count is lower relative to the infected count than in its South American counterparts (figure 13c). A low quarantine efficiency coupled with a low recovery rate (figure 16d) leads Peru to be in the danger zone ($C_p > 1$) for 48 days post detection of the 500th case (figure 15c).

Figure 13: COVID-19 infected and recovered evolution compared with our neural network augmented model prediction in the highest affected South American countries as of June 1, 2020. Panels: (a) Brazil, (b) Chile, (c) Peru. Horizontal axis: days post 500 infected.

Figure 14: Quarantine strength $Q(t)$ learnt by the neural network in the highest affected South American countries as of June 1, 2020. Panels: (a) Brazil (quarantine imposed in big cities marked), (b) Chile (nationwide curfew marked), (c) Peru (nationwide quarantine announcement marked). Horizontal axis: days post 500 infected. The transition from the red to the blue shaded region indicates the Covid spread parameter reaching $C_p < 1$.

Figure 15: Control of COVID-19 quantified by the Covid spread parameter evolution in the highest affected South American countries as of June 1, 2020. Panels: (a) Brazil, (b) Chile, (c) Peru. Horizontal axis: days post 500 infected; the line $C_p = 1$ is marked. The transition from the red to the blue shaded region indicates $C_p < 1$.

Our model captures the infected and recovered counts for highly affected countries in Europe, North America, Asia and South America reasonably well, and is thus globally applicable. Along with capturing the evolution of infected and recovered data, the novel machine learning aided epidemiological approach allows us to extract valuable information regarding the quarantine policies, the evolution of the Covid spread parameter $C_p$, the mean contact rate (social distancing effectiveness), and the recovery rate. Thus, it becomes possible to compare across different countries, with the model serving as an important diagnostic tool.

Our results show a generally strong correlation between strengthening of the quarantine controls, i.e. increasing $Q(t)$ as learnt by the neural network model; actions taken by the regions' respective governments; and decrease of the Covid spread parameter $C_p$ for all continents considered in the present study.

Based on the Covid-19 data collected (details in the Materials and Methods section), we note that accurate and timely reporting of recovered data is seen to vary significantly between countries, with under-reporting of the recovered data being a common practice. In the North American countries, for example, the recovered data are significantly lower than in their European and Asian counterparts. Thus, our results strongly indicate the need for each country to follow a particular metric for estimating the recovered count robustly, which is vital for data driven assessment of the pandemic spread.

Figure 16: Global comparison of infection, recovery rates and quarantine efficiency. Panels: (a) Europe (Russia, UK, Spain, Italy), (b) USA (NY, NJ, Illinois, CA), (c) Asia (India, China, Korea), (d) South America (Brazil, Chile, Peru). Each panel compares the contact rate, the quarantine efficiency $Q_{\mathrm{eff}}$ and the recovery rate.

[…] This could be the subject of future studies.

The starting point $t = 0$ [...] i.e. $I \approx$ [...] $t =$ [...] $T(t)$ is initialized to a small number, $T(t = 0) \approx$ [...]

The time-resolved data for the infected, $I_{\mathrm{data}}$, and recovered, $R_{\mathrm{data}}$, counts for each locale considered are obtained from the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. The neural network-augmented SIR ODE system was trained by minimizing the mean square error loss function

$$L_{\mathrm{NN}}(W, \beta, \gamma, \delta) = \left\| \log\big(I(t) + T(t)\big) - \log\big(I_{\mathrm{data}}(t)\big) \right\|^2 + \left\| \log\big(R(t)\big) - \log\big(R_{\mathrm{data}}(t)\big) \right\|^2 \qquad (13)$$

that includes the neural network's weights $W$. For most of the regions under consideration, $W, \beta, \gamma, \delta$ were optimized by minimizing the loss function given in (13). Minimization was performed using local adjoint sensitivity analysis [32, 46], following a similar procedure outlined in a recent study, with the ADAM optimizer and a learning rate of 0.01. The iterations required for convergence varied based on the region considered and generally ranged from 40,[...] − [...]

[...] $W, \beta, \gamma, \delta$. In the first stage, (13) was minimized. For the second stage, we fix the optimal $\gamma, \delta$ found in the first stage and optimize the remaining parameters, $W, \beta$, based on the loss function defined just on the infected count as

$$L(W, \beta) = \left\| \log\big(I(t) + T(t)\big) - \log\big(I_{\mathrm{data}}(t)\big) \right\|^2 .$$

In the second stage, we do not include the recovered count $R(t)$ in the loss function, since $R(t)$ depends on $\gamma, \delta$, which have already been optimized in the first stage. By placing more emphasis on fitting the infected count, such a two stage procedure leads to much more accurate model estimates when the recovered data count is low. The iterations required for convergence in both stages varied based on the region considered and generally ranged from 30,[...] − [...] $< 1\%$ for all regions considered.

Preliminary versions of this work can be found at medRxiv 2020.04.03.20052084 and arXiv:2004.02752.

Data for the infected and recovered case count in all regions was obtained from the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. All code files are available at https://github.com/RajDandekar/MIT-Global-COVID-Modelling-Project-1. All results are publicly hosted at https://covid19ml.org/ or https://rajdandekar.github.io/COVID-QuarantineStrength/.
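The two loss functions used in the training procedure above, equation (13) and the infected-only loss of the second stage, can be illustrated with a short sketch. This is not the authors' code (their repository is linked above): the function names are hypothetical, and each norm is taken here as a plain sum of squared log differences over the time points, which is an assumption about the normalisation. The model trajectories $I(t)$, $T(t)$, $R(t)$ are assumed to be already available as lists sampled at the same days as the data.

```python
import math
from typing import Sequence


def log_sq_error(model: Sequence[float], data: Sequence[float]) -> float:
    """Sum of squared differences between log(model) and log(data).

    Assumption: the norm in the loss is a plain sum over time points.
    """
    if len(model) != len(data):
        raise ValueError("model and data must have the same length")
    return sum((math.log(m) - math.log(d)) ** 2 for m, d in zip(model, data))


def stage_one_loss(I, T, R, I_data, R_data) -> float:
    """Loss of the form (13): infected-plus-quarantined term plus recovered term."""
    infected_term = log_sq_error([i + t for i, t in zip(I, T)], I_data)
    recovered_term = log_sq_error(R, R_data)
    return infected_term + recovered_term


def stage_two_loss(I, T, I_data) -> float:
    """Second-stage loss: infected term only (gamma and delta are held fixed)."""
    return log_sq_error([i + t for i, t in zip(I, T)], I_data)


# Hypothetical numbers, chosen so that I + T matches I_data exactly.
I = [400.0, 700.0, 1100.0]
T = [100.0, 300.0, 900.0]
R = [50.0, 120.0, 300.0]
I_data = [500.0, 1000.0, 2000.0]
R_data = [50.0, 120.0, 300.0]

print(stage_one_loss(I, T, R, I_data, R_data))  # 0.0 for a perfect fit
print(stage_two_loss(I, T, I_data))             # 0.0 for a perfect fit
```

Working in log space means that a given relative error contributes equally whether the counts are in the hundreds or the hundreds of thousands. Dropping the recovered term in the second stage removes any influence of sparse recovered data on the fit of $W$ and $\beta$.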

This effort was partially funded by the Intelligence Advanced Research Projects Activity (IARPA). We are grateful to Emma Wang for help with some of the simulations, and to Haluk Akay, Hyungseok Kim and Wujie Wang for helpful discussions and suggestions.

The authors declare no conflicts of interest.

References

[1] Chan, J. F.-W, Yuan, S, Kok, K.-H, To, K. K.-W, Chu, H, Yang, J, Xing, F, Liu, J, Yip, C. C.-Y, Poon, R. W.-S, et al. (2020) A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster. The Lancet 395, 514-523.
[2] CDC. (2020) Coronavirus Disease 2019 (COVID-19) Situation Summary, 3 March 2020.
[3] WHO. (2020) Coronavirus disease 2019 (COVID-19) Situation Report - 174, 12 July 2020.
[4] Cyranoski, D. (2020) What china's coronavirus response can teach the rest of the world. Nature.
[5] Gibney, E. (2020) Whose coronavirus strategy worked best? Scientists hunt most effective policies. Nature News.
[6] Holshue, M. L, DeBolt, C, Lindquist, S, Lofy, K. H, Wiesman, J, Bruce, H, Spitters, C, Ericson, K, Wilkerson, S, Tural, A, et al. (2020) First case of 2019 novel coronavirus in the united states. New England Journal of Medicine.
[7] Carey, B & Glanz, J. (2020) Hidden Outbreaks Spread Through U.S. Cities Far Earlier Than Americans Knew, Estimates Say. The New York Times.
[8] Report. (2020) Coronavirus in Latin America: What governments are doing to stop the spread. Global Americans (https://theglobalamericans.org/2020/03/coronavirus-in-latin-america/).
[9] Bertsimas, D, Bandi, H, Boussioux, L, Cory-Wright, R, Delarue, A, Digalakis, V, Gilmour, S, Graham, J, Kim, A, Lahlou Kitane, D, Lin, Z, Lukin, G, Li, M, Mingardi, L, Na, L, Orfanoudaki, A, Papalexopoulos, T, Paskov, I, Pauphilet, J, Skali Lami, M, Sobiesk, B, Stellato, B, Carballo, Y, Wang, H, Wiberg, C, & Zeng, C. (2020) An aggregated dataset of clinical outcomes for covid-19 patients. covid analytics.
[10] Chinazzi, M, Davis, J. T, Ajelli, M, Gioannini, C, Litvinova, M, Merler, S, y Piontti, A. P, Mu, K, Rossi, L, Sun, K, et al. (2020) The effect of travel restrictions on the spread of the 2019 novel coronavirus (covid-19) outbreak. Science.
[11] Li, M. L, Bouardi, H. T, Lami, O. S, Trikalinos, T. A, Trichakis, N. K, & Bertsimas, D. (2020) Forecasting covid-19 and analyzing the effect of government interventions (https://doi.org/10.1101/2020.06.23.20138693).
[12] Kraemer, M. U, Yang, C.-H, Gutierrez, B, Wu, C.-H, Klein, B, Pigott, D. M, Du Plessis, L, Faria, N. R, Li, R, Hanage, W. P, et al. (2020) The effect of human mobility and control measures on the covid-19 epidemic in china. Science 368, 493-497.
[13] Ferguson, N, Laydon, D, Nedjati Gilani, G, Imai, N, Ainslie, K, Baguelin, M, Bhatia, S, Boonyasiri, A, Cucunuba Perez, Z, Cuomo-Dannenburg, G, et al. (2020) Impact of non-pharmaceutical interventions (npis) to reduce covid-19 mortality and healthcare demand. Imperial College London.
[14] Imai, N, Cori, A, Dorigatti, I, Baguelin, M, Donnelly, C. A, Riley, S, & Ferguson, N. M. (2020) Report 3: transmissibility of 2019-nCov. Imperial College London.
[15] Read, J. M, Bridgen, J. R, Cummings, D. A, Ho, A, & Jewell, C. P. (2020) Novel coronavirus 2019-ncov: early estimation of epidemiological parameters and epidemic predictions.
[16] Tang, B, Wang, X, Li, Q, Bragazzi, N. L, Tang, S, Xiao, Y, & Wu, J. (2020) Estimation of the transmission risk of the 2019-nCov and its implication for public health interventions. Journal of Clinical Medicine 9, 462.
[17] Li, Q, Guan, X, Wu, P, Wang, X, Zhou, L, Tong, Y, Ren, R, Leung, K. S, Lau, E. H, Wong, J. Y, et al. (2020) Early transmission dynamics in Wuhan, China, of novel coronavirus–infected pneumonia. New England Journal of Medicine.
[18] Wu, J. T, Leung, K, & Leung, G. M. (2020) Nowcasting and forecasting the potential domestic and international spread of the 2019-nCov outbreak originating in Wuhan, China: a modelling study. The Lancet 395, 689-697.
[19] Kucharski, A. J, Russell, T. W, Diamond, C, Liu, Y, Edmunds, J, Funk, S, Eggo, R. M, Sun, F, Jit, M, Munday, J. D, et al. (2020) Early dynamics of transmission and control of covid-19: a mathematical modelling study. The Lancet Infectious Diseases.
[20] Fang, H, Chen, J, & Hu, J. (2006) Modelling the sars epidemic by a lattice-based monte-carlo simulation. IEEE 27, 7470-7473.
[21] Saito, M. M, Imoto, S, Yamaguchi, R, Sato, H, Nakada, H, Kami, M, Miyano, S, & Higuchi, T. (2013) Extension and verification of the seir model on the 2009 influenza a (h1n1) pandemic in japan. Mathematical biosciences 246, 47-54.
[22] Smirnova, A, deCamp, L, & Chowell, G. (2019) Forecasting epidemics through nonparametric estimation of time-dependent transmission rates using the seir model. Bulletin of mathematical biology 81, 4343-4365.
[23] Cybenko, G. (1989) Approximations by superpositions of sigmoidal functions. Mathematics of Control, Signals and Systems 2, 303-314.
[24] Hornik, K. (1991) Approximation capabilities of multilayer feedforward networks. Neural Networks.
[25] Sonoda, S & Murata, N. (2017) Neural network with unbounded activation functions is universal approximator. Appl. Comp. Harmonic Anal. 43, 233-268.
[26] Glorot, X, Bordes, A, & Bengio, Y. (2011) Deep sparse rectifier neural networks. Proc. 14th International Conference on Artificial Intelligence and Statistics, 315-323.
[27] Goodfellow, I, Warde-Farley, D, Mirza, M, Courville, A, & Bengio, Y. (2013) Maxout networks. 30th int. conf. mach. learn., 1319-1327.
[28] Dahl, G. E, Sainath, T. N, & Hinton, G. E. (2013) Improving deep neural networks for LVCSR using rectified linear units and dropout. IEEE Acoustics, Speech and Signal Processing, 8609-8613.
[29] Baker, N, Alexander, F, Bremer, T, Hagberg, A, Kevrekidis, Y, Najm, H, Parashar, M, Patra, A, Sethian, J, Wild, S, et al. (2019) Workshop report on basic research needs for scientific machine learning: Core technologies for artificial intelligence. USDOE Washington.
[30] Rackauckas, C, Ma, Y, Martensen, J, Warner, C, Zubov, K, Supekar, R, Skinner, D, & Ramadhan, A. (2020) Universal Differential Equations for Scientific Machine Learning (arXiv:2001.04385).
[31] Mukhopadhyay, B & Bhattacharyya, R. (2008) Analysis of a spatially extended nonlinear seis epidemic model with distinct incidence for exposed and infectives. Nonlinear Analysis: Real World Applications 9, 585-598.
[32] Rackauckas, C, Innes, M, Ma, Y, Bettencourt, J, White, L, & Dixit, V. (2019) Diffeqflux.jl - A Julia Library for Neural Differential Equations. CoRR abs/1902.02376 (http://arxiv.org/abs/1902.02376).
[33] Jones, S. (2020) Spain orders nationwide lockdown to battle coronavirus. The Guardian, March 14.
[34] Harlan, C & Pitrelli, S. (2020) Italy extends coronavirus lockdown to entire country, imposing restrictions on 60 million people. The Washington Post.
[35] Helm, T, Graham-Harrison, E, & Mckie, R. (2020) How did Britain get its coronavirus response so wrong? Guardian.
[36] DW. (2020) Coronavirus: What are the lockdown measures across europe? DW.
[37] O'Dea, C. (2020) What Switzerland did right in the battle against coronavirus. MarketWatch, June 15.
[38] Guerin, O. (2020) Coronavirus: How Turkey took control of covid-19 emergency. BBC News.
[39] Goodman, S. (2020) Sweden Has Become the World's Cautionary Tale. The New York Times.
[40] Maxouris, C. (2020) These states have some of the most drastic restrictions to combat the spread of coronavirus. CNN.
[41] Gershman, J. (2020) A Guide to State Coronavirus Reopenings and Lockdowns. The Wall Street Journal, May 21.
[42] Normile, D. (2020) Coronavirus cases have dropped sharply in south korea. what's the secret to its success? science.
[43] Thompson, D. (2020) What's Behind South Korea's COVID-19 Exceptionalism? The Atlantic.
[44] Romo, R. (2020) Politics and poverty hinder Covid-19 response in Latin America. CNN, May 29.
[45] Apple. (2020) Mobility trend report.
[46] Cao, Y, Li, S, Petzold, L, & Serban, R. (2003) Adjoint sensitivity analysis for differential-algebraic equations: The adjoint dae system and its numerical solution. SIAM journal on scientific computing 24, 1076-1089.
[47] Kingma, D. P & Ba, J. (2014) Adam: A method for stochastic optimization (arXiv preprint arXiv:1412.6980).