COVIDHunter: An Accurate, Flexible, and Environment-Aware Open-Source COVID-19 Outbreak Simulation Model

Abstract

Motivation: Early detection and isolation of COVID-19 patients are essential for successful implementation of mitigation strategies and eventually curbing the disease spread. With a limited number of daily COVID-19 tests performed in every country, simulating the COVID-19 spread along with the potential effect of each mitigation strategy currently remains one of the most effective ways of managing the healthcare system and guiding policy-makers. We introduce COVIDHunter, a flexible and accurate COVID-19 outbreak simulation model that evaluates the current mitigation measures that are applied to a region and provides suggestions on what strength the upcoming mitigation measure should be. The key idea of COVIDHunter is to quantify the spread of COVID-19 in a geographical region by simulating the average number of new infections caused by an infected person considering the effect of external factors, such as environmental conditions (e.g., climate, temperature, humidity) and mitigation measures.

Results: Using Switzerland as a case study, COVIDHunter estimates that the policy-makers need to keep the current mitigation measures for at least 30 days to prevent demand from quickly exceeding existing hospital capacity. Relaxing the mitigation measures by 50% for 30 days increases both the daily capacity need for hospital beds and the daily number of deaths exponentially by an average of 23.8×; the additional patients may occupy ICU beds and ventilators for a period of time. Unlike existing models, the COVIDHunter model accurately monitors and predicts the daily number of cases, hospitalizations, and deaths due to COVID-19. Our model is flexible to configure and simple to modify for modeling different scenarios under different environmental conditions and mitigation measures.

Availability: https://github.com/CMU-SAFARI/COVIDHunter


Bioinformatics
doi.10.1093/bioinformatics/xxxxxx
Advance Access Publication Date: Day Month Year
Manuscript Category

Subject Section


Mohammed Alser*, Jeremie S. Kim, Nour Almadhoun Alserr, Stefan W. Tell, and Onur Mutlu*
ETH Zurich, Zurich 8006, Switzerland
*To whom correspondence should be addressed.

Associate Editor: XXXXXXX

Received on XXXXX; revised on XXXXX; accepted on XXXXX


Contact: [email protected], [email protected]

Supplementary information: Supplementary data is available at Bioinformatics online.

Coronavirus disease 2019 (COVID-19) is caused by the SARS-CoV-2 virus, which was first detected in Wuhan, the capital city of Hubei Province in China, in early December 2019 (Du Toit, 2020). Since then, it has rapidly spread to nearly every corner of the globe and was declared a pandemic in March 2020 by the World Health Organization (WHO). As of January 2021, COVID-19 has resulted in more than 96 million laboratory-confirmed cases around the world, and has killed nearly 2.2% of the infected population. As there are currently no anti-SARS-CoV-2-specific drugs or effective vaccines widely available to everyone, early detection and isolation of COVID-19 patients remain essential for effectively curbing the disease spread. As a result, many countries across the world have implemented unprecedented lockdown and social distancing measures, affecting millions of people.

Regardless of the availability and affordability of COVID-19 testing, it is still extremely challenging to detect and isolate COVID-19 infections at early stages due to three key issues. 1) It is very difficult to accurately identify the initial contraction time of COVID-19 for a patient. This is because COVID-19 patients can develop symptoms between 2 and 14 days (or longer in a few cases) after exposure to the new coronavirus (Lauer et al., 2020; Li et al., 2020). This variable delay is referred to as the virus' incubation period. 2) The coronavirus genome can exhibit rapid genetic changes in its nucleotide sequence, which may occur during viral cell replication, within the host body, or during transmission between hosts (Andersen et al., 2020). This genetic diversity affects the virus virulence, infectivity, transmissibility, and evasion of the host immune responses (Phan, 2020; Pachetti et al., 2020; Toyoshima et al., 2020). 3) The situation becomes even worse as the coronavirus can survive, and therefore remain infectious, outside the host on common surfaces such as metal, glass, and banknotes (both paper and polymer) at room temperature for up to 28 days (Kampf et al., 2020; Riddell et al., 2020).

Simulating the spread of COVID-19 has the potential to mitigate the effects of the three key issues, help to better manage the healthcare system, and provide guidance to policy-makers on the effectiveness of various (current, planned, or discussed) social distancing and mitigation measures. To this end, many COVID-19 simulation models have been proposed (e.g., Tradigo et al., 2020; Russell et al., 2020; Ashcroft et al., 2020), some of which are announced to assist in decision-making for policy-makers in countries such as the United Kingdom (ICL (Flaxman et al., 2020)), the United States (IHME (Reiner et al., 2020)), and Switzerland (IBZ (Huisman et al., 2020)). These models tend to follow one of two key approaches. (1) Evaluating the current actual epidemiological situation by accounting for reporting delays and under-reporting due to inefficiencies such as a low number of COVID-19 tests. (2) Evaluating the current and future epidemiological situation by simulating the COVID-19 outbreak without relying on the observed (laboratory-confirmed) number of cases in simulation.

The first approach, taken by the IBZ (Huisman et al., 2020), LSHTM (Russell et al., 2020), and Ashcroft et al. (2020) models, is not primarily used for prediction purposes as it reflects the epidemiological situation with about two weeks of time delay (due to its dependence on observed COVID-19 reports). The IBZ model (Huisman et al., 2020) estimates the daily reproduction number, $R$, of SARS-CoV-2 from observed COVID-19 incidence time series data after accounting for reporting delays and under-reporting using the numbers of confirmed hospitalizations and deaths. The $R$ number describes how a pathogen spreads in a particular population by quantifying the average number of new infections caused by each infected person at a given point in time. The LSHTM model (Russell et al., 2020) adjusts the daily number of observed COVID-19 cases by accounting for under-reporting (uncertainty) using both deaths-to-cases ratio estimates and correcting for delays between case confirmation (i.e., laboratory-confirmed infection) and death.

The second approach, taken by the ICL (Flaxman et al., 2020) and IHME (Reiner et al., 2020) models, usually requires a large number of various input parameters and assumptions. The IHME model (Reiner et al., 2020) requires input parameters such as testing rates, mobility, social distancing policies, population density, altitude, smoking rates, self-reported contacts, and mask use. This model makes two key assumptions: 1) the infection fatality rate (IFR), which indicates the rate of people that die from the infection, is taken using data from the Diamond Princess Cruise ship and New Zealand, and 2) the decreasing fatality rate is reflective of increased testing rates (identifying higher rates of asymptomatic cases). The ICL model (Flaxman et al., 2020) requires input parameters such as the daily number of confirmed deaths, IFR, mobility rates from Google, age- and country-specific data on demographics, patterns of social contact, and hospital availability. This model makes three key assumptions: 1) age-specific IFRs observed in China and Europe are the same across every country, 2) the number of confirmed deaths is equal to the true number of COVID-19 deaths, and 3) the change in transmission rates is a function of average mobility trends.

To our knowledge, there is currently no model capable of accurately monitoring the current epidemiological situation and predicting future scenarios while considering a reasonably low number of parameters and accounting for the effects of environmental conditions, as we summarize in Table 1. The low number of parameters provides four key advantages: 1) allowing flexible (easy-to-adjust) configuration of the model input parameters for different scenarios and different geographical regions, 2) enabling short simulation execution time and simpler modeling, 3) enabling easy validation/correction of the model prediction outcomes by adjusting fewer variables, and 4) being extremely useful and powerful especially during the early stages of a pandemic, when many of the parameters are unknown. Simulation models need to consider the fact that environmental conditions (e.g., air temperature) affect pathogen infectivity (Fares, 2013; Kampf et al., 2020; Riddell et al., 2020; Xu et al., 2020); simulating this effect helps to provide accurate estimation of the epidemiological situation.

Our goal in this work is to develop such a COVID-19 outbreak simulation model. To this end, we introduce COVIDHunter, a simulation model that evaluates the current mitigation measures (i.e., non-pharmaceutical interventions or NPIs) that are applied to a region and provides insight into what strength the upcoming mitigation measure should be and for how long it should be applied, while considering the potential effect of environmental conditions. Our model accurately forecasts the numbers of infected and hospitalized patients, and deaths for a given day, as validated on historical COVID-19 data (after accounting for under-reporting). The key idea of COVIDHunter is to quantify the spread of COVID-19 in a geographical region by calculating the daily reproduction number, $R$, of COVID-19 and scaling the reproduction number based on changes in both mitigation measures and environmental conditions. The $R$ number changes during the course of the pandemic due to seasonal changes in the ability of the pathogen to establish an infection and due to mitigation measures that lead to a lower number of susceptible individuals. COVIDHunter simulates the entire population of a region and assigns each individual in the population to a stage of the COVID-19 infection (e.g., from being healthy to being short-term immune to COVID-19) based on the scaled $R$ number. Our model is flexible to configure and simple to modify for modeling different scenarios as it uses only three input parameters, two of which are time-varying parameters, to calculate the $R$ number. Whenever applicable, we compare the simulation output of our model to that of four state-of-the-art models currently used to inform policy-makers: IBZ (Huisman et al., 2020), LSHTM (Russell et al., 2020), ICL (Flaxman et al., 2020), and IHME (Reiner et al., 2020).

The contributions of this paper are as follows:

• We introduce COVIDHunter, a flexible and validated simulation model that evaluates the current and future epidemiological situation by simulating the COVID-19 outbreak. COVIDHunter accurately forecasts for a given day 1) the reproduction number, 2) the number of infected people, 3) the number of hospitalized people, 4) the number of deaths, and 5) the number of individuals at each stage of the COVID-19 infection. COVIDHunter evaluates the effect of different current and future mitigation measures on these five numbers.
• As a case study, we statistically analyze the relationship between temperature and the number of COVID-19 cases in Switzerland. We find that for each 1 °C rise in daytime temperature, there is a 3.67% decrease in the daily number of confirmed cases. We demonstrate how considering the effect of climate (e.g., daytime temperature) on COVID-19 spread significantly improves the prediction accuracy.
• Compared to the IBZ, LSHTM, ICL, and IHME models, COVIDHunter achieves more accurate estimation, provides no prediction delay, and provides ease of use and high flexibility due to the simple modeling approach that uses a small number of parameters.
• Using COVIDHunter, we demonstrate that the spread of COVID-19 in Switzerland is still active (i.e., $R$ > 1.0) and curbing this spread requires maintaining the same strength of the currently applied mitigation measures for at least another 30 days.
• We release the well-documented source code of COVIDHunter and show how easy it is to flexibly configure for any scenario and extend for measures and conditions other than those we account for.

The primary purpose of our COVIDHunter model is to monitor and predict the spread of COVID-19 in a flexibly-configurable and easy-to-use way, while accounting for changes in mitigation measures and environmental conditions over time. We employ a three-stage approach to develop and deploy this model. (1) The COVIDHunter model predicts the daily $R$ value based on only three input parameters to maintain both quick simulation and high flexibility in configuring these parameters. Each input parameter is configured based on either existing research findings or user-defined values. Our model allows for directly leveraging existing models that study the effect of only mitigation measures (or only environmental conditions) on the spread of COVID-19, as we show in Section 2.2. (2) The COVIDHunter model predicts the number of COVID-19 cases based on the predicted $R$ number. COVIDHunter simulates the entire population of a region and labels each individual according to different stages of the COVID-19 infection timeline. Each stage has a different degree of infectiousness and contagiousness. The model simulates these stages for each individual to maintain accurate predictions. (3) The COVIDHunter model predicts the number of hospitalizations and deaths based on both the predicted number of cases and the $R$ number. Next, we explain the COVIDHunter model in detail.

Table 1. Comparison to other models used to inform government policymakers, as of January 2021.

| Model | Open Source | Well-Documented | Accounting for Weather Changes | Low Number of Parameters | Reported COVID-19 Statistics |
|---|---|---|---|---|---|
| COVIDHunter (this work) | ✓ | ✓ | ✓ | ✓ | ✓ ($R$, cases, hospitalizations, and deaths) |
| IBZ (Huisman et al., 2020) | ✓ | ✗ | ✗ | ✓ | ✗ (only $R$) |
| LSHTM (Russell et al., 2020) | ✓ | ✗ | ✗ | ✓ | ✗ (only cases) |
| ICL (Flaxman et al., 2020) | ✓ | ✓ | ✗ | ✗ | ✓ ($R$, cases, hospitalizations, and deaths) |
| IHME (Reiner et al., 2020) | ✓* | ✗ | ✗ | ✗ | ✗ (cases, hospitalizations, and deaths) |

Based on each model's GitHub page (all models are available on GitHub).
* The available packages are configured only for the IHME infrastructure.

The COVIDHunter model predicts the dynamic value of $R$ for a population at a given day while considering three key factors: 1) the transmissibility of an infection into a susceptible host population, 2) mitigation measures (e.g., lockdown, social distancing, and isolating infected people), and 3) environmental conditions (e.g., air temperature). Our model calculates the time-varying $R$ number using Equation 1 as follows:

$$R(t) = R_0 \times (1 - M(t)) \times C_e(t) \tag{1}$$

The $R$ number for a given day, $t$, is calculated by multiplying three terms: 1) the base reproduction number ($R_0$) for the subject virus, 2) one minus the mitigation coefficient ($M$) for the given day $t$, and 3) the environmental coefficient ($C_e$) for the given day $t$.

The $R_0$ number quantifies the transmissibility of an infection into a susceptible host population by calculating the expected average number of new infections caused by an infected person in a population with no prior immunity to a specific virus (as a pandemic virus is by definition novel to all populations). Hence, the $R_0$ number represents the transmissibility of an infection at only the beginning of the outbreak assuming the population is not protected via vaccination.
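As a minimal sketch of Equation 1, the function below multiplies the three terms for a single day. The function name, argument names, and the input values are illustrative assumptions and are not taken from the COVIDHunter source code.

```python
def reproduction_number(r0: float, mitigation: float, env_coeff: float) -> float:
    """Equation 1 for one day t: R(t) = R0 * (1 - M(t)) * Ce(t)."""
    # M(t) is treated as a fraction of mitigation strength between 0 and 1.
    if not 0.0 <= mitigation <= 1.0:
        raise ValueError("mitigation coefficient M(t) must lie in [0, 1]")
    if env_coeff < 0.0:
        raise ValueError("environmental coefficient Ce(t) must be non-negative")
    return r0 * (1.0 - mitigation) * env_coeff


# Illustrative inputs only (assumed, not values from the paper's evaluation).
r_no_measures = reproduction_number(r0=2.5, mitigation=0.0, env_coeff=1.0)
r_half_measures = reproduction_number(r0=2.5, mitigation=0.5, env_coeff=1.0)
r_full_lockdown = reproduction_number(r0=2.5, mitigation=1.0, env_coeff=1.2)
```

With no mitigation and a neutral environmental coefficient the result equals the base reproduction number, and a mitigation coefficient of 1 drives the result to zero regardless of the environmental term.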
Unlike the $R$ number, the $R_0$ number is a fixed value and does not depend on time. The $R$ number is a time-dependent variable that accounts for the population's reduced susceptibility. The $R_0$ number for the COVID-19 virus can be obtained from several existing studies (such as Hilton and Keeling, 2020; Chang et al., 2020; Shi et al., 2020; de Souza et al., 2020; Rahman et al., 2020) that estimate it by modeling contact patterns during the first wave of the pandemic.

The mitigation coefficient ($M$) applied to the population is a time-dependent variable with a value between 0 and 1, where 1 represents the strongest mitigation measure and 0 represents no mitigation measure applied. In different countries, mitigation measures take different forms, such as social distancing, self-isolation, school closure, banning public events, and complete lockdown. These measures exhibit significant heterogeneity and differ in timing and intensity across countries (Hale et al., 2020; Davies et al., 2020). Quantifying the mitigation measures on a scale from 0 to 1 across different countries is challenging. The Oxford Stringency Index (Hale et al., 2020) maintains a twice-weekly-updated index that takes values from 0 to 100, representing the severity of nine mitigation measures that are applied by more than 160 countries. Another study (Brauner et al., 2020) estimates the effect of only seven mitigation measures on the $R$ number in 41 countries. We can directly leverage such studies for calculating the mitigation coefficient on a given day after changing the scale from 0:100 to 0:1 by dividing each value of, for example, the Oxford Stringency Index by 100.

The environmental coefficient ($C_e$) is a time-dependent variable representing the effect of external environmental factors on the spread of COVID-19, and it has a value between 0 and 2. Several related viral infections, such as the Influenza virus, human coronavirus, and human respiratory, already show notable seasonality (showing peak incidences during only the winter (or summer) months) (Moriyama et al., 2020; Fisman, 2012). The seasonal changes in temperature, humidity, and ultraviolet light affect the pathogen infectiousness outside the host (Fares, 2013; Kampf et al., 2020; Riddell et al., 2020; Xu et al., 2020). However, the indoor environmental conditions are usually well-controlled throughout the year, where human behavior and number of households can be the major contributors to the spread of COVID-19 (Moriyama et al., 2020). There are currently several studies that demonstrate the strong dependence of the transmission of the SARS-CoV-2 virus on one or more environmental conditions, even after controlling (isolating) the impact of mitigation measures and behavioral changes that reduce contacts. Several studies have demonstrated increased infectiousness by a country-dependent fixed rate with each 1 °C fall in daytime temperature (Xie and Zhu, 2020; Prata et al., 2020). Another study supports the same temperature-infectiousness relationship, but it also finds that before applying any mitigation measures, a one degree drop in relative humidity shows increased infectiousness by a rate lower (2.94× less) than that of temperature (Wang et al., 2020).

One of the most comprehensive studies, spanning more than 3700 locations around the world, is HARVARD CRW (Xu et al., 2020). It finds the statistical correlation between the relative changes in the $R$ number and both weather (temperature, ultraviolet index, humidity, air pressure, and precipitation) and air pollution (SO2 and ozone) after controlling for the impact of mitigation measures. The study provides a CRW Index that has a value from 0.5 to 1.5. The percentage difference between any two consecutive values provided by the CRW Index represents the effect that both weather and air pollutants have on the $R$ number.
For example, a drop in the CRW Index by 10% in a given location points to a 10% reduction in the $R$ number due to weather changes and air pollutants. Our model enables applying any of these studies by adjusting our environmental coefficient on a given day, as we experimentally demonstrate in Section 3. For example, if the COVIDHunter user chooses to consider the HARVARD CRW study, and the CRW Index shows a 10% drop compared to its immediately preceding data point, then the environmental coefficient of COVIDHunter should be 0.9 so that the $R$ value also decreases by 10%. Next, we explain how our model forecasts the number of COVID-19 cases based on Equation 1.

COVIDHunter tracks the number of infected and uninfected persons over time by clustering the population into four main categories: HEALTHY, INFECTED, CONTAGIOUS, and IMMUNE. The model initially considers the entire population as uninfected (i.e., HEALTHY). For each simulated day, the model calculates the $R$ value using Equation 1 and decides how many persons can be infected during that day. The day when the first case of infection in a population is introduced is defined by the user. For each newly infected person (INFECTED), the model maintains a counter that counts the number of days from being infected to being contagious (CONTAGIOUS). Several COVID-19 case studies show that presymptomatic transmission can occur 1–3 days before symptom onset (Wei et al., 2020; Slifka and Gao, 2020). COVID-19 patients can develop symptoms mostly after an incubation period of 1 to 14 days (the median incubation period is estimated to be 4.5 to 5.8 days) (Lauer et al., 2020; Li et al., 2020). We calculate the number of days of being contagious after being infected as a random number with a Gaussian distribution that has user-defined lowest and highest values. Each contagious person may infect $N$ other persons depending on mobility, population density, number of households, and several other factors (Ferguson et al., 2020). We calculate the value of $N$ to be a random number with a Gaussian distribution that has the lowest value of 0 and the highest value determined by the user. If $N$ is greater than the $R$ number (i.e., the target number of infections for that day has been reached), further infections are curtailed, preventing overestimation of $N$ by infecting only $R$ persons. Once the contagious person infects the desired number of susceptible persons, the status of the contagious person becomes immune (IMMUNE). The immune status indicates that the person has immunity to reinfection due to either vaccination or being recently infected (Lumley et al., 2020).

Our model also simulates the effect of infected travelers (e.g., daily cross-border commuters within the European Union) on the value of $R$. These travelers can initiate the infection(s) at the beginning of the pandemic. If such infected travelers are absent from the target population (due to, for example, an emergency lockdown), the virus would die out once the value of $R$ decreases below 1 for a sufficient period of time. Both the number and percentage of infected travelers entering a region are configurable in our model. The percentage of incoming infected travelers is not affected by the changes in the local mitigation measures, as these travelers were infected abroad.

Our model predicts the daily number of COVID-19 cases for a given day $t$ as follows:

$$\text{Daily\_Cases}(t) = \sum_{n=0}^{T_{INF}(t)} N(n) + \sum_{m=0}^{U_{CON}(t)} N(m) \tag{2}$$

where $T_{INF}$ is the daily number of infected travelers, which is a user-defined variable, $N()$ is a function that calculates the number of persons to be infected by a given person as a random number with a Gaussian distribution, and $U_{CON}$ is the daily number of contagious persons calculated by our model.
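The following sketch illustrates one possible reading of Equation 2: every infected traveler and every contagious person contributes a bounded Gaussian draw $N$, curtailed at $R(t)$. The text does not give the mean or standard deviation of the Gaussian, so the midpoint mean and the one-sixth-of-range standard deviation below are assumptions, as are all function names and input values; one draw per person is also assumed.

```python
import random


def draw_bounded_gaussian(low: float, high: float, rng: random.Random) -> float:
    """Draw from a Gaussian and clip the result to [low, high].

    Assumption: mean at the midpoint and standard deviation (high - low) / 6;
    the text only states that the distribution has a lowest and a highest value.
    """
    mean = (low + high) / 2.0
    std = (high - low) / 6.0
    return min(high, max(low, rng.gauss(mean, std)))


def infections_by_one_person(r_t: float, n_max: float, rng: random.Random) -> float:
    """N for one spreader, curtailed so that it never exceeds R(t)."""
    n = draw_bounded_gaussian(0.0, n_max, rng)
    return min(n, r_t)


def daily_cases(infected_travelers: int, contagious: int,
                r_t: float, n_max: float, rng: random.Random) -> float:
    """Equation 2: sum N over infected travelers plus local contagious persons."""
    spreaders = infected_travelers + contagious
    return sum(infections_by_one_person(r_t, n_max, rng) for _ in range(spreaders))


rng = random.Random(42)  # fixed seed so repeated runs give the same draws
cases = daily_cases(infected_travelers=15, contagious=200, r_t=1.1, n_max=25, rng=rng)
```

Because each contribution is capped at $R(t)$, the daily total in this sketch can never exceed the number of spreaders multiplied by $R(t)$.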

There are currently two key approaches for calculating the estimated number of both hospitalizations and deaths due to COVID-19: 1) using historical statistical probabilities, each of which is unique to each age group in a population (Bhatia and Klausner, 2020; Bi et al., 2020) and 2) using historical COVID-19 hospitalizations-to-cases and deaths-to-cases ratios (Kobayashi et al., 2020). We choose to follow a modified version of the second approach as it does not require 1) clustering the population into age groups and 2) calculating the risk of each individual using the given probability, both of which affect the complexity of the model and the simulation time.

The number of COVID-19 hospitalizations for a given day, $t$, can be calculated as follows:

$$\text{Daily\_Hospitalizations}(t) = \text{Daily\_Cases}(t) \times X \times C_X \tag{3}$$

where $\text{Daily\_Cases}(t)$ is calculated using Equation 2 and $X$ is the hospitalizations-to-cases ratio that is calculated as the average of daily ratios of the number of COVID-19 hospitalizations to the laboratory-confirmed number of COVID-19 cases. As the true number of cases is unknown due to lack of population-scale testing, it is extremely difficult to make accurate estimates of the true number of COVID-19 hospitalizations. As such, we assume a fixed multiplicative relationship between the number of laboratory-confirmed cases and the true number of cases. We use the user-defined correction coefficient, $C_X$, of the hospitalizations-to-cases ratio to account for such a multiplicative relationship.

The number of COVID-19 deaths for a given day $t$ can be calculated as follows:

$$\text{Daily\_Deaths}(t) = \text{Daily\_Cases}(t) \times Y \times C_Y \tag{4}$$

where $\text{Daily\_Cases}(t)$ is calculated using Equation 2 and $Y$ is the deaths-to-cases ratio, which is calculated as the average of daily ratios of the number of COVID-19 deaths to the number of COVID-19 laboratory-confirmed cases.
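Equations 3 and 4 can be sketched as follows, together with a helper that averages daily ratios in the way $X$ and $Y$ are defined. The function names and the three-day input series are made up for illustration, and skipping days with zero confirmed cases is an assumption that the text does not address.

```python
from typing import Sequence


def average_daily_ratio(outcomes: Sequence[float],
                        confirmed_cases: Sequence[float]) -> float:
    """Average of per-day outcome-to-case ratios (the definition of X and Y).

    Assumption: days with zero confirmed cases are skipped to avoid
    division by zero.
    """
    ratios = [o / c for o, c in zip(outcomes, confirmed_cases) if c > 0]
    if not ratios:
        raise ValueError("need at least one day with confirmed cases")
    return sum(ratios) / len(ratios)


def daily_hospitalizations(daily_cases: float, x: float, c_x: float = 1.0) -> float:
    """Equation 3: Daily_Cases(t) * X * C_X."""
    return daily_cases * x * c_x


def daily_deaths(daily_cases: float, y: float, c_y: float = 1.0) -> float:
    """Equation 4: Daily_Cases(t) * Y * C_Y."""
    return daily_cases * y * c_y


# Made-up three-day series, for illustration only.
x = average_daily_ratio(outcomes=[3, 4, 5], confirmed_cases=[100, 100, 100])
hospitalized = daily_hospitalizations(daily_cases=1000, x=x, c_x=1.0)
```

Keeping the correction coefficients as separate arguments mirrors the equations, where $C_X$ and $C_Y$ are user-defined and need not be equal.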
The observed number of COVID-19 deaths can still be less than the true number of COVID-19 deaths due to, for example, under-reporting. We use the user-defined correction coefficient, $C_Y$, to account for the under-reporting. One way to find the true number of COVID-19 deaths is to calculate the number of excess deaths. The number of excess deaths is the difference between the observed number of deaths during a time period and the expected (based on historical data) number of deaths during the same time period. For this reason, $C_Y$ may not necessarily be equal to $C_X$.

We can validate our model using two key approaches. 1) Comparing the daily $R$ number predicted by our model (using Equation 1) with the daily reported official $R$ number for the same region. 2) Comparing the daily number of COVID-19 cases predicted by our model (using Equation 2) with the daily number of laboratory-confirmed COVID-19 cases. As of January 2021, we have already witnessed one year of the pandemic, which provides us several observations and lessons. The most obvious source of uncertainty, affecting all models, is that the true number of persons that are previously infected or currently infected is unknown (Wilke and Bergstrom, 2020). This affects the accuracy of the reported $R$ number since it is calculated as, for example, the ratio of the number of cases for a week (7-day rolling average) to the number of cases for the preceding week. Adjusting the parameters of our model to fit the curve of the number of confirmed cases is therefore likely to be highly uncertain. The publicly-available numbers of COVID-19 hospitalizations and deaths can provide more reliable data.

For these reasons, we decide to use a combination of reported numbers of cases, hospitalizations, and deaths for validating our model using three key steps. 1) We leverage the more reliable data of the reported number of hospitalizations (or deaths) to estimate the true number of COVID-19 cases using the ratio of the number of laboratory-confirmed hospitalizations (or deaths) to the number of laboratory-confirmed cases during the second wave of the COVID-19 pandemic. We assume that the COVID-19 statistics during the second wave are more accurate than those during the first wave because generally more testing is performed in the second wave. 2) We consider a multiplicative relationship between the true number of COVID-19 cases and that estimated in step 1. In our experimental evaluation (Section 3), we use the true number of COVID-19 cases calculated using different multiplicative factor values (we refer to them as certainty rate levels) as a ground truth for validating our model. A certainty rate of, for example, 50% means that the true number of COVID-19 cases is actually double that calculated in step 1. 3) We use our model to calculate both the daily $R$ number (using Equation 1) and the number of COVID-19 cases (using Equation 2). We fix two terms of Equation 1, $R_0$ and $C_e$, using publicly-available data for a given region and change the third term, $M$, until we fit the curve of the number of cases predicted by our model to the ground-truth plot calculated in step 2. We use the same methodology to validate our predicted numbers of hospitalizations and deaths with different certainty rate levels, as we show in Section 3 and the Supplementary Excel File (https://github.com/CMU-SAFARI/COVIDHunter/blob/main/Evaluation_Results/SimulationResultsForSwitzerland.xlsx).

We specifically build the COVIDHunter model to be flexible to configure and easy to extend for representing any existing or future scenario using different values of the three terms of Equation 1, 1) $R_0$, 2) $M(t)$, and 3) $C_e(t)$, in addition to several other parameters such as the population, number of travelers, percentage of expected infected travelers to the total number of travelers, and hospitalizations- or deaths-to-cases ratios.
Our modeling approach acts across the overall population without assuming any specific age structure for transmission dynamics. It is still possible to consider each age group separately using individual runs of the COVIDHunter model simulation, each of which has its own parameter values adjusted for the target age group. The COVIDHunter model considers each location independently of other locations, but it also accounts for potential movement between locations by adjusting the corresponding parameters for travelers. By allowing most of the parameters to vary in time, t, the COVIDHunter model is capable of accounting for any change in transmission intensity due to changes in environmental conditions and mitigation measures over time. As we explain in Section 2.2, the flexibility of configuring the environmental coefficient and mitigation coefficient allows our proposed model to control for location-specific differences in population density, cultural practices, age distribution, and time-variant mitigation responses in each location. Our modeling approach considers a single strain of the COVID-19 virus by using a single base reproduction number, R0. It is possible to consider multiple virus strains by running the model simulation multiple times, each of which considers one of the strains individually. The model can also be extended to consider multiple virus strains by replacing the R0 number with multiple R0 numbers that represent the different strains (Reichmuth et al., 2021).

We evaluate the daily 1) R number, 2) mitigation measures, and 3) numbers of COVID-19 cases, hospitalizations, and deaths. We also evaluate the daily numbers of HEALTHY, INFECTED, CONTAGIOUS, and IMMUNE persons in the Supplementary Excel File (https://github.com/CMU-SAFARI/COVIDHunter/blob/main/Evaluation_Results/SimulationResultsForSwitzerland.xlsx). We compare the predicted values to their corresponding observed values whenever possible. We provide a comprehensive treatment of all datasets, models, and evaluation results with different model configurations in the Supplementary Materials and the Supplementary Excel Files.

We use Switzerland as a use case for all the experiments. However, our model is not limited to any specific region, as the parameters it uses are completely configurable. To predict the R number, we use Equation 1, which requires three key variables. We set the base reproduction number, R0, for SARS-CoV-2 in Switzerland to 2.7, as shown by Hilton and Keeling (2020). We choose two main approaches for setting the value of the time-varying environmental coefficient variable (Ce). 1) Performing statistical analysis of the relationship between the daily number of COVID-19 cases and the average daytime temperature in Switzerland. As we provide in the Supplementary Materials, Section 1, our statistical analysis shows that each 1 °C rise in daytime temperature is associated with a 3.67% (t-value = -3.244 and p-value = 0.0013) decrease in the daily number of confirmed COVID-19 cases. We refer to this approach as Cases-Temperature Coefficient (CTC). 2) Applying the HARVARD CRW (Xu et al., 2020) (CRW in short), which provides the statistical relationship between the relative changes in the R number and both weather factors and air pollutants after controlling for the impact of mitigation measures. We change the value of the daily mitigation coefficient, M(t), based on the ratio of the number of confirmed hospitalizations to the number of confirmed cases with two certainty rate levels of 100% and 50%, as we explain in detail in Section 2.5. This helps us to take into account uncertainty in the observed number of COVID-19 cases, hospitalizations, and deaths.
We set the minimum and maximum incubation time for SARS-CoV-2 to 1 and 5 days, respectively, as a 5-day period represents the median incubation period worldwide (Lauer et al., 2020; Li et al., 2020). We set the population to 8654622. We empirically choose the values of N, the number of travelers, and the ratio of the number of infected travelers to the total number of travelers to be 25, 100, and 15%, respectively.

As the exact true number of COVID-19 cases remains unknown (due to, for example, the lack of population-scale COVID-19 testing), we expect the true number of COVID-19 cases in Switzerland to be higher than the observed (laboratory-confirmed) number of cases. We calculate the expected true number of cases based on both the number of deaths and the number of hospitalizations, as we explain in Section 2.5. To account for possibly missing COVID-19 deaths, we consider the excess deaths instead of the observed deaths. We calculate the excess deaths as the difference between the observed weekly number of deaths in 2020 and the 5-year average of weekly deaths. We find X (hospitalizations-to-cases ratio) and Y (deaths-to-cases ratio, using excess death data) to be 3.526% and 2.441%, respectively, during the second wave of the pandemic in Switzerland (see https://github.com/CMU-SAFARI/COVIDHunter/blob/main/Evaluation_Results/). We choose the second wave to calculate the values of X and Y because Switzerland has increased the daily number of COVID-19 tests by about 5.3× (21641/4074) on average compared to the first wave.
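The excess-death calculation and the testing ratio mentioned above can be sketched in a few lines of Python. This is only an illustration: the function name `weekly_excess_deaths` and the weekly death counts are hypothetical placeholders rather than Swiss data, while 21641 and 4074 are the two testing figures quoted in the text.

```python
def weekly_excess_deaths(observed_weekly, five_year_average):
    """Excess deaths per week: observed deaths minus the 5-year average.

    Negative values are kept as they are; whether they should be clipped
    to zero is not stated in the text, so this sketch does not clip.
    """
    if len(observed_weekly) != len(five_year_average):
        raise ValueError("both series must cover the same weeks")
    return [obs - avg for obs, avg in zip(observed_weekly, five_year_average)]


# Hypothetical weekly counts, for illustration only.
observed = [1500, 1650, 1800]
average = [1300, 1320, 1310]
excess = weekly_excess_deaths(observed, average)

# Ratio of the two daily testing figures quoted in the text.
testing_increase = 21641 / 4074
print(excess, round(testing_increase, 2))
```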
We calculate the expected number of cases on a given day t with certainty rate levels of 100% and 50% based on hospitalizations by dividing the number of hospitalizations at t by X and X/2, respectively, as we show in Figure 1. We apply the same approach to calculate the expected number of cases on a given day t with certainty rate levels of 100% and 50% based on deaths using Y and Y/2, respectively.

Based on Figure 1, we make two key observations. 1) The plot for the expected number of cases calculated based on the number of deaths is shifted forward by 10-20 days (15 days on average) from that for the expected number of cases calculated based on the number of hospitalizations. This is because each hospitalized patient usually spends some number of days in hospital before dying of COVID-19. We do not observe a significant time shift between the plot of the expected number of cases calculated based on the number of hospitalizations and the plot of observed (laboratory-confirmed) cases. 2) The expected number of cases calculated based on the number of hospitalizations is on average . × higher than the expected number of cases calculated based on the number of deaths (after accounting for the 15-day shift) for the same certainty rate. This is expected, as not all hospitalized patients die.

We conclude that both the number of hospitalizations and the number of deaths can be used for estimating the true number of COVID-19 cases after accounting for the time-shift effect.

[Figure 1 plot. Legend: Expected cases based on hospitalizations (100%); Expected cases based on hospitalizations (50%); Expected cases based on deaths (100%); Expected cases based on deaths (50%); Observed cases. x-axis: Date.]

Fig. 1.

Observed (officially reported) and expected number of COVID-19 cases in Switzerland during the year of 2020. We calculate the expected number of cases based on both the hospitalizations-to-cases and deaths-to-cases ratios for the second wave. We assume two certainty rate levels of 50% and 100%.

R number of SARS-CoV-2

We calculate the predicted R number using our model (Equation 1) and compare it to the observed official R number and the R number of two state-of-the-art models, ICL and IBZ, for the two years of 2020 and 2021. We configure COVIDHunter using the following configurations: 1) CTC as the environmental condition approach, 2) certainty rate levels of 50% and 100%, and 3) a mitigation coefficient value of 0.7. All our scripts are provided on our GitHub page. We consider the mean R number provided by the ICL model. We consider the median R number calculated by the IBZ model based on the observed number of hospitalized patients. IBZ provides the predicted (after mid-December 2020) R number as the mean of the estimates from the last 7 days.

Based on Figure 2, we make three key observations. 1) COVIDHunter predicts the changes in the R number 4-13 days earlier than the ICL model does, which leads to a more accurate prediction. The R number predicted by COVIDHunter (with a certainty rate level of 50%) is on average 1.56× lower than that predicted by the ICL model, the IBZ model, and the observed official R number. Using a certainty rate level of 100%, COVIDHunter predicts the R number to be close in value to the observed R number. 2) Our model predicts that the current R number is still higher than 1 (1.137 and 1.023 using certainty rate levels of 50% and 100%, respectively) during January 2021. This indicates that the spread of the SARS-CoV-2 virus is still active and that it causes an exponential increase in the number of new cases.
3) Our model predicts that if we keep the same mitigation measure strength as that of January 2021 for at least 30 days (M(t) = 0.7), then the R number would drop by 18.2% (R = 0.929 and 0.836 for certainty rate levels of 50% and 100%, respectively). However, if the mitigation measures that are applied nationwide in Switzerland are relaxed by 50% (M(t) = 0.35) for only 30 days (22 January to 22 February 2021), then the R number increases by at least 2.17×.

We conclude that COVIDHunter's estimation of the R number is more accurate than that calculated by the ICL and IBZ models, as validated by the currently observed R number.
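The 2.17× figure can be reproduced by hand under one assumption that this section does not state explicitly: that Equation 1 scales the R number linearly with (1 − M(t)) when R0 and Ce(t) are held fixed. The short Python sketch below checks the arithmetic under that assumption; the helper name `relative_r_change` is hypothetical.

```python
def relative_r_change(m_old, m_new):
    """Ratio R_new / R_old when only the mitigation coefficient changes.

    Assumes R is proportional to (1 - M(t)), with the base reproduction
    number and the environmental coefficient unchanged.
    """
    if not (0 <= m_old < 1 and 0 <= m_new <= 1):
        raise ValueError("expected 0 <= m_old < 1 and 0 <= m_new <= 1")
    return (1.0 - m_new) / (1.0 - m_old)


# Relaxing the mitigation coefficient from 0.7 to 0.35 (a 50% relaxation).
ratio = relative_r_change(0.7, 0.35)
print(round(ratio, 2))
```

Under this assumption the ratio is 0.65 / 0.30, which rounds to the 2.17× quoted above.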

[Figure 2 plot. Legend: Observed R; ICL; CTC_50%_M(t)=0.35; CTC_50%_M(t)=0.7; CTC_100%_M(t)=0.7; IBZ. y-axis: Reproduction Number, R(t); x-axis: Date. Plot regions: Monitoring, Predicting.]

Fig. 2.

Observed and predicted reproduction number, R(t), for the two years of 2020 and 2021. We use the CTC environmental condition approach, certainty rate levels of 50% and 100%, and mitigation coefficient values of 0.35 and 0.7 for COVIDHunter. We compare COVIDHunter's predicted R number to the observed R number and two state-of-the-art models, ICL and IBZ. The horizontal dashed line represents R(t) = 1.0.

We evaluate the mitigation coefficient, M(t), which represents the mitigation measures applied (or to be applied) in Switzerland from January 2020 to May 2021. We use two different environmental condition approaches, CRW and CTC. We assume two certainty rate levels of 50% and 100% to account for uncertainty in the observed number of cases. We use five mitigation coefficient, M(t), values of 0.35, 0.4, 0.5, 0.6, and 0.7 for each configuration of COVIDHunter during 22 January to 22 February 2021. We compare the evaluated mitigation measures to those evaluated by the Oxford Stringency Index (Hale et al., 2020), as we provide in Figure 3. We also evaluate the mitigation coefficient when we ignore the effect of environmental changes (i.e., by setting Ce = 1 in Equation 1), while maintaining the same number of COVID-19 cases as that provided with a certainty rate level of 50%.

Based on Figure 3, we make four key observations. 1) Excluding the effect of environmental changes from the COVIDHunter model, by setting Ce = 1 in Equation 1, leads to an inaccurate evaluation of the mitigation measures. For example, during the summer of 2020 (between the two major waves of 2020), COVIDHunter (WithoutCTC_50%) evaluates the mitigation coefficient to be as high as 0.6. This means that the mitigation measures (only mandatory mask wearing on public transport) applied during the summer of 2020 are only 14% more relaxed compared to the mitigation measures (e.g., closure of schools, restaurants, and borders, ban on small and large events) applied during the first wave, which is implausible. This highlights the importance of considering the effect of external environmental changes when simulating the spread of COVID-19. Unfortunately, environmental change effects are not considered by any of the IBZ, LSHTM, ICL, and IHME models, which we believe is a serious shortcoming of these prior models. 2) A drop of 3% (as we observe in mid-November 2020) to 30% (as we observe at the end of August 2020) in the strength of the mitigation measures for a certain period of time (10 to 20 days) is enough to double the predicted number of COVID-19 cases. 3) We evaluate the strength of the mitigation measures applied in Switzerland to be usually (65% of the time) up to 80% to 131% higher than that provided by the Oxford Stringency Index. 4) The strength of the mitigation measures changed 11 times during the year of 2020, and each strength level is maintained for at least 9 days and at most 66 days (32 days on average).

We conclude that considering the effect of environmental changes (e.g., daytime temperature) on the spread of COVID-19 improves simulation outcomes and provides an accurate evaluation of the strength of the past and current mitigation measures.

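[Figure 3 plot. Legend: Oxford Stringency Index; CTC_50%; CTC_100%; CRW_50%; CRW_100%; WithoutCTC_50%. y-axis: Mitigation Coefficient M(t); x-axis: Date. Plot region: Monitoring.]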

Fig. 3.

Predicted strength of the mitigation measures (mitigation coefficient, M(t)) applied in Switzerland from January 2020 to May 2021 provided by the Oxford Stringency Index and COVIDHunter. We use two different environmental condition approaches, CRW and CTC. We assume two certainty rate levels of 50% and 100%. We use five mitigation M(t) values of 0.35, 0.4, 0.5, 0.6, and 0.7 for each configuration of our model during 22 January to 22 February 2021. The plot called WithoutCTC_50% represents the evaluation of the current mitigation measures while ignoring the effect of environmental changes.

We evaluate COVIDHunter's predicted daily number of COVID-19 cases in Switzerland. We compare the numbers predicted by our model to the observed numbers and those provided by three state-of-the-art models (ICL, IHME, and LSHTM), as shown in Figure 4. We calculate the observed number of cases as the expected number of cases with a certainty rate level of 100% (as we discuss in Section 3.2). We use three default configurations for the prediction of the ICL model: 1) strengthening mitigation measures by 50%, 2) maintaining the same mitigation measures, and 3) relaxing mitigation measures by 50%, which we refer to as ICL+50%, ICL, and ICL-50%, respectively, in Figures 4, 5, and 6. We use the mean numbers reported by the IHME model that represent the most relaxed mitigation measures, called "no vaccine" by the IHME model. We use the median numbers reported by the LSHTM model.

Based on Figure 4, we make four key observations. 1) Our model predicts that the number of COVID-19 cases reduces significantly (to less than 600 daily cases) within March 2021 if the same strength of the currently applied mitigation measure is maintained for at least 30 days. If the authority decides to relax the mitigation measures to the lowest strength that has been applied during the year of 2020 (i.e., M(t) = 0.35), then the daily expected number of cases increases by an average of . × and . × (up to 288,827 daily cases) using the CRW and CTC environmental approaches, respectively. We provide a comprehensive evaluation of the effect of different mitigation coefficient values on the number of cases in the Supplementary Materials, Section 2. 2) COVIDHunter predicts the number of COVID-19 cases to be equivalent to that predicted by the IHME model during the second wave with a certainty rate level of 50%. However, during the first wave, the predictions of the IHME model match the expected number of cases using a certainty rate level of 100%. This means that, unlike our model, the IHME model treats the laboratory-confirmed cases as if the tests were done at a population scale during the first wave, which is very likely incorrect. This is in line with a recent study (Ioannidis et al., 2020) that demonstrates the high inaccuracy of the IHME model. 3) Overall, our model predicts on average a . × and . × smaller number of COVID-19 cases than that predicted by the ICL model using the CTC and CRW approaches, respectively, and a certainty rate of 50%.
This suggests that the multiplicative relationship between the confirmed number of cases and the true number of cases can be represented by a certainty rate of 22% to 33%, which our model can easily account for. The ICL model also shows that there is a sharp drop in the daily number of cases after 13 November 2020, which corresponds to a 1.6×, 1.4×, and 1.3× increase in the Oxford Stringency Index, CRW coefficient, and CTC coefficient, respectively, applied on 30 October 2020, as we show in Figure 3. 4) The number of COVID-19 cases estimated by the LSHTM model during the first wave is (a) on average 24% less than that estimated by COVIDHunter and (b) 10 days late compared to that predicted by COVIDHunter, IHME, and ICL. The prediction of the LSHTM model during the second wave is not available in the model's pre-computed projections.

We conclude that COVIDHunter provides a more accurate estimation of the number of COVID-19 cases compared to IHME (which provides an inaccurate estimation during the first wave) and ICL (which provides an overestimation), with complete control over the certainty rate level, mitigation measures, and environmental conditions. Unlike LSHTM, COVIDHunter also ensures no prediction delay.

[Figure 4 plot. Legend: Expected Cases; CRW_50%_M(t)=0.7; CRW_50%_M(t)=0.35; CTC_50%_M(t)=0.7; CTC_50%_M(t)=0.35; CTC_100%_M(t)=0.7; IHME; LSHTM; ICL; ICL+50%; ICL-50%. y-axis: Number of COVID-19 Cases; x-axis: Date.]

Fig. 4.

Observed and predicted number of COVID-19 cases by our model and three other state-of-the-art models. We use two different environmental condition approaches, CRW and CTC, with two certainty rate levels of 50% and 100%. We use two mitigation coefficient, M(t), values of 0.35 and 0.7 for each configuration of our model during 22 January to 22 February 2021.

We evaluate COVIDHunter's predicted daily number of COVID-19 hospitalizations in Figure 5. We use the observed official number of hospitalizations as is. Using the number of cases calculated with Equation 2, we find X (hospitalizations-to-cases ratio) to be 4.288% and 2.780%, using CRW and CTC, respectively, during the second wave.

We make five key observations based on Figure 5. 1) The number of hospitalizations calculated by COVIDHunter with a certainty rate level of 50% matches that calculated by the IHME model. However, the IHME model provides its prediction 10-12 days later than COVIDHunter and the ICL model. 2) The ICL model predicts the number of hospitalizations to be × and × higher than that predicted by COVIDHunter during the first wave ( . × and . × during the second wave), using the CTC and CRW approaches, respectively, for evaluating the environmental conditions and a certainty rate of 50%. This suggests that the ICL model provides a × and . × higher number of hospitalizations compared to the observed number of hospitalizations, during the first and second waves, respectively, which is highly unlikely and overestimated. 3) COVIDHunter with a certainty rate level of 100% predicts a number of hospitalizations that closely fits the curve of the observed number of hospitalizations, reaching up to 257 hospitalized patients a day. 4) Our model predicts that the number of COVID-19 hospitalizations reduces with stricter mitigation measures maintained for at least 30 days. Relaxing the mitigation measures by 50% (M is changed from 0.7 to 0.35) exponentially increases the number of hospitalizations by an average of . × and . ×, reaching up to 12385 new daily hospitalized patients, as predicted by COVIDHunter using the CRW and CTC environmental condition approaches, respectively. This is in line with what the ICL model (ICL-50%) predicts when it is configured for a 50% relaxation in the mitigation measures. 5) The use of the CTC approach for determining the environmental coefficient value yields a slightly different number of hospitalizations compared to that provided by the use of the CRW approach. This is expected, as the CTC approach considers only the monthly average change in temperature, whereas the CRW approach considers the daily change in several environmental conditions.

We conclude that 1) unlike the IBZ and LSHTM models, COVIDHunter is able to predict the number of hospitalizations and 2) COVIDHunter provides a more accurate estimation of the number of hospitalizations compared to that calculated by ICL (which provides an overestimation) and IHME (which provides a late estimation). COVIDHunter predicts the number of COVID-19 hospitalizations in a simple, convenient, and flexible way that requires calculating only the daily number of cases and the hospitalizations-to-cases ratio, C_X.
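The statement that hospitalizations follow from the daily number of cases and the hospitalizations-to-cases ratio can be illustrated with a minimal Python sketch. It assumes a plain per-day multiplication with a constant ratio and no delay between infection and hospitalization; the function name `predict_hospitalizations` and the daily case counts are hypothetical, while 2.780% is the second-wave CTC ratio quoted above.

```python
def predict_hospitalizations(daily_cases, ratio_x):
    """Daily hospitalizations as daily cases times a constant ratio."""
    if ratio_x < 0:
        raise ValueError("ratio_x must be non-negative")
    return [cases * ratio_x for cases in daily_cases]


X_CTC = 0.02780  # second-wave hospitalizations-to-cases ratio (CTC)
daily_cases = [1000, 2000, 4000]  # hypothetical predicted daily cases
hospitalizations = predict_hospitalizations(daily_cases, X_CTC)
print([round(h, 1) for h in hospitalizations])
```

Using the CRW ratio (4.288%) instead only requires passing a different `ratio_x`.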

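[Figure 5 plot. Legend: Observed Hospitalizations; CRW_50%_M(t)=0.7; CRW_50%_M(t)=0.35; CTC_50%_M(t)=0.7; CTC_50%_M(t)=0.35; CTC_100%_M(t)=0.7; IHME; ICL; ICL+50%; ICL-50%. y-axis: Number of COVID-19 Hospitalizations; x-axis: Date. Plot regions: Monitoring, Predicting.]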

Fig. 5.

Observed and predicted number of COVID-19 hospitalizations. We use two different environmental condition approaches, CRW and CTC, with two certainty rate levels of 50% and 100%. We use two mitigation coefficient values, M(t), of 0.35 and 0.7 for each configuration of our model during 22 January to 22 February 2021.

We evaluate COVIDHunter's predicted daily number of COVID-19 deaths in Figure 6 after accounting for the 15-day shift (as we discuss in Section 3.2). We calculate the observed number of deaths as the number of excess deaths (Section 2.4) to account for uncertainty in reporting COVID-19 deaths. Using the number of cases calculated using Equation 2, we find Y (deaths-to-cases ratio, using excess death data) to be 2.730% and 1.739%, using CRW and CTC, respectively, during the second wave.

We make three key observations based on Figure 6. 1) COVIDHunter with a certainty rate of 100% predicts a number of deaths that closely fits the three curves of the observed number of excess deaths, ICL deaths, and IHME deaths, reaching up to 160 deaths a day. During the second wave, the ICL curve is shifted (late prediction) by 5-10 days from that of the other models. 2) Similar to what we observe for the number of hospitalizations, our model predicts that the number of COVID-19 deaths significantly reduces with stricter mitigation measures maintained for at least the upcoming 30 days. Relaxing the mitigation measures by 50% (M(t) is changed from 0.7 to 0.35) exponentially increases the death toll by an average of . × and . ×, reaching up to 7885 new daily deaths, as predicted by COVIDHunter using the CRW and CTC environmental condition approaches, respectively. 3) During the first wave, the use of a certainty rate of 50% provides a . × and . × ( . × and . × during the second wave) higher number of deaths compared to that provided by the ICL and IHME models, when COVIDHunter uses the CRW and CTC environmental condition approaches, respectively.

We conclude that 1) unlike the IBZ and LSHTM models, COVIDHunter is able to predict the number of deaths, 2) COVIDHunter predicts the number of deaths to be similar to that predicted by the ICL and IHME models, yet it provides a more accurate estimation of other COVID-19 statistics (R, number of cases, and number of hospitalizations) compared to ICL and IHME, as we comprehensively evaluate in the previous sections, and 3) COVIDHunter requires calculating only the daily number of cases and the deaths-to-cases ratio, C_Y, to predict the daily number of deaths.
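A hedged Python sketch of the death calculation follows. It assumes that deaths on a given day equal the cases from 15 days earlier multiplied by the deaths-to-cases ratio; the text only says that the 15-day shift is accounted for, so the exact way the shift is applied here is an assumption. The function name `predict_deaths` and the constant case series are hypothetical, while 1.739% is the second-wave CTC ratio quoted above.

```python
def predict_deaths(daily_cases, ratio_y, shift_days=15):
    """Daily deaths as shifted daily cases times a constant ratio.

    Days before `shift_days` have no earlier case data in this sketch,
    so they are reported as 0.0.
    """
    if shift_days < 0:
        raise ValueError("shift_days must be non-negative")
    deaths = [0.0] * len(daily_cases)
    for day in range(shift_days, len(daily_cases)):
        deaths[day] = daily_cases[day - shift_days] * ratio_y
    return deaths


Y_CTC = 0.01739  # second-wave deaths-to-cases ratio (CTC, excess deaths)
daily_cases = [1000] * 20  # hypothetical predicted daily cases
deaths = predict_deaths(daily_cases, Y_CTC)
print(deaths[14], round(deaths[15], 2))
```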

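[Figure 6 plot. Legend: Observed Excess Deaths; CRW_50%_M(t)=0.7; CRW_50%_M(t)=0.35; CTC_50%_M(t)=0.7; CTC_50%_M(t)=0.35; CTC_100%_M(t)=0.7; IHME; ICL; ICL+50%; ICL-50%. y-axis: Number of COVID-19 Deaths; x-axis: Date. Plot regions: Monitoring, Predicting.]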

Fig. 6.

Observed and predicted number of COVID-19 deaths. We use two different environmental condition approaches, CRW and CTC, with two certainty rate levels of 50% and 100%. We use two mitigation coefficient values, M(t), of 0.35 and 0.7 for each configuration of our model during 22 January to 22 February 2021.

We demonstrate that we can monitor and predict the spread of COVID-19 in an easy-to-use, flexible, and validated way using our new simulation model, COVIDHunter. We show how to flexibly configure our model for any scenario and easily extend it for different mitigation measures and environmental conditions. The use of a small number of variables in our model enables a simple and flexible yet powerful way of adapting our model to different conditions for a given region. We demonstrate the importance of considering the effect of environmental changes on the spread of COVID-19 and how doing so can greatly improve simulation accuracy. COVIDHunter flexibly offers the ability to directly make the best use of existing models that study the effect of one or both of environmental conditions and mitigation measures on the spread of COVID-19.

We benchmark our model against major alternative models of the COVID-19 pandemic that are used to assist governments. Compared to these models, COVIDHunter achieves more accurate estimation, provides no prediction delay, and provides ease of use and high flexibility due to the simple modeling approach that uses a small number of parameters.

Using COVIDHunter, we demonstrate that the spread of COVID-19 in Switzerland (as a case study) is still active (i.e., R > 1.0) and that curbing this spread requires maintaining the same strength of the currently applied mitigation measures for at least another 30 days. Using COVIDHunter (CTC_100%_M(t)=0.7) on 7 January 2021, we predicted that on 27 January 2021 the number of cases, hospitalizations, and deaths would drop by 19%, 20%, and 30%, respectively. The predicted drop is in line with the observed official number of cases, hospitalizations, and deaths (as shown by the Federal Office of Public Health in Switzerland) but with different ratios (41%, 59%, and 49%, respectively).
We believe the difference between the observed and COVIDHunter's predicted numbers of cases, hospitalizations, and deaths is due to one or more of the following reasons: 1) the lack of population-scale COVID-19 testing, 2) the use of a stricter mitigation measure than M(t) = 0.7, and 3) the lack of information about the ground truth on the number of COVID-19 cases, hospitalizations, and deaths. We provide insights on the effect of each change in the strength of the applied mitigation measure on the number of daily cases, hospitalizations, and deaths. We make all the data, statistical analyses, and a well-documented model implementation publicly and freely available to enable full reproducibility and to help society and decision-makers to accurately and openly review the current situation and estimate the future impact of decisions.

We suggest and plan at least five main directions/additions to further improve the predictive power and benefits of our COVIDHunter model. 1) Clustering the population based on age groups. This potentially has different effects on, for example, population, environmental conditions, and mitigation measures (Bhatia and Klausner, 2020; Bi et al., 2020). 2) Considering vaccinated persons as another new category of persons in a population. 3) Considering reinfection after immunity (Lumley et al., 2020). 4) Considering the average number of households (or population density), as well as other potential population-level effects, while calculating the number of new infected persons caused by an infected person. 5) Considering different strains of the COVID-19 virus by allowing for multiple base reproduction numbers. Our goal is to update COVIDHunter with such improvements and capabilities while keeping the simplicity, ease of use, and flexibility of its modeling strategy.

References

Andersen, K. G., Rambaut, A., Lipkin, W. I., et al. (2020). The proximal origin of SARS-CoV-2. Nature medicine, (4), 450–452.
Ashcroft, P., Huisman, J. S., Lehtinen, S., et al. (2020). COVID-19 infectivity profile correction. Swiss Medical Weekly, (3132).
Bhatia, R. and Klausner, J. (2020). Estimating individual risks of COVID-19-associated hospitalization and death using publicly available data. PloS one, (12), e0243026.
Bi, Q., Wu, Y., Mei, S., et al. (2020). Epidemiology and transmission of COVID-19 in 391 cases and 1286 of their close contacts in Shenzhen, China: a retrospective cohort study. The Lancet Infectious Diseases, (8), 911–919.
Brauner, J. M., Mindermann, S., Sharma, M., et al. (2020). Inferring the effectiveness of government interventions against COVID-19. Science.
Chang, S. L., Harding, N., Zachreson, C., et al. (2020). Modelling transmission and control of the COVID-19 pandemic in Australia. Nature Communications.
Davies, N. G., Kucharski, A. J., Eggo, R. M., et al. (2020). Effects of non-pharmaceutical interventions on COVID-19 cases, deaths, and demand for hospital services in the UK: a modelling study. The Lancet Public Health.
de Souza, W. M., Buss, L. F., da Silva Candido, D., et al. (2020). Epidemiological and clinical characteristics of the early phase of the COVID-19 epidemic in Brazil. Nature Human Behaviour, , 856–865.
Du Toit, A. (2020). Outbreak of a novel coronavirus. Nature Reviews Microbiology, (3), 123–123.
Fares, A. (2013). Factors influencing the seasonal patterns of infectious diseases. International journal of preventive medicine, (2), 128.
Ferguson, N., Laydon, D., Nedjati-Gilani, G., et al. (2020). Report 9: Impact of non-pharmaceutical interventions (NPIs) to reduce COVID19 mortality and healthcare demand. Imperial College London, , 77482.
Fisman, D. (2012). Seasonality of viral infections: mechanisms and unknowns. Clinical Microbiology and Infection, (10), 946–954.
Flaxman, S., Mishra, S., Gandy, A., et al. (2020). Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe. Nature, (7820), 257–261.
Hale, T., Petherick, A., Phillips, T., and Webster, S. (2020). Variation in government responses to COVID-19. Blavatnik school of government working paper, .
Hilton, J. and Keeling, M. J. (2020). Estimation of country-level basic reproductive ratios for novel Coronavirus (SARS-CoV-2/COVID-19) using synthetic contact matrices. PLOS Computational Biology, (7), 1–10.
Huisman, J. S., Scire, J., Angst, D. C., et al. (2020). Estimation and worldwide monitoring of the effective reproductive number of SARS-CoV-2. medRxiv.
Ioannidis, J. P., Cripps, S., and Tanner, M. A. (2020). Forecasting for COVID-19 has failed. International journal of forecasting.
Kampf, G., Todt, D., Pfaender, S., and Steinmann, E. (2020). Persistence of coronaviruses on inanimate surfaces and their inactivation with biocidal agents. Journal of Hospital Infection, (3), 246–251.
Kobayashi, T., Jung, S.-m., Linton, N. M., et al. (2020). Communicating the Risk of Death from Novel Coronavirus Disease (COVID-19). Journal of Clinical Medicine, (2).
Lauer, S. A., Grantz, K. H., Bi, Q., et al. (2020). The incubation period of coronavirus disease 2019 (COVID-19) from publicly reported confirmed cases: estimation and application. Annals of internal medicine, (9), 577–582.
Li, Q., Guan, X., Wu, P., et al. (2020). Early transmission dynamics in Wuhan, China, of novel coronavirus–infected pneumonia. New England Journal of Medicine.
Lumley, S. F., O'Donnell, D., Stoesser, N. E., et al. (2020). Antibody status and incidence of SARS-CoV-2 infection in health care workers. New England Journal of Medicine.
Moriyama, M., Hugentobler, W. J., and Iwasaki, A. (2020). Seasonality of respiratory viral infections. Annual review of virology, .
Pachetti, M., Marini, B., Benedetti, F., et al. (2020). Emerging SARS-CoV-2 mutation hot spots include a novel RNA-dependent-RNA polymerase variant. Journal of Translational Medicine, , 1–9.
Phan, T. (2020). Genetic diversity and evolution of SARS-CoV-2. Infection, genetics and evolution, , 104260.
Prata, D. N., Rodrigues, W., and Bermejo, P. H. (2020). Temperature significantly changes COVID-19 transmission in (sub) tropical cities of Brazil. Science of the Total Environment, page 138862.
Rahman, B., Sadraddin, E., and Porreca, A. (2020). The basic reproduction number of SARS-CoV-2 in Wuhan is about to die out, how about the rest of the World? Reviews in Medical Virology, page e2111.
Reichmuth, M., Hodcroft, E., Riou, J., et al. (2021). Transmission of SARS-CoV-2 variants in Switzerland. https://ispmbern.github.io/covid-19/variants/index.pdf. Accessed: 2021-1-30.
Reiner, R. C., Barber, R. M., Collins, J. K., et al. (2020). Modeling COVID-19 scenarios for the United States. Nature Medicine.
Riddell, S., Goldie, S., Hill, A., et al. (2020). The effect of temperature on persistence of SARS-CoV-2 on common surfaces. Virology journal, (1), 1–7.
Russell, T. W., Golding, N., Hellewell, J., et al. (2020). Reconstructing the early global dynamics of under-ascertained COVID-19 cases and infections. BMC medicine, (1), 1–9.
Shi, Q., Hu, Y., Peng, B., et al. (2020). Effective control of SARS-CoV-2 transmission in Wanzhou, China. Nature medicine, pages 1–8.
Slifka, M. K. and Gao, L. (2020). Is presymptomatic spread a major contributor to COVID-19 transmission? Nature Medicine, (10), 1531–1533.
Toyoshima, Y., Nemoto, K., Matsumoto, S., et al. (2020). SARS-CoV-2 genomic variations associated with mortality rate of COVID-19. Journal of human genetics, (12), 1075–1082.
Tradigo, G., Guzzi, P. H., Kahveci, T., and Veltri, P. (2020). A method to assess COVID-19 infected numbers in Italy during peak pandemic period. In , pages 3017–3020. IEEE.
Wang, J., Tang, K., Feng, K., and Lv, W. (2020). High temperature and high humidity reduce the transmission of COVID-19. Available at SSRN 3551767.
Wei, W. E., Li, Z., Chiew, C. J., et al. (2020). Presymptomatic Transmission of SARS-CoV-2—Singapore, January 23–March 16, 2020. Morbidity and Mortality Weekly Report, (14), 411.
Wilke, C. O. and Bergstrom, C. T. (2020). Predicting an epidemic trajectory is difficult. Proceedings of the National Academy of Sciences, (46), 28549–28551.
Xie, J. and Zhu, Y. (2020). Association between ambient temperature and COVID-19 infection in 122 cities from China. Science of the Total Environment, , 138201.
Xu, R., Rahmandad, H., Gupta, M., et al. (2020). The Modest Impact of Weather and Air Pollution on COVID-19 Transmission. medRxiv.

Supplementary Material for COVIDHunter: An Accurate, Flexible, and Environment-Aware Open-Source COVID-19 Outbreak Simulation Model

Mohammed Alser, Jeremie S. Kim, Nour Almadhoun Alserr, Stefan W. Tell, and Onur Mutlu

ETH Zurich, Zurich 8006, Switzerland

The purpose of this study is to explore the relationship between the daily number of new confirmed COVID-19 cases or deaths and temperature in Switzerland. We obtain the daily number of confirmed COVID-19 cases and deaths in Switzerland from official reports of the Federal Office of Public Health (FOPH) in Switzerland [1], from March 2020 until December 2020. We obtain the air temperature data from the Federal Office of Meteorology and Climatology (MeteoSwiss) in Switzerland [2]. We calculate the daily average air temperature during the same time period (March 2020 to December 2020) for all 26 cantons in Switzerland.

To evaluate the correlation between the temperature data and the number of daily confirmed COVID-19 cases or daily deaths, we use a generalized additive model (GAM). GAMs are commonly used to fit linear and non-linear regression models that relate meteorological factors (e.g., temperature, humidity) to COVID-19 infection and transmission [3, 4, 5]. Our analyses are performed with R version 4.0.3, where a p-value < 0.05 is considered statistically significant. Our model attempts to represent the linear behavior of the growth curve of the counts of new confirmed cases or deaths in Switzerland. Therefore, we can test the hypothesis of whether there is a significant negative correlation between the COVID-19 confirmed daily case or death counts and temperature.

The results demonstrate a significant negative correlation between temperature and COVID-19 daily case and death counts. Specifically, the relationship is linear for average temperatures in the range of 1 to 26 °C. Based on Figure S1, we make two key observations. 1) For each 1 °C rise in temperature, there is a 3.67% decrease (t-value = 3.244 and p-value = 0.0013) in the daily number of COVID-19 confirmed cases (Figure S1(a)). 2) For each 1 °C rise in temperature, there is a 23.8% decrease in the daily number of COVID-19 deaths (t-value = 9.312 and p-value = 0.0), as shown in Figure S1(b).

Figure S1: Correlation between temperature and COVID-19 confirmed (a) case count and (b) death count in 26 cantons of Switzerland.
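To make the "percent decrease per 1 °C" reading concrete, the following minimal Python sketch recovers such a rate from a temperature series. It is not the authors' GAM analysis in R: it substitutes ordinary least squares on log counts, which assumes a log-linear relationship, and it runs on synthetic data that we construct to decline by 3.67% per degree. The function name `percent_change_per_degree` and the baseline of 1000 cases are illustrative assumptions.

```python
import math


def percent_change_per_degree(temps, counts):
    """Fit log(count) = a + b * temp by ordinary least squares and
    return the implied percent change in count per 1 degree rise."""
    logs = [math.log(c) for c in counts]
    n = len(temps)
    mean_t = sum(temps) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in temps)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(temps, logs))
    slope = sxy / sxx
    # A slope b on the log scale corresponds to a factor exp(b) per degree.
    return (math.exp(slope) - 1.0) * 100.0


# Synthetic data (an assumption, not the FOPH/MeteoSwiss data):
# counts fall by 3.67% for every 1 degree C between 1 and 26 degrees C.
temps = list(range(1, 27))
counts = [1000.0 * (1.0 - 0.0367) ** t for t in temps]

pct_per_degree = percent_change_per_degree(temps, counts)
print(f"Estimated change per 1 degree C: {pct_per_degree:.2f}%")
```

Because the synthetic series is exactly log-linear, the fit returns the rate it was built with; on real case counts the estimate would depend on noise, reporting delays, and the confounders a GAM is meant to absorb.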

Evaluating the Effect of Different Mitigation Coefficient Values on COVIDHunter’s Predicted Number of Cases, Hospitalizations, and Deaths

Using COVIDHunter, we predict the number of COVID-19 cases, hospitalizations, and deaths from 22 January to 31 March 2021. We show the maximum and the average daily number of COVID-19 cases, hospitalizations, and deaths over this period in Figures S2 and S3, respectively. We use two environmental condition approaches, CRW and CTC, with a certainty rate level of 50%. We assume five values of the mitigation coefficient, M(t), namely 0.35, 0.4, 0.5, 0.6, and 0.7, for each configuration of COVIDHunter during 22 January to 22 February 2021. This range of mitigation coefficient values covers the lowest (i.e., M(t)=0.35) and the highest (i.e., M(t)=0.7) strengths of mitigation measures that were applied during 2020.

Based on Figures S2 and S3, we make three key observations.

1) COVIDHunter predicts that the maximum daily number of COVID-19 cases, hospitalizations, and deaths over 22 January to 31 March 2021 would be 4972, 213, and 136, respectively, using CRW and M(t)=0.7, as we show in Figure S2(a-c). Using our environmental condition approach, CTC, and M(t)=0.7, the maximum daily number of COVID-19 cases over the same period would be 7580, and the maximum daily number of COVID-19 hospitalizations and deaths would be almost the same as that calculated by COVIDHunter with CRW, as we show in Figure S2(d-f).

2) Relaxing the mitigation measures by 50% (M is changed from 0.7 to 0.35) exponentially increases the maximum daily number of cases, hospitalizations, and deaths by 58×, reaching up to 288827, 12385, and 7885, respectively, as predicted by COVIDHunter with the CRW approach (Figure S2(a-c)). Using the CTC approach and M(t)=0.35, COVIDHunter predicts an exponential increase in the maximum daily number of cases, hospitalizations, and deaths by only . ×, as we show in Figure S2(d-f). This is expected, as the CTC approach considers only the drop in temperature rather than the average effect of many environmental conditions, as the CRW approach does.

3) Relaxing the mitigation measures by 50% (M is changed from 0.7 to 0.35) causes the daily number of cases, hospitalizations, and deaths to exponentially increase by an average of 29 . × and 23 . × over 22 January to 31 March 2021 using the CRW and CTC environmental approaches, respectively, as we show in Figure S3.

We conclude that COVIDHunter provides flexible evaluation of the effect of different strengths of past and current mitigation measures on the number of COVID-19 cases, hospitalizations, and deaths. COVIDHunter evaluates the applied mitigation measures with high flexibility in configuring the environmental coefficient and the mitigation coefficient, which helps society and decision-makers to accurately review the current situation and estimate the future impact of decisions.

[Figure S2 plots, panels (a)-(f): maximum number of cases, hospitalizations, and deaths versus mitigation coefficient M(t), for CRW and CTC; panel annotations: -11.79x (three panels) and -10.41x (three panels).]
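The 58× figure for the CRW approach can be checked directly from the maxima quoted in the text, and the same two endpoints give a rough exponent for a curve of the form y = A·exp(k·M). The Python sketch below does both. The helper names are ours, and the two-point exponent is a simplification: the exponential fits shown in Figure S2 presumably use all five M(t) values, which are not listed in the text, so the result should only be expected to land near the -11.79x annotation on the Figure S2 panels, not to match it.

```python
import math

# Maximum daily values quoted in the text for the CRW approach.
max_at_m070 = {"cases": 4972, "hospitalizations": 213, "deaths": 136}
max_at_m035 = {"cases": 288827, "hospitalizations": 12385, "deaths": 7885}


def fold_increase(relaxed, strict):
    """Ratio of each metric under relaxed (M=0.35) vs. strict (M=0.7) measures."""
    return {name: relaxed[name] / strict[name] for name in strict}


def two_point_exponent(y_at_low_m, y_at_high_m, m_low=0.35, m_high=0.7):
    """Exponent k of y = A * exp(k * M) passing through the two endpoints.
    This is a two-point estimate, not a least-squares fit over five points."""
    return math.log(y_at_high_m / y_at_low_m) / (m_high - m_low)


folds = fold_increase(max_at_m035, max_at_m070)
k_cases = two_point_exponent(max_at_m035["cases"], max_at_m070["cases"])

for name, ratio in folds.items():
    print(f"{name}: {ratio:.1f}x")
print(f"two-point exponent for cases: {k_cases:.2f}")
```

All three ratios round to 58, which is consistent with the text reporting a single factor for cases, hospitalizations, and deaths.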

Figure S2: The maximum daily number of COVID-19 cases, hospitalizations, and deaths as predicted by COVIDHunter over 22 January to 31 March 2021. We use five mitigation coefficient, M(t), values of 0.35, 0.4, 0.5, 0.6, and 0.7 for each configuration of our model during 22 January to 22 February 2021. We use two different environmental condition approaches, CRW (a)-(c) and CTC (d)-(f), with a certainty rate level of 50%. Dashed line represents exponential model fit to data.

[Figure S3 plots, panels (a)-(f): average number of cases, hospitalizations, and deaths versus mitigation coefficient M(t), for CRW and CTC; panel annotations: -9.714x (three panels) and -9.018x (three panels).]

Figure S3: The average daily number of COVID-19 cases, hospitalizations, and deaths as predicted by COVIDHunter over 22 January to 31 March 2021. We use five mitigation coefficient, M(t), values of 0.35, 0.4, 0.5, 0.6, and 0.7 for each configuration of our model during 22 January to 22 February 2021. We use two different environmental condition approaches, CRW (a)-(c) and CTC (d)-(f), with a certainty rate level of 50%. Dashed line represents exponential model fit to data.

Evaluated Datasets

Our experimental evaluation uses a large number of different real datasets, including 1) daily R number values, 2) observed daily number of COVID-19 cases, 3) observed daily number of COVID-19 hospitalizations, 4) observed daily number of COVID-19 deaths, 5) number of excess deaths, 6) the estimated strength of mitigation measures as calculated by the Oxford Stringency Index, and 7) estimates of COVID-19 statistics as calculated by existing state-of-the-art simulation models (ICL, IHME, LSHTM, and IBZ), from seven different sources as we list below. The raw datasets are provided in the Supplementary Excel File and can also be obtained from the original sources listed below:

• Observed COVID-19 statistics (R number values and number of cases, hospitalizations, and deaths)
  – Official reports (January 7, 2021):
  – Smoothed data (January 7, 2021): https://ourworldindata.org/coronavirus/country/switzerland?country=~CHE
• Excess deaths:
  – Information:
  – Direct link (January 7, 2021):
• Oxford Stringency Index
  –
• Imperial College London (ICL) Model:
  – Information: https://mrc-ide.github.io/global-lmic-reports/
  – Direct link (January 15, 2021): https://github.com/mrc-ide/global-lmic-reports/raw/master/data/2021-01-30_v7.csv.zip
• Institute for Health Metrics and Evaluation (IHME) Model:
  – Information: https://mrc-ide.github.io/global-lmic-reports/
  – Direct link (January 15, 2021):
• The London School of Hygiene & Tropical Medicine (LSHTM) Model:
  – Information: https://cmmid.github.io/topics/covid19/global_cfr_estimates.html
  – Direct link (January 15, 2021): https://raw.githubusercontent.com/cmmid/cmmid.github.io/master/topics/covid19/reports/under_reporting_estimates/under_ascertainment_estimates.csv
• The Theoretical Biology Group at ETH Zurich (IBZ) Model:
  – Information: https://ibz-shiny.ethz.ch/covid-19-re-international/
  – Direct link (January 15, 2021): https://github.com/covid-19-Re/dailyRe-Data

Supplementary Excel File: https://github.com/CMU-SAFARI/COVIDHunter/blob/main/Evaluation_Results/ComparisonToOtherModels.xlsx

References

[1] Coronavirus - The Federal Office of Public Health in Switzerland. Accessed: 2020-12-31.

[2] Switzerland forecast - The Federal Office of Meteorology and Climatology MeteoSwiss. Accessed: 2020-12-31.

[3] Jiangtao Liu, Ji Zhou, Jinxi Yao, Xiuxia Zhang, Lanyu Li, Xiaocheng Xu, Xiaotao He, Bo Wang, Shihua Fu, Tingting Niu, et al. Impact of meteorological factors on the COVID-19 transmission: A multi-city study in China. Science of the Total Environment, page 138513, 2020.

[4] David N Prata, Waldecy Rodrigues, and Paulo H Bermejo. Temperature significantly changes COVID-19 transmission in (sub) tropical cities of Brazil. Science of the Total Environment, page 138862, 2020.

[5] Jingui Xie and Yongjian Zhu. Association between ambient temperature and COVID-19 infection in 122 cities from China.