Determination and estimation of optimal quarantine duration for infectious diseases with application to data analysis of COVID-19

Abstract

Quarantine is a commonly used non-pharmaceutical intervention during outbreaks of infectious diseases. A key problem in implementing quarantine is determining its duration. In this paper, a policy with optimal quarantine duration is developed. The policy suggests different quarantine durations for individuals with different characteristics. The policy is optimal in the sense that it minimizes the average quarantine duration of uninfected people subject to the constraint that the probability of symptom presentation for infected people attains a given value close to 1. The optimal solution for the quarantine duration is obtained and estimated by statistical methods, with an application to COVID-19 data.


Determination and estimation of optimal quarantine duration for infectious diseases with application to data analysis of COVID-19

Ruoyu Wang and Qihua Wang∗
Academy of Mathematics and Systems Sciences, Chinese Academy of Sciences, Beijing 100190, China.
School of Statistics and Mathematics, Zhejiang Gongshang University, Zhejiang 310018, China.

July 3, 2020

Abstract

Quarantine is a commonly used non-pharmaceutical intervention during outbreaks of infectious diseases. A key problem in implementing quarantine is determining its duration. Unlike existing methods, which determine a constant quarantine duration for everyone, we develop an individualized quarantine rule that suggests different quarantine durations for individuals with different characteristics. The proposed quarantine rule is optimal in the sense that it minimizes the average quarantine duration of uninfected people subject to the constraint that the probability of symptom presentation for infected people attains a given value close to 1. The optimal solution for the quarantine duration is obtained and estimated by statistical methods, with an application to COVID-19 data.

∗ Corresponding author

Introduction

During outbreaks of infectious diseases (e.g. EVD, SARS, MERS and COVID-19), quarantine measures are commonly implemented to limit disease transmission and morbidity. Extensive research has shown that quarantine is important in reducing the number of people infected and the number of deaths [Lipsitch et al., 2003, Ferguson et al., 2006], especially when there is no effective treatment for the disease; see Nussbaumer-Streit et al. [2020] for a recent review. To establish a quarantine strategy, some studies use epidemic models such as SEIR-type models to determine the optimal time-varying quarantine rate by optimal control theory; see for instance Behncke [2000], Yan and Zou [2008] and Ahmad et al. [2016]. Lipsitch et al. [2003] discussed the relationship between the quarantined fraction of each infectious case's contacts and the number of person-days in quarantine. However, a key problem when imposing a quarantine measure is to determine the quarantine duration.
An extremely long quarantine duration ensures that most infected individuals exhibit symptoms while under quarantine and then receive further quarantine and medical treatment. That is, a long quarantine duration can stop the virus from spreading to others. Nevertheless, it may inconvenience uninfected individuals, incur considerable extra financial and social costs, and even affect economic development [Reich et al., 2018]. Hence a good quarantine measure should balance effectiveness against cost and have a proper duration.

Farewell et al. [2005] proposed to determine the quarantine duration based on the distribution of the incubation period. Nishiura [2009] analyzed the appropriate quarantine period using the quantiles of the incubation period distribution. These existing methods do not consider the characteristics of quarantined individuals and suggest the same quarantine duration for every individual. Nevertheless, different people may have different probabilities of being infected and different incubation periods. Indeed, the probability of being infected is unknown for every individual. However, some individual characteristics that may affect the incubation period distribution or the infection probability can be observed, such as age, sex, the infection rate in the region from which the individual comes, and whether the individual is a close contact. Thus, to guarantee the effectiveness and minimize the cost of the quarantine measure, one may set a proper quarantine duration for each potentially exposed individual based on his or her characteristics. To the best of our knowledge, no literature addresses this issue.

In this paper, we consider this problem and develop an optimal quarantine rule. The proposed quarantine rule assigns different quarantine durations to different individuals depending on their characteristics.
We make the rule optimal by minimizing the average quarantine duration of uninfected people subject to the constraint that the probability of symptom presentation for infected people attains any given value, which may be close to 1. We obtain the optimal solution of the problem and estimate it by statistical methods.

The coronavirus disease (COVID-19) pandemic has become a global health crisis since its emergence in Asia late last year. Considerable attention has been paid to studying optimal prevention and control strategies for COVID-19 and various public health measures such as testing, social distancing, lockdown and quarantine from a macro perspective [Piguillem and Shi, 2020, Charpentier et al., 2020, Acemoglu et al., 2020, Alvarez et al., 2020]. Quarantine is one of the key aspects of infection control during the COVID-19 pandemic. This paper focuses on the optimal quarantine duration for infectious diseases with application to data analysis of COVID-19, which is not discussed in any of the aforementioned literature. Compared with the standard quantile methods of Farewell et al. [2005], Nishiura [2009] and Liu et al. [2020], the data analysis results demonstrate that our method suggests a shorter average quarantine duration while keeping the risk of virus spreading below a given level. That is, the proposed method keeps the risk of virus spreading at the same low level as the standard methods while saving the cost of days lost. After quarantine, uninfected individuals may work and study while keeping some social distance or taking other simpler measures. Hence, this paper makes a significant contribution to decreasing financial and social costs and the impact on economic development while still controlling the epidemic.

Optimal quarantine rule

Let $X$ be a feature vector describing the characteristics of a potentially exposed individual. Let $\mathcal{X}$ be the support of $X$ and let $I$ be a variable indicating whether or not the individual has been infected ($I = 1$ if infected and $I = 0$ otherwise). Clearly, $I$ is unobservable before quarantine. A quarantine rule $t(\cdot)$ is a map from $\mathcal{X}$ to $\mathbb{R}_+$, that is, $t : \mathcal{X} \to \mathbb{R}_+$. Let $Y > 0$ be the incubation period of the individual when $I = 1$; the incubation period is not defined for $I = 0$.

An infected individual has a low risk of infecting others if the individual presents symptoms, and hence is diagnosed, during the quarantine. A good quarantine duration should ensure a large enough probability that an infected individual presents symptoms during the quarantine, and should minimize the average quarantine duration of uninfected individuals. The problem of finding the optimal quarantine rule can then be expressed as finding a map that solves

$$\min_t E_0[t(X)] \quad \text{s.t.} \quad P_1(Y \le t(X)) \ge 1 - \epsilon, \tag{1}$$

where $\epsilon$ is a predefined small positive number (e.g. 0.05) and the subscript 0 or 1 denotes that the expectation or probability is taken conditional on $I = 0$ or $I = 1$. In this paper, we call $E_0[t(X)]$ the redundant length of the quarantine rule $t(\cdot)$ and call $P_1(Y \le t(X))$ the finding probability of the quarantine rule $t(\cdot)$.

If there is no available feature $X$, problem (1) reduces to

$$\min_t t \quad \text{s.t.} \quad P_1(Y \le t) \ge 1 - \epsilon.$$

This just defines the $1 - \epsilon$ quantile of the incubation period distribution. In particular, this gives the 0.95 quantile method due to Farewell et al. [2005] when $\epsilon = 0.05$.

Remark 1

Suppose $\theta$ is the proportion of quarantined infected people among all infected people and $R_0$ is the basic reproductive number of the disease. Then every infected individual who is not quarantined, or who is released early, causes approximately $R_0$ infections. Hence an infected individual causes $R(\theta, \epsilon, R_0) = (1 - \theta) R_0 + \theta \epsilon R_0$ infections on average. If $R(\theta, \epsilon, R_0) < 1$, that is, if $\epsilon < [1 - (1 - \theta) R_0] / (\theta R_0)$, the spread of the virus can be controlled and the disease will die out exponentially. For example, suppose $\theta$ = 0. and $R_0 = 4$, then the epidemic will be controlled if we choose an $\epsilon$ smaller than / . However, the main purpose of quarantine is to stop the spread of the virus as soon as possible, and hence we usually take $\epsilon$ to be a smaller constant such as . .

Suppose $X = (C, W)$, where $C$ is a categorical variable that takes values in $\{1, \dots, K\}$ and $W \in \mathbb{R}^d$ is a vector of continuous variables. Let $\mu$ be the product of the counting measure on $\{1, \dots, K\}$ and the Lebesgue measure on $\mathbb{R}^d$. Let $f_1(x)$ be the density function of $X$ conditional on $I = 1$ w.r.t. $\mu$ and $f_0(x)$ be the density function of $X$ conditional on $I = 0$ w.r.t. $\mu$. We use $F(y \mid x)$ to denote the distribution function of $Y$ conditional on $X = x$ and $I = 1$, and $f(y \mid x)$ to denote the corresponding density function w.r.t. the Lebesgue measure. Then problem (1) can be reformulated as

$$\min_t \int t(x) f_0(x)\, d\mu(x) \quad \text{s.t.} \quad \int F(t(x) \mid x) f_1(x)\, d\mu(x) \ge 1 - \epsilon.$$

This is a variational problem and is not easy to solve in general. However, we find that its solution is easy to handle under the following conditions.
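Returning briefly to Remark 1, the short Python sketch below evaluates $R(\theta, \epsilon, R_0)$ and the largest admissible $\epsilon$ obtained by solving $R(\theta, \epsilon, R_0) = 1$. The function names and the value $\theta = 0.8$ are illustrative assumptions that do not come from the paper; only $R_0 = 4$ echoes the example in the remark.

```python
def effective_reproduction(theta, eps, r0):
    """Average infections per case: (1 - theta) * R0 + theta * eps * R0."""
    return (1.0 - theta) * r0 + theta * eps * r0


def max_epsilon(theta, r0):
    """Value of eps at which the effective reproduction number equals 1.

    Solves (1 - theta) * r0 + theta * eps * r0 = 1 for eps. A non-positive
    result means quarantine alone cannot bring the number below 1.
    """
    return (1.0 - (1.0 - theta) * r0) / (theta * r0)


theta, r0 = 0.8, 4.0  # theta is a hypothetical value, not taken from the paper
eps_bound = max_epsilon(theta, r0)
r_eff = effective_reproduction(theta, 0.05, r0)
```

Under these assumed inputs, any $\epsilon$ below `eps_bound` keeps the effective reproduction number under 1, and `r_eff` shows the value obtained for $\epsilon = 0.05$.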

Condition 1. For all $x \in \mathcal{X}$, $f(y \mid x) > 0$ for any $y > 0$ and $f(y \mid x)$ is continuous with respect to $y$. Moreover, $f(y \mid x)$ is either strictly monotone with respect to $y$, or unimodal and strictly monotone with respect to $y$ on both of the monotone intervals.

Condition 2. $0 < \inf_x P(I = 1 \mid X = x) \le \sup_x P(I = 1 \mid X = x) < 1$ and $\inf_x \sup_y f(y \mid x) > 0$.

Condition 1 is a mild condition and is satisfied by many commonly used parameterizations of the incubation period (e.g. Weibull, lognormal, gamma and Erlang distributions). Condition 2 is a mild regularity condition. It is not of practical significance to consider the case where $P(I = 1 \mid X = x)$ equals 0 or 1. If we assume that, for any $x$, the conditional distribution $f(y \mid x) = f(y \mid \alpha_x, \lambda_x)$ is a Weibull distribution with shape parameter $\alpha_x$ and scale parameter $\lambda_x$, then a sufficient condition for $\inf_x \sup_y f(y \mid x) > 0$ is $\sup_x \lambda_x < \infty$.

By Bayes' formula,

$$\frac{f_1(x)}{f_0(x)} = \frac{1 - P(I = 1)}{P(I = 1)} \cdot \frac{P(I = 1 \mid X = x)}{1 - P(I = 1 \mid X = x)}.$$

By Condition 2, we have $\inf_x f_1(x)/f_0(x) > 0$ and hence

$$\inf_x \sup_y f(y \mid x) \frac{f_1(x)}{f_0(x)} \ge \inf_x \sup_y f(y \mid x) \cdot \inf_x \frac{f_1(x)}{f_0(x)} > 0.$$

Let $c^* = \inf_x \sup_y f(y \mid x) f_1(x)/f_0(x)$; then we can establish the following theorem.

Theorem 1

For any $0 < c \le c^*$ and $x \in \mathcal{X}$, define $t_c(x) = \sup\{y : f(y \mid x) f_1(x)/f_0(x) \ge c\}$. Under Conditions 1 and 2, if $\epsilon$ is small enough that $\epsilon \le 1 - E_1[F(t_{c^*}(X) \mid X)]$, then there is a unique constant $c \in (0, c^*]$ such that $E_1[F(t_c(X) \mid X)] = 1 - \epsilon$ and $t_c(\cdot)$ is the unique minimum point of problem (1).

The proof of Theorem 1 is given in SI Appendix. In what follows, we give an intuitive explanation of Theorem 1. Our optimal quarantine rule is determined by the density ratio

$$f(y \mid x)\, \frac{f_1(x)}{f_0(x)},$$

which plays a role similar to the likelihood ratio in hypothesis testing [Lehmann, 2005]. Suppose we need to determine the quarantine duration for an individual with feature value $X = x$; then $f(y \mid x) f_1(x)/f_0(x)$ is a curve in $y$. For a given $c$, we call the set $\{y : f(y \mid x) f_1(x)/f_0(x) \ge c\}$ the high density ratio period; see Fig. 1 for an illustration.

Figure 1: An example of a high density ratio period with threshold 0.02: the period between the left and right endpoints of the gray area.

In the high density ratio period, the individual has a relatively high probability density of symptom presentation if infected. A possible quarantine policy is "release the individual if he or she does not develop any symptom by the end of the high density ratio period", and we denote the resulting quarantine duration by $t_c(x)$. A question is how to determine the threshold value $c$. Clearly, for every $x$, $c$ cannot be larger than $\sup_y f(y \mid x) f_1(x)/f_0(x)$, the peak of the curve. This implies that $c$ cannot be larger than $c^*$. The larger $c$ is, the smaller the finding probability is. If $\epsilon \le 1 - E_1[F(t_{c^*}(X) \mid X)]$, then the finding probability at $c^*$, namely $E_1[F(t_{c^*}(X) \mid X)]$, is less than or equal to $1 - \epsilon$. Condition 1 implies the monotonicity and continuity of $E_1[F(t_c(X) \mid X)]$ in $c$. Theorem 1 states that there exists a unique constant $c \in (0, c^*]$ such that $E_1[F(t_c(X) \mid X)] = 1 - \epsilon$ and $t_c(\cdot)$ is the optimal quarantine rule.

Remark 2

In practice, the loss from being quarantined may also differ across individuals. We can easily adapt our framework to this scenario by extending problem (1) to the more general form

$$\min_t E_0[w(X) t(X)] \quad \text{s.t.} \quad P_1(Y \le t(X)) \ge 1 - \epsilon,$$

where $w(x) > 0$ is a weighting function that reflects different costs of quarantine for different individuals. In this case a modified version of Theorem 1, with $f_0(x)$ in the definition of $t_c(x)$ replaced by $w(x) f_0(x)$, follows directly under Conditions 1 and 2 if $0 < \inf_x w(x) < \sup_x w(x) < \infty$.

Now we propose an estimation procedure for the optimal quarantine duration for any $x \in \mathcal{X}$. To estimate the optimal quarantine duration given in Theorem 1, we need to estimate $f(y \mid x)$, $f_1(x)$ and $f_0(x)$. Suppose we have historical quarantine data denoted by $(Y_1, X_1, I_1), \dots, (Y_n, X_n, I_n)$. Note that in the historical data we know whether an individual is infected. Here we define $Y_i = 0$ for samples with $I_i = 0$, $i = 1, \dots, n$. Then $f(y \mid x)$, $f_1(x)$ and $f_0(x)$ can be estimated consistently by either standard parametric or nonparametric methods, e.g. the maximum likelihood method or the kernel smoothing method [Hansen, 2008, van der Vaart, 1998]. Suppose $\hat f(y \mid x)$, $\hat f_1(x)$ and $\hat f_0(x)$ are the resulting estimators. Then $c^*$ can be estimated by $\hat c^* = \inf_x \sup_y \hat f(y \mid x) \hat f_1(x)/\hat f_0(x)$. Let $\hat F(y \mid x) = \int_0^y \hat f(s \mid x)\, ds$ be the estimated conditional distribution function and

$$\hat t_c(x) = \sup\{y : \hat f(y \mid x) \hat f_1(x)/\hat f_0(x) \ge c\}.$$

Then $c$ can be estimated by solving

$$\frac{1}{n_1} \sum_{I_i = 1} \hat F(\hat t_c(X_i) \mid X_i) = 1 - \epsilon$$

as an equation in $c$ on the interval $(0, \hat c^*]$, where $n_1$ is the number of infected individuals in the sample and $\epsilon$ is a user-specified positive number smaller than $1 - \sum_{I_i = 1} \hat F(\hat t_{\hat c^*}(X_i) \mid X_i)/n_1$. The resulting estimator of $c$ is denoted by $\hat c$. Finally, the estimator of the optimal quarantine duration is

$$\hat t_{\mathrm{opt}}(x) = \sup\{y : \hat f(y \mid x) \hat f_1(x)/\hat f_0(x) \ge \hat c\}.$$
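The plug-in procedure above can be sketched in Python under simplifying assumptions. The sketch assumes a Weibull conditional density with a common shape greater than 1, so that the density is unimodal and the threshold duration lies on its decreasing branch. The shape value 2.0, the toy values for the ratio $\hat f_1/\hat f_0$ and the toy infected ages are hypothetical; the quadratic scale coefficients are the ones reported in Table 1. All function names are illustrative, and this is not the authors' code.

```python
import math


def weibull_pdf(y, shape, scale):
    """Weibull density f(y | x) with the given shape and scale."""
    z = y / scale
    return (shape / scale) * z ** (shape - 1.0) * math.exp(-(z ** shape))


def weibull_cdf(y, shape, scale):
    """Weibull distribution function F(y | x)."""
    return 1.0 - math.exp(-((y / scale) ** shape))


def weibull_mode(shape, scale):
    """Location of the density peak (valid for shape > 1)."""
    return scale * ((shape - 1.0) / shape) ** (1.0 / shape)


def t_c(c, shape, scale, ratio, upper=200.0):
    """sup{y : f(y | x) * ratio >= c}, by bisection right of the mode.

    Assumes shape > 1 and c no larger than the peak of the curve;
    `upper` is an arbitrary cap in days (an assumption of this sketch).
    """
    lo, hi = weibull_mode(shape, scale), upper
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if weibull_pdf(mid, shape, scale) * ratio >= c:
            lo = mid
        else:
            hi = mid
    return lo


def finding_probability(c, ages, shape, scale_fn, ratio_fn):
    """Sample average of F(t_c(X_i) | X_i) over infected individuals."""
    total = 0.0
    for x in ages:
        s = scale_fn(x)
        total += weibull_cdf(t_c(c, shape, s, ratio_fn(x)), shape, s)
    return total / len(ages)


def solve_threshold(ages, support, shape, scale_fn, ratio_fn, eps=0.05):
    """Bisection for c in (0, c_star] with finding probability 1 - eps."""
    c_star = min(
        weibull_pdf(weibull_mode(shape, scale_fn(x)), shape, scale_fn(x))
        * ratio_fn(x)
        for x in support
    )
    if finding_probability(c_star, ages, shape, scale_fn, ratio_fn) > 1.0 - eps:
        raise ValueError("eps is too large for the condition in Theorem 1")
    lo, hi = 0.0, c_star
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if finding_probability(mid, ages, shape, scale_fn, ratio_fn) >= 1.0 - eps:
            lo = mid  # finding probability still high enough: raise c
        else:
            hi = mid
    return lo


SHAPE = 2.0  # hypothetical shape; not taken from the paper


def scale_fn(age):
    """Quadratic scale using the coefficients reported in Table 1."""
    return 9.09 - 0.11 * age + 0.0015 * age ** 2


ratio = {20: 0.5, 40: 1.0, 60: 1.5}      # hypothetical f1(x) / f0(x)
support = [20, 40, 60]
ages_infected = [20, 40, 40, 60, 60, 60]  # toy infected sample

c_hat = solve_threshold(ages_infected, support, SHAPE, scale_fn, ratio.get)
durations = {x: t_c(c_hat, SHAPE, scale_fn(x), ratio[x]) for x in support}
```

Bisection is used twice because both the density ratio (right of its peak) and the finding probability (as a function of $c$) are monotone under Condition 1; any other one-dimensional root finder would serve equally well. In the toy example the resulting `durations` are longer for ages with a higher density ratio.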
In this subsection, we apply our method to COVID-19 data. Demographic features such as age, sex and comorbidities are important in analyzing epidemiological data [Dowd et al., 2020]. The incubation period data, along with age information, are available from the websites of the centres of disease control or the daily public reports on COVID-19 in 29 provinces in China, and are reported by Liu et al. [2020]. In this subsection we use this dataset to construct the optimal quarantine rule using age as the feature $X$. Here we only use the information of patients who were infected before January 23rd to avoid the biased sampling problem discussed in Liu et al. [2020]. The total number of samples is 1770. We use these data to estimate $f_1(x)$ and $f(y \mid x)$. In the dataset, the proportions of patients younger than 11 and patients older than 80 are very small (1.9% and 0.6% respectively). For the sake of estimation accuracy, we focus on people aged between 11 and 80 and take these people as the whole population in our analysis (i.e. $\mathcal{X} = \{11, \dots, 80\}$). We apply the kernel method with a Gaussian associate kernel introduced in Kokonendji and Kiesse [2011] to estimate $f_1(x)$.

The reported integer-valued incubation period is regarded as the least integer greater than or equal to the true incubation period. Let $Z = \lceil Y \rceil$, where $\lceil \cdot \rceil$ is the ceiling function; then the data are regarded as an i.i.d. sample from $(Z, X) \mid I = 1$ and denoted by $(Z_1, X_1), \dots, (Z_n, X_n)$. We assume that, conditional on $X = x$, the incubation period $Y$ follows a Weibull distribution, which is commonly used in analyzing incubation periods [Lauer et al., 2020]. We further assume that the conditional density has the form

$$f(y \mid x, \alpha_0, \gamma_0) = \frac{\alpha_0}{\gamma_0^{T} v(x)} \left( \frac{y}{\gamma_0^{T} v(x)} \right)^{\alpha_0 - 1} \exp\left[ - \left( \frac{y}{\gamma_0^{T} v(x)} \right)^{\alpha_0} \right],$$

where $v(x) = (1, x, x^2)^{T}$, and $\alpha_0$ and $\gamma_0 = (\gamma_{01}, \gamma_{02}, \gamma_{03})^{T}$ are unknown parameters satisfying $\alpha_0 > 0$ and $\gamma_0^{T} v(x) > 0$. Let $\alpha > 0$, $\gamma = (\gamma_1, \gamma_2, \gamma_3)^{T}$ and $V_i = (1, X_i, X_i^2)^{T}$ for $i = 1, \dots, n$; then the log-likelihood function [Odell et al., 1992] is

$$l(\alpha, \gamma) = \frac{1}{n} \sum_{i=1}^{n} \log \left\{ \exp\left[ - \left( \frac{Z_i - 1}{\gamma^{T} V_i} \right)^{\alpha} \right] - \exp\left[ - \left( \frac{Z_i}{\gamma^{T} V_i} \right)^{\alpha} \right] \right\},$$

and $f(y \mid x)$ can be estimated by $f(y \mid x, \hat\alpha, \hat\gamma)$, where $(\hat\alpha, \hat\gamma^{T})^{T}$ is the maximum likelihood estimator. Here we use a quadratic function to fit the conditional distribution based on exploratory data analysis. The estimated values of the parameters are listed as follows:

Table 1: Estimated parameters

Parameter    Estimate
α
γ            (9.09, -0.11, 0.0015)

Since the number of infected people in China is relatively small compared to the entire population, we use the age distribution of the entire population of China to estimate the age distribution conditional on $I = 0$, and apply the kernel method with a Gaussian associate kernel to estimate $f_0(x)$.

In this section, we choose $\epsilon = 0.05$, which is sufficient to control the epidemic under the scenario discussed in Remark 1. There are two other ways to make sure that $P_1(Y \le t(X)) \ge 0.95$. One is to omit the feature and use the 0.95 sample quantile of the incubation period as the quarantine duration for everyone [Farewell et al., 2005]; the other is to use the 0.95 estimated quantile of the conditional incubation period distribution as the quarantine duration for people at the corresponding age [Liu et al., 2020]. Quarantine durations for people at different ages obtained by the proposed method and the two quantile methods are plotted in Fig. 2.

Figure 2: Quarantine duration for people at different ages: 0.95 quantile, dashed line; 0.95 conditional quantile, dash-dotted line; optimal duration, solid line.

Figure 2 shows that the 0.95 sample quantile of the incubation period is 15 days, which is one day longer than the current quarantine duration in China. The estimated 0.95 conditional quantile of the incubation period is shorter for middle-aged people than for young people and old people. The estimated optimal quarantine duration is close to 15 days for people older than 30 and is shorter than 15 days for people younger than 30. This is because the optimal quarantine rule depends on the probability that an individual is infected, and young people are less likely to be infected in the dataset we consider. For $x \in \mathcal{X}$, let $\hat t_q(x)$ and $\hat t_{cq}(x)$ be the quarantine durations obtained by the 0.95 sample quantile and the 0.95 estimated conditional quantile, respectively. To compare the performance of these two methods and the optimal quarantine rule, we calculate the redundant length and the finding probability by

$$\sum_{j=11}^{80} p_j\, \hat t_s(j) \quad \text{and} \quad \frac{1}{n} \sum_{i=1}^{n} 1\{Z_i \le \hat t_s(X_i)\},$$

where $s$ denotes $q$, $cq$ or $\mathrm{opt}$ respectively, and the weights $p_j$ correspond to the estimated age distribution of uninfected people. Because a non-integer quarantine duration is not practical, the quarantine duration is rounded to the nearest integer in the calculation. The results are listed in Table 2.

Table 2: Redundant length and finding probability associated with the three quarantine rules using age as a feature

Method                       Redundant length    Finding probability
0.95 quantile                15.00               96.7%
0.95 conditional quantile    15.04               96.7%
Optimal quarantine rule      14.32               96.2%

Table 2 shows that the optimal quarantine rule has the shortest redundant length with guaranteed finding probability. The 0.95 conditional quantile and the optimal quarantine rule are derived from the conditional distribution model of the incubation period. The reasonable finding probabilities in Table 2 also support our model assumption. The improvement in redundant length is modest. The reason may be that age alone does not provide sufficient information for obtaining a quarantine rule with good performance. Next, let us consider an example in which the infection rate in the individual's origin country is observed in addition to age.

Optimal quarantine rule based on age and infection rate of origin country

Travel quarantine for out-of-country travellers and residents arriving from another country is a common policy around the world during the COVID-19 pandemic. When determining the quarantine duration, the traveller's age and the infection rate of the disease in the origin country can be observed. In this case, the infection rate in a traveller's origin country is an important feature that reflects the probability that the traveller is infected.
For every country we can calculate a current infection index (CII):

CII = 10 ∗ a/b

where a is the number of new cases in the country during the last two weeks and b is the total population of the country. Here we multiply the rate by a constant 10 to avoid this index being too small. We only consider the number of infections in the last two weeks because the number of infections more than two weeks ago provides little information about the infection probability of a current traveller. We divide the countries into three groups according to their CII because many countries have similar infection rates. Countries with CII larger than 300 form the high risk group, countries with 50 < CII ≤ 300 form the medium risk group and countries with CII ≤ 50 form the low risk group. Besides age, we take the risk level of the traveller's origin country as a feature.

In this subsection, we obtain the optimal quarantine rule using information from multiple datasets. We consider 79 countries in our model since their data are relatively complete in all the data sources. The number of confirmed cases of each country is reported by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU). We use the number of cases confirmed between May 1st and May 14th in each country to calculate the current infection index. SI Appendix, Table S1, shows the countries at each risk level.

As in the previous subsection, we focus on people aged between 11 and 80. We approximate the feature distribution of uninfected people by the distribution of the entire population (people in the 79 countries) and estimate $f_0(x)$ by the kernel method with a Gaussian associate kernel [Kokonendji and Kiesse, 2011] using data from the website of the United Nations. Data on 5008 COVID-19 patients from Xu et al. [2020] are used to estimate $f_1(x)$. However, we take the proportion of confirmed cases from different countries reported by CSSE at JHU instead of that in the dataset of Xu et al. [2020], since the proportions reported by CSSE at JHU are regarded as more exact.

The dataset of Xu et al. [2020] does not contain the incubation periods of the patients. To overcome this difficulty, we assume that the distribution of the incubation period for patients of the same age is the same across countries at different risk levels. Thus we can use the conditional distribution model of the incubation period fitted in the previous subsection to impute the missing incubation periods. We can then estimate the three individualized quarantine durations using the imputed dataset. Here we employ the multiple imputation method, which is standard in the missing data literature [Little and Rubin, 2019].
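The imputation step can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the authors' R code: the shape value 2.0, the toy ages, the helper names and the choice of the 0.95 sample quantile as the estimator being averaged are all hypothetical, while the quadratic scale coefficients are the estimates reported in Table 1.

```python
import math
import random

GAMMA = (9.09, -0.11, 0.0015)  # scale coefficients reported in Table 1
SHAPE = 2.0                    # hypothetical shape; not taken from the paper


def weibull_scale(age):
    """Quadratic scale model gamma^T (1, age, age^2)."""
    return GAMMA[0] + GAMMA[1] * age + GAMMA[2] * age ** 2


def impute_incubation(ages, rng):
    """Draw one reported (ceiled) incubation period per patient age."""
    # random.weibullvariate takes (scale, shape) in that order.
    return [math.ceil(rng.weibullvariate(weibull_scale(a), SHAPE)) for a in ages]


def sample_quantile(values, q):
    """Smallest order statistic whose empirical CDF is at least q."""
    ordered = sorted(values)
    k = max(math.ceil(q * len(ordered)) - 1, 0)
    return ordered[k]


def multiply_imputed_estimate(ages, n_imputations=10, q=0.95, seed=0):
    """Average an estimator over several independently imputed datasets."""
    rng = random.Random(seed)
    estimates = [
        sample_quantile(impute_incubation(ages, rng), q)
        for _ in range(n_imputations)
    ]
    return sum(estimates) / n_imputations, estimates


ages = [25, 34, 41, 47, 52, 58, 63, 70] * 25  # toy ages for 200 patients
pooled, per_imputation = multiply_imputed_estimate(ages)
```

Averaging the per-imputation estimates reduces the extra noise introduced by any single random draw; the same loop could wrap any of the three quarantine-duration estimators in place of the sample quantile used here.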
We impute the dataset ten times and average the resulting estimators over the imputed datasets. Quarantine durations obtained by the sample 0.95 quantile, the estimated 0.95 conditional quantile and the optimal quarantine rule are plotted in Fig. 3.

Figure 3: Quarantine durations for people at different ages: 0.95 quantile, dashed line; conditional quantile, dash-dotted line; optimal quarantine duration for travellers from high risk countries, dotted line; optimal quarantine duration for travellers from medium risk countries, short dashed line; optimal quarantine duration for travellers from low risk countries, solid line.

It can be seen that the optimal quarantine rule gives a much longer duration to travellers from the high risk countries, a duration slightly longer than the 0.95 quantile to travellers from the medium risk countries, and a very short duration to travellers from the low risk countries. Optimal quarantine durations for travellers from high, medium and low risk countries show different trends with age. The trend for high and medium risk countries is consistent with the trend of the conditional quantile curve. This may be because, if the infection rate is relatively high, the optimal quarantine duration mainly depends on the incubation period. For travellers from low risk countries, the optimal quarantine rule gives a shorter quarantine duration to young people than to old people. The reason may be that in the low risk countries the infection rate of young people is relatively low. The sample 0.95 quantile and the estimated 0.95 conditional quantile methods give quarantine durations that do not depend on the risk level of the origin country, since these two methods are independent of the national infection rate by definition.

We calculate the redundant length and the finding probability for the three methods by a procedure similar to that in the previous subsection.
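The two summary measures can be sketched in Python as follows. The function names, the toy age weights and the toy cases are hypothetical; the sketch only mirrors the definitions of redundant length and finding probability, with durations rounded to whole days (halves rounded up, which is an assumption about the rounding convention).

```python
import math


def round_days(t):
    """Round a duration to the nearest whole day (halves round up)."""
    return int(math.floor(t + 0.5))


def redundant_length(rule, age_weights):
    """Sum over ages j of p_j * t(j), with t rounded to whole days."""
    return sum(p * round_days(rule(age)) for age, p in age_weights.items())


def finding_probability(rule, periods, ages):
    """Share of infected cases with reported period Z_i <= rounded t(X_i)."""
    found = sum(1 for z, x in zip(periods, ages) if z <= round_days(rule(x)))
    return found / len(periods)


# Toy inputs (hypothetical): two ages with equal weight, four infected cases.
age_weights = {30: 0.5, 50: 0.5}
periods = [3, 12, 14, 16]
ages = [30, 30, 50, 50]


def constant_rule(age):
    """Same duration for everyone."""
    return 15.0


def age_rule(age):
    """Shorter duration for the younger group."""
    return 10.4 if age < 40 else 14.6


summary = {
    "constant": (redundant_length(constant_rule, age_weights),
                 finding_probability(constant_rule, periods, ages)),
    "by age": (redundant_length(age_rule, age_weights),
               finding_probability(age_rule, periods, ages)),
}
```

With these toy inputs the constant rule gives a redundant length of 15.0 and a finding probability of 0.75, while the age-dependent rule gives 12.5 and 0.5, which illustrates the trade-off that the optimal rule is designed to balance.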
The results are reported in Table 3.

Table 3: Redundant length and finding probability associated with the three quarantine rules using age and the risk level of the traveller's origin country as features

Method                       Redundant length    Finding probability
0.95 quantile                15.00               94.9%
0.95 conditional quantile    14.99               95.1%
Optimal duration             10.94               95.8%

Table 3 shows that our optimal quarantine rule greatly shortens the average quarantine duration of uninfected people while guaranteeing the probability of finding infected individuals. Comparing the results in Tables 2 and 3, we can see that adding the risk level of the traveller's origin country as a feature matters considerably for the optimal rule. If one can collect other features that are associated with the incubation period or with the probability that an individual is infected, the optimal quarantine rule may perform even better.

Although we apply our method to COVID-19 data, the method is general and can be applied to establish an optimal quarantine rule for any infectious disease as long as some historical quarantine data are available. Clearly, the notion of "optimal" depends on the available features. As mentioned before, if there is no available feature, our optimal quarantine duration reduces to the $1 - \epsilon$ quantile of the incubation period. There may be other features that are useful for determining the quarantine duration. For example, a test result for the pathogen can serve as an important feature even if the sensitivity and specificity of the test are not high. It is of great importance to select features that are useful for determining the quarantine duration. This may be an interesting topic for future work.

Data accessibility

Age-specific population data of each country are available from the website of the United Nations: https://population.un.org/wpp/Download/Standard/CSV/total. Age information of 5008 COVID-19 patients from different countries is available from the website https://github.com/beoutbreakprepared/nCoV2019. The number of confirmed cases of each country reported by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU) can be found on the website https://github.com/CSSEGISandData/COVID-19. All the analyses are performed with the use of R software, version 3.6.3. All the code and data involved in this paper are deposited in Open Science Framework, doi: 10.17605/OSF.IO/5437G.

Authors' contributions

R.W. designed research, performed research, analyzed data and wrote the manuscript; Q.W. oversaw the project, designed research, assisted with conceptualization and edited the manuscript.

Competing interests

The authors declare no conflict of interest.

Funding

This research was supported by the National Natural Science Foundation of China (General program 11871460 and program for Creative Research Group in China 61621003), and a grant from the Key Lab of Random Complex Structure and Data Science, CAS.

References

Daron Acemoglu, Victor Chernozhukov, Iván Werning, and Michael D. Whinston. Optimal targeted lockdowns in a multi-group SIR model. Working Paper 27102, National Bureau of Economic Research, 2020.

M. D. Ahmad, M. Usman, A. Khan, and M. Imran. Optimal control analysis of Ebola disease with control strategies of quarantine and vaccination. Infect Dis Poverty, 5(72), 2016.

Fernando E. Alvarez, David Argente, and Francesco Lippi. A simple planning problem for COVID-19 lockdown. Working Paper 26981, National Bureau of Economic Research, 2020.

Horst Behncke. Optimal control of deterministic epidemics. Optimal Control Applications and Methods, 21:269–285, 2000.

Arthur Charpentier, Romuald Elie, Mathieu Laurière, and Viet Chi Tran. COVID-19 pandemic control: balancing detection policy and lockdown intervention under ICU sustainability. medRxiv, 2020.

J. Dowd, L. Andriano, D. M. Brazel, V. Rotondi, P. Block, X. Ding, Y. Liu, and M. C. Mills. Demographic science aids in understanding the spread and fatality rates of COVID-19. Proc Natl Acad Sci USA, 117(18):9696–9698, 2020.

V. T. Farewell, A. M. Herzberg, K. W. James, L. M. Ho, and G. M. Leung. SARS incubation and quarantine times: when is an exposed individual known to be disease free? Statist. Med., 24:3431–3445, 2005.

Neil M. Ferguson, Derek A. T. Cummings, Christophe Fraser, James C. Cajka, Philip C. Cooley, and Donald S. Burke. Strategies for mitigating an influenza pandemic. Nature, 442:448–452, 2006.

B. E. Hansen. Uniform convergence rates for kernel estimation with dependent data. Econometric Theory, 24(3):726–748, 2008.

C. C. Kokonendji and T. Senga Kiesse. Discrete associated kernels method and extensions. Statistical Methodology, 8(6):497–516, 2011.

S. A. Lauer, K. H. Grantz, Q. Bi, F. K. Jones, Q. Zheng, H. R. Meredith, A. S. Azman, N. G. Reich, and J. Lessler. The incubation period of coronavirus disease 2019 (COVID-19) from publicly reported confirmed cases: estimation and application. Annals of Internal Medicine, 2020.

E. L. Lehmann. Testing Statistical Hypotheses. Springer, 2005.

Marc Lipsitch, Ted Cohen, Ben Cooper, James M. Robins, Stefan Ma, Lyn James, Gowri Gopalakrishna, Suok Kai Chew, Chorh Chuan Tan, Matthew H. Samore, David Fisman, and Megan Murray. Transmission dynamics and control of severe acute respiratory syndrome. Science, 300:1966–1970, 2003.

R. J. A. Little and D. B. Rubin. Statistical Analysis with Missing Data. 3rd edition, 2019.

Xiaohui Liu, Lei Wang, Xiansi Ma, and Jiewen Wang. Conditional quantiles estimation of the incubation period of COVID-19. Preprint, 2020.

H. Nishiura. Determination of the appropriate quarantine period following smallpox exposure: An objective approach using the incubation period distribution. Int. J. Hyg. Environ. Health, 212:97–104, 2009.

B. Nussbaumer-Streit, V. Mayr, A. Iulia Dobrescu, A. Chapman, E. Persad, I. Klerings, G. Wagner, U. Siebert, C. Christof, C. Zachariah, and G. Gartlehner. Quarantine alone or in combination with other public health measures to control COVID-19: a rapid review. Cochrane Database of Systematic Reviews, 4, 2020.

P. M. Odell, K. M. Anderson, and R. B. D'Agostino. Maximum likelihood estimation for interval-censored data using a Weibull-based accelerated failure time model. Biometrics, 48:951–959, 1992.

Facundo Piguillem and Liyan Shi. Optimal COVID-19 quarantine and testing policies. CEPR Discussion Paper, 2020.

N. G. Reich, J. Lessler, J. K. Varma, and N.M. Quantifying the risk and cost of active monitoring for infectious diseases. Scientific Reports, 8:1093, 2018.

A. W. van der Vaart. Asymptotic Statistics, volume 3. Cambridge University Press, 1998.

B. Xu, B. Gutierrez, S. Mekaru, K. Sewalk, L. Goodwin, A. Loskill, E.L. Cohn, Y. Hswen, S.C. Hill, M.M. Cobo, A.E. Zarebski, S. Li, C. Wu, E. Hulland, J.D. Morgan, L. Wang, K. O'Brien, S.V. Scarpino, J.S. Brownstein, O.G. Pybus, D.M. Pigott, and U.G.K. Moritz. Epidemiological data from the COVID-19 outbreak. Scientific Data, 7(106), 2020.

X. Yan and Y. Zou. Optimal and sub-optimal quarantine and isolation control in SARS epidemics.