Evaluating the effect of city lock-down on controlling COVID-19 propagation through deep learning and network science models

Abstract

The special epidemic characteristics of COVID-19, such as its long incubation period and infection through asymptomatic cases, pose a severe challenge to containing its outbreak. By the end of March 2020, China had successfully controlled the domestic spread of COVID-19 at the high cost of locking down most of its major cities, including the epicenter, Wuhan. The low accuracy of outbreak data before mid-February 2020 is a major technical concern for studies based on statistical inference from the early outbreak. We apply supervised learning techniques to identify and train an NP-Net-SIR model, which turns out to be robust under poor data quality. Using the trained model parameters, we analyze the connection between population flow and the cross-regional infection connection strength, based on which a set of counterfactual analyses is carried out to study the necessity of lock-down and the substitutability between lock-down and other containment measures. Our findings support the existence of non-lock-down measures that can achieve the same containment outcome as the lock-down, and provide useful guidelines for designing a more flexible containment strategy.


Xiaoqi Zhang, Zheng Ji, Yanqiao Zheng, Xinyue Ye, Dong Li

Keywords: COVID-19; city lock-down; counterfactual analysis; deep learning; network science; China

1. Introduction

The novel coronavirus disease COVID-19, first reported in Wuhan, China, at the end of 2019, spread quickly. Early 2020 witnessed many efforts to contain the virus, such as city lock-down, quarantining suspected infectious cases and their close contacts, and setting health checkpoints at crucial traffic nodes. By mid-March 2020, the cumulative number of infectious cases had stopped growing in most major cities of China, including the epicenter, Wuhan. Although a growing list of published papers and reports claim that the successful containment of COVID-19 in China was due to the nationwide travel ban and lock-down (Li et al., 2020; Qiu et al., 2020; Tian et al., 2020), these studies focus exclusively on aggregated numbers. The micro-mechanism by which lock-down stops an outbreak has rarely been analyzed with real data. On the other hand, many countries, such as Italy, adopted a similar lock-down policy but failed to contain the outbreak as China did. South Korea and Japan neither closed their major cities nor imposed severe travel restrictions on uninfected people (Iwasaki and Grubaugh, 2020; Park et al., 2020; Shaw et al., 2020), yet both reported a low growth rate of infections within a relatively short time. Hence, it cannot be confirmed that the containment was achieved by lock-down unless the effect of confounding non-lock-down measures, such as conditional quarantine and social distancing, can be separated.

Author affiliations: National School of Development, Southeast University, Nanjing, China; School of Finance, Zhejiang University of Finance and Economics, Hangzhou, China; Urban Informatics-Spatial Computing Lab & College of Computing, New Jersey Institute of Technology, New Jersey, U.S.A.; Innovation Center for Technology, Beijing Tsinghua Tongheng Urban Planning & Design Institute, China.
Preprint submitted to Journal Name, September 7, 2020. arXiv [q-bio.PE].

To this end, a thorough investigation of the necessity of lock-down and its functioning mechanisms is needed. At the same time, due to its severe socio-economic cost (Atkeson, 2020; Barro et al., 2020; Bootsma and Ferguson, 2007; Halder et al., 2011; Jorda et al., 2020; Kelso et al., 2020), lock-down is neither feasible for other countries that are still struggling with the outbreak of COVID-19, nor an option for China given the risk of a second-wave outbreak. Therefore, alternative measures to the lock-down are called for.
A deep mechanism analysis of the lock-down can shed light on the search for alternative containment measures and on their effectiveness. Such an analysis is in high demand at a moment when both the US and China have experienced a second-wave outbreak of COVID-19 in recent weeks and neither of the two largest economies can afford another round of lock-down.

In this paper, we explore the mechanism issue mentioned above. Our analysis is based on a novel network-based SIR (Susceptible-Infection-Recovery) model framework (Li et al., 2020; Qiu et al., 2020), to which a time-lagged latent random infection mechanism is added to capture two epidemic characteristics that the classical SIR models cannot handle: undocumented infections and a long incubation period (Heymann and Shindo, 2020; Keeling and Rohani, 2011; Li et al., 2020; Mizumoto et al., 2020; Qiu et al., 2020; Wang et al., b; Zou et al., 2020). Unlike the models in Li et al. (2020) and Qiu et al. (2020), we do not adopt the prior assumption that infection can propagate across regions only via inter-regional population flow, so that hidden infection channels not directly linked to that flow, such as infection through panic-induced gathering (Fang et al., 2020; Garfin et al., 2020; Wang et al., 2020), can be captured. Instead, we apply a non-parametric network-based SIR model in which no prior knowledge is imposed on the link weights across regions; the weights are inferred entirely from the COVID-19 outbreak data via deep learning methods. Using the inferred network, the connection between the infection link weights and the population flows is established through standard regression techniques and tested for significance, and on this basis a counterfactual evaluation of the real effect of lock-down is carried out.
Unlike the counterfactual analyses in the existing literature (Chinazzi et al., 2020; Li et al., 2020; Qiu et al., 2020; Tian et al., 2020), which attempt to justify the travel ban and the city lock-down as a sufficient condition for China's achievements in fighting COVID-19, we evaluate the necessity of these measures, in the sense of whether or not there exist much more moderate prevention measures that are as effective as the travel ban and city lock-down in containing the outbreak of COVID-19 while having less negative impact on socio-economic development. We give positive evidence for the existence of such alternative measures, and also discuss the substitutability between lock-down and the other containment measures. We highlight that this substitutability can help to quantitatively design a combination of containment measures that balances the containment of COVID-19 against its socio-economic cost.

2. Methodology

The NP-Net-SIR model is set up as follows:

$$
\begin{aligned}
\frac{d\,n(t)}{dt} &= \int_{t-\mathrm{incub}}^{t} p_I(\tau, t)\, W(t) \cdot n(\tau)\, d\tau - r(t)\, n(t), \\
m(t) &= \int_{t-\mathrm{incub}}^{t} p_B(\tau, t)\, n(\tau)\, d\tau,
\end{aligned}
\tag{1}
$$

where $n(t) = (n_1(t), \ldots, n_k(t))^\top$ is the vector of the cumulative numbers of infectious cases in the $k$ regions by time $t$, including both documented and undocumented cases; $m(t) = (m_1(t), \ldots, m_k(t))^\top$ denotes the documented numbers of infectious cases by time $t$; and $r(t)$ is the time-dependent recovery rate. To reflect the epidemic characteristics of COVID-19, namely that the incubation period (14 days) is very long (Heymann and Shindo, 2020; Li et al., 2020; Mizumoto et al., 2020; Qiu et al., 2020; Wang et al., b; Zou et al., 2020) and that asymptomatic infectious cases can transmit the virus, we add two time-dependent probabilities, $p_B(\cdot, t)$ and $p_I(\cdot, t)$. They capture the time-lagged randomness in two processes: hidden infectious cases being discovered ($p_B$) and hidden infectious cases transmitting the virus to healthy people ($p_I$). Without loss of generality, we let $r$, $p_B$ and $p_I$ depend continuously on time so as to capture the impact of time and of the government prevention measures.

To formulate the spatial interactions of the COVID-19 outbreak, we introduce a family of weighted network adjacency matrices $\{W(t)\}$ with $W_{ij}(t) \in [0, 1]$ for all entries $ij$ and all $t$. The adjacency matrix $W(t)$ captures the cross-regional link weights of the outbreak; $W_{ij}(t)$ can be interpreted as the proportion of the past cumulative infectious cases in region $j$ that contribute to the newly infected cases in region $i$. In previous studies (Li et al., 2020; Qiu et al., 2020), the adjacency matrices are directly identified as a constant multiple of the cross-regional population flow matrix. This assumption is not sufficient to capture outbreak channels other than point-to-point travel, such as infection through panic-driven gathering, multi-destination travelling and the like (Cohen and Kupferschmidt, 2020; Ferguson et al., 2020; Harris, 2020; Pueyo, 2020; Wang et al., 2020). To account for these hidden channels, we adopt a non-parametric specification of $W$ rather than imposing prior knowledge. We also let $W$ depend continuously on time $t$, accounting for the effect of time and of various prevention measures. Model (1) is trained by deep learning techniques; the details are presented in Appendix A.

2.2. Counterfactual evaluation on the effects of travel ban and lock-down

The effects of the travel ban and city lock-down on containing the outbreak of COVID-19 can be evaluated based on the population flow data from the Baidu Migration Index (available at https://qianxi.baidu.com/) and the trained NP-Net-SIR model. The temporal network adjacency matrices $W(t)$ sketch the cross-regional outbreak link strength of COVID-19 and its variation over time. The variation of $W(t)$ is by and large the consequence of the travel restriction policies but, as we comment in the introduction, the impact of panic and of other unobserved driving forces cannot be excluded.
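As a rough illustration of Model (1), the following is a minimal sketch of a daily Euler discretization. It is not the authors' training procedure (which relies on deep learning, see Appendix A). It assumes a constant link-weight matrix `W`, a constant recovery rate `r` and time-independent lag weights `p_I` and `p_B`; the function name `simulate_np_net_sir` and all argument names are hypothetical.

```python
import numpy as np

def simulate_np_net_sir(n0, W, p_I, p_B, r, incub=14, steps=30):
    """Daily Euler discretization of model (1) under simplifying assumptions.

    n0   : (incub, k) history of total (documented + hidden) cases
    W    : (k, k) link-weight matrix, held constant here (assumption)
    p_I  : (incub,) lag weights for transmission by hidden cases
    p_B  : (incub,) lag weights for hidden cases being documented
    r    : scalar recovery rate, held constant here (assumption)
    """
    n = [np.asarray(row, dtype=float) for row in n0]
    m = []
    for _ in range(steps):
        window = np.stack(n[-incub:])         # n(tau) for the last `incub` days
        inflow = W @ (p_I @ window)           # sum_tau p_I(tau) W n(tau)
        m.append(p_B @ window)                # documented cases m(t)
        n.append(n[-1] + inflow - r * n[-1])  # Euler step of dn/dt
    return np.stack(n), np.stack(m)
```

With `W`, `r`, `p_I` and `p_B` allowed to vary by day, the same loop would index them by time step; the constant versions are used here only to keep the sketch short.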
To single out the impact of the travel ban and city lock-down, we apply the following regression analysis:

$$ W_{kj}(t_i) = \alpha + \beta\, \bar{T}_{kj}(t_i) + \varepsilon_{kji}, \tag{2} $$

where the temporal matrix $\bar{T}(t_i)$ is the average of the single-day population flow matrices $T(t_i)$ (extracted from the Baidu Migration Index), weighted by the infection probability $p_I(\cdot, t_i)$:

$$ \bar{T}(t_i) = \sum_{j=0}^{\mathrm{incub}-1} p_I(j, t_i)\, T(t_i - \mathrm{incub} + j). \tag{3} $$

In (3), $j$ is a summation index over lags, not a region index.

After Wuhan was locked down, a bundle of containment measures was applied by both Wuhan and the other major cities in China, such as social distancing, conditional quarantine and health checkpoints at major transportation facilities, all of which could affect the link weights and help contain the outbreak of COVID-19. To separate the effect of lock-down from that of the other measures, we fit equation (2) using only the data from before Jan. 23, 2020, when Wuhan started to lock down. Given the estimated coefficients $\hat{\alpha}$, $\hat{\beta}$, the estimated residuals $\hat{\varepsilon}_{kji}$ for the $t_i$ after Jan. 23, 2020 are calculated and interpreted as the part of the infection link weights that cannot be explained by population flows, which accounts for the effect of the non-lock-down measures. With $\hat{\alpha}$, $\hat{\beta}$ and $\hat{\varepsilon}_{kji}$ fixed, (2) is used to evaluate the impact of counterfactually increasing the population flow intensity between region pairs.

Unlike the existing studies (Anderson et al., 2019; Chinazzi et al., 2020; Fang et al., 2020; Li et al., 2020; Qiu et al., 2020; Tian et al., 2020; Zhang et al., 2020) that focus almost exclusively on the sufficiency question, i.e. whether lock-down really helps mitigate the outbreak of COVID-19, this study attempts to answer the inverse question of necessity, i.e. whether or not there exists an alternative prevention strategy that causes less damage to socio-economic development while being as effective as the travel ban and lock-down in containing the outbreak of COVID-19.
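The regression step in (2)-(3) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the lag weights `p_I` are taken as time-independent, all entries of all pre-lock-down days are pooled into one ordinary least squares fit, and the function names are hypothetical.

```python
import numpy as np

def lag_weighted_flow(T_daily, p_I, i, incub=14):
    """Eq. (3): weighted average of the `incub` single-day flow matrices before day i.

    T_daily : (days, k, k) single-day flow matrices; requires i >= incub
    p_I     : (incub,) lag weights, p_I[j] multiplies T_daily[i - incub + j]
    """
    window = T_daily[i - incub:i]             # (incub, k, k)
    return np.tensordot(p_I, window, axes=1)  # (k, k)

def fit_link_regression(W, T_bar, pre_mask):
    """OLS fit of eq. (2) on pre-lock-down days; residuals for all days.

    W, T_bar : (days, k, k) link weights and lag-weighted flows
    pre_mask : (days,) boolean, True for days used in the fit
    """
    x = T_bar[pre_mask].ravel()
    y = W[pre_mask].ravel()
    X = np.column_stack([np.ones_like(x), x])  # intercept and slope columns
    (alpha, beta), *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = W - (alpha + beta * T_bar)         # part not explained by flows
    return alpha, beta, resid
```

The residuals on days after the lock-down date are the quantities the text interprets as the effect of non-lock-down measures.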
Since the existence of such alternatives might be timing-sensitive, we consider different initialization times $t_s$ for the counterfactual worlds in which the travel ban and city lock-down are relaxed. The degree of relaxation is measured by a proportion $r_{jk}$ for each pair of regions $j$ and $k$ such that, after the relaxation, the traffic flow intensity is increased to $T^r_{jk}(t_i) = (1 + r_{jk}) \times T_{jk}(t_i)$ for $t_i \ge t_s$. Given the relaxed population/traffic flow matrices $T^r(t_i)$, we can update the adjacency matrix $W(t_i)$ to $W^r(t_i)$ via (2). Denote by $r$ the matrix consisting of all $r_{jk}$ (not to be confused with the recovery rate $r(t_i)$). We then evaluate the necessity of the travel ban by asking whether there exists a positive $r$ such that, with the adjacency matrices updated to $W^r(t_i)$ by $r$, the outbreak status of COVID-19 is no worse than the current one for every $t_i \ge t_s$. The outbreak status in the real case and in the counterfactual case can be compared in various ways. In this study, we focus on three measures, summarized by the following three constraints:

$$ R_0(W^r(t_i), r(t_i)) \le \max\big(R_0(W(t_i), r(t_i)),\, 1\big), \quad \forall\, t_i \ge t_s, \tag{4} $$

$$ m^r(t_i) \le m(t_i), \quad \forall\, t_i \ge t_s, \tag{5} $$

$$ D^r(t_i) \le D(t_i), \quad \forall\, t_i \ge t_s, \tag{6} $$

where $R_0$ is the basic reproduction number, which depends on the maximal eigenvalue of the adjacency matrix and on the recovery rate; $m^r(t_i)$ denotes the documented infectious cases estimated through (1) with the updated matrix $W^r$; and $D(t_i)$ ($D^r(t_i)$) is the total number of death cases (updated by $r$) by time $t_i$, which depends on both the total number of infections and the local healthcare resources. The detailed calculation of $D(t_i)$ is presented in Appendix C.

These three constraints correspond to three different prevention goals: after relaxation, the total numbers of infections and deaths should not exceed their current values, and $R_0^r$ should not induce infection divergence (i.e. exceed 1) or, at least, should not exceed its current value.
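A feasibility check for constraint (4) on a single day might look like the sketch below. Two points are assumptions rather than statements from the paper: the reproduction number is taken as the spectral radius of $W$ divided by the recovery rate (the text only says it depends on the maximal eigenvalue and the recovery rate), and the relaxation factor is applied directly to the lag-weighted flow, with the updated weights clipped to $[0, 1]$. All function names are hypothetical.

```python
import numpy as np

def reproduction_number(W, recovery_rate):
    """Assumed form: spectral radius of W divided by the recovery rate."""
    return np.max(np.abs(np.linalg.eigvals(W))) / recovery_rate

def relaxed_link_weights(alpha, beta, T_bar, resid, relax):
    """Update W via eq. (2) after scaling flows by (1 + r_jk); clip to [0, 1]."""
    return np.clip(alpha + beta * (1.0 + relax) * T_bar + resid, 0.0, 1.0)

def satisfies_constraint_4(W, W_relaxed, recovery_rate):
    """Constraint (4): relaxed R0 must not exceed max(current R0, 1)."""
    current = reproduction_number(W, recovery_rate)
    relaxed = reproduction_number(W_relaxed, recovery_rate)
    return bool(relaxed <= max(current, 1.0))
```

Checking (5) and (6) would additionally require re-running model (1) with the relaxed weights and the death calculation of Appendix C, which are outside this sketch.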
The “no greater than” relation in (4)-(6) holds in the point-wise sense, i.e. it has to hold for all regions and all times after $t_s$; it is therefore a very stringent restriction on the relaxation. Formally, any non-trivial relaxation matrix $r$ satisfying the constraints corresponds to a Pareto improvement of the current prevention strategy. In our counterfactual analysis, we search, for each constraint type, for the Pareto optimal relaxation strategy $r^*$, from which no further Pareto improvement is possible. This Pareto optimal $r^*$ has practical significance in guiding the design of containment measures for countries currently suffering from COVID-19.

Besides city lock-down, many non-lock-down measures have also been used to prevent COVID-19 outbreaks, such as “social distancing” (Pike and Saini, 2020; Zhang et al., 2020). All these measures can contain the outbreak of COVID-19 while, compared to the travel ban and city lock-down, having less negative impact on socio-economic development; moreover, they are more accurately targeted rather than applied to all people regardless of their health status and vulnerability to COVID-19. It can reasonably be expected that these non-lock-down measures can by and large substitute for lock-downs and reduce the economic harm that lock-down induces.

To quantify the substitutability, we extend (2) to include the effect of non-lock-down measures. We roughly divide all the non-lock-down measures into two classes: the measures adopted by the flow-in regions and those adopted by the flow-out regions.
The flow-in measures, whose effect is quantified by a parameter $\mathrm{in}_k(t_i, t_a)$, include the quarantine of travellers arriving from out of town, the closure of schools, the cancellation of public gatherings and so on. The flow-out measures, whose effect is quantified by another parameter $\mathrm{out}_j(t_i, t_b)$, include setting health checkpoints at the entrances of inter-regional highways, airports, rail stations and so on. The indices $k$ and $j$ denote regions, and $t_a$ and $t_b$ are the starting dates of the relevant measures. In other words, we let

$$
\mathrm{in}_k(t_i, t_a) = \begin{cases} \gamma_k, & t_i \ge t_a \\ 0, & \text{else} \end{cases}
\qquad
\mathrm{out}_j(t_i, t_b) = \begin{cases} \theta_j, & t_i \ge t_b \\ 0, & \text{else} \end{cases}
\tag{7}
$$

for constant magnitude parameters $\gamma_k$ and $\theta_j$ that measure the execution strength of the relevant measures. Adding $\mathrm{in}_k(t_i, t_a)$ and $\mathrm{out}_j(t_i, t_b)$ into (2) yields the following regression:

$$ W_{kj}(t_i) = \alpha + \big(\beta - \mathrm{out}_j(t_i, t_b)\big) \cdot \bar{T}_{kj}(t_i) - \mathrm{in}_k(t_i, t_a) + \varepsilon'_{kji}, \tag{8} $$

where $\bar{T}_{kj}(t_i)$ is the weighted-average population flow entering (2), and we suppose that the flow-in measures affect the infection adjacency matrix additively, while the flow-out measures act through a multiplier on the population flow. For the starting dates of the two classes of non-lock-down measures, we follow the timeline provided in Tian et al. (2020) and set $t_a$ to Jan. 26, 2020, by which all 31 provinces in mainland China had initiated the first-class protocol for public health emergencies, which includes the execution of various quarantine measures and the closure of major public facilities; $t_b$ is set to Jan. 30, 2020, by which health checkpoints had been set at all major highway entrances, railway stations and airports within mainland China.

Given the estimates of the parameters $\gamma_k$, $\theta_j$ and the residuals $\varepsilon'_{kji}$, the counterfactual analysis is done by solving the same set of Pareto optimization problems under the same constraints as in the previous section.
The only difference is that, in the current setting, not only the relaxation matrix $r$ but also the non-lock-down parameters $\gamma_k$ and $\theta_j$ can be adjusted simultaneously.
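To make the structure of (7)-(8) concrete, here is a minimal sketch that evaluates the extended link weights for one day. It is an illustration only: dates are represented as integer day indices, the first matrix index is treated as the flow-in region $k$ and the second as the flow-out region $j$, and the function names are hypothetical.

```python
import numpy as np

def step_effect(strength, t, t_start):
    """Eq. (7): a measure contributes `strength` from its start date on, 0 before."""
    return np.where(t >= t_start, strength, 0.0)

def extended_link_weight(alpha, beta, T_bar, gamma, theta, t, t_a, t_b, resid=0.0):
    """Eq. (8) for a single day t.

    T_bar : (k, k) lag-weighted flow, rows = flow-in region, columns = flow-out region
    gamma : (k,) strengths of flow-in measures, active from day t_a
    theta : (k,) strengths of flow-out measures, active from day t_b
    """
    in_k = step_effect(gamma, t, t_a)[:, None]   # additive, one value per row
    out_j = step_effect(theta, t, t_b)[None, :]  # multiplier, one value per column
    return alpha + (beta - out_j) * T_bar - in_k + resid
```

In the counterfactual analysis described above, `gamma` and `theta` would be adjusted together with the relaxation matrix; this sketch only shows how a given choice maps to link weights.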

3. Results

The NP-Net-SIR model is trained using province-level daily infection data for Jan. 10 - Mar. 8, 2020, collected from the official website of the National Health Commission (NHC) of China.

Fig. 1 and 2 present the fit to the temporal trend of documented infectious cases from Jan. 10, 2020 to Mar. 7, 2020, for the nationwide aggregate and at the province level for all 31 provinces in mainland China, respectively. Table 1 reports the fitting accuracy $R$, defined through the relative difference $\|\hat{m} - m\| / \|m\|$ between the estimated ($\hat{m}$) and the real ($m$) documented infection numbers since Feb. 12, 2020. The fitting accuracy after Feb. 12, 2020 is extremely high in all cases shown in the two figures ($R > 0.99$ for the aggregate over the whole of China), while the fitted number is systematically greater than the reported number before Feb. 12, 2020. This is because we set Feb. 12, 2020 as the change point before which positive estimation errors are not penalized, so as to reflect the potential under-reporting in the data. The high accuracy after Feb. 12, 2020 demonstrates the explanatory power of our NP-Net-SIR model. As a comparison, we run the classic SEIR model, in the version discussed in Li et al. (2020), on the same data set and calculate the $R$ measure for both models after Feb. 12, 2020; our model performs much better, lifting $R$ by 12%.

Figure 1: Aggregated documented infectious cases over all 31 provinces in Mainland China

The difference between the “over-estimated” infectious cases of our model and the reported cases before Feb. 12, 2020 can be thought of as a measure of the hidden infectious cases that are not counted in the statistics. We calculate the ratio of hidden cases to total cases and find that, at the national level, on average 79.27% of cases were hidden, i.e. not reported, before Feb. 12, 2020; this ratio is close to the one reported in Tian et al. (2020). At the province level, the hidden ratio exceeds 90% for most provinces in mainland China, among which Fujian, Guizhou, Yunnan, Jiangsu, Jiangxi and Shanxi are the top six with hidden ratios greater than 96%, while Hubei has the lowest hidden ratio (70%). Hubei's distinctive hidden ratio can be attributed to the fact that Hubei was the epicenter of the COVID-19 outbreak within China: it was hit by COVID-19 earliest and also reacted to the virus earliest.
In contrast, all the other provinces suffered from imported cases in the early stage, failed to react in time, and therefore updated their numbers with a significant delay.

Table 1: Estimation Accuracy by R

Province        R
Full country    0.996
Shanghai        0.979
Yunnan          0.967
Neimenggu       0.979
Beijing         0.972
Taiwan          0.978
Jilin           0.977
Sichuan         0.975
Tianjin         0.991
Ningxia         0.979
Anhui           0.968
Shandong        0.979
Shanxi          0.979
Guangdong       0.971
Guangxi         0.974
Xinjiang        0.993
Jiangsu         0.976
Jiangxi         0.977
Hebei           0.976
Henan           0.976
Zhejiang        0.965
Hainan          0.981
Hubei           0.997
Hunan           0.974
Macao           0.897
Gansu           0.976
Fujian          0.964
Tibet           0.907
Guizhou         0.967
Liaoning        0.973
Chongqing       0.978
Shanxi          0.968
Qinghai         0.963
Hong Kong       0.984
Heilongjiang    0.984
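The hidden-case ratio reported above can be computed from a fitted and a reported series as in the sketch below. It assumes that the "total cases" are the model-fitted numbers and that the ratio is averaged over the days before the change point; the function name and arguments are hypothetical.

```python
import numpy as np

def hidden_ratio(fitted, reported, cutoff):
    """Average share of fitted cases missing from the reports before `cutoff`.

    fitted, reported : (days,) documented-case series (model fit vs. official data)
    cutoff           : index of the change point (Feb. 12, 2020 in the paper)
    """
    fitted = np.asarray(fitted, dtype=float)[:cutoff]
    reported = np.asarray(reported, dtype=float)[:cutoff]
    return float(np.mean((fitted - reported) / fitted))
```

Applied per province, the same function would give the province-level hidden ratios discussed in the text.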

Given the close connection between cross-regional population flow and the inter-regional outbreak of COVID-19 claimed in the literature (Li et al., 2020; Qiu et al., 2020), we present an overview of the strength of this connection in Fig. 3 and 4, in which the correlation between the (estimated) temporal infection adjacency matrix $W(t)$ and the traffic flow matrix $T(t)$ is visualized in different ways. The first row of subplots in Fig. 3 consists of scatter plots of all $T_{kj}(t_i)$ versus $W_{kj}(t_i)$ before (left) and after (right) the Wuhan lock-down (Jan. 23, 2020). The second and third rows of Fig. 3 plot only those $T_{kj}(t_i)$ and $W_{kj}(t_i)$ that end in (second row) or originate from (third row) Hubei province. To make the trend in the relation between $W(t)$ and $T(t)$ clearly visible, we plot the entries of $W(t)$ and $T(t)$ only for five dates before and after Jan. 23, 2020. Fig. 4 gives a temporal view of the trend of the total flow-in (first row) and flow-out (second row) infection link weight and traffic flow intensity since Jan. 19, 2020.

Figure 2: Documented infectious cases for 31 provinces in Mainland China

In the plots of $W(t)$ and $T(t)$, the horizontal axis corresponds to the entries of $T(t)$ and the vertical axis to those of $W(t)$. The entries of $W(t)$ are first rescaled by the potential infection number, i.e. $W_{kj}(t) \cdot n_k(t) / n_j(t)$, before the log transform is taken. By rescaling, we hope to take the stock of potential infections into account.

From the first row of Fig. 3, a counter-intuitive finding is that the correlation between cross-regional population flow intensity and the infection link weight is quite weak, both before and after Jan. 23, 2020. In particular, in Fig. 3 a large portion of the scatter points cluster around a straight line close and parallel to the horizontal axis; such an observation does not support the linear correlation between $W(t)$ and $T(t)$ imposed in Li et al. (2020) and Qiu et al. (2020). Subfigures 3c and 3e do show a significant linear correlation between population flow strength and infection link weight, at least before the lock-down, and the flow-out population before Jan. 23, 2020 turns out to be more powerful in spreading the virus: the scale of the vertical axis in subfigure 3e is much greater than that in subfigure 3c. On the other hand, after Jan. 23, 2020 the linear relationship between $W(t)$ and $T(t)$ decays sharply, and after Feb. 10, 2020 the correlation coefficient between their entries cannot be distinguished from 0 for either the flow-in or the flow-out population. This observation contradicts the classic assumption in Li et al. (2020) and Qiu et al. (2020). In fact, under the linear correlation assumption, the city lock-down can only control the number of people moving across regions; it has nothing to do with the proportion of infectious cases among these migrants, which should be constant if only lock-down and/or travel ban measures are applied. In other words, if lock-down really worked to contain the outbreak, the scatter points in the right panel of Fig. 3 should converge gradually to the origin along a straight line with positive slope, rather than rotate toward the horizontal axis as shown in Fig. 3. This finding implies that the measures that really helped contain the outbreak of COVID-19 may not be the lock-down but rather the other measures initiated almost simultaneously with it, whose effect is confounded with that of the lock-down.
To correctly evaluate the real effect of each type of containment measure, we have to separate the confounding measures and their impact, which we leave to the discussion in section 3.4.

Fig. 4 shows a significant gap of around one week between the decay of the population flow intensity and the vanishing of the flow-in and flow-out infection link weights. The flow-in and flow-out population intensities of all top 7 provinces reached their minimum before Jan. 31, 2020, whereas none of their flow-in and flow-out infection link weights had decayed close to zero until Feb. 6, 2020. This one-week gap reflects the long incubation period of COVID-19 and the fact that infection can happen via infectious cases without symptoms. The classical SIR/SEIR models ignore this gap and tend to over-estimate the basic reproduction number $R_0$ in the early stage, which would trigger the most severe containment measures, such as lock-down, if decisions were based on it.

In sum, from this brief look at the numerical relationship between the infection link weight $W$ and the population flow intensity $T$, we can summarize: 1) the positive linear correlation assumption made in many versions of the SIR/SEIR model (Efimov and Ushirobira, 2020; Fang et al., 2020; Li et al., 2020; Qiu et al., 2020; Tian et al., 2020) does not hold uniformly during the outbreak of COVID-19, although it does hold for the population flow into and out of Hubei province before the lock-down; 2) after the lock-down was initiated on Jan. 23, 2020, the positive linear correlation between $W$ and $T$ fell significantly and quickly to zero; this reduction should not simply be attributed to the lock-down, and the effect of other confounding measures should be examined more carefully; 3) a one-week gap exists between the decay of population flow intensity and that of infection link weight, which is a consequence of the epidemic characteristics of COVID-19; the classical SIR/SEIR model neglects this gap and can lead to overly severe containment measures.

Figure 3: Relationship between infection link weight W and population flow intensity T with Hubei as origin/destination province

As discussed in the previous section, the correlation between population flow intensity and infection link weight is weak in most cases. This observation implies that the most severe travel ban and lock-down may not be that necessary for those regions among which the infection connection is weak. Therefore, there should be potential to relax the lock-down even if the containment level of COVID-19 has to be maintained. To verify this argument, we solve the relaxation optimization problem stated in section 2.2. The result is plotted in Fig. 5, which shows the averaged relaxation ratio of population flow-in and flow-out intensity for all provinces in China. The ratio for every province $k$ (or $j$) is calculated by dividing the sum of all flow-in/-out indices, $\sum_j T_{kj}(t_i)$ ($\sum_k T_{kj}(t_i)$), by its optimally relaxed version, $\sum_j T^r_{kj}(t_i)$ ($\sum_k T^r_{kj}(t_i)$); the average is taken over all $t_i$ after the relaxation starting date. Fig. 5 displays the relaxation degree for the three alternative starting dates, Jan. 23, Feb. 2 and Feb. 10, 2020, and for the three containment targets (4)-(6).

From Fig. 5, it is striking that, if the control target is the total number of infectious cases, it seems impossible to relax the population flows between any pair of provinces unless a stricter travel ban is executed for Hubei and a couple of provinces geographically connected to Hubei. This impossibility holds for all three starting dates. Such a result verifies the necessity of the lock-down and strict travel ban executed by most major cities in China since Jan. 23, 2020. This conclusion also agrees with the discussion in Li et al. (2020), Qiu et al. (2020) and Tian et al. (2020).

On the other hand, if the target is to control the temporal $R_0$, which reflects the long-run infection severity, and/or the total death number, global relaxation becomes feasible even if starting from Jan. 23, 2020.

Figure 4: The temporal variation trend of the aggregated flow-in/-out infection link weight and population flow intensity

In particular, for the total death number, the travel restrictions of all provinces in China can be relaxed substantially. For most provinces in the south-eastern coastal regions, the relaxation ratio for the flow-out population can exceed 10%, while the flow-in relaxation ratio exceeds 5%. In Zhejiang, Guangdong, Beijing, Shanghai and Tianjin, the flow-out ratio is even greater than 15% and the flow-in ratio is close to 10%. These five provinces make up the most economically developed area of China. A substantial relaxation of the traffic connections both among them and between them and the other provinces could significantly stimulate China's overall economic growth.

That said, despite the existence of a global relaxation strategy for the control target $R_0$, the potential for relaxation is not large. In the south-eastern coastal area, most provinces have to maintain a strict travel ban in at least one direction (flow-in or flow-out) in order to keep $R_0$ reasonably low (in the sense of constraint (4)).
This observation arises partly because the index $R_0$ is much more sensitive than the total death number to changes in the entries of $W$, which restricts the room for relaxing the population flow. Compared with the total infection number, however, $R_0$ is less sensitive to changes in $W$, because $R_0$ depends only on the greatest positive eigenvalue of $W$, while the infection number relies on every single entry. This explains why global relaxation is still feasible for controlling $R_0$ but infeasible for controlling total infections.

Finally, if we return to the target of controlling total infections, a partial relaxation strategy does exist after all. The partial relaxation arrangement is determined by maximizing the overall traffic flow intensity across all provinces, i.e. maximizing $\sum_{j,k,i} T^r_{jk}(t_i)$, under the same set of constraints (4)-(6); the overall traffic flow intensity can be viewed as a measure of the activity level of the economy. Remarkably, from Feb. 2, 2020 on, if the travel ban were further strengthened for Hubei and its nearby provinces, the relaxation ratio for the flow-out population would become high for major south-eastern provinces, including Fujian, Zhejiang, Shanghai, Jiangsu, Beijing, Tianjin and Hebei, while a positive flow-in relaxation ratio would be allowed for Guangdong.
The existence of such a partial relaxation arrangement shows the cross-regional substitutability of the strictness of lock-down. It also implies that a centralized decision mechanism for the choice of lock-down and travel ban could be more efficient in balancing the containment of the COVID-19 outbreak and the resumption of the economy.

In sum, by the counterfactual analysis on the relaxation of travel bans and city lock-downs, we find that global relaxation strategies do exist for both the control target of $R$ and the total death number, while none exists for controlling the total infection number. This observation results from the relative sensitivity of the control target variables to the infection link weight $W$. In terms of the ease of relaxing lock-down, controlling death is easier than controlling $R$, both of which are easier than controlling infection. To control the death number, a substantial relaxation had already been feasible since early Feb. 2020 for the major provinces in the south-eastern coastal areas; relaxation for these provinces is critical to maintaining nationwide economic development. To control total infection, although a global relaxation is never feasible during the period studied in this paper, a partial relaxation is still possible, by which the traffic intensity for south-eastern provinces can be relaxed substantially at the cost of a stricter lock-down for Hubei and the provinces closely connected with Hubei. Such a partial relaxation arrangement is better for economic recovery, but its feasibility relies on centralized decision-making on the lock-down, as it harms local interests via a harsher travel ban.

In this section, we study the substitutability between lock-down and alternative non-lock-down measures. A further counterfactual analysis is carried out to reveal how the extent of population flow relaxation responds to the strengthening of the non-lock-down measures.

Fig.
6-8 sketch the substitutability between the two classes of non-lock-down measures (whose effect and executive strength are quantified by the $\gamma_k$s and $\theta_j$s, respectively) and the relaxation ratios of lock-down under three targets from three starting dates. As in the previous section, the relaxation ratios are aggregated by flow-in and flow-out direction at the province level and averaged over all times after the corresponding starting date. The first row of subplots in Fig. 6-8 gives the substitutability of the province-level $\gamma$s (horizontal axis) versus the flow-out (left)/flow-in (right) relaxation ratios (vertical axis); the second row presents the substitutability between the province-level $\theta$s (horizontal axis) and the flow-out (left)/flow-in (right) relaxation ratios (vertical axis). The colored straight lines in each subplot are the OLS-fitted lines to the scatter points of the same color, where color distinguishes the three starting dates. From Fig. 6-8, it is clear that there exists a gradual substitutability between the non-lock-down measures of the flow-out region (represented by the $\theta$s) and the relaxation ratios.

Figure 5: One set of optimal relaxation solutions to city-level travel ban since Jan. 23, Feb. 2 and Feb. 10, 2020

In addition, the substitutability between the $\theta$s and the flow-out relaxation ratios is stronger than that between the $\theta$s and the flow-in relaxation ratios. This can be explained by the fact that the $\theta$s are designed to capture the effect of measures such as setting up health check-points at highway entrances, rail stations and airports. The main function of these measures is to reduce the potential infection risk of the flow-out population; therefore, they more directly replace the function of locking down all people within the city regardless of whether they are healthy. In contrast, their effect on the flow-in relaxation ratios is indirect.
Compared to the substitutability of the $\theta$s, there seems to be no gradual substitutability between the $\gamma$s and the relaxation ratios. This is partially because the $\gamma$s represent the effect of the conditional quarantine measures executed by flow-in destinations and applied to suspected infectious cases and travellers coming from out of town. These quarantine measures are not directly linked with the cross-regional population flows and therefore affect the infection connection matrix $W$ in an additive way. Compared to the multiplicative connection between the $\theta$s and $W$, the additive connection makes the substitutability of the $\gamma$s less direct. It is still striking that most of the scatter points in the second-row subplots are clustered to the left of a vertical boundary line ($x \equiv c$ for some $c < 0$) and a dense subset of these points gathers around this boundary. In fact, this boundary phenomenon implies a much more stringent substitutability. That is, a universal bottom line exists such that the strength of flow-in non-lock-down measures cannot go below this line; otherwise it would squeeze the potential to relax the population flow intensity.

Through comparison across target types and starting dates, we find that for different targets, the degree of the $\theta$s' substitutability increases in the order of controlling infection, $R$ and death. In particular, for the target of controlling the infection number, there is almost no substitutability between the $\theta$s and relaxation for the starting date Jan. 23, 2020 (reflected as the flat red line in the second-row plots of Fig. 7), which once again verifies the necessity of lock-down in the early stage. The order of substitutability is consistent with the order of ease in relaxing population flow analyzed in the previous section, implying the relative ease of realizing the different targets. Across starting dates, the degree of substitutability of the $\theta$s and $\gamma$s increases for the later starting times, such as Feb. 2 and 10, 2020. This is reflected (for the $\theta$s) in the greater absolute slopes of the green and blue lines than that of the red line in the second-row plots of all three figures, and (for the $\gamma$s) in the blue lines lying above the green lines, which lie above the red lines, in the first-row plots of Fig. 7 and 8. The increasing substitutability over time supports the story that the lock-down measure is effective in controlling the fast growth of the infection number and the induced burden on the local healthcare system, which makes lock-down beneficial in the early stage of the explosion of community infection, when there is not enough time to figure out all unknown infection sources and there are insufficient medical resources to conduct treatment.
The lock-down in this stage can help buy time for an effective reaction to the virus in the next stage and for the adoption of more subtly designed prevention measures in the future. On the other hand, once the explosion of community infection has been well contained and the total number of infectious cases has stabilized, substitutability between lock-down and the other measures emerges, and it is appropriate to gradually shift from lock-down to the other, milder measures. Such a transition of containment measures agrees with the idea discussed in Harris (2020); Pueyo (2020).

Figure 9 presents the geographic distribution of relaxation ratios of flow-in and flow-out populations for different targets and different starting dates. The coloring scheme is exactly the same as that in Fig. 5. Comparing Fig. 9 with Fig. 5, it is quite surprising that for the starting dates Feb. 2 and Feb. 10, 2020, almost all provinces in China (including Hubei province) can significantly relax their travel ban and lock-down policies: the relaxation ratios are almost uniformly greater than 20% for both the flow-in and flow-out directions, and for both the targets of controlling the total infection number and the death number. For the target of $R$, the optimal relaxation ratios are a bit smaller than for the other two targets, and the flow-out population flow of Hubei province cannot be relaxed even for the latest starting date, Feb. 10, 2020.

For the starting date Jan. 23, 2020, the optimal relaxation ratio does not change much compared to the later starting dates for the target of controlling the death number or $R$, but a huge difference exists for the containment target of infection number. If we counterfactually started the relaxation on Jan. 23, 2020, there would be no global relaxation arrangement that avoids increasing the infection number for some provinces at some time after Jan. 23, 2020. This conclusion is similar to that drawn from Fig.
5; it once again demonstrates the robust necessity of lock-down in the early spreading stage of COVID-19.

It is worth highlighting the difference in the absolute size of the relaxation ratios with and without adjustment to the stringency of non-lock-down measures since Feb. 2, 2020. With such adjustment, the relaxation ratios are almost uniformly about twice as large as without it. This fact implies the existence of a better combination of control measures during China's anti-COVID-19 campaign: execute the lock-down for a very short period from Jan. 23, 2020 (e.g. one week) in order to buy time for stabilizing the infection number while preparing for the transition to the other, milder measures, such as conditional quarantine and health check-points; then gradually relax the degree of lock-down from Feb. 02, 2020 by substituting an increasingly stringent execution of the other non-lock-down measures. Such a quick lock-down strategy, compared to the lock-down of more than one month that was actually carried out, does the least harm to the economy while reaching the same effect in mitigating the outbreak of COVID-19.

4. Discussion & Conclusion

In this study, we propose a non-parametric network-based SIR model (NP-Net-SIR) to study the cross-regional outbreak of COVID-19, within which the special epidemic characteristics of COVID-19, such as the long incubation period and the asymptomatic infection channel, are easily encoded. The non-parametric nature of NP-Net-SIR frees it from the presumed linear dependence between the COVID-19 outbreak and the inter-regional population flow, which might lead to an over-estimate of the real effect of city lock-down. The low accuracy of outbreak data before mid-Feb. 2020 poses a major technical challenge to studies based on statistical inference from the early outbreak. To resolve the data issue, we apply graph-Laplacian regularization from semi-supervised learning to identify and train the NP-Net-SIR model, which turns out to be robust under poor data quality.

Figure 6: Substitutability between lock-down and non-lock-down measures given R target

Figure 7: Substitutability between lock-down and non-lock-down measures under controlling the total infectious cases

With the trained model, we analyze the connection between population flow and the cross-regional infection network, based on which a set of counterfactual analyses is carried out to study the necessity of lock-down and the substitutability between lock-down and the other containment measures.
The main findings of this study include: 1) except for the very early stage of the outbreak and the population flow out of the epicenter, Wuhan and Hubei province, there is no strong linear connection between population flow and cross-regional infection connection, indicating that the lock-down may not be the key measure for containing COVID-19; 2) strong substitutability exists between the lock-down and non-lock-down-typed containment measures, between different containment targets, and between the lock-downs of different regions; 3) in the earliest stage (starting from Jan. 23, 2020) the lock-down of the epicenter, Hubei, is indispensable, and this indispensability is by and large attributable to the geographically unbalanced impact of the COVID-19 outbreak and the cross-regional inequality in public awareness of COVID-19, healthcare resources and the implementation of containment measures; 4) after the impact of COVID-19 had equalized across regions (e.g. after Feb. 2, 2020), the lock-down could already have been relaxed substantially while achieving the same containment effect; 5) when the other containment measures are implemented stringently, the degree of population flow relaxation can be enlarged even further.

Figure 8: Substitutability between lock-down and non-lock-down measures under controlling the total dead cases

Our findings support that the lock-down may not be the optimal strategy for containing the outbreak of COVID-19 except in the early stage; there exist alternatives that have less negative impact on socio-economic development. But the effectiveness of the alternative measures requires a subtly designed prevention system that allows for regional differences and temporal adjustment of the containment measures according to the particular situations of different regions and different time periods.
The discussion in this paper has certain guiding and practical significance for the normalization of epidemic prevention, the resumption of production and economic activities after lock-down, and the containment strategy design of other countries in the same epidemic situation.

Figure 9: One set of optimal relaxation solutions to lock-down since Jan. 23, Feb. 2 and Feb. 10, 2020 under the existence of adjustable non-lock-down measures

Although the analysis of this paper is retrospective and presumes that all the COVID-19 data are available, which was not possible at the decision times of Jan. 23 and early Feb. 2020, it is still meaningful to look back at the potentially optimal control strategy. This is because even now, China still faces a high risk of a “second-wave” outbreak. The choice of containment measures that are both feasible and effective is still a critical but open question, while many countries in the world still struggle with how to prevent the outbreak of COVID-19. Our study can provide some hints on this choice. First, China's experience with the strict lock-down measure turns out to be not only sufficient (Fang et al., 2020; Li et al., 2020; Prem et al., 2020; Qiu et al., 2020; Tian et al., 2020; Tuite et al., 2020) for mitigating the virus spread, but possibly also the only effective way to cool down the explosion of community infection, at least in the early spreading stage. But afterward, a region should not stay in lock-down for long, which is neither useful for controlling the virus nor good for economic recovery. Instead, a set of non-lock-down-typed alternative measures should be quickly prepared and actively executed to substitute for the lock-down; as long as they are strictly executed, they can control the virus as effectively as the lock-down does.
Meanwhile, without the support of non-lock-down-typed measures, such as conditional quarantine, lock-down alone may also fail to mitigate COVID-19, as happened in Italy, Spain, and New York, USA.

Acknowledgement:

This work was partially supported by the Ministry of Education in China Project of Humanities and Social Sciences under Grant No. 20YJC790176, and the Fundamental Research Funds for the Central Universities under Grant No. 2242020S30024.

References

Anderson, R.M., Heesterbeek, H., Klinkenberg, and Hollingsworth, T.D. Comment How will country-based mitigation measures influence the course of the COVID-19 epidemic? 2019(20), 1-4.
Atkeson, A.G. What will be the economic impact of COVID-19 in the US? Rough estimates of disease scenarios. Federal Reserve Bank of Minneapolis, Staff Report 595, 2020.
Barro, R.J., Ursúa, J.F. and Weng, J. The coronavirus and the great influenza pandemic: Lessons from the Spanish Flu for the coronavirus's potential effects on mortality and economic activity. National Bureau of Economic Research, No. w26866, 2020.
Bootsma, M.C., and Ferguson, N.M. The effect of public health measures on the 1918 influenza pandemic in US cities. Proceedings of the National Academy of Sciences, 104(18), 7588-7593, 2007.
Chen, S., Li, Q., Gao, S., Kang, Y. and Shi, X. Mitigating COVID-19 outbreak via high testing capacity and strong transmission-intervention in the United States. medRxiv, 2020.
Chinazzi, M., Davis, J.T., Ajelli, M., et al. The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak. Science, 2020.
Cohen, J., and Kupferschmidt, K. Strategies shift as coronavirus pandemic looms. Science, 367, pp962-963, 2020.
Efimov, and Ushirobira, R. On an interval prediction of COVID-19 development based on a SEIR epidemic model. 2020.
Fang H., Wang L. and Yang Y. Human Mobility Restrictions and the Spread of the Novel Coronavirus (2019-nCoV) in China. National Bureau of Economic Research, 2020.
Ferguson, N., Laydon, D., Nedjati Gilani, G., et al. Report 9: Impact of non-pharmaceutical interventions (NPIs) to reduce COVID19 mortality and healthcare demand. Author's website, Imperial College London, 2020.
Garfin D.R., Silver R.C. and Holman E.A. The novel coronavirus (COVID-2019) outbreak: Amplification of public health consequences by media exposure. Health Psychology, 2020.
Golstein E.G. Theory of convex programming. American Mathematical Soc., 2008.
Halder, N., Kelso, J. K., and Milne, G. J. Cost-Effective Strategies for Mitigating a Future Influenza Pandemic with H1N1 2009 Characteristics. PLoS One
The Lancet, 395(10224), 542-545, 2020.
Iwasaki A. and Grubaugh N.D. Why does Japan have so few cases of COVID19?. EMBO Molecular Medicine, 12(5): e12481, 2020.
Jorda, O., Singh, S. R. and Taylor, A.M. Longer-run economic consequences of pandemics. National Bureau of Economic Research, No. w26934, 2020.
Keeling M.J. and Rohani P. Modeling infectious diseases in humans and animals. Princeton University Press, 2011.
Kelso, J.K., Halder, N., Postma, M.J. and Milne, G.J. Economic analysis of pandemic influenza mitigation strategies for five pandemic severity categories. BMC public health, 13(1), 2020.
Li R., Pei S., Chen B., Song Y., Zhang T., Yang W. and Shaman J. Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV2). Science, 2020.
Liu Q., Wu S., Wang L. and Tan T. Predicting the next location: A recurrent model with spatial and temporal contexts. In Thirtieth AAAI conference on artificial intelligence, 2016.
Mizumoto, K., Kagaya, K., Zarebski, A., and Chowell, G. Estimating the asymptomatic proportion of coronavirus disease 2019 (COVID-19) cases on board the Diamond Princess cruise ship. Yokohama, Japan, 2020.
Park S., Choi G.J. and Ko H. Information technology-based tracing strategy in response to COVID-19 in South Korea - privacy controversies. JAMA, 2020.
Pike, W. and Saini, V. An international comparison of the second derivative of COVID-19 deaths after implementation of social distancing measures. medRxiv, 2020.
Prem K, Liu Y, Russell T W, et al. The effect of control strategies to reduce social mixing on outcomes of the COVID-19 epidemic in Wuhan, China: a modelling study. The Lancet Public Health, 2020.
Pueyo, T. Coronavirus: The hammer and the dance. URL https://medium.com/@tomaspueyo/coronavirus-the-hammer-and-the-dance-be9337092b56.
Qiu Y., Chen X. and Shi W. Impacts of Social and Economic Factors on the Transmission of Coronavirus Disease 2019 (COVID-19) in China. Journal of Population Economics, 2020.
Shaw R., Kim Y. and Hua J. Governance, technology and citizen behavior in pandemic: Lessons from COVID-19 in East Asia. Progress in disaster science, 2020.
Shen Z., Wang W., Fan Y., Di Z. and Lai Y. Reconstructing propagation networks with natural diversity and identifying hidden sources. Nature communications, 5: 4323, 2014.
Tian H., Liu Y., Li Y. and others. The impact of transmission control measures during the first 50 days of the COVID-19 epidemic in China. Science, 2020.
Tuite, A.R., Fisman, D. and Greer, A.L. Mathematical modelling of COVID-19 transmission and mitigation strategies in the population of Ontario, Canada. CMAJ, 2020.
Wang, C., Liu, L., Hao, X., et al. Evolving epidemiology and impact of non-pharmaceutical interventions on the outbreak of coronavirus disease 2019 in Wuhan, China. medRxiv, 2020.
Wang D., Hu B., Hu C., et al. Clinical characteristics of 138 hospitalized patients with 2019 novel coronavirus-infected pneumonia in Wuhan, China. JAMA, 2020.
Zhang, Y., Jiang, B., Yuan, J., and Tao, Y. The impact of social distancing and epicenter lockdown on the COVID-19 epidemic in mainland China: A data-driven SEIQR model study. medRxiv, 2020.
Zhou X. and Belkin M. Semi-supervised learning by higher order regularization. Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 892-900, 2011.
Zou, L., Ruan, F., Huang, M., et al. SARS-CoV-2 viral load in upper respiratory specimens of infected patients. New England Journal of Medicine, 382(12), 1177-1179, 2020.

Appendix A. Training of NP-Net-SIR model

The non-parametric network set-up makes our NP-Net-SIR model essentially a special class of recurrent neural network (RNN), namely the temporal RNN (Liu et al., 2016), with the temporality coming from the time-dependent neural network $W(t)$. The total number of infections $n(t)$, being unobservable, corresponds to the hidden layer of the RNN, while the documented infections $m(t)$ correspond to the output layer. As NP-Net-SIR has no extra input, the input layer degenerates to 0. Given the observed sequence of documented infections $M^o = \{M_{t_i} : i = 1, \ldots, n;\ t_1 < \cdots < t_n\}$ and a properly regularized loss function, the standard back-propagation method applies to estimate the set of unknown temporal parameters $\{W(t_i), n(t_i), r(t_i), p_I(j, t_i), p_B(j, t_i) : i = 1, \ldots, n;\ j = 1, \ldots, incub\}$. Because the observation times are discrete, the continuity condition on these temporal parameters can be converted to a graph-Laplacian regularization with the grid graph on the real line (Zhou and Belkin, 2011), which is asymptotically equivalent, under high-frequency observation, to requiring that these temporal parameters be continuous, differentiable and have square-integrable derivatives.
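As a hedged illustration of the conversion mentioned above, the sketch below checks a standard identity: for the grid (path) graph on $n$ time points, the Laplacian quadratic form $f^\top L f$ equals the sum of squared differences between consecutive values. The function names and the sample series are hypothetical and chosen only for this example; the squared differences are an assumption taken from the usual definition of a graph-Laplacian penalty.

```python
def path_laplacian(n):
    """Laplacian L = D - A of the path (grid) graph on n consecutive time points."""
    L = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        # Each edge (i, i+1) adds 1 to both degrees and -1 to both off-diagonals.
        L[i][i] += 1.0
        L[i + 1][i + 1] += 1.0
        L[i][i + 1] -= 1.0
        L[i + 1][i] -= 1.0
    return L


def quadratic_form(L, f):
    """Compute f^T L f for a scalar time series f."""
    n = len(f)
    return sum(f[i] * L[i][j] * f[j] for i in range(n) for j in range(n))


def squared_differences(f):
    """Sum of squared one-step differences of the series f."""
    return sum((b - a) ** 2 for a, b in zip(f, f[1:]))


# Hypothetical recovery-rate series r(t_1), ..., r(t_4).
r = [0.10, 0.12, 0.11, 0.15]
penalty = quadratic_form(path_laplacian(len(r)), r)
```

A series that changes slowly between observation times gets a small penalty, which is the discrete counterpart of asking for a smooth parameter path.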
In our special case, the graph-Laplacian regularization can be written in the following form:

$$
\begin{aligned}
R(W, r, p_B, p_I) ={}& \|W(t_n)\| + \sum_{i=1}^{n-1} \|W(t_{i+1}) - W(t_i)\| + \sum_{i=1}^{n-1} \|r(t_{i+1}) - r(t_i)\| \\
&+ \sum_{i=1}^{n-1} \sum_{j=1}^{incub} \big( \|p_B(j, t_{i+1}) - p_B(j, t_i)\| + \|p_I(j, t_{i+1}) - p_I(j, t_i)\| \big)
\end{aligned}
\tag{A.1}
$$

where we artificially set the boundary $W(t_{n+1}) \equiv 0$, so that the summation over $W$ up to the index $n$ implies the sparsity requirement on the $W(t_i)$'s, which is standard for avoiding over-fitting.

For the loss function, in addition to the standard square-sum error between the observed $M_{t_i}$s and the estimated $m(t_i)$s, we add an extra penalty to the error function in order to fix the data pollution issue in the early stage of the COVID-19 outbreak. In particular, we define the following indicator function:

$$
I_{t^*}(t, m, M) =
\begin{cases}
m - M & \text{if } t \ge t^* \text{ or } m < M \\
0 & \text{else,}
\end{cases}
\tag{A.2}
$$

The meaning of (A.2) is that there is a cut-off time point $t^*$ before which the documented infection number tends to under-estimate the real spreading trend. Therefore, before $t^*$, if the estimated number $m$ exceeds the reported $M$, we take the estimate to reflect the true case and do not treat the gap as an error, while if the estimate is less than the reported number, which indicates a severe under-estimate of the true case, the error is calculated as usual. After $t^*$, all hidden infectious cases that should be documented and published are assumed to have been reported, so the reported cases agree with the real trend. In this paper, we set $t^*$ to Feb. 12, 2020, the date when the Wuhan local government reported 13,000+ existing infectious cases that had not been on record before.
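The one-sided treatment of early data described above can be sketched as follows. This is a minimal illustration that follows the verbal description, not the authors' code: the function names, the integer day index and the sample numbers are all hypothetical, and a squared error is assumed for the "error calculated as usual" case.

```python
def one_sided_squared_error(t, m, M, t_star):
    """Error for one region and one day.

    t      : day index of the observation
    m      : model estimate of documented infections
    M      : reported (documented) infections
    t_star : cut-off day after which reports are trusted

    Before t_star an estimate above the report is not penalised, because
    early reports are assumed to under-count; everything else is penalised
    with the usual squared error (an assumption of this sketch).
    """
    if t >= t_star or m < M:
        return (M - m) ** 2
    return 0.0


def data_loss(days, estimates, reports, t_star):
    """Sum the one-sided errors over all observation days."""
    return sum(
        one_sided_squared_error(t, m, M, t_star)
        for t, m, M in zip(days, estimates, reports)
    )


# Hypothetical series: days 0-3, cut-off on day 2.
total = data_loss([0, 1, 2, 3], [120.0, 80.0, 120.0, 95.0],
                  [100.0, 100.0, 100.0, 100.0], t_star=2)
```

In this toy series the day-0 overshoot contributes nothing, while the day-1 undershoot and both post-cut-off deviations are counted in full.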
Then the loss function can be written in the following form:

$$
L(M^o, m, n, W, r, p_B, p_I) = \sum_{i=1}^{n} \sum_{j=1}^{k} \big\| (M_{t_i,j} - m_j(t_i)) * I_{t^*}(t_i, m_j(t_i), M_{t_i,j}) \big\| + R(W, r, p_B, p_I),
\tag{A.3}
$$

where the loss depends on the hidden infection number $n$ through the observed infection number $m$ via model (1).

Note that the RNN nature of model (1) means that the infection numbers $n(t)$, $m(t)$ are generated from $n(s)$, $m(s)$ for $s < t$; hence, by the back-propagation algorithm, model (1) is fitted in reverse order, i.e. the parameter values of $n(s)$, $W(s)$, $p_I(\cdot, s)$, $p_B(\cdot, s)$ and $r(s)$ for an earlier period $s$ are essentially fitted from the later observations $m(t)$ with $t > s$. The backward fitting direction together with the function (A.2) provides a way to use the labeled data $m(t)$ at times $t > t^*$ to generate labels of the infection number for the unlabeled times $s \le t^*$. Such a trick of using partially labeled data is standard in semi-supervised learning (Zhou and Belkin, 2011); we borrow it here to address the inaccurate data issue for the early stage.

To estimate the parameters, we minimize the loss function with respect to the parameters subject to the following default range restrictions:

$$
\begin{cases}
0 \le W_{kl}(t_i) \le [\ldots], & \forall k, l, i \\
r(t_i) \ge 0, & \forall i \\
p_B(j, t_i),\ p_I(j, t_i) \ge 0, & \forall j, i \\
\sum_j p_B(j, t_i) = \sum_j p_I(j, t_i) = 1, & \forall i
\end{cases}
\tag{A.4}
$$

The quadratic nature of the square-sum loss function guarantees that even with the penalty (A.2) added, the resulting loss function (A.3) is still continuously differentiable, so standard gradient-descent solvers are applicable.

Appendix B. Training algorithm

Training model (1) is equivalent to solving the optimization problem in (A.3) under the constraints (A.4). The classical gradient-descent-based solution for RNNs usually assumes no constraints. Therefore, some modifications are needed. In the following, we propose a sequential modification of the classical back-propagation training algorithm for neural network models. To facilitate the introduction of the sequential algorithm, we temporarily assume that the temporal RNN is no longer temporal but a static RNN, i.e. the temporal parameters $W$, $p_B$, $p_I$ and $r$ no longer depend on $t$. Also suppose that the infection number $m_t$ is observed within the discrete time interval $\{0, \ldots, T\}$. Then, the discrete version of model (1) under the above assumptions becomes the following:

$$
\begin{aligned}
\Delta n_t = n_{t+1} - n_t &= \sum_{i=1}^{incub} p_{I,i}\, W \cdot n_{t-i} - r\, n_t \\
m_t &= \sum_{i=1}^{incub} p_{B,i} \cdot n_{t-i},
\end{aligned}
\tag{B.1}
$$

where for $x = B, I$, $p_{x,i}$ is a short-hand notation for $p_x(i)$, the static probability mass function $p_x$ evaluated at the discrete time $i$. Given (B.1), notice that when the unknown probability parameters $p_B$, $p_I$, the recovery rate $r$ and the network matrix $W$ are fixed, the model depends on the hidden layer $n = \{n_t : t = -incub, -incub + 1, \ldots, 0, \ldots, T\}$ only via vector multiplication. When $p_B$, $p_I$, $r$ and $n$ are fixed, the model depends on $W$ only via matrix multiplication. When $W$ and $n$ are fixed, the model depends on $p_B$, $p_I$ and $r$ only via scalar multiplication and vector inner products. All of the above operations are linear, and the loss function (A.3) has a quadratic form; these facts imply that, fixing any two classes of quantities among (a) $p_B$, $p_I$, $r$; (b) $W$; and (c) $n$, the optimization problem (A.3) under constraint (A.4) is a classical convex programming problem (Golstein, 2008; Shen et al., 2014) with respect to the remaining class of quantities.
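To make the static recursion (B.1) concrete, the following sketch simulates it forward for a small hypothetical two-region example. It is an illustration under stated assumptions rather than the authors' implementation: the function names, the matrix `W`, the lag weights and the initial history are invented for the example, and the time index is assumed to start at 0 with `incub` earlier hidden values.

```python
def matvec(W, v):
    """Matrix-vector product for plain nested lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def simulate(W, p_I, p_B, r, n_init, T):
    """Run the recursion (B.1) for t = 0, ..., T.

    n_init holds the hidden vectors n_{-incub}, ..., n_0 (incub + 1 entries).
    Returns the full hidden sequence n and the documented sequence m.
    """
    incub = len(p_I)
    assert len(p_B) == incub and len(n_init) == incub + 1
    size = len(W)
    n = [list(v) for v in n_init]          # n[k] stores n_{k - incub}
    m = []
    for t in range(T + 1):
        idx = t + incub                    # list position of n_t
        lag_I = [0.0] * size
        lag_B = [0.0] * size
        for i in range(1, incub + 1):
            past = n[idx - i]              # n_{t-i}
            for k in range(size):
                lag_I[k] += p_I[i - 1] * past[k]
                lag_B[k] += p_B[i - 1] * past[k]
        m.append(lag_B)                    # m_t = sum_i p_{B,i} n_{t-i}
        inflow = matvec(W, lag_I)          # sum_i p_{I,i} W n_{t-i}
        n.append([n[idx][k] + inflow[k] - r * n[idx][k] for k in range(size)])
    return n, m


# Hypothetical inputs: two regions, incubation window of two steps.
W = [[0.2, 0.1], [0.4, 0.3]]
history = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
n_seq, m_seq = simulate(W, [0.5, 0.5], [0.25, 0.75], 0.1, history, T=1)
```

Doubling the initial history doubles every simulated value, which is the linearity in the hidden layer that the block-wise convexity argument relies on.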
As our loss function (A.3) is strictly convex, the resulting convex programming problem has a unique minimum and can be solved quickly via the classical gradient algorithm. Therefore, under the static setting of model parameters, the following iterative fitting algorithm can be applied to train the parameters:

Step 1: Given $s \ge 0$, for fixed vectors $p_B^s$, $p_I^s$, constant $r^s$ and matrix $W^s$, solve problem (A.3) under (A.4) with respect to $n$, resulting in $n^{s+1}$;

Step 2: Given $s \ge 0$, for fixed vectors $p_B^s$, $p_I^s$, constant $r^s$ and time series $n^s$, solve problem (A.3) under (A.4) with respect to $W$, resulting in $W^{s+1}$;

Step 3: Given $s \ge 0$, for fixed matrix $W^s$ and time series $n^s$, solve problem (A.3) under (A.4) with respect to the vectors $p_B$, $p_I$ and the constant $r$, resulting in $p_B^{s+1}$, $p_I^{s+1}$ and $r^{s+1}$;

Step 4: Repeat Steps 1-3 until the ratio of norms

$$
\frac{\|p_B^{s+1} - p_B^s\| + \|p_I^{s+1} - p_I^s\| + \|r^{s+1} - r^s\| + \|W^{s+1} - W^s\|}{\|p_B^s\| + \|p_I^s\| + \|r^s\| + \|W^s\|}
\tag{B.2}
$$

is less than a prescribed threshold $\delta$ ($= 10^{-[\ldots]}$).

Then, to release the static assumption, given the data $M^o = \{M_0, M_1, \ldots, M_S\}$ of the series of observed infection vectors during the period ending on day $S$, consider the following sequential backward propagation:

Step 1 (Initialization): Set $\tau = S$ and $M^\tau = \{M_{\tau-T}, \ldots, M_\tau\}$; apply the 4-step static training algorithm above and denote the output as $W^\tau$, $p_B^\tau$, $p_I^\tau$, $r^\tau$ and $n^\tau = \{n^\tau_{\tau-T-incub}, \ldots, n^\tau_\tau\}$;

Step 2: For $T \le \tau < S$ and $M^\tau$, redefine the hidden vector as $n = \{n_{\tau-T-incub}, n^{\tau+1}_{\tau-T-incub+1}, \ldots, n^{\tau+1}_\tau\}$, where only the first entry $n_{\tau-T-incub}$ is undetermined and needs to be optimized; the remaining entries are fixed at the estimated values from the previous iteration. Given the estimates $W^{\tau+1}$, $p_B^{\tau+1}$, $p_I^{\tau+1}$, $r^{\tau+1}$ from the previous iteration, apply the static training algorithm above with the loss function redefined as in equation (B.3) below, which yields the output $W^\tau$, $p_B^\tau$, $p_I^\tau$, $r^\tau$ and $n^\tau = \{n^\tau_{\tau-T-incub}, n^{\tau+1}_{\tau-T-incub+1}, \ldots, n^\tau_\tau\}$.

$$
\begin{aligned}
L(M^\tau, m, n_{\tau-T-incub}, W, r, p_B, p_I) ={}& \sum_{i=\tau-T+1}^{\tau} \sum_{j=1}^{k} \big\| (M_{i,j} - m_{i,j}) * I_{t^*}(i, m_{i,j}, M_{i,j}) \big\| \\
&+ \|W^{\tau+1} - W\| + \|r^{\tau+1} - r\| + \|p_B^{\tau+1} - p_B\| + \|p_I^{\tau+1} - p_I\|
\end{aligned}
\tag{B.3}
$$

where the time-integrated Laplacian regularization (A.1) in loss function (A.3) is replaced with the one-period regularization.

Combining the sequence of outputs from the two-step sequential backward propagation algorithm, we obtain the estimated sequence of adjacency matrices $\{W^T, \ldots, W^S\}$, probability parameters $\{p_B^T, \ldots, p_B^S\}$, $\{p_I^T, \ldots, p_I^S\}$, recovery rates $\{r^T, \ldots, r^S\}$ and the sequence of hidden infection vectors $\{n_{-incub}, n_{-incub+1}, \ldots, n_0, \ldots, n_S\}$.
For the hidden infection vector, note that the estimate of $n_\tau$ for every $\tau$ is unique according to the design of $n^\tau$ in Step 2 of the sequential backward propagation.

The sequential backward propagation is essentially a sequence of standard backward propagations, each applied to solve the static version of our model (B.1), where the connection between two consecutive steps is established through the consecutive one-period decomposition of the Laplacian regularization condition in (B.3) and the construction that lets $n^\tau$ and $n^{\tau+1}$ share the common hidden infection numbers in the overlapping period. It is not hard to verify that the sequential implementation of backward propagation generates a result asymptotically equivalent to that of classical backward propagation.

Also notice that the sequential training depends on an unspecified horizon parameter $T$. In this paper, we set $T = 7$, as it minimizes the aggregated loss (A.3) compared to the alternatives in the range $\{[\ldots], \ldots, [\ldots]\}$. The algorithm is implemented in Python, where the key minimization steps (Steps 1-3) for the static model (B.1) are implemented via the convex programming package CVXPY.
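The alternating scheme used in the static training (fix all blocks but one, solve the resulting convex sub-problem, and stop when the relative change is small) can be illustrated on a much simpler toy problem. The sketch below is not the NP-Net-SIR fit and does not use CVXPY: it fits a rank-one matrix $u v^\top$ to hypothetical data, where each sub-problem is an ordinary least-squares problem with a closed-form solution, and it uses a stopping ratio in the spirit of (B.2). All names, the data and the value of `delta` are assumptions made for this example.

```python
import math


def norm(v):
    """Euclidean norm of a plain list."""
    return math.sqrt(sum(x * x for x in v))


def alternating_rank_one(Y, delta=1e-8, max_iter=100):
    """Fit Y ~ u v^T by alternating least squares (toy block-coordinate descent).

    With v fixed the problem is convex in u, and vice versa. The toy assumes
    the data keep both blocks non-zero so the divisions below are defined.
    """
    rows, cols = len(Y), len(Y[0])
    u = [1.0] * rows
    v = [1.0] * cols
    for _ in range(max_iter):
        # Block 1: least squares in u with v held fixed.
        vv = sum(x * x for x in v)
        u_new = [sum(Y[i][j] * v[j] for j in range(cols)) / vv for i in range(rows)]
        # Block 2: least squares in v with the updated u held fixed.
        uu = sum(x * x for x in u_new)
        v_new = [sum(Y[i][j] * u_new[i] for i in range(rows)) / uu for j in range(cols)]
        # Relative change of all blocks, analogous in spirit to (B.2).
        change = norm([a - b for a, b in zip(u_new, u)]) + norm([a - b for a, b in zip(v_new, v)])
        scale = norm(u) + norm(v)
        u, v = u_new, v_new
        if change / scale < delta:
            break
    return u, v


# Hypothetical exactly rank-one data.
Y = [[4.0, 5.0], [8.0, 10.0], [12.0, 15.0]]
u_hat, v_hat = alternating_rank_one(Y)
```

In the paper's setting each block update would instead be a constrained convex programme over $n$, $W$, or $(p_B, p_I, r)$; the loop structure and the stopping rule are the parts this toy is meant to show.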

Appendix C. Calculation of death number

$D(t_i)$ (similarly $Dr(t_i)$) is calculated from the sequence $\{m(t) : t \le t_i\}$ through the following auto-regressive equation

$$
Dr_j(t_i) = a_j + b \cdot \Delta \tilde{m}_j(t_{i-k}) + c \cdot Dr_j(t_{i-k}) + d \cdot h_j(t_i) + \varepsilon_j(t_i)
\tag{C.1}
$$

where $Dr_j(t_i)$ is the death rate such that $D_j(t_i) = Dr_j(t_i)\, m_j(t_i)$. In (C.1), $h_j(t_i)$ is the ratio between $m_j(t_i)$ and the local healthcare resources, which are measured by the total number of hospital beds. According to preliminary analysis, the time lag k , k2