Evaluating the effect of city lock-down on controlling COVID-19 propagation through deep learning and network science models
Xiaoqi Zhang, Zheng Ji, Yanqiao Zheng, Xinyue Ye, Dong Li
Abstract
The special epidemic characteristics of COVID-19, such as the long incubation period and infection through asymptomatic cases, pose a severe challenge to the containment of its outbreak. By the end of March 2020, China had successfully controlled the within-country spread of COVID-19, at the high cost of locking down most of its major cities, including the epicenter, Wuhan. The low accuracy of outbreak data before mid-February 2020 is a major technical concern for studies based on statistical inference from the early outbreak. We apply supervised learning techniques to identify and train the NP-Net-SIR model, which turns out to be robust under poor data quality. Using the trained model parameters, we analyze the connection between population flow and the cross-regional infection connection strength, based on which a set of counterfactual analyses is carried out to study the necessity of lock-down and the substitutability between lock-down and other containment measures. Our findings support the existence of non-lock-down measures that can reach the same containment outcome as the lock-down, and provide useful guidelines for the design of a more flexible containment strategy.
Keywords: COVID-19; city lock-down; counterfactual analysis; deep learning; network science; China
Affiliations: National School of Development, Southeast University, Nanjing, China; School of Finance, Zhejiang University of Finance and Economics, Hangzhou, China; Urban Informatics-Spatial Computing Lab & College of Computing, New Jersey Institute of Technology, New Jersey, U.S.A.; Innovation Center for Technology, Beijing Tsinghua Tongheng Urban Planning & Design Institute, China
Preprint submitted to Journal Name, September 7, 2020

1. Introduction
The novel coronavirus COVID-19, first reported in Wuhan, China at the end of 2019, spread quickly. Early 2020 witnessed many efforts to contain the virus, such as city lock-downs, quarantining suspected infectious cases and their close contacts, and setting health checkpoints at crucial traffic nodes. By mid-March 2020, the cumulative number of infectious cases had stopped growing in most of the major cities of China, including the epicenter, Wuhan. Although a growing list of published papers and reports claim that the successful containment of COVID-19 in China was due to the nationwide travel ban and lock-down (Li et al., 2020; Qiu et al., 2020; Tian et al., 2020), these studies focus exclusively on aggregated numbers. The micro-mechanism by which lock-down stops an outbreak has rarely been analyzed with real data. On the other hand, many countries, such as Italy, adopted similar lock-down policies but failed to contain the outbreak of COVID-19 as China did. South Korea and Japan neither closed their major cities nor imposed severe travel restrictions on uninfected people (Iwasaki and Grubaugh, 2020; Park et al., 2020; Shaw et al., 2020), yet both reported a low growth rate of infections within a relatively short time. Hence, it cannot be confirmed that the containment was achieved by lock-down unless the effect of other confounding non-lock-down measures, such as conditional quarantine and social distancing, can be separated.

To this end, a thorough investigation of the necessity of lock-down and its functioning mechanisms is needed. At the same time, due to its severe socio-economic cost (Atkeson, 2020; Barro et al., 2020; Bootsma and Ferguson, 2007; Halder et al., 2011; Jorda et al., 2020; Kelso et al., 2020), lock-down is neither feasible for the other countries that are still struggling with the outbreak of COVID-19, nor an option for China given the risk of a second-wave outbreak. Therefore, alternative measures to the lock-down are called for.
A deep mechanism analysis of the lock-down can shed light on the search for alternative containment measures and on their effectiveness. Such an analysis is in high demand at a moment when both the US and China have experienced a second-wave outbreak of COVID-19 in recent weeks, but neither of the top two economies can afford another round of lock-down.

In this paper, we attempt to explore the mechanism issue mentioned above. Our analysis is based on a novel network-based SIR (Susceptible-Infection-Recovery) model framework (Li et al., 2020; Qiu et al., 2020) in which a time-lagged latent random infection mechanism is added to capture the epidemic characteristics of undocumented infections and the long incubation period, which cannot be handled in the classical SIR models (Heymann and Shindo, 2020; Keeling and Rohani, 2011; Li et al., 2020; Mizumoto et al., 2020; Qiu et al., 2020; Wang et al., b; Zou et al., 2020). Unlike the models in Li et al. (2020) and Qiu et al. (2020), and in order to capture hidden infection channels that are not directly linked to inter-regional population flow, such as infection through panic-induced gathering (Fang et al., 2020; Garfin et al., 2020; Wang et al., 2020), we do not adopt the prior assumption that infection can propagate across regions only via inter-regional population flow. Instead, a non-parametric network-based SIR model is applied, in which we impose no prior knowledge on the link weights across regions and let them be fully inferred from the COVID-19 outbreak data via deep learning methods. Using the inferred network, the connection between the infection link weights and the population flows is established through standard regression techniques and tested for significance, and on this basis a counterfactual evaluation of the real effect of lock-down is carried out.
Unlike the counterfactual analyses in the existing literature (Chinazzi et al., 2020; Li et al., 2020; Qiu et al., 2020; Tian et al., 2020), which attempt to justify the travel ban and the city lock-down as a sufficient condition for China's achievements in fighting COVID-19, we evaluate the necessity of these measures, in the sense of whether or not there exist much more moderate prevention measures that are as effective as the travel ban and city lock-down in containing the outbreak of COVID-19 while having less negative impact on socio-economic development. We give positive evidence for the existence of such alternative measures, and also discuss the substitutability between lock-down and the other containment measures. We highlight that this substitutability can help to quantitatively design a combination of containment measures that balances the containment of COVID-19 against the socio-economic cost.
2. Methodology
The NP-Net-SIR model is set up as follows:

$$\frac{d n(t)}{dt} = \int_{t-\mathrm{incub}}^{t} p_I(\tau, t)\, W(t) \cdot n(\tau)\, d\tau - r(t)\, n(t), \qquad m(t) = \int_{t-\mathrm{incub}}^{t} p_B(\tau, t)\, n(\tau)\, d\tau \tag{1}$$

where $n(t) = (n_1(t), \ldots, n_k(t))^\top$ is the vector of the cumulative number of infectious cases of the $k$ regions by time $t$, with both documented and undocumented cases included; $m(t) = (m_1(t), \ldots, m_k(t))^\top$ denotes the documented number of infectious cases by time $t$; and $r(t)$ is the time-dependent recovery rate. To reflect the epidemic characteristics of COVID-19, namely that the incubation period (14 days) is very long (Heymann and Shindo, 2020; Li et al., 2020; Mizumoto et al., 2020; Qiu et al., 2020; Wang et al., b; Zou et al., 2020) and that asymptomatic infectious cases can transmit the virus, we add two time-dependent probabilities $p_B(\cdot, t)$ and $p_I(\cdot, t)$. They capture the time-lagged randomness in the two processes by which hidden infectious cases get discovered ($p_B$) and by which hidden infectious cases transmit the virus to healthy people ($p_I$). Without loss of generality, we let $r$, $p_B$ and $p_I$ depend on time continuously so as to capture the impact of time and of the government prevention measures.

To formulate the spatial interactions of the COVID-19 outbreak, we set a family of weighted network adjacency matrices $\{W(t)\}$ with $W_{ij}(t) \in [0, 1]$ for all entries $ij$ and all $t$. The adjacency matrix $W(t)$ captures the cross-regional link weight of the COVID-19 outbreak and can be interpreted as the proportion of the past cumulative infectious cases in region $j$ that contribute to the newly infected cases in region $i$. In previous studies (Li et al., 2020; Qiu et al., 2020), the adjacency matrices are directly identified as a constant multiple of the cross-regional population flow matrix. This assumption is not sufficient to capture outbreak channels other than point-to-point travel, such as infection by panic-driven gathering, multi-destination travelling and the like (Cohen and Kupferschmidt, 2020; Ferguson et al., 2020; Harris, 2020; Pueyo, 2020; Wang et al., 2020). To account for these hidden channels, we take a non-parametric specification of $W$ rather than impose prior knowledge. We also let $W$ depend continuously on time $t$, accounting for the effect of time and of various prevention measures. Model (1) is trained by deep learning techniques, and the details are presented in Appendix A.

2.2. Counterfactual evaluation on the effects of travel ban and lock-down
The effects of travel ban and city lock-down on containing the outbreak of COVID-19 can be evaluated based on the population flow data from the Baidu Migration Index (available at https://qianxi.baidu.com/) and the trained NP-Net-SIR model. The temporal network adjacency matrices $W(t)$ sketch the cross-regional outbreak link strength of COVID-19 and its variation over time. The variation of $W(t)$ is by and large the consequence of the travel restriction policies but, as noted in the introduction, the impact of panic and of other driving forces of which we are unaware cannot be excluded.
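Since both the trained model and the role of $W(t)$ are used from here on, a discrete-time reading of model (1) may help. The sketch below is only an illustration under stated assumptions, not the authors' training code: it uses a daily Euler step, holds $W$, $p_I$, $p_B$ and $r$ constant over the step, and the function names `matvec`, `step_infections` and `documented_cases` are hypothetical.

```python
def matvec(matrix, vec):
    """Multiply a nested-list matrix by a vector."""
    return [sum(a * x for a, x in zip(row, vec)) for row in matrix]


def step_infections(history, weights, p_infect, recovery_rate):
    """One daily Euler step of the first line of model (1).

    history:       past state vectors n(tau), oldest first
    weights:       adjacency matrix W, held fixed for this step (assumption)
    p_infect:      weights p_I over the incubation window, oldest first
    recovery_rate: scalar r, held fixed for this step (assumption)
    """
    window = history[-len(p_infect):]
    current = history[-1]
    inflow = [0.0] * len(current)
    # If fewer days than the window are available, use the most recent weights.
    for p, past in zip(p_infect[-len(window):], window):
        contrib = matvec(weights, past)
        inflow = [acc + p * c for acc, c in zip(inflow, contrib)]
    return [n + new - recovery_rate * n for n, new in zip(current, inflow)]


def documented_cases(history, p_detect):
    """Second line of model (1): m(t) as a p_B-weighted sum over the window."""
    window = history[-len(p_detect):]
    probs = p_detect[-len(window):]
    regions = len(history[-1])
    return [sum(p * past[i] for p, past in zip(probs, window))
            for i in range(regions)]
```

In this reading, an entry of `weights` in row $i$, column $j$ scales how much the past infectious stock of region $j$ feeds new infections in region $i$, which matches the interpretation of $W_{ij}(t)$ given above.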
To single out the impact of travel ban and city lock-down, we apply the following regression analysis:

$$W_{kj}(t_i) = \alpha + \beta\, \bar{T}_{kj}(t_i) + \varepsilon_{kji} \tag{2}$$

where the temporal matrix $\bar{T}(t_i)$ is the average of the single-day population flow matrices $T(t_i)$ (extracted from the Baidu Migration Index), weighted by the infection probability $p_I(\cdot, t_i)$:

$$\bar{T}(t_i) = \sum_{j=0}^{\mathrm{incub}-1} p_I(j, t_i)\, T(t_i - \mathrm{incub} + j). \tag{3}$$

After Wuhan was locked down, a bundle of containment measures was applied by both Wuhan and the other major cities in China, such as social distancing, conditional quarantine and health checkpoints at major transportation facilities, all of which could affect the link weights and contribute to containing the outbreak of COVID-19. To differentiate the effect of lock-down from the other measures, we fit equation (2) using only the data before Jan. 23, 2020, when the Wuhan lock-down started. Given the estimated coefficients $\hat\alpha$, $\hat\beta$, the estimated residuals $\hat\varepsilon_{kji}$ for the $t_i$ after Jan. 23, 2020 are calculated and interpreted as the part of the infection link weights unexplainable by population flows, accounting for the effect of the non-lock-down measures. With $\hat\alpha$, $\hat\beta$ and the $\hat\varepsilon_{kji}$ fixed, (2) is used to evaluate the impact of counterfactually increasing the population flow intensity between region pairs.

Unlike the existing studies (Anderson et al., 2019; Chinazzi et al., 2020; Fang et al., 2020; Li et al., 2020; Qiu et al., 2020; Tian et al., 2020; Zhang et al., 2020) that focus almost exclusively on the sufficiency question, i.e. whether lock-down really helps mitigate the outbreak of COVID-19, this study attempts to answer the inverse problem, namely the necessity of lock-down: whether or not there exists an alternative prevention strategy that causes less damage to socio-economic development while being as effective as the travel ban and lock-down in containing the outbreak of COVID-19.
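The regression step in (2)-(3) can be sketched in a few lines. This is a minimal illustration that assumes the link weights and flows have already been pooled into flat lists of observations; the function names `weighted_flow`, `fit_ols` and `residuals` are hypothetical, and plain least squares is used because the text only says "standard regression techniques".

```python
from statistics import mean


def weighted_flow(daily_flows, p_infect):
    """Eq. (3) for one region pair: p_I-weighted sum of single-day flows.

    daily_flows: flows on days t_i - incub + j, j = 0..incub-1 (oldest first)
    p_infect:    weights p_I(j, t_i), same length
    """
    if len(daily_flows) != len(p_infect):
        raise ValueError("flows and weights must have the same length")
    return sum(p * f for p, f in zip(p_infect, daily_flows))


def fit_ols(flows, link_weights):
    """Eq. (2): least-squares fit of W = alpha + beta * T on pooled pairs."""
    t_bar, w_bar = mean(flows), mean(link_weights)
    sxx = sum((t - t_bar) ** 2 for t in flows)
    if sxx == 0:
        raise ValueError("flow values must not all be equal")
    sxy = sum((t - t_bar) * (w - w_bar) for t, w in zip(flows, link_weights))
    beta = sxy / sxx
    alpha = w_bar - beta * t_bar
    return alpha, beta


def residuals(alpha, beta, flows, link_weights):
    """Part of each link weight not explained by population flow."""
    return [w - (alpha + beta * t) for t, w in zip(flows, link_weights)]
```

Under the procedure described above, `fit_ols` would be called on pre-lock-down observations only, and `residuals` on the later dates with the fitted coefficients held fixed.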
Since the existence of such alternatives might be timing-sensitive, we consider different initialization times $t_s$ for the counterfactual worlds in which the travel ban and city lock-down are relaxed. The degree of relaxation is measured by a proportion $r_{jk}$ for each pair of destinations $j$ and $k$, such that after the relaxation the traffic flow intensity is increased to $T^r_{jk}(t_i) = (1 + r_{jk}) \times T_{jk}(t_i)$ for $t_i \ge t_s$. Given the relaxed population/traffic flow matrices $T^r(t_i)$, we update the adjacency matrix $W(t_i)$ to $W^r(t_i)$ via (2). Denote by $\mathbf{r}$ the matrix consisting of all $r_{jk}$; we then evaluate the necessity of the travel ban by asking whether there exists a positive $\mathbf{r}$ such that, with the $W^r(t_i)$ updated by $\mathbf{r}$, the outbreak status of COVID-19 is no worse than the current one for every $t_i \ge t_s$. The comparison of outbreak status between the real case and the counterfactual case can be measured in various ways. In this study, we focus on three measures, summarized by the following three constraints:

$$R_0(W^r(t_i), r(t_i)) \le \max\big(R_0(W(t_i), r(t_i)),\, 1\big), \quad \forall t_i \ge t_s \tag{4}$$
$$m^r(t_i) \le m(t_i), \quad \forall t_i \ge t_s \tag{5}$$
$$D^r(t_i) \le D(t_i), \quad \forall t_i \ge t_s \tag{6}$$

where $R_0$ is the basic reproduction number, which depends on the maximal eigenvalue of the adjacency matrix and on the recovery rate; $m^r(t_i)$ denotes the documented infectious cases estimated with the updated matrix $W^r$ through (1); and $D(t_i)$ ($D^r(t_i)$) is the total number of deaths (updated by $\mathbf{r}$) by time $t_i$, which depends on both the total number of infections and the local healthcare resources. The detailed calculation of $D(t_i)$ is presented in Appendix C.

These three constraints refer to three different goals of prevention: after relaxation, the total numbers of infections and deaths should not exceed their current values, and $R_0^r$ should not induce infection divergence (i.e. exceed 1) or at least should not exceed its current value.
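To illustrate how a candidate relaxation could be checked against constraint (4) at a single date, the sketch below scales the flows, updates the link weights through (2), and compares reproduction numbers. Two points are assumptions rather than the paper's stated method: the text only says that $R_0$ depends on the maximal eigenvalue of the adjacency matrix and on the recovery rate, so the ratio of the two is used here, and the updated weights are clipped to $[0, 1]$ to respect the stated range of $W_{ij}$. All function names are hypothetical.

```python
def relax_flows(flows, ratios):
    """T^r_jk = (1 + r_jk) * T_jk, elementwise on nested lists."""
    return [[(1.0 + r) * t for t, r in zip(t_row, r_row)]
            for t_row, r_row in zip(flows, ratios)]


def updated_weights(alpha, beta, flows, resid):
    """W^r from eq. (2) with fixed alpha, beta and residuals.

    Clipping to [0, 1] is an assumption made for this sketch.
    """
    return [[min(1.0, max(0.0, alpha + beta * t + e))
             for t, e in zip(t_row, e_row)]
            for t_row, e_row in zip(flows, resid)]


def spectral_radius(matrix, iters=500):
    """Largest eigenvalue of a non-negative matrix by power iteration."""
    n = len(matrix)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(abs(x) for x in w)
        if lam == 0.0:
            return 0.0
        v = [x / lam for x in w]
    return lam


def satisfies_r0_constraint(w_relaxed, w_actual, recovery_rate):
    """Constraint (4) at one date, assuming R0 = lambda_max(W) / r."""
    r0_relaxed = spectral_radius(w_relaxed) / recovery_rate
    r0_actual = spectral_radius(w_actual) / recovery_rate
    return r0_relaxed <= max(r0_actual, 1.0)
```

A full search for a feasible relaxation would repeat this check for every date $t_i \ge t_s$ and add the analogous checks for (5) and (6).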
The “no greater than” relation in (4)-(6) is meant point-wise, i.e. it has to hold for all regions and all times after $t_s$; it is therefore a very stringent restriction on the relaxation. Formally, any non-trivial relaxation matrix $\mathbf{r}$ satisfying the constraints corresponds to a Pareto improvement of the current prevention strategy. In our counterfactual analysis, we search, for each constraint type, for the Pareto optimal relaxation strategy $\mathbf{r}^*$ from which no further Pareto improvement is possible. This Pareto optimal $\mathbf{r}^*$ has practical significance in guiding the design of containment measures for countries now suffering from COVID-19.

Besides city lock-down, many non-lock-down measures have also been used to prevent the COVID-19 outbreak, such as “social distancing” (Pike and Saini, 2020; Zhang et al., 2020). All these measures can contain the outbreak of COVID-19 while, compared to travel ban and city lock-down, generating less negative impact on socio-economic development; their application is also more accurately targeted, rather than applying to all people regardless of their health and vulnerability to COVID-19. It can reasonably be expected that these non-lock-down measures can by and large substitute for lock-downs and reduce the harm that lock-down does to the economy.

To quantify the substitutability, we extend (2) to include the effect of non-lock-down measures. We roughly divide the non-lock-down measures into two classes: measures adopted by the flow-in regions and measures adopted by the flow-out regions.
The flow-in measures, whose effect is quantified by a parameter $\mathrm{in}_k(t_i, t_a)$, include the quarantine of travellers arriving from out of town, the closure of schools, the cancellation of public gatherings and so on; the flow-out measures, whose effect is quantified by another parameter $\mathrm{out}_j(t_i, t_b)$, include setting health checkpoints at the entrances of inter-regional highways, airports, railway stations and so on. The indices $k$ and $j$ denote regions, and $t_a$ and $t_b$ denote the starting dates of the relevant measures. In other words, we let

$$\mathrm{in}_k(t_i, t_a) = \begin{cases} \gamma_k, & t_i \ge t_a \\ 0, & \text{else} \end{cases} \qquad \mathrm{out}_j(t_i, t_b) = \begin{cases} \theta_j, & t_i \ge t_b \\ 0, & \text{else} \end{cases} \tag{7}$$

for constant magnitude parameters $\gamma_k$ and $\theta_j$ that measure the execution strength of the relevant measures. Adding $\mathrm{in}_k(t_i, t_a)$ and $\mathrm{out}_j(t_i, t_b)$ to (2) yields the following regression:

$$W_{kj}(t_i) = \alpha + \big(\beta - \mathrm{out}_j(t_i, t_b)\big) \cdot \bar{T}_{kj}(t_i) - \mathrm{in}_k(t_i, t_a) + \varepsilon'_{kji} \tag{8}$$

where $\bar{T}_{kj}(t_i)$ is the weighted population flow of (3), and where we suppose that the flow-in measures affect the infection adjacency matrix additively while the flow-out measures act through a multiplier of the population flow. For the starting dates of the two classes of non-lock-down measures, we follow the timeline in Tian et al. (2020) and set $t_a$ to Jan. 26, 2020, by which all 31 provinces in mainland China had initiated the first-class protocol for public health emergencies, which includes various quarantine measures and the closure of major public facilities. $t_b$ is set to Jan. 30, 2020, by which health checkpoints had been set up at all major highway entrances, railway stations and airports within mainland China.

Given the estimates of the parameters $\gamma_k$, $\theta_j$ and the residuals $\varepsilon'_{kji}$, the counterfactual analysis is done by solving the same set of Pareto optimization problems under the same constraints as in the previous section.
The only difference is that, in the current setting, not only the relaxation matrix $\mathbf{r}$ but also the non-lock-down parameters $\gamma_k$ and $\theta_j$ can be adjusted simultaneously.
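As a small worked illustration of (7) and (8), the sketch below evaluates the link weight implied by the extended regression on a given day. The coefficient values in the usage note are made up for illustration and are not estimates from the paper; the function names are hypothetical, while the two start dates follow the text.

```python
from datetime import date

T_A = date(2020, 1, 26)  # start of flow-in measures, as stated in the text
T_B = date(2020, 1, 30)  # start of flow-out measures, as stated in the text


def step_effect(magnitude, day, start):
    """Eq. (7): the measure contributes `magnitude` from `start` on, else 0."""
    return magnitude if day >= start else 0.0


def extended_weight(alpha, beta, flow, day, gamma_k, theta_j,
                    t_a=T_A, t_b=T_B, resid=0.0):
    """Eq. (8): W_kj = alpha + (beta - out_j) * T_kj - in_k + residual."""
    flow_in = step_effect(gamma_k, day, t_a)
    flow_out = step_effect(theta_j, day, t_b)
    return alpha + (beta - flow_out) * flow - flow_in + resid
```

With illustrative values `alpha=0.1`, `beta=0.02`, `flow=10`, `gamma_k=0.05` and `theta_j=0.01`, the implied weight falls from about 0.30 before Jan. 26 to about 0.25 once the flow-in measure starts and to about 0.15 once the flow-out measure also applies. The additive term shifts the weight by a constant, while the multiplicative term scales with the flow, which is the distinction the substitutability analysis relies on.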
3. Results
The NP-Net-SIR model is trained using the province-level daily infection data collected during Jan. 10 - Mar. 8, 2020 from the official website of the National Health Commission (NHC) of China.

Fig. 1 and 2 present, respectively, the fit to the temporal trend of documented infectious cases from Jan. 10, 2020 to Mar. 7, 2020 for the nationwide aggregate and for each of the 31 provinces in mainland China. Table 1 reports the fitting accuracy $R := \|\hat{m} - m\| / \|m\|$, measuring the relative difference between the estimated ($\hat{m}$) and the real ($m$) documented infection numbers since Feb. 12, 2020. The fitting accuracy after Feb. 12, 2020 is extremely high in all cases shown in the two figures ($R > 0.99$ for the aggregate over the whole of China), and the fitted number is systematically greater than the reported number before Feb. 12, 2020. This is because we set Feb. 12, 2020 as the change point before which we do not penalize positive estimation errors, so as to reflect the potential under-reporting in the data. The high accuracy after Feb. 12, 2020 demonstrates the explanatory power of our NP-Net-SIR model. As a comparison, we run the classic SEIR model in the version discussed in Li et al. (2020) on the same data set and calculate the $R$ measure for both models after Feb. 12, 2020; the result shows that our model performs much better, lifting $R$ by 12%.

Figure 1: Aggregated documented infectious cases over all 31 provinces in Mainland China

The difference between the “over-estimated” infectious cases of our model and the reported cases before Feb. 12, 2020 can be thought of as a measure of the hidden infectious cases that are not counted in the statistics. We calculate the ratio of hidden cases to total cases and find that, at the national level, 79.27% of cases on average were hidden, i.e. not reported, before Feb. 12, 2020; this ratio is close to the one reported in Tian et al. (2020). At the province level, the hidden ratio exceeds 90% for most provinces in mainland China, among which Fujian, Guizhou, Yunnan, Jiangsu, Jiangxi and Shanxi are the top six, with hidden ratios greater than 96%, while Hubei is the province with the lowest hidden ratio (70%). Hubei's outstanding hidden ratio can be attributed to the fact that Hubei is the epicenter of the COVID-19 outbreak within China: it was hit by COVID-19 earliest and also reacted earliest to the virus.
In contrast, all the other provinces suffered from transmitted-in cases in the early stage and therefore failed to react in time, causing a significant delay in updating the numbers.

Table 1: Estimation Accuracy by R

Province        R
Full country    0.996
Shanghai        0.979
Yunnan          0.967
Neimenggu       0.979
Beijing         0.972
Taiwan          0.978
Jilin           0.977
Sichuan         0.975
Tianjin         0.991
Ningxia         0.979
Anhui           0.968
Shandong        0.979
Shanxi          0.979
Guangdong       0.971
Guangxi         0.974
Xinjiang        0.993
Jiangsu         0.976
Jiangxi         0.977
Hebei           0.976
Henan           0.976
Zhejiang        0.965
Hainan          0.981
Hubei           0.997
Hunan           0.974
Macao           0.897
Gansu           0.976
Fujian          0.964
Tibet           0.907
Guizhou         0.967
Liaoning        0.973
Chongqing       0.978
Shanxi          0.968
Qinghai         0.963
Hong Kong       0.984
Heilongjiang    0.984
Given the close connection between cross-regional population flow and the inter-regional outbreak of COVID-19 claimed in the literature (Li et al., 2020; Qiu et al., 2020), we present an overview of the strength of this connection in Fig. 3 and 4, in which the correlation between the (estimated) temporal infection adjacency matrix $W(t)$ and the traffic flow matrix $T(t)$ is visualized in different ways. The first row of subplots in Fig. 3 consists of scatter plots of all $T_{kj}(t_i)$ versus $W_{kj}(t_i)$ before (left) and after (right) the Wuhan lock-down (Jan. 23, 2020). The second and third rows of Fig. 3 plot only those $T_{kj}(t_i)$ and $W_{kj}(t_i)$ that end in (second row) or originate from (third row) Hubei province. To make the trend of the relation between $W(t)$ and $T(t)$ clearly visible, we plot the entries of $W(t)$ and $T(t)$ only for five dates before and after Jan. 23, 2020. Fig. 4 gives the temporal view of the trend of the total flow-in (first row) and flow-out (second row) infection link weight and traffic flow intensity since Jan. 19, 2020. In the plots of $W(t)$ and $T(t)$, the horizontal axis corresponds to the entries of $T(t)$ and the vertical axis to those of $W(t)$. The entries of $W(t)$ are first rescaled by the potential infection number, i.e. $W_{kj}(t) \cdot n_k(t)\, n_j(t)$, before taking the log transform. By rescaling, we hope to take the effect of the stock of potential infections into account.

Figure 2: Documented infectious cases for 31 provinces in Mainland China

From the first row of Fig. 3, a counter-intuitive finding is that the correlation between population flow intensity across regions and the infection link weight is quite weak, both before and after Jan. 23, 2020. In particular, a large portion of the scatter points in Fig. 3 cluster around a straight line close and parallel to the horizontal axis; such an observation does not support the linear correlation between $W(t)$ and $T(t)$ imposed in Li et al. (2020) and Qiu et al. (2020). Subfigures 3c and 3e do show a significant linear correlation between population flow strength and infection link weight, at least before the lock-down, and the flow-out population before Jan. 23, 2020 turns out to be more powerful in spreading the virus: the scale of the vertical axis in subfigure 3e is much greater than that in subfigure 3c. On the other hand, after Jan. 23, 2020 the linear relationship between $W(t)$ and $T(t)$ decays sharply, and after Feb. 10, 2020 the correlation coefficient between their entries cannot be distinguished from 0 for either the flow-in or the flow-out population. This observation contradicts the classic assumption in Li et al. (2020) and Qiu et al. (2020). In fact, under the linear correlation assumption, city lock-down can only control the number of people moving across regions; it has nothing to do with the proportion of infectious cases among these migrants, which should be constant if only lock-down and/or travel ban measures are applied. In other words, if lock-down really works to contain the outbreak, the scatter points in the right panel of Fig. 3 should converge gradually to the origin along a straight line with positive slope, rather than rotating toward the horizontal axis as shown in Fig. 3. This finding implies that the measures that really helped contain the outbreak of COVID-19 may not be the lock-down but other measures initiated almost simultaneously with it, whose effects are confounded with that of the lock-down.
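The per-date correlation between the entries of $W(t)$ and $T(t)$ discussed above can be computed as in the sketch below. It is an illustration only: it uses the Pearson coefficient, skips the diagonal (within-region) entries, and omits the rescaling and log transform used for the figures, all of which are assumptions of this sketch. The function names are hypothetical.

```python
from math import sqrt


def off_diagonal(matrix):
    """Flatten a square nested-list matrix, skipping diagonal entries."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(n) if i != j]


def pearson(xs, ys):
    """Pearson correlation coefficient; NaN if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def link_flow_correlation(link_weights, flows):
    """Correlation between cross-regional entries of W and T on one date."""
    return pearson(off_diagonal(flows), off_diagonal(link_weights))
```

Evaluating `link_flow_correlation` date by date would give a time series whose drift toward zero after the lock-down corresponds to the scatter points rotating toward the horizontal axis.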
To correctly evaluate the real effect of each type of containment measure, we have to separate the confounding measures and their impact, which we leave to the discussion in section 3.4.

By Fig. 4, there is a significant gap of around one week between the vanishing of the flow-in and flow-out infection link weights and the decay of the corresponding population flow intensity. For the time series of flow-in and flow-out population intensity, all top 7 provinces reached their minimum before Jan. 31, 2020, while none of them saw its flow-in and flow-out infection link weights decay to somewhere close to zero until Feb. 6, 2020. This one-week gap reflects the long incubation period of COVID-19 and the fact that infection can happen via cases without symptoms. The classical SIR/SEIR models ignore this gap and tend to over-estimate the basic reproduction number $R_0$ in the early stage, which would trigger the most severe containment measures, such as lock-down, if decisions were made on that basis.

In sum, from this brief look at the numerical relationship between infection link weight $W$ and population flow intensity $T$, we can summarize: 1) the positive linear correlation assumption made in many versions of the SIR/SEIR model (Efimov and Ushirobira, 2020; Fang et al., 2020; Li et al., 2020; Qiu et al., 2020; Tian et al., 2020) does not hold uniformly during the outbreak of COVID-19, but it does hold for the population flow into and out of Hubei province before the lock-down; 2) after the lock-down began on Jan. 23, 2020, the positive linear correlation between $W$ and $T$ drops significantly and quickly to zero; this reduction should not simply be attributed to the lock-down, and the effect of other confounding measures should be examined more carefully; 3) a one-week gap exists between the decay of population flow intensity and that of infection link weight, which is a consequence of the epidemic characteristics of COVID-19; the classical SIR/SEIR model neglects this gap and can lead to overly severe containment measures.

Figure 3: Relationship between infection link weight W and population flow intensity T with Hubei as origin/destination province

As discussed in the previous section, the correlation between population flow intensity and infection link weight is weak in most cases. This observation implies that the most severe travel ban and lock-down may not be necessary for regions among which the infection connection is weak. Therefore, there should be potential to relax the lock-down even if the containment level of COVID-19 has to be maintained. To verify this argument, we solve the relaxation optimization problem stated in section 2.2. The result is plotted in Fig. 5, which shows the averaged relaxation ratio of the population flow-in and flow-out intensity for all provinces in China. The ratio for every province $k$ (or $j$) is calculated by dividing the sum of all flow-in/-out indices $\sum_j T_{kj}(t_i)$ ($\sum_k T_{kj}(t_i)$) by their optimally relaxed version $\sum_j T^r_{kj}(t_i)$ ($\sum_k T^r_{kj}(t_i)$); the average is taken over all $t_i$ after the relaxation starting date. Fig. 5 displays the relaxation degree for all three alternative starting dates, Jan. 23, Feb. 2 and Feb. 10, 2020, and all three containment targets (4)-(6).

From Fig. 5, it is striking that if the control target is the total number of infectious cases, it seems impossible to relax the population flows between any pair of provinces without an even stricter travel ban for Hubei and a couple of provinces geographically connected to Hubei. This impossibility of relaxation holds for all three starting points. Such a result verifies the necessity of the lock-down and strict travel ban executed by most major cities in China since Jan. 23, 2020. This conclusion also agrees with the discussion in Li et al. (2020), Qiu et al. (2020) and Tian et al. (2020).

On the other hand, if the target is to control the temporal $R_0$, which reflects the long-run infection severity, and/or the total death number, global relaxation becomes feasible even if starting from Jan. 23, 2020. For the total death number in particular, the travel restrictions of all provinces in China can be relaxed substantially. For most provinces in the south-eastern coastal regions, the relaxation ratio for flow-out population can exceed 10%, while the flow-in relaxation ratio exceeds 5%. In Zhejiang, Guangdong, Beijing, Shanghai and Tianjin, the flow-out ratio is even greater than 15% and the flow-in ratio is close to 10%. These five provinces make up the most economically developed area of China. A substantial relaxation of the traffic connections both within them and between them and the other provinces could significantly stimulate overall economic growth in China.

Figure 4: The temporal variation trend of the aggregated flow-in/-out infection link weight and population flow intensity

Despite the existence of a global relaxation strategy for the control target $R_0$, however, the potential for relaxation is not large. In the south-eastern coastal area, most provinces have to maintain a strict travel ban in at least one direction (flow-in or flow-out) in order to keep $R_0$ reasonably low (in the sense of constraint (4)).
This observation arises partly because the index $R_0$ is much more sensitive than the total death number to changes in the entries of $W$, which restricts the room for relaxing the population flow. Compared with the total infection number, however, $R_0$ is less sensitive to changes in $W$, because $R_0$ depends merely on the greatest positive eigenvalue of $W$ while the infection number relies on every single entry. This explains why global relaxation is still feasible for controlling $R_0$ but infeasible for controlling total infections.

Finally, if we come back to the target of controlling total infections, a partial relaxation strategy does exist after all. The partial relaxation arrangement is determined by maximizing the overall traffic flow intensity across all provinces, i.e. maximizing $\sum_{j,k,i} T^r_{jk}(t_i)$, under the same set of constraints (4)-(6); the overall traffic flow intensity can be viewed as a measure of the degree of economic activity. It is remarkable that, from Feb. 2, 2020, if the travel ban were further strengthened for Hubei and its nearby provinces, the relaxation ratio for flow-out population would become high for major south-eastern provinces, including Fujian, Zhejiang, Shanghai, Jiangsu, Beijing, Tianjin and Hebei, while a positive flow-in relaxation ratio would be allowed for Guangdong.
The existence of such a partial relaxation arrangement shows that the strictness of lock-down is substitutable across regions; it also implies that a centralized decision mechanism for the choice of lock-down and travel ban could be more efficient in balancing the containment of the COVID-19 outbreak against economic recovery.

In sum, from the counterfactual analysis of relaxing travel ban and city lock-down, we find that global relaxation strategies exist for the control targets $R_0$ and total death number, but not for the total infection number; this results from the different sensitivities of the control target variables to the infection link weights $W$. In terms of how easily lock-down can be relaxed, controlling deaths is easier than controlling $R_0$, and both are easier than controlling infections. To control the death number, a substantial relaxation was already feasible from early Feb. 2020 for the major provinces in the south-eastern coastal areas; relaxation for these provinces is critical to maintaining nationwide economic development. To control total infections, although a global relaxation is never feasible during the period studied in this paper, a partial relaxation is still possible, by which the traffic intensity for south-eastern provinces can be relaxed substantially at the cost of a stricter lock-down for Hubei and the provinces closely connected with it. Such a partial relaxation arrangement is better for economic recovery, but its feasibility relies on centralized decisions on lock-down, as it harms local interests through a harsher travel ban.

Figure 5: One set of optimal relaxation solutions to city-level travel ban since Jan. 23, Feb. 2 and Feb. 10, 2020

In this section, we study the substitutability between lock-down and alternative non-lock-down measures. A further counterfactual analysis is carried out to reveal how the extent of population flow relaxation responds to the strengthening of the non-lock-down measures.

Fig. 6-8 sketch the substitutability between the two classes of non-lock-down measures (whose effect and execution strength are quantified by the $\gamma_k$ and $\theta_j$ respectively) and the relaxation ratios of lock-down under the three targets and from the three starting dates. As in the previous section, the relaxation ratios are aggregated by flow-in and flow-out direction at the province level and averaged over all times after the corresponding starting date. The first row of subplots in Fig. 6-8 gives the province-level $\gamma$s (horizontal axis) versus the flow-out (left) / flow-in (right) relaxation ratios (vertical axis); the second row gives the province-level $\theta$s (horizontal axis) versus the flow-out (left) / flow-in (right) relaxation ratios (vertical axis). The colored straight lines in each subplot are the OLS-fitted lines for the scatter points of the same color, where color distinguishes the three starting dates. From Fig. 6-8, it is clear that there is a gradual substitution relationship between the non-lock-down measures of the flow-out region (represented by the $\theta$s) and the relaxation ratios. In addition, the substitutability between the $\theta$s and the flow-out relaxation ratios is stronger than that between the $\theta$s and the flow-in relaxation ratios. This can be explained by the fact that the $\theta$s are designed to capture the effect of measures such as health checkpoints at highway entrances, railway stations and airports. The main function of these measures is to reduce the potential infection risk of the flow-out population; they therefore more directly replace the function of locking down everyone within the city regardless of whether they are healthy. In contrast, their effect on the flow-in relaxation ratios is indirect.
Compared with the $\theta$s, there seems to be no gradual substitutability between the $\gamma$s and the relaxation ratios. This is partly because the $\gamma$s represent the effect of the conditional quarantine measures executed by flow-in destinations and applied to suspected infectious cases and to travellers arriving from out of town. These quarantine measures are not directly linked with the cross-regional population flows and therefore affect the infection connection matrix $W$ additively. Compared with the multiplicative connection between the $\theta$s and $W$, the additive connection makes the substitutability of the $\gamma$s less direct. It is still striking that most of the scatter points in the second-row subplots are clustered to the left of a vertical boundary line ($x \equiv c$ for some $c < 0$), and a dense subset of these points gathers around this boundary. In fact, this boundary phenomenon implies a much more stringent form of substitutability: a universal bottom line exists such that the strength of flow-in non-lock-down measures cannot go below it without squeezing the potential to relax the population flow intensity.

Comparing across target types and starting dates, we find that the degree of substitutability of the $\theta$s increases in the order of controlling infection, $R$ and death. In particular, for the target of controlling the infection number, there is almost no substitutability between the $\theta$s and relaxation for the starting date Jan. 23, 2020 (reflected in the flat red line in the second-row plots of Fig. 7), which once again confirms the necessity of lock-down in the early stage. The order of substitutability is consistent with the order of ease in relaxing population flow analyzed in the previous section, reflecting the relative ease of achieving the different targets. Across starting dates, the degree of substitutability of the $\theta$s and $\gamma$s is greater for the later starting dates, Feb. 2 and Feb. 10, 2020. This is reflected (for the $\theta$s) in the greater absolute slopes of the green and blue lines relative to the red line in the second-row plots of all three figures, and (for the $\gamma$s) in the blue lines lying above the green lines, which in turn lie above the red lines, in the first-row plots of Fig. 7 and 8. The increase of substitutability over time supports the view that the lock-down measure is effective in controlling the fast growth of the infection number and the resulting burden on the local healthcare system. This makes lock-down beneficial in the early stage of an explosion of community infection, when there is not enough time to trace all unknown infection sources and there are insufficient medical resources for treatment.
The lock-down in this stage buys time for an effective reaction to the virus in the next stage and for the adoption of more finely designed prevention measures later. On the other hand, once the explosion of community infection has been well contained and the total number of infectious cases has stabilized, substitutability between lock-down and the other measures emerges, and it is appropriate to gradually shift from lock-down to milder measures. Such a transition of containment measures agrees with the idea discussed in Harris (2020); Pueyo (2020).

Figure 9 presents the geographic distribution of relaxation ratios of flow-in and flow-out populations for different targets and different starting dates. The coloring scheme is exactly the same as that in Fig. 5. Comparing Fig. 9 with Fig. 5, it is quite surprising that for the starting dates Feb. 2 and Feb. 10, 2020, almost all provinces in China (including Hubei province) can significantly relax their travel ban and lock-down policies: the relaxation ratios are almost uniformly greater than 20% for both the flow-in and flow-out directions, and for both the targets of controlling the total infection number and the death number. For the target of $R$, the optimal relaxation ratios are somewhat smaller than for the other two targets, and the flow-out population flow of Hubei province cannot be relaxed even for the latest starting date, Feb. 10, 2020.

For the starting date Jan. 23, 2020, the optimal relaxation ratio does not change much compared with the later starting dates for the targets of controlling the death number or $R$, but a large difference exists for the containment target of the infection number. If we counterfactually started the relaxation on Jan. 23, 2020, there would be no global relaxation arrangement that avoids increasing the infection number for some provinces at some time after Jan. 23, 2020. This conclusion is similar to that drawn from Fig. 5, and once again supports the robust necessity of lock-down in the early spreading stage of COVID-19.

It is worth highlighting the difference in the absolute size of the relaxation ratios with and without adjustment to the stringency of non-lock-down measures from Feb. 2, 2020. With such adjustment, the relaxation ratios are almost uniformly about twice as large as without it. This implies that a better combination of control measures was available during China's anti-COVID-19 campaign: execute the lock-down for a very short period from Jan. 23, 2020 (e.g. one week) to buy time for stabilizing the infection number while preparing for the transition to milder measures, such as conditional quarantine and health check-points; then gradually relax the lock-down from Feb. 2, 2020 by substituting an increasingly stringent execution of the non-lock-down measures. Such a quick lock-down strategy, compared with the lock-down of more than one month that was actually carried out, does the least harm to the economy while reaching the same effect in mitigating the outbreak of COVID-19.
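The substitutability comparisons in this section rest on the slopes of OLS lines fitted to province-level scatter points, one line per starting date. As a rough illustration of that comparison, the sketch below fits a least-squares line to three made-up point sets. The numbers are hypothetical placeholders, not data from this study, and the helper name `ols_line` is ours.

```python
def ols_line(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical points: strength of a non-lock-down measure (x) versus the
# time-averaged relaxation ratio (y), one list of y values per starting date.
measure_strength = [0.0, 1.0, 2.0, 3.0]
ratio_by_start = {
    "Jan. 23": [0.10, 0.10, 0.10, 0.10],  # flat line: no substitutability
    "Feb. 2": [0.10, 0.20, 0.30, 0.40],
    "Feb. 10": [0.10, 0.30, 0.50, 0.70],
}

# A larger absolute slope is read as stronger substitutability.
slopes = {date: ols_line(measure_strength, y)[0] for date, y in ratio_by_start.items()}
```

In this toy data the absolute slope grows with the starting date, which is the pattern the text describes for the $\theta$s.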
4. Discussion & Conclusion
In this study, we propose a non-parametric network-based SIR model (NP-Net-SIR) to study the cross-regional outbreak of COVID-19, in which the special epidemic characteristics of COVID-19, such as the long incubation period and the asymptomatic infection channel, are easily encoded. The non-parametric nature of NP-Net-SIR frees it from presuming a linear dependence between the COVID-19 outbreak and the inter-regional population flow, a presumption that might lead to over-estimating the real effect of city lock-down. The low accuracy of outbreak data before mid-Feb. 2020 poses a major technical challenge to studies based on statistical inference from the early outbreak. To resolve the data issue, we apply graph-Laplacian regularization from semi-supervised learning to identify and train the NP-Net-SIR model, which turns out to be robust under poor data quality.

Figure 6: Substitutability between lock-down and non-lock-down measures given R target

Figure 7: Substitutability between lock-down and non-lock-down measures under controlling the total infectious cases

Using the trained model, we analyze the connection between population flow and the cross-regional infection network, based on which a set of counterfactual analyses is carried out to study the necessity of lock-down and the substitutability between lock-down and the other containment measures.
The main findings of this study include: 1) except for the very early stage of the outbreak and the population flow out of the epicenter, Wuhan and Hubei province, there is no strong linear connection between population flow and cross-regional infection connection, indicating that lock-down may not be the key measure for containing COVID-19; 2) strong substitutability exists between lock-down and non-lock-down-typed containment measures, between different containment targets, and between the lock-downs of different regions; 3) in the earliest stage (starting from Jan. 23, 2020), the lock-down of the epicenter, Hubei, is indispensable, and this indispensability is by and large attributable to the geographically unbalanced impact of the COVID-19 outbreak and to cross-regional inequality in public awareness of COVID-19, healthcare resources and the implementation of containment measures; 4) after the impact of COVID-19 became more equal across regions (e.g. after Feb. 2, 2020), the lock-down could already have been relaxed substantially while achieving the same containment effect; 5) when the other containment measures are implemented stringently, the population flow can be relaxed even further.

Figure 8: Substitutability between lock-down and non-lock-down measures under controlling the total dead cases

Our findings support the view that lock-down may not be the optimal strategy for containing the outbreak of COVID-19 except in the early stage; there exist alternatives that have less negative impact on socio-economic development. However, the effectiveness of the alternative measures requires a carefully designed prevention system that allows for regional differences and temporal adjustment of the containment measures according to the particular situations of different regions and time periods.
The discussion in this paper has guiding and practical significance for the normalization of epidemic prevention, the resumption of production and economic activities after lock-down, and the design of containment strategies in other countries facing the same epidemic situation.

Figure 9: One set of optimal relaxation solutions to lock-down since Jan. 23, Feb. 2 and Feb. 10, 2020 under the existence of adjustable non-lock-down measures

Although the analysis of this paper is retrospective and assumes that all the COVID-19 data are available, which was not possible at the decision times of Jan. 23 and early Feb. 2020, it is still meaningful to examine in retrospect the potentially optimal control strategy. This is because even now China still faces a high risk of a "second-wave" outbreak. The choice of containment measures that are both feasible and effective remains a critical but open question, while many countries are still struggling to prevent outbreaks of COVID-19. Our study can provide some hints on this choice. First, China's experience shows that the strict lock-down measure is not only sufficient (Fang et al., 2020; Li et al., 2020; Prem et al., 2020; Qiu et al., 2020; Tian et al., 2020; Tuite et al., 2020) for mitigating the virus spread, but may also be the only effective way to cool down an explosion of community infection, at least in the early spreading stage. Afterward, however, a region should not remain in lock-down for long, which is neither useful for controlling the virus nor good for economic recovery. Instead, a set of non-lock-down-typed alternative measures should be quickly prepared and actively executed to substitute for the lock-down; as long as they are strictly executed, they can control the virus as effectively as the lock-down.
Meanwhile, without the support of non-lock-down-typed measures, such as conditional quarantine, lock-down alone may also fail to mitigate COVID-19, as happened in Italy, Spain, and New York, USA.
Acknowledgement:
This work was partially supported by the Ministry of Education in China Project of Humanities and Social Sciences under Grant No. 20YJC790176, and the Fundamental Research Funds for the Central Universities under Grant No. 2242020S30024.
References
Anderson, R.M., Heesterbeek, H., Klinkenberg, and Hollingsworth, T.D. Comment How will country-based mitigation measures influence the course of the COVID-19 epidemic? 2019(20), 1-4.
Atkeson, A.G. What will be the economic impact of COVID-19 in the US? Rough estimates of disease scenarios. Federal Reserve Bank of Minneapolis, Staff Report 595, 2020.
Barro, R.J., Ursúa, J.F. and Weng, J. The coronavirus and the great influenza pandemic: Lessons from the Spanish Flu for the coronavirus's potential effects on mortality and economic activity. National Bureau of Economic Research, No. w26866, 2020.
Bootsma, M.C., and Ferguson, N.M. The effect of public health measures on the 1918 influenza pandemic in US cities. Proceedings of the National Academy of Sciences, 104(18), 7588-7593, 2007.
Chen, S., Li, Q., Gao, S., Kang, Y. and Shi, X. Mitigating COVID-19 outbreak via high testing capacity and strong transmission-intervention in the United States. medRxiv, 2020.
Chinazzi, M., Davis, J.T., Ajelli, M., et al. The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak. Science, 2020.
Cohen, J., and Kupferschmidt, K. Strategies shift as coronavirus pandemic looms. Science, 367, pp962-963, 2020.
Efimov, and Ushirobira, R. On an interval prediction of COVID-19 development based on a SEIR epidemic model. 2020.
Fang H., Wang L. and Yang Y. Human Mobility Restrictions and the Spread of the Novel Coronavirus (2019-nCoV) in China. National Bureau of Economic Research, 2020.
Ferguson, N., Laydon, D., Nedjati Gilani, G., et al. Report 9: Impact of non-pharmaceutical interventions (NPIs) to reduce COVID19 mortality and healthcare demand. Authors website, Imperial College London, 2020.
Garfin D.R., Silver R.C. and Holman E.A. The novel coronavirus (COVID-2019) outbreak: Amplification of public health consequences by media exposure. Health Psychology, 2020.
Golstein E.G. Theory of convex programming. American Mathematical Soc., 2008.
Halder, N., Kelso, J. K., and Milne, G. J. Cost-Effective Strategies for Mitigating a Future Influenza Pandemic with H1N1 2009 Characteristics. PLoS One
The Lancet, 395(10224), 542-545, 2020.
Iwasaki A. and Grubaugh N.D. Why does Japan have so few cases of COVID19?. EMBO Molecular Medicine, 12(5): e12481, 2020.
Jorda, O., Singh, S. R. and Taylor, A.M. Longer-run economic consequences of pandemics. National Bureau of Economic Research, No. w26934, 2020.
Keeling M.J. and Rohani P. Modeling infectious diseases in humans and animals. Princeton University Press, 2011.
Kelso, J.K., Halder, N., Postma, M.J. and Milne, G.J. Economic analysis of pandemic influenza mitigation strategies for five pandemic severity categories. BMC public health, 13(1), 2020.
Li R., Pei S., Chen B., Song Y., Zhang T., Yang W. and Shaman J. Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV2). Science, 2020.
Liu Q., Wu S., Wang L. and Tan T. Predicting the next location: A recurrent model with spatial and temporal contexts. In Thirtieth AAAI conference on artificial intelligence, 2016.
Mizumoto, K., Kagaya, K., Zarebski, A., and Chowell, G. Estimating the asymptomatic proportion of coronavirus disease 2019 (COVID-19) cases on board the Diamond Princess cruise ship. Yokohama, Japan, 2020.
Park S., Choi G.J. and Ko H. Information technology-based tracing strategy in response to COVID-19 in South Korea - privacy controversies. JAMA, 2020.
Pike, W. and Saini, V. An international comparison of the second derivative of COVID-19 deaths after implementation of social distancing measures. medRxiv, 2020.
Prem K, Liu Y, Russell T W, et al. The effect of control strategies to reduce social mixing on outcomes of the COVID-19 epidemic in Wuhan, China: a modelling study. The Lancet Public Health, 2020.
Pueyo, T. Coronavirus: The hammer and the dance. URL https://medium.com/@tomaspueyo/coronavirus-the-hammer-and-the-dance-be9337092b56.
Qiu Y., Chen X. and Shi W. Impacts of Social and Economic Factors on the Transmission of Coronavirus Disease 2019 (COVID-19) in China. Journal of Population Economics, 2020.
Shaw R., Kim Y. and Hua J. Governance, technology and citizen behavior in pandemic: Lessons from COVID-19 in East Asia. Progress in disaster science, 2020.
Shen Z., Wang W., Fan Y., Di Z. and Lai Y. Reconstructing propagation networks with natural diversity and identifying hidden sources. Nature communications, 5: 4323, 2014.
Tian H., Liu Y., Li Y. and others. The impact of transmission control measures during the first 50 days of the COVID-19 epidemic in China. Science, 2020.
Tuite, A.R., Fisman, D. and Greer, A.L. Mathematical modelling of COVID-19 transmission and mitigation strategies in the population of Ontario, Canada. CMAJ, 2020.
Wang, C., Liu, L., Hao, X., et al. Evolving epidemiology and impact of non-pharmaceutical interventions on the outbreak of coronavirus disease 2019 in Wuhan, China. medRxiv, 2020.
Wang D., Hu B., Hu C., et al. Clinical characteristics of 138 hospitalized patients with 2019 novel coronavirus-infected pneumonia in Wuhan, China. JAMA, 2020.
Zhang, Y., Jiang, B., Yuan, J., and Tao, Y. The impact of social distancing and epicenter lockdown on the COVID-19 epidemic in mainland China: A data-driven SEIQR model study. medRxiv, 2020.
Zhou X. and Belkin M. Semi-supervised learning by higher order regularization. Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 892-900, 2011.
Zou, L., Ruan, F., Huang, M., et al. SARS-CoV-2 viral load in upper respiratory specimens of infected patients. New England Journal of Medicine, 382(12), 1177-1179, 2020.
Appendix A. Training of NP-Net-SIR model
The non-parametric network set-up makes our NP-Net-SIR model essentially a special class of recurrent neural network (RNN), namely the temporal RNN (Liu et al., 2016), with the temporality coming from the time-dependent network $W(t)$. The total number of infections $n(t)$, being unobservable, corresponds to the hidden layer of the RNN, while the documented infections $m(t)$ correspond to the output layer. Since NP-Net-SIR has no extra input, the input layer degenerates to 0. Given the observed sequence of documented infections $M_o = \{M_{t_i} : i = 1, \ldots, n;\ t_1 < \cdots < t_n\}$ and a properly regularized loss function, the standard back-propagation method applies to estimate the set of unknown temporal parameters $\{W(t_i), n(t_i), r(t_i), p_I(j, t_i), p_B(j, t_i) : i = 1, \ldots, n;\ j = 1, \ldots, incub\}$. Because the observation times are discrete, the continuity condition for these temporal parameters can be converted to a graph-Laplacian regularization with the grid graph on the real line (Zhou and Belkin, 2011), which is asymptotically equivalent, under high-frequency observation, to requiring that these temporal parameters be continuous, differentiable and have square-integrable derivatives.
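To make the link between the grid-graph Laplacian and temporal smoothness concrete, the following minimal sketch builds the Laplacian of a path graph over consecutive observation times and checks that its quadratic form equals the sum of squared first differences of a scalar time series. The function names and the sample series are illustrative assumptions; the model applies the same idea to matrix- and vector-valued parameters.

```python
def path_laplacian(n):
    """Laplacian L = D - A of the path (grid) graph on n consecutive time points."""
    L = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        L[i][i] += 1.0
        L[i + 1][i + 1] += 1.0
        L[i][i + 1] -= 1.0
        L[i + 1][i] -= 1.0
    return L


def quadratic_form(L, x):
    """Compute x^T L x."""
    size = len(x)
    return sum(x[i] * L[i][j] * x[j] for i in range(size) for j in range(size))


def sum_squared_differences(x):
    """Sum of (x[i+1] - x[i])^2 over consecutive time points."""
    return sum((b - a) ** 2 for a, b in zip(x, x[1:]))


# Hypothetical scalar parameter observed at four consecutive times.
series = [1.0, 3.0, 6.0, 6.0]
laplacian_penalty = quadratic_form(path_laplacian(len(series)), series)
difference_penalty = sum_squared_differences(series)
```

Both quantities equal 13 for this series, so penalizing the Laplacian quadratic form is the same as penalizing abrupt changes between neighbouring time points.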
In our special case, the graph-Laplacian regularization can be written in the following form:

$$
\begin{aligned}
R(W, r, p_B, p_I) ={}& \|W(t_n)\|^2 + \sum_{i=1}^{n-1} \|W(t_{i+1}) - W(t_i)\|^2 + \sum_{i=1}^{n-1} \|r(t_{i+1}) - r(t_i)\|^2 \\
&+ \sum_{i=1}^{n-1} \sum_{j=1}^{incub} \left( \|p_B(j, t_{i+1}) - p_B(j, t_i)\|^2 + \|p_I(j, t_{i+1}) - p_I(j, t_i)\|^2 \right)
\end{aligned} \tag{A.1}
$$

where we artificially set the boundary $W(t_{n+1}) \equiv 0$, which produces the leading term $\|W(t_n)\|^2$. Extending the summation for $W$ up to index $n$ in this way imposes a sparsity requirement on the $W(t_i)$s, which is standard for avoiding over-fitting.

For the loss function, in addition to the standard square-sum error between the observed $M_{t_i}$s and the estimated $m(t_i)$s, we add an extra penalty to the error function in order to fix the data pollution issue in the early stage of the COVID-19 outbreak. In particular, we define the following indicator function:

$$
I_{t^*}(t, m, M) = \begin{cases} 1 & \text{if } t \ge t^* \text{ or } m < M \\ 0 & \text{else,} \end{cases} \tag{A.2}
$$

The meaning of (A.2) is that there is a cut-off time point $t^*$ before which the documented infection number tends to under-estimate the real spreading trend. Therefore, before $t^*$, if the estimated number $m$ exceeds the reported $M$, we take the estimate to reflect the true situation and do not treat the gap as an error, while if the estimate is less than the reported number, which indicates a severe under-estimate of the true situation, the error is calculated as usual. After $t^*$, all hidden infectious cases that should have been documented and published are assumed to have been reported, so the reported cases agree with the real trend. In this paper, we set $t^*$ to Feb. 12, 2020, the date on which the Wuhan local government reported 13,000+ existing infectious cases that had not previously been on record.
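The one-sided treatment of early data can be sketched in a few lines. The code below follows the verbal description of (A.2): before the cut-off, an estimate above the reported count is not penalized, while an estimate below it is; from the cut-off onward every residual counts. The function names, the integer time index and the sample numbers are assumptions made for illustration only.

```python
def indicator(t, m, M, t_star):
    """Return 1 if the residual at time t counts as an error, else 0.

    Follows the description of (A.2): always count from t_star onward;
    before t_star, count only when the estimate m falls below the report M.
    """
    return 1 if (t >= t_star or m < M) else 0


def masked_squared_error(times, estimates, reports, t_star):
    """Square-sum error in which early over-estimates are ignored."""
    return sum(
        ((M - m) * indicator(t, m, M, t_star)) ** 2
        for t, m, M in zip(times, estimates, reports)
    )


# Hypothetical single-region example with the cut-off at time index 2.
times = [0, 1, 2, 3]
reports = [10.0, 20.0, 50.0, 60.0]    # documented cases M
estimates = [15.0, 18.0, 55.0, 58.0]  # model output m
error = masked_squared_error(times, estimates, reports, t_star=2)
```

Here the over-estimate at time 0 contributes nothing, while the remaining three residuals contribute 4 + 25 + 4 = 33.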
Then the loss function can be written in the following form:

$$
L(M_o, m, n, W, r, p_B, p_I) = \sum_{i=1}^{n} \sum_{j=1}^{k} \left\| \left( M_{t_i,j} - m_j(t_i) \right) * I_{t^*}\left( t_i, m_j(t_i), M_{t_i,j} \right) \right\|^2 + R(W, r, p_B, p_I), \tag{A.3}
$$

where the loss depends on the hidden infection number $n$ through the observed infection number $m$ via model (1).

Note that the RNN nature of model (1) means that the infection numbers $n(t)$, $m(t)$ are generated from $n(s)$, $m(s)$ for $s < t$. Under the back-propagation algorithm, model (1) is therefore fitted in reverse order, i.e. the parameter values $n(s)$, $W(s)$, $p_I(\cdot, s)$, $p_B(\cdot, s)$ and $r(s)$ for an earlier period $s$ are essentially fitted from the later observations $m(t)$ with $t > s$. The backward fitting direction, together with the function (A.2), provides a way to use the labeled data $m(t)$ at times $t > t^*$ to generate labels of the infection number for the unlabeled times $s \le t^*$. Such use of partially labeled data is standard in semi-supervised learning (Zhou and Belkin, 2011); we borrow it here to address the inaccurate data issue for the early stage.

To estimate the parameters, we minimize the loss function with respect to the parameters, subject to the following default range restrictions:

$$
\begin{aligned}
& 0 \le W_{kl}(t_i) \le [\ldots], && \forall k, l, i \\
& r(t_i) \ge 0, && \forall i \\
& p_B(j, t_i),\ p_I(j, t_i) \ge 0, && \forall j, i \\
& \textstyle\sum_j p_B(j, t_i) = \sum_j p_I(j, t_i) = 1, && \forall i
\end{aligned} \tag{A.4}
$$

The quadratic nature of the square-sum loss function guarantees that, even with the penalty (A.2) added, the resulting loss function (A.3) is still continuously differentiable, so standard gradient-descent solvers are applicable.

Appendix B. Training algorithm
Training model (1) is equivalent to solving the optimization problem in (A.3) under the constraints (A.4). The classical gradient-descent-based solution for RNNs usually assumes no constraints, so some modifications are needed. In the following, we propose a sequential modification of the classical backward propagation training algorithm for neural network models. To facilitate the introduction of the sequential algorithm, we temporarily assume that the temporal RNN is a static RNN, i.e. the parameters $W$, $p_B$, $p_I$ and $r$ no longer depend on $t$. Also suppose that the infection number $m_t$ is observed within the discrete time interval $\{0, \ldots, T\}$. Then the discrete version of model (1) under these assumptions becomes:

$$
\begin{aligned}
\Delta n_t &= n_{t+1} - n_t = \sum_{i=1}^{incub} p_{I,i}\, W \cdot n_{t-i} - r\, n_t \\
m_t &= \sum_{i=1}^{incub} p_{B,i} \cdot n_{t-i},
\end{aligned} \tag{B.1}
$$

where, for $x = B, I$, $p_{x,i}$ is short-hand for $p_x(i)$, the static probability mass function $p_x$ evaluated at the discrete time $i$. Given (B.1), notice that when the unknown probability parameters $p_B$, $p_I$, the recovery rate $r$ and the network matrix $W$ are fixed, the model depends on the hidden layer $n = \{n_t : t = -incub, -incub+1, \ldots, 0, \ldots, T\}$ only through vector multiplication. When $p_B$, $p_I$, $r$ and $n$ are fixed, the model depends on $W$ only through matrix multiplication. When $W$ and $n$ are fixed, the model depends on $p_B$, $p_I$ and $r$ only through scalar multiplication and vector inner products. All of these operations are linear, and the loss function (A.3) has quadratic form. These facts imply that, fixing any two classes of quantities among (a) $p_B$, $p_I$, $r$; (b) $W$; and (c) $n$, the optimization problem (A.3) under constraint (A.4) is a classical convex programming problem (Golstein, 2008; Shen et al., 2014) with respect to the remaining class of quantities.
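As a concrete reading of the static recursion (B.1), the sketch below iterates it forward for a small network with plain Python lists. All parameter values, the two-region network and the function names are hypothetical; the point is only to show how $n_{t+1}$ and $m_t$ are assembled from the lagged hidden vectors, and that the map from the initial history to the outputs is linear when $W$, $p_I$, $p_B$ and $r$ are held fixed.

```python
def matvec(W, v):
    """Matrix-vector product W v for list-of-lists W."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def simulate(W, p_I, p_B, r, history, steps):
    """Iterate (B.1) forward.

    history holds n_{t-incub}, ..., n_t (oldest first), so it needs
    incub + 1 vectors, where incub = len(p_I) = len(p_B).
    Returns the extended hidden series and the documented series m.
    """
    incub = len(p_I)
    n = [list(v) for v in history]
    m = []
    for _ in range(steps):
        current = n[-1]
        k = len(current)
        # Documented infections: m_t = sum_i p_B[i] * n_{t-i}
        m.append([
            sum(p_B[i - 1] * n[-1 - i][j] for i in range(1, incub + 1))
            for j in range(k)
        ])
        # New infections: sum_i p_I[i] * (W n_{t-i})
        inflow = [0.0] * k
        for i in range(1, incub + 1):
            spread = matvec(W, n[-1 - i])
            for j in range(k):
                inflow[j] += p_I[i - 1] * spread[j]
        # n_{t+1} = n_t + inflow - r * n_t
        n.append([current[j] + inflow[j] - r * current[j] for j in range(k)])
    return n, m


# Hypothetical two-region example with incub = 2.
W = [[0.0, 1.0], [1.0, 0.0]]
p_I = [0.5, 0.5]
p_B = [1.0, 0.0]
history = [[2.0, 0.0], [2.0, 0.0], [2.0, 0.0]]  # n_{-2}, n_{-1}, n_0
n_series, m_series = simulate(W, p_I, p_B, r=0.5, history=history, steps=1)
```

With these toy values one step gives $n_1 = (1, 2)$ and $m_0 = (2, 0)$; doubling every vector in `history` doubles both outputs.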
As our loss function (A.3) is strictly convex, the resulting convex programming problem has a unique minimum and can be solved quickly via the classical gradient algorithm. Therefore, under the static setting of model parameters, the following iterative fitting algorithm can be applied to train the parameters:

Step 1: Given $s \ge 0$, for fixed vectors $p_B^s$, $p_I^s$, constant $r^s$ and matrix $W^s$, solve problem (A.3) under (A.4) with respect to $n$, resulting in $n^{s+1}$;

Step 2: Given $s \ge 0$, for fixed vectors $p_B^s$, $p_I^s$, constant $r^s$ and time series $n^s$, solve problem (A.3) under (A.4) with respect to $W$, resulting in $W^{s+1}$;

Step 3: Given $s \ge 0$, for fixed matrix $W^s$ and time series $n^s$, solve problem (A.3) under (A.4) with respect to the vectors $p_B$, $p_I$ and constant $r$, resulting in $p_B^{s+1}$, $p_I^{s+1}$ and $r^{s+1}$;

Step 4: Repeat Steps 1-3 until the ratio of norms

$$
\frac{\|p_B^{s+1} - p_B^s\| + \|p_I^{s+1} - p_I^s\| + \|r^{s+1} - r^s\| + \|W^{s+1} - W^s\|}{\|p_B^s\| + \|p_I^s\| + \|r^s\| + \|W^s\|} \tag{B.2}
$$

is less than a prescribed threshold $\delta$ (a negative power of 10).

Then, to relax the static assumption, given the data $M_o = \{M_0, M_1, \ldots, M_S\}$ of observed infection vectors for the period ending on day $S$, consider the following sequential backward propagation:

Step 1: (Initialization) Set $\tau = S$ and $M^\tau = \{M_{\tau-T}, \ldots, M_\tau\}$, apply the 4-step static training algorithm above, and denote the output as $W^\tau$, $p_B^\tau$, $p_I^\tau$, $r^\tau$ and $n^\tau = \{n^\tau_{\tau-T-incub}, \ldots, n^\tau_\tau\}$;

Step 2: For $T \le \tau < S$ and $M^\tau$, redefine the hidden vector as $n = \{n_{\tau-T-incub}, n^{\tau+1}_{\tau-T-incub+1}, \ldots, n^{\tau+1}_{\tau}\}$, where only the first entry $n_{\tau-T-incub}$ is undetermined and needs to be optimized; the remaining entries are fixed at the values estimated in the previous iteration. Given the estimates $W^{\tau+1}$, $p_B^{\tau+1}$, $p_I^{\tau+1}$, $r^{\tau+1}$ from the previous iteration, apply the static training algorithm above with the loss function redefined as in equation (B.3), which yields the output $W^\tau$, $p_B^\tau$, $p_I^\tau$, $r^\tau$ and $n^\tau = \{n^\tau_{\tau-T-incub}, n^{\tau+1}_{\tau-T-incub+1}, \ldots, n^{\tau+1}_{\tau}\}$.

$$
\begin{aligned}
L(M^\tau, m, n_{\tau-T-incub}, W, r, p_B, p_I) ={}& \sum_{i=\tau-T+1}^{\tau} \sum_{j=1}^{k} \left\| (M_{i,j} - m_{i,j}) * I_{t^*}(i, m_{i,j}, M_{i,j}) \right\|^2 \\
&+ \|W^{\tau+1} - W\|^2 + \|r^{\tau+1} - r\|^2 + \|p_B^{\tau+1} - p_B\|^2 + \|p_I^{\tau+1} - p_I\|^2
\end{aligned} \tag{B.3}
$$

where the time-integrated Laplacian regularization (A.1) in loss function (A.3) is replaced with the one-period regularization.

Combining the sequence of outputs from the two-step sequential backward propagation algorithm, we obtain the estimated sequence of adjacency matrices $\{W^T, \ldots, W^S\}$, probability parameters $\{p_B^T, \ldots, p_B^S\}$, $\{p_I^T, \ldots, p_I^S\}$, recovery rates $\{r^T, \ldots, r^S\}$ and the sequence of hidden infection vectors $\{n_{-incub}, n_{-incub+1}, \ldots, n_0, \ldots, n_S\}$.
For the hidden infection vector, note that the estimate of $n_\tau$ for every $\tau$ is unique according to the design of $n^\tau$ in Step 2 of the sequential backward propagation.

The sequential backward propagation is essentially a sequence of standard backward propagations, each applied to solve the static version of our model (B.1). The connection between two consecutive steps is established through the consecutive one-period decomposition of the Laplacian regularization in (B.3) and through the construction that lets $n^\tau$ and $n^{\tau+1}$ share common hidden infection numbers in the overlapping period. It is not hard to verify that the sequential implementation of backward propagation generates results asymptotically equivalent to those of classical backward propagation.

Also notice that the sequential training depends on an unspecified horizon parameter $T$. In this paper, we set $T = 7$, as it minimizes the aggregated loss (A.3) compared with the alternatives in the candidate range considered. The algorithm is implemented in Python, where the key minimization steps (Steps 1-3) for the static model (B.1) are implemented via the convex programming package CVXPY.
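The block-wise fitting idea of the static algorithm can be illustrated on a deliberately tiny problem. The sketch below is not the NP-Net-SIR training code and does not use CVXPY: it alternates between a hidden series `n` and a single scalar weight `w` in a toy objective that is convex in each block when the other is fixed, solves each block in closed form, and stops on a relative-change ratio in the spirit of (B.2). The objective, the function name and all numbers are assumptions, and each block update uses the most recent value of the other block, which is a simplification of Steps 1-3 as written.

```python
def alternating_fit(M, lam=1.0, delta=1e-8, max_iter=1000):
    """Toy block-coordinate descent for

        min_{w >= 0, n >= 0}  sum_t (M[t] - w * n[t])**2
                              + lam * w**2 + lam * sum_t n[t]**2

    Returns the fitted weight, the fitted hidden series and the
    number of sweeps performed.
    """
    w = 1.0
    n = list(M)
    sweeps = 0
    for sweeps in range(1, max_iter + 1):
        # Hidden-series block: with w fixed, each n[t] has a closed-form
        # minimizer; clipping at 0 enforces the non-negativity constraint.
        n = [max(0.0, w * Mt / (w * w + lam)) for Mt in M]
        # Weight block: with n fixed, w solves a one-dimensional quadratic.
        numerator = sum(Mt * nt for Mt, nt in zip(M, n))
        denominator = sum(nt * nt for nt in n) + lam
        w_new = max(0.0, numerator / denominator)
        # Relative change of the parameter block, analogous to (B.2).
        ratio = abs(w_new - w) / abs(w) if w != 0 else float("inf")
        w = w_new
        if ratio < delta:
            break
    return w, n, sweeps


# Hypothetical observed series.
observed = [2.0, 4.0, 6.0]
w_hat, n_hat, sweeps_used = alternating_fit(observed)
```

For this toy objective the positive stationary point satisfies $w^2 + \lambda = \sqrt{\sum_t M_t^2}$, which gives a simple check that the alternation has converged. In the actual model each block is a constrained convex program rather than a closed-form update.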
Appendix C. Calculation of death number

$D(t_i)$ (similarly $D^{r}(t_i)$) is calculated from the sequence $\{m(t) : t \le t_i\}$ through the following auto-regressive equation:

$$
Dr_j(t_i) = a_j + b \cdot \Delta \tilde{m}_j(t_{i-k}) + c \cdot Dr_j(t_{i-k}) + d \cdot h_j(t_i) + \varepsilon_j(t_i) \tag{C.1}
$$

where $Dr_j(t_i)$ is the death rate such that $D_j(t_i) = Dr_j(t_i)\, m_j(t_i)$. In (C.1), $h_j(t_i)$ is the ratio between $m_j(t_i)$ and the local healthcare resources, measured by the total number of hospital beds. According to preliminary analysis, the time lag k , k2