A Recurrent Neural Network and Differential Equation Based Spatiotemporal Infectious Disease Model with Application to COVID-19
AA Recurrent Neural Network and DifferentialEquation Based Spatiotemporal InfectiousDisease Model with Application to COVID-19
Zhijian Li , Yunling Zheng , Jack Xin , and Guofa Zhou Department of Mathematics, UC Irvine College of Health Sciences, UC Irvine, CA 92697, USA.July 21, 2020.
Abstract
The outbreaks of Coronavirus Disease 2019 (COVID-19) have im-pacted the world significantly. Modeling the trend of infection and real-time forecasting of cases can help decision making and control of thedisease spread. However, data-driven methods such as recurrent neuralnetworks (RNN) can perform poorly due to limited daily samples in time.In this work, we develop an integrated spatiotemporal model based on theepidemic differential equations (SIR) and RNN. The former after simplifi-cation and discretization is a compact model of temporal infection trend ofa region while the latter models the effect of nearest neighboring regions.The latter captures latent spatial information. We trained and tested ourmodel on COVID-19 data in Italy, and show that it out-performs existingtemporal models (fully connected NN, SIR, ARIMA) in 1-day, 3-day, and1-week ahead forecasting especially in the regime of limited training data.
Keywords:
COVID-19, Recurrent Neural Network,Discrete Epidemic Model, Spatiotemporal Machine Learning. a r X i v : . [ q - b i o . P E ] S e p Introduction
Susceptible-Infected-Removed (SIR) is a classical differential equa-tion model of infectious diseases [2]. It divides the total populationinto three compartments and models their evolution by the systemof equations dSdt = − β I SdIdt = β I S − γ IdRdt = γ I where β and γ are two positive parameters. SIR is a simple andefficient model of temporal data for a given region, see also [3] forrelated compartment models with social structures.Yet the infectious disease data are often spatio-temporal as inthe case of COVID-19, see [5]. A natural question is how to extendSIR to a space time model of suitable complexity so that it can bequickly trained from the available public data sets and applied inreal-time forecasts. See [8] for temporal model real-time forecastson cumulative cases of China in Feb 2020.In this paper, we explore spatial infectious disease information tomodel the latent effect due to the in-flow of the infected people fromthe geographical neighbors. The in-flow data is not observed. Tothis end, machine learning tools such as regression and neural net-work models are more convenient. Auto-regressive model (AR) andits variants are linear statistical models to forecast time-series data.The Long Short Term Memory (LSTM) neural networks, originallydesigned for natural language processing [4], have more representa-tion power and can be applied to disease time-series data as well.With spatial structures added, the graph-structured LSTM modelscan achieve state-of-the-art performance on spatiotemporal influenzadata [6], crime and traffic data [10, 9]. However, they require a largeenough supply of training data. For COVID-19, we only have lim-ited daily data since the outbreaks began in early 2020. Applyingspace-time LSTM models [6, 9] directly to COVID-19 turns out toproduce poor results. In view of the limited COVID-19 data, weshall propose a hybrid SIR-LSTM model. In [11], the authors designed a variant of AR, the AutoRegressionwith Google search data (ARGO), that utilizes external feature ofgoogle search data to forecast influenza data from Centers for Diseaseontrol and Prevention (CDC). Based on google search trend datathat correlated to influenza as external feature, ARGO is a linearmodel that processes historical observations and external features.The prediction of influenza activity level at time t , defined as ˆ y t , isgiven by: ˆ y t = u t + (cid:88) j =1 α j y t − j + (cid:88) i =1 β i X i,t . ARGO is optimized as:min µ y ,(cid:126)α,(cid:126)β (cid:16) y t − u t − (cid:88) j =1 α j y t − j − (cid:88) i =1 β i X i,t (cid:17) + λ a || (cid:126)α || + η a || (cid:126)β || + λ b || (cid:126)α || + η b || (cid:126)β || where (cid:126)α = ( α , · · · , α ) and (cid:126)β = ( β , · · · , β ). The y t − j ’s, 1 ≤ j ≤
52, are historical observations of previous 52 weeks and X i,t arethe google search trend measures of top 100 terms that are mostcorrelated to influenza at time t . Essentially, ARGO is a linearregression with regularization terms. In [11], ARGO is shown tooutperform standard machine learning models such as LSTM, AR,and ARIMA.In [6], graph structured recurrent neural network (GSRNN) fur-ther improved ARGO in the forecasting accuracy of CDC influenzaactivity level. The CDC partitions the US into 10 Health and Hu-man Services (HHS) regions for reporting. GSRNN treats the 10regions as a graph with nodes v , · · · , v , and E be the collection ofedges (i.e E = { ( v i , v j ) | v i , v j are adjacent } ). Based on the averagehistory of activity levels, the 10 HHS regions are divided into twogroups, relatively active group, H , and relatively inactive group, L .There are 3 types of edges, L − L , H − L , and
H − H , and each edgetype has a corresponding RNN to train the edge features. There arealso two node RNNs for each group to output the final prediction.Given a node (region) v , suppose v ∈ H . GSRNN generates the edgefeatures of v at time t , e tv, H and e tv, L , by averaging the history ofneighbors of v in the corresponding groups. Next, the edge featuresare fed into the corresponding edge RNNs: f tv = edgeRNN H−L ( e tv, L ) , h tv = edgeRNN H−H ( e tv, L )Then, the outputs of edgeRNNs are fed into the nodeRNN of group H together with the node feature of v at time t , denoted as v t , tooutput the prediction of the activity level of node v at time t + 1, or y t +1 v : y t +1 v = nodeRNN H ( v t , f tv , h tv ) . Our Contribution: IeRNN model
We propose a novel spatiotemporal model integrating LSTM [4] witha discrete time I-equation derived from SIR differential equations.The LSTM is utilized to model latent spatial information. The I-equation models the observed temporal information. Our model,named IeRNN, differs from [6, 10, 9] in that a difference equationwith 3 parameters (the I-equation) fits the limited temporal data,which is far more compact than LSTM.
Based on SIR model, we add an additional feature I e that representsthe external infection influence from the neighbors of a region. Thenthe SIR nonlinear system with I e as external forcing becomes dSdt = − β S I − β S I e (1) dIdt = β S I + β S I e − γ I (2) dRdt = γ I (3)which conserves the total mass (normalized to 1): S + I + R = 1. Itfollows from (3) that R ( t ) = R ( t ) + (cid:90) tt γ I dτ Hence, S ( t ) = 1 − I ( t ) − R ( t ) − γ (cid:90) tt I dτ
Substituting S ( t ) into (2) we have: dIdt = ( β I + β I e ) (cid:18) − I ( t ) − R ( t ) − γ (cid:90) tt I ( τ ) dτ (cid:19) − γI Combining forward Euler method and Riemann sum approximationof the integral, we have a discrete approximation: I ( t + 1) = (1 − γ ) I ( t ) + (cid:0) β I ( t ) + β I e ( t ) (cid:1) · (cid:16) − I ( t ) − R ( t ) − γ t − t p + 1 p (cid:88) j =0 I ( t − j ) (cid:17) s we model I ( t ) from the beginning of the infection, we have t = 0and R ( t ) = 0. We arrive at the following discrete time I-equation: I ( t +1) = (1 − γ ) I ( t )+ (cid:0) β I ( t )+ β I e ( t ) (cid:1) · (cid:16) − I ( t ) − γ tp + 1 p (cid:88) j =0 I ( t − j ) (cid:17) (4)Note that if we let I e ( t ) = 0, then we have an approximation of I ( t ) for the original SIR model, which is a solely temporal model(named I model): I ( t + 1) = (1 − γ − β ) I ( t ) − β I ( t ) − β γ tp + 1 I ( t ) p (cid:88) j =0 I ( t − j ) (5)In reality, it is hard to know how a population of a region interactswith populations of neighboring regions. As a result, I e ( t ) is a latentinformation that is difficult to model by a mathematical formula orequation. In order to retrieve latent spatial information, we employrecurrent neural networks made of LSTM cells [4], see Fig. 2. I e We utilize the spatial information based on the Italy region map,Fig. 1. In order to learn the latent information I e of a region v ,we first generate the edge feature of v . Let C be the collection ofneighbors of v . Then, the edge feature of v at time t is formulatedas: f te = 1 | C | (cid:88) i : v i ∈ C (cid:2) I i ( t − , · · · , I i ( t − p ) (cid:3) where I i ( t ) is the infection population percentage of region v i attime t . Then, we feed f te into an Edge RNN, an RNN with 3 stackedLSTM cells (see Fig. 3), followed by a dense layer for computing I e . The activation function of the dense layer is hyperbolic tangentfunction. Figure 4 illustrates the procedure of computing I e forLazio as an illustration. We hence call our model IeRNN due to itsintegrated design of I-equation and edge RNN. To calibrate our model IeRNN, we use the Italy COVID-19 data [5]for training and testing. Although the US has the most infectedcases, the recovered cases are largely missing. On the other hand,the Italian COVID-19 data is more accurately reported and better igure 1:
Italian Region Map σ σ
Tanh σ × + × × Tanhct-1ht-1 xtInput cththtOutput
Figure 2:
LSTM cell: input x t , output: h t ; σ is a sigmoid function. maintained, reflecting a nearly complete duration of the rise and fallof infection. We collect the data of daily new (current) cases from2020-02-24 to 2020-06-18 of 20 Italian regions. We set p = 3 in(4) based on experimental performance. As a result, we have the igure 3: Edge RNN consisting of 3 stacked LSTM cells.
TuscanyUmbriaMarcheAbruzzoMoliseCampania
EdgeFeature
EdgeRNN DenseLayer tanh I e Figure 4:
Computing I e of Lazio region. Edge RNN is as shown in Fig. 3. Denselayer is fully connected (see Fig. 5). current data for 113 days, with 81 days to train our model and 32days to test our model (or 70%/30% training/testing data split).ur training loss function is the mean squared error of the modeloutput and training data:ˆ y ( t ) = (1 − γ ) y ( t −
1) + (cid:0) β y ( t −
1) + β I e ( t − (cid:1) · (cid:16) − y ( t − − γ tp + 1 p +1 (cid:88) j =1 y ( t − j ) (cid:17) Loss = 1 T − p − T (cid:88) t = p +1 (cid:0) y ( t ) − ˆ y ( t ) (cid:1) Since the training is minimization of the above loss fucntion overparameters in both I-equation and RNN, the two components of IeRNNare coupled while learning from data . We use Adam gradient de-scent to learn the weights of LSTM and the dense layer, as well asI-equation parameters β , β , and γ .To evaluate the performance of our model, we compare IeRNN,I-model (5), a fully-connected neural network (fcNN, Fig. 5) withhyperbolic tangent activation function, and auto-regression model(ARIMA). As the standard setting of ARIMA is 1-day ahead pre-diction, we shall only compare with it in such a very short-term case.Since infectious disease evolution is intrinsically nonlinear, we shallcompare nonlinear models for 3-day and 1-week ahead forecasting.Based on experimental performance, we set the number of hiddenunits to be 100, 150, and 100 for the three layers of fcNN respectively. Figure 5:
Schematic of fcNN for modeling time series. ... ... ... ... y t − y t − y t − y t − p y t − p y t − p w (1)1 w (1)1 w (1)1 w (1) n w (1) n w (1) n w (2)1 w (2)1 w (2)1 w (2) n w (2) n w (2) n w (3)1 w (3)1 w (3)1 w (3) n w (3) n w (3) n tanhtanhtanh y t y t y t Inputlayer Hiddenlayer Outputlayer .1 One-Day Ahead Forecast
As we see in Fig. 6, fcNN can perform poorly. This is not a surprise,as both [6] and [11] relied on hundreds of historical observations totrain their models. The I-model based on only sequential data intime of one region merely follows the trend of the true data but can-not provide accurate predictions. Our IeRNN model, with the helpof additional spatial information, is able to make accurate predic-tions and outperform other models. We also test the IeRNN withtraining data reduced to 40% (46 days). IeRNN is still able to trackthe general trend of the infected population percentage.We measure the test accuracy with the Root Mean Square Error(RMSE) averaged over a few trials in training. In Tables 1 and 2on 1-day ahead forecast, IeRNN achieves the smallest RMSE errors,and I-model has the largest errors. The compact I-model with 2parameters cannot do 1-day ahead prediction as accurately. ARIMAoutperforms I-model and does better on Emilia-Romagna and Lazioregions than fcNN. ARIMA, a linear model, has simpler structurethan fcNN whose nonlinearity does not play out in such a short timetask. Fig. 8 shows 1-day ahead forecast of IeRNN model on otherregions with the learned latent external forcing I e in Fig. 9. Table 1:
RMSE test errors in 1-day ahead forecast trained with 70 % of data. E-R=Emilia-Romagna.
Model Lombardy E-R LazioIeRNN 1.027e-04 6.333e-05 3.251e-05I-model 1.175e-03 3.284e-04 2.439e-04fcNN 1.580e-04 4.614e-04 2.294e-04ARIMA 9.789e-04 3.627e-04 4.365e-05
Table 2:
RMSE test errors in 1-day ahead forecast trained with reduced (40 % of)data. E-R= Emilia-Romagna.
Model Lombardy E-R LazioIeRNN 9.850e-05 1.778e-04 3.617e-05I-model 1.871e-03 1.252 e-03 5.443e-04fcNN 3.364e-04 6.204e-04 8.030e-04ARIMA 1.277e-03 1.082e-03 4.018e-05 able 3:
RMSE test errors in 7-day ahead forecast trained with 70 % of data. E-R=Emilia-Romagna.
Model Lombardy E-R LazioIeRNN 3.513e-04 4.423e-04 1.161e-04I-model 2.004e-03 6.627e-04 5.586e-04fcNN 6.608e-04 4.804e-04 4.508e-04
Table 4:
RMSE test errors in 7-day ahead forecast trained with reduced (40 % of)data. E-R= Emilia-Romagna.
Model Lombardy E-R LazioIeRNN 3.061e-04 4.324e-04 7.754e-05I-model 2.196e-03 1.167e-03 6.011e-04fcNN 2.224e-03 6.889e-04 1.851e-04
Table 5:
RMSE test errors in 3-day ahead forecast trained with 70 % of data. E-R=Emilia-Romagna.
Model Lombardy E-R LazioIeRNN 2.479e-04 3.668e-04 5.979e-05I-model 5.609e-04 1.724e-04 1.383e-04fcNN 8.165e-04 6.757e-04 1.689e-04
Table 6:
RMSE test errors in 3-day ahead forecast trained with reduced (40 % of)data. E-R= Emilia-Romagna.
Model Lombardy E-R LazioIeRNN 1.987e-04 3.256e-04 5.297e-05I-model 1.114e-03 7.337e-04 3.507e-04fcNN 8.611e-04 1.374e-03 5.290e-04
Table 7:
Average model training (tr) and inference (inf) times in seconds on MacbookPro with Intel i5 CPU. The first two columns are for 70 % training (tr70) data andthe last two columns are for 40 % training (tr40) data.
Model tr70 inf70 tr40 inf40IeRNN 0.58s 0.018s 0.51s 0.02sI-model 0.14s 0.004s 0.11s 0.004sfcNN 0.09s 0.003s 0.09s 0.003sARIMA 0.23s 0.014s 0.19s 0.015s .2 Multi-Day Ahead Forecast
In model training for multi-day ahead forecast, the training lossfunction is modified so that the model input comes from multipledays in the past. In 7-day ahead forecast, IeRNN leads the othertwo nonlinear models especially in the 40% training data case, byas much as a factor of 7 in Lombardy. In the 3-day ahead forecast,IeRNN leads fcNN by a factor of 4 in the 40% training data case, asmuch as a factor of 10 in Lazio. Figs. 10-13 show model comparisonin training and forecast phases for Lombardy and Lazio.
IeRNN (fcNN) has about 16400 (1800) parameters. The optimized( β , β , γ ) = (0 . , . , . We developed a novel spatiotemporal infectious disease model con-sisting of a discrete epidemic equation for the region of interest andRNNs for interactions with nearest geographic regions. Our modelcan be trained under 1 second. Its inference takes a fraction of asecond, suitable for real-time applications. Our model out-performstemporal models in one-day and multi-day ahead forecasts in limitedtraining data regime. In future work, we shall consider social andcontrol mechanisms [1, 7] to strengthen the I-equation, as well astraffic data to expand interaction beyond nearest neighbors.
The work was partially supported by NSF grants IIS-1632935, andDMS-1924548. JX would like to thank Prof. Fred Wan for helpfulcommunications on disease modeling. eferences [1] G. Albi, L. Pareschi, and M. Zanella. Control with uncertain data of so-cially structured compartmental epidemic models. arXiv preprint arXiv ,2004.13067v1:1–26, 2020.[2] R. Anderson and R. May.
Infectious Diseases of Humans: Dynamics andControl . Oxford University Press, Oxford, 1992.[3] H. Hethcote. The mathematics of infectious diseases.
SIAM Review , 42:599– 653, 2000.[4] S. Hochreiter and J. Schmidhuber. Long short-term memory.
Neural com-putation , 9(8):1735–1780, 1997.[5] Italian region.
Italian COVID-19 Data (accessed March-July, 2020) .https://github.com/Akaza994/COVID-19-Data/blob/master/covid-19/italy.csv.[6] Z. Li, X. Luo, B. Wang, A. Bertozzi, and J. Xin. A study on graph-structured recurrent neural networks and sparsification with application toepidemic forecasting. In
World Congress on Global Optimization , pages730–739. Springer, 2019, https://doi.org/10.1007/978-3-030-21803-4 80.[7] D. Morris, F Rossine, J. Plotkin, and S. Levin. Optimal, near-optimal, androbust epidemic control. arXiv preprint arXiv , 2004.02209v2:1–28, 2020.[8] K. Roosa, Y. Lee, R. Luo, A. Kirpich, R. Rothenberg, J.M. Hyman, P. Yan,and G. Chowell. Real-time forecasts of the COVID-19 epidemic in Chinafrom February 5th to February 24th, 2020.
Infectious Disease Modelling ,5:256 – 263, 2020.[9] B. Wang, X. Luo, F. Zhang, B. Yuan, A. Bertozzi, and P. Brantingham.Graph-based deep modeling and real time forecasting of sparse spatio-temporal data. arXiv preprint arXiv:1804.00684 , 2018.[10] B. Wang, P. Yin, A. Bertozzi, P. Brantingham, S. Osher, and J. Xin.Deep learning for real-time crime forecasting and its ternarization.
ChineseAnnals of Mathematics, Series B , 40(6):949–966, 2019.[11] S. Yang, M. Santillana, and S. Kou. Accurate estimation of influenza epi-demics using Google search data via ARGO.
Proceedings of the NationalAcademy of Sciences , 112(47):14473–14478, 2015. igure 6:
Training and 1-day ahead forecast of 4 models (IeRNN, fcNN, I-model,and ARIMA) in 4 rows respectively. (a)
Lombardy (b)
Lazio (c)
Lombardy (d)
Lazio (e)
Lombardy (f)
Lazio (g)
Lombardy (h)
Lazio igure 7:
Training and 1-day ahead forecast of 4 models with reduced (40%) trainingdata. The 4 rows are IeRNN, fcNN, I-model, and ARIMA respectively. (a)
Lombardy (b)
Lazio (c)
Lombardy (d)
Lazio (e)
Lombardy (f)
Lazio (g)
Lombardy (h)
Lazio igure 8:
IeRNN training and 1-day ahead forecast on four additional regions. (a)
Piemonte (b)
Campania (c)
Molise (d)
Umbria igure 9:
Visulization of the latent information I e in Fig. 8 learned by IeRNN. (a) Piemonte (b)
Campania (c)
Molise (d)
Umbria igure 10:
Training and 7-day ahead forecast of 3 models (IeRNN, fcNN, and I-model) in 3 rows respectively. (a)
Lombardy (b)
Lazio (c)
Lombardy (d)
Lazio (e)
Lombardy (f)
Lazio igure 11:
Training and 7-day ahead forecast of 3 models (IeRNN, fcNN, and I-model) with reduced (40%) training data in 3 rows respectively. (a)
Lombardy (b)
Lazio (c)
Lombardy (d)
Lazio (e)
Lombardy (f)(f)
Lombardy (f)(f)
Lazio igure 12:
Training and 3-day ahead forecast of 3 models (IeRNN, fcNN, and I-model) in 3 rows respectively. (a)
Lombardy (b)
Lazio (c)
Lombardy (d)
Lazio (e)
Lombardy (f)(f)
Lombardy (f)(f)
Lazio igure 13:
Training and 3-day ahead forecast of 3 models (IeRNN, fcNN, and I-model) with reduced (40 %) training data in 3 rows respectively. (a)
Lombardy (b)
Lazio (c)
Lombardy (d)
Lazio (e)
Lombardy (f)(f)