Spatio-Temporal Multi-step Prediction of Influenza Outbreaks
Zhang et al.
METHODOLOGY
Jie Zhang, Kazumitsu Nawata and Hongyan Wu*
Correspondence: [email protected]
Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Avenue, Shenzhen University Town, 518055 Shenzhen, China
Full list of author information is available at the end of the article
Abstract
Background:
Flu circulates all over the world, and the worldwide infection places a substantial burden on people's health every year. Despite this worldwide circulation, most previous studies have focused on regional prediction of flu outbreaks. Methods that consider spatio-temporal correlation could help forecast flu outbreaks more precisely. Furthermore, forecasting flu outbreaks further ahead and understanding the flu infection trend more accurately could help hospitals, clinics, and pharmaceutical companies better prepare for annual flu outbreaks. Predicting a sequence of future values, namely multi-step prediction of flu outbreaks, therefore deserves attention. We thus highlight the importance of developing spatio-temporal methodologies to perform multi-step prediction of worldwide flu outbreaks.
Results:
We compared the mean absolute percentage errors (MAPEs) of support vector machine (SVM), random forest (RF), and long short-term memory (LSTM) models in predicting flu data 1-4 week(s) ahead with and without other countries' flu data. We found that the LSTM models achieved the lowest MAPEs in most cases. For countries in the Southern Hemisphere, the MAPEs of predicting flu data with other countries' data are higher than those without. For countries in the Northern Hemisphere, the MAPEs of predicting flu data 2-4 weeks ahead with other countries' data are lower than those without, whereas the MAPEs of predicting flu data 1 week ahead with other countries' data are higher than those without, except for the UK.
Conclusions:
In this study, we performed spatio-temporal multi-step prediction of influenza outbreaks. Taking spatio-temporal features into account improves the multi-step prediction of flu outbreaks.
Background
Influenza (flu for short) is an acute respiratory infection caused by influenza viruses. Flu circulates all over the world, and the worldwide infection places a substantial burden on people's health every year. According to a World Health Organization (WHO) report, flu is estimated to cause about 3 to 5 million cases of severe illness and about 290 000 to 650 000 deaths annually. Accurate forecasting of influenza outbreaks could help authorities take appropriate actions, such as school closures, to prevent or reduce flu illness.

Despite the worldwide circulation of flu, most previous studies have focused on regional prediction of flu outbreaks [1, 2, 3, 4], for two probable reasons. First, different locations within one country or region share, to some extent, similar geographic characteristics such as humidity and temperature, and the flu virus is sensitive to temperature and humidity. As a result, predicting flu outbreaks for one country or region is considered reasonable and tractable. Second, flu virus transmission is believed to occur mostly over relatively short distances. The virus usually spreads through the air from coughs or sneezes: when an infected person coughs or sneezes, droplets containing viruses (infectious droplets) are dispersed into the air, can travel up to one meter, and can infect people in close proximity who breathe them in.

However, these studies neglect one fast-growing risk group: travelers. Several changes in our globalizing world contribute to the growing influence of this group: (i) a steady increase in total travel volume worldwide, (ii) the advent of mass tourism, and (iii) increasing numbers of immunocompromised and elderly travelers. International sporting events and festivals, as well as travel by airplane or cruise ship, could facilitate flu virus transmission and therefore the global spread of influenza [5].
The study in [6] shows that flu outbreaks in countries around the world are correlated with one another. Methods that take this correlation into account could therefore help forecast flu outbreaks.

Furthermore, forecasting flu outbreaks further ahead and knowing their trend more accurately could help hospitals, clinics, and pharmaceutical companies better prepare for annual flu outbreaks. First, manufacturing flu vaccine is challenging. According to WHO, vaccination is the most effective way to prevent the disease. During the 2015-2016 flu season, flu vaccine prevented an estimated 5.1 million illnesses, 2.5 million medical visits, 71,000 hospitalizations, and 3,000 pneumonia and influenza (P&I) deaths. The problem is that the flu virus undergoes high mutation rates and frequent genetic reassortment (combination and rearrangement of genetic material), which complicates flu vaccine production. Each February, WHO assesses which strains of flu virus are most likely to circulate over the following winter. Vaccine manufacturers then produce flu vaccines in a very limited time, and the first batch is usually not available until September. As a result, manufacturers have to prepare enough vaccine within an extremely short period [7, 8]. Second, assigning beds to flu patients is another challenging task because of the limited capacity of hospital beds, the time dependence of bed request arrivals, and the unique treatment requirements of flu patients. Flu seasons vary in timing, severity, and duration from one season to another, so flu hospitalizations also vary greatly by site and time in each season [9]. Predicting a sequence of future values, namely multi-step prediction of flu outbreaks, therefore deserves attention, and we highlight the importance of developing global methodologies to perform multi-step prediction of worldwide influenza outbreaks.
Nonetheless, few past studies have focused on multi-step prediction of influenza outbreaks. A probable reason is that multi-step prediction usually yields poor accuracy because of hard-to-overcome problems such as error accumulation [10, 11]. One compromise is to aggregate the raw data into a larger time unit and then use single-step prediction in place of multi-step prediction. For instance, if the raw data are weekly, we can aggregate weekly values into monthly values and then perform single-step prediction of the total for the coming month (roughly four weeks). However, aggregation prevents us from seeing the variation within the coming four weeks.
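The aggregation compromise can be illustrated with a short Python sketch. It is not part of the authors' pipeline: the helper name `aggregate_weeks` is hypothetical, and a fixed four-week block is assumed as a stand-in for a calendar month. It shows why aggregation hides within-month variation, since a rising and a falling four-week pattern collapse to the same total.

```python
def aggregate_weeks(weekly, block=4):
    """Sum consecutive, non-overlapping blocks of `block` weeks.

    Hypothetical helper: a 4-week block stands in for a calendar month,
    and a trailing partial block is dropped.
    """
    n_full = len(weekly) // block * block
    return [sum(weekly[i:i + block]) for i in range(0, n_full, block)]


# Two very different four-week trajectories (toy numbers, not FluNet data).
rising = [10, 20, 30, 40]
falling = [40, 30, 20, 10]

# Both collapse to the same single "monthly" value, so a single-step forecast
# of that total cannot distinguish a growing outbreak from a declining one.
print(aggregate_weeks(rising), aggregate_weeks(falling))  # [100] [100]
```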
In this study, we performed multi-step prediction using Long Short-Term Memory (LSTM) networks. LSTM is a special kind of recurrent neural network (RNN). In theory, its layered structure of gated cells enables LSTM to learn long-term dependencies [12], approximate nonlinear functions, and model time series well [13].
Methods
As shown in Figure 1, to perform spatio-temporal flu prediction based on historical data, we first scraped the flu data of all 155 countries from FluNet, WHO's global web-based tool for flu virological surveillance. We selected 23 countries as features because the other countries have N/As (missing values) in their flu data. We then constructed spatio-temporal features and fed them into a model combining LSTM and fully connected layers. Finally, the model predicts the flu data 1, 2, 3, and 4 weeks ahead using other countries' flu data. For comparison, we also predicted the flu data 1, 2, 3, and 4 weeks ahead without other countries' flu data. The following subsections present the details.
Data acquisition
FluNet is a global web-based tool for flu virological surveillance [14]. Country-level data are available and updated weekly. From FluNet, we collected the flu data of 155 countries around the world from the 1st week of 2010 to the 18th week of 2018. We selected the 23 countries whose flu data contain no NAs. The 23 countries are Australia, Brazil, Cambodia, China, Egypt, French Guiana, Ghana, Indonesia, Iran, Iraq, Ireland, Japan, Netherlands, Nicaragua, Niger, Norway, Panama, Poland, Republic of Korea, Russia, United Kingdom of Great Britain and Northern Ireland (UK), and United States of America (USA).
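A minimal sketch of the country-selection step, assuming the FluNet download has already been parsed into a dictionary that maps each country name to its list of weekly counts, with missing entries stored as `None` or NaN. This data layout and the function names are assumptions for illustration; FluNet's actual export format is not reproduced here.

```python
import math


def has_no_missing(series):
    """True if no weekly value is missing (None or NaN)."""
    return all(
        v is not None and not (isinstance(v, float) and math.isnan(v))
        for v in series
    )


def select_complete_countries(flu_by_country):
    """Keep countries whose weekly series has no missing values (sorted by name).

    Assumed input layout: {country_name: [weekly counts, ...]}.
    """
    return sorted(c for c, s in flu_by_country.items() if has_no_missing(s))


# Toy data with hypothetical values, not FluNet records.
toy = {
    "Country A": [12, 15, 20, 18],
    "Country B": [3, None, 4, 5],
    "Country C": [7.0, float("nan"), 6.0, 5.0],
}
print(select_complete_countries(toy))  # ['Country A']
```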
Feature Selection
The features were selected or generated by considering the spatial and temporal influence on flu outbreaks.
Temporal factors
In temperate climates, flu outbreaks occur mainly during winter, while in tropical regions they occur throughout the year. Considering the possible one-year period of flu outbreaks, our previous studies compared the performance of time lags of 2, 4, 9, 13, 26, and 52 weeks and found that 52 weeks led to the best accuracy [ ? ]. Temperature changes could affect the flu virus and make people more prone to illness. Therefore, we constructed the temporal factors from three kinds of data: (i) the original data of the past 52 weeks; (ii) the first-order difference; and (iii) the mean, median, standard deviation (std), maximum, and minimum of windows with lengths of 1, 2, 3, 4, 9, 13, 26, and 52 weeks.
Spatial factors
Considering the global spread of influenza and the correlation between countries, we used the historical flu data of the other 22 above-mentioned countries as prediction features when predicting one country. Therefore, when we predict the flu outbreaks of one country, the other countries can affect the outcome through their learned weight parameters. In this way, we obtained another 1,144 (22 × 52) features.
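The temporal and spatial features described above can be sketched with NumPy. This is one plausible reading, not the authors' code: it assumes the first-order differences are taken within the same 52-week window, that each rolling statistic covers the most recent w weeks ending at week t, and that std is the population standard deviation (NumPy's default `ddof=0`). The function names are hypothetical.

```python
import numpy as np

LAGS = 52                                 # past 52 weeks: X_t, ..., X_{t-51}
WINDOWS = (1, 2, 3, 4, 9, 13, 26, 52)     # rolling-window lengths in weeks


def _window(series, t, lags=LAGS):
    """Return X_{t-lags+1}, ..., X_t (oldest first) as a float array."""
    if t < lags - 1 or t >= len(series):
        raise ValueError("t must leave 52 weeks of history inside the series")
    return np.asarray(series[t - lags + 1 : t + 1], dtype=float)


def temporal_features(series, t):
    """Temporal features of one country at week index t (hypothetical helper)."""
    past = _window(series, t)
    lags = past[::-1]                     # X_t, X_{t-1}, ..., X_{t-51}
    diffs = np.diff(past)                 # assumption: differences within the window (51 values)
    stats = []
    for w in WINDOWS:
        recent = past[-w:]                # assumption: the most recent w weeks, ending at X_t
        stats += [recent.mean(), np.median(recent), recent.std(), recent.max(), recent.min()]
    return np.concatenate([lags, diffs, np.asarray(stats)])


def spatial_features(other_series, t):
    """Past 52 weeks of every other country, concatenated (22 x 52 = 1,144 values)."""
    return np.concatenate([_window(s, t)[::-1] for s in other_series])


def feature_vector(target, others, t):
    return np.concatenate([temporal_features(target, t), spatial_features(others, t)])


# Toy example: a linear ramp for the target and 22 shifted ramps for the others.
weeks = np.arange(100, dtype=float)
others = [weeks + k for k in range(1, 23)]
print(feature_vector(weeks, others, t=60).shape)  # (1287,)
```

Under these assumptions, each target week yields 52 + 51 + 40 = 143 temporal values plus the 1,144 spatial values stated in the text.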
Multi-step Prediction
There are two types of prediction of flu outbreaks: (a) single-step prediction, which predicts the next value from observed past values; and (b) multi-step prediction, which predicts a sequence of future values from observed past values. Multi-step prediction obtained by applying (a) iteratively, feeding each predicted value back in as an input, tends to carry the errors of earlier steps into later predictions. In this study, we use multiple single-output prediction (MSOP) to implement multi-step prediction. MSOP predicts each of the coming values from the same past values. In other words, when predicting X_{t+p} (p ≥ 2), MSOP skips over X_{t+p-1}, X_{t+p-2}, ..., X_{t+1}: their predicted values are not used as inputs. Equation (1) formalizes MSOP, and its flow is shown in Figure 2.

\begin{aligned}
X_{t+1}^{(\mathrm{predicted})} &= \mathrm{LSTM\_MODEL}_1\left(\left[X_t^{(\mathrm{observed})}, X_{t-1}^{(\mathrm{observed})}, \ldots, X_{t-51}^{(\mathrm{observed})}\right]\right) \\
X_{t+2}^{(\mathrm{predicted})} &= \mathrm{LSTM\_MODEL}_2\left(\left[X_t^{(\mathrm{observed})}, X_{t-1}^{(\mathrm{observed})}, \ldots, X_{t-51}^{(\mathrm{observed})}\right]\right) \\
X_{t+3}^{(\mathrm{predicted})} &= \mathrm{LSTM\_MODEL}_3\left(\left[X_t^{(\mathrm{observed})}, X_{t-1}^{(\mathrm{observed})}, \ldots, X_{t-51}^{(\mathrm{observed})}\right]\right) \\
X_{t+4}^{(\mathrm{predicted})} &= \mathrm{LSTM\_MODEL}_4\left(\left[X_t^{(\mathrm{observed})}, X_{t-1}^{(\mathrm{observed})}, \ldots, X_{t-51}^{(\mathrm{observed})}\right]\right)
\end{aligned} \qquad (1)

As shown in Figure 2, to predict X_{t+1}, we train a model using X_t, X_{t-1}, X_{t-2}, ..., X_{t-51} as features. To predict X_{t+2}, we train another model, still using X_t, X_{t-1}, X_{t-2}, ..., X_{t-51} as features. Although the two models share the same feature space, they are trained separately with different responses (X_{t+1} and X_{t+2}). The study in [ ? ] showed that a three-layer LSTM is sufficiently effective for predicting flu outbreaks.
Metrics
Because some countries have small populations and report only one or two flu patients per week, results for those countries would not be meaningful. We therefore predicted the flu data of the coming weeks for countries with large populations, selecting Australia, Brazil, China, Japan, the UK, and the USA based on population and location.

We investigated the distribution of the flu data and found that it is not normally distributed. In our opinion, comparing the models' accuracy by the Mean Absolute Percentage Error (MAPE, Equation (2)) reflects differences based on the median, whereas comparing it by the Root Mean Square Error (RMSE) is based on means. Therefore, we used MAPE as the metric for comparing the prediction accuracy of the models.
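The MSOP scheme of Equation (1) and the MAPE of Equation (2) can be sketched together. To keep the sketch short and dependency-light, each per-horizon model is an ordinary least-squares regressor, not the paper's three-layer LSTM; that substitution, the synthetic sinusoidal series, and all function names are assumptions. What carries over is the structure: one separately fitted model per horizon h = 1..4, every model fed the same 52-week input window, and no predicted value fed back in as an input.

```python
import numpy as np

LAGS = 52
HORIZONS = (1, 2, 3, 4)


def make_dataset(series, h, lags=LAGS):
    """Features [X_t, ..., X_{t-lags+1}] and target X_{t+h}; features do not depend on h."""
    series = np.asarray(series, dtype=float)
    ts = range(lags - 1, len(series) - h)
    X = [series[t - lags + 1 : t + 1][::-1] for t in ts]
    y = [series[t + h] for t in ts]
    return np.array(X), np.array(y)


def fit_linear(X, y):
    """Least-squares stand-in for one LSTM_MODEL_h (weights plus a bias term)."""
    A = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def msop_fit(series):
    """MSOP: train one independent model per horizon on the same feature space."""
    return {h: fit_linear(*make_dataset(series, h)) for h in HORIZONS}


def msop_predict(models, series, t, lags=LAGS):
    """Predict X_{t+1}, ..., X_{t+4} from observed values only."""
    window = np.asarray(series, dtype=float)[t - lags + 1 : t + 1][::-1]
    return {h: float(window @ coef[:-1] + coef[-1]) for h, coef in models.items()}


def mape(forecast, actual):
    """Equation (2): mean of |(F_t - A_t) / A_t|; actual values must be non-zero."""
    f, a = np.asarray(forecast, dtype=float), np.asarray(actual, dtype=float)
    return float(np.mean(np.abs((f - a) / a)))


# Synthetic yearly-seasonal series (toy data, not FluNet): 300 weeks.
weeks = np.arange(300)
series = 100 + 50 * np.sin(2 * np.pi * weeks / 52)

train = series[:250]
models = msop_fit(train)
pred = msop_predict(models, train, t=249)   # forecasts for weeks 250..253
for h in HORIZONS:
    print(h, mape([pred[h]], [series[249 + h]]))
```

Because the toy series is a pure sinusoid, the linear stand-in can represent it almost exactly, so the per-horizon MAPEs are close to zero; on real flu data, each horizon's MAPE would be reported separately, as in Table 1.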
\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{F_t - A_t}{A_t}\right| \qquad (2)

where F_t is the forecast value and A_t the actual value in week t, and n is the number of forecast weeks.
Results
Table 1 presents the MAPEs of the RF, SVM, and LSTM models with and without other countries' flu data. For example, when forecasting China's flu data 1, 2, 3, and 4 weeks ahead, the MAPEs of the LSTM models with other countries' flu data are 13.1%, 19.8%, 26.7%, and 36.2%, while the MAPEs of the LSTM models without other countries' flu data are 12.5%, 20.2%, 29.0%, and 36.7%.

Figure 3 compares the MAPEs of the SVM, RF, and LSTM models in predicting flu data 1, 2, 3, and 4 weeks ahead with other countries' flu data. In most cases, the LSTM models achieved the lowest MAPEs.

Similarly, Figure 4 compares the MAPEs of the SVM, RF, and LSTM models in predicting flu data 1, 2, 3, and 4 weeks ahead without other countries' flu data. In most cases, the LSTM models achieved the lowest MAPEs.

Figure 5 compares the MAPEs of the LSTM models with and without other countries' flu data. For countries in the Southern Hemisphere, i.e., Australia and Brazil, the MAPEs of predicting flu data 1, 2, 3, and 4 weeks ahead with other countries' data are higher than those without. For countries in the Northern Hemisphere, i.e., China, Japan, the UK, and the USA, the MAPEs of predicting flu data 2-4 weeks ahead with other countries' data are lower than those without. Interestingly, when predicting flu data 1 week ahead, the MAPEs with other countries' data are usually higher than those without, except for the UK.
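The hemisphere-level pattern can be checked directly against the LSTM columns of Table 1. The short Python sketch below copies those values and labels each horizon; the dictionary layout and the function name are illustrative only. At the two-decimal precision of Table 1, a few entries tie (China at 1 and 2 weeks ahead, Japan at 3 weeks ahead), while the unrounded values quoted above for China show, for example, 13.1% versus 12.5% at 1 week ahead.

```python
# LSTM MAPEs for 1-4 weeks ahead, copied from Table 1:
# (with other countries' flu data, without other countries' flu data).
LSTM_MAPE = {
    "Australia": ([0.30, 0.30, 0.33, 0.39], [0.23, 0.25, 0.30, 0.32]),
    "Brazil":    ([0.29, 0.29, 0.36, 0.43], [0.23, 0.26, 0.31, 0.36]),
    "China":     ([0.13, 0.20, 0.27, 0.36], [0.13, 0.20, 0.29, 0.37]),
    "Japan":     ([0.31, 0.39, 0.43, 0.44], [0.28, 0.40, 0.43, 0.54]),
    "UK":        ([0.69, 0.92, 1.13, 0.83], [0.86, 0.95, 1.17, 1.19]),
    "USA":       ([0.18, 0.23, 0.24, 0.29], [0.15, 0.25, 0.29, 0.30]),
}


def effect_of_other_countries(with_others, without_others):
    """Label each horizon by whether adding other countries' data lowered the MAPE."""
    labels = []
    for w, wo in zip(with_others, without_others):
        labels.append("lower" if w < wo else "higher" if w > wo else "tie")
    return labels


for country, (with_others, without_others) in LSTM_MAPE.items():
    print(country, effect_of_other_countries(with_others, without_others))
```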
Discussion
We found that, in the Southern Hemisphere (Australia and Brazil), the MAPEs of predicting flu data with other countries' data are higher than those without. A probable reason is that Southern Hemisphere countries have entirely different flu seasons, since their winter falls in June, July, and August. Moreover, the countries selected in this study are mostly in the Northern Hemisphere, and their flu data are barely correlated with those of Southern Hemisphere countries. In addition, Australia is geographically isolated from other countries. In the Northern Hemisphere, the MAPEs of predicting flu data 2-4 weeks ahead with other countries' data are lower than those without, probably because of the high correlation among the flu data of Northern Hemisphere countries. However, the MAPEs of predicting flu data 1 week ahead with other countries' data are higher than those without. This is probably because, given the geographical distance, flu infection in other countries does not affect flu infection in the target country within one week.

The best MAPEs achieved by the LSTM models were still very high because we used the flu data of 2017-2018 as the test set. The 2017-2018 flu season, a pandemic-like season, differed greatly from, and was considerably more severe than, the past few seasons. Other machine learning methods, such as SVR and RF, resulted in even higher MAPEs. In this study, we used only historical values, which to some extent reflect all possible related factors. However, one might argue that other features, such as temperature and humidity, could improve accuracy, especially at turning points. For one thing, to predict future values with such features, we would have to rely on predicted inputs, e.g., weather forecasts, and their errors could substantially amplify the prediction error at later steps. For another, representing a single country's weather could be a further problem if the country has a large area and population.
One possible solution is to use two convolutional neural networks to extract country-wide weather and population features.
Conclusions
In this study, we performed spatio-temporal multi-step prediction of influenza outbreaks. Taking spatio-temporal features into account improves the multi-step prediction of flu outbreaks. We compared the MAPEs of SVM, RF, and LSTM models in predicting flu data 1-4 week(s) ahead with and without other countries' flu data, and found that the LSTM models achieved the lowest MAPEs in most cases. For countries in the Southern Hemisphere, the MAPEs of predicting flu data with other countries' data are higher than those without. For countries in the Northern Hemisphere, the MAPEs of predicting flu data 2-4 weeks ahead with other countries' data are lower than those without, whereas the MAPEs of predicting flu data 1 week ahead with other countries' data are higher than those without, except for the UK.
Competing interests
The authors declare that they have no competing interests.
Author details
Department of Technology Management for Innovation, The University of Tokyo, Hongo, 1138656 Tokyo, Japan.
Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Avenue, Shenzhen University Town, 518055 Shenzhen, China.
References
1. Wang, F., Wang, H., Xu, K., Raymond, R., Chon, J., Fuller, S., Debruyn, A.: Regional level influenza study with geo-tagged Twitter data. Journal of Medical Systems (8), 189 (2016)
2. Kane, M.J., Price, N., Scotch, M., Rabinowitz, P.: Comparison of ARIMA and random forest time series models for prediction of avian influenza H5N1 outbreaks. BMC Bioinformatics (1), 276 (2014)
3. Malik, M.R., Haq, Z.U., Saeed, Q., Riley, R., Khan, W.M.: Distressed setting and profound challenges: Pandemic influenza preparedness plans in the Eastern Mediterranean region. Journal of Infection and Public Health (2017)
4. Wu, H., Cai, Y., Wu, Y., Zhong, R., Li, Q., Zheng, J., Lin, D., Li, Y.: Time series analysis of weekly influenza-like illness rate using a one-year period of factors in random forest regression. Bioscience Trends (3), 292–296 (2017)
5. Goeijenbier, M., van Genderen, P., Ward, B., Wilder-Smith, A., Steffen, R., Osterhaus, A.: Travellers and influenza: risks and prevention. Journal of Travel Medicine (1) (2017)
6. He, D., Lui, R., Wang, L., Tse, C.K., Yang, L., Stone, L.: Global spatio-temporal patterns of influenza in the post-pandemic era. Scientific Reports, 11013 (2015)
7. Gerdil, C.: The annual production cycle for influenza vaccine. Vaccine (16), 1776–1779 (2003)
8. Lubeck, M.D., Schulman, J.L., Palese, P.: Antigenic variants of influenza viruses: marked differences in the frequencies of variants selected with different monoclonal antibodies. Virology (2), 458–462 (1980)
9. Puig-Barberà, J., Tormos, A., Sominina, A., Burtseva, E., Launay, O., Ciblak, M.A., Natividad-Sancho, A., Buigues-Vila, A., Martínez-Úbeda, S., Mahé, C.: First-year results of the Global Influenza Hospital Surveillance Network: 2012–2013 Northern Hemisphere influenza season. BMC Public Health (1), 564 (2014)
10. Zhang, L., Zhou, W.-D., Chang, P.-C., Yang, J.-W., Li, F.-Z.: Iterated time series prediction with multiple support vector regression models. Neurocomputing, 411–422 (2013)
11. Akhlaghi, S., Zhou, N.: Adaptive multi-step prediction based EKF to power system dynamic state estimation. In: Power and Energy Conference at Illinois (PECI), 2017 IEEE, pp. 1–8 (2017). IEEE
12. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Computation (8), 1735–1780 (1997)
13. Gers, F.A., Schmidhuber, J., Cummins, F.: Learning to forget: Continual prediction with LSTM (1999)
14. FluNet, World Health Organization.
Figures
Figure 1: The flow chart of our method.
The flow of our method is composed of four parts: acquire data; select countries; features; and build models.
Figure 2: Multi-step prediction.
Figure 3: The MAPEs considering spatio-temporal features.
The figure illustrates the MAPEs of predicting flu data 1, 2, 3, and 4 weeks ahead with other countries' flu data.
Figure 4: The MAPEs without considering spatio-temporal features.
The figure illustrates the MAPEs of predicting flu data 1, 2, 3, and 4 weeks ahead without other countries' flu data.
Figure 5: The result of predicting flu outbreaks using LSTM.
The figure illustrates the comparison of the MAPEs of predicting flu data 1, 2, 3, and 4 weeks ahead with other countries' flu data and those without other countries' flu data.
Tables
Table 1: The multi-step flu outbreak prediction considering the spatio-temporal features. The table presents the MAPEs of RF, SVM, and LSTM models with and without other countries' flu data.
                             SVM            RF             LSTM
Hemisphere  Country    Step  with  without  with  without  with  without
Southern    Australia  1     0.31  0.32     0.38  0.37     0.30  0.23
Southern    Australia  2     0.41  0.41     0.39  0.39     0.30  0.25
Southern    Australia  3     0.52  0.52     0.48  0.46     0.33  0.30
Southern    Australia  4     0.66  0.66     0.54  0.53     0.39  0.32
Southern    Brazil     1     0.28  0.28     0.30  0.33     0.29  0.23
Southern    Brazil     2     0.34  0.33     0.85  0.31     0.29  0.26
Southern    Brazil     3     0.42  0.36     0.70  0.30     0.36  0.31
Southern    Brazil     4     0.43  0.42     0.90  0.32     0.43  0.36
Northern    China      1     0.21  0.19     0.15  0.20     0.13  0.13
Northern    China      2     0.35  0.35     0.28  0.29     0.20  0.20
Northern    China      3     0.56  0.56     0.45  0.43     0.27  0.29
Northern    China      4     0.74  0.74     0.55  0.51     0.36  0.37
Northern    Japan      1     0.31  0.29     0.43  0.41     0.31  0.28
Northern    Japan      2     0.50  0.48     0.56  0.48     0.39  0.40
Northern    Japan      3     0.60  0.57     0.63  0.49     0.43  0.43
Northern    Japan      4     0.78  0.74     0.70  0.66     0.44  0.54
Northern    UK         1     1.28  1.31     1.46  1.22     0.69  0.86
Northern    UK         2     1.71  1.66     2.62  2.64     0.92  0.95
Northern    UK         3     3.10  2.89     3.51  3.08     1.13  1.17
Northern    UK         4     3.67  3.27     4.34  5.10     0.83  1.19
Northern    USA        1     0.17  0.18     0.14  0.14     0.18  0.15
Northern    USA        2     0.30  0.29     0.21  0.21     0.23  0.25
Northern    USA        3     0.45  0.43     0.27  0.24     0.24  0.29
Northern    USA        4     0.59  0.59     0.30  0.29     0.29  0.30
Step: number of weeks ahead. with / without: with / without other countries' flu data.
[Figure 1: The flow chart of our method. Acquire data: 155 countries, from the 1st week of 2009 to the 18th week of 2018. Select countries: countries without N/As (Australia, Brazil, Cambodia, China, Egypt, French Guiana, Ghana, Indonesia, Iran, Iraq, Ireland, Japan, Netherlands, Nicaragua, Niger, Norway, Panama, Poland, Republic of Korea, Russia, UK, USA). Features: max, min, mean, and median of rolling windows; first-order and second-order differences of rolling windows. Build models: three stacked LSTM layers (LSTM1, LSTM2, LSTM3) followed by three fully connected layers, giving the flu prediction 1, 2, 3, or 4 week(s) ahead.]
[Figure 2: Multi-step prediction. Multiple Single-Output Prediction (MSOP): the same input window of past observations (..., X_{t-1}, X_t) feeds LSTM_Model 1, LSTM_Model 2, LSTM_Model 3, and LSTM_Model 4, which output X_{t+1}, X_{t+2}, X_{t+3}, and X_{t+4}, respectively.]
[Figure 3: The MAPEs considering spatio-temporal features. Chart titled "MAPEs of predicting flu data of the 1-, 2-, 3-, and 4-week ahead with other countries' flu data"; series: MAPEs of RF, MAPEs of SVM, MAPEs of LSTM.]
[Figure 4: The MAPEs without considering spatio-temporal features. Chart titled "MAPEs of predicting flu data of the 1-, 2-, 3-, and 4-week ahead without other countries' flu data"; series: MAPEs of RF, MAPEs of SVM, MAPEs of LSTM.]
[Figure 5: Comparison of MAPEs of predicting flu data of the 1-, 2-, 3-, and 4-week ahead with other countries' flu data and those without other countries' flu data, for AUS, BRA, CHN, JPN, UK, and USA; series: MAPE with other countries, MAPE without other countries.]