Covid-19 Modeling towards socioeconomic and health data from New South Wales (NSW) -- Australia: An approach via Geospatial Analysis and Geographically Weighted Poisson Regression (GWPR)
GEOSPATIAL ANALYSIS AND GEOGRAPHICALLY WEIGHTED POISSON REGRESSION (GWPR): CORONAVIRUS (COVID-19) OUTBREAKS MODELING IN NEW SOUTH WALES (NSW), AUSTRALIA
Francelino Antonio XAVIER CONCEICAO, School of Environment, Faculty of Science, University of Auckland, New Zealand
Abstract
Global regression and geographically weighted Poisson regression (GWPR) are used to model and investigate the relationships between Coronavirus (covid-19) outbreaks and socioeconomic data as well as pre-existing health conditions in New South Wales (NSW), Australia. Based on geospatial data analysis and a step-by-step procedure for building the GWR model, four independent variables are finally selected to investigate the relationships between the dependent variable and the independent variables. The calibrated GWPR model, with R² ranging between 45% and 73%, exhibits positive relationships between Coronavirus (covid-19) outbreaks and the total population, cancers, and people aged between 60 and 85 in most of NSW. Meanwhile, there is a negative relationship between Coronavirus (covid-19) and ischaemic heart disease. Finally, the model suggests that the relationships between the dependent variable and the independent variables are non-stationary, and therefore GWPR calibration plays an important role in geographic modelling at the local scale.
Keywords: Coronavirus (covid-19) outbreaks, Geographically Weighted Poisson Regression (GWPR), Geospatial analysis and modeling
Introduction
Coronavirus disease (covid-19) is “an infectious disease caused by a newly discovered coronavirus” (WHO, 2020). It was first reported from China on December 31st, 2019 (WHO, 2020), spread quickly throughout the world, and had resulted in a total of 8,525,042 cases with 456,973 deaths by June 20th, 2020. This includes 3,144 cases with 48 deaths reported from New South Wales, Australia (Department of Health, Australian Government, 2020). WHO (2020) states that elderly people and those with pre-existing medical conditions, such as cardiovascular disease, diabetes, chronic respiratory disease, and cancer, are more likely to develop serious illness.
In this study, the author statistically analyses spatial data as independent variables and geographically models the Coronavirus (covid-19) pandemic as the dependent variable to investigate their relationship by local government area (LGA) in New South Wales (NSW), Australia. Global regression and geographically weighted Poisson regression (GWPR) are used to model, globally and locally, the relationship between Coronavirus (covid-19) outbreaks and the socioeconomic data as well as the pre-existing medical conditions, particularly those that WHO declares susceptible to coronavirus outbreaks.
Materials & Methods
The state is adjacent to South Australia to the west, Queensland to the north, Victoria to the south, and the Tasman Sea to the east. The location of NSW is shown in figure 1, along with the total coronavirus (covid-19) cases for each LGA between January 25th, 2020 and April 24th, 2020.
Figure 1. Map of New South Wales with the total Coronavirus (covid-19) cases per local government area (LGA) by April 24th, 2020.
Table 1. List of dataset variables and their status (accepted or rejected) after the literature review and the multicollinearity check
Group | Variable | Description | Literature review | Multicollinearity check
Socioeconomic | SQKM | Square kilometres of area per local government area (LGA) | Accepted | Accepted
Socioeconomic | 0-19yr | People aged between 0 and 19 years | Accepted | Rejected
Socioeconomic | 20-39yr | People aged between 20 and 39 years | Accepted | Rejected
Socioeconomic | 40-59yr | People aged between 40 and 59 years | Accepted | Rejected
Socioeconomic | 60-85yr | People aged between 60 and 85 years | Accepted | Accepted
Socioeconomic | Lone_House | Total lone person households | Rejected | Rejected
Socioeconomic | Group_House | Total group households | Rejected | Rejected
Socioeconomic | Family_House | Total family households | Rejected | Rejected
Socioeconomic | Tot_households | Total households | Rejected | Rejected
Socioeconomic | Couple_Children | Total couple families with children under 15 and/or dependent students | Rejected | Rejected
Socioeconomic | Couple_NoDep | Total couple families with non-dependent children only | Rejected | Rejected
Socioeconomic | Couple_NoChild | Total couple families without children | Rejected | Rejected
Socioeconomic | OnePar_Children | Total one parent families with children under 15 and/or dependent students | Rejected | Rejected
Socioeconomic | onePar_NoDep | Total one parent families with non-dependent children only | Rejected | Rejected
Socioeconomic | Tot_families | Total families | Rejected | Rejected
Socioeconomic | Tot_mar_reg | Total married in a registered marriage | Rejected | Rejected
Socioeconomic | Tot_mar_defact | Total married in a de facto marriage | Rejected | Rejected
Socioeconomic | not_Marri | Total not married | Rejected | Rejected
Socioeconomic | Tot_married | Total married | Rejected | Rejected
Socioeconomic | never_Marri | Total never married | Rejected | Rejected
Socioeconomic | Tot_widowed | Total widowed | Rejected | Rejected
Socioeconomic | Tot_divorced | Total divorced | Rejected | Rejected
Socioeconomic | Tot_separated | Total separated | Rejected | Rejected
Socioeconomic | TotPop | Total population in 2016 | Accepted | Accepted
Health records | Inf&Par_Dis | Total cases of infectious and parasitic diseases in 2017 | Accepted | Rejected
Health records | Cancers | Total cases of cancers in 2017 | Accepted | Accepted
Health records | End-nut-met | Total cases of endocrine, nutritional and metabolic diseases in 2017 | Accepted | Rejected
Health records | Diabetes | Total cases of diabetes in 2017 | Accepted | Accepted
Health records | Mental | Total cases of mental health related conditions in 2017 | Accepted | Rejected
Health records | Mood-Disord | Total cases of mood affective disorders in 2017 | Accepted | Accepted
Health records | Nervous_syst | Total cases of nervous system diseases in 2017 | Accepted | Rejected
Health records | eye-adnexa | Total cases of eye and adnexa diseases in 2017 | Accepted | Rejected
Health records | ear-mastoid | Total cases of ear and mastoid process diseases in 2017 | Accepted | Rejected
Health records | circ_syst | Total cases of circulatory system diseases in 2017 | Accepted | Rejected
Health records | Ischaemic_heart | Total cases of ischaemic heart disease in 2017 | Accepted | Accepted
Health records | Heart_fail | Total cases of heart failure in 2017 | Accepted | Rejected
Health records | Stroke | Total cases of stroke in 2017 | Accepted | Accepted
Health records | Resp_syst | Total cases of respiratory system diseases in 2017 | Accepted | Accepted
Health records | Asthma | Total cases of asthma in 2017 | Accepted | Accepted
Health records | Obst_Pulmonary | Total cases of Chronic Obstructive Pulmonary Disease (COPD) in 2017 | Accepted | Accepted
Health records | digestive_syst | Total cases of digestive system diseases in 2017 | Accepted | Rejected
Health records | skin-subcunt | Total cases of skin and subcutaneous tissue diseases in 2017 | Accepted | Rejected
Health records | Musc_syst | Total cases of musculoskeletal system and connective tissue disease in 2017 | Accepted | Rejected
Health records | Gen_syst | Total cases of genitourinary system diseases in 2017 | Accepted | Rejected
Health records | Kidney | Total cases of chronic kidney disease in 2017 | Accepted | Accepted
Health records | Perinatal_cond | Total cases of certain conditions originating in the perinatal period in 2017 | Accepted | Rejected
Health records | Birth defects | Total cases of congenital malformations, deformations and chromosomal abnormalities in 2017 | Accepted | Rejected
Health records | Poison-others_ext | Total cases of injury, poisoning and other external causes in 2017 | Accepted | Rejected
Health records | %smoke_Pregn | Total percent smoking during pregnancy | Rejected | Rejected
Health records | Tot_Chil_Immun | Total children fully immunised | Rejected | Rejected
$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_n X_{in}$ (Equation 1)
where $Y_i$ is the dependent variable, in this case the total Coronavirus (covid-19) cases, measured at location $i$; $X_{i1}, \dots, X_{in}$ are the explanatory variables measured at the same location; and $\beta_0, \dots, \beta_n$ are the coefficients that describe how a change in X (independent variable) affects Y (dependent variable). For geographically weighted regression (GWR) modelling, the global model and its parameters are important as a benchmark against which the GWR counterpart is compared (Fotheringham et al., 2003). In the GWR4 software, the global model is generated simultaneously with the GWR calibration with similar input parameters.
2.3 Geographically Weighted Regression Linear (GWR)
Unlike the global model, geographically weighted linear regression is a non-stationary linear regression in which the coefficients are a function of spatial location (Fotheringham et al., 2003), so the estimated coefficients vary over space. The GWR formula expresses this variability over space as follows:
$Y_i = \beta_0(i) + \beta_1(i) X_{i1} + \beta_2(i) X_{i2} + \dots + \beta_n(i) X_{in}$ (Equation 2)
where the index $i$ is added to the coefficients of the standard global regression formula to express local variability over space. In this study, the step-by-step procedure for building the GWR model is adopted from (Fotheringham et al., 2003; Vanessa da Silva, 2015; Tenerelli et al., 2016; Mansley et al., 2015) and consists of the following six stages, in order.
2.3.1 Stage 1: Variable selection
The first stage in linear regression is to select appropriate potential explanatory variables based on the literature review, input from experts and common knowledge. In this stage, a total of 30 independent variables are selected from the 50 variables of the dataset listed in table 1. The 30 selected variables consist of 6 variables from the socioeconomic group and 24 variables from the health group.
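To make the location-specific coefficients of Equation 2 concrete, the following is a minimal sketch of a geographically weighted Poisson fit. It is not the GWR4 implementation: the Gaussian kernel, the fixed bandwidth, the log link and all function names (`gaussian_weights`, `weighted_poisson_fit`, `gwpr`) are assumptions chosen for illustration. For each location, every observation is weighted by its distance to that location and a weighted Poisson regression is solved by iteratively reweighted least squares.

```python
import numpy as np


def gaussian_weights(coords, i, bandwidth):
    """Kernel weight of every location relative to regression point i."""
    distances = np.linalg.norm(coords - coords[i], axis=1)
    return np.exp(-0.5 * (distances / bandwidth) ** 2)


def weighted_poisson_fit(X, y, weights, n_iter=100, tol=1e-10):
    """Fit a log-link Poisson regression by weighted IRLS; returns [b0, b1, ...]."""
    design = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(design.shape[1])
    # Start from an intercept-only model (assumed starting point).
    beta[0] = np.log(np.average(y, weights=weights) + 1e-9)
    for _ in range(n_iter):
        eta = design @ beta
        mu = np.exp(eta)
        working_response = eta + (y - mu) / mu
        irls_weights = weights * mu  # kernel weight times Poisson variance
        xtw = design.T * irls_weights
        new_beta = np.linalg.solve(xtw @ design, xtw @ working_response)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta


def gwpr(coords, X, y, bandwidth):
    """One local coefficient vector per location (rows follow the input order)."""
    return np.array([
        weighted_poisson_fit(X, y, gaussian_weights(coords, i, bandwidth))
        for i in range(len(y))
    ])
```

With a very large bandwidth all weights approach 1 and every local fit collapses to the global model, which is one way to see why the global model serves as the benchmark. A very small bandwidth leaves too few effective observations and the local system can become singular.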
2.3.2 Stage 2: Multicollinearity check
Multicollinearity between variables is assessed using Spearman's rank correlation with a threshold of 0.7 (Tenerelli et al., 2016). The aim is to filter out variables that introduce multicollinearity because they are highly correlated with other variables. At this stage, variables with a correlation >0.7 are excluded, except those considered susceptible to coronavirus outbreaks, such as elderly people, cardiovascular disease, diabetes, chronic respiratory disease, and cancers (WHO, 2020). The multicollinearity matrix is shown in table 2. A total of 18 variables are thus rejected, leaving 12 variables: 3 socioeconomic variables and 9 health variables.
Table 2. Multicollinearity matrix using Spearman's rank correlation (note: variables with correlation values >0.7 are excluded, except those considered pre-existing conditions for coronavirus outbreaks, as described in table 1).
SQKM TotPop Inf&Par_Dis Cancers End-nut-met Diabetes Mental Mood-Disord Nervous_syst eye-adnexa ear-mastoid circ_syst Ischaemic_heart Heart_fail Stroke Resp_syst Asthma Obst_Pulmonary digestive_syst skin-subcunt Musc_syst Gen_syst Kidney Perinatal_cond Birth defects Poison-others_ext 0-19yr 20-39yr 40-59yr 60-85yr
SQKM 1.0
TotPop -0.3 1.0
Inf&Par_Dis -0.3 1.0 1.0
Cancers -0.3 0.9 0.9 1.0
End-nut-met -0.3 1.0 1.0 1.0 1.0
Diabetes -0.3 0.9 1.0 0.9 0.9 1.0
Mental -0.3 1.0 0.9 0.9 0.9 0.9 1.0
Mood-Disord -0.2 0.5 0.6 0.5 0.5 0.5 0.5 1.0
Nervous_syst -0.3 1.0 0.9 1.0 1.0 0.9 1.0 0.5 1.0
eye-adnexa -0.3 0.9 0.9 1.0 0.9 0.9 0.9 0.4 0.9 1.0
ear-mastoid -0.3 0.9 0.9 1.0 1.0 0.9 0.9 0.5 1.0 0.9 1.0
circ_syst -0.3 0.9 0.9 1.0 1.0 0.9 0.9 0.5 1.0 1.0 1.0 1.0
Ischaemic_heart -0.3 0.9 0.9 1.0 1.0 0.9 0.9 0.5 0.9 0.9 0.9 1.0 1.0
Heart_fail -0.3 0.9 1.0 0.9 0.9 0.9 0.9 0.5 0.9 0.9 0.9 0.9 0.9 1.0
Stroke -0.3 0.9 0.9 1.0 0.9 0.9 0.9 0.4 0.9 1.0 0.9 1.0 0.9 0.9 1.0
Resp_syst -0.3 1.0 1.0 0.9 1.0 0.9 0.9 0.5 0.9 0.9 1.0 1.0 0.9 1.0 0.9 1.0
Asthma -0.2 0.7 0.7 0.7 0.7 0.7 0.7 0.5 0.7 0.6 0.7 0.7 0.7 0.7 0.7 0.7 1.0
Obst_Pulmonary -0.2 0.9 0.9 0.9 0.9 0.9 0.9 0.5 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.7 1.0
digestive_syst -0.3 1.0 1.0 1.0 1.0 0.9 1.0 0.5 1.0 0.9 1.0 1.0 1.0 1.0 0.9 1.0 0.7 0.9 1.0
skin-subcunt -0.3 1.0 1.0 0.9 1.0 0.9 0.9 0.5 1.0 0.9 1.0 0.9 0.9 0.9 0.9 1.0 0.7 0.9 1.0 1.0
Musc_syst -0.3 0.9 0.8 1.0 0.9 0.8 0.9 0.5 1.0 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.6 0.8 0.9 0.9 1.0
Gen_syst -0.3 1.0 1.0 0.9 1.0 0.9 0.9 0.5 0.9 0.9 0.9 0.9 0.9 1.0 0.9 1.0 0.7 0.9 1.0 1.0 0.9 1.0
Kidney -0.2 0.5 0.5 0.5 0.5 0.6 0.5 0.2 0.5 0.6 0.5 0.5 0.5 0.6 0.6 0.5 0.4 0.5 0.5 0.5 0.5 0.7 1.0
Perinatal_cond -0.2 0.9 0.9 0.7 0.8 0.9 0.8 0.6 0.8 0.7 0.8 0.8 0.8 0.8 0.7 0.9 0.8 0.8 0.9 0.9 0.7 0.9 0.4 1.0
Birth defects -0.3 0.9 0.9 0.8 0.9 0.9 0.9 0.6 0.9 0.8 0.9 0.9 0.9 0.9 0.8 0.9 0.8 0.8 0.9 0.9 0.8 0.9 0.5 1.0 1.0
Poison-others_ext -0.3 1.0 1.0 1.0 1.0 0.9 1.0 0.5 1.0 0.9 1.0 1.0 0.9 1.0 0.9 1.0 0.7 0.9 1.0 1.0 0.9 1.0 0.5 0.8 0.9 1.0
0-19yr -0.3 1.0 1.0 0.9 1.0 0.9 0.9 0.5 0.9 0.8 0.9 0.9 0.9 0.9 0.9 1.0 0.7 0.9 1.0 1.0 0.9 1.0 0.5 0.9 1.0 1.0 1.0
20-39yr -0.3 0.9 0.9 0.8 0.9 0.8 0.9 0.5 0.9 0.8 0.8 0.8 0.8 0.9 0.8 0.9 0.6 0.8 0.9 0.9 0.8 0.9 0.4 0.8 0.9 0.9 0.9 1.0
40-59yr -0.3 1.0 1.0 0.9 1.0 0.9 1.0 0.5 1.0 0.9 1.0 0.9 0.9 0.9 0.9 1.0 0.7 0.9 1.0 1.0 0.9 1.0 0.5 0.9 0.9 1.0 1.0 0.9 1.0
60-85yr -0.3 1.0 0.9 1.0 1.0 0.9 0.9 0.5 1.0 1.0 0.9 1.0 1.0 1.0 1.0 1.0 0.7 0.9 1.0 1.0 0.9 1.0 0.5 0.8 0.9 1.0 0.9 0.8 1.0 1.0
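The screening described in Stage 2 can be sketched in a few lines of plain Python. The paper does not state which member of a highly correlated pair is dropped, so the greedy rule below (keep the protected variables first, then keep a candidate only if its absolute Spearman correlation with every variable already kept is at most 0.7) is an assumption, as are the function names and the small toy columns.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks.

    Raises ZeroDivisionError if either column is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def screen(columns, threshold=0.7, protected=()):
    """Greedy screen (assumed rule): protected variables are always kept."""
    kept = [name for name in columns if name in protected]
    rejected = []
    for name in columns:
        if name in protected:
            continue
        if all(abs(spearman(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
        else:
            rejected.append(name)
    return kept, rejected


# Hypothetical toy data for six areas, not taken from the study.
toy = {
    "pop": [10, 20, 30, 40, 50, 60],
    "cancers": [1, 3, 2, 5, 6, 8],
    "mental": [2, 4, 6, 9, 10, 13],
    "area": [5, 1, 6, 2, 4, 3],
}
kept, rejected = screen(toy, protected=("pop", "cancers"))
```

In this toy run `mental` is rejected because it is perfectly rank-correlated with `pop`, while `cancers` survives despite its high correlation because it is marked as protected, mirroring the exception made for the WHO risk factors.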
Figure 2. Stepwise-AICc procedure for model quality optimisation (variables Mood-Disord, Obst_Pulmonary, SQKM, Diabetes and Stroke are excluded during model optimisation)
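As an illustration of the stepwise-AICc idea in figure 2, the sketch below computes the small-sample corrected AIC, AICc = AIC + 2k(k+1)/(n-k-1) with AIC = -2 ln L + 2k, and runs a simple backward elimination that drops a variable whenever doing so lowers the score. The direction of the search, the function names and the `score` callback (which would refit the model for a given subset and return its AICc) are assumptions; the procedure actually used in the study may differ.

```python
def aicc(log_likelihood, k, n):
    """Corrected Akaike information criterion for k parameters and n observations."""
    if n - k - 1 <= 0:
        raise ValueError("AICc needs n > k + 1")
    aic = -2.0 * log_likelihood + 2.0 * k
    return aic + (2.0 * k * (k + 1)) / (n - k - 1)


def backward_stepwise(variables, score):
    """Drop one variable at a time while doing so lowers score(subset)."""
    current = list(variables)
    best = score(current)
    improved = True
    while improved and len(current) > 1:
        improved = False
        trials = [
            (score([v for v in current if v != drop]), drop) for drop in current
        ]
        trial_score, drop = min(trials)
        if trial_score < best:
            best = trial_score
            current.remove(drop)
            improved = True
    return current, best
```

A lower AICc indicates a better trade-off between fit and complexity, so the loop stops as soon as no single removal improves the score.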
Table 3. Result of the spatial variability test (a positive “Diff of Criterion” suggests no spatial variability; a negative value suggests that the coefficient varies over space)
Variable | F | DOF for F test | Diff of Criterion
Intercept | 0.750099 | 0.554 120.811 | 0.863945
TotPop | 5.127172 | 0.209 120.811 | -0.652167
Resp_syst | 1.538115 | 0.313 120.811 | 0.224372
Cancers | 3.160455 | 0.132 120.811 | -0.134786
60-85yr | 8.089597 | 0.139 120.811 | -0.875815
Kidney | 0.618454 | 0.012 120.811 | 0.020069
Ischaemic_heart | 5.430100 | 0.074 120.811 | -0.254740
Asthma | 0.355863 | 0.085 120.811 | 0.169835
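The way Table 3 feeds into the choice of local terms can be sketched as follows. The values are copied from Table 3; the sign rule (a negative Diff of Criterion is read as evidence of spatial variability, a non-negative one as no evidence) follows our understanding of the GWR4 convention, and the function name is hypothetical.

```python
# Diff of Criterion values copied from Table 3.
diff_of_criterion = {
    "Intercept": 0.863945,
    "TotPop": -0.652167,
    "Resp_syst": 0.224372,
    "Cancers": -0.134786,
    "60-85yr": -0.875815,
    "Kidney": 0.020069,
    "Ischaemic_heart": -0.254740,
    "Asthma": 0.169835,
}


def split_terms(diffs):
    """Separate terms that appear to vary over space from those that do not."""
    local = [name for name, d in diffs.items() if d < 0]
    fixed = [name for name, d in diffs.items() if d >= 0]
    return local, fixed


local_terms, global_terms = split_terms(diff_of_criterion)
```

Under this reading the spatially varying explanatory variables are TotPop, Cancers, 60-85yr and Ischaemic_heart, which are the four variables carried into the GWPR calibration.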
Table 4. Global regression coefficient for each independent variable
Independent variable | Value of global regression coefficient βn
Intercept | β0 = 2.067951
TotPop | β1 = 0.000016
Cancers | β2 = 0.000369
60-85yr | β3 = -0.000013
Ischaemic_heart | β4 = -0.000639
Results
The global model has a goodness of fit (R²) of 64% with an AICc value of 2060. The global regression model suggests a positive relationship between total coronavirus (covid-19) cases and total population, total cancer cases, total kidney disease cases and total asthma cases, while the remaining variables have a negative relationship.
Table 5. Results of the global regression model, showing the independent variables along with their global coefficients, their significance, R² and AICc value
Variable | Estimate | Z-value
Intercept | 2.067951 | 62.750044
TotPop | 0.000016 | 24.295802
Cancers | 0.000369 | 13.438235
60-85yr | -0.000013 | -2.532562
Ischaemic_heart | -0.000639 | -6.940932
Resp_syst | -0.00047 | -9.340436
Kidney | 0.000028 | 0.673674
Asthma | 0.000703 | 5.041009
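The global estimates in Table 5 are small in absolute value because they are expressed per person or per case. The sketch below shows one way to read them, assuming the usual log link of a Poisson regression (the paper writes Equation 1 in linear form, so this is an assumption), in which exp(β·Δx) is the multiplicative change in the expected case count when a variable increases by Δx. It also flags the terms whose |Z| does not exceed 1.96. The helper names are hypothetical.

```python
from math import exp

# (estimate, z-value) pairs copied from Table 5.
global_model = {
    "Intercept": (2.067951, 62.750044),
    "TotPop": (0.000016, 24.295802),
    "Cancers": (0.000369, 13.438235),
    "60-85yr": (-0.000013, -2.532562),
    "Ischaemic_heart": (-0.000639, -6.940932),
    "Resp_syst": (-0.00047, -9.340436),
    "Kidney": (0.000028, 0.673674),
    "Asthma": (0.000703, 5.041009),
}


def not_significant(model, critical=1.96):
    """Names of terms whose |z| does not exceed the two-sided 5% critical value."""
    return [name for name, (_, z) in model.items() if abs(z) <= critical]


def rate_ratio(beta, delta):
    """Multiplicative change in the expected count when x rises by delta."""
    return exp(beta * delta)


weak_terms = not_significant(global_model)
ratio_per_10k_people = rate_ratio(global_model["TotPop"][0], 10_000)
```

Under the log-link assumption, 10,000 additional residents multiply the expected case count by exp(0.16), roughly 1.17, and Kidney is the only term that is not significant at the 5% level.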
The local coefficients of the independent variables are summarised statistically in table 6 along with the AICc and R², while the individual local coefficients and their significance (t-values) are mapped in figures 3 to 8, including the R² and standardised residual maps. Each map is discussed and interpreted in detail in section 4. The GWR model has a goodness of fit (R²) of 78% with an AICc value of 1264.
Table 6. Results of the GWR model, showing the independent variables along with the statistics of their local coefficients, R² and AICc value
Variable | Mean | STD | Min | Max
Intercept | 1.625614 | 1.360355 | -0.699802 | 3.013294
TotPop | 0.000003 | 0.000011 | -0.000036 | 0.000012
Cancers | 0.000808 | 0.000656 | 0.000399 | 0.002386
60-85yr | 0.000079 | 0.000169 | -0.000044 | 0.000442
Ischaemic_heart | -0.002685 | 0.003282 | -0.009771 | -0.000357
Figure 3. Map of variable Total population, parameter estimate (left) and t-value (right)
Figure 4. Map of variable Cancers, parameter estimate (left) and t-value (right)
Figure 5. Map of variable people with age between 60 and 85 years, parameter estimate (left) and t-value (right)
Figure 6. Map of variable Ischaemic heart disease, parameter estimate (left) and t-value (right)
Figure 7. Map of goodness of fit (pseudo R²)
Figure 8. Map of standardised global residual (left) and standardised local residual (right)
Discussion
The GWR calibration with the Poisson model improves model quality: the AICc decreases from 2060 in the global model to 1264 in GWR, and the goodness of fit (R²) increases from 64% in the global model to 78% in GWR. Figures 3 to 6 show maps of the local parameter estimates (left maps) and of their statistical significance at P<0.05 (right maps). t-values between -1.96 and 1.96 are not statistically significant and are shown in yellow in the t-value maps, so only the red and green areas of the t-value maps are statistically significant. Figure 3 illustrates a positive relationship between total population and coronavirus (covid-19) in the eastern side of NSW (orange and red in the left map), where the t-value is statistically significant (red in the right map). Similarly, figure 4 illustrates a positive relationship between cancers and coronavirus (covid-19) in the eastern side of NSW (green, yellow, orange and red in the left map), where the t-value is statistically significant (red in the right map). Both the global and the GWPR model suggest a positive relationship between these two variables and coronavirus (covid-19). Interestingly, the global model suggests a negative relationship between people aged between 60 and 85 years and coronavirus (covid-19), whereas the GWPR model in figure 5 reveals a positive relationship in most of NSW (red, orange, yellow and light green in the left map), except the far eastern side of NSW, where the relationship is negative (green). The t-value is statistically significant in most of the area (green and red in the right map).
Meanwhile, both the global model and the GWPR model in figure 6 suggest a negative relationship between ischaemic heart disease and coronavirus (covid-19) in the eastern side of NSW (green, light green, orange and red in the left map), where the t-value is statistically significant (green in the right map). The negative relationship between ischaemic heart disease and coronavirus (covid-19) observed in both models contradicts the initial hypothesis based on WHO (2020) described in section 1. However, the model coefficients are very small and close to zero.
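The reading of the t-value maps in figures 3 to 6 amounts to a three-way classification, sketched below. The ±1.96 cut-off and the colour meanings are taken from the discussion above; the function name and the example t-values are hypothetical and do not come from the study.

```python
def classify_t(t_value, critical=1.96):
    """Map a local t-value to the significance class used in the t-value maps."""
    if t_value > critical:
        return "positive, significant (red)"
    if t_value < -critical:
        return "negative, significant (green)"
    return "not significant (yellow)"


# Hypothetical local t-values for three areas.
example_t_values = {"LGA_A": 3.4, "LGA_B": -2.7, "LGA_C": 0.8}
classes = {name: classify_t(t) for name, t in example_t_values.items()}
```

Only the first two classes support an interpretation of the sign of the corresponding local coefficient.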
Figure 7 shows that the local model replicates variations in coronavirus (covid-19) cases very well in the eastern side of NSW, with R² between 52% and 73%, but performs less well in western NSW. Nevertheless, the goodness of fit (R²) in western NSW is not substantially poorer and remains consistently between 45% and 52%, and hence can be accepted. The standardised residuals for the global and local models (figure 8) clearly show that the residuals of the global model are spatially autocorrelated compared with those of the local model, which indicates that the relationships between the dependent variable coronavirus (covid-19) and the independent variables are indeed non-stationary. This result indicates that geographic variability should be taken into consideration and supports the use of GWPR calibration in this study.
Conclusions
Global regression and geographically weighted Poisson regression (GWPR) are used in this study to assess the relationship between Coronavirus (covid-19) and variables from the socioeconomic and health sectors by local government area (LGA) in New South Wales (NSW). Through the step-by-step procedure for building the GWR model, the 50 initial variables are filtered down to 4 variables for GWPR calibration. The GWPR results show that total population, cancers and people aged between 60 and 85 years are positively correlated with Coronavirus (covid-19). Meanwhile, ischaemic heart disease is negatively correlated with Coronavirus (covid-19), with a model coefficient very close to zero. The GWPR calibration improves the goodness of fit by 14 percentage points (from 64% to 78%), and the AICc value decreases from 2060 in the global model to 1264 in GWPR. The goodness-of-fit map indicates that the model performs well, with R² varying from 45% to 73% across NSW. The standardised residuals for the GWPR model indicate that the relationships between the dependent variable and the independent variables are non-stationary. Therefore, GWR calibration plays an important role in geographic modelling at the local scale.
References
Fotheringham, A. S., Brunsdon, C., & Charlton, M. (2003). Geographically weighted regression: The analysis of spatially varying relationships. John Wiley & Sons.
Fotheringham, A. S., Kelly, M. H., & Charlton, M. (2013). The demographic impacts of the Irish famine: Towards a greater geographical understanding. Transactions of the Institute of British Geographers, 38(2), 221-237.
Konishi, S., & Kitagawa, G. (2008). Information criteria and statistical modeling. Springer Science & Business Media.
Mansley, E., & Demšar, U. (2015). Space matters: Geographic variability of electoral turnout determinants in the 2012 London mayoral election. Electoral Studies, 40, 322-334.
Nakaya, T. (2016). GWR4.09 user manual. GWR 4 Development Team.
Tenerelli, P., Demšar, U., & Luque, S. (2016). Crowdsourcing indicators for cultural ecosystem services: A geographically weighted approach for mountain landscapes. Ecological Indicators, 64