Analysis of the Effectiveness of Face-Coverings on the Death Rate of COVID-19 Using Machine Learning

Abstract

The recent outbreak of COVID-19 shocked humanity, leading to the death of millions of people worldwide. To stave off the spread of the virus, authorities in the US employed different strategies, including the mask mandate (MM) orders issued by state governors. Although most previous studies indicate that MM can be effective in hindering the spread of viral infections, the effectiveness of MM in reducing the degree of exposure to the virus and, consequently, death rates remains indeterminate. Indeed, the extent to which the degree of exposure to COVID-19 contributes to the lethality of the virus remains unclear. In the current work, we define a parameter called the average death ratio as the monthly average of the ratio of the number of daily deaths to the total number of daily cases. We use survey data provided by the New York Times to quantify people's adherence to the MM order. Additionally, we implicitly account for the extent to which people abide by the MM order, which may depend on parameters such as population, income, and political inclination. Using different machine learning classification algorithms, we investigate how the decrease or increase in death ratio for counties on the US West Coast correlates with the input parameters. Our results show a promising test accuracy as high as 0.94 with algorithms such as XGBoost, Random Forest, and Naive Bayes. To verify the model, the best-performing algorithms were then applied to other states (Arizona, New Jersey, New York, and Texas) as test cases. The findings show an acceptable trend, further supporting the usability of the chosen features for prediction of similar cases.


Ali Lafzi, Miad Boodaghi, Siavash Zamani, and Niyousha Mohammadshafie

Department of Agricultural and Biological Engineering, Purdue University, Indiana, USA
School of Mechanical Engineering, Purdue University, Indiana, USA
Department of Civil and Environmental Engineering, University of Pittsburgh, Pennsylvania, USA

Introduction

The recent COVID-19 pandemic has affected millions of people worldwide and led to the tragic loss of many lives.
The lack of a reliable treatment at the beginning of the pandemic traumatized the populace, and the only solutions were limited to preventive actions such as wearing face coverings, maintaining social distancing, washing hands, and self-quarantine. Owing to the high transmission rate, in the US alone the number of new daily cases increased from 6 to 22562 during March 2020, according to the CDC (Centers for Disease Control and Prevention) [1]. There is still extensive ongoing research on the factors that affect the pace of this spread; so far, scientists have identified meteorological factors such as temperature, wind speed, precipitation, and humidity as some of the important environmental parameters in this regard [2]. However, most of the parameters involved in the spread of COVID-19 are out of our control. As a result, state officials began to impose legislative guidelines, including mandatory use of masks and closure of businesses such as bars and restaurants. Shutting down different businesses has been sporadic due to its adverse economic impact, but the obligatory face-covering order is still in effect across the US. In this respect, the effectiveness of facial masks gains further importance and requires scientific study.

Presenting a model that can measure the effectiveness of mask mandate orders can pave the way for governments to take decisive action during pandemics. Experimental data in tandem with mathematical modeling can be used to study the effects of facial coverings on the spread of viral infections. A plethora of previous publications have tried to address the effectiveness of nonpharmaceutical interventions (NPIs) during pandemics, particularly for the spread of influenza [3, 4]. Deterministic models have been widely used to study the effects of facial masks on the reproduction number R. In these models, the face mask is taken into account through its role in reducing the transmission per contact [5].
The results of the deterministic model indicated that public use of face masks delays an influenza pandemic. On the other hand, some studies suggest that the use of a face mask does not have a substantial effect on influenza transmission and that there is little evidence in favor of the effectiveness of facial masks [6, 7]. As for COVID-19, the efficacy of the facial mask in impeding the infectivity of SARS-CoV-2 remains unclear. Having considered the effect of masks on the reproduction number R, Li et al. [8] claimed that wearing face masks alongside social distancing can flatten the epidemic curve. Other studies also indicated that public use of facial masks may contribute to reducing the spread of COVID-19 [9]. Despite these findings, the efficacy of face masks remains controversial.

The cardinal point that has not garnered enough attention is the relationship between the degree of exposure to the virus and its mortality rate. The idea that the severity of symptoms correlates with the extent of exposure to COVID-19 was presented by some researchers to explain the high death rate among healthcare workers [10]. Unfortunately, there is no universal trend that can predict the relationship between the dose of a virus and the severity of the resulting symptoms. A study of the relationship between influenza and rhinovirus viral load and disease severity in upper respiratory tract infections reported different behavior for those viruses [11]. The results indicated that for influenza A and rhinovirus, viral loads were not associated with hospitalization/ICU admission. On the other hand, for influenza B, viral load was higher in hospitalized/ICU patients. Furthermore, for respiratory syncytial virus (RSV), viral load seems to correlate with the severity of symptoms, as many studies in the literature suggest [12–14]. The same controversy holds for COVID-19.
Recently, some studies have investigated the relationship between the severity of COVID-19 and its viral load and found that the load correlates tightly with severity [15, 16]. However, another study suggests that no such correlation exists [17]. Establishing whether COVID-19 viral load is related to disease severity requires an in-depth study, which would involve infecting volunteers with controlled doses of the virus and monitoring their symptoms. However, experimental difficulties, in addition to the ethics of such experiments, make this type of study very challenging at this point [10]. Although studies have not converged on whether the nose [18] or the mouth [19] is the primary site of COVID-19 infection, they underscore the importance of wearing a facial mask as a barrier to the spread of the virus. Additionally, although the protection levels of different types of masks differ, wearing any mask, even a cloth mask, is better than wearing nothing at all and can play a role in protection from exposure to COVID-19 [20, 21]. Given the challenges of experimental studies on the relationship between the extent of exposure and the severity of COVID-19, one way to study whether the extent to which an individual is exposed to COVID-19 correlates with the severity of symptoms is to introduce a model that can capture changes in the mortality rate due to wearing a facial mask. Indeed, if the ratio of the number of deaths to the number of cases decreases, this can support the hypothesis that there is a correlation between viral load and the severity of symptoms. Thus, studying the effects of the MM order on the mortality rate gains extra importance.

An ML analysis can be very useful to shed light on the possible correlation between the public use of masks and changes in the mortality rate.
The success of Machine Learning (ML) and Artificial Intelligence (AI) techniques in the previous pandemic has convinced researchers to use them as valuable tools in fighting the current outbreak [22]. ML and AI can be used for prediction and forecasting in different regions so that the corresponding health officials can take essential actions in advance [22]. In addition, this technology is capable of enhancing the prediction accuracy for screening both infectious and non-infectious diseases [23]. Six ML methods have been used to forecast the total number of confirmed COVID-19 cases 1, 3, and 6 days ahead, with error ranges of 0.87%–3.51%, 1.02%–5.63%, and 0.95%–6.90%, respectively, in 10 Brazilian states [24]. Moreover, an XGBoost model was capable of identifying 3 important biomarkers from 485 blood samples in Wuhan, China as the key mortality parameters [25]. ML algorithms have also been used to capture the correlation between weather data and COVID-19 mortality and transmission rates [26, 27]. Additionally, ML has been used to study the effects of the MM order on the number of daily cases, where no statistically significant difference was observed in the number of daily cases in a state-wise analysis [28]. These studies support the use of ML as a tool to investigate the effects of the MM order on the mortality rates of COVID-19.

Another important factor regarding the effectiveness of the MM order is society's adherence to the regulations. One study that tried to quantify public compliance with COVID-19 public health recommendations found notable regional differences in intent to follow health guidelines [29]. Some studies noticed a correlation between level of education and intent to voluntarily adhere to social distancing guidelines [29, 30]. However, not only the level of education but also the level of income, race, and political orientation can play a role in adherence to the regulations [31].
Based on these findings, it is important to take into account the features that might be correlated with people's compliance with the MM order. Additionally, we use data from a survey provided by the New York Times, available on GitHub, which quantifies people's adherence to the MM order [32]. As a result, in this study, we include factors that might play a role in people's adherence to the MM order as our input features.

In the proposed work, using different ML classification algorithms, we aim to unveil how the change in the mortality rate correlates with certain features. The features are chosen so that they reflect adherence to the MM order in different counties. We use the data provided by the CDC to find the average monthly number of COVID-19 cases. Additionally, the exact dates of the executive orders signed by state officials are available for each state. To obtain appropriate unbiased data, similar to what Maloney et al. [28] did in their study of the effect of the mask mandate, we use the data for one month before and one month after the executive order for the three states on the US West Coast. With this data selection method, we limit the geographical region of the study so that changes in the cases can more plausibly be attributed to the public use of masks rather than to other factors such as environmental changes.

As a verification of the proposed work, the best-performing algorithms, with their tuned hyper-parameters, are further tested on four additional states (Arizona, New Jersey, New York, and Texas). The findings demonstrate acceptable accuracy scores, which supports the correlation of the chosen features with the effect of COVID-19.

The rest of the paper is organized as follows. First, we present how our data was collected and arranged. Then we explain the ML methods used for our prediction. Finally, we present and compare the results obtained from the different ML methods.

Methodology

In this section, we explain the collected data and the ML algorithms used for training and prediction.

We defined the parameter of interest as the average ratio of the number of deaths to the total number of cases, referred to as the death ratio, which can be interpreted as a measure of the severity of the disease. The effective dates of the governors' executive orders requiring masks in all counties of the three West Coast states of California, Oregon, and Washington are publicly available [33]. We used the average death ratio one month before and one month after the order to study the mortality rate. The rationale behind this selection is to minimize the effects of other factors that might play a role in changing the COVID-19 data. The raw dataset of daily cases and deaths for all US counties over time is extracted from the USAFacts website [34], where county-level data is confirmed directly by state and local agencies. After obtaining the daily numbers of deaths and cases for the month before and the month after the MM order, we divided the monthly average number of deaths by the monthly average number of cases for each county. Then we found the difference between the death ratios for the month after and the month before the MM order. Finally, we categorized the variation based on its sign to indicate whether the death ratio increases, decreases, or does not change. Out of the 130 samples, 47, 30, and 53 belong to the "decrease", "increase", and "no change" classes, respectively. We dropped the "no change" data, as they all correspond to small counties with zero reported COVID-19 cases and deaths, leaving 77 counties in total. Consequently, the two categories of increase (denoted by class 0) and decrease (denoted by class 1) remain for the prediction task. A histogram of the output classes is shown in Fig. 1, which indicates that the data is not strongly biased toward either class.
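The labeling procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the 0.0 fallback for counties without reported cases, and the toy daily counts are all assumptions made for the example.

```python
def death_ratio(daily_deaths, daily_cases):
    """Monthly average of deaths divided by monthly average of cases.

    Assumption: a county with no reported cases gets a ratio of 0.0,
    so that it ends up in the "no change" class.
    """
    avg_deaths = sum(daily_deaths) / len(daily_deaths)
    avg_cases = sum(daily_cases) / len(daily_cases)
    return avg_deaths / avg_cases if avg_cases > 0 else 0.0


def label_change(ratio_before, ratio_after):
    """Classify a county by the sign of the change in death ratio."""
    diff = ratio_after - ratio_before
    if diff > 0:
        return "increase"   # class 0 in the text
    if diff < 0:
        return "decrease"   # class 1 in the text
    return "no change"      # dropped before training


# Toy daily counts (hypothetical; four days stand in for a full month).
before = death_ratio([1, 1, 0, 2], [10, 20, 30, 40])
after = death_ratio([0, 1, 0, 1], [25, 25, 25, 25])
label = label_change(before, after)
```

In this toy case the ratio falls between the two periods, so the county would be assigned to the "decrease" class.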
Figure 1: Histogram of change in death ratio for the three states

Since it is not known exactly what percentage of the population follows the MM order and uses face coverings, it is necessary to come up with features that can capture how likely an individual is to follow the recommended practice. To bridge this gap, four main features are chosen as primary indicators:

1. County Population
2. Median Household Income
3. Political Inclination
4. Mask Usage based on New York Times Survey

The population of each county is obtained from the most recent surveys for the year 2019. The income level is the median household income in each county in the years 2015-2019. The raw data for these features is obtained from the US Census website [35]. The US Census measures the median income as the regular income received, excluding other payments such as tax [36]. The data for political inclination is constructed from the 2020 US presidential election results [37]. This feature has been converted to a categorical type in a vectorized manner, i.e. the winner takes the value of 1 in its column, and the opponent takes 0 in its own. Furthermore, we used survey data provided by the New York Times that quantifies mask usage from 7/2/2020 to 7/14/2020 [32]. Since the survey period lies within the month after the MM order for all three studied states, it is valid to use its data for our purpose. Finally, we try to establish an AI-based relationship between the features and the death ratios of the Pacific Coast states at the county level using 9 different classification algorithms, described in section 2.2.
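The encoding of political inclination and the assembly of the input columns can be illustrated as follows. This is a hedged sketch: the dictionary keys, the helper names `encode_winner` and `build_features`, and the example county are hypothetical and are not taken from the source data files.

```python
def encode_winner(winner):
    """Return (Dem, Rep) indicator values: 1 for the winning party, 0 for the other."""
    if winner == "democratic":
        return (1, 0)
    if winner == "republican":
        return (0, 1)
    raise ValueError(f"unexpected party: {winner!r}")


def build_features(county):
    """Assemble the nine input columns (P, MI, Dem, Rep, N, R, S, F, A) for one county."""
    dem, rep = encode_winner(county["winner"])
    mask = county["mask_usage"]  # survey shares for "never" ... "always"
    return [
        county["population"], county["median_income"], dem, rep,
        mask["never"], mask["rarely"], mask["sometimes"],
        mask["frequently"], mask["always"],
    ]


# Hypothetical county record; every value is invented for illustration.
example = {
    "population": 250_000,
    "median_income": 72_000,
    "winner": "democratic",
    "mask_usage": {"never": 0.05, "rarely": 0.05, "sometimes": 0.10,
                   "frequently": 0.20, "always": 0.60},
}
row = build_features(example)
```

Using two indicator columns mirrors the "winner takes 1, opponent takes 0" description; with only two parties the columns are redundant, but they match the Dem and Rep columns reported in Table 1.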

In this study, we have developed machine learning models that correlate the features described in section 2.1 with the change in death ratio, with the aim of shedding light on the relationship between adherence to the mask mandate and the mortality rate.

The classic ML methods of Logistic Regression [38] and the Naive Bayes classifier [39] are used. In addition, the ensemble learning-based models Random Forest and Extra Trees are analyzed [40]. Moreover, the extreme gradient boosting method, XGBoost, is explored [41]. Other methods, namely Support Vector Machine, K-Nearest Neighbors [42], Decision Trees [43], and Neural Network [44], are additionally used for prediction of the effect of the mask mandate on the mortality rate.

For the analysis, the data is split into training and test sets, with a test size of 20%. A k-fold cross-validation scheme with 5 folds is used to evaluate the performance of each method on the validation set and to tune its hyper-parameters, with classification accuracy as the metric. The hyper-parameter tuning is done using either grid search or random search for all the methods. A statistical summary of the final dataset for the binary classification is outlined in Table 1, which indicates a large difference between the orders of magnitude of the features. Therefore, min-max and max-abs scaling have been used to transform the input features and the output, respectively, before passing the data to the ML algorithms for training.

Table 1: Statistical summary of the final dataset before scaling. Columns are P: population, MI: median income, Dem: voted Democratic, Rep: voted Republican. Mask usage - N: never, R: rarely, S: sometimes, F: frequently, A: always. DR: change in death ratio between one month before and after the corresponding MM order date.

                                  Mask Usage
        P    MI   Dem  Rep   N    R    S    F    A    DR(%)
Count   77   77   77   77    77   77   77   77   77   77
Mean
Std
Min
Max
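The two scaling steps mentioned before Table 1 are simple to state explicitly. The sketch below writes them out in plain Python for a single column; it assumes the usual definitions of min-max and max-abs scaling (the ones implemented, for example, by scikit-learn's `MinMaxScaler` and `MaxAbsScaler`), and the sample values are invented.

```python
def min_max_scale(values):
    """Map values linearly onto [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


def max_abs_scale(values):
    """Divide by the largest absolute value; the sign of each entry is kept."""
    biggest = max(abs(x) for x in values)
    if biggest == 0:
        return [0.0 for _ in values]
    return [x / biggest for x in values]


# Hypothetical columns with very different orders of magnitude.
populations = [2_000, 50_000, 1_000_000]
delta_dr = [-10.0, 2.5, 5.0]          # change in death ratio, in percent

scaled_pop = min_max_scale(populations)
scaled_dr = max_abs_scale(delta_dr)
```

After scaling, both columns lie within [-1, 1], so no single feature dominates purely because of its units.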

The change in death ratio from one month before to one month after the date of mandating face coverings in the three states is visualized for each county in Fig. 2. Two clusters of increase in death ratio can be seen, one near northern Washington and one near central California. Our first intuition was that a larger population increases the chance of viral spread, and we therefore expected to see a positive change in death ratio for more populated counties. However, as can be seen from the map, there is an inherent randomness that defies our initial intuition about the spread mechanism. Further, more counties experienced a decrease in death ratio one month after face covering was mandated by each state, as shown in Fig. 1. Therefore, usage of face coverings is chosen as the main factor affecting the decrease in death ratio. As explained previously, to quantify adherence to the mask mandate, other auxiliary features are chosen, namely population, median income, and political inclination for each county.

As a preliminary analysis, political inclination, based on the result of the 2020 presidential election, is chosen as the focal criterion to categorize the data for changes in death ratio for all three states, as presented in Fig. 3. Fig. 3(a) shows that, in general, communities that voted Republican in the 2020 presidential election were affected worse than Democratic counties. Further, a noticeable correlation is observed between average median income and the change in death ratio, presented in Fig. 3(b). On average, the communities with lower median income experienced a positive change in death ratio, meaning a higher mortality rate, regardless of their political inclination. The strongest correlation, however, is observed for county population, shown in Fig. 3(c). The counties with fewer residents were affected more adversely by the pandemic than high-population counties.
The counterintuitive relation between population and change in death ratio further corroborates the necessity of including the two other supplementary features.

Figure 2: Change in death ratio in the counties of the US West Coast states

To have an initial assessment of the variation of the percent change in death ratio, we plotted it as a function of population, median income, and the percentage of the population that frequently uses masks, which has a relatively high correlation coefficient. Fig. 4(a)-(c) shows no detectable pattern between the parameters of interest and the death ratio. As a result, it is not possible to predict the value of the change in death ratio using regression. On the other hand, as we will show, converting the changes to the categories of increase and decrease paves the way for capturing the status of change. A summary of the overall death rates in the months before and after the mask mandate order for the 3 states is presented in Table 2. It can be observed that the death rate decreases substantially in California and Washington but increases slightly in Oregon. This suggests an intrinsically complex pattern between the death rate as the desired output and the selected inputs. According to a recent study, a number of factors contribute to whether a person follows the health guidelines set by state officials [31]. Three of these parameters, plus mask usage as the fourth feature, have been used to conduct the current study.

Figure 3: Visualization of the combined data for California, Oregon, and Washington. Change in death ratio and (a) number of counties, (b) median income, (c) average population, based on political inclination.

Table 2: Total death rates in the month before and after the corresponding date of the mandatory face coverings executive order in each state

State        1 month before MM order   1 month after MM order   Change (%)
California   63.13                     32.67                    -48
Washington   28.16                     21.15                    -25
Oregon       38.03                     39.14                    +3

Figure 4: Scatter plot of the percent change in the death ratio as a function of (a) population, (b) median income, and (c) portion of people frequently using masks.

Using the obtained data, the combined effect of the features on the death ratio is analyzed. Then the performance of each algorithm is evaluated on the training and test sets. The relationship between each feature and the change in death ratio is visualized by the correlation heatmap provided in Fig. 5. Each entry of the correlation matrix indicates how strongly the corresponding feature is correlated with the change in death ratio. A more negative value implies that an increase in that feature is associated with a decrease in the change in death ratio. For instance, increases in population, median income, and votes for the Democratic party are associated with a decrease in the change in death ratio. On the other hand, the positive correlation for Republican votes corresponds to a higher chance of an increase in death ratio. An interesting observation is the disordered correlation pattern for mask usage. As one expects, the shares of people who never or rarely use masks are positively correlated with the change in death ratio. However, the share of people who frequently use masks also shows a positive correlation. Such erratic correlation behavior necessitates inclusion of other features in the analysis.
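To make the sign convention of the correlation values concrete, the snippet below computes a correlation coefficient between one feature and the change in death ratio. It is an illustration only: it assumes the heatmap reports Pearson coefficients (the source does not name the coefficient), and the four data points are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical counties: higher income paired with a falling death ratio.
median_income = [40_000, 60_000, 80_000, 100_000]
delta_dr = [3.0, 1.0, -2.0, -4.0]     # change in death ratio, in percent

r = pearson(median_income, delta_dr)  # strongly negative for this toy data
```

A negative coefficient, as in this toy case, means that counties with larger values of the feature tend to show a smaller (more negative) change in death ratio.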

[Figure 5 heatmap, correlation of ΔDR with each variable: Population -0.16, Median income -0.17, Democrat -0.23, Republican 0.23, Never 0.023, Rarely 0.13, Sometimes -0.0042, Frequently 0.2, Always -0.15, ΔDR 1]

Figure 5: Correlations between the features and the output

The implemented algorithms predict whether a county has experienced a decrease or an increase in its death ratio. As shown in Table 3, most of the algorithms have relatively high accuracy scores on both the training and test sets. The lowest accuracy comes from the neural network, with a score of 63% on the test set. This could be a result of the small data set; in general, the accuracy of a neural network improves as more training data is provided, and in our case we were limited by the existing data. Despite the lack of sufficient training data, Naive Bayes, Random Forest, and XGBoost reach a test accuracy of 94%. The selected hyper-parameters for the XGBoost and Random Forest classifiers are shown in Table 4. Random search was used to tune the hyper-parameters for XGBoost, and grid search was used for Random Forest. Naive Bayes does not have any important hyper-parameter, which helps it generalize well. Random Forest and XGBoost also have a reputation for rarely over-fitting the data. These reasons could be why these three algorithms have outperformed the others.

Table 3: Train and test accuracies for all the studied algorithms.

Algorithm                Train Accuracy   Test Accuracy
Support Vector Machine   0.82             0.81
Decision Tree            1.00             0.81
KNN                      0.74             0.69
Logistic Regression      0.79             0.75
Neural Net               0.75             0.63
Extra Trees              0.93             0.81
Naive Bayes              0.7              0.94
Random Forest            1.00             0.94
XGBoost                  0.98             0.94

Table 4: Model parameters for XGBoost and Random Forest. Columns of XGBoost - CSbT: column sample by tree, G: gamma, LR: learning rate, MD: max depth, NE: number of estimators, S: subsample, RS: random state. Columns of Random Forest - MD: maximum depth of the tree, MF: number of features for best split, MSS: minimum number of samples to split an internal node, NE: number of estimators.

Extreme Gradient Boosting

CSbT     G        LR       MD   NE    S        RS
0.9605   0.4735   0.0975   4    119   0.6232   27

Random Forest

MD   MF   MSS   NE
7    2    2     10

Using the calculated hyper-parameters from the best-performing algorithms, it may be possible to predict the effect of similar viral illnesses in the future. To verify the legitimacy of the proposed work, the best-performing algorithms (Naive Bayes, Random Forest, and XGBoost) were used with the computed hyper-parameters to process the data for four additional states, namely Arizona, New Jersey, New York, and Texas. For choosing the test states, three main criteria were considered: (i) availability of data from the NY Times survey, (ii) population, and (iii) diversity of the change in death ratio. The NY Times mask usage survey is only available for July 2-14, 2020; therefore, the month after the corresponding MM order should contain this period for our analysis to be valid. The chosen states all have high populations. Lastly, Arizona, New Jersey, and New York all experienced a negative change in death ratio, while Texas suffered significant losses in the month after the MM order was issued, as shown in Table 5. Cases with extreme positive and negative changes in death ratio were included deliberately to test the selected algorithms. The accuracy scores of the algorithms on these four states are presented in Table 6.

Table 5: Total death rates in the month before and after the corresponding date of the mandatory face coverings executive order for test states

State        1 month before MM order   1 month after MM order   Change (%)
Arizona      45.06                     38.88                    -14
New Jersey   910.12                    126.04                   -86
New York     240.14                    113.73                   -53
Texas        197.30                    608.66                   +208

Table 6: Accuracy results for the four states of Arizona, New Jersey, New York, and Texas.

Algorithm       Test Accuracy
Naive Bayes     0.76
Random Forest   0.68
XGBoost         0.69

It should be noted that the three West Coast states were used as the training data set, while the entire data from the four additional states is treated as the test data set. Hence, the accuracy score is expected to drop when testing the additional states. However, the consistently high accuracy on the training and test data sets suggests the existence of a pattern between the chosen features and the change in death ratio.

For instance, contrary to the common belief that highly populated areas might experience harsher effects of COVID-19, on the West Coast of the United States the areas with lower population endured worse conditions. Further, the results of this work underline the importance of political leadership in guiding communities and ensuring the well-being of the general public. Additionally, such a modeling approach could be used to optimize the distribution of services and media coverage for possible future adversities. A possible way to reduce the effect of future pandemics such as COVID-19 would be to improve media coverage and public knowledge, especially in more vulnerable areas.
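For readers who want to reuse the tuned models, the abbreviations in Table 4 can be written out as keyword arguments. The mapping below is a sketch under an assumption: the source does not name the libraries, so it is assumed here that the models correspond to `XGBClassifier` from the xgboost package and `RandomForestClassifier` from scikit-learn, whose constructors accept these parameter names. The `accuracy` helper and the toy labels are hypothetical.

```python
# Table 4 values written as keyword arguments (library parameter names are assumed).
xgb_params = {
    "colsample_bytree": 0.9605,  # CSbT
    "gamma": 0.4735,             # G
    "learning_rate": 0.0975,     # LR
    "max_depth": 4,              # MD
    "n_estimators": 119,         # NE
    "subsample": 0.6232,         # S
    "random_state": 27,          # RS
}
rf_params = {
    "max_depth": 7,              # MD
    "max_features": 2,           # MF
    "min_samples_split": 2,      # MSS
    "n_estimators": 10,          # NE
}


def accuracy(y_true, y_pred):
    """Fraction of counties whose predicted class matches the true class."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


# Toy labels (0 = increase, 1 = decrease); not taken from the study.
toy_score = accuracy([0, 1, 1, 0], [0, 1, 0, 0])
```

With those libraries installed, the models could be created as `XGBClassifier(**xgb_params)` and `RandomForestClassifier(**rf_params)`, fitted on the West Coast counties, and scored on the four held-out states with a function like `accuracy`.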

In this work, we have analyzed the effect of face coverings on the severity of COVID-19 by considering the death ratio at the county level as the primary indicator. To account for the unknown level of adherence to the mask mandate, four main features are used as input data: population, income, political inclination, and the results of the New York Times survey on mask usage. The change in the death ratio is used as the metric to quantify the effectiveness of face coverings. After extracting and refining the data set from reliable sources, we analyzed the information using 9 different algorithms. Among all the methods used, Random Forest, XGBoost, and Naive Bayes had the best performance, with a classification accuracy of 94%. The best-performing algorithms, with the computed hyper-parameters, were then used to process four additional states, Arizona, New Jersey, New York, and Texas, which were used entirely as a test data set. The acceptable accuracy for this large test case further supports the legitimacy of the chosen features as influential criteria for modeling purposes. The obtained hyper-parameters for these models (except for Naive Bayes) can now be used to predict future conditions of the spread of the virus.

It is shown that, in most of the counties, there exists a connection between adherence to the mask mandate and the change in death ratio. The findings of this work emphasize the importance of immediate legislative action for the well-being of societies. It is hoped that the findings of this work highlight the influence of socioeconomic and political settings on the behavior of different communities, which, as portrayed, can be complex and counterintuitive.
For instance, if the mask mandate had been issued earlier, with better implementation procedures along with effective incentives targeted at specific communities, more people would have been encouraged to abide by the issued ordinance, and consequently, fewer individuals and families would have become victims of the pandemic.

References

[1] Centers for Disease Control and Prevention, "Previous U.S. COVID-19 case data." cdc.gov/coronavirus/2019-ncov/cases-updates/previouscases.html.

[2] M. H. Shakil, Z. H. Munim, M. Tasnia, and S. Sarowar, "COVID-19 and the environment: A critical review and research agenda," Science of the Total Environment, p. 141022, 2020.

[3] A. E. Aiello, R. M. Coulborn, T. J. Aragon, M. G. Baker, B. B. Burrus, B. J. Cowling, A. Duncan, W. Enanoria, M. P. Fabian, Y.-h. Ferng, et al., "Research findings from nonpharmaceutical intervention studies for pandemic influenza and current gaps in the research," American Journal of Infection Control, vol. 38, no. 4, pp. 251–258, 2010.

[4] P. Saunders-Hastings, J. A. Crispo, L. Sikora, and D. Krewski, "Effectiveness of personal protective measures in reducing pandemic influenza transmission: A systematic review and meta-analysis," Epidemics, vol. 20, pp. 1–20, 2017.

[5] N. C. Brienen, A. Timen, J. Wallinga, J. E. Van Steenbergen, and P. F. Teunis, "The effect of mask use on the spread of influenza during a pandemic," Risk Analysis: An International Journal, vol. 30, no. 8, pp. 1210–1218, 2010.

[6] J. Xiao, E. Y. Shiu, H. Gao, J. Y. Wong, M. W. Fong, S. Ryu, and B. J. Cowling, "Nonpharmaceutical measures for pandemic influenza in nonhealthcare settings—personal protective and environmental measures," Emerging Infectious Diseases, vol. 26, no. 5, p. 967, 2020.

[7] B. Cowling, Y. Zhou, D. Ip, G. Leung, and A. E. Aiello, "Face masks to prevent transmission of influenza virus: a systematic review," Epidemiology & Infection, vol. 138, no. 4, pp. 449–456, 2010.

[8] T. Li, Y. Liu, M. Li, X. Qian, and S. Y. Dai, "Mask or no mask for COVID-19: A public health and market study," PLoS ONE, vol. 15, no. 8, p. e0237691, 2020.

[9] V. C. Cheng, S.-C. Wong, V. W. Chuang, S. Y. So, J. H. Chen, S. Sridhar, K. K. To, J. F. Chan, I. F. Hung, P.-L. Ho, et al., "The role of community-wide wearing of face mask for control of coronavirus disease 2019 (COVID-19) epidemic due to SARS-CoV-2," Journal of Infection, 2020.

[10] S. Caddy, "Coronavirus: does the amount of virus you are exposed to determine how sick you'll get?" https://theconversation.com/coronavirus-does-the-amount-of-virus-you-are-exposed-to-determine-how-sick-youll-get-135119.

[11] A. Granados, A. Peci, A. McGeer, and J. B. Gubbay, "Influenza and rhinovirus viral load and disease severity in upper respiratory tract infections," Journal of Clinical Virology, vol. 86, pp. 14–19, 2017.

[12] E. T. Martin, J. Kuypers, J. Heugel, and J. A. Englund, "Clinical disease and viral load in children infected with respiratory syncytial virus or human metapneumovirus,"

Diagnostic microbiology and infectious disease , vol. 62, no. 4, pp. 382–388, 2008.[13] M. Houben, F. Coenjaerts, J. Rossen, M. Belderbos, R. Hoﬂand, J. Kimpen, andL. Bont, “Disease severity and viral load are correlated in infants with primary respi-ratory syncytial virus infection in the community,”

Journal of medical virology , vol. 82,no. 7, pp. 1266–1271, 2010.[14] J. P. DeVincenzo, C. M. El Saleeby, and A. J. Bush, “Respiratory syncytial virusload predicts disease severity in previously healthy infants,”

The Journal of infectiousdiseases , vol. 191, no. 11, pp. 1861–1868, 2005.[15] Y. Liu, W. Liao, L. Wan, T. Xiang, and W. Zhang, “Correlation between relativenasopharyngeal virus rna load and lymphocyte count disease severity in patients withcovid-19,”

Viral immunology , 2020. 2016] J. Fajnzylber, J. Regan, K. Coxen, H. Corry, C. Wong, A. Rosenthal, D. Worrall,F. Giguel, A. Piechocka-Trocha, C. Atyeo, et al. , “Sars-cov-2 viral load is associatedwith increased disease severity and mortality,”

Nature communications , vol. 11, no. 1,pp. 1–9, 2020.[17] X. He, E. H. Lau, P. Wu, X. Deng, J. Wang, X. Hao, Y. C. Lau, J. Y. Wong, Y. Guan,X. Tan, et al. , “Temporal dynamics in viral shedding and transmissibility of covid-19,”

Nature medicine , vol. 26, no. 5, pp. 672–675, 2020.[18] Y. J. Hou, K. Okuda, C. E. Edwards, D. R. Martinez, T. Asakura, K. H. Dinnon III,T. Kato, R. E. Lee, B. L. Yount, T. M. Mascenik, et al. , “Sars-cov-2 reverse geneticsreveals a variable infection gradient in the respiratory tract,”

Cell , vol. 182, no. 2,pp. 429–446, 2020.[19] N. Huang, P. Perez, T. Kato, Y. Mikami, K. Okuda, R. C. Gilmore, C. D. Conde,B. Gasmi, S. Stein, M. Beach, et al. , “Integrated single-cell atlases reveal an oral sars-cov-2 infection and transmission axis,” medRxiv , 2020.[20] Y. Goh, B. Y. Tan, C. Bhartendu, J. J. Ong, and V. K. Sharma, “The face mask howa real protection becomes a psychological symbol during covid-19?,”

Brain, behavior,and immunity , 2020.[21] S. K. Sharma, M. Mishra, and S. K. Mudgal, “Eﬃcacy of cloth face mask in preventionof novel coronavirus infection transmission: A systematic review and meta-analysis,”

Journal of education and health promotion , vol. 9, 2020.[22] S. Lalmuanawma, J. Hussain, and L. Chhakchhuak, “Applications of machine learningand artiﬁcial intelligence for covid-19 (sars-cov-2) pandemic: A review,”

Chaos, Solitons& Fractals , p. 110059, 2020.[23] S. Agrebi and A. Larbi, “Use of artiﬁcial intelligence in infectious diseases,” in

ArtiﬁcialIntelligence in Precision Health , pp. 415–438, Elsevier, 2020.2124] M. H. D. M. Ribeiro, R. G. da Silva, V. C. Mariani, and L. dos Santos Coelho, “Short-term forecasting covid-19 cumulative conﬁrmed cases: Perspectives for brazil,”

Chaos,Solitons & Fractals , p. 109853, 2020.[25] L. Yan, H.-T. Zhang, J. Goncalves, Y. Xiao, M. Wang, Y. Guo, C. Sun, X. Tang, L. Jing,M. Zhang, et al. , “An interpretable mortality prediction model for covid-19 patients,”

Nature Machine Intelligence , pp. 1–6, 2020.[26] Z. Malki, E.-S. Atlam, A. E. Hassanien, G. Dagnew, M. A. Elhosseini, and I. Gad,“Association between weather data and covid-19 pandemic predicting mortality rate:Machine learning approaches,”

Chaos, Solitons & Fractals , vol. 138, p. 110137, 2020.[27] L. K. Shrivastav and S. K. Jha, “A gradient boosting machine learning approach inmodeling the impact of temperature and humidity on the transmission rate of covid-19in india,”

Applied Intelligence , pp. 1–13, 2020.[28] M. J. Maloney, N. J. Rhodes, and P. R. Yarnold, “Mask mandates can limit covid spread:Quantitative assessment of month-over-month eﬀectiveness of governmental policies inreducing the number of new covid-19 cases in 37 us states and the district of columbia,” medRxiv , 2020.[29] R. P. Lennon, S. M. Sakya, E. L. Miller, B. Snyder, T. Yaman, A. E. Zgierska, M. T.Ruﬃn, and L. J. Van Scoy, “Public intent to comply with covid-19 public health recom-mendations,”

HLRP: Health Literacy Research and Practice , vol. 4, no. 3, pp. e161–e165,2020.[30] S. Sathianathan, L. J. Van Scoy, S. M. Sakya, E. Miller, B. Snyder, E. Wasserman,V. M. Chinchilli, J. Garman, and R. P. Lennon, “Knowledge, perceptions, and preferredinformation sources related to covid-19 among healthcare workers: Results of a crosssectional survey,”

American Journal of Health Promotion , p. 0890117120982416, 2020.2231] B. D. Weiss, M. K. Paasche-Orlow, et al. , “Disparities in adherence to covid-19 publichealth recommendations,”

HLRP: Health Literacy Research and Practice , vol. 4, no. 3,pp. e171–e173, 2020.[32] N. Y. Times, “Mask-wearing survey data.” https://github . com/nytimes/covid-19-data/tree/master/mask-use .[33] A. Markowitz, “State-by-state guide to face mask requirements.” . aarp . org/health/healthy-living/info-2020/states-mask-mandates-coronavirus . html .[34] USAFACTS, “Usa coronavirus cases and deaths.” https://usafacts . org/visualizations/coronavirus-covid-19-spread-map/state/oregon .[35] USCensus, “United states census bureau.” . census . gov .[36] USCensus, “United states census bureau.” . census . gov/library/visualizations/interactive/2014-2018-median-household-income-by-county . html .[37] POLITICO, “Us 2020 presidential election results.” . politico . com/2020-election/results/president/ .[38] V. K. Ayyadevara, “Pro machine learning algorithms,” Apress: Berkeley, CA, USA ,2018.[39] W. Richert,

Building machine learning systems with Python . Packt Publishing Ltd,2013.[40] O. Steinki and Z. Mohammad, “Introduction to ensemble learning,”

Available at SSRN2634092 , 2015.[41] W. Liang, S. Luo, G. Zhao, and H. Wu, “Predicting hard rock pillar stability usinggbdt, xgboost, and lightgbm algorithms,”

Mathematics , vol. 8, no. 5, p. 765, 2020.2342] I. Gad and D. Hosahalli, “A comparative study of prediction and classiﬁcation modelson ncdc weather data,”

International Journal of Computers and Applications , pp. 1–12,2020.[43] Priyanka and D. Kumar, “Decision tree classiﬁer: a detailed survey,”

InternationalJournal of Information and Decision Sciences , vol. 12, no. 3, pp. 246–269, 2020.[44] L. Deng and Y. Liu,