Data Mining Approach to Analyze Covid19 Dataset of Brazilian Patients

Abstract

The pandemic was caused by the coronavirus disease (covid-19), a name coined by the World Health Organization in early 2020. Currently, almost all countries have reported covid19 positive cases, and governments are choosing different health policies to stop the infection. Many research groups are working on patient data to understand the virus, while scientists are looking for a vaccine that strengthens the immune system against the covid19 virus. Brazil is one of the countries with the most infections: by August 11 it had a total of 3,112,393 cases. The Research Foundation of Sao Paulo State (Fapesp) released a dataset, an innovative collaboration with hospitals (Einstein, Sirio-Libanes), a laboratory (Fleury) and Sao Paulo University to foster research on this trending topic. The present paper presents an exploratory analysis of the datasets using a Data Mining Approach. Several inconsistencies are found, such as NaN values, null reference values for analytes, outliers in analyte results and encoding issues. The results are cleaned datasets for future studies, but at least 20% of the data were discarded because of non-numerical values, null values and numbers out of the reference range.


Josimar Edinson Chire Saire
Institute of Mathematics and Computer Science (ICMC), University of Sao Paulo (USP), Sao Carlos, SP, Brazil


Keywords: data mining, data science, covid-19, coronavirus, brazil, sars-cov2, south america

arXiv [cs.CY], Aug

The outbreak of Coronavirus (Covid19) started with the first cases in December 2019, in Wuhan (China). The first reported case [4] in South America was in Brazil on 26 February 2020, in São Paulo city. The strategy to stop the infections in the country was a partial lockdown to avoid the propagation of the virus. On 28 January 2020, the Ministry of Health of Brazil reported a suspected case of Covid19 in Belo Horizonte, Minas Gerais state: a student who had recently returned from China [1], [13]. The same day, two suspected cases were reported in Porto Alegre and Curitiba [5]. The first confirmed COVID-19 case [11] reported in Brazil was a 61-year-old man who had returned from Italy. The patient was tested at Israelita Einstein Hospital in Sao Paulo state. On 14 May [12], more than 200 000 cases were confirmed; this number doubled during the first days of May.

Until August 11, the numbers for Brazil were: a total of 3,112,393 cases, with 44,255 new cases (+1.4%), and a total of 2,243,124 recovered cases.

Nowadays, many scientists are working on the coronavirus (covid19), but a search for studies conducted in South America returns only a few. A search in IEEE Xplore using the terms coronavirus and covid19 finds one paper with a Brazilian affiliation [18], related to data augmentation for covid19 detection. In medRxiv, a preprint repository for Medicine, a search using the terms covid19, coronavirus and data mining finds more than 50 papers.

Table 1 presents the top 10 results of the medRxiv query. Four of these papers are studies of South American countries, and none of them analyzes the Brazilian context, although four papers have a Brazilian affiliation.

| Author | Title | Country of Study | Keywords | Affiliation |
|---|---|---|---|---|
| [8] | Covid19 Surveillance in Peru on April using Text Mining | Peru | Natural Language Processing, Text Mining, People behaviour, Coronavirus, Covid-19 | University of Sao Paulo (Brazil), Universidad Privada del Norte (Peru) |
| [9] | Text Mining Approach to Analyze Coronavirus Impact: Mexico City as Case of Study | Mexico | Natural Language Processing, Text Mining, People behaviour, Coronavirus, Covid-19 | University of Sao Paulo (Brazil), Tecnologico Nacional de Mexico / Instituto Tecnologico de Matamoros (Mexico) |
| [6] | How was the Mental Health of Colombian people on March during Pandemics Covid19? | Colombia | Not available | University of Sao Paulo (Brazil) |
| [10] | Mining Twitter Data on COVID-19 for Sentiment analysis and frequent patterns Discovery | Algiers | tweets Analytics, COVID-19, sentiment analysis, frequent patterns, association rules mining | University of Science and Technology Houari Boumediene (Algiers) |
| [7] | Infoveillance based on Social Sensors to Analyze the impact of Covid19 in South American Population | South America (not Brazil) | Not available | University of Sao Paulo (Brazil) |
| [2] | Spread of SARS-CoV-2 Coronavirus likely constrained by climate | Not applicable | Not available | National Museum of Natural Sciences (Spain), University of Évora (Portugal), University of Helsinki (Finland) |
| [3] | The Role of Host Genetic Factors in Coronavirus Susceptibility: Review of Animal and Systematic Review of Human Literature | Not applicable | Coronavirus; COVID-19; Host genetic factors; SARS-CoV-2 | University of Florida College of Veterinary Medicine (USA), National Institutes of Health (USA), Johns Hopkins Bloomberg School of Public Health (USA) |
| [16] | Early epidemiological assessment of the transmission potential and virulence of coronavirus disease 2019 (COVID-19) in Wuhan City: China, January-February, 2020 | China | Not available | University Yoshida (Japan), Kyoto University (Japan), Georgia State University (USA) |
| [14] | Analysis of Epidemic Situation of New Coronavirus Infection at Home and Abroad Based on Rescaled Range (R/S) Method | China | Not available | Sichuan Academy of Social Sciences (China) |
| [19] | State heterogeneity of human mobility and COVID-19 epidemics in the European Union | European Union | Coronavirus 2019, epidemics, geographic, trends, public health intervention | Shanghai Jiao Tong University School of Medicine (China), University at Buffalo (USA), Yale University School of Medicine (USA) |

Table 1. Ten results of the medRxiv query about covid19 papers in South America

Considering the previous evidence, it is necessary to conduct studies with Brazilian data, so the Fapesp initiative is valuable to foster research on the covid19 topic. The present paper uses a Data Mining Approach to perform an exploratory analysis of the dataset of Brazilian patients of Sao Paulo State. The methodology to explore the data is presented in Section 2, and the experiments and results in Section 3. Conclusions are stated in Section 4, and final recommendations and future work are presented in Sections 5 and 6.

Data extracted from website: https://virusncov.com/

The work follows a methodology inspired by CRISP-DM [17]. Figure 1 presents the flow between the phases of the exploration.

[Figure 1: flow diagram with the phases Exploring Data (data exploration), Pre-processing (cleaning), Analysis (filtered data) and Visualization (question, graphics)]

Fig. 1. Methodology

The data exploration step involves checking the file formats and opening the files with a programming language or a tool; reviewing the number of records (rows) in each file; and checking for null values and the type of each variable or field. For this step, the Python programming language and the pandas package are used to manipulate the data.
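The checks listed for this step could be sketched with pandas as follows. This is only an illustration: the sample rows, the comma separator and the helper name `explore` are assumptions, not taken from the released files.

```python
import io
import pandas as pd

# Hypothetical sample standing in for one exam file: the column names follow
# the data dictionary, while the rows and the comma separator are invented.
SAMPLE_CSV = """ID_PACIENTE,DT_COLETA,DE_ANALITO,DE_RESULTADO
A1,2020-03-01,Glicose,88
A1,2020-03-01,Leucocitos,
B2,2020-03-02,Glicose,inconclusivo
"""

def explore(csv_text, sep=","):
    """Summarize a delimited file: row count, columns, nulls and dtypes."""
    df = pd.read_csv(io.StringIO(csv_text), sep=sep)
    return {
        "rows": len(df),
        "columns": list(df.columns),
        "nulls_per_column": df.isnull().sum().to_dict(),
        "dtypes": df.dtypes.astype(str).to_dict(),
    }

summary = explore(SAMPLE_CSV)
for key, value in summary.items():
    print(key, value)
```

For a real file, the same function body would apply with `pd.read_csv` pointed at the file path and the separator that the file actually uses.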

The pre-processing step deals with the data before graphics are generated for analysis:
– If a specific variable must be numerical but contains string values, those values are discarded.
– If null values are found, a discarding process must be considered.
– If the reference range for an exam or analyte is null, the analysis is not possible.
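A minimal sketch of these three rules with pandas is shown below. It assumes lowercase column names with underscores and applies only to analytes that are expected to be numerical; the sample rows and the function name `keep_numeric_results` are hypothetical.

```python
import pandas as pd

def keep_numeric_results(df):
    """Keep rows with a numeric result and a non-null reference value."""
    out = df.copy()
    # Strings that cannot be parsed as numbers become NaN.
    out["de_resultado"] = pd.to_numeric(out["de_resultado"], errors="coerce")
    # Discard null results and rows whose reference value is missing.
    return out.dropna(subset=["de_resultado", "de_valor_referencia"])

# Invented rows for illustration only.
exams = pd.DataFrame({
    "de_analito": ["Glicose", "Glicose", "TGO", "TGP"],
    "de_resultado": ["88", "inconclusivo", None, "31.5"],
    "de_valor_referencia": ["75 to 99", "75 to 99", "until 40", None],
})
clean = keep_numeric_results(exams)
print(clean)
```

Only the first row survives here: the second has a string result, the third a null result, and the fourth a null reference value.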

Using clean data, it is possible to answer questions about age distribution, sex distribution, and the distribution of results to detect anomalies or outliers. Each question can require a specific kind of graphic to support the analysis.

For a distribution over a few classes, a pie chart is useful to check proportions (subsections 3.3, 3.8). For age distribution, a bar plot shows the shape of the distribution (see subsections 3.4, 3.5, 3.6). The analysis of dozens of values can be supported by boxplots (subsections 3.9, 3.10).

The release of the datasets is the result of a collaboration between the Research Foundation (FAPESP) [15], Fleury Institute, Israelita Albert Einstein Hospital, Sirio-Libanes Hospital and the University of Sao Paulo. The goal is to contribute to and promote research related to Covid19. The datasets share the data dictionaries of Patients (see Table 2) and Tests (Table 3).

Table 2. Data Dictionary of Patient Dataset - Einstein, Fleury, Sirio-Libanes Hospital

| Variable | Description | Format | Content |
|---|---|---|---|
| ID_PACIENTE | Unique identification of patient | Alphanumeric characters | String, patient key |
| IC_SEXO | Sex | Alphanumeric character | F - Feminino (Female), M - Masculino (Male) |
| AA_NASCIMENTO | Birth year | Number | Example: 1959. (*) AAAA - for people born in or before 1930 |
| CD_PAIS | Country of residence | Alphanumeric | Example: BR |
| CD_UF | Federal State Identifier | Alphanumeric characters | AC - Acre, AL - Alagoas, AM - Amazonas, AP - Amapa, BA - Bahia, CE - Ceará, DF - Distrito Federal, ES - Espirito Santo, GO - Goiás, MA - Maranhão, MG - Minas Gerais, MS - Mato Grosso do Sul, MT - Mato Grosso, PA - Pará, PB - Paraíba, PE - Pernambuco, PI - Piauí, PR - Paraná, RJ - Rio de Janeiro, RN - Rio Grande do Norte, RO - Rondônia, RR - Roraima, RS - Rio Grande do Sul, SC - Santa Catarina, SE - Sergipe, SP - São Paulo, TO - Tocantins |
| CD_MUNICIPIO | Residence City | Alphanumeric | Example: SAO PAULO, CAMPINAS, SANTO ANDRE. MMMM - for the lowest occurrences |
| CD_CEP | Postal Code | Number | (**) First five digits of Postal Code. (**) CCCC - for low number of occurrences |

Table 3. Data Dictionary of Tests - Einstein, Fleury, Sirio-Libanes Hospital

| Variable Name | Description | Format | Content |
|---|---|---|---|
| ID_PACIENTE | Unique identification of patient | Alphanumeric character | String, patient key |
| DT_COLETA | Exam collection date | Date (yyyy/MM/dd) | Date |
| DE_ORIGEM | Origin of patient | Alphanumeric character (4) | HOSP - Exam made in a hospital |
| DE_EXAME | Description of exam | Alphanumeric | Example: HEMOGRAMA (blood count) |
| DE_ANALITO | Analyte description | Alphanumeric | Example: Eritrócitos (Erythrocytes), Leucócitos (Leukocytes), Glicose (Glucose) |
| DE_RESULTADO | Result of exam, related to DE_ANALITO | Alphanumeric | If DE_ANALITO requires numerical values: Integer or Float. If DE_ANALITO requires qualitative values: String with restricted domain |
| CD_UNIDADE | Unit of measurement | Alphanumeric | String. Example: g/dL (grams per deciliter) |
| DE_VALOR_REFERENCIA | Reference values for DE_RESULTADO | Alphanumeric | String - Reference value for DE_ANALITO in the population: MinValue, MaxValue; Não Detectado (Not detected) / Detectado (Detected). Example for glucose: 75 to 99. Example for progesterone: until 89 |
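Because the reference value is stored as free text, a filter has to turn it into numeric bounds first. The sketch below handles only the two forms quoted in the data dictionary ("75 to 99" and "until 89"); the wording used in the real files may differ, and the helper names `parse_reference` and `in_range` are hypothetical.

```python
import re
from typing import Optional, Tuple

NUMBER = r"(\d+(?:[.,]\d+)?)"
# Only the two textual forms quoted in the data dictionary are covered.
RANGE_RE = re.compile(rf"^\s*{NUMBER}\s+to\s+{NUMBER}\s*$", re.IGNORECASE)
UPPER_RE = re.compile(rf"^\s*until\s+{NUMBER}\s*$", re.IGNORECASE)

Bounds = Tuple[Optional[float], Optional[float]]

def to_float(text: str) -> float:
    """Parse a number written with either a dot or a comma as decimal mark."""
    return float(text.replace(",", "."))

def parse_reference(text: Optional[str]) -> Optional[Bounds]:
    """Return (low, high), or None when the text is not a numeric range."""
    if text is None:
        return None
    match = RANGE_RE.match(text)
    if match:
        return to_float(match.group(1)), to_float(match.group(2))
    match = UPPER_RE.match(text)
    if match:
        return None, to_float(match.group(1))
    return None  # qualitative or unrecognized reference, e.g. "Detectado"

def in_range(value: float, bounds: Bounds) -> bool:
    """Check a result against bounds; a missing bound is treated as open."""
    low, high = bounds
    return (low is None or value >= low) and (high is None or value <= high)

print(parse_reference("75 to 99"))
print(parse_reference("until 89"))
print(parse_reference("Detectado"))
```

Returning `None` for anything unrecognized keeps qualitative references out of the numeric filter instead of guessing at their meaning.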

The sizes of the datasets are presented in Table 4 for the three data sources. SL Hospital also provided a dataset about the outcomes of the patients.

Table 4. Features of Dataset

| | Einstein Hospital | Fleury | SL Hospital |
|---|---|---|---|
| Patient (size) | | | |
| Test (size) | | | |
| Test (dates) | | | |
| Outcome (size) | - | - | 9,634 |
| Outcome (dates) | - | - | 2020-02-26 to 2020-06-29 |

This section presents graphics that describe the data and support later analysis, including the distribution graphics that are required, such as bar plots and boxplots.

Description of datasets

Figure 2 presents the count, the number of unique values and the most frequent value (top) for each field. Column names were converted to lowercase to keep the field names uniform.
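A description of this kind can be produced with pandas `describe`. The following sketch uses three invented patient rows; only the column names come from the data dictionary.

```python
import pandas as pd

# Invented rows; the real patient files are much larger.
patients = pd.DataFrame({
    "ID_PACIENTE": ["A1", "B2", "C3"],
    "IC_SEXO": ["F", "M", "F"],
    "AA_NASCIMENTO": ["1982", "1986", "1982"],
})

# Lowercase the column names so every dataset uses the same field names.
patients.columns = patients.columns.str.lower()

# For non-numeric columns, describe() reports count, unique, top and freq.
description = patients.describe(include="all")
print(description.loc[["count", "unique", "top"]])
```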

Fig. 2. (a) Einstein, (b) Fleury and (c) SL Datasets Description

– Figure 2(b) shows a different number of id_paciente values in the patient dataset and the exam dataset: 129596 (patient) versus 129595 (exam).
– Einstein and SL Hospitals (cd_pais) include people living in countries other than Brazil.
– The most frequent age of patients is 38 (Einstein, Fleury) and 34 (SL).
– Female patients outnumber male patients in Einstein and Fleury.
– The most frequent cd_uf and cd_municipio correspond to Sao Paulo state and city, and CCCC is the most common Postal Code value, so most postal codes do not have a meaningful number of occurrences.
– Einstein and Fleury each have a single de_origem value: Hosp and Lab respectively. SL Hospital has 56 different values.
– The exam hemograma (blood count) is the most frequent in the datasets, and the most frequent de_analito values in Einstein and Fleury are related to Covid19.
– Einstein has the lowest number of distinct de_exame (61) and de_analito (127) values. Fleury has the highest: de_exame (722), de_analito (978). SL has de_exame (478), de_analito (652). The number of de_valor_referencia values varies accordingly.
– SL Hospital presents NaN (Not a Number) values, so NaN values can be found in the datasets.
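The mismatch between patient and exam identifiers noted in the first item can be located with a set comparison. This is a sketch with invented identifiers; `compare_ids` is a hypothetical helper name.

```python
def compare_ids(patient_ids, exam_ids):
    """Report identifiers that appear in only one of the two datasets."""
    patients, exams = set(patient_ids), set(exam_ids)
    return {
        "unique_patients": len(patients),
        "unique_in_exams": len(exams),
        "patients_without_exams": sorted(patients - exams),
        "exams_without_patient": sorted(exams - patients),
    }

# Invented identifiers: patient C3 has no exam record.
report = compare_ids(["A1", "B2", "C3"], ["A1", "A1", "B2"])
print(report)
```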

The female population is slightly larger than the male population in Einstein and Fleury, but in SL the male population is larger by 0.05% (29 people); see Fig. 3.

Fig. 3. Sex Distribution (Einstein, Fleury and SL)

The Einstein and Fleury datasets include younger patients, with ages ranging from the 0 to 14 group up to 89, while SL Hospital only has patients from 14 up to 86. These graphics are presented in Fig. 4.

Fig. 4. Age Distribution (Einstein, Fleury, SL)

Fig. 5 presents the number of exams collected per day and month. Einstein shows an increasing number from January to June, and Fleury a decrease from January to April followed by a peak in May and June. SL Hospital shows an increase from February to June.
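Monthly counts of this kind can be obtained by grouping the collection dates. The sketch below uses only the standard library and assumes the dates are ISO strings (YYYY-MM-DD); the dates and the function name `exams_per_month` are invented for illustration.

```python
from collections import Counter
from datetime import date

def exams_per_month(collection_dates):
    """Count exams per (year, month) from ISO 'YYYY-MM-DD' strings."""
    counts = Counter()
    for text in collection_dates:
        day = date.fromisoformat(text)
        counts[(day.year, day.month)] += 1
    return dict(sorted(counts.items()))

# Invented collection dates.
monthly = exams_per_month(["2020-02-26", "2020-03-01", "2020-03-15"])
print(monthly)
```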

Fig. 5. Date Distribution (Einstein, Fleury, SL)

To answer which exams were the most frequent in each month of each dataset, Fig. 6 presents the 20 most frequent ones.
– All three datasets have the blood count exam at the top of each month.
– Only Fleury has exams related to covid19 detection in the top 5 in April, May and June.
– There are many kinds of exams related to covid19 per hospital, e.g. PCR and Sorologia SARS-Cov-2/Covid19 (Einstein). Fleury has NOVO Coronavirus 2019, Covid19 Anticorpos IgG, IgM, IgA and more. SL Hospital has Covid-19 PCR para Sars-Cov2, and a problem with encoding is detected in this dataset.
– For this reason, each dataset is studied separately.

Fig. 6. Exam Distribution (Einstein, Fleury, SL)

Einstein and Fleury present analytes related to covid19, e.g. resultado covid19, Covid19 deteccao por PCR, Covid19 material and more. Again, Fleury presents a variety of names for analytes related to covid19, and SL Hospital does not have any in the top 20 (see Fig. 7).

Fig. 7. Analyte per month (Einstein, Fleury, SL)

Considering analytes related to covid19, Fig. 8 presents the number of detected and not detected results per month for Einstein Hospital. Fleury and SL do not have standardized outputs for covid19 exams, therefore it is not yet possible to generate the same graphics for them.

Fig. 8. Analyte per month (Einstein)

Considering the top 14 de_analito values and their de_resultado, Fig. 9 presents boxplots of the values for Einstein Hospital. Qualitative values must be excluded, so only numerical values were used to build the plot. The graphic shows many outliers in many of the analytes, so a cleaning process is necessary.

Fig. 9. Boxplot of top 14 analytes (Einstein)

Fig. 10 splits the data into covid19 detected (red) and not detected (blue). Again, outliers are present, this time in the Fleury dataset.

Fig. 10. Boxplot of top 14 analytes (Fleury)

A cleaning process based on the standard deviation (std) is proposed, because the outliers lie far from the median. Normally a value two or three standard deviations away is considered abnormal, but in this case, to obtain a better visualization of the boxplot, thresholds of 0.5*std (see Fig. 11) and 0.2*std (see Fig. 11) were used on the Einstein dataset, considering the analytes with abnormal values.

Fig. 11. Boxplot of Cleaned dataset of Analytes with Abnormal Values, 0.5*std
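One possible reading of the std-based cleaning is sketched below. The text does not state which center the threshold is measured from, so this sketch assumes the median and the sample standard deviation; the values and the function name `filter_by_std` are invented.

```python
import statistics

def filter_by_std(values, k=0.5):
    """Keep values within k sample standard deviations of the median.

    Using the median as the center is an assumption of this sketch.
    """
    center = statistics.median(values)
    std = statistics.stdev(values)  # needs at least two values
    return [v for v in values if abs(v - center) <= k * std]

# Invented results for one analyte, with one obvious outlier.
results = [10, 11, 12, 13, 100]
print(filter_by_std(results, k=0.5))
print(filter_by_std(results, k=0.2))
```

A smaller `k` tightens the band around the center, which is why 0.5 and 0.2 remove far more points than the usual factor of two or three.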

The next graphics are created by splitting the Einstein dataset by sex. Some analytes have NaN reference values, so they are discarded for the graphic; Table 5 lists the 8 invalid de_analito values.

Table 5. Invalid de_analito values, due to an invalid reference range

| de_analito | Unit | Reference range |
|---|---|---|
| Neutrófilos | % | nan |
| Dosagem de Glicose | nan | nan |
| Basófilos | % | nan |
| Eosinófilos | % | nan |
| Monócitos | % | nan |
| Linfócitos | % | nan |
| Leucócitos | x10^3/uL | nan |
| Plaquetas | x10^3/uL | nan |

Fig. 12 plots the distribution of the 30 most frequent analytes for men.

Fig. 12. Men Analytes

Fig. 13 presents the distribution for positive cases of covid19. In Figs. 12 and 13 it is possible to observe a concentration of outliers at the sides of the normal distribution, e.g. for TGO, TGP, Creatinina and Neutrófilos.

Fig. 13. Men Analytes - Positive covid19 cases

Fig. 14 shows the result after cleaning the values and considering only patients with positive cases, from the date when the infection was detected until it ends or remains open (no date for a discard test). The aim of the analysis is to understand the behaviour of patients with a positive diagnosis of covid19 during the active phase of the virus, from start to end. In Fig. 14 the outliers have disappeared, with the exception of Basófilos.

Fig. 14. Filtered Men Analytes - Positive covid19 cases

Finally, Table 6 presents the steps used to clean the data and generate Fig. 14. First, only numerical values are considered; then null values are discarded; and finally values out of the reference range are not considered. Checking whether values are inside the reference range was done manually, because there were many reference values; only the lowest and highest values were used to filter the data. The reduction per analyte ranges from 0.83 to 75.30%. The initial number of exams was 108,152 and the final number after filtering was 86,814, a reduction of almost 20% of the available data. The dataset is now ready to answer more questions and the research can continue.

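Table 6. Reduction of Dataset

| de_analito | Initial | Only numericals | Not null | Range | Reduction (%) |
|---|---|---|---|---|---|
| Magnésio | 2733 | 2733 | 2725 | 675 | 75.30 |
| TGO | 1884 | 1884 | 1865 | 1799 | 4.51 |
| TGP | 1887 | 1887 | 1873 | 748 | 60.36 |
| Cálcio Iônico mmol/L | 3585 | 3585 | 3553 | 3494 | 2.54 |
| Neutrófilos | | | | | |
| Total | 108152 | | | 86814 | 19.73 |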
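The Reduction column of Table 6 is consistent with the percentage of the initial exams that do not survive the last (Range) filter. The short check below recomputes it from the Initial and Range values of the rows that are fully reported; the helper name `reduction_percent` is hypothetical.

```python
def reduction_percent(initial, final):
    """Percentage of records removed between the first and last step."""
    return round(100 * (initial - final) / initial, 2)

# (Initial, Range) pairs copied from Table 6.
table6 = {
    "Magnésio": (2733, 675),
    "TGO": (1884, 1799),
    "TGP": (1887, 748),
    "Cálcio Iônico mmol/L": (3585, 3494),
    "Total": (108152, 86814),
}

reductions = {name: reduction_percent(*pair) for name, pair in table6.items()}
for name, value in reductions.items():
    print(f"{name}: {value:.2f}%")
```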

The coronavirus pandemic is active around the world and scientists are working to understand how to stop the virus. Many areas are studying the impact of covid19 on Health and the Economy, therefore datasets related to patients are useful and important. The Fapesp initiative to bring universities and hospitals together is remarkable because it can foster research on the topic.

Real-world datasets are not clean or ready for Data Mining or Data Science tasks, so an exploratory phase is mandatory to see whether the data can be representative or useful to answer questions. Many cleaning steps were necessary to generate the final dataset and graphic. This cleaning reduced the available dataset of men by 20%, with a maximum of 75.30% for the Magnesium analyte, so a meaningful reduction of data is possible when a cleaning task is performed.

Finally, sharing the process of analysis is useful for researchers interested in analyzing this dataset, because it can save time and effort in future research.

For researchers interested in working with these datasets, consider:
– Check the range of dates of each dataset to know if the data is useful for your study.
– The Sirio-Libanes Hospital dataset has some issues related to encoding. It is the smallest dataset, so you must analyze whether it is useful for your analysis and search for the problems to fix them.
– Only the Einstein dataset has a standardized output for covid19 exams: detected or not detected. If you are from Computer Science or a related field, this is better for your study, because Fleury has a variety of outputs and the presence or advice of someone from Medicine is necessary to explain the different values.
– If you want to automate filtering by the reference range of values, remember that there are many ranges for many analytes, so the suggestion is to check them manually first to see whether it is possible to code the process.

For further work, crossing data is proposed to improve the analysis by considering other variables, e.g. socio-economic data and patients' pre-existing health issues, and data from other hospitals to enhance the study. On the other hand, a deeper analysis will be performed with this new cleaned dataset.

Acknowledgement

The author thanks Fabio Faria, professor at UNIFESP (Federal University of São Paulo), for the invitation to analyze this dataset, and the DS-Covid team for the discussion about the generated graphics during the data analysis task. More news about future work will be available at: https://dscovid.github.io/ .

References

1. Abril, E.: Ministério da Saúde confirma 3 casos suspeitos de coronavírus no Brasil (Jan 2020), https://web.archive.org/web/20200129042253/https://exame.abril.com.br/brasil/ministerio-da-saude-confirma-3-casos-suspeitos-de-coronavirus-no-brasil/

2. Araujo, M.B., Naimi, B.: Spread of sars-cov-2 coronavirus likely to be constrained by climate. medRxiv (2020). https://doi.org/10.1101/2020.03.12.20034728

3. Araujo, M.B., Naimi, B.: Spread of sars-cov-2 coronavirus likely to be constrained by climate. medRxiv (2020). https://doi.org/10.1101/2020.03.12.20034728

4. AS/COA: The Coronavirus in Latin America (Aug 2020)

5. Braziliense, C.: Casos suspeitos de coronavírus são registrados em Porto Alegre e Curitiba (Jan 2020)

6. Chire Saire, J.E.: How was the mental health of colombian people on march during pandemics covid19? medRxiv (2020). https://doi.org/10.1101/2020.07.02.20145425

7. Chire Saire, J.E.: Infoveillance based on social sensors to analyze the impact of covid19 in south american population. medRxiv (2020). https://doi.org/10.1101/2020.04.06.20055749

8. Chire Saire, J.E., Oblitas, J.: Covid19 surveillance in peru on april using text mining. medRxiv (2020). https://doi.org/10.1101/2020.05.24.20112193

9. Chire Saire, J.E., Pineda-Briseno, A.: Text mining approach to analyze coronavirus impact: Mexico city as case of study. medRxiv (2020). https://doi.org/10.1101/2020.05.07.20094466

10. Drias, H.H., Drias, Y.: Mining twitter data on covid-19 for sentiment analysis and frequent patterns discovery. medRxiv (2020). https://doi.org/10.1101/2020.05.08.20090464

11. Folha: Brasil confirma primeiro caso do novo coronavírus (Jan 2020)

12. Globo: Brasil tem 13.993 mortes e 202.918 casos confirmados de novo coronavírus, diz ministério (May 2020), https://g1.globo.com/bemestar/coronavirus/noticia/2020/05/14/brasil-tem-13993-mortes-causadas-pelo-novo-coronavirus-diz-ministerio.ghtml

13. Globo: Ministério investiga caso suspeito de coronavírus em MG e pede que viagens à China sejam evitadas (Jan 2020), https://g1.globo.com/ciencia-e-saude/noticia/2020/01/28/ministerio-da-saude-confirma-caso-suspeito-de-coronavirus-em-mg.ghtml

14. Ji, X., Tang, Z., Wang, K., Li, X., Li, H.: Analysis of epidemic situation of new coronavirus infection at home and abroad based on rescaled range (r/s) method. medRxiv (2020). https://doi.org/10.1101/2020.03.15.20036756

15. Mello, L.E., Suman, A., Medeiros, C.B., Prado, C.A., Rizzatti, E.G., Nunes, F.L.S., Barnabé, G.F., Ferreira, J.E., Sá, J., Reis, L.F.L., Rizzo, L.V., Sarno, L., de Lamonica, R., Maciel, R.M.d.B., Cesar-Jr, R.M., Carvalho, R.: Opening Brazilian COVID-19 patient data to support world research on pandemics (Jul 2020). https://doi.org/10.5281/zenodo.3966427

16. Mizumoto, K., Kagaya, K., Chowell, G.: Early epidemiological assessment of the transmission potential and virulence of coronavirus disease 2019 (covid-19) in wuhan city: China, january-february, 2020. medRxiv (2020). https://doi.org/10.1101/2020.02.12.20022434

17. Shearer, C.: The crisp-dm model: The new blueprint for data mining. Journal of Data Warehousing (4) (2000)

18. Waheed, A., Goyal, M., Gupta, D., Khanna, A., Al-Turjman, F., Pinheiro, P.R.: Covidgan: Data augmentation using auxiliary classifier gan for improved covid-19 detection. IEEE Access, 91916–91923 (2020)

19. Yuan, X., Hu, K., Xu, J., Zhang, X., Bao, W., Lynch, C.F., Zhang, L.: State heterogeneity of human mobility and covid-19 epidemics in the european union. medRxiv (2020). https://doi.org/10.1101/2020.06.10.20127530