Open Data Resources for Fighting COVID-19
Teodoro Alamo∗, Daniel G. Reina†, Martina Mammarella‡, Alberto Abella§
May 12, 2020
Abstract
We provide an insight into the open data resources pertinent to the study of the spread of the Covid-19 pandemic and its control. We identify the variables required to analyze fundamental aspects like seasonal behaviour, regional mortality rates, and the effectiveness of government measures. Open data resources, along with data-driven methodologies, provide many opportunities to improve the response of the different administrations to the virus. We describe the present limitations and difficulties encountered in most of the open data resources. To facilitate access to the main open data portals and resources, we identify the most relevant institutions, at a world scale, providing Covid-19 information and/or auxiliary variables (demographics, mobility, etc.). We also describe several open resources to access Covid-19 datasets at a country-wide level (e.g., China, Italy, Spain, France, Germany, U.S., etc.). In an attempt to facilitate a rapid response to the study of the seasonal behaviour of Covid-19, we enumerate the main open resources in terms of weather and climate variables. We also assess the reusability of some representative open data sources.
Keywords:
Covid-19; Coronavirus; SARS-CoV-2; Open data; Data-driven methods; Machine Learning; Seasonal behaviour; Government measures

∗ Departamento de Ingeniería de Sistemas y Automática, Universidad de Sevilla, Escuela Superior de Ingenieros, Camino de los Descubrimientos s/n, 41092 Sevilla, Spain (e-mail: [email protected])
† Departamento de Ingeniería Electrónica, Universidad de Sevilla, Escuela Superior de Ingenieros, Camino de los Descubrimientos s/n, 41092 Sevilla, Spain (e-mail: [email protected])
‡ Institute of Electronics, Computer and Telecommunication Engineering, National Research Council of Italy, Turin, Italy (e-mail: [email protected]).
§ FIWARE Foundation, Germany (e-mail: [email protected])

This research was funded by "Plan Propio de la Universidad de Sevilla" under the contract of "Contratos de acceso al Sistema Español de Ciencia, Tecnología e Innovación para el desarrollo del programa propio de I+D+i de la Universidad de Sevilla" of Dr. Daniel G. Reina.
Introduction
We provide in this document a survey of the main open resources for addressing the Covid-19 pandemic from a data science point of view. Since the number of institutions and research teams working against the virus is growing at a very fast pace, it is impossible to provide an exhaustive list of all the meaningful open data providers. At a global scope, we identify the most relevant sources. However, the list of regional institutions providing local information is so extensive that we address it specifically only for some countries (like China, Italy, Spain, and the U.S., among others). We focus on the variables that have possible effects on the evolution and control of the disease at a global and regional scale [34]. That is, we do not cover in this document the data specifically related to medical treatments, vaccines, etc. [36]. We do provide open resources for the number of hospitalized cases, intensive care unit (ICU) cases, number of tests, etc. These variables are very relevant to monitor the evolution of the pandemic and also to evaluate the actions taken by the decision-makers [34].

With this document, we try to make many significant open data resources on Covid-19 accessible to the scientific community. In many situations, identifying adequate sources is difficult, especially for non-expert data scientists. For example, GitHub hosts many repositories with meaningful datasets of global and regional scope, but it might be challenging to discover them without adequate guidance. Besides, the reliability of the data source provider can be a concern. Therefore, this paper aims to provide a big picture of the available data source providers for analyzing Covid-19 propagation and control. We have tried to find stable and reliable resources so that the utility of this paper endures in time.

The paper is organized as follows.
We first analyze in Section 2 the different variables that have a significant effect on the evolution and control of the epidemic (demographics, mobility, weather conditions, government measures, etc.). The opportunities that open data resources on Covid-19 offer to fight the pandemic are highlighted, from a data-driven perspective, in Section 3. Different limitations and inaccuracies of the currently available sources, along with the difficulties encountered when using them in a data-science context, are discussed in Section 4. The most relevant open data institutions addressing the Covid-19 pandemic at a global scale are enumerated in Section 5. On a more practical level, in Section 6 we identify open source communities that facilitate access to the required data. In Section 7, we identify open datasets related to specific Covid-19 variables at a global and regional scale. Open access to auxiliary variables of interest for modelling specific aspects of the pandemic, like seasonal behaviour or local mortality rate, is described in Section 8. In Section 9, we discuss the reusability of the available datasets. Finally, Section 10 concludes the paper.
Coronavirus disease 2019 (Covid-19), caused by the SARS-CoV-2 virus, is an infectious disease that was first identified on December 31st 2019 in Wuhan, the capital of China's Hubei province. The World Health Organisation (WHO) declared the 2019-20 coronavirus outbreak a Public Health Emergency of International Concern on January 30th 2020 and a pandemic on March 11th.

The virus is mainly spread during close contact and by small droplets produced when infected people cough, sneeze or talk. These small droplets may also be produced during breathing. The virus is most contagious during the first 4-6 days after the onset of symptoms [18], although spread is possible from asymptomatic individuals [5] and in later stages of the disease [18]. The time from exposure to onset of symptoms (incubation period) is typically around 5 days but may range from 2 to 14 days [35]. Recommended measures to control the pandemic include social distancing, mobility constraints, pro-active testing and isolation of detected cases [28].
To monitor the spread of Covid-19, the different regional institutions are measuring the number of confirmed cases, deaths, recovered, hospitalized cases, intensive care unit (ICU) cases, etc. Because of the incubation period [35], all these variables reflect the number of infected cases with a delay. One of the main objectives of those institutions is to estimate the basic reproductive number R0, which characterizes the spread of the virus [45]. Several works have calculated R0 for outbreaks in specific locations, with estimated values ranging from 2 to 3 [38]. However, only limited data have been used in the majority of these works. Moreover, obtaining an accurate model of the virus reproduction is a challenging task, which involves many variables and validation steps. Unfortunately, the open datasets available nowadays are locally collected, imprecise, gathered under different criteria (lack of standardization in data collection), inconsistent with respect to data models, and incomplete.

One of the main limitations of these datasets is that often only cases confirmed by a laboratory test are included. The standard method of diagnosis is real-time reverse transcription polymerase chain reaction (RT-PCR) on a nasopharyngeal swab. The infection can also be diagnosed from a combination of symptoms, risk factors and a chest CT (computed tomography) scan showing features of pneumonia. Thus, as a general rule, infected cases without a positive laboratory test are not counted as confirmed cases in the time-series data available in the different open-source repositories. The same problem arises when analyzing deaths. In many situations, especially at the beginning of the outbreak, only the deaths of people previously confirmed as infected by a laboratory test are included in these datasets.

Moreover, there are relevant variables that are not accurately measured.
For example, the fraction of asymptomatic infected cases in a given population can only be estimated by means of massive testing or effective contact-tracing methods. Massive testing carried out in small towns, for example in the north of Italy, indicated that the fraction of asymptomatic cases in the population could be significant (comparable to, or even larger than, the number of symptomatic cases). Therefore, asymptomatic cases may play an important role in virus transmission [5]. Furthermore, important inaccuracies have been reported in the use of rapid tests. This is an important issue, since their use could improve the detection of real cases.

The above limitations of the available datasets have to be taken into consideration in any data-driven method to model or forecast the future spread of the pandemic.

Being able to predict the number of patients that will develop life-threatening symptoms is important, since the disease frequently requires hospitalisation and, in the worst cases, ICU admission, challenging the capacity of the healthcare system [24]. One of the most important ways to measure the burden of Covid-19 is through mortality. The probability of dying when infected depends on different factors [69], [54], [30]:
• Demographics [53]: age, gender, prevalence of diabetes, high blood pressure, obesity, and other risk factors [37].
• Health system [30]: availability of artificial respiration equipment, ICU, specialized medical surveillance and treatments, etc.

On the one hand, several studies have reported a higher mortality for older people [53], further aggravated in men. Thus, protection strategies should be focused on the more vulnerable age and gender groups. On the other hand, the capacity of each regional health system to cope with the pandemic is time-varying. In most of the countries that had already been severely hit by the pandemic, hospitals and physicians were overwhelmed by the number of critical cases (e.g., Italy, Spain, the U.S.) [12].
The main objective of the control strategies, e.g., containment and mitigation of the disease, is to prevent the saturation or overload of the health system, since such an overload translates directly into a significant increase in mortality.
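Because patients who die on a given day were infected, and usually confirmed, much earlier [6], a mortality ratio computed with cases and deaths from the same day underestimates mortality while the outbreak is growing. A minimal sketch of the difference follows. It assumes daily series of confirmed cases and deaths and a single fixed confirmation-to-death lag `lag_days`, which is a simplifying, hypothetical parameter; the function names and the toy numbers are illustrative only.

```python
from itertools import accumulate


def naive_cfr(daily_cases: list[int], daily_deaths: list[int]) -> float:
    """Cumulative deaths divided by cumulative confirmed cases on the same day."""
    return sum(daily_deaths) / sum(daily_cases)


def lagged_cfr(daily_cases: list[int], daily_deaths: list[int], lag_days: int) -> float:
    """Cumulative deaths divided by cumulative confirmed cases `lag_days` earlier.

    `lag_days` is an assumed, fixed confirmation-to-death delay (hypothetical).
    """
    cum_cases = list(accumulate(daily_cases))
    if lag_days >= len(cum_cases):
        raise ValueError("series too short for the requested lag")
    denominator = cum_cases[-1 - lag_days]  # cases confirmed lag_days before the last day
    return sum(daily_deaths) / denominator


if __name__ == "__main__":
    cases = [10, 20, 40, 80, 160]   # toy series, doubling every day
    deaths = [0, 0, 1, 2, 4]
    print(naive_cfr(cases, deaths), lagged_cfr(cases, deaths, lag_days=2))
```

In a growing outbreak the naive ratio is biased downwards, because its denominator already includes many recent cases whose outcome is not yet known; the lagged version only partially corrects this, since the true delay is a distribution rather than a constant.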
Many respiratory viruses show seasonality because lower temperature and lower humidity facilitate the transmission of the virus [39] [9]. There is no clear evidence that Covid-19 is going to behave seasonally, reducing its transmission in summer. Indeed, significant Covid-19 outbreaks have already been reported during the summer season in the Southern hemisphere, e.g., in some regions of South America and Australia. In [60], the authors show that in March 2020 the areas with significant community transmission of Covid-19 were distributed roughly along the 30-50° N corridor, with consistently similar weather patterns: average temperatures of 5-11 °C, low specific humidity (3-6 g/kg) and low absolute humidity (4-7 g/m³). In [64], the authors study the relationship between temperature, humidity and the transmission rate of Covid-19. They used data collected from all the cities in China with more than 100 cases and adopted a linear regression model. Results indicate that increments of one degree Celsius in temperature and one percent in relative humidity lower R by 0.0225 and 0.0158, respectively. The authors developed a web application (http://covid19-report.com/) where R values for major cities worldwide can be obtained from temperature and humidity.

Current actions to control Covid-19 pandemic

For the control community, the different confinement, pro-active testing and isolation strategies that can be implemented by a government clearly constitute control inputs to the system [4]. Many of these strategies to slow or stop the spread of Covid-19 are being implemented worldwide, with different intensities. However, these are not the only actions that a government can undertake in order to control the pandemic. For example, requiring the population to wear masks (or scarves) and plastic gloves might have an inhibitory effect on the spread of the virus [15] without a significant impact on the economy (provided masks are produced at a large scale).
From a control point of view, the objective is twofold. On the one hand, it is important to assess the effectiveness of the different measures against the spread of the virus. On the other hand, actions should be planned in advance to mitigate the effects of the pandemic on the health system, the economy and society.

It is not simple to determine the effect of the possible countermeasures undertaken by regional governments, for several reasons: (i) various inhibitory actions are generally implemented simultaneously, so it cannot be evaluated which one has more impact; (ii) the efficacy of the countermeasures depends on a number of factors, like the demographics and weather conditions of the specific region under consideration; (iii) the available data are, in many situations, imprecise and incomplete. The difficulty in predicting the effects of the Covid-19 countermeasures on the regional evolution of the disease is one side of the problem. Another one is the inherent time-delay nature of the disease dynamics: the effects of the undertaken measures are observed only weeks later. A further issue is the level of compliance with the confinement measures in each country. In the following, current methods for the containment and mitigation of the spread of the virus are described.
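The time-delay nature of the dynamics can be illustrated with a minimal simulation. The sketch uses a discrete renewal equation for new infections and convolves them with an infection-to-death delay to obtain expected deaths, which is the forward direction of the lag that [11] accounts for when working backwards from deaths. Every number is an illustrative assumption, not an estimate from the cited works: gamma-shaped generation interval (mean 6.5 days, sd 3) and infection-to-death delay (mean 20 days, sd 8), an infection fatality ratio of 1%, and a reproduction number that drops in a single step from 2.5 to 0.5 on the intervention day. The function names are hypothetical.

```python
import math


def gamma_weights(mean: float, sd: float, max_days: int) -> list[float]:
    """Gamma density (moment-matched to mean/sd) evaluated on days 1..max_days, renormalized."""
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    dens = [
        x ** (shape - 1) * math.exp(-x / scale) / (math.gamma(shape) * scale ** shape)
        for x in range(1, max_days + 1)
    ]
    total = sum(dens)
    return [d / total for d in dens]  # weights sum to 1 after truncation


def simulate(days, r_before, r_after, intervention_day, seed, ifr, gen_w, death_w):
    """Renewal model for infections plus delayed expected deaths.

    infections[t] = R_t * sum_s infections[t - s] * gen_w[s - 1]
    deaths[t]     = ifr * sum_s infections[t - s] * death_w[s - 1]
    R_t switches from r_before to r_after on intervention_day (a step change).
    """
    infections = [0.0] * days
    deaths = [0.0] * days
    infections[0] = seed
    for t in range(1, days):
        r_t = r_before if t < intervention_day else r_after
        infections[t] = r_t * sum(
            infections[t - s] * gen_w[s - 1] for s in range(1, min(t, len(gen_w)) + 1)
        )
        deaths[t] = ifr * sum(
            infections[t - s] * death_w[s - 1] for s in range(1, min(t, len(death_w)) + 1)
        )
    return infections, deaths


if __name__ == "__main__":
    gen_w = gamma_weights(mean=6.5, sd=3.0, max_days=20)     # assumed generation interval
    death_w = gamma_weights(mean=20.0, sd=8.0, max_days=45)  # assumed infection-to-death delay
    inf, dth = simulate(days=120, r_before=2.5, r_after=0.5, intervention_day=50,
                        seed=10.0, ifr=0.01, gen_w=gen_w, death_w=death_w)
    peak_inf = max(range(len(inf)), key=inf.__getitem__)
    peak_dth = max(range(len(dth)), key=dth.__getitem__)
    print(f"infections peak on day {peak_inf}, expected deaths peak on day {peak_dth}")
```

Even with an intervention that takes effect instantly on transmission, the death curve keeps rising for a while afterwards, which is why the effect of a measure can only be judged weeks later.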
Following the emergence of the novel coronavirus SARS-CoV-2 and its spread outside China, many countries have implemented unprecedented non-pharmaceutical interventions, including case isolation, the closure of schools and universities, the banning of mass gatherings and/or public events, and wide-scale social distancing, including local and national lockdowns. Many governments around the world closed educational institutions in an attempt to contain the spread of the Covid-19 pandemic, impacting over 91% of the world's student population [62]. Another important aspect has been tackled by the New York Times: how income affects people's ability to stay home and practice social distancing [63]. Wealthier people not only have more job security and benefits but may also be better able to avoid becoming sick. In [11], the authors use a semi-mechanistic Bayesian hierarchical model to infer the impact of these interventions across eleven European countries. They assume that changes in the reproduction number, i.e., a measure of transmission, are an immediate response to the interventions being implemented rather than to broader gradual changes in behaviour. In particular, this model estimates these changes by calculating backwards from the deaths observed over time to estimate the transmission that occurred several weeks earlier, thus accounting for the time lag between infection and death. One of the key assumptions of the model is that each intervention has the same effect on the reproduction number R across countries and over time. This allows a greater amount of data across Europe to be leveraged to estimate these effects. It also means that the results are strongly driven by the data from countries with more advanced epidemics and earlier interventions, such as Italy and Spain. The main conclusion of this research was that it is critical to closely monitor the trends in cases and deaths.

Mobility of people is crucial to understand the spread of the virus.
Higher mobility implies a higher number of contacts among people [67]. Furthermore, national and international mobility explains the rapid spatial propagation of the virus worldwide. The authors in [65] use the Baidu Mobility Index, measured as the total number of outside travels per day divided by the resident population, and find that reducing the number of outings can effectively decrease the number of new-onset cases: a 1% decline in the number of outings reduces the growth rate of new-onset cases by about 1% in one week (one serial interval).

Sensor technology can be a crucial tool to obtain mobility measures [51]. Nowadays, almost everyone has a mobile phone equipped with a number of sensors, including GPS, that are able to collect data about people's mobility. Furthermore, Internet and mobile phone operators can use their telecommunication towers to gather mobility patterns. Of course, citizen privacy is an issue that has to be taken into consideration, which requires data anonymization. A first quantitative assessment of the impact of the Italian Government's measures on the mobility and spatial proximity of Italians, through the analysis of a large-scale dataset of de-identified, geo-located smartphone users, can be found in [55].
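The Baidu Mobility Index described for [65] is a simple ratio, and the reported relation between outings and the growth rate of new-onset cases can be read as an elasticity of roughly one. The sketch below encodes both under the assumption that the relation is linear and relative (a 10% drop in outings scales the growth rate by 0.9); the function names and the `elasticity` parameter are hypothetical, and this is not the estimation procedure of [65].

```python
def mobility_index(outside_trips_per_day: float, resident_population: float) -> float:
    """Total outside travels per day divided by the resident population."""
    return outside_trips_per_day / resident_population


def projected_growth_rate(
    baseline_growth_rate: float, outing_decline_pct: float, elasticity: float = 1.0
) -> float:
    """Scale a weekly growth rate of new-onset cases by an assumed linear elasticity.

    With elasticity=1.0, a 1% decline in outings lowers the growth rate by about 1%
    (relative), following the rough relation reported in [65]; linearity is assumed.
    """
    return baseline_growth_rate * (1.0 - elasticity * outing_decline_pct / 100.0)


if __name__ == "__main__":
    print(mobility_index(1_200_000, 1_000_000))   # toy city: 1.2 outside trips per resident
    print(projected_growth_rate(0.20, outing_decline_pct=10.0))
```

The linear form is only plausible for moderate changes; large mobility drops interact with testing, isolation and compliance, which a single elasticity cannot capture.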
The distinction between diagnosed and non-diagnosed individuals is important because non-diagnosed individuals are more likely to spread the infection than diagnosed ones, since the latter are typically isolated. This distinction can also explain misperceptions of the case fatality rate and of the seriousness of the epidemic phenomenon [24]. The main obstacle to massive testing and serology studies is the scarcity of resources, especially in some countries. Accurate testing requires specific labs to analyze RT-PCR tests. On the other hand, the market for rapid tests is still under development [31] [49]. Some countries are carrying out serology-based testing. Serology tests are blood-based tests that can be used to identify whether people have been exposed to a particular pathogen by looking at their immune response. In this case, the objective is to obtain a big picture of the state of the population with respect to Covid-19, for instance, to check whether herd immunity has been reached in some locations [19].
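When a serology survey provides an estimate of the seroprevalence, a first check against herd immunity can use the textbook threshold 1 - 1/R0, which holds for a homogeneously mixing population. With the R0 range of 2 to 3 quoted earlier [38], the threshold lies between 50% and about 67%. The sketch ignores test sensitivity and specificity and population heterogeneity, so it is only indicative; the function names are hypothetical.

```python
def herd_immunity_threshold(r0: float) -> float:
    """Classic threshold 1 - 1/R0 under homogeneous mixing (textbook SIR assumption)."""
    if r0 <= 1.0:
        return 0.0  # no sustained transmission, so no population immunity is required
    return 1.0 - 1.0 / r0


def herd_immunity_reached(seroprevalence: float, r0: float) -> bool:
    """Compare a survey's seroprevalence (fraction 0..1) with the threshold for a given R0."""
    return seroprevalence >= herd_immunity_threshold(r0)


if __name__ == "__main__":
    for r0 in (2.0, 2.5, 3.0):
        print(r0, round(herd_immunity_threshold(r0), 3))
    print(herd_immunity_reached(seroprevalence=0.05, r0=2.5))  # a 5% survey result
```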
Tracing the contacts of infected people is crucial to isolate potentially infected individuals [18]. Once a person is confirmed as infected, tracing the people they have been in contact with during the last few days can help to reduce the propagation of the virus. However, contact tracing is a challenging task. Manual registers can require an amount of resources that is unaffordable for the majority of countries. Therefore, technology should play an important role [58] [18], in particular mobile devices [52] and wireless technologies such as WiFi and Bluetooth.
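At its core, digital contact tracing reduces to a query over a log of proximity events. The sketch assumes a hypothetical log of `(person_a, person_b, day)` tuples, as could be produced, for instance, by Bluetooth proximity detection, and an assumed look-back window of 14 days, matching the upper bound of the incubation period mentioned earlier [35]. A practical system would also need pseudonymous identifiers and criteria on exposure duration and distance.

```python
from datetime import date, timedelta


def contacts_to_notify(contact_log, index_case, confirmed_on, window_days=14):
    """Return the people who met `index_case` within `window_days` before confirmation.

    contact_log: iterable of (person_a, person_b, day) tuples (hypothetical format).
    window_days: assumed look-back window; 14 days is an illustrative choice.
    """
    start = confirmed_on - timedelta(days=window_days)
    found = set()
    for person_a, person_b, day in contact_log:
        if not (start <= day <= confirmed_on):
            continue  # contact outside the look-back window
        if person_a == index_case:
            found.add(person_b)
        elif person_b == index_case:
            found.add(person_a)
    return found


if __name__ == "__main__":
    log = [
        ("A", "B", date(2020, 4, 1)),
        ("C", "A", date(2020, 4, 10)),
        ("A", "D", date(2020, 3, 1)),   # too old to matter
        ("B", "C", date(2020, 4, 10)),  # does not involve the index case
    ]
    print(sorted(contacts_to_notify(log, "A", date(2020, 4, 12))))
```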
Currently, the majority of the data available on Covid-19 is used for describing the pandemic in terms of reports and visualizations. Although these techniques are useful to highlight the magnitude of the crisis, they are not enough for containing and mitigating the problem. They are also insufficient for decision-makers to anticipate the response to the virus propagation and to evaluate the effectiveness of the implemented actions. Classic epidemic models are also useful to obtain mathematical models of epidemics [45]. However, many parameters of these models, such as the infection rate and the basic reproduction number, require data-driven approaches to be estimated accurately. Also, classic epidemic models, whose parameters are normally obtained by curve fitting techniques, require data on different phases of the epidemic. For these reasons, more efficient approaches are urgently needed to: i) model and forecast the spread and the consequences of the pandemic; and ii) evaluate the mitigation approaches that have been carried out. Data-driven models (see, e.g., [24] and [25]) can provide such a solution [68], [17]. Many data-based techniques can be applied [40], ranging from classical statistical and machine learning approaches, e.g., linear regression [56] [7] and Bayesian inference [20], to sophisticated models based on neural networks [13]. These techniques require sufficient and high-quality data to provide a good estimation. Depending on the methodology used, the quantity of data required can vary notably, from hundreds to millions of samples. Moreover, a wide variety of data can be necessary to obtain an accurate model of a complex and dynamic system like the Covid-19 pandemic. Therefore, data from different disciplines are required, which hinders the data collection task.
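As an example of estimating a classic model parameter from data, the early exponential growth rate r can be obtained by least-squares fitting of log(cases) against time, and for the simple SIR model with an exponentially distributed infectious period of mean D, the early-phase relation R0 = 1 + rD holds. The sketch assumes strictly positive daily counts taken from the exponential phase and a known D; both functions are hypothetical helpers, not methods from the cited works.

```python
import math


def exponential_growth_rate(daily_cases: list[float]) -> float:
    """Least-squares slope of log(cases) versus day index (log-linear curve fitting)."""
    n = len(daily_cases)
    xs = range(n)
    ys = [math.log(c) for c in daily_cases]  # requires strictly positive counts
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


def sir_r0_from_growth(growth_rate: float, infectious_period_days: float) -> float:
    """Early-phase SIR relation R0 = 1 + r * D (exponentially distributed infectious period)."""
    return 1.0 + growth_rate * infectious_period_days


if __name__ == "__main__":
    synthetic = [100 * math.exp(0.2 * t) for t in range(15)]  # toy exponential phase
    r = exponential_growth_rate(synthetic)
    print(r, sir_r0_from_growth(r, infectious_period_days=5.0))
```

The estimate is very sensitive to the assumed D and to changes in testing over the fitting window, which is one reason why the data limitations discussed in this paper matter for parameter estimation.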
We highlight three pillars of data-driven approaches for fighting Covid-19: i) informative variables for developing an accurate model; ii) objectives of the model: characterizing the Covid-19 pandemic, epidemic models and forecasting, etc.; and iii) its use for efficient decision making.
• Wish-list of variables: the list of variables is large, since many aspects should be taken into consideration to develop accurate models. The considered variables can be divided into different categories, according to their discipline (the following list could be extended to include other disciplines and variables):
  – Covid-19 variables: regional time series of the number of confirmed cases, suspected cases, deaths, recovered, number of tests, hospitalized cases, ICU cases, isolated positive cases, serology studies, etc. (for example, https://againstcovid19.com/singapore/dashboard).
  – Geographic variables: locations of Covid-19 variables. The locations can be obtained either from names, e.g., countries, cities, etc., or from GPS coordinates, i.e., longitude and latitude.
  – Demographic variables: population and population density by location. These variables are required for normalizing the rest of the variables. Other parameters are the age structure of the population, the prevalence of secondary health conditions related to higher Covid-19 mortality, etc.
  – Health system variables: total number of ICU beds, number of doctors and nurses, personal protective equipment (PPE), respirators, number and types of tests.
  – Government measures: social distancing, movement restrictions, lockdowns, etc.
  – Weather variables: temperature, relative humidity, radiation, etc.
  – Contamination variables: air pollution, i.e., fine particulate matter (PM2.5).
  – International and national mobility and connectivity: number of international and national flights, number of train connections, international and national mobility patterns, traffic patterns, etc.
• The use of data to estimate the state of the epidemic and develop forecasting models: by using the aforementioned variables, different models can be developed to estimate the current state of the pandemic and anticipate the response to the propagation of Covid-19. Examples of estimation and forecasting analyses are:
  – Estimation of the infected population.
  – Estimation of the economic impact.
  – Forecast of the impact on the health system through the number of infected.
  – Assessment of the impact in terms of mortality.
  – Analysis of seasonal behaviour.
• Decision making: the final objective of the data-driven models is to develop useful tools that help governments and institutions to anticipate the response to the Covid-19 propagation and evaluate their actions. Among them, the most relevant are:
  – Assessing the effectiveness of the measures.
  – Planning government actions ahead.
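Since demographic variables are required to normalize the other variables, a minimal sketch of such a normalization follows. The `RegionRecord` structure and its fields are hypothetical, and rates per 100,000 inhabitants are one common convention, chosen here as an assumption.

```python
from dataclasses import dataclass


@dataclass
class RegionRecord:
    """Hypothetical record combining a few wish-list variables for one region and day."""
    region: str
    confirmed_cases: int
    deaths: int
    icu_cases: int
    population: int


def per_100k(count: int, population: int) -> float:
    """Rate per 100,000 inhabitants."""
    return 100_000 * count / population


def normalized(record: RegionRecord) -> dict[str, float]:
    """Population-normalized Covid-19 variables for one region."""
    return {
        "cases_per_100k": per_100k(record.confirmed_cases, record.population),
        "deaths_per_100k": per_100k(record.deaths, record.population),
        "icu_per_100k": per_100k(record.icu_cases, record.population),
    }


if __name__ == "__main__":
    print(normalized(RegionRecord("Region X", 500, 20, 8, 2_000_000)))
```

Normalizing makes regions of very different size comparable, although it does not correct for differences in testing policy or in the definitions behind each count.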
There exist different issues that can hinder the use of open data to address the challenges raised by the Covid-19 pandemic. The main obstacles are addressed in the following sections.

Variety of formats
Since there is no common shared open database on Covid-19, the different sources and variables required to undertake a given analysis are often combined by assembling several datasets into a single one. Although the increased quantity of data sources presents new opportunities, working with such a variety of data reinforces the validity challenges [47]. Another issue is related to the wide range of disciplines the data sources come from. Indeed, these disciplines can be familiar with very different formats and data representations. For instance, some of the available APIs (Application Programming Interfaces) for Covid-19 data provide JavaScript Object Notation (JSON) files. This format is widely used in computer science for web applications. However, mathematicians and epidemiologists may not be familiar with it.
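For readers less familiar with JSON, converting a typical API response into a CSV table takes only the Python standard library. The sketch assumes the response is a JSON array of flat objects with the hypothetical keys shown in the usage example; nested payloads would need to be flattened first.

```python
import csv
import io
import json


def json_records_to_csv(json_text: str) -> str:
    """Convert a JSON array of flat objects into CSV text with a header row."""
    records = json.loads(json_text)
    fieldnames = sorted({key for record in records for key in record})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)  # keys missing from a record become empty cells
    return buffer.getvalue()


if __name__ == "__main__":
    payload = (
        '[{"date": "2020-04-01", "region": "A", "confirmed": 10},'
        ' {"date": "2020-04-02", "region": "A", "confirmed": 12}]'
    )
    print(json_records_to_csv(payload))
```

The resulting text can be saved as a `.csv` file and opened in a spreadsheet or statistical package.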
The needs of the outbreak require an immediate response, which translates into the need to obtain the latest information available. This raises some important challenges. For example, government measures are changing rapidly, and information is often outdated by the time it has been identified. The number of countries implementing or amending measures increases daily [3]. The daily availability of the data can be an issue when working with multiple data sources simultaneously.
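When several daily sources are combined, a simple safeguard is to work with the latest date available in all of them and to report how many days each source lags behind. The sketch assumes each source has already been parsed into a hypothetical `{date: value}` dictionary; the function names are illustrative.

```python
from datetime import date


def align_daily_sources(sources: dict[str, dict[date, float]]) -> tuple[date, dict[str, float]]:
    """Return the latest date present in every source and the values on that date."""
    common_dates = set.intersection(*(set(series) for series in sources.values()))
    if not common_dates:
        raise ValueError("the sources share no common date")
    latest = max(common_dates)
    return latest, {name: series[latest] for name, series in sources.items()}


def staleness_days(sources: dict[str, dict[date, float]], reference: date) -> dict[str, int]:
    """Days between `reference` and the most recent entry of each source."""
    return {name: (reference - max(series)).days for name, series in sources.items()}


if __name__ == "__main__":
    sources = {
        "cases": {date(2020, 4, 1): 10, date(2020, 4, 2): 12, date(2020, 4, 3): 15},
        "mobility": {date(2020, 4, 1): 0.8, date(2020, 4, 2): 0.7},
    }
    print(align_daily_sources(sources))
    print(staleness_days(sources, reference=date(2020, 4, 3)))
```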
In the WHO global Covid-19 surveillance document, a confirmed case is defined as a person with laboratory confirmation of Covid-19 infection, irrespective of clinical signs and symptoms. At the outbreak of the pandemic, access to massive testing was very limited and often only a reduced fraction of the hospitalized cases were tested at a laboratory level. Thus, most reports of infection are heavily filtered by the complex and limited testing process. Furthermore, very few datasets provide information about the number of suspected cases.

Even under the hypothesis that everyone with minor symptoms is tested, this would only provide an estimate of the symptomatic cases of the disease. The study of the fraction of asymptomatic cases is an active field of research (see, e.g., [46] and [18]), not only because it is key to estimating the total number of infected cases, but also because it plays a fundamental role in the spread of the virus [18].
During the most severe periods of the virus spread in a country, the number of deaths reported by the administration often differs considerably from the real one. This is because only the deaths with previous laboratory confirmation of the disease are included. Indeed, the study of national death registers reveals notable and unexpected increases in death rates with respect to the historical numbers. For instance, New York City reported 5,330 more deaths than expected in April 2020, and only 3,350 of these can be attributed to Covid-19 (nytimes.com/interactive/2020/04/10/upshot/coronavirus-deaths-new-york-city.html). These figures suggest that the real number of deaths is being undercounted. Another example is reported for Spain, where the "Sistema de Monitorización de la Mortalidad diaria" (MoMo) registers the total number of deaths under any circumstance (isciii.es/QueHacemos/Servicios/VigilanciaSaludPublicaRENAVE/EnfermedadesTransmisibles/MoMo/Paginas/MoMo.aspx). The report of April 7th indicates an increase of more than 50% in unexpected deaths during the previous month. This increase is even more significant in men, where it exceeds 60%.

Mortality rates are much more difficult to estimate, since the estimates are often based on the number of deaths relative to the number of confirmed cases of infection, which can be a small fraction of the real ones [6]. Consequently, comparing mortality rates between countries requires correcting factors based on estimates of the Covid-19 infected cases and deaths not registered by the respective administrations. Also, when considering the increase of mortality due to the saturation of the health care system, one has to take into account the fact that the patients who die on any given day were infected much earlier. Thus, the denominator of the mortality rate should be the total number of patients infected at the same time as those who died [6].

To better understand the disease and to improve models and strategies to fight Covid-19, each case should be tracked with its own timeline. That is, for each case, relevant information about when symptoms appeared, medical treatments, evolution, degree of isolation, etc., should be available at a country-wide level. These data should then be published anonymously, after a de-identification process that prevents personal identities from being revealed. The data, together with the time of each change in the status of each individual, should be published by an official source in a structured way, at least with daily frequency. This possibility is supported by the opinion of many experts and members of the open-source community.

An effort to obtain individual case data can be found in [50]. The authors carried out a survey of 24 questions related to the impact of Covid-19 (Covid19Impact) on citizens in Spain. The survey was answered by 146,728 participants over a period of less than two days (i.e., 44 hours). The questions were about social contact behaviour, financial impact, working situation, and health status. The results of the survey show the negative impact of Covid-19 on the life of citizens. It is a clear example of how the collaboration of citizens can be relevant to gather information on the effects of Covid-19. A similar work has been pursued in the UK, and the results can be found in [32], where the authors created the Real World Worry Dataset of 5,000 texts. The data analysis suggests that people in the UK especially worry about their family and the economic situation.
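A per-case timeline of the kind described above could be published as structured records keyed by a pseudonymous identifier. The sketch below is hypothetical: the field names are illustrative, and the salted SHA-256 digest only replaces the direct identifier; it does not by itself guarantee de-identification, since dates and locations can still allow re-identification.

```python
import hashlib
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CaseTimeline:
    """Hypothetical anonymized per-case record; field names are illustrative only."""
    case_id: str                    # pseudonymous identifier, never a personal ID
    symptom_onset: Optional[date]   # None for asymptomatic or unknown onset
    confirmed: date
    hospitalized: Optional[date] = None
    icu: Optional[date] = None


def pseudonymize(personal_id: str, secret_salt: str) -> str:
    """Salted SHA-256 digest, truncated; only one ingredient of real de-identification."""
    return hashlib.sha256((secret_salt + personal_id).encode("utf-8")).hexdigest()[:16]


def onset_to_confirmation_days(case: CaseTimeline) -> Optional[int]:
    """Delay between symptom onset and laboratory confirmation, if onset is known."""
    if case.symptom_onset is None:
        return None
    return (case.confirmed - case.symptom_onset).days


if __name__ == "__main__":
    case = CaseTimeline(
        case_id=pseudonymize("12345678Z", secret_salt="keep-this-secret"),
        symptom_onset=date(2020, 3, 10),
        confirmed=date(2020, 3, 15),
    )
    print(case.case_id, onset_to_confirmation_days(case))
```

Records of this form would directly support the delay distributions that several of the models discussed earlier need as inputs.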
Since governments are continuously adjusting their response to the virus, it is common to find abrupt changes in the trend of a time series because a new methodology has been implemented. For example, on February 12th, a sudden spike of 15,152 new Covid-19 cases was observed in China, which was related to the modified method used for diagnosis, i.e., a combination of the SARS-CoV-2 nucleic acid test and clinical Covid-19 features [66].

Another relevant issue is that regions in the same country may provide data under the same label but with a different meaning. A good example is the number of ICU cases: some regions might report the cumulative number of confirmed cases that required ICU admission, and others the number of ICU beds currently occupied by Covid-19 patients. Something similar happens with the number of laboratory tests, which can refer either to the total number of tests carried out or to the number of individuals tested. Indeed, in many situations, the sources do not accurately describe the meaning of the counts.
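Abrupt methodological changes can be partially screened for in code. The sketch converts a cumulative series into daily increments and flags days whose increment exceeds a multiple of the median of the preceding week, which would flag a jump such as the February 12th spike in China. The 7-day window and the factor of 3 are arbitrary assumptions, the function names are hypothetical, and a flagged day still has to be checked against the source's notes; no automatic test can tell whether an ICU series is cumulative or an occupancy count.

```python
from statistics import median


def daily_from_cumulative(cumulative: list[int]) -> list[int]:
    """Daily increments of a cumulative series (the first day is kept as-is)."""
    return [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]


def flag_spikes(daily: list[int], window: int = 7, factor: float = 3.0) -> list[int]:
    """Indices whose value exceeds `factor` times the median of the previous `window` days."""
    flagged = []
    for t in range(window, len(daily)):
        baseline = median(daily[t - window:t])
        if baseline > 0 and daily[t] > factor * baseline:
            flagged.append(t)
    return flagged


if __name__ == "__main__":
    cumulative = [100, 200, 300, 400, 500, 600, 700, 800, 2800, 2900]  # toy series
    daily = daily_from_cumulative(cumulative)
    print(daily, flag_spikes(daily))
```

Negative daily increments, which appear when a source revises its cumulative counts downwards, are another symptom worth checking in the output of `daily_from_cumulative`.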
The open data sources on Covid-19 are constantly improving. To provide more meaningful information, new variables are incorporated into the datasets. This translates into changes in the structure of the data, which require adjusting the code used to download and process the information. When regional data are collected from the official open data portals of different countries, a monitoring effort is required to keep track of the different modifications. In many situations, new data files appear in different locations and with different names.
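One way to reduce this maintenance burden is to map the column names of each release onto a canonical schema and to fail loudly when a required column disappears. The alias table below is hypothetical and would have to be filled in for each real source; `read_with_aliases` is an illustrative helper built on the standard `csv` module.

```python
import csv
import io

# Hypothetical aliases: different releases of the same source may rename columns.
COLUMN_ALIASES = {
    "date": {"date", "fecha", "Date"},
    "region": {"region", "ccaa", "Province/State"},
    "confirmed": {"confirmed", "cases", "Confirmed"},
}


def read_with_aliases(csv_text: str) -> list[dict[str, str]]:
    """Read CSV text and return rows keyed by canonical column names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    mapping = {}
    for canonical, aliases in COLUMN_ALIASES.items():
        matches = [col for col in header if col in aliases]
        if not matches:
            raise KeyError(f"no column found for '{canonical}' in header {header}")
        mapping[canonical] = matches[0]
    return [{canonical: row[source] for canonical, source in mapping.items()} for row in reader]


if __name__ == "__main__":
    old_release = "fecha,ccaa,cases\n2020-04-01,AN,10\n"
    new_release = "Date,Province/State,Confirmed,Deaths\n2020-04-01,AN,10,1\n"
    print(read_with_aliases(old_release) == read_with_aliases(new_release))
```

Failing with an explicit error when a column disappears is usually preferable to silently producing empty or shifted series.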
There are important differences in how governments are reporting Covid-19 data. Furthermore, there are some concerns about the transparency of some countries regarding the data they provide.
See, for example, https://github.com/jgehrcke/covid-19-germany-gae.
https://survey123.arcgis.com/share/d29378b51fe8496d8dd77f08ce73973f
https://github.com/ben-aaron188/covid19worry
Rush in academic publications
Many scientific papers are being rapidly published even without peer review, which is a sub-optimal way to publish science, and a growing number of studies are based on essentially non-peer-reviewed data that may be biased or may contain genuine errors in research methodology.
Numerous institutions of different nature, e.g., global institutions, European Union (EU) institutions, universities, newspapers, etc., are providing daily reports on the evolution of the Covid-19 pandemic. In this section, we enumerate those that, in our experience, are the most relevant and reliable. In particular, we highlight the ones that regularly provide updated information in easily accessible open data repositories. Some of the enumerated institutions are making a great effort to provide consolidated data, describing exhaustively the sources and limitations of the provided datasets. In this section, we describe the nature and characteristics of the information provided, detailing the specifics of the datasets only for the most relevant ones.
The primary role of the World Health Organization (WHO) is to direct international health within the United Nations' system and to lead partners in global health responses. In the framework of the Covid-19 pandemic, WHO is providing continuous updates about the current situation all around the world. In [26], WHO provides guidelines to follow at home as well as in public, together with Q&A pages on the most common questions about the virus, how it spreads, and how it is affecting people worldwide. Moreover, it also publishes myth busters related to Covid-19, in order to offer a reliable source of information (see [27]).
Johns Hopkins experts in global public health, infectious disease, and emergency preparedness have been at the forefront of the international response to Covid-19 since the beginning (see https://coronavirus.jhu.edu/). This university provides a daily update of the global map of the pandemic. The dataset provided by Johns Hopkins University (JHU) (see sub-subsection 7.1.1) is one of the most frequently used by researchers and the media.
The Blavatnik School of Government is a department of the University of Oxford that is working on the Covid-19 pandemic and on the policy responses adopted around the world. One of its projects tracks how governments around the world are responding to the pandemic and how their responses compare. To compare the confinement strategies adopted by different governments, they have created a common index named Stringency Index. This index is based on data obtained by the Oxford Covid-19 Government Response Tracker (OxCGRT), which systematically collects information on several common policy responses that governments have taken.
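The idea behind a composite index such as the Stringency Index can be illustrated with a simplified sketch. We assume, as a simplification and not as the official OxCGRT formula, that each policy indicator is recorded on an ordinal scale from 0 to a known maximum level, rescaled to 0-100, and averaged with equal weights. The official methodology, documented by OxCGRT, additionally adjusts scores according to whether a measure is targeted or general; that adjustment is omitted here, and the indicator names below are illustrative.

```python
from typing import Dict, Tuple


def simplified_stringency(indicators: Dict[str, Tuple[int, int]]) -> float:
    """Equal-weight composite index on a 0-100 scale.

    `indicators` maps a policy name to (recorded_level, max_level).
    Simplification: ignores the targeted/general flags used by OxCGRT.
    """
    if not indicators:
        raise ValueError("at least one indicator is required")
    scores = []
    for name, (level, max_level) in indicators.items():
        if max_level <= 0 or not 0 <= level <= max_level:
            raise ValueError(f"invalid level for {name}")
        scores.append(100.0 * level / max_level)  # rescale ordinal level to 0-100
    return sum(scores) / len(scores)
```

For example, full school closing (3 of 3), partial workplace closing (2 of 3), and no restriction on public events (0 of 2) give an index of about 55.6. Rescaling each indicator by its own maximum keeps indicators with more levels from dominating the average.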
The European Data Portal (EDP), the official open data portal of the European Union, gives access to open data published by EU institutions and bodies. EDP acts as a single access point to open data published by national open data portals and institutions in the EU Member States, as well as by other non-EU countries. There are numerous datasets on EDP that reference "covid" or "corona". Less specific datasets describing former health infections, epidemics, or pandemics are also provided.
who.int/westernpacific/emergencies/covid-19
Further information on the actions developed can be found at: bsg.ox.ac.uk/news/coronavirus-research-blavatnik-school.
https://data.europa.eu/euodp/es/data/
europeandataportal.eu/en/highlights/covid-19
In an effort to promote research on Covid-19, the European Union has opened a specific data portal, called COVID-19 Data Portal (covid19dataportal.org/). The datasets included in the portal are divided into six categories: sequences, expression data, proteins, structures, literature, and other resources.
In the following, some of the most relevant European research centers that have been tackling the Covid-19 outbreak are briefly presented.
The Joint Research Centre (JRC) is the European Commission's science and knowledge service, which employs scientists to carry out research in order to provide independent scientific advice and support to EU policy.
The European Centre for Disease Prevention and Control (ECDC), established in 2004 after the 2003 SARS outbreak and located in Solna, Sweden, is an independent EU agency whose mission is to strengthen Europe's defences against infectious diseases. ECDC publishes numerous scientific and technical reports covering various issues related to the prevention and control of infectious diseases. Towards the end of every calendar year, ECDC publishes its Annual Epidemiological Report, which analyses surveillance data and infectious disease threats. In addition to offering an overview of the public health situation in the EU, the report indicates where further public health action may be required to reduce the burden caused by communicable diseases. Like other organizations, ECDC is closely monitoring the Covid-19 pandemic, providing risk assessments, public health guidance, and advice on response activities to EU Member States and the EU Commission, as well as daily-updated data on the current outbreak [16].
For EU-level surveillance, ECDC requests EU and European Economic Area (EEA) countries, as well as the UK, to report laboratory-confirmed cases of Covid-19 within 24 hours of identification. This is done through the Early Warning and Response System (EWRS).
The European Centre for Medium-Range Weather Forecasts (ECMWF) is an independent intergovernmental organization supported by 34 states and based in Reading, UK [8]. ECMWF is both a research institute and a 24/7 operational service, producing and disseminating numerical weather predictions to its Member States, Co-operating States, and the broader community. ECMWF also archives data and makes them available to authorized users; some data are made available under licence, and some are publicly available.
Good examples of open data provided by the United Nations (UN) are reported in [48]. Moreover, [29] contains the most up-to-date Covid-19 cases and the latest trend plot. It covers China, Canada, and Australia at the province/state level, whereas the rest of the world, including the US, is covered at the country level, represented by either the country centroids or their capitals.
The New York Times is releasing a series of data files with cumulative counts of Covid-19 cases in the U.S., at the state and county level, over time. The time series data are compiled from states, local governments, and health departments. Since January 2020, The New York Times has tracked coronavirus cases in real time as they were identified after testing. These data have been used to power maps and generate reports about the outbreak. The data collection began with the first reported coronavirus case in Washington State, on January 21st, 2020. Since then, The New York Times has published regular updates of the data in a GitHub repository.
https://ec.europa.eu/knowledge4policy/organisation/jrc-joint-research-centre_en
Our World In Data
Our World in Data (OWID) is an online scientific publication that focuses on large global problems, such as poverty, disease, hunger, climate change, war, existential risks, and inequality. Covid-19 data provided by OWID can be found on its open data portal.
Africa Centres for Disease Control and Prevention (Africa CDC) is a specialized technical institution of the African Union, established to support the public health initiatives of Member States and to strengthen the capacity of their public health institutions to detect, prevent, control, and respond quickly and effectively to disease threats. They provide reports on the status, mitigation strategies, and guidelines on Covid-19 at https://africacdc.org/covid-19/covid-19-resources/.
The multinational technology company Google has developed a visual Covid-19 map, where relevant information can also be found, worldwide and by country (https://google.com/covid19-map/). The map is continuously updated, and the data are taken from Wikipedia. It also presents statistics about the number of confirmed cases, cases per one million people (normalized data), number of people recovered, and deaths.
Another relevant tool developed by Google that can be used to obtain data about Covid-19 is Google Dataset Search. Numerous datasets can be found by searching for the term Covid-19. The application allows users to filter the datasets by several fields, such as last update, download format, usage rights, topic, and accessibility.
ACAPS, initially known as The Assessment Capacities Project, is an independent information provider helping humanitarian actors respond more effectively to disasters (acaps.org). ACAPS was established in 2009 as a non-profit, non-governmental project with the aim of providing independent, ground-breaking humanitarian analysis to help humanitarian workers, influencers, fundraisers, and donors make better decisions. It is not affiliated with the UN or any other organization; it is a non-profit project of a consortium of two NGOs, i.e., the Norwegian Refugee Council and Save the Children, and it receives support from several international sources, e.g., the Humanitarian Aid and Civil Protection organization. The ACAPS analysis team is mainly dedicated to researching and analyzing global and crisis-specific data. They provide regional reports on the pandemic and additional information, such as a description of the worldwide measures against the spread of the virus, available at acaps.org/what-we-do/reports and in [3].
The Organisation for Economic Co-operation and Development (OECD) is an international organization that, together with governments, policymakers, and citizens, aims to establish evidence-based international standards and to find solutions to a range of social, economic, and environmental challenges. From improving economic performance and creating jobs to fostering strong education and fighting international tax evasion, it provides a forum and knowledge hub for data and analysis, exchange of experiences, best-practice sharing, and advice on public policies and international standard-setting. OECD provides different reports and data about government actions and the economic impact of the pandemic, which can be found at oecd.org/coronavirus/en/.
https://africacdc.org/
https://en.wikipedia.org/wiki/Template:2019%E2%80%9320_coronavirus_pandemic_data
https://datasetsearch.research.google.com/
oecd.org
Medical Research Council Centre for Global Infectious Disease Analysis
The Medical Research Council Centre for Global Infectious Disease Analysis (MRC GIDA) of Imperial College London is an international resource and centre of excellence for research and capacity building on the epidemiological analysis and modelling of infectious diseases; it also undertakes applied collaborative work with national and international agencies to support policy planning and response operations against infectious disease threats. MRC GIDA presents reports on Covid-19 under five categories: i) weekly forecasts; ii) resources; iii) information; iv) video updates; and v) publications.
Furthermore, in collaboration with several departments of Imperial College London (Imperial College Covid-19 Response Team) and Oxford University, they developed a model for estimating the number of infections and the impact of non-pharmaceutical interventions on Covid-19 in eleven European countries [20].
The Institute for Health Metrics and Evaluation (IHME) is an independent global health research center at the University of Washington. They have developed a model to determine the extent and timing of deaths and excess demand for hospital services due to Covid-19 in the US [12]. The work uses: i) data on confirmed Covid-19 deaths from WHO and from local and national governments; ii) data on hospital capacity and utilization for US states; and iii) observed Covid-19 utilization data from different locations. A web service is available where the model projections can be obtained for each country for the following four months. The information provided is: i) hospital resource needs, including the number of beds, ICU beds, and ventilators; ii) the number of deaths per day; and iii) the total number of deaths.
It is a research institution in the USA whose main focus is advancing the study of complex systems. They have developed a portal with the following goals: i) stop the spread of COVID-19; ii) advise governments, institutions, and individuals; iii) provide useful data and guidelines; and iv) crush the curve. The portal includes guidelines and reports for governments, communities, medical institutions, companies, families, and individuals.
MIDAS is a global network of scientists and practitioners from academia, industry, government, and non-governmental agencies, who develop and use computational, statistical, and mathematical models to improve the understanding of infectious disease dynamics as they relate to pathogenesis, transmission, effective control strategies, and forecasting. They have created a portal for Covid-19 modeling, which provides an important and reliable catalogue of data resources, including datasets, webinars, and funding announcements.
The COVID-19 Data Hub project has been funded by the Institute for Data Valorization (IVADO), Canada. The goal of the project is to provide the research community with a unified data hub by collecting worldwide fine-grained case data merged with demographics, air pollution, and other exogenous variables helpful for a better understanding of COVID-19. In addition, they provide an R package to download Covid-19-related datasets.
imperial.ac.uk/mrc-global-infectious-disease-analysis/covid-19/
The updates of the model can be accessed at https://github.com/ImperialCollegeLondon/covid19model.
healthdata.org/
https://covid19.healthdata.org/projections
https://necsi.edu/
https://midasnetwork.us/covid-19/
https://ivado.ca/en/
https://covid19datahub.io/
Science.gov
Science.gov, a gateway portal to U.S. government science information offering free access to research and development results and to scientific and technical information from scientific organizations across 13 federal agencies, uses software that supports real-time federated search across over 70 information sources (e.g., databases) of the leading federal science and technology agencies in the United States. Using a combination of search terms for Covid-19, Science.gov has provided a link on its homepage that the public can use to quickly access federally funded research on COVID-19. By following the coronavirus research results link, users can access freely available peer-reviewed literature (journal articles and accepted manuscripts).
The National Institute of Standards and Technology (NIST) is a physical sciences laboratory and a non-regulatory agency of the United States Department of Commerce. Its mission is to promote innovation and industrial competitiveness. NIST's activities are organized into laboratory programs that also include information technology. For the Covid-19 pandemic, NIST provides a dedicated open portal where it is possible to search for specific datasets related to the virus outbreak. Moreover, in collaboration with the Allen Institute for Artificial Intelligence (AI2), the National Library of Medicine (NLM), Oregon Health & Science University (OHSU), and the University of Texas Health Science Center at Houston (UTHealth), NIST has organized the TREC-COVID challenge, which is currently building a set of Information Retrieval (IR) test collections based on the CORD-19 datasets (see Section 6.3 for further details on the CORD-19 competition) and the Text Retrieval Conference (TREC) model. Additional information on this challenge can be found at https://ir.nist.gov/covidSubmit/.
The National Institutes of Health (NIH), the primary agency of the United States government responsible for biomedical and public health research, is one of the most prominent sources of data on the Covid-19 pandemic. In particular, the NIH Office of Data Science Strategy provides a portal dedicated to open-access data and computational resources for the fight against Covid-19, available at https://datascience.nih.gov/covid-19-open-access-resources, which seeks to provide the research community with links to open-access data (see, e.g., ncbi.nlm.nih.gov/pmc/about/covid-19/), computational, and supporting resources.
Open Data Watch is a non-profit, non-governmental organization founded by three development data specialists. It monitors progress and provides information and assistance to guide the implementation of open data systems.
The Open Data Watch team is experienced in data management and statistical capacity building in developing countries. They have collected data related to the Covid-19 pandemic from different sources all around the world. Indeed, to address the ongoing need for data-driven decision making, Open Data Watch has put together a collection of articles, organized by the stages of the data value chain: availability, openness, dissemination, and use and uptake. These articles are updated as new information becomes available. These references and related links can be found in [14].
It is a European mortality monitoring activity that aims to detect and measure excess deaths related to seasonal influenza, pandemics, and other public health threats. It publishes weekly bulletins on excess mortality in European countries.
nih.gov/health-information/coronavirus
https://opendatawatch.com/
euromomo.eu/
World Bank Open Data
The World Bank Group (WBG) is a family of five international organizations that make leveraged loans to developing countries. The World Bank's activities are mainly focused on developing countries, in fields such as education, health, and agriculture. During the Covid-19 pandemic, the WBG has been helping developing countries strengthen their pandemic response and health care systems. Furthermore, the WBG has highlighted the importance of data to support countries in managing the global Covid-19 outbreak, and its open data portal, the World Bank Open Data, includes an entire section dedicated to Covid-19, with datasets containing real-time data, statistical indicators, and other types of data relevant to the coronavirus pandemic, particularly focused on the economic and social impacts of the pandemic and on the World Bank's efforts to address them.
This dataset is of particular relevance to assess the relationship between the health emergency and the extraordinary shock the global economy is facing, trying to answer the question: how is the deadly virus impacting global poverty? Indeed, estimating how much global poverty will increase because of COVID-19 is challenging and comes with a lot of uncertainty. To answer this question, they propose a model based on household survey data provided by PovcalNet (an online tool provided by the World Bank for estimating global poverty) and extrapolate forward using the growth projections from the recently launched World Economic Outlook.
Comparing these Covid-19-impacted forecasts with the forecasts from the previous edition of the World Economic Outlook provides an assessment of the impact of the pandemic on global poverty, assuming that the pandemic does not change inequality within countries.
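The assumption that inequality within countries does not change corresponds to a distribution-neutral projection: every household income from the survey is scaled by the same projected growth factor, and the poverty headcount is recomputed. A minimal sketch of that idea, with made-up incomes and weights (not PovcalNet data) and a poverty line of 1.9 in the same units as income, in the spirit of the international line of $1.90 per day; it simplifies the World Bank procedure and is not its implementation.

```python
from typing import Sequence


def poverty_headcount(incomes: Sequence[float], weights: Sequence[float], line: float) -> float:
    """Weighted share of the population living strictly below the poverty line."""
    total = sum(weights)
    poor = sum(w for y, w in zip(incomes, weights) if y < line)
    return poor / total


def project_headcount(incomes: Sequence[float], weights: Sequence[float],
                      line: float, growth: float) -> float:
    """Distribution-neutral projection: all incomes are scaled by (1 + growth),
    so relative inequality within the country is unchanged."""
    return poverty_headcount([y * (1.0 + growth) for y in incomes], weights, line)
```

With five equally weighted households earning 1.0, 1.5, 2.0, 3.0, and 5.0, the baseline headcount at a line of 1.9 is 0.4; a 10% contraction (`growth=-0.1`) raises it to 0.6. Comparing the headcounts obtained with pre-pandemic and Covid-19-impacted growth projections gives the kind of assessment described above.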
This section covers repositories of open source communities, which are dedicated to bringing together people with similar interests. These communities have been widely developed in the software field, where many professionals and practitioners join their efforts to achieve bigger goals in software projects. They are playing a very active role in facilitating access to Covid-19 datasets from official open portals all over the world.
GitHub is a subsidiary of Microsoft that hosts software development using Git. It provides version control and project management, among other tools. Numerous open software projects are posted daily, free of charge. Since the Covid-19 outbreak, many projects and related datasets have been posted, and the majority of those included in this paper can be obtained from GitHub. Some examples are: i) Open Covid-19 Dataset; ii) Covid-19 Data Processing Pipelines and Datasets; and iii) JSON time series of coronavirus cases dataset.
Harvard Dataverse is a free data repository, open to all researchers from any discipline, both inside and outside the Harvard community. Researchers and practitioners can share, archive, cite, access, and explore research data. They have opened a dedicated space at https://dataverse.harvard.edu/dataverse/2019ncov for works related to Covid-19, where both the papers and the data used for the analyses can be found.
Kaggle is a community for data scientists and machine learning practitioners. Kaggle allows users to find and publish datasets, to explore and build models in a web-based data-science environment, to work with other data scientists and machine learning engineers, and to enter competitions to solve data-science challenges. Regarding the Covid-19 pandemic, the portal opens a new challenge weekly to work on Covid-19 data. The challenge consists of forecasting confirmed cases and fatalities for the following week. Furthermore, some data analysis posts can be found for each competition. The challenges opened so far are: i) kaggle.com/c/covid19-global-forecasting-week-1; ii) March 25th: kaggle.com/c/covid19-global-forecasting-week-2; iii) April 1st: kaggle.com/c/covid19-global-forecasting-week-3; and iv) April 8th: kaggle.com/c/covid19-global-forecasting-week-4.
https://data.worldbank.org/
https://blogs.worldbank.org/opendata/impact-covid-19-coronavirus-global-poverty-why-sub-saharan-africa-might-be-region-hardest
https://support.dataverse.harvard.edu/
kaggle.com/tags/covid19
For example, kaggle.com/frlemarchand/covid-19-forecasting-with-an-rnn.
Moreover, the Covid-19 Open Research Dataset Challenge (CORD-19) competition has been launched, aimed at developing text and data mining tools that can help the medical community develop answers to high-priority scientific questions (kaggle.com/allen-institute-for-ai/CORD-19-research-challenge).
The available dataset, based on data sources provided by the Center for Security and Emerging Technology of Georgetown University, is composed of a corpus of more than 44,000 full-text documents about Covid-19/SARS-CoV-2 and related coronaviruses.
Another relevant competition based on Covid-19 data is the UNCOVER COVID-19 Challenge (kaggle.com/roche-data-science-coalition/uncover). In this case, the objective is to model solutions to key questions that were developed and evaluated by a global front line of healthcare providers, hospitals, suppliers, and policy makers. This challenge is promoted by Hoffmann-La Roche Limited (Roche Canada).
Zindi is the first data-science competition platform in Africa. Zindi hosts an entire data-science ecosystem of scientists, engineers, academics, companies, NGOs, governments, and institutions, focused on solving Africa's most pressing problems. Regarding the Covid-19 pandemic, they have opened a competition aimed at building an epidemiological model that predicts the spread of Covid-19 throughout the world. The target variable is the cumulative number of deaths caused by COVID-19 in each country by each date. The challenge can be found at https://zindi.africa/competitions/predict-the-global-spread-of-covid-19/data.
This section presents the main datasets related to Covid-19 available on the Internet. The section is divided into two parts. First, we present international datasets that provide global information on the impact of the virus in each country, such as the number of total/new confirmed cases and the number of total/new confirmed deaths. Second, we include a number of regional datasets, where local information can be found. Although the information may be redundant across several datasets, we believe that this redundancy can be useful to validate the developed models and analyses.
In this section, we briefly introduce the institutions that provide international datasets, including the link (URL) to access them easily.
Johns Hopkins experts in global public health, infectious disease, and emergency preparedness have been at the forefront of the international response to Covid-19. JHU provides a daily update of the global map of the pandemic, which can be found at https://coronavirus.jhu.edu/map.html.
The JHU Covid-19 dataset can be downloaded in .csv format from the GitHub repository. In this folder, five different .csv files can be downloaded: i) global number of confirmed cases; ii) global number of deaths; iii) global number of recovered; iv) total number of confirmed cases in the US; and v) total number of deaths in the US. The global files refer to worldwide Covid-19 data. A small number of countries are further divided into regions, e.g., China and Australia, whereas most of them, like Spain or Italy, are not. The U.S. .csv files correspond to the United States. In both cases, all the data refer to accumulated cases, i.e., cases up to the corresponding date. Furthermore, the geographical coordinates of each region/country are also provided.
The data plots, which can be retrieved at https://coronavirus.jhu.edu/data/new-cases, are obtained by means of a 5-day moving window, averaging the values of that day, the two days before, and the two days after.
https://coronavirus.jhu.edu/
The Geographical Distribution of Covid-19 Worldwide Dataset is sourced from the ECDC, which publishes daily full time series data on the number of confirmed Covid-19 cases and deaths for various countries around the world. Every day, the ECDC collects data between 6am and 10am CET and publishes these data via its Covid-19 dashboard. The dataset is also made publicly available through downloadable files in different formats. This dataset can also be downloaded from the open portal of Our World in Data. In particular, it is possible to retrieve the following datasets: i) total confirmed cases; ii) total deaths; iii) new confirmed cases; iv) new deaths; v) all four metrics; and vi) population data.
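Because the JHU time series are cumulative and stored in wide format (one column per date), two processing steps are usually needed: differencing to obtain daily new cases, and smoothing. The sketch below mimics the centered 5-day moving average described for the JHU plots. It assumes the column names `Province/State`, `Country/Region`, `Lat`, `Long` and dates written as m/d/yy, which is how the global files looked at the time of writing; as discussed earlier, such details can change and should be checked.

```python
import pandas as pd

ID_COLS = ["Province/State", "Country/Region", "Lat", "Long"]  # assumed JHU layout


def to_long(wide: pd.DataFrame) -> pd.DataFrame:
    """Reshape a JHU-style wide table (one column per date) into long format."""
    long = wide.melt(id_vars=ID_COLS, var_name="date", value_name="cumulative")
    long["date"] = pd.to_datetime(long["date"], format="%m/%d/%y")
    return long


def country_daily_new(wide: pd.DataFrame, country: str) -> pd.DataFrame:
    """Cumulative, daily new, and centered 5-day average of new cases for one country."""
    long = to_long(wide)
    # Sum sub-national rows (e.g., Chinese provinces) into one national series.
    series = (long[long["Country/Region"] == country]
              .groupby("date")["cumulative"].sum()
              .sort_index())
    daily = series.diff()  # the first day has no predecessor and becomes NaN
    # Centered window: the day itself, the two days before, and the two days after.
    smoothed = daily.rolling(window=5, center=True).mean()
    return pd.DataFrame({"cumulative": series, "new": daily, "new_5d": smoothed})
```

The first three and last two smoothed values are NaN, because a full centered window is not available there. Note also that downward revisions of cumulative counts produce negative daily values, which are worth inspecting rather than silently clipping.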
The Covid-19 Data Hub project makes all the data available at https://github.com/covid19datahub/COVID19. The dataset includes a wide range of variables, such as Covid-19 variables (confirmed cases, deaths, etc.), population, density, ICU, number of tests, ventilators, testing policy, and contact tracing, among others.
The MIDAS network publishes an open dataset with several data resources to study the Covid-19 pandemic. The resources are divided into different sections, such as data catalog, parameter estimates, software tools, and documents. In particular, a collection of .csv files describing the situation of each country can be found in the catalog section.
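Merging case counts with population, as the Covid-19 Data Hub does, allows comparisons between countries of very different size. A minimal sketch of a per-100,000 normalization with pandas; the column names (`country`, `date`, `confirmed`, `population`) are hypothetical and do not reproduce the Data Hub's own schema.

```python
import pandas as pd


def per_100k(cases: pd.DataFrame, population: pd.DataFrame) -> pd.DataFrame:
    """Attach population to case counts and compute confirmed cases per 100,000 inhabitants.

    Assumed columns: `cases` has 'country', 'date', 'confirmed';
    `population` has 'country', 'population', with one row per country.
    """
    # validate= raises if the population table has duplicate countries.
    merged = cases.merge(population, on="country", how="left", validate="many_to_one")
    merged["confirmed_per_100k"] = merged["confirmed"] * 100_000 / merged["population"]
    return merged
```

The left join keeps every case row; countries without a population figure get NaN instead of disappearing, so gaps in the auxiliary data stay visible.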
Our World in Data (… Dataset)
Our World in Data publishes useful information on how the different countries are carrying out laboratory tests to detect Covid-19 cases. The dataset on the number of tests carried out globally is published by Our World in Data in the GitHub repository owid/covid-19-data.
The majority of the following datasets can be found on GitHub by searching for the term
Covid-19. For the African continent, reliable datasets can be found in the GitHub repository https://github.com/dsfsi/covid19africa [44].
The Argentina Ministry of Health provides daily updates on the spread of Covid-19, including data on the number of infected people by region. The data can be downloaded from the GitHub repository Covid19arData.
https://qap.ecdc.europa.eu/public/extensions/COVID-19/COVID-19.html
ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide
https://github.com/midas-network/COVID-19
https://ourworldindata.org/covid-testing
https://github.com/owid/covid-19-data/tree/master/public/data/testing
argentina.gob.ar/coronavirus/informe-diario
Australia
The Health Department of the Australian Government publishes the Covid-19 data. The data corresponding to the different regions of Australia can be downloaded from the JHU GitHub repository. The regional Australian Covid-19 data are integrated into the global time series .csv files, which include information on confirmed cases, number of deaths, and number of recovered. See Section 7.1.1 for further details. An additional GitHub repository is available at https://github.com/covid-19-au/covid-19-au.github.io.
The data corresponding to the different regions of China can be downloaded from the JHU GitHub repository. The regional Chinese Covid-19 data are integrated into the global time series .csv files. Moreover, the National Health Commission of the People's Republic of China updates the available information on the situation in China daily. Relevant information about the pandemic in China can also be found in the MIDAS GitHub repository: https://github.com/midas-network/COVID-19/tree/master/data/cases/china.
The data corresponding to France are provided by the different regions and published by the French public health system at the official open data portal data.gouv.fr/.
Among the different datasets available when searching for the term Covid, three are highlighted by the portal (organized into .csv files):
• Covid-19 Hospital Data: hospitalized cases, ICU cases, and deaths per department, region, gender, and age range [21].
• Covid-19 Emergency Room Admissions: hospitalized cases, ICU cases, and deaths per department, region, gender, and age range [57].
• Covid-19 Laboratory Tests: number of positive and negative laboratory tests per department, gender, and age group [22].
Moreover, French Covid-19 datasets can be found in these two GitHub repositories: i) Covid-19 epidemic French national data; and ii) Projet d'historisation du nombre de cas par région du Covid-19 (a project recording the history of Covid-19 case counts by region). Finally, the national mortality register can be accessed at insee.fr/fr/information/4470857 to compare the number of deaths with previous years.
The main official open data provider in Germany is the Robert Koch Institute, a public health institute in Germany. By means of a catalogue of infectious diseases, it provides pertinent information on each disease listed in the catalogue, e.g., SARS. In particular, for Covid-19, data on risk assessments, the spread of the epidemic, epidemiological studies, etc., can be found at rki.de/DE/Content/InfAZ/S/SARS/SARS.html?nn=2386228. Moreover, it also provides daily reports on the Covid-19 outbreak in Germany [33]. Additional data on Covid-19 case numbers in Germany, divided by state over time, can be found in the GitHub repository https://github.com/jgehrcke/covid-19-germany-gae. The national mortality register can be found at Destatis.
During this pandemic, Iceland has been reported to be one of the best countries in terms of data on population testing. The Icelandic government publishes all information at covid.is/data. A GitHub repository can be found at https://github.com/gaui/covid19.
https://github.com/CSSEGISandData/COVID-19/
http://en.nhc.gov.cn/DailyBriefing.html
rki.de/EN/Home/homepage_node.html
rki.de/DE/Content/InfAZ/InfAZ_marginal_node.html
https://ourworldindata.org/covid-testing
Italy
The Italian Civil Protection Department, i.e., the national body in Italy that deals with the prediction, prevention, and management of emergency events, updates daily a GitHub repository organized by regions and provinces, where the Covid-19 time series can be downloaded (https://github.com/pcm-dpc/COVID-19). The .csv file with the daily data of each of the 20 Italian regions provides the number of confirmed cases, deaths, recovered, hospitalized, confined at home, and ICU cases, in addition to the number of tests. Furthermore, GEDI Gruppo Editoriale, a relevant Italian media conglomerate, provides a portal where these data are arranged in several interactive graphs, which also include the impact on local mobility. The national mortality register of Italy can be consulted to evaluate the magnitude of the epidemic with respect to the number of deaths in previous years.
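With regional files of this kind, a daily test positivity per region can be derived from the cumulative columns. The sketch assumes that the regional .csv of the Civil Protection repository uses the columns `data` (timestamp), `denominazione_regione` (region name), `totale_casi` (cumulative confirmed cases), and `tamponi` (cumulative tests); these names match our reading of the repository at the time of writing but should be verified against its documentation. As discussed earlier, a ratio computed this way is per test, not per person tested.

```python
import pandas as pd


def daily_positivity(df: pd.DataFrame) -> pd.DataFrame:
    """Daily new cases, daily tests, and test positivity per region,
    computed from cumulative columns (assumed names, see lead-in)."""
    df = df.copy()
    df["data"] = pd.to_datetime(df["data"])
    df = df.sort_values(["denominazione_regione", "data"])
    grouped = df.groupby("denominazione_regione")
    # Differences are taken within each region so regions do not bleed into each other.
    df["new_cases"] = grouped["totale_casi"].diff()
    df["new_tests"] = grouped["tamponi"].diff()
    # Positivity is left undefined (NaN) on days with no new tests reported.
    df["positivity"] = df["new_cases"] / df["new_tests"].where(df["new_tests"] > 0)
    return df
```

Days with zero reported tests are mapped to NaN rather than to an infinite ratio, so they can be filtered out explicitly.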
The official portal for Covid-19 data reports for Paraguay can be found at mspbs.gov.py/reporte-covid19.html. The data are provided by the public health system, and the reports are stratified by age and gender, including the number of cases, deaths and recovered people. Furthermore, data on the spread of Covid-19 in Paraguay can also be found in the GitHub repository https://github.com/torresmateo/covidpy-rest/blob/master/data/covidpy.csv.
Information on the spread of Covid-19 in South Africa can be found in the GitHub repository https://github.com/dsfsi/covid19za [43] [41]. The repository, named Covid-19 Data for South Africa, is maintained and hosted by the Data Science for Social Impact research group, led by Dr Vukosi Marivate, at the University of Pretoria. These data have been used in [42] to determine what data should be included in a public repository amidst the COVID-19 outbreak and how these data should be disseminated through a public dashboard.
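Because a repository of this kind publishes both provincial figures and a national total, a cheap consistency check is to verify that the provinces add up to the total on each date. The sketch below assumes a file layout with a `date` column, one column per province code (EC, FS, GP, KZN, LP, MP, NC, NW, WC), an `UNKNOWN` column and a `total` column; this layout and the function name `inconsistent_rows` are assumptions to be checked against the actual files, and the sample rows are illustrative.

```python
import csv
import io

# Assumed layout of the provincial cumulative files: one column per
# province code plus UNKNOWN and total. Verify against the repository.
PROVINCE_COLS = ["EC", "FS", "GP", "KZN", "LP", "MP", "NC", "NW", "WC", "UNKNOWN"]


def inconsistent_rows(csv_text):
    """Return [(date, provincial_sum, reported_total)] for rows that disagree."""
    mismatches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row.get("total"):
            continue  # no reported total to compare against
        # Empty cells are skipped, i.e. they contribute nothing to the sum.
        provincial = sum(int(float(row[c])) for c in PROVINCE_COLS if row.get(c))
        total = int(float(row["total"]))
        if provincial != total:
            mismatches.append((row["date"], provincial, total))
    return mismatches


# Illustrative rows in the assumed layout (not real figures).
SAMPLE = """date,YYYYMMDD,EC,FS,GP,KZN,LP,MP,NC,NW,WC,UNKNOWN,total,source
05-03-2020,20200305,0,0,0,1,0,0,0,0,0,0,1,illustrative
06-03-2020,20200306,0,0,1,1,0,0,0,0,0,0,3,illustrative
"""

if __name__ == "__main__":
    print(inconsistent_rows(SAMPLE))
```

A non-empty result does not necessarily mean an error in the data; it flags dates worth inspecting before the provincial series are used in a model.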
The Korea Centers for Disease Control and Prevention (KCDC) regularly provides datasets on Covid-19 cases at cdc.go.kr/board/board.es?mid=a30402000000&bid=0030. A specific GitHub repository is available at https://github.com/parksw3/COVID19-Korea.
The regional Covid-19 data for Spain are collected by the Spanish government and are available at the national open data portal https://datos.gob.es/. Different health datasets can be searched in its open data catalogue (https://datos.gob.es/es/catalogo?theme_id=salud). A search for the term Covid returns datasets covering Spain as a whole, classified by region, e.g., Evolución de enfermedad por el coronavirus (Covid-19) (evolution of the coronavirus disease), or specific to a particular Spanish region, e.g., Evolución del coronavirus (Covid-19) en Euskadi (evolution of the coronavirus in the Basque Country). In the GitHub repository https://github.com/datadista/datasets/tree/master/COVID%2019, the Covid-19 time series by region (CCAA, the autonomous communities) can be downloaded. Auxiliary information, such as the number of ICU beds available per region before the outbreak, the age distribution of confirmed cases, etc., can also be found there. Similar data can be found at epdata.es by searching for the term Covid-19. It is important to highlight that the different regions might report case numbers using different criteria. The national mortality register can be accessed at MoMo.
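Since regions may apply different criteria and revise past figures, a simple check before comparing regional series is to flag the days on which a cumulative count decreases. The sketch below does not depend on any particular file layout; the function name `criteria_change_alerts` and the sample values are hypothetical.

```python
from datetime import date


def criteria_change_alerts(cumulative):
    """Return (day, drop) pairs where a cumulative series decreases.

    `cumulative` is a list of (day, value) pairs sorted by day. A cumulative
    count should never go down, so a drop usually points to a revision or a
    change in reporting criteria rather than to a real epidemiological change.
    """
    alerts = []
    for (_, previous), (day, value) in zip(cumulative, cumulative[1:]):
        if value < previous:
            alerts.append((day, value - previous))
    return alerts


# Hypothetical cumulative cases for one region (not real data).
region_series = [
    (date(2020, 4, 1), 100),
    (date(2020, 4, 2), 120),
    (date(2020, 4, 3), 90),  # e.g. cases reassigned after a criteria change
    (date(2020, 4, 4), 95),
]

if __name__ == "__main__":
    print(criteria_change_alerts(region_series))
```

Flagged dates can then be cross-checked against the notes published with each regional dataset before deciding whether to smooth, drop or keep them.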
The UK government collects data and makes them officially available through Public Health England (PHE), i.e., the executive agency of the Department of Health and Social Care in the UK. PHE took on the role of the Health Protection Agency, the National Treatment Agency for Substance Misuse and a number of other health bodies. The official open data resource provided by the UK government can be found at gov.uk/government/publications/covid-19-track-coronavirus-cases. This dashboard shows reported cases by Upper Tier Local Authority (UTLA) in England, and an Excel file with the relevant information can be downloaded from it. The information is organized at different levels: i) total number of confirmed cases and deaths in the UK; ii) deaths by country: England, Scotland, Wales and Northern Ireland; iii) deaths by NHS region: London, South East, South West, East of England, Midlands, North East and Yorkshire, and North West; and iv) deaths by UTLA: daily cases in each of more than 149 different UTLAs. A description of how confirmed cases and deaths are counted is also available at gov.uk/guidance/coronavirus-covid-19-information-for-the-public. The .csv files with the number of confirmed cases and deaths can also be downloaded from the official public health system. Additional datasets reporting UK Covid-19 cases can be found in the GitHub repository https://github.com/tomwhite/covid-19-uk-data. The national mortality register can be found at the Office for National Statistics.
The data for the United States can be obtained from the dataset of the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). This dataset is available as a GitHub repository at https://github.com/CSSEGISandData/COVID-19/, which is updated daily by JHU CSSE itself [29]. Another relevant source for the U.S. is the Centers for Disease Control and Prevention (CDC, cdc.gov). The CDC publishes data on Covid-19 cases by state, together with auxiliary information such as the number of tests carried out, as well as weekly surveillance reports, which can be found at cdc.gov/coronavirus/2019-ncov/cases-updates/. Moreover, the COVID Tracking Project (https://covidtracking.com/data) collects and publishes the testing data available for US states and territories, divided by state. Similar information, also covering Canada, can be obtained from https://coronavirus.[...].com/en. Last, the New York Times is releasing a series of data files with cumulative counts of Covid-19 cases in the US, at state and county level, over time. These data can be found in the GitHub repository https://github.com/nytimes/covid-19-data.
Country | Source | GitHub repositories
Argentina | Ministry of Health | Covid19arData
Australia | Australian Health Department | covid-19-au
China | China National Health Commission | JHU, Midas-China
France | Public France Health System | opencovid19-fr, FRANCE-COVID-19
Germany | Robert Koch Institute | covid-19-germany-gae
Iceland | Government of Iceland | gaui-covid19
Italy | Italian Civil Protection Department | pcm-dpc
Paraguay | Ministry of Public Health and Social Welfare | covidpy-rest
South Africa | National Institute for Communicable Diseases | covid19za
South Korea | Korea Centers for Disease Control and Prevention | COVID19-Korea
Spain | Ministry of Health | datadista-Covid-19
United Kingdom | Public Health England | covid-19-uk-data
United States | Centers for Disease Control and Prevention | JHU, Nytimes
Table 1: Some examples of regional Covid-19 data resources.
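Several of the repositories in Table 1, such as the JHU CSSE one, distribute cumulative counts in a wide layout, with one column per date. The Python sketch below aggregates such a file by country and derives daily new cases. It assumes the layout of the JHU global time-series files (four metadata columns `Province/State`, `Country/Region`, `Lat`, `Long`, followed by one column per date written as M/D/YY); the function names and sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Assumed layout of the JHU CSSE global time-series files: four metadata
# columns followed by one cumulative column per date written as M/D/YY.
META_COLS = ["Province/State", "Country/Region", "Lat", "Long"]


def country_series(csv_text):
    """Aggregate provinces and return {country: [(date, cumulative), ...]}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    date_cols = [c for c in reader.fieldnames if c not in META_COLS]
    totals = defaultdict(lambda: defaultdict(int))
    for row in reader:
        for col in date_cols:
            totals[row["Country/Region"]][col] += int(row[col])
    series = {}
    for country, by_col in totals.items():
        points = [(datetime.strptime(c, "%m/%d/%y").date(), v) for c, v in by_col.items()]
        series[country] = sorted(points)  # chronological order
    return series


def daily_increments(points):
    """Turn sorted cumulative (date, value) pairs into daily new counts."""
    return [(day, value - prev) for (_, prev), (day, value) in zip(points, points[1:])]


# Illustrative rows in the assumed layout (not real figures).
SAMPLE = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20
New South Wales,Australia,-33.87,151.21,0,1,3
Victoria,Australia,-37.81,144.96,0,0,1
,Italy,41.87,12.57,0,0,2
"""

if __name__ == "__main__":
    for country, points in country_series(SAMPLE).items():
        print(country, daily_increments(points))
```

Parsing the date headers instead of relying on column order keeps the result chronological even if columns are appended or reordered upstream.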
In this section, we include datasets relevant for the study and modeling of Covid-19, such as demographics, government measures, and weather and climate data. These variables are currently being studied to evaluate their influence on the propagation of the virus.
Demographics Datasets
Demographics datasets are of significant importance for COVID-19 analysis. In this section, they have been arranged in three main groups: i) population; ii) population density; and iii) age structure. We highlight the following datasets on population:
• European countries population: the Eurostat dataset Population on 1 January, from the EU open data portal, available at https://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=demo_pjan&lang=en, provides population information per country for 2019.
• Global population: the Population Reference Bureau has published global population data at prb.org/worldpopdata/.
• List of countries by their population 2020: the 2020 population per country can be retrieved from Kaggle (kaggle.com/tanuprabhu/population-by-country-2020). The dataset contains not only population values but also other features for each country.
Moreover, some portals provide demographic information such as population pyramids, e.g., populationpyramid.net/.
For population density, the European Environment Agency provides the Population density disaggregated with Corine land cover 2000 dataset as a GeoTIFF file, which can be found at eea.europa.eu/data-and-maps/data/population-density-disaggregated-with-corine-land-cover-2000-2.
Last, regarding age structure, Our World in Data provides a report on the present situation on the planet, divided by country [59]. The corresponding dataset is available at https://ourworldindata.org/age-structure.
For datasets related to government measures, ACAPS publishes reports and datasets on government measures on Covid-19 at acaps.org/projects/covid19 (see section 5.10 for further details). In particular, updated reports can be downloaded from [3]. Moreover, the ACAPS … https://covidtracker.bsg.ox.ac.uk/about-api, whereas the full dataset can be found at bsg.ox.ac.uk/research/research-projects/oxford-covid-19-government-response-tracker.
The EpidemicForecasting.org website provides a dataset on the mitigation measures adopted by countries, which can be found at http://epidemicforecasting.org/about. In addition, it provides a simulator, the GLEAMviz simulator (gleamviz.org/simulator/), which allows exploring realistic epidemic spreading scenarios at the global scale.
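Raw counts are hard to compare between countries or regions of very different size, which is one reason the population datasets above matter. The Python sketch below shows the usual normalization to counts per 100,000 inhabitants, plus population density, assuming that counts and populations refer to the same geographic units and reference date. The function names and the figures in the example are hypothetical.

```python
def per_100k(count, population):
    """Express a count (cases, deaths, tests...) per 100,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing to limit floating-point rounding.
    return count * 100_000 / population


def density(population, area_km2):
    """Population density in inhabitants per square kilometre."""
    if area_km2 <= 0:
        raise ValueError("area must be positive")
    return population / area_km2


def rank_per_capita(counts, populations):
    """Sort units by count per 100,000 inhabitants, highest first.

    Units missing from `populations` are skipped instead of guessed.
    """
    rates = {u: per_100k(c, populations[u]) for u, c in counts.items() if u in populations}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Hypothetical figures for made-up regions A, B and C.
cases = {"A": 500, "B": 300, "C": 10}
population = {"A": 2_000_000, "B": 600_000}

if __name__ == "__main__":
    print(rank_per_capita(cases, population))
```

In this made-up example, region A has more cases in absolute terms, but region B ranks first once population is taken into account.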
An application named CHIME (COVID-19 Hospital Impact Model for Epidemics) has been developed by Penn Medicine, the academic medical center of the University of Pennsylvania. This app is designed to help hospitals and public health officials understand hospital capacity needs as they relate to the COVID-19 pandemic. The application is based on a data model available at https://code-for-philly.gitbook.io/chime/.
Last, the Open Government Partnership (OGP) organization has created a list of open government approaches to fight Covid-19, available at opengovpartnership.org/collecting-open-government-approaches-to-covid-19/. These approaches are organized by country and region, and a brief description and related URL are provided for each one.
Weather Datasets and Applications
In this section, we focus on weather datasets provided by several organizations around the world, as described in what follows. The first group of organizations providing weather datasets are the following EU providers:
• European Centre for Medium-Range Weather Forecasts (ECMWF): a research institute and operational service that produces global numerical weather predictions and other data. It operates two services of the EU's Copernicus Earth observation programme: the Copernicus Atmosphere Monitoring Service (CAMS) and the Copernicus Climate Change Service (C3S). Two main resources are provided by the ECMWF. The first one is the Climate Data Store (https://cds.climate.copernicus.eu/): the European Commission has entrusted ECMWF with the implementation of the Copernicus Climate Change Service (C3S), whose mission is to provide authoritative, quality-assured information to support adaptation and mitigation policies in a changing climate. At the heart of the C3S infrastructure is the Climate Data Store (CDS), which provides information about the past, present and future climate in terms of Essential Climate Variables (ECVs) and derived climate indicators. The second one is an application that the Copernicus Climate Change Service has developed with the environmental software experts B-Open (bopen.eu) to allow health authorities and epidemiology centres to explore whether temperature and humidity affect the spread of the coronavirus. This application is freely accessible from the C3S Climate Data Store [10].
• European Commission's Joint Research Centre (JRC): several open-data projects at the JRC can be of interest for the scientific community fighting Covid-19. We highlight here the most relevant one, the Photovoltaic Geographical Information System (PVGIS, https://ec.europa.eu/jrc/en/pvgis). The focus of PVGIS is research in solar resource assessment and photovoltaic (PV) performance studies, and the dissemination of knowledge and data about solar radiation and PV performance. The PVGIS web application gives access to meteorological data pertinent to the study of the seasonal behaviour of the pandemic. Three tools are available: i) Photovoltaic Performance; ii) Solar Radiation; and iii) Typical Meteorological Year (TMY).
The second group of organizations providing weather datasets are US providers: i) the National Oceanic and Atmospheric Administration (NOAA); and ii) the National Aeronautics and Space Administration (NASA). NOAA is an American scientific agency within the United States Department of Commerce that focuses on the conditions of the oceans, major waterways, and the atmosphere. Through its open climate data portal, Climate Data Online (CDO, ncdc.noaa.gov/cdo-web/), it provides free access to global historical weather and climate data, in addition to station history information. These data include quality-controlled daily, monthly, seasonal, and yearly measurements of temperature, precipitation, wind, etc.
On the other hand, NASA's goal in Earth science is to observe, understand, and model the Earth system to discover how it is changing. From an open-data perspective, NASA's Prediction Of Worldwide Energy Resources (POWER) project (https://power.larc.nasa.gov/) can be very useful to collect time series and monthly means of the most relevant weather and climate variables for a given location. The POWER project was initiated to improve upon the current renewable energy dataset and to create new datasets from new satellite systems. It targets three user communities: (1) Renewable Energy; (2) Sustainable Buildings; and (3) Agroclimatology. The information can be accessed through the Data Access Viewer at https://power.larc.nasa.gov/data-access-viewer/, a responsive web mapping application that provides data subsetting, charting, and visualization tools in an easy-to-use interface.
Last, there are many online APIs that provide weather data (see, for example, the list presented at https://datarade.ai/data-categories/weather-data/overview). Some of them can be used free of charge for a limited number of requests; see, for example, World Weather Online (worldweatheronline.com/developer/api/historical-weather-api.aspx).
This section includes datasets related to the mobility of people:
• Mobility reports: Google has developed the Covid-19 Community Mobility Reports, in which each report is broken down by location and displays the change in visits to places such as grocery stores and parks. The reports can be obtained by location at google.com/covid19/mobility/, where a PDF document containing figures and trends can be downloaded for each location. A similar tool has been developed by Apple and can be found at apple.com/covid19/mobility, where the reports can be filtered by country. One important difference with respect to the Google tool is that the raw data can be retrieved as .csv files. Last, the GeoDS Lab (Department of Geography, University of Wisconsin-Madison) has developed a web application to identify mobility pattern changes in the U.S. [23]. The application can be accessed at https://geods.geography.wisc.edu/covid19/physical-distancing/.
• Airport connectivity: FLIRT is a tool that provides data about commercial flights. It shows direct flights from a selected location and can simulate passengers taking multi-leg itineraries. The data can be downloaded in different formats (.csv, JSON, etc.) at https://flirt.eha.io/.
• Contact tracing: another important source of mobility-related data for modeling the pandemic is human behaviour inferred from wireless technologies, such as cellular communications, WiFi and Bluetooth, among others. Along this line, CRAWDAD (Community Resource for Archiving Wireless Data At Dartmouth) is a wireless network data resource for the research community. This repository contains wireless trace data from many contributing locations, as well as staff who develop better tools for collecting, anonymizing, and analyzing the data. The repository can be accessed at http://crawdad.org/index.html, and it allows filtering the data by field, for instance, Human Behavior Modeling and Opportunistic Connectivity.
To maximize the value of the data sources about Covid-19, they must not only be available but also have a set of characteristics that make them reusable. Due to the global reach of the pandemic, in most cases the data sources come from public institutions. These open government data should follow the eight principles of open data reported in [61].
MELODA 5 [1] is a metric to assess the reusability of open datasets. This metric considers 8 dimensions that affect the reusability of a dataset, which are listed hereafter:
1. Legal license: assesses the legal rights granted to the reusers of the dataset.
2. Technical format: assesses the digital format in which the data are stored and released.
3. Access: assesses the possibilities offered to reusers to interact with the dataset in order to retrieve the necessary set of data.
4. Standardization: assesses how widely used and agreed upon the fields composing the dataset and its description are.
5. Geolocalization: assesses the geographical content of the released data.
6. Updating frequency: assesses how frequently the dataset is updated.
7. Dissemination: assesses the efforts and resources devoted by the publishing entity to making the released datasets popular.
8. Prestige: assesses the reputation of the publishing entity among the reusers of its data. For Covid-19, this dimension cannot be set due to the novelty of the phenomenon.
According to these dimensions, the assessment of the main data sources mentioned in previous sections is reported in the following list (in this list the Prestige dimension has not been removed, and therefore there is a difference of 6 points with respect to Table 2):
• : 31
• Our World in Data, Coronavirus Source Data: 34
• Argentina: 30
• Australia: 34
• China: 31
• Italy: 38
• France: 34
• Germany: 37
• Paraguay: 25
• South Africa: 40
• Spain: 37
• United Kingdom: 35
• United States: 32
Source | AR | AU | CN | DE | FR | GB | IT | JHU | OWID | PY | SP | US | ZA
TOTAL | 24 | 25 | 25 | 31 | 28 | 29 | 32 | 25 | 28 | 19 | 31 | 26 | 34
Table 2: Total score corresponding to the first 7 reusability dimensions of MELODA 5 (License, Technical Format, Access, Standardization, Geolocalization, Updating Frequency, Dissemination) for different open institutional data sources. (AR: Argentina; AU: Australia; CN: China; DE: Germany; FR: France; GB: United Kingdom; IT: Italy; JHU: Johns Hopkins University; OWID: Our World In Data; PY: Paraguay; SP: Spain; US: United States; ZA: South Africa.)
Although the maximum score for MELODA 5 is 61 points [1], the Prestige dimension of the publishing institutions regarding Covid-19 is not included in Table 2 due to the novelty of the situation. To obtain a fair comparison, this criterion has been removed from the analysis, leaving only 7 dimensions for the data sources; accordingly, a maximum of 55 points can be achieved. The table shows that none of the sources scores higher than 35 points, a value that can be considered good but far from optimal. We highlight that some sources release their data with a license that restricts commercial use (i.e., they are not open data); hence, a score of 1 has been assigned to them in the License dimension. Another remarkable point is the general lack of an API to access individual data in the data sources, which forces reusers to download the full dataset again every day. For this reason, most of the data sources score 1 in the Access dimension. The general lack of geolocalization content in most of the data sources is also remarkable: a mere indication of the region or area is the most common geographic content. Consequently, 3 is the most frequent score in the Geolocalization dimension. Regarding technical format, .csv is the most popular, together with some sources using JSON, a format that provides an additional key identifying each value. Although many sources include a definition of their fields, no standardization effort for sharing the same information between sources has been detected; in fact, there is a myriad of different field names and contents. Finally, the Dissemination dimension score has been set to the maximum for those sources that have a website to disseminate their data.
In this paper, we provide a review of relevant open data sources for a better understanding of the worldwide spread of Covid-19. We enumerate the variables required to obtain consistent epidemiological and forecasting models. In particular, we focus not only on the specific Covid-19 time series but also on a set of auxiliary variables related to the study of its potential seasonal behaviour, the effect of age structure and of the prevalence of secondary health conditions on mortality, the effectiveness of government actions, etc.
We analyze the present situation of the available Covid-19 open data. Unfortunately, it is far from ideal because of a number of issues such as data inconsistency, changing criteria, a large diversity of sources, non-comparable metrics between countries, delays, etc. Despite these difficulties, the availability of open data resources on Covid-19 and related variables provides many opportunities to different communities, in particular epidemiologists, data-driven researchers, health care specialists, the machine learning community, data scientists, etc. To facilitate these communities' access to the required open sources, we identify the principal open data entities pertinent to the study of Covid-19. Furthermore, we enumerate different open datasets, and their corresponding repositories, related to Covid-19 cases at a worldwide scale, but also at a regional/local level. In addition, we provide specific information about the data resources of a selection of countries, chosen because of the intensity with which the pandemic has impacted them or because of their relevance to the seasonal study of Covid-19, e.g., Southern Hemisphere countries. Finally, we provide other open resources that facilitate the incorporation of demographic, weather and climate variables, etc.
Acknowledgments
The authors belong to the CONCO-Team (CONtrol COvid19 Team) and would like to thank the rest of its members for their support (the composition and goals of the CONCO-Team can be found at https://github.com/CONCO-Team/CONtrol-COvid19-TEAM/blob/master/Conco_Team_Members_Goals_and_Contributions.pdf). In addition, we would like to thank other contributors, such as Mr. Nadir Bouchama, Researcher at Centre de Recherche sur l'Information Scientifique et Technique (Algeria); Dr. Ejay Nsugbe, researcher at Collins Aerospace, UK; Dr. Federica Garin, researcher at Gipsa-Lab, Grenoble, France; Dr. Vukosi Marivate, Senior Lecturer at the Department of Computer Science, Republic of South Africa; Dr. Terrence Patrick McGarty, Research Associate at the Research Laboratory of Electronics, Massachusetts Institute of Technology, USA; Dr. Sriram Gubbi, medical doctor at the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), USA; Dr. Ramón Béjar, Associate Professor at the Department of Computer and Industrial Engineering, University of Lleida, Spain; and Dr. Thomas Meunier, Associate Researcher at the Department of Physical Oceanography, Woods Hole Oceanographic Institution, USA.
References
[1] Alberto Abella, Marta Ortiz-de Urbina-Criado, and Carmen De-Pablos-Heredero. MELODA 5: a metric to assess open data reusability. elprofesionaldelainformacion.com/contenidos/2019/nov/abella-ortiz-pablos.pdf, 2019.
[2] ACAPS. COVID19 government measures dataset. acaps.org/covid19-government-measures-dataset, 2020.
[3] ACAPS. Report on COVID19 government measures updates. acaps.org/special-report/covid-19-government-measures-update, 2020.
[4] Roy M. Anderson, Hans Heesterbeek, Don Klinkenberg, and T. Déirdre Hollingsworth. How will country-based mitigation measures influence the course of the COVID-19 epidemic? The Lancet, 395(10228):931–934, 2020.
[5] Yan Bai, Lingsheng Yao, Tao Wei, Fei Tian, Dong-Yan Jin, Lijuan Chen, and Meiyun Wang. Presumed asymptomatic carrier transmission of COVID-19. Research Letter: https://jamanetwork.com/journals/jama/article-abstract/2762028, 2020.
[6] … The Lancet Infectious Diseases, 2020.
[7] Giuseppe C. Calafiore, Carlo Novara, and Corrado Possieri. A modified SIR model for the COVID-19 contagion in Italy, 2020.
[8] European Centre for Medium-Range Weather Forecasts. ECMWF Forecasts. ecmwf.int/en/forecasts, 2020.
[9] K.H. Chan, J.S. Peiris, S.Y. Lam, L.L.M. Poon, K.Y. Yuen, and W.H. Seto. The effects of temperature and relative humidity on the viability of the SARS coronavirus. Advances in Virology, 2011, 2011.
[10] Copernicus Climate Change Service. C3S helps health experts explore how temperature and humidity affect virus spread. https://climate.copernicus.eu/c3s-helps-health-experts-explore-how-temperature-and-humidity-affect-virus-spread, 2020.
[11] Imperial College London. Report 13 - Estimating the number of infections and the impact of non-pharmaceutical interventions on COVID-19 in 11 European countries. imperial.ac.uk/mrc-global-infectious-disease-analysis/covid-19/report-13-europe-npi-impact/, 2020.
[12] IHME COVID, Christopher J.L. Murray, et al. Forecasting COVID-19 impact on hospital bed-days, ICU-days, ventilator-days and deaths by US state in the next 4 months. medRxiv, 2020.
[13] Raj Dandekar and George Barbastathis. Neural Network aided quarantine control model estimation of COVID spread in Wuhan, China. arXiv, pages arXiv–2003, 2020.
[14] Open Data Watch. What is being said: data in the time of COVID-19. https://opendatawatch.com/what-is-being-said/data-in-the-time-of-covid-19/, 2020.
[15] Steffen E. Eikenberry, Marina Mancuso, Enahoro Iboi, Tin Phan, Keenan Eikenberry, Yang Kuang, Eric Kostelich, and Abba B. Gumel. To mask or not to mask: modeling the potential for face mask use by the general public to curtail the COVID-19 pandemic. arXiv preprint arXiv:2004.03251, 2020.
[16] European Centre for Disease Prevention and Control. Situation dashboard: latest available data. https://qap.ecdc.europa.eu/public/extensions/COVID-19/COVID-19.html, 2020.
[17] Yaqing Fang, Yiting Nie, and Marshare Penny. Transmission dynamics of the COVID-19 outbreak and effectiveness of government interventions: a data-driven analysis. Journal of Medical Virology, 2020.
[18] Luca Ferretti, Chris Wymant, Michelle Kendall, Lele Zhao, Anel Nurtay, Lucie Abeler-Dörner, Michael Parker, David Bonsall, and Christophe Fraser. Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing. Science, page eabb6936, Mar 2020.
[19] Paul Fine, Ken Eames, and David L. Heymann. "Herd immunity": a rough guide. Clinical Infectious Diseases, 52(7):911–916, 2011.
[20] Seth Flaxman, Swapnil Mishra, Axel Gandy, et al. Estimating the number of infections and the impact of non-pharmaceutical interventions on COVID-19 in 11 European countries. Imperial College COVID-19 Response Team, 30, 2020.
[21] Santé Publique France. Données hospitalières relatives à l'épidémie de COVID-19. data.gouv.fr/fr/datasets/donnees-hospitalieres-relatives-a-lepidemie-de-covid-19/, 2020.
[22] Santé Publique France. Données relatives aux tests de dépistage de COVID-19 réalisés en laboratoire de ville. data.gouv.fr/fr/datasets/donnees-relatives-aux-tests-de-depistage-de-covid-19-realises-en-laboratoire-de-ville/, 2020.
[23] Song Gao, Jinmeng Rao, Yuhao Kang, Yunlei Liang, and Jake Kruse. Mapping county-level mobility pattern changes in the United States in response to COVID-19. Available at SSRN 3570145, 2020.
[24] Giulia Giordano, Franco Blanchini, Raffaele Bruno, Patrizio Colaneri, Alessandro Di Filippo, Angela Di Matteo, and Marta Colaneri. Modelling the COVID-19 epidemic and implementation of population-wide interventions in Italy. Nature Medicine, pages 1–6, 2020.
[25] Antonio Gómez Expósito, José Antonio Rosendo Macías, and Miguel Ángel González Cagigal. Modelado y análisis de la evolución de una epidemia vírica mediante filtros de Kalman: el caso del COVID-19 en España. Technical report, Universidad de Sevilla, 2020.
[26] World Health Organization. Coronavirus disease 2019 - Situation reports. who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports, 2020.
[27] World Health Organization. Myth busters. who.int/emergencies/diseases/novel-coronavirus-2019/advice-for-public/myth-busters, 2020.
[28] Joel Hellewell, Sam Abbott, Amy Gimma, Nikos I. Bosse, Christopher I. Jarvis, Timothy W. Russell, et al. Feasibility of controlling COVID-19 outbreaks by isolation of cases and contacts. The Lancet Global Health, 8(4):e488–e496, Apr 2020.
[29] Johns Hopkins University Center for Systems Science and Engineering. Coronavirus COVID-19 global cases by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU). https://gisanddata.maps.arcgis.com/apps/opsdashboard/index.html, 2020.
[30] Yunpeng Ji, Zhongren Ma, Maikel P. Peppelenbosch, and Qiuwei Pan. Potential association between COVID-19 mortality and health-care resource availability. The Lancet Global Health, 8(4):e480, 2020.
[31] Junaid Kashir and Ahmed Yaqinuddin. Loop mediated isothermal amplification (LAMP) assays as a rapid diagnostic for COVID-19. Medical Hypotheses, page 109786, 2020.
[32] Bennett Kleinberg, Isabelle van der Vegt, and Maximilian Mozes. Measuring emotions in the COVID-19 real world worry dataset. arXiv preprint arXiv:2004.04225, 2020.
[33] Robert Koch Institut. Coronavirus disease 2019 (COVID-19) daily situation report of the Robert Koch Institute. rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Situationsberichte/Archiv.html, 2020.
[34] S. Lakshmi Priyadarsini and M. Suresh. Factors influencing the epidemiological characteristics of pandemic COVID-19: a TISM approach. International Journal of Healthcare Management, pages 1–10, 2020.
[35] Stephen A. Lauer, Kyra H. Grantz, Qifang Bi, Forrest K. Jones, Qulu Zheng, Hannah R. Meredith, Andrew S. Azman, Nicholas G. Reich, and Justin Lessler. The incubation period of coronavirus disease 2019 (COVID-19) from publicly reported confirmed cases: estimation and application. Annals of Internal Medicine, 2020.
[36] Tung Thanh Le, Zacharias Andreadakis, Arun Kumar, Raúl Gómez Román, Stig Tollefsen, Melanie Saville, and Stephen Mayhew. The COVID-19 vaccine development landscape. Nature Reviews Drug Discovery, 10, 2020.
[37] Char Leung. Risk factors for predicting mortality in elderly patients with COVID-19: a review of clinical data in China. Mechanisms of Ageing and Development, page 111255, 2020.
[38] Ying Liu, Albert A. Gayle, Annelies Wilder-Smith, and Joacim Rocklöv. The reproductive number of COVID-19 is higher compared to SARS coronavirus. Journal of Travel Medicine, 2020.
[39] Anice C. Lowen, Samira Mubareka, John Steel, and Peter Palese. Influenza virus transmission is dependent on relative humidity and temperature. PLoS Pathog, 3(10):e151, 2007.
[40] Parikshit Mahalle, Asmita B Kalamkar, Nilanjan Dey, Jyotismita Chaki, Gitanjali R Shinde, et al. Forecasting models for coronavirus (COVID-19): a survey of the state-of-the-art. TechRxiv, 2020.
[41] Vukosi Marivate and Herkulaas MvE Combrink. Use of available data to inform the COVID-19 outbreak in South Africa: a case study. Data Science Journal, 19(1), 2020.
[42] Vukosi Marivate and Herkulaas MvE Combrink. Use of available data to inform the COVID-19 outbreak in South Africa: a case study, 2020.
[43] Vukosi Marivate, Alta de Waal, Herkulaas Combrink, Ofentswe Lebogo, Shivan Moodley, Nompumelelo Mtsweni, Vuthlari Rikhotso, Jay Welsh, and S'busiso Mkhondwane. Coronavirus disease (COVID-19) case data - South Africa, 2020.
[44] Vukosi Marivate, Elaine Nsoesie, Esube Bekele, and Africa open COVID-19 data working group. Coronavirus COVID-19 (2019-nCoV) Data Repository for Africa, 2020.
[45] Maia Martcheva. An introduction to mathematical epidemiology. Springer Science and Business Media LLC, 2015.
[46] Kenji Mizumoto, Katsushi Kagaya, Alexander Zarebski, and Gerardo Chowell. Estimating the asymptomatic proportion of coronavirus disease 2019 (COVID-19) cases on board the Diamond Princess cruise ship, Yokohama, Japan, 2020. Eurosurveillance, 25(10), 2020.
[47] Stephen J. Mooney, Daniel J Westreich, and Abdulrahman M. El-Sayed. Epidemiology in the era of big data. Epidemiology (Cambridge, Mass.), 26(3):390, 2015.
[48] United Nations. Publish existing data following open data guidelines. https://covid-19-response.unstatshub.org/open-data/publish-existing-data-as-open-data/, 2020.
[49] Trieu Nguyen, Dang Duong Bang, and Anders Wolff. 2019 novel coronavirus disease (COVID-19): paving the road for rapid detection and point-of-care diagnostics. Micromachines, 11(3):306, 2020.
[50] Nuria Oliver, Xavier Barber, Kirsten Roomp, and Kristof Roomp. The COVID19 impact survey: assessing the pulse of the COVID-19 pandemic in Spain via 24 questions. arXiv preprint arXiv:2004.01014, 2020.
[51] Nuria Oliver, Bruno Lepri, Harald Sterly, Renaud Lambiotte, Sébastien Delataille, Marco De Nadai, Emmanuel Letouzé, Albert Ali Salah, Richard Benjamins, Ciro Cattuto, et al. Mobile phone data for informing public health actions across the COVID-19 pandemic life cycle, 2020.
[52] Nuria Oliver, Emmanuel Letouzé, Harald Sterly, Sébastien Delataille, Marco De Nadai, Bruno Lepri, Renaud Lambiotte, Richard Benjamins, Ciro Cattuto, Vittoria Colizza, Nicolas de Cordes, Samuel P. Fraiberger, Till Koebe, Sune Lehmann, Juan Murillo, Alex Pentland, Phuong N. Pham, Frédéric Pivetta, Albert A. Salah, Jari Saramäki, Samuel V. Scarpino, Michele Tizzoni, Stefaan Verhulst, and Patrick Vinck. Mobile phone data and COVID-19: missing an opportunity? arXiv preprint arXiv:2003.12347, 2020.
[53] Graziano Onder, Giovanni Rezza, and Silvio Brusaferro. Case-fatality rate and characteristics of patients dying in relation to COVID-19 in Italy. JAMA, 2020.
[54] Noah C. Peeri, Nistha Shrestha, Siddikur Rahman, Rafdzah Zaki, Zhengqi Tan, Saana Bibi, Mahdi Baghbanzadeh, Nasrin Aghamohammadi, Wenyi Zhang, and Ubydul Haque. The SARS, MERS and novel coronavirus (COVID-19) epidemics, the newest and biggest global health threats: what lessons have we learned?
International Journal of Epidemiology , 2020.[55] Emanuele Pepe, Paolo Bajardi, Laetitia Gauvin, Filippo Privitera, Brennan Lake, Ciro Cattuto, andMichele Tizzoni. COVID-19 outbreak response: a first assessment of mobility changes in Italy followingnational lockdown. medRxiv , 2020.[56] Gaetano Perone. An ARIMA model to forecast the spread of COVID-2019 epidemic in Italy. arXivpreprint arXiv:2004.00382 , 2020. 2957] Sant´e Publique France. Dones des urgences hospitalires et de SOS mdecins relatives l’pidmie de COVID-19. . data . gouv . fr/en/datasets/donnees-des-urgences-hospitalieres-et-de-sos-medecins-relatives-a-lepidemie-de-covid-19/ , 2020.[58] Arni SR Srinivasa Rao and Jose A Vazquez. Identification of covid-19 can be quicker through artificialintelligence framework using a mobile phone-based survey in the populations when cities/towns are underquarantine. Infection Control & Hospital Epidemiology , pages 1–18, 2020.[59] Hannah Ritchie and Max Roser. Age structure.
Our World in Data , 2020.https://ourworldindata.org/age-structure.[60] Mohammad M. Sajadi, Parham Habibzadeh, Augustin Vintzileos, Shervin Shokouhi, Fernando Miralles-Wilhelm, and Anthony Amoroso. Temperature and latitude analysis to predict potential spread andseasonality for Covid-19.
Available at SSRN 3550308 , 2020.[61] Joshua Tauberer and Larry Lessig. The 8 principles of open government data. . opengovdata . org/home/8principles , 2007.[62] UNESCO. Covid-19 educational disruption and response. https://en . unesco . org/covid19/educationresponse , 2020.[63] Jennifer Valentino-DeVries, Denise Lu, and Gabriel J.X. Dance. Location data says it all: staying athome during coronavirus is a luxury. The New York Times , 2020.[64] Jingyuan Wang, Ke Tang, Kai Feng, and Weifeng. High temperature and high humidity reduce thetransmission of COVID-19.
Available at SSRN: https: // ssrn . com/ abstract= 3551767 , 2020.[65] Jingyuan Wang, Ke Tang, Kai Feng, and Weifeng. When is the COVID-19 pandemic over? Evidencefrom the stay-at-home policy execution in 106 Chinese cities. Available at SSRN: https: // ssrn . com/abstract= 3561491 , 2020.[66] Yishan Wang, Hanyujie Kang, Xuefeng Liu, and Zhaohui Tong. Combination of RT-qPCR testing andclinical features for diagnosis of COVID-19 facilitates management of SARS-CoV-2 outbreak. Journalof Medical Virology , 2020.[67] Juanjuan Zhang, Maria Litvinova, Yuxia Liang, Yan Wang, Wei Wang, Shanlu Zhao, Qianhui Wu,Stefano Merler, C´ecile Viboud, Alessandro Vespignani, Marco Ajelli, and Hongjie Yu. Changes incontact patterns shape the dynamics of the covid-19 outbreak in china.
Science , 2020.[68] Sheng Zhang, MengYuan Diao, Wenbo Yu, Lei Pei, Zhaofen Lin, and Dechang Chen. Estimation of thereproductive number of novel coronavirus (COVID-19) and the probable outbreak size on the DiamondPrincess cruise ship: a data-driven analysis.
International Journal of Infectious Diseases , 93:201–204,2020.[69] Fei Zhou, Ting Yu, Ronghui Du, Guohui Fan, Ying Liu, Zhibo Liu, Jie Xiang, Yeming Wang, Bin Song,Xiaoying Gu, Lulu Guan, Yuan Wei, Hui Li, Xudong Wu, Jiuyang Xu, Shengjin Tu, Yi Zhang, HuaChen, and Bin Cao. Clinical course and risk factors for mortality of adult inpatients with COVID-19 inWuhan, China: a retrospective cohort study.