Comparative visualization of epidemiological data during various stages of a pandemic
Thomas Kreuz
Institute for Complex Systems, CNR, Sesto Fiorentino, Italy
(Dated: February 24, 2021)

After COVID-19 was first reported in China at the end of 2019, it took only a few months for this local crisis to turn into a global pandemic with unprecedented disruptions of everyday life. However, at any moment in time the situation in different parts of the world is far from uniform and each country follows its own epidemiological trajectory. In order to keep track of the course of the pandemic in many different places at the same time, it is vital to develop comparative visualizations that facilitate the recognition of common trends and divergent behaviors. Similarly, it is important to always focus on the information that is most relevant at any given point in time. In this study we look at exactly one year of daily numbers of new cases and deaths and present data visualizations that compare many different countries and are adapted to the overall stage of the pandemic. During the early stage, when cases and deaths still rise, we focus on the time lag relative to the current epicenter of the pandemic and on the doubling times. Later we monitor the rise and fall of the daily numbers via wave detection plots. The transition between these two stages takes place when the daily numbers stop rising for the first time.
I. INTRODUCTION
Human cases of COVID-19, the disease caused by the novel coronavirus Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), were first reported in Wuhan City, China, in December 2019 [1]. At first, transmission was restricted to Central China's Hubei province. Yet already by January 2020 first cases had been reported not only in other Asian countries (starting with Taiwan, South Korea and Japan) but also in Australia, the US and several European countries [2]. As had happened before in China [3, 4], during February 2020 first cases turned into first deaths [5]. South Korea, Iran and increasingly Italy emerged as early hotspots outside of China, where the epidemic now seemed to be under control [6]. On March 11, 2020, the Director General of the World Health Organisation (WHO) declared the novel coronavirus outbreak a worldwide pandemic [7]. A few days later Italy became the clear global epicenter as the first country to surpass China in number of deaths [8]. Since then the coronavirus has spread across the globe with an unprecedented impact on healthcare [9], economy [10], finances [11], science [12, 13], education [14], travel [15], sports [16], mental health [17] and basically all other sectors of society.

Already on January 22, 2020, the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (Baltimore, MD, USA) started publishing a freely available COVID-19 Data Repository that was updated daily [18].
Once global datasets like this one became publicly available, the scientific community sprang into action. Within a short time a host of studies appeared, many of which modeled the data and used these models to predict the future course of the pandemic (e.g., [19–22]).

Modeling the COVID-19 pandemic has already yielded some important results and policy recommendations [23–25], and much work still remains to be done [26, 27]. However, extrapolation based on often incomplete, inaccurate or unreliable data also has its limitations and pitfalls [28]. In this article we refrain from making inferences about the future and instead restrict ourselves to pure visualization of past and present data [29]. The aim, in a nutshell, is to develop a simple and consistent comparative data visualization framework that is general and adapted to the various stages of a pandemic. Such a framework summarizes in one sweep the dynamic and heterogeneous situation worldwide.

First, we focus on visualizations that allow a meaningful comparison of the course of the pandemic for many different countries (or, on smaller spatial scales, states, regions etc.) at the same time. This is in contrast to the commonly used histograms in which the temporal profile of the data is plotted for one country at a time. Second, we argue that the most relevant information in the data changes between the early and the later stages of a pandemic. Accordingly, we present two different kinds of data visualizations.

In the early stage (as long as cases and deaths continue to rise) it is most important to monitor two things. The first is the time lag compared to the current epicenter, which is typically the country where the epidemic first took hold, although this can shift later. The second is the severity of the spread of the disease, usually expressed by means of the doubling time.
Both of these quantities provide very useful information about the urgency of the situation [30]. They can also help with general decision making, in particular with finding the right moment to implement preventive measures [31, 32]. On the other hand, the later stages are more about monitoring the course of the pandemic in each country in terms of peaks, valleys and plateaus. This is when there is often a back and forth between imposing, tightening and relaxing contact restrictions, depending on the trajectory of the epidemiological dynamics in the population at that point in time [33, 34]. The transition between these two stages takes place when the daily numbers of cases and deaths stop rising for the first time [35].

The remainder of this article is organized as follows. In Section II we describe the dataset and the preprocessing performed. The two Methods subsections III A and III B illustrate the quantities, graphs and sorting criteria we use to visualize the data in the early and the later stages of the pandemic, respectively. In Sections IV A and IV B we show the data plots for 42 selected countries from all over the world. We also show two more local examples, the US states and the regions of Italy. Finally, in Section V we summarize and conclude.

II. DATA
Dataset:
We illustrate our data visualizations using the freely available dataset from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (GitHub webpage: https://github.com/CSSEGISandData/COVID-19) [18]. We use the daily cumulative data for both cases and deaths from its first publication on January 22, 2020, until January 21, 2021, thus covering exactly one year of data. We selected 42 representative countries, focusing on Europe and on larger countries in other parts of the world. The data for the US states were available within the same repository. The data for the Italian regions can be found on the webpage https://github.com/pcm-dpc/COVID-19. The data from Liguria were not available, and this region was thus not included. The population sizes used in the normalization were taken from the webpage (data as of January 21st, 2021).
Preprocessing:
The overall numbers of cases and of deaths up to a certain date can be expected to increase monotonically with time. They need not increase strictly, since there might be days without any new cases or deaths. However, some of the datasets occasionally contain negative jumps from one day to the next. These are typically due to the elimination of double counting or to other retrospective reevaluations, such as fundamental changes in the way cases and deaths were defined (see the COVID-19 Data Repository [18] for details).

To clean the data and eliminate these negative jumps, we follow the reasonable assumption that later data are more correct than earlier data (after all, that is what corrections are for). Each time, we decrease all the spuriously high earlier data points to the corrected later value. As a positive side effect, this cleaning step also eliminates most of the plateaus that were created by the aforementioned corrections, which helps in the later calculation of the doubling times. Finally, we apply a seven-day moving average in order to smooth out weekday variations, such as the two-day dips that often occur due to reporting delays on weekends [36].

The cumulative absolute numbers (not normalized by population) of deaths and cases obtained in this way are shown in Fig. 1a and Fig. 1b, respectively, for the selected 42 countries and the year from January 22, 2020, to January 21, 2021.
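The two preprocessing steps above can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the code used to produce the figures. The correction of negative jumps is implemented as a backward running minimum: every earlier value that exceeds a later, already corrected value is lowered to it. The seven-day moving average is assumed to be a trailing window that averages over fewer points during the first six days. The function names are hypothetical.

```python
from typing import List, Sequence


def remove_negative_jumps(cumulative: Sequence[float]) -> List[float]:
    """Make a cumulative series non-decreasing by lowering earlier values.

    Implements the assumption that later data are more correct than earlier
    data: walking backwards, any value larger than its (already corrected)
    successor is reduced to that successor's value.
    """
    cleaned = list(cumulative)
    for i in range(len(cleaned) - 2, -1, -1):
        if cleaned[i] > cleaned[i + 1]:
            cleaned[i] = cleaned[i + 1]
    return cleaned


def moving_average(values: Sequence[float], window: int = 7) -> List[float]:
    """Trailing moving average of order `window` (trailing window is an assumption).

    For the first window-1 days the average runs over the available days only.
    """
    smoothed = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            running_sum -= values[i - window]  # drop the value leaving the window
        smoothed.append(running_sum / min(i + 1, window))
    return smoothed


if __name__ == "__main__":
    raw = [1, 10, 9, 3, 5]  # toy cumulative counts with a downward correction
    print(remove_negative_jumps(raw))  # [1, 3, 3, 3, 5]
    print(moving_average([0, 7, 14], window=7))  # [0.0, 3.5, 7.0]
```

Note that a single late correction can lower several earlier days at once (the values 10 and 9 in the toy example). This matches the description of decreasing "all the spuriously high earlier data points".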
III. METHODS
In this Methods section we illustrate the various data visualization plots using the number of deaths as an example. Deaths tend to be more reliable [37] since they are not affected by the number of tests performed. That number itself depends on a variety of factors. These include the number of either symptomatic or essential people (compare, e.g., [38, 39]), but also healthcare system capacities and political decisions [29]. However, for completeness, in the Results Section IV we also show two plots based on the number of cases.
A. Early stages of a pandemic
In the early stages, when cases and deaths are still in their first rising phase, it is important to monitor the initial spread of the pandemic [31, 32]. How far behind are different countries compared to the country where the epidemic started, and how fast is the current spread in each country? The most relevant quantities are the time lag with respect to the current epicenter of the pandemic and the doubling time.
1. Time lag

In Fig. 2 we show the course of the number of deaths for three selected countries (Italy, Spain and the US) during the initial period of the pandemic. The period runs from the first reported death in any of these three countries to the end of March. For better comparability the numbers were normalized to the overall population of the respective country.

Italy was the country in the Western hemisphere with the earliest onset of an epidemic [40, 41]. It was also the first Western country with an officially recorded COVID-19 fatality. As the graph shows, this occurred on February 23, 2020, eight days earlier than in the US and eleven days earlier than in Spain. In order to compare the course of the epidemic in different countries, it thus makes sense to use Italy as the reference country.

The first important piece of information is by how many days a given country lags behind Italy's curve. At a later stage, after a potential reversal of fortune, it is by how many days the country is ahead in its epidemic course. To this end, we define the time lag of a country with respect to Italy as follows:
FIG. 1: The dataset: Absolute numbers of cumulative deaths (a) and cases (b) versus time for 42 selected countries and the one-year interval from January 22, 2020 to January 21, 2021. This plot, like all subsequent plots that show cumulative numbers, uses a log scale to facilitate comparison between countries at different stages of their pandemic course. The legend lists the countries and their cumulative numbers of deaths (a) and cases (b) after one year. Source: Johns Hopkins COVID-19 Data Repository.
FIG. 2: Definition of time lags: The monotonically increasing thick curves depict the normalized number of deaths (per one million inhabitants) versus time for three selected countries during the time interval from February 23 to March 30, 2020. Horizontal lines illustrate the changing time lag of Spain (red) and the US (blue) with respect to the pandemic course of Italy (thick black curve). This is shown first for March 11, 2020 (label 1 and thin lines on the left) and then for the end of March (label 2 and thick lines on the right). During these twenty days the time lag between Spain and Italy shrank from −… to −3 days, whereas the time lag between the US and Italy grew from −17 days to −19 days. The time lags on March 31, together with the number of deaths up to that day, are reported in the legend.
1. Start with the last data point of the country and then check when Italy's curve crossed this value.
2. Do vice versa in case Italy is behind.

Negative time lags thus indicate that a country is behind Italy, positive ones that it is ahead. In Fig. 2 this is illustrated for three different countries, using the end of March (marked as thick dashed vertical line 2) as an example. By construction the time lag of Italy to itself is always 0. Spain reached 129 deaths per one million inhabitants three days later than Italy; accordingly, its time lag is −3. The US reached 9 deaths per one million inhabitants 19 days later than Italy, so here the time lag is −19. Compared to March 11 (vertical line 1), the time lag of Spain had shrunk from −… to −3 days. By contrast, that of the US had grown from −17 days to −19 days (at that time the US was falling further behind).
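The two-step definition of the time lag can be written as a short Python sketch. It assumes that both per-capita cumulative curves are already cleaned (non-decreasing) and aligned on the same calendar days. It also assumes that "crossing" means the first day on which a curve reaches at least the given value, with no interpolation between days. The function names are illustrative.

```python
from typing import Sequence


def first_day_reaching(curve: Sequence[float], value: float) -> int:
    """Index of the first day on which `curve` is at least `value`."""
    for day, level in enumerate(curve):
        if level >= value:
            return day
    raise ValueError("the curve never reaches the requested value")


def time_lag(country: Sequence[float], reference: Sequence[float]) -> int:
    """Time lag in days of `country` with respect to `reference` (e.g. Italy).

    Negative values: the country is behind the reference; positive: ahead.
    """
    today = min(len(country), len(reference)) - 1
    country_now, reference_now = country[today], reference[today]
    if country_now <= reference_now:
        # Step 1: when did the reference curve cross the country's last value?
        return first_day_reaching(reference[: today + 1], country_now) - today
    # Step 2 (vice versa): the reference is behind, so ask when the country
    # crossed the reference's last value; the lag is then positive.
    return today - first_day_reaching(country[: today + 1], reference_now)


if __name__ == "__main__":
    italy = [1, 2, 4, 8, 16]
    behind = [0, 0, 1, 2, 4]
    ahead = [2, 4, 8, 16, 32]
    print(time_lag(behind, italy), time_lag(ahead, italy), time_lag(italy, italy))
    # -2 1 0
```

In the toy example the "behind" country has 4 deaths today, a level Italy already reached two days ago, so its lag is −2. This mirrors the Spain example in the text.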
2. Doubling time
The second variable of importance is the doubling time, the characteristic time scale of exponential growth. It is the time it takes for the number of deaths (or cases) to double; the lower its value, the faster the spread of the epidemic. The first step in calculating this number is to determine the relative (percentage) growth rate p(t) from one day to the next:

$$ p(t) = \frac{d(t) - d(t-1)}{d(t-1)} \qquad (1) $$

Here d(t) refers to the cumulative number of deaths until day t. From the growth rate, the doubling time T_d is calculated as:

$$ T_d(t) = \frac{\ln 2}{\ln[1 + p(t)]} \qquad (2) $$

Like the time lag, the doubling time is a time-local quantity that changes day by day. In Fig. 3 we depict the course of the doubling times for the same three countries and during the same time interval as in Fig. 2. In the beginning of the pandemic in each country, the doubling time exhibits quite large fluctuations due to the low absolute numbers. After a few weeks, when the numbers rise, it tends to stabilize into a smoother course.

Articles that have applied the doubling time in the context of the COVID-19 pandemic include [42, 43].

FIG. 3: Doubling times of the number of deaths for the same three countries and the same time interval used in Fig. 2. All three curves show quite large fluctuations early on but are much smoother after that. At the end of March the doubling time was longest for Italy, followed by Spain and the US. However, they were all trending towards larger values, corresponding to a slowing down of the exponential growth in numbers of deaths.
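Eqs. (1) and (2) translate directly into code. The Python sketch below computes the daily doubling times from a cleaned cumulative series. Two conventions are our assumptions. First, days before the first death (d(t−1) = 0), where Eq. (1) is undefined, yield NaN. Second, days without any increase, where Eq. (2) would diverge, are assigned a fixed high value, mirroring the arbitrary value of 1000 used for Fig. 6.

```python
import math
from typing import List, Sequence


def doubling_times(cumulative: Sequence[float], no_growth_value: float = 1000.0) -> List[float]:
    """Doubling time T_d(t) in days for t = 1, ..., len(cumulative) - 1."""
    result = []
    for t in range(1, len(cumulative)):
        previous, current = cumulative[t - 1], cumulative[t]
        if previous <= 0:
            result.append(math.nan)  # growth rate undefined before the first count
            continue
        p = (current - previous) / previous  # Eq. (1)
        if p <= 0:
            result.append(no_growth_value)  # no increase: T_d would be infinite
        else:
            result.append(math.log(2) / math.log1p(p))  # Eq. (2); log1p(p) = ln(1 + p)
    return result


if __name__ == "__main__":
    # 10 % daily growth gives a doubling time of ln 2 / ln 1.1, about 7.27 days
    print([round(x, 2) for x in doubling_times([100, 110, 121])])  # [7.27, 7.27]
```

Using `math.log1p` instead of `math.log(1 + p)` keeps the result accurate when p is small. That is exactly the regime of long doubling times reached later in the first wave.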
3. Overview plots
In the next step we combine these two time-local quantities, time lag and doubling time, in one large overview graph. This graph allows for an easy comparison of the current state of the epidemic in many different countries. In Fig. 4 we plot the doubling time for deaths versus the time lag with respect to Italy for ten countries (including the ones from Figs. 2 and 3). The countries compared with Italy were selected as follows. Four of them (Russia, South Korea, China and Spain) were at that moment in time closest to one of the four corners of the plot. The remaining five countries were at different stages of a rather typical trajectory of a country on this graph [44].

In this plot the brightness of the background indicates the order of preference of the corners, from bright (most preferable, corner B) to dark (least preferable, corner D): B, A, C, D. However, our description begins with the typical starting point A:

A – Large relative time lag, low doubling time. Basically all countries start on the lower left of the graph. There it is still early days: presumably the virus started to spread within the country not that long ago, so the time lag to the epicenter is typically quite large. How large depends on how long it took for the virus to reach the country: the later a country enters the plot, the larger the time lag. The initial doubling times are rather low, so the spread advances quickly. Still, the numbers are such that it might appear as if there is not yet much to worry about. However, this is actually the time when measures should be taken as soon as possible in order to make a big difference later. At the end of March 2020 the country closest to corner A was Russia, which had just entered the plot with its first registered deaths.

B – Large relative time lag, high doubling time. A country close to corner B has basically contained the virus in its earliest days. Its doubling times are so high that one can hardly speak of an epidemic. On this day no country had really gotten there yet, but among the countries selected here South Korea was slowly getting closer.

C – Relative time lag close to zero, high doubling time. This means that for the moment the worst is over, but also that it was very bad. Eventually all countries tend to move up towards larger doubling times, but of course it is much better to do so earlier rather than later. On March 31, 2020, China was the country closest to this corner. However, since fewer and fewer new deaths were reported, it was actually moving towards corner B.

D – Relative time lag close to zero, low doubling time. This is the situation to avoid at all costs (literally). Here countries are already right in the middle of an epidemic, but the doubling times are still very low. This can be very bad because of the characteristics of unabated epidemic spread. At A it might take a few days to double the number of cases from 100 to 200. At D it would take exactly the same time to double from 10,000 to 20,000 or even from 100,000 to 200,000 (depending on the overall stage of the epidemic). At the end of March 2020 the country closest to this situation was Spain (apart from the reference country Italy), and it was in fact well on its way to catching up with Italy.

The remaining countries were at that point in time positioned somewhere between A and C. Like Spain, the US and the UK were moving closer to Italy. France was basically time-locked with Italy, which means that its death curve was following Italy's with a constant time lag. By contrast, Germany and Canada were moving further away from Italy.

In Fig. 4 we show the position of all the countries in this two-dimensional plot at the end of March 2020. To each country we also append a tail that depicts the development over the previous seven days. The direction of the movement over that week is captured in the trend, which is the very last entry in the legend:

A - towards lower doubling times but larger time lags with respect to Italy
← - no change in doubling time but towards larger time lags
B - towards higher doubling times and larger time lags (best possible course)
↑ - towards higher doubling times, no change in time lag
C - towards higher doubling times but shorter time lags
→ - no change in doubling time but towards shorter time lags
D - towards lower doubling times and shorter time lags (worst possible course)
↓ - towards lower doubling times but no change in time lag
• - no change in either direction

Over this week, the doubling time of most of these countries had increased; the exceptions were Russia and Canada.

FIG. 4: Doubling time versus time lag with respect to Italy for 10 different countries. Large markers indicate the position on March 31, 2020; the tail shows the development over the previous seven days. The four corners are marked by the letters A to D, and the background is shaded to indicate more or less preferable regions (from bright to dark). The legend states the overall number of deaths until the end of March 2020 as well as the time lag, doubling time and trend over the last seven days.

FIG. 5: Number of reported new deaths per day for the ten countries from Fig. 4 and for the year from January 22, 2020 to January 21, 2021. Each row represents one country. Colors are normalized from 0 (white) to 1 (black) by the maximum number of daily deaths of that country over the whole interval. The last row depicts the sum over all these countries together. Local maxima are marked by white bullet points and the ordinal number of the wave peak. On the left we indicate the absolute number of deaths on the last day. On the right we indicate the trend over the last week ('+', '-', 'o' for upwards, downwards, constant, respectively). Finally, the histogram on the right shows for each country the number of deaths per one million inhabitants over the whole year. To convey the relative overall impact of the epidemic in different countries, we use this number as the criterion for sorting the rows.
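The nine trend symbols listed for Fig. 4 depend only on the directions of the seven-day changes in doubling time and in the signed time lag. A decrease of the signed lag moves a country to the left in Fig. 4, i.e., towards larger lags behind Italy. A minimal Python sketch of this lookup follows. The tolerances below which a change counts as "no change" are hypothetical parameters, since the text does not specify them.

```python
from typing import Dict, Tuple


def direction(change: float, tolerance: float) -> int:
    """+1 for an increase, -1 for a decrease, 0 if |change| <= tolerance."""
    if change > tolerance:
        return 1
    if change < -tolerance:
        return -1
    return 0


# (direction of doubling-time change, direction of signed time-lag change) -> symbol
TREND_SYMBOLS: Dict[Tuple[int, int], str] = {
    (-1, -1): "A",  # lower doubling times, larger lags (moving down and left)
    (0, -1): "←",
    (1, -1): "B",   # best possible course
    (1, 0): "↑",
    (1, 1): "C",    # higher doubling times, shorter lags (moving up and right)
    (0, 1): "→",
    (-1, 1): "D",   # worst possible course
    (-1, 0): "↓",
    (0, 0): "•",
}


def trend(doubling_change: float, lag_change: float,
          doubling_tol: float = 0.0, lag_tol: float = 0.0) -> str:
    """Trend symbol for the movement over the last seven days in Fig. 4."""
    key = (direction(doubling_change, doubling_tol), direction(lag_change, lag_tol))
    return TREND_SYMBOLS[key]


if __name__ == "__main__":
    print(trend(+1.5, -2))  # B: doubling time up, country falling further behind
```

A lookup table keeps the mapping between the legend symbols and the two directions explicit. It also makes it easy to check that every one of the nine combinations is covered exactly once.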
Regarding the time lag, there were three groups. For the countries on the right (Spain, the US and the UK) it had increased. For the countries on the left (Germany, South Korea, Canada and Russia) it had decreased. For the two countries in the middle (Italy and France) it had remained constant. Accordingly, the trends were dominated overall by B, ↑ and C, with only two countries on a downward trend towards A.

B. Later stages of a pandemic
The plots for the earlier stages are designed to provide a comparative overview of the initial rise. In the later stages the focus shifts to monitoring the course of the pandemic in each country in terms of peaks, valleys and plateaus, in order to be able to react accordingly [34]. The transition between these two stages takes place around the time the daily numbers of cases and deaths stop rising for the first time [35].

The left side of Fig. 5 shows a 2D color plot of the number of new deaths per day for the same ten countries already depicted in Fig. 4, for one year starting on January 22, 2020. Each row is normalized individually in order to provide an overview of the course of the epidemic for each country separately. This means that for each country the color scale ranges from zero daily deaths (white) to the maximum daily number of deaths over the whole interval (black). As a consequence, the course of the pandemic can be compared in the same plot even for countries whose numbers differ by orders of magnitude. It also becomes immediately apparent whether a country is already past its peak and whether there are new waves. The brighter the color on the last day, the further away a given country is from its peak value.

We also use the color plot as a wave detector by identifying for each country all the local maxima that fulfill the following two criteria:

- Minimum prominence P_min: The prominence P is defined as the smaller of the largest decreases in value on the two sides of the local maximum before encountering a higher local maximum. For any given sequence of daily increases D, the largest possible prominence is P_max = max(D) − min(D).
- Minimum separation S_min between consecutive local maxima: The separation S is defined in units of sample points (here days). For a given S > 0, we select the largest local maximum and ignore all other local maxima within S units of it. This process is repeated until no more local maxima are detected.

There is no unique and unambiguous definition of a wave peak, so varying these two parameter values will lead to different detections. Here we set the minimum prominence to P_min = 0.… and the minimum separation to S_min = 10 days. This selection eliminates all minor bumps and keeps only the large-scale peaks that appear to be significant.

We also added trend indicators right next to the value of the current day. They show whether the number from the last day is more than 5% higher than it was the week before ('+', upward trend), within 5% of that value ('o', plateau), or more than 5% lower ('-', downward trend).

The individual normalization used in the color plot facilitates tracking the course of the pandemic within each country. It also allows one to infer relative time lags of peaks and valleys between different countries. However, it does not provide any information about the overall severity of the situation in each country. To rectify this, we add a histogram (right) with the overall numbers for every country normalized by population size.

Finally, depending on which information should be stressed, countries can be sorted in various ways:

- The value of the histogram on the right-hand side sorts countries by the overall severity of the situation (e.g., deaths per one million inhabitants). This sorting was used in Fig. 5 and will also be used in Figs. 7a and 7b for worldwide deaths and cases, respectively.
- The normalized number of new deaths/cases on the last day allows a comparison of the current state of the pandemic with the peak value of each country. Which countries are currently at their absolute peak, and for which countries is the worst behind them? This sorting will be used in Fig. 8 to compare the situation for all the US states.
- The occurrence of the first death/case, or the position of the peak of the first wave, provides information about the gradual or sudden spatio-temporal propagation of the pandemic. Where did the pandemic start and where did it arrive last? In Fig. 9 we will use the sorting based on the first reported deaths to trace the initial spread of the virus in the Italian regions.
- The similarity of the daily new deaths/cases profiles. We use a straightforward combination of correlation coefficient analysis and the single linkage algorithm to cluster countries according to the similarity of their temporal profiles. The resulting hierarchical dendrogram yields an order that starts with the countries that are most similar to each other. It ends with those that are least similar to any of the other countries. This sorting will be used in Fig. 10 for the worldwide data.

IV. RESULTS

A. Early stages of a pandemic
In the early stages, cases and deaths are still on their first rise, and typically there is an early epicenter. This epicenter is a very useful reference against which to compare the state of the pandemic in any given country. The most important indicator of this state is the doubling time. Thus, the two relevant quantities are the time lag with respect to that epicenter (here, Italy) and the doubling time, and in Fig. 6 we plot one against the other.

First, in Fig. 6a we look at the numbers of deaths for all 42 countries on April 30, 2020. This was around the time when the first worldwide peak in deaths had just passed [45]. Fig. 6a is accompanied by Supplementary Movie 1, which contains the development from the day Italy reported its second death up to the end of April. The final frame of the movie corresponds to Fig. 6a.

At this moment in time countries could basically be divided into three different groups. The most severe situation was found in the US, which had already surpassed Italy as the front runner of the pandemic. Even at that advanced stage it had a rather low doubling time of 20 days, and it was thus continuing its course towards higher positive time lags. The other members of that group were the European countries Italy, the UK, Spain and France. They were all more or less phase-locked with Italy, i.e., their time lags remained quite constant.

The second and by far largest group of countries was showing a trend towards increasingly negative time lags and higher doubling times, thus getting closer to corner B (compare the last entries in the legend). Typically this meant that for those countries the initial rise was already flattening considerably. This group included Iran, Germany and Belgium, which were all closest to Italy but still moving away.

The third and last group consisted of Taiwan, Iceland, and China, which had all basically stopped reporting any new deaths. At least for that specific moment in time, these three countries had brought the pandemic under control.
Fig. 6b depicts the same kind of plot as Fig. 6a, but now for cases instead of deaths. As before, Supplementary Movie 2 shows the whole history of this plot from the beginning of February (first reported cases in Italy) to the end of April. Overall, both the groupings and the trends for cases are very similar to the ones seen for deaths. One notable difference is that for cases the separation between the three groups is much less pronounced.
FIG. 6: Doubling time versus time lag with respect to Italy as in Fig. 4, but now for 42 different countries and one month later, on April 30, 2020. For clarity, only the seven-day tails of the eight countries most advanced in the pandemic (in terms of time lag) are shown; however, the seven-day trends for all countries can be found in the legend. In deaths (a), the US had at this point in time surpassed Italy as the worldwide leader of the pandemic, with the UK, Spain and France slightly behind. For cases (b), not only the US but also Spain had surpassed Italy, and France, the UK, and Germany followed soon after. With very few exceptions, almost all doubling times for both deaths and cases were trending upwards. For the three countries (Taiwan, Iceland, and China) for which the pandemic had basically come to a halt (daily increments of zero), the doubling time was set to an arbitrary high value of 1000. In both plots, color coding follows the ranking with respect to deaths.

B. Later stages of a pandemic
Once the initial rising phase of a pandemic has passed, it becomes more important to monitor the rise and fall of deaths and cases via the characteristic peaks, valleys and plateaus in the profile of each country. Accordingly, we continue with Fig. 7a. Like Fig. 5, it shows the number of reported daily new deaths from the beginning of the dataset up to a year later, but it does so for all 42 countries. As before, Fig. 7a is the last frame of Supplementary Movie 3, which tracks the development of the number of deaths over the whole year.

After this one year, the countries hit the hardest were predominantly in Europe, but countries like the US, Peru and Mexico had also suffered very high numbers of deaths. On the other hand, many countries in Asia and Oceania had fared quite well. Among these countries were Taiwan, South Korea and Japan as well as New Zealand and Australia.

At this specific point in time the countries were almost equally divided among upward, constant and downward trends, with just a few more countries trending downwards than upwards. So most countries were either plateauing at an unprecedentedly high level or had just started to decrease slightly (for confirmation, see the data for all countries together in the last row). The highest values relative to each country's own maximum were obtained on that day for the US and Mexico. By contrast, a few countries (Taiwan, New Zealand, Singapore, Australia and Iceland) were reporting no deaths at all.
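The wave detection described in Section III B (minimum prominence and minimum separation) can be sketched in plain Python as below. This is an illustration under our own assumptions, not the implementation behind Figs. 5 and 7. The prominence is computed in the usual way: the height of the peak above the higher of the two lowest points reached on either side before a higher value or the end of the series. The prominence filter is applied before the separation filter, and the prominence threshold is expressed as a fraction of P_max. The parameter values in the usage example are purely illustrative.

```python
from typing import List, Sequence


def local_maxima(x: Sequence[float]) -> List[int]:
    """Interior local maxima; for a flat top, the first index of the plateau."""
    peaks = []
    i = 1
    while i < len(x) - 1:
        if x[i] > x[i - 1]:
            j = i
            while j + 1 < len(x) and x[j + 1] == x[i]:  # walk across a flat top
                j += 1
            if j + 1 < len(x) and x[j + 1] < x[i]:
                peaks.append(i)
            i = j + 1
        else:
            i += 1
    return peaks


def prominence(x: Sequence[float], i: int) -> float:
    """Height of peak i above the higher of its two side minima.

    On each side we move away from the peak until a strictly higher value
    or the end of the series is reached, tracking the lowest value seen.
    """
    left_min = right_min = x[i]
    k = i - 1
    while k >= 0 and x[k] <= x[i]:
        left_min = min(left_min, x[k])
        k -= 1
    k = i + 1
    while k < len(x) and x[k] <= x[i]:
        right_min = min(right_min, x[k])
        k += 1
    return x[i] - max(left_min, right_min)


def detect_waves(x: Sequence[float], min_prominence_frac: float, min_separation: int) -> List[int]:
    """Indices of wave peaks that pass both criteria, in chronological order."""
    p_max = max(x) - min(x)  # largest possible prominence
    candidates = [i for i in local_maxima(x)
                  if prominence(x, i) >= min_prominence_frac * p_max]
    kept: List[int] = []
    # Greedily keep the tallest remaining peak and discard neighbors closer than S_min.
    for i in sorted(candidates, key=lambda k: x[k], reverse=True):
        if all(abs(i - k) >= min_separation for k in kept):
            kept.append(i)
    return sorted(kept)


if __name__ == "__main__":
    daily = [0, 1, 5, 1, 0, 0, 2, 0, 0, 0, 0, 0, 4, 1, 0]
    print(detect_waves(daily, min_prominence_frac=0.3, min_separation=10))  # [2, 12]
```

In the toy series the small bump at day 6 passes the prominence filter but lies within ten days of the larger peak at day 2, so only two waves remain. With `min_separation=4` it would be kept as a third wave.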
Fig. 7b shows the reported number of daily cases for the same countries and the same time interval. The overall situation was quite similar to the one reported for deaths (Fig. 7a). Here, too, half the countries were peaking or close to peaking. Again, only a handful of countries (Taiwan, China, New Zealand, Australia, and Iceland) reported almost no new cases at all. Fig. 7b corresponds to the last frame of the final Supplementary Movie 4.

When comparing the relative positions of the countries in the two graphs for deaths (Fig. 7a) and cases (Fig. 7b), we find a rather high correlation coefficient of 0.…

In Fig. 8 we visualize the data of the US states (plus the country as a whole), again over the same first year of data availability. In this graph we sort the states according to the number of deaths on the very last day (in this case January 21, 2021), normalized to the maximum value obtained for each state so far. This gives a good idea of the relative severity of the situation on that day, since it sorts states according to how close they are to their absolute peak. The states that are currently peaking can be found at the top, whereas the states that have passed their peak(s) appear at the bottom.

On this very day, mostly southern states like South Carolina, Oklahoma, Kentucky, Texas, and Georgia were still peaking in deaths. The state furthest away from its own past peak was New York (followed by Colorado and Nebraska). On the other hand, as the histogram on the right shows, the highest overall numbers of deaths had been recorded in north-eastern states such as New Jersey, New York, Massachusetts and Rhode Island.
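The per-row normalization, the last-day sorting used for Fig. 8 and the ±5% trend indicator can be sketched as follows. The sketch makes three assumptions: each region's daily series is already smoothed, "the week before" means exactly seven days earlier, and a series that is zero everywhere stays zero after normalization. The function names and the toy data are hypothetical.

```python
from typing import Dict, List, Sequence


def normalize_rows(series: Dict[str, Sequence[float]]) -> Dict[str, List[float]]:
    """Scale each row to [0, 1] by its own maximum (all-zero rows stay zero)."""
    normalized = {}
    for name, values in series.items():
        peak = max(values)
        normalized[name] = [v / peak if peak > 0 else 0.0 for v in values]
    return normalized


def sort_by_last_day(series: Dict[str, Sequence[float]]) -> List[str]:
    """Rows currently closest to their own peak come first (top of the plot)."""
    normalized = normalize_rows(series)
    return sorted(normalized, key=lambda name: normalized[name][-1], reverse=True)


def trend_symbol(daily: Sequence[float], tolerance: float = 0.05) -> str:
    """'+', 'o' or '-' comparing the last day with the value seven days earlier."""
    last, week_before = daily[-1], daily[-8]
    if last > (1 + tolerance) * week_before:
        return "+"
    if last < (1 - tolerance) * week_before:
        return "-"
    return "o"


if __name__ == "__main__":
    toy = {
        "X": [2, 3, 4, 5, 6, 7, 8, 10],  # still rising: at its peak on the last day
        "Y": [10, 8, 6, 5, 4, 3, 2, 1],  # well past its peak
        "Z": [5, 5, 6, 8, 6, 5, 5, 5],   # plateau after a small bump
    }
    for name in sort_by_last_day(toy):
        print(name, trend_symbol(toy[name]))
    # X +
    # Z o
    # Y -
```

Because each row is divided by its own maximum, the sort key measures closeness to a state's own peak rather than absolute severity. The population-normalized histogram covers absolute severity separately.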
Fig. 9 depicts the regions of Italy sorted by the timeof their first reported death. It started with Lombardyin the North which became the early epicenter and thengradually reached the whole country with Basilicata be-ing the last region to report a death.This plot also shows quite nicely how the severity ofthe situation in each region can vary for different waves.Marche and Lombardia, two of the regions that werehit early and hard during the first wave (in fact wereamong the first places in Europe to report Corona-relateddeaths), were relatively speaking spared during the thesecond wave of the next winter, 2020/2021. On the otherhand, regions such as Basilicata, Molise and Toscana hadmuch worse trajectories the second time around.Finally, in
Fig. 10 we return to the worldwide data and combine a correlation coefficient analysis (Fig. 10a) with the single linkage algorithm to arrange the countries in a hierarchical cluster tree (dendrogram, Fig. 10b) according to the similarity of their daily new death profiles (from Fig. 7a). Note that important information is lost during this mapping from the pairwise distance matrix to a one-dimensional order: vicinity in the ordered list does not correspond to a direct measure of distance. However, what does hold is that the countries that are most similar to each other appear on the left and the countries with the most unique profiles can be found on the right (as reflected by the monotonic increase of the distances from left to right).
Using the resulting order from the dendrogram, we obtain a new wave plot (Fig. 10c). At the top we now find the countries closest to each other, which seem to be those with a rather weak first wave (spring 2020) but a very strong second wave (winter 2020/21). The next group contains the countries with two rather strong waves. Both of these groups are predominantly European and North American. They are followed by countries with a wave in summer 2020. This might be due to a later arrival of the virus, but many of these countries are also located in the Southern hemisphere, which points to an explanation in terms of anti-phase seasonal variations between the two hemispheres [46]. The last group includes those countries that, apart from a few sporadic eruptions, had brought the pandemic mostly under control, and ends with China with its very
FIG. 7: Number of reported new deaths (a) and cases (b) per day for the year from January 22, 2020 to January 21, 2021 for all 42 selected countries. Layout as in Fig. 5. Countries are again sorted by the number of deaths per one million inhabitants over the whole year (histogram on the right). At this moment in time more than half of the countries were peaking or close to peaking, typically with either their second or their third peak.
FIG. 8: Number of reported new deaths per day for all the US states during the year from January 22, 2020 to January 21, 2021. Layout as in Fig. 5 but, in contrast to the previous wave detection plots (Figs. 5 and 7), this one is sorted according to the (normalized) value obtained for the very last day of the plot. This provides information about the relative severity of the situation in each state at this point in time, while the histogram on the right still shows the overall severity up to that day.
FIG. 9: Number of reported new deaths per day for the Italian regions during the same year as before. Layout as in Fig. 5, with the only difference that in this wave detection plot the regions are sorted according to the time of the first reported death (from bottom to top). This allows tracing the course of the pandemic in Italy from early epicenters such as Lombardy and Campania to regions like Basilicata that were reached only much later.
FIG. 10: Similarity of countries in terms of daily new deaths. (a) Pairwise correlation coefficient matrix calculated from the temporal profiles as depicted in Fig. 7a. (b) Hierarchical cluster tree (dendrogram) obtained from (a) via the single linkage clustering algorithm. Countries that are most similar to each other (lowest distances between them) are on the left; countries that are more unique (higher distances) are on the right. (c) Same data as in the wave plot of Fig. 7a but with the countries sorted according to the similarity criterion. For clarity the same sorting was already used in (a). Note that the three colors in (b) and the corresponding separating lines in (a) and (c) are just visual aids; they are not based on any strict criteria.
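The clustering step behind Fig. 10 can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the author's implementation. Two assumptions are made. First, the distance between two countries is taken to be 1 - r, where r is the Pearson correlation coefficient of their daily-death profiles. Second, the leaves are laid out by a hypothetical rule: at each merge, the larger cluster goes to the left, which tends to push late-joining, more unique profiles to the right. All function names and the toy profiles are illustrative.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a flat profile is treated as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(profiles):
    """Pairwise correlation matrix (analogous to panel (a))."""
    return [[pearson(p, q) for q in profiles] for p in profiles]


def single_linkage_order(dist):
    """Single-linkage agglomeration on a distance matrix.

    Returns the leaf order (list of row indices) and the merge heights.
    Layout rule (an assumption): the larger of the two merging clusters is
    placed on the left, so profiles that join late end up on the right.
    """
    clusters = {i: [i] for i in range(len(dist))}  # id -> leaves, left to right
    heights = []
    next_id = len(dist)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for k, a in enumerate(ids):
            for b in ids[k + 1:]:
                # Single linkage: closest pair of members across the two clusters.
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        left, right = (a, b) if len(clusters[a]) >= len(clusters[b]) else (b, a)
        clusters[next_id] = clusters.pop(left) + clusters.pop(right)
        heights.append(d)
        next_id += 1
    (order,) = clusters.values()
    return order, heights


# Toy daily-death profiles (made up): 0 and 2 are perfectly correlated,
# 1 has almost the same shape, 3 is the mirror image.
profiles = [[0, 1, 2, 3, 4], [0, 1, 2, 3, 5], [0, 2, 4, 6, 8], [4, 3, 2, 1, 0]]
dist = [[1 - r for r in row] for row in correlation_matrix(profiles)]
order, heights = single_linkage_order(dist)
print(order)  # [0, 2, 1, 3]
```

The explicit loop exposes the single linkage rule: the distance between two clusters is the smallest distance between any of their members. In practice one would more likely call a library routine, such as SciPy's scipy.cluster.hierarchy.linkage with method='single'. The naive loop scales poorly but is sufficient for a few dozen countries.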
V. DISCUSSION
We presented visualizations of epidemiological data (such as the number of deaths and cases) that take into account many countries (states, regions, etc.) at the same time and are tailored to the specific stage of the epidemic. During the initial rising phase we focus on the time lag with respect to the current epicenter and on the doubling time, since this combination can inform about the timeliness and the urgency of the interventions needed to curb the spread. In contrast, during the later stages we monitor the state of the pandemic by following the wavelike profile of daily new death or case numbers with regard to peaks, valleys and plateaus. This, together with other epidemiological quantities such as the attack rate [47] or the basic reproduction number [48–50], can help to decide on appropriate countermeasures, i.e., whether to impose, tighten or relax contact restrictions in the population [51, 52].
The visualizations used here are universal and can easily be applied to other kinds of epidemiological data. The spatial scale is flexible as well: while we focused on countries, states and regions, the same plots would also work for smaller areas. Moreover, in the two-dimensional plots designed for the initial stages of an epidemic (Figs. 4 and 6) we use a country of reference, namely Italy, the early European epicenter of the pandemic [40, 41]. However, this is certainly a matter of choice. Different countries could be chosen, e.g., in order to test different hypotheses. Or you could select your own country, in which case the plot basically shows the pandemic from your country's point of view and provides more detailed information about its relative position in the pandemic.
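The two quantities used during the rising phase can be illustrated with a minimal sketch. It assumes cumulative counts sampled once per day, aligned on the same calendar days for both countries. The definitions below are assumptions made for illustration, not necessarily the paper's exact procedure. The doubling time on a given day is the number of days since the cumulative count was last at most half of its current value. The time lag is the number of days since the reference country (Italy in Figs. 4 and 6) first reached the country's current cumulative count. Function names and the toy series are hypothetical.

```python
def doubling_time(cumulative, day):
    """Days it took the cumulative count to double up to `day`.

    Walks back from `day` to the most recent day whose cumulative count was
    at most half of the count on `day`. Returns None if no such day exists
    or the count is still zero.
    """
    current = cumulative[day]
    if current <= 0:
        return None
    for past in range(day, -1, -1):
        if cumulative[past] <= current / 2:
            return day - past
    return None


def time_lag(cumulative, reference, day):
    """Days the country trails the reference country on `day`.

    Finds the first day on which the reference reached the country's current
    cumulative count. Positive values mean the country is behind, negative
    values mean it reached this count before the reference did; None means
    the reference never reached that count within the available data.
    """
    level = cumulative[day]
    if level <= 0:
        return None
    for ref_day, ref_count in enumerate(reference):
        if ref_count >= level:
            return day - ref_day
    return None


# Toy cumulative death counts (made up), aligned on the same calendar days.
reference = [1, 2, 4, 8, 16, 32, 64]  # hypothetical epicenter, doubling daily
country = [0, 0, 0, 1, 2, 4, 8]
print(doubling_time(country, 6), time_lag(country, reference, 6))  # 1 3
```

Evaluated for every country on a given day, the two functions yield one (time lag, doubling time) point per country. With daily sampling both values are whole days, and a finer estimate could interpolate between days.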
Finally, our methodology based on time lags and doubling times was only applied to the rising phase of the first wave (where it is indeed most useful), but in cases where the individual waves are sufficiently separated (e.g., due to seasonal variations [46]) it would also be possible to look at second, third or even later waves.
The color-coded wave detection plots used during the later stages can easily be modified to be sensitive to several other traits in the data. Here we used four different sorting criteria (Figs. 7, 8, 9 and 10c), but the normalization of the color scale could also be altered to stress certain other aspects of the data. Similarly, one could adapt the wave detection parameters in order to focus on specific time scales and resolutions.
Note that the current study itself is not concerned with drawing specific conclusions from the data; rather, we focus on efficient and informative ways of presenting the data, so as to be in a better position to actually draw specific conclusions. We restrict ourselves to examinations of the past up to the present day, and there is no extrapolation into the future based on any kind of model, assumption or parameter selection. This way we avoid any pitfalls caused by potential deficiencies in either the completeness or the accuracy of the data [28]. However, given reliable data, these visualizations could be used, e.g., to correlate the data with different containment strategies [43], or to perform other, more extended analyses [29, 53].
We would like to close with the following appeal: Please always be aware of the tragic real-life consequences behind these numbers and do whatever you can to keep them low or bring them down again.
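The role of the wave detection parameters in setting the time scale can be illustrated with a minimal sketch of one possible peak detector. It is not the wave detection procedure used for the figures, which is not described in this section. The detector smooths the daily counts with a centered moving average. It accepts a day as a peak if that day is the maximum within a window set by a hypothetical parameter min_separation and exceeds a fraction min_height of the overall maximum. All names and default values are assumptions.

```python
def moving_average(daily, window):
    """Centered moving average; the window is truncated at the series edges."""
    half = window // 2
    smooth = []
    for i in range(len(daily)):
        lo, hi = max(0, i - half), min(len(daily), i + half + 1)
        smooth.append(sum(daily[lo:hi]) / (hi - lo))
    return smooth


def detect_peaks(daily, window=7, min_separation=14, min_height=0.1):
    """Indices of wave peaks in a daily series (hypothetical sketch).

    window         -- smoothing width in days (e.g. to damp weekly reporting cycles)
    min_separation -- a peak must be the largest smoothed value within +/- this many days
    min_height     -- a peak must reach this fraction of the overall smoothed maximum
    """
    if not daily:
        return []
    smooth = moving_average(daily, window)
    top = max(smooth)
    peaks = []
    for i, value in enumerate(smooth):
        lo, hi = max(0, i - min_separation), min(len(smooth), i + min_separation + 1)
        is_local_max = value == max(smooth[lo:hi])
        # Skip the later days of a flat top that was already recorded.
        too_close = bool(peaks) and i - peaks[-1] <= min_separation
        if value > 0 and is_local_max and value >= min_height * top and not too_close:
            peaks.append(i)
    return peaks


# Toy series (made up): a small wave peaking on day 10, a larger one on day 40.
toy = [max(0, 10 - abs(i - 10)) + max(0, 2 * (10 - abs(i - 40))) for i in range(51)]
print(detect_peaks(toy))                     # [10, 40]
print(detect_peaks(toy, min_separation=40))  # [40]
```

Widening min_separation from 14 to 40 days merges the two toy waves into a single detected peak. This shows how the choice of parameters determines which time scale the plot is sensitive to.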
VI. SOURCE CODES AND OUTREACH
Matlab source codes will be available at this webpage:
Since the worsening of the pandemic in Italy (March 2020), regular updates based on the data analysis methods described in this paper have been posted on this Facebook page:
This will continue as long as the pandemic causes significant damage all over the world. Regular updates show data for the same 42 selected countries used here. On demand, the data from the regions/provinces of some individual countries with publicly available data (specifically the US, Italy, Germany, Spain, the UK, and Canada) are displayed as well. For now this is mostly the US, the only major country with continuously elevated numbers and strong regional diversity.
VII. SUPPLEMENTARY MOVIES
Supplementary Movie 1 (https://youtu.be/2BMZw5PHmK0): The first supplementary movie depicts the doubling time for deaths versus the time lag with respect to Italy for all 42 countries from February 21, 2020 (the day Italy reported its second death) to April 30, 2020. The layout is identical to the one used in Fig. 6a; in fact, the movie ends with Fig. 6a, the state of the pandemic at the end of April 2020. Over the course of time most countries tend to move slowly towards corner B (larger time lags with respect to Italy, higher doubling times). But there are a few exceptions, notably Spain, France, the UK, and in particular the US, which during April can be seen to slowly overtake Italy.
Supplementary Movie 2 (https://youtu.be/vAsSQfWvJwQ): The second supplementary movie is similar to the first but depicts cases instead of deaths. It runs from February 52, 2020 (the day Italy reported its second case) to April 30, 2020. Its last frame is identical to Fig. 6b.
Supplementary Movie 3 (https://youtu.be/UC13sP7gHeU): The third supplementary movie shows the course of the pandemic for all 42 countries over one year, from January 22, 2020 to January 21, 2021, in terms of the number of reported new deaths per day. The layout is identical to the one used in Fig. 7a. Countries are again sorted by the relative number of deaths over the whole year (histogram on the right), which makes it easier to follow the overall impact of the epidemic in different regions. During the initial stages China, Italy, South Korea, Iran and Spain were the epicenters of the pandemic; at the later stages Belgium, Czechia, Peru, the UK and the US were among the countries that exhibited the highest relative numbers of deaths.
Supplementary Movie 4 (https://youtu.be/WCyhIEQJc7Q): The fourth supplementary movie is similar to the third but depicts cases instead of deaths for all 42 countries over one year. Its last frame corresponds to Fig. 7b.
This movie shows even more clearly how the pandemic spread from China and its neighboring countries all over the world.
Further movies (various combinations of deaths/cases with/without normalization and different sorting orders for the 42 selected countries, the US states, and the Italian regions) can be found on the tk.corona.updates YouTube channel:
The first four movies on the channel correspond to Supplementary Movies 1-4 of this paper.
Acknowledgments
T.K. would like to thank Alban Levy for useful discussions, his thorough reading of the manuscript, and his very early efforts in raising awareness about the upcoming pandemic. T.K. would also like to thank Ralph G. Andrzejak, Don MacLeod, Kaare Bjarke Mikkelsen, and Sabine Raphael for feedback on the data analysis, and Benedetta Moschitta for many inspiring discussions.
[1] C. Huang, Y. Wang, X. Li, L. Ren, J. Zhao, Y. Hu, L. Zhang, G. Fan, J. Xu, X. Gu, et al., The Lancet, 497 (2020).
[2] World Health Organization: Coronavirus disease 2019 (COVID-19) Situation Report 11, January 31, 2020 (2020).
[3] N. Zhu, D. Zhang, W. Wang, X. Li, B. Yang, J. Song, X. Zhao, B. Huang, W. Shi, R. Lu, et al., New England Journal of Medicine, 727 (2020).
[4] W.-J. Guan, Z.-y. Ni, Y. Hu, W.-H. Liang, C.-q. Ou, J.-X. He, L. Liu, H. Shan, C.-L. Lei, D. S. Hui, et al., New England Journal of Medicine, 1708 (2020).
[5] F. Zhou, T. Yu, R. Du, G. Fan, Y. Liu, Z. Liu, J. Xiang, Y. Wang, B. Song, X. Gu, et al., The Lancet, 1054 (2020).
[6] World Health Organization: Coronavirus disease 2019 (COVID-19) Situation Report 40, February 29, 2020 (2020).
[7] World Health Organization: Coronavirus disease 2019 (COVID-19) Situation Report 51, March 11, 2020 (2020).
[8] World Health Organization: Coronavirus disease 2019 (COVID-19) Situation Report 60, March 20, 2020 (2020).
[9] S. Shamasunder, S. M. Holmes, T. Goronga, H. Carrasco, E. Katz, R. Frankfurter, and S. Keshavjee, Global Public Health, 1083 (2020).
[10] W. McKibbin and R. Fernando, Economics in the Time of COVID-19 (2020).
[11] D. Zhang, M. Hu, and Q. Ji, Finance Research Letters, 101528 (2020).
[12] A. Palayew, O. Norgaard, K. Safreed-Harmon, T. H. Andersen, L. N. Rasmussen, and J. V. Lazarus, Nature Human Behaviour, 666 (2020).
[13] K. R. Myers, W. Y. Tham, Y. Yin, N. Cohodes, J. G. Thursby, M. C. Thursby, P. Schiffer, J. T. Walsh, K. R. Lakhani, and D. Wang, Nature Human Behaviour, 880 (2020).
[14] G. Marinoni, H. Van’t Land, and T. Jensen, IAU Global Survey Report (2020).
[15] M. Škare, D. R. Soriano, and M. Porada-Rochoń, Technological Forecasting and Social Change, 120469 (2020).
[16] B. Garcia-Garcia, M. James, D. Koller, J. Lindholm, D. Mavromati, R. Parrish, and R. Rodenberg, Int Sports Law J, 115 (2020).
[17] S. Dubey, P. Biswas, R. Ghosh, S. Chatterjee, M. J. Dubey, S. Chatterjee, D. Lahiri, and C. J. Lavie, Diabetes & Metabolic Syndrome: Clinical Research & Reviews, 779 (2020).
[18] E. Dong, H. Du, and L. Gardner, The Lancet Infectious Diseases, 533 (2020).
[19] D. Fanelli and F. Piazza, Chaos, Solitons & Fractals, 109761 (2020).
[20] S. Flaxman, S. Mishra, A. Gandy, H. J. T. Unwin, T. A. Mellan, H. Coupland, C. Whittaker, H. Zhu, T. Berah, J. W. Eaton, et al., Nature, 257 (2020).
[21] J. Dehning, J. Zierenberg, F. P. Spitzner, M. Wibral, J. P. Neto, M. Wilczek, and V. Priesemann, Science, eabb9789 (2020).
[22] L. E. de Sousa, P. H. de Oliveira Neto, and D. A. da Silva Filho, Physical Review E, 032133 (2020).
[23] C. Sy, E. Bernardo, A. Miguel, J. L. San Juan, A. P. Mayol, P. M. Ching, A. Culaba, A. Ubando, and J. E. Mutuc, Process Integration and Optimization for Sustainability, 497 (2020).
[24] M. J. Moon, Public Administration Review, 651 (2020).
[25] Z. Jia and Z. Lu, The Lancet Infectious Diseases, 757 (2020).
[26] A. Vespignani, H. Tian, C. Dye, J. O. Lloyd-Smith, R. M. Eggo, M. Shrestha, S. V. Scarpino, B. Gutierrez, M. U. Kraemer, J. Wu, et al., Nature Reviews Physics, 279 (2020).
[27] M. K. Prakash, S. Kaushal, S. Bhattacharya, A. Chandran, A. Kumar, and S. Ansumali, Physical Review E, 021301 (2020).
[28] D. Sridhar and M. S. Majumder, BMJ (2020).
[29] S. Callaghan, Patterns (2020).
[30] The Lancet, Lancet (London, England), 1011 (2020).
[31] J. Phua, L. Weng, L. Ling, M. Egi, C.-M. Lim, J. V. Divatia, B. R. Shrestha, Y. M. Arabi, J. Ng, C. D. Gomersall, et al., The Lancet Respiratory Medicine, 506 (2020).
[32] D. M. Kennedy, G. J. Zambrano, Y. Wang, and O. P. Neto, Journal of Clinical Virology, 104440 (2020).
[33] U. Goldsztejn, D. Schwartzman, and A. Nehorai, PLOS ONE, e0244174 (2020).
[34] A. Charpentier, R. Elie, M. Laurière, and V. C. Tran, Mathematical Modelling of Natural Phenomena, 57 (2020).
[35] K. Leung, J. T. Wu, D. Liu, and G. M. Leung, The Lancet, 1382 (2020).
[36] A. Bergman, Y. Sella, P. Agre, and A. Casadevall, mSystems (2020).
[37] N. Subbaraman, Nature, DOI: 10.1038/d41586-020-01008-1 (2020).
[38] M. A. Johansson, T. M. Quandelacy, S. Kada, P. V. Prasad, M. Steele, J. T. Brooks, R. B. Slayton, M. Biggerstaff, and J. C. Butler, JAMA Network Open, e2035057 (2021).
[39] G. Pullano, L. Di Domenico, C. E. Sabbatini, E. Valdano, C. Turbelin, M. Debin, C. Guerrisi, C. Kengne-Kuetche, C. Souty, T. Hanslik, et al., Nature, 134 (2021).
[40] M. Nacoti, A. Ciocca, A. Giupponi, P. Brambillasca, F. Lussana, M. Pisano, G. Goisis, D. Bonacina, F. Fazzi, R. Naspro, et al., NEJM Catalyst Innovations in Care Delivery (2020).
[41] C. Indolfi and C. Spaccarotella, Journal of the American College of Cardiology: Case Reports, 1414 (2020).
[42] R. Nunes-Vaz, Global Biosecurity (2020).
[43] M. N. Lurie, J. Silva, R. R. Yorlets, J. Tao, and P. A. Chan, The Journal of Infectious Diseases, 1601 (2020).
[44] World Health Organization: Coronavirus disease 2019 (COVID-19) Situation Report 72, April 1, 2020 (2020).
[45] World Health Organization: Coronavirus disease 2019 (COVID-19) Situation Report 102, May 1, 2020 (2020).
[46] C. Merow and M. C. Urban, Proceedings of the National Academy of Sciences, 27456 (2020).
[47] Y. Liu, R. M. Eggo, and A. J. Kucharski, The Lancet, e47 (2020).
[48] S. W. Park, B. M. Bolker, D. Champredon, D. J. Earn, M. Li, J. S. Weitz, B. T. Grenfell, and J. Dushoff, Journal of the Royal Society Interface, 20200144 (2020).
[49] G. Viceconte and N. Petrosillo, Infectious Disease Reports (2020).
[50] Y. Tao, Physical Review E, 032136 (2020).
[51] R. M. Anderson, H. Heesterbeek, D. Klinkenberg, and T. D. Hollingsworth, The Lancet, 931 (2020).
[52] O. Valba, V. Avetisov, A. Gorsky, and S. Nechaev, Physical Review E, 010401 (2020).
[53] J. K. Edwards and J. Lessler, American Journal of Epidemiology 190