Trends in COVID-19 prevalence and mortality: a year in review
Nick James^a, Max Menzies^b
^a School of Mathematics and Statistics, University of Sydney, NSW, Australia
^b Yau Mathematical Sciences Center, Tsinghua University, Beijing, China
Abstract
This paper introduces new methods to study the changing dynamics of COVID-19 cases and deaths among the 50 worst-affected countries throughout 2020. First, we analyse the trajectories and turning points of rolling mortality rates to understand at which times the disease was most lethal. We demonstrate five characteristic classes of mortality rate trajectories and determine structural similarity in mortality trends over time. Next, we introduce a class of virulence matrices to study the evolution of COVID-19 cases and deaths on a global scale. Finally, we introduce three-way inconsistency analysis to determine anomalous countries with respect to three attributes: countries' COVID-19 cases, deaths and human development indices. We demonstrate that the most anomalous countries across these three measures are Pakistan, the United States and the United Arab Emirates.
Keywords:
COVID-19, Time series analysis, Nonlinear dynamics, Anomaly detection
1. Introduction
Email address: [email protected] (Nick James)

…programs [10], and lockdowns. Due to the economic consequences of lockdowns, many countries implemented them too late [11, 12] and lifted restrictions before cases had sufficiently reduced [13]. Such disparate responses to the virus led to great variability in case and death counts, creating different waves of the outbreak across many countries. These later waves often exhibited higher case and death counts than the first [14, 15].

Close analysis of the dynamics of the disease, country by country and over time, is necessary to inform governments of the most successful strategies for reducing transmission and the progression from cases to deaths. Identifying structural similarities between countries' trajectories can support conclusions that certain government responses are likely to result in better or worse outcomes. Moreover, identifying anomalous countries can provide insight into which responses to the pandemic were exceptionally good or poor.

We implement new and existing time series analysis methods to analyse COVID-19 prevalence and mortality trends, both country by country and globally, with a focus on identifying similarity and anomalies. Time series analysis has been applied broadly in epidemiology, including to studies of Ebola [16, 17], the Zika virus [18, 19] and, more recently, COVID-19 itself [20, 21, 22, 23, 24]. Within the nonlinear dynamics community, methods such as networks [25, 26], distance analysis [15, 27, 28, 29, 30] and power-law models [23, 20, 31] have been used to model physical phenomena in other contexts.

This paper is structured as follows. In Section 2, we analyse the trajectories of mortality rates on a country-by-country basis.
In particular, we build upon a recently introduced algorithmic framework to identify the turning points of the mortality trajectories, which reveal when the disease was most and least lethal (with respect to the progression from cases to deaths). We then suitably modify existing semi-metrics to assign countries to classes of mortality trajectories. In Section 3, we analyse the eigenspectra of a new class of virulence matrices to study trends in the worldwide prevalence and mortality of COVID-19. In Section 4, we compare countries' case and death counts with their human development index (HDI) and use a new method to identify the most anomalous countries with respect to these attributes. We believe that this is the first paper to quantitatively combine these features and identify the resulting anomalies. In Section 5, we summarise our findings regarding COVID-19 trends throughout 2020.
2. Mortality rate analysis
In this section, we study the dynamics of the COVID-19 mortality rate among $n = 50$ countries. Our data span 01/01/2020 to 31/12/2020, a period of $T = 366$ days. We choose the 50 countries with the greatest total COVID-19 case counts as of 31/12/2020, order them alphabetically, and index them $i = 1, \dots, n$. Let $x_i(t), y_i(t) \in \mathbb{R}$ be the multivariate time series of new daily cases and deaths, respectively, for $i = 1, \dots, n$ and $t = 1, \dots, T$. For a given country, let $r_i(t)$ be its 30-day rolling mortality rate, defined by

$$ r_i(t) = \frac{\sum_{s=t-29}^{t} y_i(s)}{\sum_{s=t-29}^{t} x_i(s)}, \qquad t = 30, \dots, T, \tag{1} $$

or zero if no cases have been observed over the last 30 days. This gives a multivariate time series $r_i(t)$, for $i = 1, \dots, n$ and $t = 30, \dots, T$. The data point at time $t$ describes the rolling mortality rate over the prior 30 days.

The aim of this section is to study these mortality trends on a country-by-country basis and identify structural similarity across different countries. For this purpose, we use two (semi-)metrics between the mortality rate time series and apply hierarchical clustering [32, 33] to these measures. Hierarchical clustering has been used in several epidemiological applications, including inflammatory diseases [34], airborne diseases [35], Alzheimer's disease [36], Ebola [37], SARS [38], and COVID-19 [21].

These mortality rates $r_i(t)$ exhibit highly undulating behaviour, moving between clear peaks and troughs (turning points). Our first semi-metric measures the distance between algorithmically identified turning points as a proxy for each time series' behaviour. We modify an existing algorithmic framework for this purpose. First, we apply a Savitzky-Golay filter to produce a collection of smoothed time series $\hat{r}_i(t)$, $t = 30, \dots, T$ and $i = 1, \dots, n$. Next, we follow [15] and apply a two-step algorithm in which we select and then refine a set of turning points.
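The rolling mortality rate of Eq. (1), followed by the smoothing step, can be sketched in Python as below. This is a minimal sketch rather than the authors' code: it assumes the case and death counts are stored as NumPy arrays of shape (n, T), and the Savitzky-Golay window length and polynomial order (15 and 3) are illustrative assumptions, since the parameters used in the paper are not stated here.

```python
import numpy as np
from scipy.signal import savgol_filter


def rolling_mortality(cases, deaths, window=30):
    """Rolling mortality rate r_i(t) of Eq. (1).

    cases, deaths: arrays of shape (n, T) of new daily counts x_i(t), y_i(t).
    Returns an array of shape (n, T - window + 1); column k holds r_i(t)
    for day t = window + k (days counted from 1), i.e. t = 30, ..., T.
    """
    cases = np.asarray(cases, dtype=float)
    deaths = np.asarray(deaths, dtype=float)
    n = cases.shape[0]
    # Prepend a zero column so that c[:, t] is the cumulative count up to day t.
    c = np.concatenate([np.zeros((n, 1)), np.cumsum(cases, axis=1)], axis=1)
    d = np.concatenate([np.zeros((n, 1)), np.cumsum(deaths, axis=1)], axis=1)
    # Window sums over days t-29, ..., t for every admissible t.
    case_sum = c[:, window:] - c[:, :-window]
    death_sum = d[:, window:] - d[:, :-window]
    # Assumption: the rate is set to zero whenever the window's case total is not positive.
    return np.divide(death_sum, case_sum,
                     out=np.zeros_like(death_sum), where=case_sum > 0)


def smooth(rates, window_length=15, polyorder=3):
    """Savitzky-Golay smoothing of each country's series (illustrative parameters)."""
    return savgol_filter(rates, window_length, polyorder, axis=1)


# Usage with synthetic data: 50 countries, T = 366 days.
rng = np.random.default_rng(0)
x = rng.poisson(1000, size=(50, 366))
y = rng.binomial(x, 0.02)
r_hat = smooth(rolling_mortality(x, y))
print(r_hat.shape)  # (50, 337)
```

The cumulative-sum formulation computes every 30-day window in a single pass, which keeps the calculation linear in $T$ for each country.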
We assign to each smoothed mortality rate time series non-empty sets $P_i$ and $T_i$ of local maxima (peaks) and local minima (troughs), respectively. To better suit our specific application, we modify the second step of this algorithm, in which the turning point list is refined. Full details are included in Appendix A. We display 12 countries' mortality rate time series and annotate their turning points in Figure 1.

To quantify the distance between time series' turning points, we modify the semi-metric of [39] (with $p = 1$). Given two non-empty finite sets $S_1, S_2 \subset \{1, 2, \dots, T\}$, this is defined as

$$ D(S_1, S_2) = \frac{1}{2T} \left( \frac{\sum_{b \in S_2} d(b, S_1)}{|S_2|} + \frac{\sum_{a \in S_1} d(a, S_2)}{|S_1|} \right), \tag{2} $$

where $d(b, S_1)$ is the minimal distance from $b \in S_2$ to the set $S_1$, $|S_2|$ is the cardinality of $S_2$, and analogously for $S_1$. By the choice of normalisation, this is always bounded between 0 and 1: every distance $d(b, S_1)$ is at most $T - 1$, so each of the two averages is less than $T$. To better separate different behaviours among mortality trends, we modify this semi-metric by including a regularisation term as follows:

$$ D'(S_1, S_2) = D(S_1, S_2) + \beta \, \big| |S_1| - |S_2| \big|, \tag{3} $$

where $0 < \beta < 1$ is a constant. The resulting values $D'(S_1, S_2)$ are symmetric, non-negative, and zero if and only if $S_1 = S_2$. We then define the $n \times n$ matrix $D^{TP}$ between turning point sets by

$$ D^{TP}_{ij} = D'(P_i, P_j) + D'(T_i, T_j). \tag{4} $$

We perform hierarchical clustering on $D^{TP}$ in Figure 2, with $\beta = 1/\ldots$ (as a compromise between the usual sizes of $D(S_1, S_2)$ and $\big||S_1| - |S_2|\big|$). These distances do not capture the absolute values of the mortality rate time series; they only distinguish between their undulating behaviour, reflected in their sets of turning points. To round out our analysis, we include another metric, an $L^1$ norm that does account for differences in the absolute values of mortality. We define another matrix by

$$ D^{L^1}_{ij} = \sum_{t=30}^{T} |r_i(t) - r_j(t)|, \tag{5} $$

and perform hierarchical clustering on $D^{L^1}$ in Figure 3.
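The turning-point semi-metric of Eqs. (2)-(4), the $L^1$ matrix of Eq. (5) and the clustering step can be sketched as follows. This is an illustrative sketch under stated assumptions: the turning point sets are taken as given (their detection follows Appendix A and is not reproduced), $\beta$ is a free parameter whose value here is purely illustrative, and the use of average linkage in SciPy's `linkage` is our assumption, since the linkage criterion is not specified in this section.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform


def semi_metric(S1, S2, T):
    """D(S1, S2) of Eq. (2) for non-empty sets of days in {1, ..., T}."""
    S1 = np.asarray(sorted(S1), dtype=float)
    S2 = np.asarray(sorted(S2), dtype=float)
    # Pairwise |a - b| for a in S1 (rows) and b in S2 (columns).
    gaps = np.abs(S1[:, None] - S2[None, :])
    d_S2_to_S1 = gaps.min(axis=0)  # d(b, S1) for each b in S2
    d_S1_to_S2 = gaps.min(axis=1)  # d(a, S2) for each a in S1
    return (d_S2_to_S1.mean() + d_S1_to_S2.mean()) / (2 * T)


def regularised(S1, S2, T, beta):
    """D'(S1, S2) of Eq. (3): adds a penalty on differing set sizes."""
    return semi_metric(S1, S2, T) + beta * abs(len(S1) - len(S2))


def turning_point_matrix(peaks, troughs, T, beta):
    """D^TP of Eq. (4) from lists of peak sets P_i and trough sets T_i."""
    n = len(peaks)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = (regularised(peaks[i], peaks[j], T, beta)
                                 + regularised(troughs[i], troughs[j], T, beta))
    return D


def l1_matrix(r):
    """D^{L1} of Eq. (5) for rates r of shape (n, number of days)."""
    r = np.asarray(r, dtype=float)
    return np.abs(r[:, None, :] - r[None, :, :]).sum(axis=2)


def cluster(D, method="average"):
    """Hierarchical clustering of a symmetric distance matrix with zero diagonal.

    The default "average" linkage is an assumption, not taken from the paper.
    """
    return linkage(squareform(D, checks=False), method=method)


# Usage with hypothetical turning points for three countries (T = 366).
peaks = [[100, 300], [105, 310], [200]]
troughs = [[240], [250], [150, 330]]
D_tp = turning_point_matrix(peaks, troughs, T=366, beta=0.1)  # beta is illustrative
Z = cluster(D_tp)  # linkage matrix, suitable for scipy's dendrogram()
```

Because $D^{TP}$ is only a semi-metric, a linkage criterion that works with arbitrary dissimilarities (such as average linkage) is a natural fit; centroid- or Ward-type criteria assume Euclidean distances.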
In Figure 1, we display the rolling mortality rate and turning points for 12 countries: Brazil, India, Mexico, the United States (US), the Netherlands, Sweden, France, Germany, Italy, Russia, Ecuador and Bulgaria. These countries display highly heterogeneous behaviours, which are suitably captured in Figure 2. This dendrogram reveals four clusters of similarity and one outlier. Russia (1j) is the only country with just two detected turning points. Several developing countries such as Brazil (1a), India (1b) and Mexico (1c), as well as developed countries including the US (1d), the Netherlands (1e) and Sweden (1f), have three turning points. France (1g), Germany (1h) and Italy (1i) have four turning points. Ecuador (1k) and others have five, while Bulgaria (1l) and others have six.

Within the 4-turning-point cluster, we see a dense subcluster of similarity containing Austria, Belgium, Canada, Czechia, France, Georgia, Germany, Hungary, Italy, Poland, Portugal, Switzerland and the United Kingdom (UK). All these countries experienced a peak in the mortality rate in April or May (reflecting the preceding 30 days) and a local minimum near the beginning of September (reflecting the preceding 30 days, during August). This similarity can be seen by examining members of this cluster: France (1g), Germany (1h) and Italy (1i).

Turning to Figure 3, several other insights concerning the mortality rate trajectories emerge. First, Mexico and Ecuador are identified as outliers in the collection of countries, with only slight similarity to each other. For Mexico (1c), this is due to a consistently high mortality rate, over 10% for most of the period. Ecuador (1k) is an outlier due to peaks in mortality over 30%, higher than any other country. Belgium, France, Hungary, Spain and the UK form their own smaller cluster characterised by high mortality rates (of around 20%) in their first wave of COVID-19.
Indeed, these countries experienced higher mortality in March-April than any other country under consideration.
Figure 1: Smoothed mortality rate time series and identified turning points for various countries: (a) Brazil (b) India (c) Mexico (d) the US (e) the Netherlands (f) Sweden (g) France (h) Germany (i) Italy (j) Russia (k) Ecuador (l) Bulgaria. Green and red vertical lines denote algorithmically detected troughs and peaks, respectively. The rolling mortality rate at a given time measures the mortality over the previous 30 days. These countries include at least one member of every characteristic class of trajectories.

Figure 2: Hierarchical clustering on the turning point distance matrix $D^{TP}$, defined in Section 2. This groups countries according to their similarity in undulating behaviour, measured by distances between turning point sets. Five characteristic classes are observed: Russia has two turning points; Brazil, India and the US have three; most European countries have four, with a strong subcluster of similarity including Austria, Belgium and others. Two smaller classes are observed, containing five and six turning points, respectively.

Figure 3: Hierarchical clustering on the $L^1$ distance matrix $D^{L^1}$, defined in Section 2. Mexico and Ecuador emerge as outliers, characterised by a consistently high mortality rate over the full period and the highest peaks in mortality of all, respectively. Belgium, France, Hungary, Spain and the UK are revealed as a secondary cluster, characterised by high mortality in April and May that decreased rapidly thereafter.

3. Virulence matrix analysis

In this section, we develop a new framework for the time-varying analysis of 30-day rolling virulence matrices, inspired by, but differing from, covariance matrices in finance [40]. Let $t \in \{30, \dots, T\}$ be a particular time. We form vectors $\mathbf{x}_i(t) = (x_i(t-29), \dots, x_i(t))$, and analogously $\mathbf{y}_i(t)$. These two vectors record the case and death counts over the past 30 days.
We may also form $\mathbf{r}_i(t) = (r_i(t-29), \dots, r_i(t))$ for $t = 59, \dots, T$, as the time series $r_i(t)$ only begins at $t = 30$. Define (unscaled) inner products by

$$ \langle \mathbf{x}_i(t), \mathbf{x}_j(t) \rangle = \sum_{s=t-29}^{t} x_i(s)\, x_j(s). \tag{6} $$

We then define $n \times n$ (unscaled) virulence matrices with respect to cases, deaths and mortality rates as follows ($i, j = 1, \dots, n$):

$$ V^c_{ij}(t) = \langle \mathbf{x}_i(t), \mathbf{x}_j(t) \rangle, \quad t = 30, \dots, T; \tag{7} $$
$$ V^d_{ij}(t) = \langle \mathbf{y}_i(t), \mathbf{y}_j(t) \rangle, \quad t = 30, \dots, T; \tag{8} $$
$$ V^r_{ij}(t) = \langle \mathbf{r}_i(t), \mathbf{r}_j(t) \rangle, \quad t = 59, \dots, T. \tag{9} $$

We could analogously define normalised virulence matrices by using normalised inner products in place of the unscaled inner products above. These matrices are so named because they provide a representation of the global spread of COVID-19 over the last 30 days and encode relationships between different countries' trajectories. A standard covariance matrix would not appropriately measure this prevalence: a country with a constant (but severe) number of cases over the past 30 days would have zero covariance with every other country.

Each matrix $V(t)$ is an $n \times n$ symmetric real matrix, and is thus diagonalisable with all real eigenvalues. Moreover, $V(t)$ is a Gram matrix of inner products, so for any $u \in \mathbb{R}^n$ we have $u^T V^c u = \| \sum_i u_i \mathbf{x}_i(t) \|^2 \geq 0$, and analogously for $V^d$ and $V^r$; hence all eigenvalues are non-negative. We list and order the eigenvalues $\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_n \geq 0$. This produces a time-varying eigenspectrum, which we display in Figure 4 for the first ten eigenvalues. For any such matrix, the greatest eigenvalue $\lambda_1$ holds particular significance. By the spectral theorem, and since all eigenvalues are non-negative, $\lambda_1$ coincides with the operator norm of the matrix [41], a measure of its total size. That is,

$$ \lambda_1 = \|V\|_{op} = \max_{u \in \mathbb{R}^n \setminus \{0\}} \frac{\|Vu\|}{\|u\|}. \tag{10} $$

Subsequent eigenvalues also have a real-world interpretation.
For a nonzero matrix $V$, $\lambda_2 = 0$ if and only if $V$ has rank 1, which occurs if and only if all trajectories $\mathbf{x}_i$ (in the instance of the cases matrix) are scalar multiples of a single vector, that is, they differ only by a multiplicative constant. In general, a small value of $\lambda_2$ relative to $\lambda_1$ indicates substantial homogeneity in the trajectories.

In Figures 4a, 4b and 4c, respectively, we display the time-varying eigenspectra for the virulence matrices associated with cases, deaths and mortality rates. These eigenspectra have several interesting properties. The first eigenvalue $\lambda_1$ in Figure 4a demonstrates the general increase of new COVID-19 cases over the course of 2020. The sharp spike towards the end of the year demonstrates the rapid growth in cases in the final months of 2020. Figure 4b has two prominent peaks in its first eigenvalue, corresponding to the periods of March-April and November-December. These peaks highlight the natural history of COVID-19: many countries suffered significant deaths during their first wave of the virus, enforced harsh restrictions resulting in fewer cases and deaths, and subsequently experienced further growth in cases and deaths upon the easing of those restrictions. Finally, the first eigenvalue in Figure 4c highlights an interesting trend in the mortality rate. There is a marked spike in March-April, followed by a significant reduction throughout the remainder of 2020. This shape likely reflects vulnerable people dying earlier and/or under-reporting of cases early in the year, contributing to a higher calculated mortality rate from reported cases and deaths.

The relationship between the first eigenvalue and subsequent eigenvalues is also of interest. Figure 4a shows the second eigenvalue $\lambda_2$ becoming quite significant for cases towards the end of 2020, when the total number of cases is larger than ever. This shows that the behaviour of new cases in late 2020 is more heterogeneous than in the first wave, when cases were rising quite uniformly throughout the world.
Figures 4b and 4c show a more moderate but similar phenomenon concerning deaths and mortality rates at various stages of the year. The second eigenvalue in Figure 4b is slightly more pronounced in the second wave of the virus, displaying more heterogeneity in COVID-19 deaths later in the year. The second eigenvalue in Figure 4c is more pronounced during the first wave of the virus, highlighting more heterogeneity in mortality during that period. Indeed, Figure 1 shows that European countries experienced substantial mortality in their first wave of COVID-19, which characterised them as anomalous in Figure 3. This contributed to meaningful heterogeneity in mortality rates across the world during the early stages of the year.
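The virulence matrices of Eqs. (6)-(9) and their ordered eigenvalues can be computed with a few lines of NumPy. The sketch below is illustrative: it assumes each input series is stored as an array of shape (n, T) whose column $s - 1$ holds day $s$, and the function names are hypothetical. Because each $V(t)$ is a Gram matrix $W W^T$ of the 30-day windows $W$, it is symmetric and positive semi-definite, so `numpy.linalg.eigvalsh` (the symmetric eigensolver) is appropriate.

```python
import numpy as np


def virulence_matrix(X, t, window=30):
    """Unscaled virulence matrix V(t) with entries <x_i(t), x_j(t)> (Eqs. (6)-(9)).

    X: array of shape (n, T); column s - 1 holds day s.
    t: day index (counted from 1) with t >= window.
    """
    W = np.asarray(X, dtype=float)[:, t - window:t]  # days t-29, ..., t
    return W @ W.T  # Gram matrix of the windowed series


def eigenspectrum(X, window=30, k=10):
    """Largest k eigenvalues lambda_1 >= ... >= lambda_k of V(t), for each valid t."""
    X = np.asarray(X, dtype=float)
    T = X.shape[1]
    spectra = []
    for t in range(window, T + 1):
        lam = np.linalg.eigvalsh(virulence_matrix(X, t, window))[::-1]  # descending
        spectra.append(np.clip(lam[:k], 0.0, None))  # remove tiny negative round-off
    return np.array(spectra)


# Usage: for cases, X holds x_i(t) and row 0 of the result corresponds to t = 30.
# For mortality rates, pass the (n, T - 29) array of r_i(t), t = 30, ..., T;
# its first window then ends at t = 59, matching Eq. (9).
rng = np.random.default_rng(1)
X = rng.poisson(50, size=(5, 366)).astype(float)
spec = eigenspectrum(X, k=3)
V = virulence_matrix(X, 366)
# lambda_1 equals the operator (spectral) norm, as in Eq. (10).
print(np.isclose(np.linalg.norm(V, 2), np.linalg.eigvalsh(V)[-1]))  # True
```

A quick sanity check of the rank-1 interpretation of $\lambda_2$: if every row of `X` is a multiple of the same series, each $V(t)$ has rank 1 and the second column of `eigenspectrum(X)` is zero up to round-off.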
4. Inconsistency analysis
In this section, we describe how we measure the consistency between three attributes, and reveal anomalous countries in the process. To do so, we introduce a new method of comparing three distance matrices and apply it to distances between case time series, death time series, and human development indices (HDI). Let $h_i$ be the HDI of each country. Calculated by the United Nations Development Programme [42], this index combines a country's life expectancy, educational standards and economic standard of living. The HDI is bounded between 0 and 1, and $h_i$ reflects a substantially lower living standard the further it falls below 1. To reflect this, we use a logarithmic distance between these indices, which penalises a given difference more heavily the lower the indices are:

$$ D^h_{ij} = |\log h_i - \log h_j|, \quad i, j = 1, \dots, n. \tag{11} $$

This forms an $n \times n$ distance matrix between countries' development indices.

Figure 4: Time-varying eigenspectra (first ten eigenvalues) for the virulence matrices associated with (a) cases (b) deaths (c) mortality rate. The first eigenvalue demonstrates broad trends in the total size of the matrices, and shows (a) a large increase of cases towards the end of 2020, (b) two or three waves of significant deaths, (c) the highest mortality early in the year. The second eigenvalue reveals more heterogeneity in case trajectories towards the end of the year, and in mortality towards the beginning of the year.

Given the exponential nature of the spread of the virus, we also use a logarithmic distance between the case and death time series. Some of these time series have negative counts due to retrospective adjustments in the data. To ensure non-negative counts, we first apply a Savitzky-Golay filter to produce smoothed case and death time series $\hat{x}_i(t)$ and $\hat{y}_i(t)$, respectively. Owing to its moving-average and polynomial smoothing, this eliminates almost all negatives, except when there are very few counts. We replace any remaining non-positive count with 1.
Then, we may calculate a logarithmic $L^1$ distance as follows:

$$ D^c_{ij} = \|\log \hat{x}_i - \log \hat{x}_j\|_1 = \sum_{t=1}^{T} |\log \hat{x}_i(t) - \log \hat{x}_j(t)|; \tag{12} $$
$$ D^d_{ij} = \|\log \hat{y}_i - \log \hat{y}_j\|_1 = \sum_{t=1}^{T} |\log \hat{y}_i(t) - \log \hat{y}_j(t)|. \tag{13} $$

We use such a metric between case or death time series, rather than a simple difference between the total yearly counts, to distinguish between countries (and hence reveal potential anomalies) according to when the cases or deaths occurred. Thus, we have defined three $n \times n$ distance matrices between countries. Given an $n \times n$ distance matrix $D$, its corresponding affinity matrix is defined as

$$ A_{ij} = 1 - \frac{D_{ij}}{\max\{D\}}, \quad i, j = 1, \dots, n. \tag{14} $$

All elements of these affinity matrices lie in $[0, 1]$, so it is appropriate to compare them directly by taking their difference. Given an $n \times n$ matrix $C$, let $|C|$ be the matrix obtained by taking the absolute value of all elements, that is, $|C|_{ij} = |C_{ij}|$. Then, define three $n \times n$ symmetric pairwise inconsistency matrices:

$$ \mathrm{INC}^{c,d} = |A^c - A^d|; \tag{15} $$
$$ \mathrm{INC}^{c,h} = |A^c - A^h|; \tag{16} $$
$$ \mathrm{INC}^{d,h} = |A^d - A^h|; \tag{17} $$

and a total inconsistency matrix

$$ \mathrm{INC}^{c,d,h} = \mathrm{INC}^{c,d} + \mathrm{INC}^{c,h} + \mathrm{INC}^{d,h}. \tag{18} $$

Next, we define pairwise anomaly scores by

$$ a^{c,d}_i = \sum_{j=1}^{n} \mathrm{INC}^{c,d}_{ij}; \tag{19} $$
$$ a^{c,h}_i = \sum_{j=1}^{n} \mathrm{INC}^{c,h}_{ij}; \tag{20} $$
$$ a^{d,h}_i = \sum_{j=1}^{n} \mathrm{INC}^{d,h}_{ij}. \tag{21} $$

For each country, we record an anomaly vector $\mathbf{a}_i = (a^{c,d}_i, a^{c,h}_i, a^{d,h}_i)$ and a total anomaly score given by $a^{c,d,h}_i = a^{c,d}_i + a^{c,h}_i + a^{d,h}_i$. We can also define a weighted anomaly score to reduce the bias caused by one set of anomaly scores being systematically larger than another. Let $M^{c,d} = \max_i \{a^{c,d}_i\}$, and analogously for $M^{c,h}$ and $M^{d,h}$. Let the weighted anomaly score be $\tilde{a}^{c,d,h}_i = a^{c,d}_i / M^{c,d} + a^{c,h}_i / M^{c,h} + a^{d,h}_i / M^{d,h}$.
This aims to record a neutral contribution from each anomaly score. In Tables 1 and 2, we record the anomaly vectors, total anomaly scores and weighted anomaly scores for all 50 countries under consideration. In Figure 5, we plot the total inconsistency matrix $\mathrm{INC}^{c,d,h}$, where anomalous countries can easily be seen due to larger entries in their respective rows and columns. An analogous weighted inconsistency matrix can also be defined, which is broadly similar to the one shown.

The total inconsistency matrix and all computed anomaly scores yield several insights. First, the three most anomalous countries with respect to the weighted anomaly score are Pakistan, the US and the United Arab Emirates (UAE). A near-identical result applies if we use the unscaled total anomaly score, with Pakistan, the US, Nepal and then the UAE exhibiting the largest unscaled scores. For the US and Pakistan, the largest contributions to the total or weighted anomaly score come from their pairwise anomaly scores $a^{c,h}$ and $a^{d,h}$, which are the two highest among all countries (Pakistan first, the US second). Interestingly, these high scores have differing explanations. The US is highly inconsistent between cases (and analogously deaths) and HDI due to its much higher case and death counts than other countries of similar HDI. Pakistan is classified as inconsistent because it has an extreme HDI, the lowest of any country under consideration, but case and death time series that are similar to many others. Thus, having a lower HDI than other countries with similar case and death counts, it registers as inconsistent. We remark that high anomaly scores do not necessarily indicate a straightforward anomalous ratio between, for example, cases and HDI. Instead, a high anomaly score reflects inconsistency in a country's relationships with other countries.

On the other hand, the UAE has a high weighted and total anomaly score due to its value of $a^{c,d}$, which is the highest of any country.
Indeed, the UAE experienced the lowest mortality rate across 2020 of any country under consideration. The country with the second-highest value of $a^{c,d}$ is Mexico. This is anomalous for the opposite reason: a consistently high progression from cases to deaths, as first noted in Figure 1c.

Figure 5: Total inconsistency matrix $\mathrm{INC}^{c,d,h}$, as defined in Section 4. Lighter entries indicate higher values of the matrix, and hence more inconsistency between the attributes under consideration: cases, deaths and HDI. The US and Pakistan can be seen to have substantial inconsistency with many other countries.
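The full inconsistency pipeline of Eqs. (11)-(21), including the weighted score, fits in a short NumPy/SciPy sketch. As before, this is an illustration rather than the authors' implementation: the Savitzky-Golay parameters are assumptions, and the inputs (count arrays of shape (n, T) and a length-n vector of HDI values) and function names are hypothetical.

```python
import numpy as np
from scipy.signal import savgol_filter


def log_l1_matrix(counts, window_length=15, polyorder=3):
    """D^c or D^d of Eqs. (12)-(13): L1 distance between log smoothed series.

    The Savitzky-Golay parameters are illustrative assumptions.
    """
    smoothed = savgol_filter(np.asarray(counts, dtype=float),
                             window_length, polyorder, axis=1)
    smoothed[smoothed <= 0] = 1.0  # replace non-positive counts with 1 (log 1 = 0)
    logs = np.log(smoothed)
    return np.abs(logs[:, None, :] - logs[None, :, :]).sum(axis=2)


def log_hdi_matrix(h):
    """D^h of Eq. (11)."""
    logh = np.log(np.asarray(h, dtype=float))
    return np.abs(logh[:, None] - logh[None, :])


def affinity(D):
    """A = 1 - D / max(D) of Eq. (14); assumes D is not identically zero."""
    return 1.0 - D / D.max()


def anomaly_scores(Dc, Dd, Dh):
    """Pairwise (Eqs. (19)-(21)), total and weighted anomaly scores."""
    Ac, Ad, Ah = affinity(Dc), affinity(Dd), affinity(Dh)
    inc = {"cd": np.abs(Ac - Ad), "ch": np.abs(Ac - Ah), "dh": np.abs(Ad - Ah)}
    pairwise = {key: m.sum(axis=1) for key, m in inc.items()}  # row sums
    total = pairwise["cd"] + pairwise["ch"] + pairwise["dh"]
    # Weighted score: each pairwise score is divided by its maximum M over countries.
    weighted = sum(score / score.max() for score in pairwise.values())
    inc_total = inc["cd"] + inc["ch"] + inc["dh"]  # INC^{c,d,h}, Eq. (18)
    return pairwise, total, weighted, inc_total
```

As a check on the weighting, Tables 1 and 2 give $M^{c,d} = 10.29$ (the UAE), $M^{c,h} = 22.73$ and $M^{d,h} = 21.65$ (both Pakistan), so Pakistan's weighted score is $2.43/10.29 + 1 + 1 \approx 2.24$, as reported in Table 1.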
5. Discussion and conclusion
In this paper, we analyse the natural history of COVID-19 across 50 countries over 2020. We observe significant structural similarity between certain countries, as well as heterogeneity across the world with respect to COVID-19 prevalence and mortality, and identify anomalous countries therein.

In Section 2, we analyse mortality rate trajectories for 50 countries. By modifying a recently introduced turning point algorithm and a semi-metric between turning point sets, we assign these time series to five characteristic classes according to their differing trajectories. Russia is identified as an outlier: its mortality rate rose consistently until July and never dropped substantially enough to register a subsequent trough in our algorithmic framework. It is unique in this sense among the 50 countries, possessing a consistently stable mortality rate after its first peak. Nineteen countries exhibit three turning points, including Brazil, India and the US, indicating a substantial reduction in mortality from a first peak. Twenty-one countries exhibit four turning points, indicating a second wave in which mortality increased once again. In particular, a strong subcluster contains most Western European countries: Austria, Belgium, Czechia, France (1g), Germany (1h), Hungary, Italy (1i), Portugal, Switzerland and the UK. These all share highly similar mortality trajectories, with a first peak in April-May, a trough around September, and another peak at the end of the year.

Country anomaly scores relative to cases, deaths and HDI (1)

Country          a^{c,d}   a^{c,h}   a^{d,h} a^{c,d,h} ã^{c,d,h}
Argentina           3.23      8.96     10.59     22.78      1.20
Austria             3.43      8.73     10.45     22.61      1.20
Azerbaijan          3.87      7.47      9.72     21.06      1.15
Bangladesh          2.47     14.69     14.93     32.10      1.58
Belarus             4.41      7.86      9.91     22.19      1.23
Belgium             3.88      8.27      8.52     20.67      1.13
Brazil              4.04     12.29     15.31     31.64      1.64
Bulgaria            2.26      8.84      8.31     19.41      0.99
Canada              2.87      9.15      9.21     21.23      1.11
Chile               3.00      9.18     10.85     23.04      1.20
Colombia            3.55      6.28      6.48     16.30      0.92
Croatia             2.91     10.53     11.04     24.48      1.26
Czechia             4.06      8.23      9.86     22.15      1.21
Ecuador             3.75      7.52      6.96     18.23      1.02
France              2.98      9.77     10.70     23.45      1.21
Georgia             4.86     13.33     11.92     30.11      1.61
Germany             3.33      8.93      8.36     20.62      1.10
Hungary             3.52     10.74      8.94     23.21      1.23
India               2.75      9.67      9.72     22.14      1.14
Indonesia           3.72      8.37      7.47     19.56      1.07
Iran                5.25      8.77     11.68     25.69      1.43
Iraq                3.42      7.35      5.88     16.65      0.93
Israel              4.98      9.74     11.15     25.86      1.43
Italy               4.03      9.27     11.67     24.96      1.34
Japan               2.79      9.75     10.34     22.88      1.18
Jordan              4.66     11.06     10.94     26.66      1.44
Mexico              9.73      6.82     13.06     29.60      1.85
Morocco             3.97     10.34      9.67     24.00      1.29
Nepal               4.53     16.84     17.78     39.15      2.00
Netherlands         3.97      8.41      8.70     21.08      1.16
Pakistan            2.43     22.73     21.65     46.80      2.24
Panama              3.34      7.17      7.92     18.42      1.01
Peru                3.06      6.48      6.98     16.52      0.90
Philippines         2.51      7.49      7.29     17.29      0.91
Poland              2.68      7.69      8.58     18.94      0.99
Portugal            3.11      7.42      8.10     18.63      1.00
Romania             3.18      7.24      7.68     18.10      0.98

Table 1: Anomaly vectors, total anomaly scores and weighted anomaly scores, as defined in Section 4, for the first 37 countries under consideration. Pairwise anomaly scores quantify the inconsistency in measurements between two quantities, while the total and weighted anomaly scores incorporate all three attributes. The weighted anomaly score is chosen to more appropriately weight the contributions from the three pairwise scores.

Country          a^{c,d}   a^{c,h}   a^{d,h} a^{c,d,h} ã^{c,d,h}
Russia              2.62     10.62     10.75     23.99      1.21
Saudi Arabia        3.10     10.77     10.54     24.41      1.26
Serbia              3.57      7.98      9.80     21.36      1.15
Slovakia            3.34     12.86     13.13     29.33      1.50
South Africa        3.16      7.01      5.72     15.89      0.88
Spain               4.05      9.82     11.27     25.14      1.35
Sweden              4.23      9.57      9.36     23.16      1.26
Switzerland         3.14      8.52      9.73     21.38      1.13
Turkey              2.50      7.81      7.78     18.08      0.95
Ukraine             2.78      6.86      6.45     16.08      0.87
UAE                10.29      8.78     13.56     32.63      2.01
UK                  3.78      9.87     10.80     24.44      1.30
US                  3.18     18.46     19.81     41.45      2.04

Table 2: Anomaly vectors, total anomaly scores and weighted anomaly scores, as defined in Section 4, for the remaining 13 countries under consideration. Pairwise anomaly scores quantify the inconsistency in measurements between two quantities, while the total and weighted anomaly scores incorporate all three attributes. The weighted anomaly score is chosen to more appropriately weight the contributions from the three pairwise scores.

There are three wealthy Western European countries that do not fit into this cluster. Neither the Netherlands nor Sweden, displayed in Figures 1e and 1f respectively, registers a second peak in mortality. Indeed, both countries kept their mortality low towards the end of the year, while France, Germany and Italy experienced an increase. Prior research has noted that the Netherlands reduced its mortality rate substantially in its second wave of COVID-19 [43], while Sweden changed its COVID-19 response substantially relative to the first half of the year [44]. Spain registers six turning points, primarily due to highly irregular reporting, featuring negative counts and large numbers of cases and deaths consolidated and reported on single sporadic days.

A smaller number of countries exhibited more turning points: five countries with five turning points and four with six. We observe that the majority of developed countries exhibit three or four turning points, as visible in Figure 2, while the outlier countries (with two, five or six turning points) were mostly developing countries.
This reflects more regular (and less undulating) behaviour in the mortality rate trajectories of developed countries, and has two plausible explanations. First, more developed countries may have implemented more consistent testing, which could have caused fewer fluctuations in the reported mortality rate. Second, more developed countries may have more healthcare resources to improve their treatment of COVID-19 and thereby reduce and stabilise the mortality rate over time.

In Section 3, we introduce a new class of virulence matrices for cases, deaths and mortality rates and analyse their eigenspectra. The first eigenvalue $\lambda_1$ provides a measure of the total scale of the matrices and summarises worldwide trends in prevalence and mortality throughout 2020. Figure 4a reflects a substantial surge in cases towards the end of the year, Figure 4b shows multiple waves of deaths of comparable magnitude, while Figure 4c shows an early peak that dominates the rest of the period. The second eigenvalue $\lambda_2$ provides a measure of the heterogeneity among the studied time series. Figure 4a exhibits a considerable rise in heterogeneity towards the end of the year, during a time in which new case trajectories across different countries were substantial but quite non-uniform. In Figure 4b, we see a much greater value of $\lambda_2$ during the second wave of deaths, in which $\lambda_1$ is in fact lower than in the first wave. The much milder drop-off between $\lambda_1$ and $\lambda_2$ indicates the greatest heterogeneity with respect to deaths during this period in the middle of the year. Figure 4c similarly reveals substantial heterogeneity in mortality rates during the earlier part of the year.

When viewed in conjunction, these three figures provide several insights into the natural history of the disease throughout 2020. Case counts generally increased in global severity throughout the year, while death counts displayed a much clearer pattern of multiple waves.
The mortality rate trajectory (Figure 4c) can explain this: in March and April, the progression from reported cases to deaths was much more severe throughout Europe, causing substantial deaths despite fewer cases than in late 2020. During the middle of the year, the heterogeneity in death counts was at its highest. Indeed, the months of June to August featured relatively few new cases in Europe [45], while Brazil [46], India and other developing countries experienced substantial growth in cases [47]. Towards the end of the year, the pandemic once again impacted the entire world, with higher counts than ever before. During this time, mortality was low, but cases were so high that deaths reached their highest levels to date. Heterogeneity in case trajectories also increased substantially, with trajectories differing markedly between countries: many increasing, some decreasing, but most with high total counts. One could examine this heterogeneity more closely by considering normalised virulence matrices obtained from normalised inner products, as explained in Section 3.

In Section 4, we study the consistency between cases, deaths and HDI for all 50 countries under consideration. We believe that this is the first method proposed to study (in)consistencies among a collection of time series with respect to up to three measures. We propose two measures of anomaly across these three quantities: a total and a weighted anomaly score (the latter more appropriately combining the contributions of the three pairwise anomaly components). The three most anomalous countries with respect to the weighted score are Pakistan, the US and the UAE. Closer inspection of the pairwise anomaly components in Tables 1 and 2 can reveal which quantities contribute most to a country's total or weighted score. For the UAE, this is the high anomaly score between cases and deaths, caused by the lowest progression from cases to deaths among our collection of countries.
For the US, both anomaly scores $a^{c,h}$ and $a^{d,h}$ contribute highly; these reflect the fact that the US has considerably more cases and deaths than other countries of similar HDI. For Pakistan, the same two anomaly scores $a^{c,h}$ and $a^{d,h}$ are the largest of any country, but for the opposite reason: its HDI is substantially lower than that of any other country with similar case and death time series.

The full collection of anomaly scores can also reveal broad trends regarding consistency between the three measures. In Tables 1 and 2, we see that the two pairwise anomaly scores relative to HDI are systematically greater than the pairwise score between case and death counts. Indeed, $a^{c,d}_i < a^{d,h}_i$ for every single country, and $a^{c,d}_i < a^{c,h}_i$ for every country except the UAE and Mexico, which have the two highest case-death anomaly scores owing to their anomalously low and high mortality, respectively. These patterns reveal systematically more consistency between case and death counts than between case or death counts and HDI. Qualitatively, this suggests there is little relationship between a country's HDI and its case or death counts. In addition, a closer examination reveals that $a^{c,h}_i < a^{d,h}_i$ for 34 of the 50 countries, roughly two-thirds of the collection. Thus, to a lesser extent, there is greater consistency between case counts and HDI than between death counts and HDI. This is a surprising finding: one would naively expect more consistency between a lower HDI and higher deaths, due to poorer healthcare quality resulting in a greater progression from cases to deaths, regardless of the number of cases.

Several limitations and opportunities for future research exist in this inconsistency framework. First, the analysis could be repeated for case and death time series expressed as a proportion of each country's population.
Alternative metrics between cases and deaths could also be used, such as a simple difference between the total yearly counts, without the temporal component provided by the L metric. A closer analysis of the relationship between the varying sizes of the anomaly scores could quantitatively characterise the differing consistency between the three quantities as a whole. One limitation of this framework is that anomalies are measured purely by their relative deviation from the rest of the collection, and their direction (positive or negative) is ignored. A closer inspection is necessary to determine the nature of each anomaly. However, this could also be seen as a benefit of the methodology, as it is flexible in detecting different sorts of inconsistent behaviour.
Overall, this paper introduces new methods for analysing COVID-19 prevalence and mortality on a country-by-country and worldwide basis and chronicles the natural history of COVID-19 during 2020. On a global scale, we reveal broad trends in case and death counts as well as mortality trajectories, which present a coherent picture of the changing impacts of COVID-19 over time. On a country-by-country basis, we reveal both heterogeneity and structural similarity with respect to mortality over time and study consistency between COVID-19 prevalence and human development, revealing specific anomalous countries. Moreover, the framework presented in this paper could be applied broadly to various epidemiological or economic crises.
As 2021 begins, the world remains severely affected by COVID-19. Though vaccine distribution is underway in many countries, the analysis of trends in cases, deaths and mortality remains of substantial relevance to governments. The identification of structural similarity in mortality rate trajectories between European states may inspire additional cooperation [7] and coordination of their strategic response to the pandemic.
Our methods highlight countries that have responded particularly well or poorly, and our analysis identifies points in time where cases, deaths and mortality rates changed substantially for candidate countries. Finally, we reveal global changes in the relationship between cases, deaths and mortality rates over time. Such changes should inform governments' responses to the pandemic. This will be particularly crucial in the coming months, as various vaccines are administered around the world.
Data availability
Daily COVID-19 case and death counts and human development index data can be accessed at "Our World in Data" [48].
Appendix A. Turning point methodology
In this section, we provide more details on the identification of turning points of a mortality rate time series $r(t)$. First, some smoothing is necessary due to irregularities in the data set and discrepancies between different data sources. The Savitzky-Golay filter ameliorates these issues by combining polynomial smoothing with a moving average computation, and yields a smoothed time series $\hat{r}(t) \in \mathbb{R}_{\geq 0}$. Subsequently, we perform a two-step process to select and then refine non-empty sets $P$ of local maxima (peaks) and $T$ of local minima (troughs).
Following [15], we apply a two-step algorithm to the smoothed time series $\hat{r}(t)$. The first step produces an alternating sequence of troughs and peaks. The second step refines this sequence according to chosen conditions and parameters. The most important conditions in the first step to identify a peak or trough, respectively, at a time $t_0$ are the following:
$$\hat{r}(t_0) = \max\{\hat{r}(t) : \max(1, t_0 - l) \leq t \leq \min(t_0 + l, T)\}, \tag{A.1}$$
$$\hat{r}(t_0) = \min\{\hat{r}(t) : \max(1, t_0 - l) \leq t \leq \min(t_0 + l, T)\}, \tag{A.2}$$
where $l$ is a parameter to be chosen. Following [15], we select $l = 17$, which accounts for the 14-day incubation period of the virus [49] and reduced testing on weekends. Defining peaks and troughs according to this definition alone has several flaws, such as the potential for two consecutive peaks.
Instead, we implement an inductive procedure to choose an alternating sequence of peaks and troughs. Suppose $t_0$ is the last determined peak. We search in the period $t > t_0$ for the first of two cases: if we find a time $t > t_0$ that satisfies (A.2) as well as the non-triviality condition $\hat{r}(t) < \hat{r}(t_0)$, we add $t$ to the set of troughs and proceed from there. If we find a time $t > t_0$ that satisfies (A.1) and $\hat{r}(t_0) \geq \hat{r}(t)$, we ignore this lower peak as redundant; if we find a time $t > t_0$ that satisfies (A.1) and $\hat{r}(t) > \hat{r}(t_0)$, we remove the peak $t_0$, replace it with $t$ and continue from $t$.
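The peak-side step just described, together with its mirror image from a trough, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, indices are 0-based, the input is assumed to be already smoothed (for instance with SciPy's `scipy.signal.savgol_filter`), and the procedure is assumed to start from the first time satisfying (A.1) or (A.2), since no starting rule is stated. Only the first step is covered; the refinement conditions of the second step are not applied.

```python
from typing import Callable, List, Sequence, Tuple

PEAK, TROUGH = "peak", "trough"


def is_window_extreme(r: Sequence[float], t0: int, l: int,
                      pick: Callable[[Sequence[float]], float]) -> bool:
    """Check (A.1) when pick=max or (A.2) when pick=min, with 0-based indices."""
    lo, hi = max(0, t0 - l), min(t0 + l, len(r) - 1)
    return r[t0] == pick(r[lo:hi + 1])


def alternating_turning_points(r: Sequence[float], l: int = 17) -> List[Tuple[int, str]]:
    """Hypothetical sketch of the first step: an alternating list of (time, kind)."""
    points: List[Tuple[int, str]] = []
    for t in range(len(r)):
        is_peak = is_window_extreme(r, t, l, max)
        is_trough = is_window_extreme(r, t, l, min)
        if not points:
            # Assumption: begin at the first time satisfying (A.1) or (A.2).
            if is_peak:
                points.append((t, PEAK))
            elif is_trough:
                points.append((t, TROUGH))
            continue
        t0, last = points[-1]
        if last == PEAK:
            if is_trough and r[t] < r[t0]:
                points.append((t, TROUGH))  # non-trivial trough after the peak t0
            elif is_peak and r[t] > r[t0]:
                points[-1] = (t, PEAK)      # higher peak replaces t0
            # a later peak with r[t] <= r[t0] is ignored as redundant
        else:
            if is_peak and r[t] > r[t0]:
                points.append((t, PEAK))    # non-trivial peak after the trough t0
            elif is_trough and r[t] < r[t0]:
                points[-1] = (t, TROUGH)    # lower trough replaces t0
    return points


if __name__ == "__main__":
    toy = [0, 1, 2, 3, 4, 3, 2, 1, 0, 1, 2, 3, 2, 1]  # already-smoothed toy series
    print(alternating_turning_points(toy, l=2))
    # -> [(0, 'trough'), (4, 'peak'), (8, 'trough'), (11, 'peak'), (13, 'trough')]
```

The toy series uses a small window $l = 2$ purely so that the alternation is visible on a short input; the window boundaries are clipped at both ends, mirroring the $\max(1, \cdot)$ and $\min(\cdot, T)$ bounds in (A.1) and (A.2).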
A similar process applies from a trough at $t_0$.
At this point, the time series is assigned an alternating sequence of troughs and peaks. However, some turning points are immaterial and should be excluded. The second step is a flexible approach introduced in [15] for this purpose. In this paper, we introduce new conditions within this framework. First, we use the same peak ratio procedure: let $t_1 < t_2$ be two peaks, necessarily separated by a trough. We select a parameter $\delta = 0.$, and if the peak ratio, defined as $\hat{r}(t_2)/\hat{r}(t_1)$, is less than $\delta$, we remove the peak $t_2$. If two consecutive troughs $t_3, t_4$ remain, we remove $t_3$ if $\hat{r}(t_3) > \hat{r}(t_4)$; otherwise, we remove $t_4$. That is, if the second peak has size less than a fraction $\delta$ of the first peak, we remove it and keep the lower of the two troughs that become adjacent.
Finally, let $t_1 < t_2$ be adjacent turning points (one a trough, one a peak). We choose a parameter $\epsilon = \log 2$; if
$$|\log \hat{r}(t_2) - \log \hat{r}(t_1)| < \epsilon, \tag{A.3}$$
that is, the values at the two turning points differ by less than a factor of 2, we remove $t_2$ from our sets of peaks and troughs. If $t_2$ is not the final turning point, we also remove $t_1$. This condition differs from previous work: whereas [15] considers the average change with time between turning points of new case trajectories, we consider only the absolute change between turning points in mortality rate. Indeed, there is no need to consider how much time has passed when determining whether mortality has increased or decreased by a sufficient amount (in our implementation, a factor of 2) to warrant a turning point being included.
References
[1] M. Wang, et al., Remdesivir and chloroquine effectively inhibit the recently emerged novel coronavirus (2019-nCoV) in vitro, Cell Research 30 (2020) 269–271.
[2] E. M. Bloch, Convalescent plasma to treat COVID-19, Blood 136 (2020) 654–655.
[3] X. Xu, et al., Effective treatment of severe COVID-19 patients with tocilizumab, Proceedings of the National Academy of Sciences 117 (2020) 10970–10975.
[4] B. Cao, et al., A trial of Lopinavir-Ritonavir in adults hospitalized with severe Covid-19, New England Journal of Medicine 382 (2020) 1787–1799.
[5] F. P. Polack, et al., Safety and efficacy of the BNT162b2 mRNA Covid-19 vaccine, New England Journal of Medicine 383 (2020) 2603–2615.
[6] E. E. Walsh, et al., Safety and immunogenicity of two RNA-based Covid-19 vaccine candidates, New England Journal of Medicine 383 (2020) 2439–2450.
[7] V. Priesemann, et al., Calling for pan-European commitment for rapid and sustained reduction in SARS-CoV-2 infections, The Lancet 397 (2021) 92–93.
[8] S. Momtazmanesh, et al., All together to fight COVID-19, The American Journal of Tropical Medicine and Hygiene 102 (2020) 1181–1183.
[9] S. McDonell, Coronavirus: US and Australia close borders to Chinese arrivals, BBC, February 2, 2020.
[10] J. McCurry, Test, trace, contain: how South Korea flattened its coronavirus curve, The Guardian, U.S., April 23, 2020.
[11] A. McCann, N. Popovich, J. Wu, Italy's virus shutdown came too late. What happens now?, The New York Times, April 5, 2020.
[12] G. Scally, B. Jacobson, K. Abbasi, The UK's public health response to covid-19, BMJ (2020) m1932.
[13] M. Iati, et al., All 50 U.S. states have taken steps toward reopening in time for Memorial Day weekend, The Washington Post, May 20, 2020.
[14] R. Meyer, A. C. Madrigal, A devastating new stage of the pandemic, The Atlantic, June 25, 2020.
[15] N. James, M. Menzies, COVID-19 in the United States: Trajectories and second surge behavior, Chaos: An Interdisciplinary Journal of Nonlinear Science 30 (2020) 091102.
[16] S. Funk, A. Camacho, A. J. Kucharski, R. M. Eggo, W. J. Edmunds, Real-time forecasting of infectious disease dynamics with a stochastic semi-mechanistic model, Epidemics 22 (2018) 56–61.
[17] A. Mhlanga, Dynamical analysis and control strategies in modelling Ebola virus disease, Advances in Difference Equations 2019 (2019).
[18] S. K. Biswas, U. Ghosh, S. Sarkar, Mathematical model of Zika virus dynamics with vector control and sensitivity analysis, Infectious Disease Modelling 5 (2020) 23–41.
[19] R. E. Morrison, A. Cunha, Embedded model discrepancy: A case study of Zika modeling, Chaos: An Interdisciplinary Journal of Nonlinear Science 30 (2020) 051103.
[20] C. Manchein, E. L. Brugnago, R. M. da Silva, C. F. O. Mendes, M. W. Beims, Strong correlations between power-law growth of COVID-19 in four continents and the inefficiency of soft quarantine strategies, Chaos: An Interdisciplinary Journal of Nonlinear Science 30 (2020) 041102.
[21] J. A. T. Machado, A. M. Lopes, Rare and extreme events: the case of COVID-19 pandemic, Nonlinear Dynamics (2020).
[22] N. James, M. Menzies, Cluster-based dual evolution for multivariate time series: Analyzing COVID-19, Chaos: An Interdisciplinary Journal of Nonlinear Science 30 (2020) 061108.
[23] B. Blasius, Power-law distribution in the number of confirmed COVID-19 cases, Chaos: An Interdisciplinary Journal of Nonlinear Science 30 (2020) 093123.
[24] M. Perc, N. G. Miksić, M. Slavinec, A. Stožer, Forecasting COVID-19, Frontiers in Physics 8 (2020) 127.
[25] K. Shang, B. Yang, J. M. Moore, Q. Ji, M. Small, Growing networks with communities: A distributive link model, Chaos: An Interdisciplinary Journal of Nonlinear Science 30 (2020) 041101.
[26] A. Karaivanov, A social network model of COVID-19, PLOS ONE 15 (2020) e0240878.
[27] R. Moeckel, B. Murray, Measuring the distance between time series, Physica D: Nonlinear Phenomena 102 (1997) 187–194.
[28] G. J. Székely, M. L. Rizzo, N. K. Bakirov, Measuring and testing dependence by correlation of distances, The Annals of Statistics 35 (2007) 2769–2794.
[29] C. F. Mendes, M. W. Beims, Distance correlation detecting Lyapunov instabilities, noise-induced escape times and mixing, Physica A: Statistical Mechanics and its Applications 512 (2018) 721–730.
[30] C. F. O. Mendes, R. M. da Silva, M. W. Beims, Decay of the distance autocorrelation and Lyapunov exponents, Physical Review E 99 (2019).
[31] A. Vazquez, Polynomial growth in branching processes with diverging reproductive number, Physical Review Letters 96 (2006).
[32] J. H. Ward, Hierarchical grouping to optimize an objective function, Journal of the American Statistical Association 58 (1963) 236–244.
[33] G. J. Szekely, M. L. Rizzo, Hierarchical clustering via joint between-within distances: Extending Ward's minimum variance method, Journal of Classification 22 (2005) 151–183.
[34] A.-M. Madore, et al., Contribution of hierarchical clustering techniques to the modeling of the geographic distribution of genetic polymorphisms associated with chronic inflammatory diseases in the Québec population, Public Health Genomics 10 (2007) 218–226.
[35] M. Kretzschmar, R. T. Mikolajczyk, Contact profiles in eight European countries and implications for modelling the spread of airborne infectious diseases, PLoS ONE 4 (2009) e5931.
[36] H. Alashwal, M. E. Halaby, J. J. Crouse, A. Abdalla, A. A. Moustafa, The application of unsupervised clustering methods to Alzheimer's disease, Frontiers in Computational Neuroscience 13 (2019).
[37] H. Muradi, A. Bustamam, D. Lestari, Application of hierarchical clustering ordered partitioning and collapsing hybrid in Ebola virus phylogenetic analysis, in: 2015 International Conference on Advanced Computer Science and Information Systems (ICACSIS), IEEE, 2015, pp. 317–323.
[38] R. Rizzi, P. Mahata, L. Mathieson, P. Moscato, Hierarchical clustering using the arithmetic-harmonic cut: Complexity and experiments, PLoS ONE 5 (2010) e14067.
[39] N. James, M. Menzies, L. Azizi, J. Chan, Novel semi-metrics for multivariate change point analysis and anomaly detection, Physica D: Nonlinear Phenomena 412 (2020) 132636.
[40] D. J. Fenn, M. A. Porter, S. Williams, M. McDonald, N. F. Johnson, N. S. Jones, Temporal evolution of financial-market correlations, Physical Review E 84 (2011).
[41] W. Rudin, Functional Analysis, McGraw-Hill Science, 1991.
[42] Human development reports, http://hdr.undp.org/en/content/human-development-index-hdi, 2020. United Nations Development Programme.
[43] N. James, M. Menzies, P. Radchenko, COVID-19 second wave mortality in Europe and the United States, arXiv:2012.13197 (2020).
[44] Sweden adds further restrictions on outdoor gatherings as coronavirus cases hit record highs, ABC, November 17, 2020.
[45] S. Neuman, France announces further reopening amid declining number of coronavirus cases, NPR, June 15, 2020.
[46] A. Boadle, Brazil has record new coronavirus cases, surpasses France in deaths, Reuters, May 31, 2020.
[47] C. Kantis, S. Kiernan, J. S. Bardi, Updated: Timeline of the Coronavirus, Think Global Health, December 31, 2020.
[48] Our World in Data, https://ourworldindata.org/coronavirus-source-data, 2020. Accessed November 25, 2020.
[49] S. A. Lauer, et al., The incubation period of Coronavirus disease 2019 (COVID-19) from publicly reported confirmed cases: Estimation and application, Annals of Internal Medicine 172 (2020) 577–582. doi:10.7326/m20-0504.