Association between COVID-19 cases and international equity indices
Human and financial cost of COVID-19
Nick James
School of Mathematics and Statistics, University of Sydney, NSW, 2006, Australia
Max Menzies
Yau Mathematical Sciences Center, Tsinghua University, Beijing, 100084, China
(Dated: September 25, 2020)

This paper analyzes the human and financial costs of the COVID-19 pandemic on 92 countries. We compare country-by-country equity market dynamics to cumulative COVID-19 case and death counts and new case trajectories. First, we examine the multivariate time series of cumulative cases and deaths, particularly regarding their changing structure over time. We reveal similarities between the case and death time series, and key dates on which the structure of the time series changed. Next, we classify new case time series, demonstrate five characteristic classes of trajectories, and quantify discrepancy between them with respect to the behavior of waves of the disease. Finally, we show there is no relationship between countries’ equity market performance and their success in managing COVID-19. Each country’s equity index has been unresponsive to the domestic or global state of the pandemic. Instead, these indices have been highly uniform, with most movement in March.
I. INTRODUCTION
COVID-19 has had an immense social and economic impact on countries around the world, claiming many lives, necessitating business closures, and sending financial markets into disarray. This paper addresses the following question: on a country-by-country basis, what has had the most impact on a country’s stock market - its total cumulative cases, the growth in new daily cases, the return of second waves of the disease, or the worldwide state of the pandemic? The goal of this paper is to study the worldwide spread of COVID-19, analyze the various waves of the disease on a country-by-country basis, and show that the financial markets have been unresponsive to all developments in new or cumulative cases after March.

The pandemic has prompted a substantial amount of attention and research. Epidemiologists have analyzed the spread of COVID-19 and potential measures of containment [1–6], while clinical researchers have explored potential treatments for the disease [7–13]. In finance, many studies have observed the impact of COVID-19 on stock markets [14–16], particularly regarding financial contagion [17, 18] and stability [19]. Within the nonlinear dynamics community, a majority of papers on COVID-19 have used new and traditional techniques to analyze and predict the spread of cases and deaths [20–26]. There is an absence of research that studies financial markets in conjunction with the spread of the virus.

For this goal, we use new and existing time series analysis techniques. Existing methods of time series analysis are diverse, including power-law models [27–30], and nonparametric methods such as distance analysis [31], distance correlation [32–34] and network models [35, 36]. Time series analysis has been widely applied to both finance [37–42] and epidemiology [43, 44], including COVID-19 [21, 25, 45].

We implement two methods of clustering time series, which have been previously used in various financial [46–48] and epidemiological applications, including inflammatory diseases [49], airborne diseases [50], Alzheimer’s disease [51], Ebola [52], SARS [53], and COVID-19 [45]. The two methods we use are hierarchical clustering [54, 55] and the optimal one-dimensional implementation of K-means, Ckmeans.1d.dp [56].

In each of the following three sections, we implement time series analysis and clustering for a different goal. In Section II, we use a smoothed dynamic implementation of cluster analysis to track the worldwide spread of COVID-19, particularly the change in structure over time. In Section III, we apply semi-metrics to sets of turning points to classify countries according to the disease’s first, second or third wave behavior. In Section IV, we use a new method to analyze case trajectories and equity markets in conjunction, and show the markets are highly concurrent with each other, not with any country’s case counts. Section V summarizes our findings regarding the considerably different progression of COVID-19 and equity market trends of 2020.

II. CUMULATIVE COVID-19 CASE AND DEATH SPREAD
In this section, we use a dynamic and smoothed implementation of cluster analysis to study the worldwide spread of COVID-19, track the relationships between countries’ cumulative case and death counts, and detect changes in the structure of the two time series. Our data spans 12/31/2019 to 08/31/2020, a period of $T = 245$ days. We restrict attention to countries with more than 10 000 cumulative cases at the end of the data period, leaving $n = 92$ countries. We order these countries alphabetically and let $x_i(t), y_i(t) \in \mathbb{R}$ be the multivariate time series of cumulative daily cases and deaths, respectively, for $i = 1, ..., n$ and $t = 1, ..., T$.

A. Cluster-based methodology for multivariate time series
Following [25], this analysis proceeds in several steps, which are further explained in Appendix A. First, given the multivariate time series of cases or deaths, we generate a logarithmic distance matrix $D(t)$ between counts $x_i(t)$ at time $t$. That is, $D(t)$ is an $n \times n$ matrix with entries $D(t)_{ij} = |\log(x_i(t)) - \log(x_j(t))|$. Next, we estimate an appropriate number of clusters to partition the counts $x_1(t), ..., x_n(t)$ at each time $t$. We average over several methods from the statistical learning literature [57] to produce an estimator $k_{av}(t)$, and then apply exponential smoothing to produce a smoothed integer value $\hat{k}(t)$. Third, we use the distance matrix $D(t)$ to partition the counts into $\hat{k}(t)$ clusters at each $t$. As our data is one-dimensional, we apply the optimal implementation of K-means specific to one-dimensional data, Ckmeans.1d.dp [56].

We record the results of this day-by-day clustering in several ways. Figure 1 displays the changing cluster memberships in the form of heat maps. Figure 2a plots the smoothed number of clusters $\hat{k}(t)$ for both cases and deaths. We define two sequences of $n \times n$ matrices, the adjacency matrices and the affinity matrices, by

$$ \mathrm{Adj}(t)_{ij} = \begin{cases} 1, & x_i(t) \text{ and } x_j(t) \text{ are in the same cluster,} \\ 0, & \text{else;} \end{cases} \tag{1} $$

$$ \mathrm{Aff}(t)_{ij} = 1 - \frac{D(t)_{ij}}{\max D(t)}. \tag{2} $$

To understand the changing cluster structure of the series with time, we define a distance between these adjacency matrices. Let the $L^1$ norm of an $n \times n$ matrix $A$ be defined as $\|A\| = \sum_{i,j=1}^{n} |a_{ij}|$. Given $s, t \in [1, ..., T]$, let $d(s,t) = \|\mathrm{Adj}(t) - \mathrm{Adj}(s)\|$. This distance measures the discrepancy between the respective cluster structures on different days. We perform hierarchical clustering on $d(s,t)$ in Figure 3.

Finally, we can use the constructions so far to compare the case and death time series in conjunction. Turning first to the number of clusters, let $\hat{k}_X(t), \hat{k}_Y(t)$ be the smoothed number for cases and deaths, respectively. Noticing a similarity between these functions, we can compute the most appropriate offset between them. Given a function $f(t)$ and $\delta > 0$, we write $f_\delta(t) = f(t + \delta)$. An appropriate offset can be computed by minimizing the $L^1$ norm between functions,

$$ \|(\hat{k}_X)_\delta - \hat{k}_Y\|_{L^1} = \int |\hat{k}_X(t + \delta) - \hat{k}_Y(t)| \, dt. \tag{3} $$

Turning to the cluster structure, we can define and compute the offset that minimizes the discrepancy between the adjacency or affinity matrices $\mathrm{Aff}_X$ and $\mathrm{Aff}_Y$ of the two time series. With $\tau > 0$, we seek to minimize the normalized difference

$$ \frac{1}{T - \tau} \sum_{t=1}^{T-\tau} \|\mathrm{Aff}_X(t) - \mathrm{Aff}_Y(t + \tau)\|. \tag{4} $$

We display this normalized difference as a function of $\tau$ in Figure 2b and observe a clear minimum. This can be computed for both adjacency and affinity matrices.

FIG. 1: Heat maps track the changing cluster membership of select countries with respect to (a) cases and (b) deaths, respectively. Cluster membership depicts COVID-19 severity relative to the rest of the world. Clusters are labelled and ordered with 1 being the worst impacted at any time. Darker colors signify worse affected clusters.
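The constructions of this subsection can be illustrated with a short Python sketch. It is not the authors' implementation: the function names, the nested-list representation and the search bound `max_tau` are assumptions made for illustration, and the cluster labels passed to `adjacency_matrix` are presumed to come from a separate one-dimensional clustering step such as Ckmeans.1d.dp. The sketch builds the logarithmic distance matrix, the matrices of Equations (1) and (2), and searches for the offset $\tau$ that minimizes the normalized difference of Equation (4).

```python
import math


def log_distance_matrix(counts):
    """D_ij = |log x_i - log x_j|; empty or zero counts are replaced by 1."""
    logs = [math.log(x if x else 1) for x in counts]
    return [[abs(a - b) for b in logs] for a in logs]


def adjacency_matrix(labels):
    """Equation (1): 1 if two countries share a cluster label, else 0."""
    return [[1 if a == b else 0 for b in labels] for a in labels]


def affinity_matrix(dist):
    """Equation (2): Aff_ij = 1 - D_ij / max(D)."""
    d_max = max(max(row) for row in dist)
    if d_max == 0:  # assumption: identical counts are treated as fully affine
        return [[1.0 for _ in row] for row in dist]
    return [[1.0 - d / d_max for d in row] for row in dist]


def l1_difference(a, b):
    """L1 norm of the difference of two equally sized matrices."""
    return sum(abs(x - y)
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))


def best_offset(mats_x, mats_y, max_tau):
    """Return (tau, value) minimizing Equation (4) over tau = 1..max_tau."""
    T = len(mats_x)
    if len(mats_y) != T or not 1 <= max_tau < T:
        raise ValueError("need equal-length sequences and 1 <= max_tau < T")
    best_tau, best_value = None, float("inf")
    for tau in range(1, max_tau + 1):
        # Compare the case structure at t with the death structure at t + tau.
        total = sum(l1_difference(mats_x[t], mats_y[t + tau])
                    for t in range(T - tau))
        value = total / (T - tau)
        if value < best_value:
            best_tau, best_value = tau, value
    return best_tau, best_value
```

Here `mats_x` and `mats_y` would be the day-by-day sequences of affinity (or adjacency) matrices for cases and deaths. The same search applies to either kind of matrix, since only the $L^1$ difference between matrices is used.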
B. Results of cluster-based analysis of cases and deaths
We now summarize the results of the three figures. Figure 1a tracks the changing cluster membership of 15 select countries with respect to their case counts from February onwards, and captures the natural history of COVID-19. China was the first country to experience a severe number of cases, and was the unique country in the worst-affected cluster until late March. Then, Italy, Spain and the United States (US) join the worst-affected cluster, struggling to contain their case counts. From the beginning of April until the end of May, the US was the unique member of the worst-affected cluster, signifying how exceptionally it was impacted by COVID-19 cases relative to every other country. Brazil joins the worst-affected cluster at the start of June, and India at the start of August. By contrast, the United Kingdom (UK), Italy, Spain and Germany move to less affected clusters from the beginning of April, likely a result of strict lockdown procedures implemented in these countries.

Figure 1b tracks the cluster memberships according to deaths for the same countries. Until mid-March, China was the only member of the worst-affected cluster. From then until mid-May, the US, UK, Spain, and Italy belong to the worst-affected cluster. Subsequently, the UK, Spain and Italy leave this cluster. As with cases, the US was the unique member of the worst cluster with respect to deaths for over a month, with Brazil joining at the end of June, and India joining just before the end of August. Given the similarity in case and death cluster behaviors, anomalous countries can be identified if they belong to a significantly different case and death cluster at the same time. The most anomalous country is Singapore. On 08/31/2020, Singapore belonged to the fifth case cluster, but the ninth and least severe death cluster. Indeed, on this day, Singapore had 56771 cases and only 27 deaths, a lower death rate than any other country under consideration.

Figure 2a tracks the changing number of clusters for both cases and deaths. During February and March, the number of clusters rises substantially as the virus spreads to different countries at different rates. Subsequently, cluster numbers stabilize in April, May and June, then begin to decline as cumulative counts around the world begin to exhibit more homogeneity. Toward the end of the period, the greater number of case clusters than death clusters reflects the greater heterogeneity in death rates than cumulative cases. Singapore is the starkest example here, but this difference reveals a general trend that the time series for deaths become more spread out than the time series for cases. The minimal offset in the number of clusters is computed to be $\delta = 27$. Figure 2b displays a convex minimum of $\tau = 15$ for the offset that minimizes the discrepancy between affinity matrices pertaining to the case and death time series. With respect to adjacency matrices, this offset is $\tau = $ [...].

FIG. 2: (a) Smoothed number of clusters $\hat{k}(t)$ as a function of time for both cases and deaths. Similarity is observed up to an offset computed as $\delta = 27$. (b) Normalized difference between affinity matrices, with an optimal offset of $\tau = $ [...].

III. NEW CASE TRAJECTORIES AND WAVE BEHAVIOR
In this section, we study the trajectories of new cases. Again we restrict to the $n = 92$ countries with more than 10 000 total recorded cases as of 08/31/2020. Our goal is to algorithmically identify turning points in new case counts on a country-by-country basis and therefore determine which countries are in their first, second, or later waves of the disease. We also apply a measure of discrepancy between sets of turning points to compare this behavior between countries.

Following [58], we proceed in several steps, which are further detailed in Appendix B. Let $z_i(t) \in \mathbb{R}$ be the time series of new daily cases, with countries ordered alphabetically. First, we apply a Savitzky-Golay filter to produce a collection of smoothed time series $\hat{z}_i(t)$, $t = 1, ..., T$ and $i = 1, ..., n$. Next, we apply a two-step algorithm where we select and then refine a set of turning points. We assign to each smoothed time series non-empty sets $P_i$ and $T_i$ of local maxima (peaks) and local minima (troughs), respectively. Every sequence of turning points begins with a trough at $t = 1$, where there are zero cases, and alternates between trough and peak. These sequences determine if a given country is in a first or second wave of COVID-19. An assigned sequence of TP indicates the country is in its first wave, with case counts that have never materially decreased; a sequence of TPTP indicates a country is in its second wave. A sequence of TPT indicates a country experienced one wave followed by a period of significant decline. If this trough occurs on the last day of the period, the cases are still in decline; if it occurs before the last day, the wave has reached a local minimum and is completely over.

FIG. 3: Hierarchical clustering on the distance $d(s,t)$ between adjacency matrices $\mathrm{Adj}(t)$ at different times $t$, for (a) cases and (b) deaths. Each cluster is an unbroken interval of dates. The three boundary dates 05/30, 07/02, 08/08 for cases and 06/13, 07/21, 08/18 for deaths herald significant changes in the structure of the multivariate time series.

Finally, we measure distance between two sets of turning points using the semi-metrics between finite sets proposed in [59]. Given two non-empty finite sets $A, B$, this is defined as

$$ D(A,B) = \frac{1}{2} \left( \frac{\sum_{b \in B} d(b,A)}{|B|} + \frac{\sum_{a \in A} d(a,B)}{|A|} \right). \tag{5} $$

The semi-metric $D(A,B)$ is symmetric, non-negative, and zero if and only if $A = B$. Then, we define the $n \times n$ turning point distance matrix $D^{TP}$ by

$$ D^{TP}_{ij} = D(P_i, P_j) + D(T_i, T_j). \tag{6} $$

Our algorithmic approach classifies the 92 countries under consideration into five characteristic classes. 15 countries, including Brazil, India and Argentina, displayed in Figures 4a, 4b, 4c, respectively, are assigned the sequence TP and determined to be in their first wave of the disease. 31 countries, including China (4d), Sweden (4e) and Russia (4f), are assigned the sequence TPT, indicating these countries experienced one wave of the disease - their counts have either reached a local minimum or are still in decline. 28 countries, including Spain (4g), Italy (4h), the UK (4i) and Germany (4j), are assigned TPTP and determined to be in the midst of their second wave. 14 countries, including the United States (4k) and Singapore (4l), are assigned TPTPT, indicating an ongoing or completed decline from a second wave.
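Equations (5) and (6) can be sketched in Python as follows. This is an illustrative reading rather than the authors' code: the function names are hypothetical, turning points are represented as day indices, and $d(b, A)$ is assumed to be the minimal distance from the point $b$ to the set $A$, which the text above does not spell out. The helper `turning_point_sequence` shows how a label such as TPTP could be read off from a country's two sets.

```python
def dist_to_set(point, points):
    """Assumed meaning of d(b, A): distance from a point to its nearest element of A."""
    return min(abs(point - q) for q in points)


def semi_metric(a, b):
    """Equation (5) for two non-empty finite sets of day indices."""
    if not a or not b:
        raise ValueError("both sets must be non-empty")
    term_b = sum(dist_to_set(x, a) for x in b) / len(b)
    term_a = sum(dist_to_set(x, b) for x in a) / len(a)
    return 0.5 * (term_b + term_a)


def turning_point_distance_matrix(peaks, troughs):
    """Equation (6): D_ij = D(P_i, P_j) + D(T_i, T_j) for every pair of countries."""
    n = len(peaks)
    return [[semi_metric(peaks[i], peaks[j]) + semi_metric(troughs[i], troughs[j])
             for j in range(n)]
            for i in range(n)]


def turning_point_sequence(peaks, troughs):
    """Order all turning points in time and return a label such as 'TPTP'."""
    events = sorted([(t, "P") for t in peaks] + [(t, "T") for t in troughs])
    return "".join(label for _, label in events)
```

Because each term averages minimal distances, a set with one extra turning point that lies close to the other set's points adds little to the total, so the measure responds to the timing of turning points more than to how many there are.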
Notably, the United States was in a rapid decline in new cases as of 08/31/2020, but still a substantial number of ∼ [...] $\times$ 92 matrix $D^{TP}$. China is an outlier due to turning points that occurred much earlier than any other country, and relatively little activity in the disease after March. Excluding China, four primary clusters are revealed in this dendrogram, corresponding to differing behaviors of waves of the disease. The semi-metric in Equation (5) prioritizes low minimal distances between sets, rather than the number of elements. Thus, the four countries in their third wave are assigned to the same cluster as the second wave countries due to low minimal distances between their turning point sets.

IV. EQUITY MARKET DYNAMICS
In this section, we study the dynamics of 17 countries’ equity indices with respect to both pricing and 30-day rolling volatility. The data spans 01/01/2020 to 08/31/2020, a period of $T = $ [...]. Let $p_i(t)$ be the multivariate time series of each country’s daily closing equity prices, for $t = 1, ..., T$ and $i = 1, ..., 17$. Let $\sigma_i(t)$ be the 30-day rolling volatility, $t = 1, ..., T - 30$. For each $t$, this is defined as the standard deviation of the previous 30 days of index data, normalized by √[...]. Treating each price series as a vector $p_i \in \mathbb{R}^T$, let $\|p_i\| = \sum_{t=1}^{T} |p_i(t)|$ be its $L^1$ norm. We can define a normalized index price trajectory by $g_i = \frac{p_i}{\|p_i\|}$. Analogously, we define $\|\sigma_i\| = \sum_{t=1}^{T-30} |\sigma_i(t)|$ and the normalized volatility trajectory by $v_i = \frac{\sigma_i}{\|\sigma_i\|}$. These vectors highlight the relative changes of price or volatility within the entire period. We then define two trajectory distance matrices, $D^{P}_{ij} = \|g_i - g_j\|$ and $D^{vol}_{ij} = \|v_i - v_j\|$.

We analyze these distance matrices $D^{P}$ and $D^{vol}$, which are symmetric, real matrices with trace 0. As such, they can be diagonalized with real eigenvalues. To determine self-similarity within these indices with respect to prices and rolling volatility, we plot the absolute values of the eigenvalues $|\lambda_1| \le ... \le |\lambda_{17}|$ for each respective matrix in Figure 7. Inspecting the collective similarity of Figures 6a and 6b, we expect that a large number $K$ of the 17 countries are highly similar with respect to equity price and rolling volatility, with a small number of outliers. Indeed, one would expect many volatility trajectories to behave similarly due to structural financial market factors such as volatility clustering. We examine the eigenvalues to estimate this $K$. If there is a large collection of highly similar elements in an $n \times n$ distance matrix $D$, the matrix would have the form

$$ D = \begin{pmatrix} \ast & \ast & \cdots & \ast & r_1 \\ \ast & \ast & \cdots & \ast & r_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ \ast & \ast & \cdots & \ast & r_K \\ r_1^T & r_2^T & \cdots & r_K^T & \end{pmatrix}, $$

with the first $K$ columns $c_1, ..., c_K$ corresponding to the similar elements, where rows $r_1, ..., r_K$ are highly similar to one another and elements $\ast$ are close to zero. Such a matrix is a small deformation away from a rank $n - (K - 1)$ matrix, and so $K - 1$ of its eigenvalues are close to zero. Given a small threshold $\varepsilon$, if $|\lambda_1| \le ... \le |\lambda_{K-1}| \le \varepsilon$, then we can deduce $K$ indices are similar with respect to price or volatility. This is a concise measure of the number of indices that are similar within the collection studied. In Figure 7a, we can set $\varepsilon = $ [...] $\varepsilon = $ [...]. Since the matrices $D$ are symmetric, they can be conjugated by an orthogonal matrix to yield a diagonal matrix of eigenvalues. As a consequence, the operator norm [61] of $D$ coincides with the largest absolute eigenvalue $|\lambda_n|$. That is,

$$ \max_{x \in \mathbb{R}^n - \{0\}} \frac{\|Dx\|}{\|x\|} = \|D\|_{op} = |\lambda_n|. \tag{7} $$

We can see from Figure 7 that the operator norm for $D^{vol}$ is approximately four times that of $D^{P}$. Both trajectory distance matrices are normalized, so a direct comparison is appropriate. Similarly, when comparing $L^1$ matrix norms, $\|D^{vol}\| = $ [...] and $\|D^{P}\| = $ [...]09. That is, there is a higher degree of collective similarity among indices with respect to price trajectories than volatility. This is a surprising result, given the expected similarity in collective volatility behavior due to volatility clustering. This result may differ if we were to compare price and volatility trajectories of assets in different financial sectors.

FIG. 4: Smoothed time series and identified turning points for various countries: (a) Brazil, (b) India and (c) Argentina are in their first wave. (d) China, (e) Sweden and (f) Russia are declining from or have finished their first wave. (g) Spain, (h) Italy and (i) the UK are experiencing their second wave. (j) Germany, (k) the US and (l) Singapore are declining from their second wave.

FIG. 5: Turning point distance matrix $D^{TP}$, defined in Section III, measures distance between sets of turning points in new case trajectories. Excluding the outlier China, four primary clusters of time series are identified with the following behaviors: 15 countries in their first wave, 31 countries declining from their first wave, 14 countries declining from their second wave, and a final cluster containing 28 countries currently in their second wave and 4 in their third.

FIG. 6: Equity market dynamics for 17 countries with respect to (a) adjusted closing equity prices, normalized to begin the year at 1, and (b) rolling volatility for the prior 30 days.

Finally, we quantify the extent to which large changes in index price coincide with high case counts. For this purpose, we take all 17 equity price time series $p_i(t)$ and the corresponding countries’ new case time series $z_i(t) \in \mathbb{R}_{\ge 0}$. Large values of $z_i(t)$ mean that the disease is spreading rapidly in that country. On the other hand, large values of the absolute log returns $|R_i(t)| = \left| \log \left( \frac{p_i(t)}{p_i(t-1)} \right) \right|$ indicate significant changes in the value of the market. Since equity data is only available on weekdays, we restrict the new case time series to the weekdays to yield a time series $w_i(t)$, $t = 1, ..., T$. As new cases are lower on the weekends, this provides a good representation of the trajectory of new cases in each country.

We define a symmetric $34 \times 34$ matrix $M$ that compares the concurrence of these changes. The entries of $M$ are normalized inner products between time series to measure the extent of [...]

$$ \|R_i\| = \left( \sum_{t=1}^{T} |R_i(t)|^2 \right)^{1/2}, \tag{8} $$

$$ \|w_i\| = \left( \sum_{t=1}^{T} w_i(t)^2 \right)^{1/2}, \tag{9} $$

$$ \langle R_i, w_j \rangle_n = \frac{1}{\|R_i\| \|w_j\|} \sum_{t=1}^{T} R_i(t) w_j(t). \tag{10} $$

The pairing $\langle \cdot, \cdot \rangle_n$ is a normalized inner product that measures the concurrence of large changes in the time series more accurately than the correlation between price and new case time series. The matrix $M$ is defined as follows:

$$ M_{ij} = \begin{cases} \langle |R_i(t)|, |R_j(t)| \rangle_n & \text{if } 1 \le i, j \le 17, \\ \langle w_{i-17}(t), w_{j-17}(t) \rangle_n & \text{if } 18 \le i, j \le 34, \\ \langle |R_i(t)|, w_{j-17}(t) \rangle_n & \text{if } 1 \le i \le 17, \ 18 \le j \le 34. \end{cases} \tag{11} $$

As all the sequences $|R_i(t)|, w_j(t)$ are non-negative, all entries of $M$ are non-negative. In general, given non-negative functions $f, g$, $\langle f, g \rangle_n = 1$ if and only if $f = \alpha g$ for some $\alpha > 0$, while $\langle f, g \rangle_n = 0$ if and only if the supports of $f$ and $g$ are disjoint sets.

FIG. 7: Absolute value of eigenvalues for the trajectory distance matrices for (a) equity prices and (b) rolling volatility. Choosing $\varepsilon = $ [...]

In Figure 8 we perform hierarchical clustering on the matrix $M$ and reveal several insights. First, China is highly anomalous with respect to both case counts and its index. Indeed, China recorded a large number of cases only during January and February, with few cases since and no subsequent wave. Second, China is also relatively anomalous with respect to its index. We can see two particular periods in Figure 6a where China did not undergo similar large changes as other countries. In March, China’s index experienced a less severe drawdown than every other country; in July, China experienced a period of significant positive growth, unlike any other country.

Third, the dendrogram reveals a high level of similarity among equity indices, excluding China’s, visible in the clear subcluster in the center of the dendrogram.
These 16 equity indices form a submatrix in which the mean of all the entries is 0.78, indicating high concurrence of large price movements. Turning to the remaining indices, the dendrogram reveals more heterogeneity, yet some similarity, between the same 16 countries’ case counts. While the 16 equity indices form one prominent subcluster, the same countries’ case counts split into two subclusters. The normalized inner product produces high association between countries whose peaks in cases occurred at similar times. Indeed, the first cluster generally experienced much earlier peaks, as can be seen for Italy (4h) and Germany (4j), while the second cluster experienced large case counts much later, such as Brazil (4a), India (4b) and Argentina (4c). Even within the two subclusters, there is less similarity between case counts than there is for indices. This is reflected in the tree of Figure 8, where branches belonging to countries’ equity indices are split much lower in the tree’s structure.

Most significantly, the figure reveals that there is no concurrence at all between large changes in countries’ case counts and their equity indices. Excluding China’s index, all other equity indices have moved together closely - even China itself exhibited some similarity with other indices.

V. CONCLUSION
In this paper, we analyze the natural history of COVID-19 across the world in conjunction with the stock market activity of 17 countries. Qualitatively and quantitatively we demonstrate that market movements have been highly uniform between these 17 countries, with China as the only exception.

In Section II, we analyze the structure of the multivariate time series of COVID-19 cases and deaths. Our analysis isolates the US as the unique member of the worst-affected cluster with respect to both cumulative cases and deaths for over a month, reflecting its exceptional impact by COVID-19. Subsequently, Brazil and India join that cluster, as their counts rose rapidly. The dendrograms in Figure 3 each exhibit four contiguous intervals of dates, allowing us to observe key dates when the structure of the world’s case counts changed substantially. With respect to cases, these dates are 05/30, 07/02, and 08/08. Indeed, all these dates herald significant shifts in the status of the disease around the world. On 05/30, Russia and Brazil enter the worst-affected cluster, on the same day as the latter reported a record number of cases [62]. On 07/02, several countries that had been heavily impacted earlier, such as China, South Korea, Singapore, and the Netherlands, enter less-affected clusters. This follows from June, a period of steady decline in Europe [63]. On 08/08, both Singapore and the Netherlands move back into more severely affected clusters. Also around this time, India, Brazil, much of Africa and South America experience significantly more cases [64], while cases in Europe demonstrate a slower increase.

In Section III, we identify five characteristic behaviors of new case trajectories between countries. 24 countries exhibit their greatest counts up to smoothing on the final day of the period. 46 countries experience a second wave, with 28 experiencing a more severe second wave than the first. Singapore and Australia responded quickly to the virus [65], and South Korea was hailed for its early contact tracing success [66], yet all three of these countries experience second waves, and South Korea exhibits its greatest case counts at the end of the period. Italy and Spain were acknowledged to have imposed lockdowns too late in March [67], with case counts eventually declining in May. Nonetheless, both of these countries experience second waves, with Spain’s more severe than its first. Overall, long first waves and the return of second waves contribute to high case counts toward the end of the data period.

Despite the substantial activity in COVID-19 cases after March, the heterogeneity of subsequent waves and the number of countries with peaks in new cases, no discernible impact on financial indices was observed from March. In Section IV, we apply a new method to analyze collective equity market dynamics across 17 countries in conjunction with their new case counts. Eigenvalue analysis indicates high similarity between 16 countries’ equity prices, with China as the only outlier. We introduce an inner product pairing that demonstrates little concurrence between the profound market movements observed in March and development in COVID-19 cases.

Overall, we have chronicled the natural history of COVID-19 together with the market movements during 2020. Despite substantial heterogeneity in the new case trajectories on a country-by-country basis and frequent changes in the order and structure of most affected countries in cumulative cases, we have observed high homogeneity in the markets. All have moved together with substantial drawdown in March, followed by steady recovery, and no qualitative or quantitative relationship to any developments in COVID-19.

FIG. 8: Hierarchical clustering on the normalized inner product matrix $M$. High similarity is observed between 16 equity indices, with no relationship to case counts. China is observed as an outlier in both cases and index. Other countries’ case counts split into two subclusters according to whether large counts of new cases occur disproportionately early or later.

DATA AVAILABILITY
Daily COVID-19 case and death counts can be accessed at "Our World in Data" [68]. Financial data is included in a supplementary file.
ACKNOWLEDGMENTS
Thanks to Kerry Chen for helpful edits and comments.
Appendix A: Cluster-based evolution methodology
In this section, we provide more details and explanation for the methodology described in Section II A. Given the exponential spread of the disease, we select a logarithmic distance between counts. We replace any data entry that is empty or 0 (before any cases are detected) with a 1, so that the log of that entry is defined. Then, we define a distance on counts by $d(x,y) = |\log(x) - \log(y)|$. Effectively, this pulls back the Euclidean metric on $\mathbb{R}$ by the homeomorphism $\log : \mathbb{R}^{+} \to \mathbb{R}$ and makes the positive reals a one-dimensional normed space. This allows us to use efficient cluster methods specific to one-dimensional data.

The goal is to partition the case or death counts $x_1(t), ..., x_n(t)$ into a time-varying number of clusters at each time $t$. We wish to choose the number of clusters in a way that provides meaningful inference on how the multivariate time series evolves as a whole. A highly variable number of clusters would obscure inference on individual countries’ cluster memberships. So we combine several methods of choosing this number to reduce any bias in our estimator and then implement exponential smoothing to yield a suitably changing number of clusters with time. For one-dimensional data, it is often regarded as unsuitable to use multivariate clustering methods, as simpler alternatives exist. We use an optimal implementation of K-means clustering called Ckmeans.1d.dp [56]. This requires the choice of the number of clusters a priori.

To choose the number of clusters at each $t$, we average six methods described in [57]. These methods are as follows: Ptbiserial index [69], Silhouette score [70], KL index [71], C index [72], McClain-Rao index [73] and Dunn index [74], but other methods could be used alternatively. Let the cluster numbers computed by these methods be $k_1(t), ..., k_6(t)$, respectively. We define $k_{av}(t) = \frac{1}{6} \sum_{j=1}^{6} k_j(t)$. This value is not necessarily an integer, so we cannot cluster with it directly.
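The averaging step, and the exponential smoothing that this appendix applies to its output, can be sketched as follows. The source does not state the smoothing constant, the initialization or how the smoothed value is converted to an integer, so `alpha`, the choice of the first value as the starting level and the use of rounding are all assumptions made for illustration, as are the function names.

```python
def average_cluster_estimate(estimates):
    """k_av(t): mean of the cluster numbers proposed by the individual methods."""
    return sum(estimates) / len(estimates)


def smooth_cluster_numbers(k_av, alpha=0.3):
    """Simple exponential smoothing of k_av(t), rounded to an integer >= 1.

    alpha (assumed) weights the newest observation; the first value of
    k_av is used as the initial level (also an assumption).
    """
    smoothed = []
    level = k_av[0]
    for value in k_av:
        level = alpha * value + (1 - alpha) * level
        smoothed.append(max(1, round(level)))
    return smoothed
```

A smaller `alpha` damps day-to-day jumps in the estimated number of clusters more strongly, which is the property the appendix asks for, at the cost of reacting more slowly to genuine changes.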
The function $k_{av}(t)$ is approximately locally stationary. So we may apply exponential smoothing to produce a smoothed integer value $\hat{k}(t)$. It is with this number that we cluster. Doing so at each $t$ produces a time-varying partition of the 92 countries into clusters, and defines an adjacency matrix $\mathrm{Adj}(t)$ for every time $t$.

Appendix B: Turning point methodology
In this section, we provide more details for the identification of turning points of a new case time series $z(t)$. First, some smoothing of the counts is necessary due to irregularities in the data set, and discrepancies between different data sources. There are consistently lower counts on the weekends, and some negative counts due to retroactive adjustments. The Savitzky-Golay filter ameliorates these issues by combining polynomial smoothing with a moving average computation. This moving average eliminates all but a few small negative counts; we simply replace these negative counts with zero. This yields a smoothed time series $\hat{z}(t) \in \mathbb{R}_{\geq 0}$. Subsequently, we perform a two-step process to select and then refine a non-empty set $P$ of local maxima (peaks) and $T$ of local minima (troughs).

Following [58], we apply a two-step algorithm to the smoothed time series $\hat{z}(t)$. The first step produces an alternating sequence of troughs and peaks, beginning with a trough at $t = 1$, where there are zero cases. The second step refines this sequence according to chosen conditions and parameters. The most important conditions to identify a peak or trough, respectively, in the first step, are the following:

$$\hat{z}(t_0) = \max\{\hat{z}(t) : \max(1, t_0 - l) \leq t \leq \min(t_0 + l, T)\}, \tag{B1}$$

$$\hat{z}(t_0) = \min\{\hat{z}(t) : \max(1, t_0 - l) \leq t \leq \min(t_0 + l, T)\}, \tag{B2}$$

where $l$ is a parameter to be chosen. Following [58], we select $l = 17$, which accounts for the 14-day incubation period of the virus [75] and less testing on weekends. Defining peaks and troughs according to this definition alone has several flaws, such as the potential for two consecutive peaks.

Instead, we implement an inductive procedure to choose an alternating sequence of peaks and troughs. Suppose $t_0$ is the last determined peak. We search in the period $t > t_0$ for the first of two cases: if we find a time $t_1 > t_0$ that satisfies (B2) as well as a non-triviality condition $\hat{z}(t_1) < \hat{z}(t_0)$, we add $t_1$ to the set of troughs and proceed from there. If we find a time $t_1 > t_0$ that satisfies (B1) and $\hat{z}(t_0) \geq \hat{z}(t_1)$, we ignore this lower peak as redundant; if we find a time $t_1 > t_0$ that satisfies (B1) and $\hat{z}(t_1) > \hat{z}(t_0)$, we remove the peak $t_0$, replace it with $t_1$ and continue from $t_1$. A similar process applies from a trough at $t_0$.

At this point, the time series is assigned an alternating sequence of troughs and peaks. However, some turning points are immaterial and should be excluded. The second step is a flexible approach introduced in [58] for this purpose. In this paper, we introduce new conditions within this framework. First, let $t_m$ be the global maximum of $\hat{z}(t)$. If this is not unique, we declare $t_m$ to be the first global maximum. This time $t_m$ is always declared a peak during the first step detailed above. Given any other peak $t_1$, we compute the peak ratio $\hat{z}(t_1)/\hat{z}(t_m)$. We select a parameter $\delta$, and if $\hat{z}(t_1)/\hat{z}(t_m) < \delta$, we remove the peak $t_1$. If two consecutive troughs $t_0, t_2$ remain, we remove $t_0$ if $\hat{z}(t_0) > \hat{z}(t_2)$, and remove $t_2$ if $\hat{z}(t_0) \leq \hat{z}(t_2)$. That is, we ensure the sequence of peaks and troughs remains alternating. In our implementation, we choose δ = . . Unlike [58], we remove earlier peaks, not just subsequent peaks, according to this condition.

Finally, we use the same log-gradient function between times $t_1 < t_2$, defined as

$$\text{log-grad}(t_1, t_2) = \frac{\log \hat{z}(t_2) - \log \hat{z}(t_1)}{t_2 - t_1}. \tag{B3}$$

The numerator equals $\log\left(\hat{z}(t_2)/\hat{z}(t_1)\right)$, a "logarithmic rate of change." Unlike the standard rate of change given by $\hat{z}(t_2)/\hat{z}(t_1) - 1$, the logarithmic change takes values symmetrically in $(-\infty, \infty)$. Let $t_1, t_2$ be adjacent turning points (one a trough, one a peak). We choose a parameter $\varepsilon = 0.007$; if

$$|\text{log-grad}(t_1, t_2)| < \varepsilon, \tag{B4}$$

that is, the average logarithmic change is less than 0.7%, we remove $t_2$ from our sets of peaks and troughs. If $t_2$ is not the final turning point, we also remove $t_1$.

Appendix C: Classification of countries by wave behavior
In Table I, we classify all $n = 92$ countries into 5 different characteristic classes according to the methodology of Section III.

[1] G. Wang, Y. Zhang, J. Zhao, J. Zhang, and F. Jiang, The Lancet, 945 (2020).
[2] M. Chinazzi et al., Science, 395 (2020).
[3] Y. Liu, A. A. Gayle, A. Wilder-Smith, and J. Rocklöv, Journal of Travel Medicine (2020), 10.1093/jtm/taaa021.
[4] Y. Fang, Y. Nie, and M. Penny, Journal of Medical Virology, 645 (2020).
[5] P. Zhou et al., Nature, 270 (2020).
[6] J. Dehning, J. Zierenberg, F. P. Spitzner, M. Wibral, J. P. Neto, M. Wilczek, and V. Priesemann, Science, eabb9789 (2020).
[7] F. Jiang, L. Deng, L. Zhang, Y. Cai, C. W. Cheung, and Z. Xia, Journal of General Internal Medicine, 1545 (2020).
[8] Z. Y. Zu, M. D. Jiang, P. P. Xu, W. Chen, Q. Q. Ni, G. M. Lu, and L. J. Zhang, Radiology, 200490 (2020).
[9] G. Li and E. D. Clercq, Nature Reviews Drug Discovery, 149 (2020).
[10] L. Zhang and Y. Liu, Journal of Medical Virology, 479 (2020).
[11] M. Wang, R. Cao, L. Zhang, X. Yang, J. Liu, M. Xu, Z. Shi, Z. Hu, W. Zhong, and G. Xiao, Cell Research, 269 (2020).
[12] B. Cao et al., New England Journal of Medicine, e68 (2020).
[13] L. Corey, J. R. Mascola, A. S. Fauci, and F. S. Collins, Science, 948 (2020).
[14] D. Zhang, M. Hu, and Q. Ji, Finance Research Letters, 101528 (2020).
[15] Q. He, J. Liu, S. Wang, and J. Yu, Economic and Political Studies, 1 (2020).
[16] A. Zaremba, R. Kizys, D. Y. Aharon, and E. Demir, Finance Research Letters, 101597 (2020).
[17] M. Akhtaruzzaman, S. Boubaker, and A. Sensoy, Finance Research Letters, 101604 (2020).
[18] D. I. Okorie and B. Lin, Finance Research Letters, 101640 (2020).
[19] S. Lahmiri and S. Bekiros, Chaos, Solitons & Fractals, 109936 (2020).
[20] S. Khajanchi and K. Sarkar, Chaos: An Interdisciplinary Journal of Nonlinear Science, 071101 (2020).
[21] C. Manchein, E. L. Brugnago, R. M. da Silva, C. F. O. Mendes, and M. W. Beims, Chaos: An Interdisciplinary Journal of Nonlinear Science, 041102 (2020).
[22] M. H. D. M. Ribeiro, R. G. da Silva, V. C. Mariani, and L. dos Santos Coelho, Chaos, Solitons & Fractals, 109853 (2020).
[23] T. Chakraborty and I. Ghosh, Chaos, Solitons & Fractals, 109850 (2020).
[24] C. Anastassopoulou, L. Russo, A. Tsakris, and C. Siettos, PLOS ONE, e0230405 (2020).
[25] N. James and M. Menzies, Chaos: An Interdisciplinary Journal of Nonlinear Science, 061108 (2020).
[26] B. Blasius, Chaos: An Interdisciplinary Journal of Nonlinear Science, 093123 (2020).
[27] A. Vazquez, Physical Review Letters (2006), 10.1103/physrevlett.96.038702.
[28] P. Gopikrishnan, M. Meyer, L. Amaral, and H. Stanley, The European Physical Journal B, 139 (1998).
[29] B. Podobnik, D. Horvatic, A. M. Petersen, and H. E. Stanley, Proceedings of the National Academy of Sciences, 22079 (2009).
[30] Y. Liu, P. Gopikrishnan, Cizeau, Meyer, Peng, and H. E. Stanley, Physical Review E, 1390 (1999).
[31] R. Moeckel and B. Murray, Physica D: Nonlinear Phenomena, 187 (1997).
[32] G. J. Székely, M. L. Rizzo, and N. K. Bakirov, The Annals of Statistics, 2769 (2007).
[33] C. F. Mendes and M. W. Beims, Physica A: Statistical Mechanics and its Applications, 721 (2018).
[34] C. F. O. Mendes, R. M. da Silva, and M. W. Beims, Physical Review E (2019), 10.1103/physreve.99.062206.
[35] K. Shang, B. Yang, J. M. Moore, Q. Ji, and M. Small, Chaos: An Interdisciplinary Journal of Nonlinear Science, 041101 (2020).
[36] J.-P. Onnela, K. Kaski, and J. Kertész, The European Physical Journal B - Condensed Matter, 353 (2004).
[37] D. J. Fenn, M. A. Porter, S. Williams, M. McDonald, N. F. Johnson, and N. S. Jones, Physical Review E (2011), 10.1103/physreve.84.026109.
[38] S. Drozdz, R. Gebarowski, L. Minati, P. Oswiecimka, and M. Wactorek, Chaos: An Interdisciplinary Journal of Nonlinear Science, 071101 (2018).
[39] S. Drozdz, L. Minati, P. Oswiecimka, M. Stanuszek, and M. Wactorek, Chaos: An Interdisciplinary Journal of Nonlinear Science, 023122 (2020).
[40] Z. Eisler and J. Kertész, Physical Review E (2006), 10.1103/physreve.73.046109.
[41] D. Valenti, G. Fazio, and B. Spagnolo, Physical Review E (2018), 10.1103/physreve.97.062307.
[42] F. Wang, K. Yamasaki, S. Havlin, and H. E. Stanley, Physical Review E (2006), 10.1103/physreve.73.026117.
[43] H. W. Hethcote, SIAM Review, 599 (2000).
[44] G. Chowell, L. Sattenspiel, S. Bansal, and C. Viboud, Physics of Life Reviews, 66 (2016).
[45] J. A. T. Machado and A. M. Lopes, Nonlinear Dynamics (2020), 10.1007/s11071-020-05680-w.
[46] N. Basalto, R. Bellotti, F. D. Carlo, P. Facchi, E. Pantaleo, and S. Pascazio, Physica A: Statistical Mechanics and its Applications, 635 (2007).
[47] N. Basalto, R. Bellotti, F. D. Carlo, P. Facchi, E. Pantaleo, and S. Pascazio, Physical Review E (2008), 10.1103/physreve.78.046112.
[48] R. Mantegna, The European Physical Journal B, 193 (1999).
[49] A.-M. Madore et al., Public Health Genomics, 218 (2007).
[50] M. Kretzschmar and R. T. Mikolajczyk, PLoS ONE, e5931 (2009).
[51] H. Alashwal, M. E. Halaby, J. J. Crouse, A. Abdalla, and A. A. Moustafa, Frontiers in Computational Neuroscience (2019), 10.3389/fncom.2019.00031.
[52] H. Muradi, A. Bustamam, and D. Lestari, in (IEEE, 2015).
[53] R. Rizzi, P. Mahata, L. Mathieson, and P. Moscato, PLoS ONE, e14067 (2010).
[54] J. H. Ward, Journal of the American Statistical Association, 236 (1963).
[55] G. J. Szekely and M. L. Rizzo, Journal of Classification, 151 (2005).
[56] H. Wang and M. Song, The R Journal, 29 (2011).
[57] P. Radchenko and G. Mukherjee, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 1527 (2017).
[58] N. James and M. Menzies, Chaos: An Interdisciplinary Journal of Nonlinear Science, 091102 (2020).
[59] N. James, M. Menzies, L. Azizi, and J. Chan, Physica D: Nonlinear Phenomena, 132636 (2020).
[60] "GDP (current US$)," https://data.worldbank.org/indicator/NY.GDP.MKTP.CD?year_high_desc=true (2020), The World Bank, September 21, 2020.
[61] W. Rudin, Functional Analysis (McGraw-Hill Science, 1991).
[62] A. Boadle, "Brazil has record new coronavirus cases, surpasses France in deaths," (2020), Reuters, May 31, 2020.
[63] S. Neuman, "France announces further re-opening amid declining number of coronavirus cases," (2020), NPR, June 15, 2020.
[64] S. Neuman, "Global coronavirus cases hit 20 million as pandemic accelerates," (2020), Sydney Morning Herald, August 11, 2020.
[65] S. McDonell, "Coronavirus: US and Australia close borders to Chinese arrivals," (2020), BBC, Accessed February 1, 2020.
[66] J. McCurry, "Test, trace, contain: how South Korea flattened its coronavirus curve," (2020), The Guardian, April 23, 2020.
[67] A. McCann, N. Popovich, and J. Wu, "Italy's virus shutdown came too late. What happens now?" (2020), The New York Times, April 5, 2020.
[68] "Our World in Data," https://ourworldindata.org/coronavirus-source-data (2020), accessed September 6, 2020.
[69] G. W. Milligan, Psychometrika, 325 (1980).
[70] P. J. Rousseeuw, Journal of Computational and Applied Mathematics, 53 (1987).
[71] W. J. Krzanowski and Y. T. Lai, Biometrics, 23 (1988).
[72] L. J. Hubert and J. R. Levin, Psychological Bulletin, 1072 (1976).
[73] J. O. McClain and V. R. Rao, Journal of Marketing Research, 456 (1975).
[74] J. C. Dunn, Journal of Cybernetics, 95 (1974).
[75] S. A. Lauer et al., Annals of Internal Medicine 172