[PDF] A spectral clustering approach for the evolution of the COVID-19 pandemic in the state of Rio Grande do Sul, Brazil

Abstract

The aim of this paper is to analyse the evolution of the COVID-19 pandemic in Rio Grande do Sul by applying graph-theoretical tools, particularly spectral clustering techniques, on weighted graphs defined on the set of 167 municipalities in the state with population 10,000 or more, which are based on data provided by government agencies and other sources. To respond to this outbreak, the state has adopted a system by which pre-determined regions are assigned flags on a weekly basis, and different measures go into effect according to the flag assigned. Our results suggest that considering a flexible approach to the regions themselves might be a useful additional tool to give more leeway to cities with lower incidence rates, while keeping the focus on public safety. Moreover, simulations show the dampening effect of isolation on the dissemination of the disease.

Full PDF

AA SPECTRAL CLUSTERING APPROACH FOR THE EVOLUTION OFTHE COVID-19 PANDEMIC IN THE STATE OF RIO GRANDE DO SUL,BRAZIL

LUIZ EMILIO ALLEM, CARLOS HOPPEN, MATHEUS MICADEI MARZO,AND LUCAS SIVIERO SIBEMBERG

Abstract.

The aim of this paper is to analyse the evolution of the COVID-19 pandemicin Rio Grande do Sul by applying graph-theoretical tools, particularly spectral clusteringtechniques, on weighted graphs deﬁned on the set of 167 municipalities in the state withpopulation 10,000 or more, which are based on data provided by government agencies andother sources. To respond to this outbreak, the state has adopted a system by which pre-determined regions are assigned ﬂags on a weekly basis, and diﬀerent measures go into eﬀectaccording to the ﬂag assigned. Our results suggest that considering a ﬂexible approach tothe regions themselves might be a useful additional tool to give more leeway to cities withlower incidence rates, while keeping the focus on public safety. Moreover, simulations showthe dampening eﬀect of isolation on the dissemination of the disease. Introduction

The aim of this paper is to employ graph-theoretical tools to understand the disseminationof COVID-19 in the Brazilian state of Rio Grande do Sul. These tools may be useful sourcesof additional information for decision making by health and government authorities.The year 2020 has been marked by the global outbreak and spread of the virus SARS-CoV-2, which causes the coronavirus disease (COVID-19) in humans [1]. In December 2019,several patients with an unknown severe respiratory disease were traced back to a wholesalemarket in Wuhan, China. Researchers were quick to detect and isolate a novel strain ofcoronavirus [2]. It was soon discovered that the virus is highly contagious, and that it can betransmitted by infected individuals before they show the ﬁrst symptoms and even by infectedindividuals that remain asymptomatic throughout the course of the disease [3]. This has ledto unprecedented public health measures by the Chinese authorities. A lockdown of Wuhanand 15 other cities in Hubei Province took eﬀect on January 23 [4]. On January 30, the WorldHealth Organization declared COVID-19 a public health emergency of international concern[1]. In the next month, a large number countries implemented measures aiming to preventa global pandemic, ranging from travel restrictions, contact tracing and social isolation toborder closures and lockdowns [1]. These actions turned out to be unsuccessful in eradicatingthe disease, and the WHO characterised the outbreak as a pandemic on March 11. Two dayslater, it assessed that Europe had become the epicentre of the pandemic [1]. The virus thenquickly reached Brazil.The ﬁrst conﬁrmed case of COVID-19 in Brazil dates back to February 26, in the stateof S˜ao Paulo, and the Brazilian Health Ministry declared a state of nationwide communitytransmission on March 20 [5]. At that point, the number of conﬁrmed cases in the state ofRio Grande do Sul was 37 [6], and the state government had already instated measures aimed

C. Hoppen acknowledges the support of CNPq 308054/2018-0 and FAPERGS 19/2551-0001727-8. M. Marzoand L. Sibemberg are supported by CAPES. CAPES is Coordena¸c˜ao de Aperfei¸coamento de Pessoal de N´ıvelSuperior, CNPq is Conselho Nacional de Desenvolvimento Cient´ıﬁco e Tecnol´ogico, and FAPERGS is Funda¸c˜aode Amparo `a Pesquisa do Estado do Rio Grande do Sul. a r X i v : . [ c s . S I] A ug L. E. ALLEM, C. HOPPEN, M. M. MARZO, AND L. S. SIBEMBERG at slowing down the spread of the virus, including school closures and a ban on commercialinterstate travel [7]. In the next month, a large number of restrictions were imposed onactivities that were deemed inessential. By the end of April, there were 1466 cases and 51deaths oﬃcially attributed to COVID-19 in the state of Rio Grande do Sul [6]. At this point,and as of this writing, there was no vaccine or proven eﬀective treatment for patients withsevere cases of COVID-19 [1]. Recognizing the seriousness of the health crisis and the socialand economic impact of widespread isolation, the state government unveiled a regulatorymodel for controlled distancing [8] , which went into eﬀect on May 11, and we now brieﬂydescribe.The state has been divided into 20 (pre-determined) regions based on the availability of ICUbeds for COVID-19 patients. Every week, each region is assigned one of four possible ﬂags ,yellow (low risk), orange (medium risk), red (high risk) or black (very high risk), accordingto a numerical value based on several indices that measure the spread of the disease and theavailability of ICU beds. Each ﬂag entails diﬀerent social distancing measures and imposesdiﬀerent constraints on businesses (or even their mandatory closure). This regulation haslegal precedence over more ﬂexible measures determined by local authorities or by the federalgovernment [9]. Due to its eﬀect on daily lives and on the economic activity, this model hasbeen in the spotlight, and it has mustered praise, but also faced criticism. We should mentionthat, after the adoption of this system in Rio Grande do Sul, other states have followed suitand devised similar models (for instance, Acre, Mato Grosso, Mato Grosso do Sul, Par´a, Riode Janeiro, and So Paulo).The aim of this paper is to analyse the evolution of the COVID-19 pandemic in Rio Grandedo Sul by applying graph-theoretical tools on data provided by government agencies and othersources. Given the distancing model described above, we believe that clustering techniquesare particularly well-suited for this analysis. Spectral clustering techniques are widely used inexploratory data analysis, but we are not aware of applications in connection with epidemio-logical models.The general idea of clustering is to partition a (typically large) data set into (a much smallernumber of) clusters in a way that data in a same cluster are similar , and data in diﬀerentclusters are dissimilar . Formal measures of aﬃnity and of the quality of a given partition relyheavily on the context of the problem being considered. In this paper, we address clusteringfrom a graph-theoretical perspective. We consider weighted graphs G = ( V, E, ω ), where the vertex set V is the data set, the edge set E contains edges connecting elements of V and thefunction ω : E → R > assigns a positive weight w ij to each edge ij ∈ E .Here, we consider two types of aﬃnity measures. The ﬁrst type is based on pendulummigration between cities, by which we mean the daily ﬂow of commuters for work or education,to which we incorporate data about self-isolation. According to our results, isolation did nothave a strong eﬀect on the way cities are clustered together, suggesting that general appeals forisolation by state and federal authorities, and by the media, have a much stronger impact thanappeals by local authorities. The second type of aﬃnity measures is based on the availabilityof ICU beds. In this case, incorporating weekly data, and considering a more ﬂexible approachto the regions, by which new clusters are determined on a weekly basis, seems to generateuseful complementary information for the ﬂag system used in Rio Grande do Sul.We also used a discrete SEIR compartmental model to simulate the spread of the diseaseand the eﬀect of the social distancing measures that have been implemented, based on themigration and isolation data used for clustering. In contrast to clustering techniques, models ofthis type are a basic tool in the epidemiological toolbox, both in their discrete and continuousversions, and there is a vast literature related with them, see [10, 11] and the references therein.Our contribution in this respect was to show that the data for pendulum migration and https://distanciamentocontrolado.rs.gov.br/ PECTRAL CLUSTERING APPROACH TO COVID-19 3 isolation, combined with the available disease information, have been successful in explainingthe evolution of the disease in the state. Extrapolating from this, we conclude that isolationmeasures have been very important in slowing down the spread of the disease (often referredto as ﬂattening the curve ).The remainder of the paper is organized as follows. In Section 2, we describe the dataused in this paper. Section 3 is concerned with spectral clustering and its mathematicalfoundations. The aﬃnity measures mentioned above are discussed in that section, and wealso analyse the partitions that have been obtained by spectral methods. The SEIR model isintroduced and analysed in Section 4. We ﬁnish the paper with concluding remarks.2.

Data

In this section, we describe the data used in our study. The actual matrices are availableas an appendix, included as an ancillary ﬁle. We consider the 167 municipalities in the stateof Rio Grande do Sul whose estimated population in 2019 is above 10.000 according to theBrazilian Institute of Geography and Statistics (IBGE) . Hereafter they will be referred toas cities . The distance between cities is given by a square matrix D = ( d ij ) of order n = 167,where d ij denotes the average road distance from the seat of i to the seat of j and vice-versa,as calculated by the web mapping service Google Maps.Using data from the population census of 2010, which is the most recent census performedin Brazil, we deﬁne square matrices T = ( t ij ) and E = ( e ij ) of order n , where t ij is the numberof daily commuters who reside in i and work in j and e ij is the number of commuters whoreside in i and go to school in j . These matrices have been obtained by extracting anonymizedcensus microdata related to long-form questionnaires, which are publicly available , and byextrapolating them to the entire city population (adjusted to the 2019 values) using the surveyweights that are part of the census microdata. To extract the data from this large dataset,we used a commercial statistical software.We also considered data directly related to the spread of the disease, and to the responseto it, which has been extracted directly from the state health authorities [6], and indirectlythrough UFRGS websites .In our approach, the time t is measured in weeks, where our weeks correspond to the state’s epidemiological weeks , which go from Saturday to Friday. Regarding epidemiological data, weconsider N = 17 weeks starting at the week of March 7-13, when the ﬁrst cases of COVID-19 were oﬃcially conﬁrmed in the state, until July 3. We note that most pandemic relateddata is actually released on a daily basis, but contains ﬂuctuations that may be attributed toadministrative procedures. For instance, the number of reported cases and deaths regularlygoes down on weekends and holidays, and surges in the ﬁrst business days thereafter, whichsuggests that it does not reﬂect the actual behavior of the disease. Regarding cases anddeaths, the weekly data that we collect is simply the overall number of reported cases ina week. Regarding self-isolation and ICU beds occupancy rates, we take the average overthe time period. We should point out that the number of ICU beds in the state expandedconsiderably during the weeks considered, so that the number of total ICU beds in each city isalso tracked on a weekly basis. Finally, we point out that these data are only used for N = 8weeks, starting at the week between May 2 and 8 (when the model for controlled distancingwas unveiled). https://mhbarbian.shinyapps.io/covid19_rs and L. E. ALLEM, C. HOPPEN, M. M. MARZO, AND L. S. SIBEMBERG

The information about self-isolation in each city i ∈ [ n ] = { , . . . , n } is given by values β i ( t ) ∈ [0 ,

1] for all t ∈ { , , . . . , N } . This is an index developed by In Loco , a technologyﬁrm with oﬃces in Brazil and in the United States, calculated from granular anonymizedgeolocation data from more than 60 million mobile devices across Brazil. It is deﬁned as theproportion of devices in a city i that stayed within a radius of 450 meters from their habitualhome during day t [12, 13]. 3. Clustering

Consider a set of points M = { p , . . . , p n } such that a weight w ij ≥ p i and p j , where i (cid:54) = j and i, j ∈ [ n ]. The aim of data clustering is to partition thisset of points into classes such that elements of the same class are more alike, while elements ofdiﬀerent classes are less alike. The weight w ij measures aﬃnity or similarity in this context ;the larger the value, the larger their aﬃnity. For general data sets, a large number of similaritymeasures appear in the literature, and their quality depends on the context in which they areused [14].Here, points are cities and weights are used to measure whether cities are highly intercon-nected or not. Several such measures will be considered here. For instance, a simple way tomeasure interconnection between cities is by simply considering the number of people whocommute between them. This leads to the following matrix, where the weight α ij betweencities i and j is deﬁned through the matrices T and E deﬁned in Section 2: A = ( α ij ) , where α ij = t ij + t ji + e ij + e ji . (1)This choice is justiﬁed because, in the context of aﬃnity measures, it is natural to considersymmetric weights.In order to understand how the interconnection between cities was aﬀected during the pan-demic, we also considered weights given by matrices A ( t ), for t ∈ { , . . . , N } . To incorporateself-isolation data, we ﬁrst adjust the rate of self-isolation in each city i in terms of the averageisolation β i , which was calculated using the same cell-phone data for i in the entire month ofFebruary, before the implementation of measures to contain the dissemination of COVID-19.We deﬁne β ∗ i ( t ) = max (cid:26) β i ( t ) − β i − β i , (cid:27) , (2)so that β ∗ i ( t ) = 0 if rates of self-isolation are below average (this actually does not happen inour data set after the ﬁrst week); otherwise, it is a linear interpolation where 0 correspondsto the average rate and 1 to full isolation. We are now ready to deﬁne A ( t ) = ( a ij ) , where a ij = (cid:0) − β ∗ j ( t ) (cid:1) ( t ij + e ij ) + (1 − β ∗ i ( t )) ( t ji + e ji ) . (3)The deﬁnition of A ( t ) reﬂects our belief that it is conceptually more relevant to considerinformation about isolation in city j to assess the impact on commuting from i to j thaninformation about isolation in city i . On the other hand, we understand that the nature ofour isolation index, which estimates the number of individuals who never leave their home,could suggest using indices in city i to limit commutes from i to j . This has been testedand would have negligible impact on the results. Moreover, it would have been natural toignore data related to student mobility as of the third week because all in-person school anduniversity operations had already been suspended by then. However, this turned out to makeclustering more unstable, perhaps because entries associated with smaller or more remotecities became too small. We use the word aﬃnity because, in some of our examples, cities that are more diﬀerent in some aspectswill have more aﬃnity to each other.

PECTRAL CLUSTERING APPROACH TO COVID-19 5

Normalized cut.

Before introducing the other aﬃnity measures used in this paper, weﬁrst describe the framework of our analysis. We think of the data points as vertices in a graph G = ( V, E ), where we use V = [ n ] for simplicity. The weight between p i and p j is viewed asa weight ω ( ij ) = w ij associated with the edge ij of G (if w ij = 0, we assume that vertices i and j are not adjacent in G ).In general terms, a clustering problem in G = ( V, E ) consists of ﬁnding a partition V = V ∪ · · · ∪ V k of the vertex set into a pre-determined number k of classes, where the partitionoptimizes some measure of quality of the partition. There are several such measures proposedin the literature [14]. In this paper, we work with the the normalized cut introduced by Shi andMalik in [15]. To deﬁne it, some additional notation is needed. Given U ⊂ V , let U = V \ U bethe complement of U with respect to V . Moreover, for S, T ⊂ V , let W ( S, T ) = (cid:80) i ∈ S,j ∈ T w ij .For a partition P = { V , . . . , V k } of V , letNCut( P ) = k (cid:88) (cid:96) =1 Cut( V (cid:96) , V (cid:96) )Vol( V (cid:96) ) , (4)where Cut( P ) = 12 k (cid:88) (cid:96) =1 W ( V (cid:96) , V (cid:96) ) and Vol( V (cid:96) ) = (cid:88) i ∈ V (cid:96) (cid:88) j ∈ V w ij . Finding an optimal partition in this context is to ﬁnd a partition P of V that minimizesthe value of NCut( P ). Note that this objective function takes both aims of clustering intoaccount. On the one hand, the only weights that appear on numerators of terms in (4) areweights of edges whose endpoints lie in distinct classes, so that minimizing the function favorspartitions such that vertices in diﬀerent classes have small weight. On the other hand, thedenominator of the term associated with V i in (4) counts the weight of each edge with bothendpoints in V i twice, while the other edges incident with V i are only counted once. So,increasing the weight of internal edges would decrease the value of the cut. Unfortunately,the authors of [15] showed that the problem of ﬁnding such a partition is NP-hard for generalgraphs (even if k = 2).However, this problem is well-suited for a spectral approach. The following deﬁnitions arewell known in spectral graph theory. The weighted adjacency matrix W = ( w ij ) of a graph G = ( V, E ) with weight function ω is deﬁned by w ij = ω ( ij ) if ij ∈ E and w ij = 0 otherwise.The degree of a vertex i ∈ V in G is given by d i = (cid:80) nj =1 w ij . The diagonal matrix with thedegrees d , . . . , d n on the diagonal is called the degree matrix D .At this point, we could simply present the procedure that we use to cluster our data;however, we believe that explaining how it works, and its connection to linear algebra, clariﬁesour approach. The following computation are performed in detail in [14]. Given a positiveintegers n and k and a partition P = { V , . . . , V k } of the vertex set of a graph G = ( V, E ) withweight function ω and no isolated vertices, consider the matrix X P ∈ R n × k whose columnsare the k vectors x ( (cid:96) ) = ( x (cid:96) ) , x (cid:96) ) , . . . , x n ( (cid:96) ) ) T with coordinates x ( (cid:96) ) j = (cid:40) Vol ( V (cid:96) ) if j ∈ V (cid:96) ;0 otherwise,for all (cid:96) ∈ { , . . . , k } and j ∈ { , . . . , n } . Using the Laplacian matrix L = D − W associatedwith the weighted graph G , it turns out thatNCut( P ) = k (cid:88) (cid:96) =1 Cut( V (cid:96) , V (cid:96) )Vol( V (cid:96) ) = k (cid:88) (cid:96) =1 x ( (cid:96) ) T L x ( (cid:96) ) = tr( X T P LX P ) . L. E. ALLEM, C. HOPPEN, M. M. MARZO, AND L. S. SIBEMBERG

Writing Y P = D − X P we obtain thatNCut( P ) = tr( Y T P ( D − LD − ) Y P ) = tr( Y T P L Y P ) , where L = D − LD − is the normalized Laplacian matrix associated with G . Thereforeﬁnding an optimal partition in the sense of [15] is equivalent to ﬁnding a partition P thatminimizes ncut k ( G ) = min Q NCut( Q ) = min Q tr( Y T Q L Y Q ) , where Q ranges over all partitions of V into exactly k sets. It is easy to see that Y T Q Y Q = I ,and by the Rayleigh-Ritz Theorem [16, Theorem 13], we havemin Y ∈ R n × k ,Y T Y = I tr( Y T L Y ) = λ + · · · + λ k , (5)where 0 = λ ≤ · · · ≤ λ k are the k smallest eigenvalues of the symmetric matrix L . Moreover,equality is achieved by matrices Y whose columns are orthogonal unit vectors generated byeigenvectors associated with the eigenvalues λ , . . . , λ k . As we have discussed, each partitionof V into k parts is associated with a matrix Y as above. However, there are matrices Y thatare feasible for (5), but are not of the form Y Q for any partition Q . This leads to the thefollowing inequality: ncut relk ( G ) = min Y ∈ R n × k ,Y T Y = I tr( Y T L Y ) ≤ ncut k ( G ) . (6)As in usual LP-relaxations, the left-hand side of the inequality (6) may be computed eﬃcientlyand gives a lower bound on the value of an optimal partition. On the other hand, there isno obvious connection between a matrix Y that achieves ncut relk ( G ) in (6) (i.e. a matrixconstructed from eigenvectors associated with the smallest eigenvalues of L ) and a partitioninto k parts P such that NCut( P ) is close to ncut k ( G ). The following heuristic tries toﬁnd good quality partitions. To turn the matrix Y into a partition P , it uses a well-knowngeometric method, known as K -means [17]. One way of assessing the quality of the outputpartition P is by looking at the ratio NCut( P ) / ncut relk ( G ) ≥

1. If this ratio is exactly 1, thepartition P is optimal. Otherwise, it gives an upper bound on the actual value of the ratio ρ ( P ) = NCut( P ) / ncut k ( G ) (however, we should mention that the gap between ncut relk ( G )and ncut k ( G ) may be very large in general). It is important to mention that this heuristichas been quite successful in practice, we refer to [18, 19, 20] for more explanation aboutthese empirical ﬁndings. Moreover, deﬁning the best choice for the number of clusters k isan important problem with no deﬁnitive solution. Parameters that are often used to indicatea good choice of k are the spectral gap (this is the ratio between consecutive eigenvalues,small ratios followed by a larger jump λ k +1 /λ k indicate that k is a good choice) and thecloseness to 0 ( k is the number of eigenvalues below a certain threshold), and the stability ofthe clusters obtained in repeated iterations of the procedure, but other criteria also appear inthe literature [14].We now state the heuristic of Shi and Malik [15], iterated S times. Given an aﬃnity matrix W associated with an n -vertex graph G = ( V, E ), do the following:(1) Let D to be the degree matrix associated with W and construct its normalized Lapla-cian matrix L = D − / LD − / , where L = D − W .(2) Compute vectors x , x , . . . , x k ∈ R n , where each x i is a unit eigenvector associatedwith the eigenvalue λ i , where λ , . . . , λ k are the k smallest eigenvectors of L (countingmultiplicity). In the case of repeated eigenvalues, the eigenvectors associated withthem must be orthogonal. Form the matrix X = [ x x . . . x k ] ∈ R n × k by stackingthese eigenvectors in columns. PECTRAL CLUSTERING APPROACH TO COVID-19 7

Figure 1.

Clustering obtained by spectral clustering with respect to measure A for k = 10 clusters. The largest city in each cluster is marked with a largercircle.(3) Form the matrix Y = ( y ij ) from X = ( x ij ) by renormalizing each of the rows to haveunit length (i.e. y ij = x ij / (cid:80) nj =1 x ij ).(4) for s = 1 , . . . , S do (let Q denote the best partition obtained up to a given step, wherethe starting partition is arbitrary.)(4.1) Treating the i th row of Y as a point y i ∈ R k , split { y , . . . , y n } into k clusters S , . . . , S k via K -means.(4.2) Let P be the partition such that vertex i is assigned to cluster (cid:96) if and only if y i lies in S (cid:96) .(4.3) If NCut( P ) < NCut( Q ), redeﬁne Q as P .(5) Return Q , the partition with minimum Ncut obtained in step (4).When we compute the eigenvalues of the matrix L associated with the aﬃnity measure A deﬁned in (1), we ﬁnd determine that there is a considerable eigenvalue gap between λ and λ , which suggests that k = 10 is a good choice for the number of clusters. When we applythe above procedure to the aﬃnity measure A for S = 500, we obtain the partition given inFigure 1, whose gap is NCut( P ) / ncut relk ( G ) ≈ . P is at most 32 . k ( G ), but the gap is typically much smaller (and may possiblybe optimal). Regarding stability, this partition P has been obtained 183 times out of the 500iterations of the procedure.Even though the data used to obtain this partition is not related to the pandemic, if welook at the evolution of the number of cases during this time period, the connection betweenthe clusters and the spread of the disease is perceptible. For instance, Figure 2 shows how realdata about the disease evolved in cities of two neighboring clusters, marked with black and redin the ﬁgure, in four diﬀerent weeks (material for all regions is available in the ancillary ﬁles).The red cluster consists of four cities: Gramado, Canela, Nova Petr´opolis and S˜ao Franciscode Paula (which are part of a nationally renowned touristic area) and the other cluster iscentered in Caxias do Sul, the second largest city in the state by population. The ﬁrst casesappear in the cluster of Caxias do Sul quite early, and they quickly spread to cities in thesame cluster, which has a relatively large number of active cases by May 2, the ﬁrst weekdisplayed in the ﬁgure (and the ninth week with cases in the state). On the other hand, thereare no recorded cases in the cluster of Gramado until the week of May 9. After the ﬁrst case isidentiﬁed, all the other cities in the cluster record cases in a span of three weeks. This behavior L. E. ALLEM, C. HOPPEN, M. M. MARZO, AND L. S. SIBEMBERG

Figure 2.

Clockwise, starting from the top left. Cases on the weeks fromMay 2-8, May 9-15, May 16-22 and from May 30 to June 5. Gray stands forno active cases, green for cases in the interval [1 , , A , one could adjust themeasure to incorporate rates of isolation, using a diﬀerent measure A ( t ) (deﬁned in (3)) ateach time t . As it turns out, the diﬀerence in the partition obtained when performing theabove clustering procedure for A ( t ) instead of A is minor. Indeed, the Hamming distanceat any time t between the two partitions was at most 1 (out of 167). This may indicatethat public response to self-isolation has been rather uniform throughout the state. Calls forself-isolation by state and federal authorities, and by the media, seem to have a much strongerimpact than appeals by local authorities.3.2. Aﬃnity based on available ICU beds.

As mentioned in the introduction, the stategovernment introduced regulation to deﬁne when mandatory protocols of social distancingmust be put into eﬀect. Every Saturday, each region, out of a pre-determined set of 20regions (which in turn are sorted into seven macroregions ), is assigned one of four possibleﬂags, yellow, orange, red or black, according to a numerical value based on several indices,which take the number of cases, the number of hospitalizations, the number of deaths andthe availability of ICU beds into account. Once a ﬂag has been assigned, cities in the regionmust adapt to the state regulations associated with that ﬂag (local governments may enforcestricter rules, if desired).Even though this method was met by a very positive reception from health and localauthorities, its implementation quickly led to complaints by cities and economic agents whodeem to have been treated unfairly. For instance, in the ﬁrst weeks using this method, it waspointed out that several cities where no cases had ever been recorded had been assigned orangeor red ﬂags (owing to an outbreak or a shortage of ICU beds in their region, for instance).

PECTRAL CLUSTERING APPROACH TO COVID-19 9

Moreover, since the index for a region incorporates data from the macroregion to which itbelongs, a high risk ﬂag can be assigned to a region in which no city had a substantial numberof cases. In some instances, this has led to loud public outcry and threats of disobedience bylocal authorities, which in turn led to negotiations and adjustments. At the present moment,regulations include automatic ‘ﬂag reductions’ for cities that meet certain criteria. This isthe case for cities where no new cases have been recorded in the past two weeks, for instance.Moreover, each city can appeal to a board after its weekly classiﬁcation has been revealed.When this happens, the city is allowed to present new data, such as an expansion on the totalnumber of ICU beds.Given this reality, we aim to look at the partition into regions under a more ﬂexible per-spective. To this end, we propose aﬃnity measures that consider the availability of ICU beds(updating it weekly) and consider what happens when we re-organize the regions on a weeklybasis. For a city i , let u i ( t ) be the average total number of ICU beds in i at time t , and let (cid:96) i ( t ) be the average number of ICU beds that are available (i.e. unoccupied and ready toaccommodate new patients) in i at time t . The ﬁrst measure is ‘static’, as it only considersthe total number of ICU beds at the beginning of the recording process: C = ( γ ij ) , where γ ij = | u i (0) − u j (0) | d ij + c , (7)where u i (0) denotes the total number of ICU beds in city i on May 2 and d ij is the distancebetween i and j given by matrix D (see Section 2) and the constant c = 10 avoids the eﬀectof very small distances.The intuition behind this deﬁnition is that the health systems of two cities i and j thatare geographically close, but whose health infrastructure is very diﬀerent, would tend to beinterconnected (with the city with small health capability transferring patients to the other),while two cities whose health capacities are equivalent would be less dependent on each other.The second measure is ‘dynamic’, not only updating the number of ICU beds, but alsoconsidering the actual number of ICU beds that are ready to accommodate new patients: C ( t ) = ( c ij ( t )) , where c ij = max (cid:26) η i ( t ) (cid:96) j ( t ) − (cid:96) i ( t ) d ij + c , η j ( t ) (cid:96) i ( t ) − (cid:96) j ( t ) d ij + c (cid:27) , (8)where c = 10 and η i ( t ) = u i ( t ) − (cid:96) i ( t )+1 u i ( t )+1 . This quantity η i ( t ) may be viewed as a rate of urgencyfor city i to look for ICU beds outside its borders. This rate is 1 if it does not have any ICUbeds or if all its ICU beds are occupied, and decreases as the percentage of available bedsgets larger. The term (cid:96) i ( t ) − (cid:96) j ( t ) accounts for the fact that a city j with more available ICUbeds than i would be desirable to receive patients rom i . In other words, the aﬃnity measureof interconnection between i and j goes up from the perspective of i if its health system isstrained and j is geographically close and has more available beds.Applying the above spectral partitioning procedure with the aﬃnity measure deﬁned in (8)for k = 20 (the number chosen by the state) and S = 500 produces the partitions in Figure 3in two consecutive weeks. In this particular case, 26 cities switched regions from one week tothe next.Our aim using this measure is to assess whether allowing the regions to be re-organizedon a weekly basis can bring meaningful additional information to one of the features ofthe state ﬂag system, namely that the state consists of 20 pre-determined regions, whichare in turn combined into seven macroregions. To this end, we shall ﬁrst give a generaldescription of the way in which the state assigns ﬂags to regions, (available at https://distanciamentocontrolado.rs.gov.br/ , a spreadsheet is also available in the ancillaryﬁles). The ﬂag is based on 11 individual indices, classiﬁed in two main types, disease propa-gation or healthcare capacity, and computed in one of three levels (within each region, within Figure 3.

Partitions obtained using the aﬃnity measure (8) using data fromthe weeks from June 13-19 (left) and June 20-26 (right).each macroregion or statewide). For each index, four intervals have been deﬁned, and a ﬂagis assigned to the index according to the interval it belongs to. The ﬂag actually assigned tothe region is obtained from a weighted average of the ﬂags assigned to the diﬀerent indices.Here, we have devised an alternative formula (available in the spreadsheet in the ancillaryﬁles), which uses exactly the same indices wherever possible. An important diﬀerence is thatwe do not use any indices related with macroregions, as it would not make sense to assigna city to a new region every week, while at the same time assume that cities lie in a ﬁxedmacroregion. Unfortunately, some of the data available for macroregions was not publiclyavailable, or was less reliable, for the cities themselves. Because of this, we transferred theweight of these indices to other indices measuring similar features for cities. To assess whatthe dynamic clustering obtained using our matrices might say about the clustering deﬁnedby the state, we proceed in two steps. The ﬁrst compares the ﬂags assigned to the 20 pre-determined regions using the state’s formula and this new formula. Figure 4 does this for theweeks from June 13-19 and June 20-26. (A comparison for all seven weeks under considerationmay be found in the ancillary ﬁles).The second step is to split the state in 20 regions on a weekly basis (which we call the dynamic partition ) and compare the ﬂags assigned by the new formula to these regions andto the 20 pre-determined regions. This is done in Figure 5, which suggests that more citiesthat can be assigned a lower-risk ﬂag in the dynamic partition. On the week from June 13-19,26 cities had a lower-risk ﬂag for the dynamic regions, 13 cities had a higher-risk ﬂag forthe dynamic region, and 128 cities remained the same. On the week from June 20-26, thesenumbers where 50, 2 and 115, respectively.In short, our computations suggest that the ﬂags assigned with the new formula are relatedwith the ﬂags from the original formula. Moreover, ﬂags assigned in the second step show thatpartitioning on a weekly basis allows for more ﬂexibility than considering the same partitionthroughout. This sends the message that it might be possible to devise a formula that takesmore, or more reliable, information into account (as in the government’s formula), and thatallows regions to be adapted on a weekly basis.We should emphasize that we do not believe that the new formula presented here is betterthan the formula used by the state government, quite the opposite, but simulations suggestthat the new formula was able to capture the main features of the government’s formula usingthe data available to us. We are also not suggesting that our regions are better than the pre-determined regions deﬁned by the state government. The government’s regions are heavilybased on the way in which the public health system is organized and on the reality that manycities of small and average size do not have hospitals, particularly hospitals equipped with

PECTRAL CLUSTERING APPROACH TO COVID-19 11

Figure 4.

Flags assigned by the state formula (left) and by our formula (right)on the weeks from June 13-19 (top) and June 20-26 (bottom).

Figure 5.

Flags assigned by our formula to the state’s regions (left) and tothe dynamic regions (right) on the weeks from June 13-19 (top) and June 20-26(bottom). Flags are given by colors, regions are labelled 0 to 19.ICU beds to treat complex cases, and therefore need to establish formal agreements with oneor more cities to which their patients can be transferred. Because of this, periodic changes tothe regions would require that some cities direct their patients to hospitals in diﬀerent citiesevery week, which is certainly not easy to implement. However, in exceptional situations suchas a pandemic, this might be justiﬁed, and accepted by local governments, given the beneﬁtof more leeway to cities that are not as directly aﬀected by the disease. SEIR model

Using the data collected in the previous sections, it is possible deﬁne a discrete model forthe spread of the disease, which gives a qualitative description of the evolution of the diseaseand helps us understand the eﬀect of diﬀerent parameters associated with the disease andof measures to contain it. We consider a discrete susceptible-exposed-infectious-recovered(SEIR) epidemiological model, where the spread of the disease is represented by a recurrencerelation indexed by a discrete parameter t ∈ { , , . . . } . This recurrence relations are givethe expected behavior of a stochastic process deﬁned on a digraph G = ( V, E, ω ), where eachvertex represents a city and the weight w ij of an arc ij represents the number of commutersfrom i to j on an average day. Each city i ∈ V has population P i and, for all t ≥

0, thevector x i ( t ) = ( S i ( t ) , E i ( t ) , I i ( t ) , R i ( t )) stands for the number of susceptible , exposed , infected and removed inhabitants of city i at time t , respectively. As usual, all susceptible individualsare assumed to be prone to contracting the disease. Exposed individuals have been infected,but are not yet contagious, while infected individuals are capable of infecting susceptibleindividuals. Removed individuals either recovered (and became immune from the disease) orpassed away. Initially, each city i is assigned a vector x i (0) with the number of individuals ineach class at the start of the process.We now describe how our system evolves. As in the work of Silva, Pereira and Nonato [21],we assume that most of the movement between cities may be attributed to daily commutes.On day t , part of the population of each city leaves their city to work or study, and comes backin the evening. This leads to a row stochastic matrix M = ( p ij ) of order n , where n = | V | .We interpret p ij as the relative ﬂow from city i to city j , given by p ij = ( t ij + e ij ) /P i ,where t ij and e ij come from the matrices T and E from Section 2. This corresponds to theproportion of the population of i that regularly commutes to j . The diagonal entries are givenby p ii = 1 − (cid:80) j (cid:54) = i p ij .As a consequence, during the day each city j has an eﬀective population of P (cid:48) j = (cid:88) i ∈ V p ij P i . We shall also assume that all classes of individuals are equally likely to move between cities,so that the eﬀective number of individuals of each class in city j on day t is given by S (cid:48) j ( t ) = (cid:88) i ∈ V p ij ( t ) S i ( t ) , E (cid:48) j ( t ) = (cid:88) i ∈ V p ij ( t ) E i ( t ) ,I (cid:48) j ( t ) = (cid:88) i ∈ V p ij ( t ) I i ( t ) , R (cid:48) j ( t ) = (cid:88) i ∈ V p ij ( t ) R i ( t ) . In our model, infections only occur during the day (at the city where each individual spendsthe day). Each such individual is assumed to meet L other individuals in a normal day.However, assuming that a susceptible individual spends the day at city j , the number ofactual meetings on day t is assumed to be L (1 − β ∗ j ( t )) , where β ∗ j ( t ) is the relative rate ofisolation of city j on day t , given in (2). This rate has been assumed under the simplifyingassumption that the probability that, for a meeting to happen, both participants cannot beunder self-isolation, and this would happen with probability (1 − β ∗ j ( t )) if the decision toself-isolate were taken by each individual spending the day in city j , independently of allothers, with probability β ∗ j ( t ). When an individual is infected, we assume that the diseasetakes its course in 14 days, following the phases described in guidelines of the Center forDisease Control and Prevention (CDC) [22]. In the ﬁrst four days [23], incubation occurs, inthe next 5 days, infected individuals are contagious [24] and, in the ﬁnal ﬁve days, individualsare still convalescent, but do not transmit the disease [25]. While infectious, we assume that PECTRAL CLUSTERING APPROACH TO COVID-19 13 the probability that an encounter between a susceptible and an infected individual leads to aninfection is given by τ ( k ), where k is the number of days since the infected individual becamecontagious. As in [24], we assume that τ ( k ) follows a triangular distribution over the ﬁvedays, with a peak on the third day. We have τ ( k ) = 0 for k >

5. The area of the trianglein the deﬁnition of this distribution is given by R /L , to ensure that the basic reproductivenumber (assuming no isolation) is R = 2 .

4, following a situation report by the WHO [26](see also [25]).The recurrence relations become S i ( t + 1) = S i ( t ) − R S i ( t ) (cid:88) j p ij (cid:80) tk = t − (1 − β ∗ j ( t )) I (cid:48) newj ( k ) τ ( k − t + 5) P (cid:48) j ( t ) I newi ( t + 1) = R S i ( t ) (cid:88) j p ij (cid:80) tk = t − (1 − β ∗ j ( t )) I (cid:48) newj ( k ) τ ( k − t + 5) P (cid:48) j ( t ) E i ( t + 1) = E i ( t ) + I newi ( t + 1) − I newi ( t − I i ( t + 1) = I i ( t ) + I newi ( t − − I newi ( t − R i ( t + 1) = R i ( t ) + I newi ( t − I newi ( s ) = 0 for all s ≤ i ∈ [ n ]. Just toillustrate where these equations come from, we discuss the case where an individual in city i does not contract the disease at time t + 1 in the case where there is no social distancing.With probability p ij , the individual moved to city j on day t + 1. The probability that anencounter leads to an infection is R L t (cid:88) k = t − τ ( k − t + 5) I (cid:48) newj ( k ) P (cid:48) j ( t ) , so that the probability that no encounter leads to an infection, given that the individualspends the day in city j , is (cid:32) − R L t (cid:88) k = t − τ ( k − t + 5) I (cid:48) newj ( k ) P (cid:48) j ( t ) (cid:33) L ≈ − R t (cid:88) k = t − τ ( k − t + 5) I (cid:48) newj ( k ) P (cid:48) j ( t ) . Since the same holds for each susceptible individual in i and knowing the proportion ofsusceptible individuals that commute from i to j , the ﬁrst equation in the above system givesthe expected number of susceptible individuals at time t that remain susceptible at time t + 1.We run this model starting with the oﬃcial state data on May 26 to simulate the evolutionof the disease until July 9. The number of new infections in the days before this date areestimated using data from May 20-26, where we assume that new cases correspond to 10%of the number of active cases. The results for the cities of Porto Alegre (the state capitaland largest city), Rio Grande (the largest port in Southern Brazil and the city with highestaverage rate of self-isolation) and Antˆonio Prado (a small city with a population of about13,000, where the average rates of self-isolation are lowest) appear in Figure 6.It is striking to compare it with the behavior of these quantities in the case where there isno social distancing (that is β ∗ j ( t ) = 0 for all j and t ) and with the situation in which the highrates of self-isolation observed on the week between March 21 and 27 had been maintainedafter May 26 ( β i = 0 . x -axis. According to our data,the average rate of isolation in Porto Alegre has been about 44.3% during this time period. Figure 6.

Number of new cases and the overall number of cases (per 100,000inhabitants) in three cities of Rio Grande do Sul. The x axis represents thenumber of days after May 26. Figure 7.

On the top: Number of new cases in Porto Alegre assuming theactual isolation data (left), no isolation (center) and strict isolation (right). Inthe middle: number of active cases in each scenario. On the bottom: overallnumber of cases in each scenario.

Figure 8.

Number of active cases, and the overall number of cases (per100,000 inhabitants) in Porto Alegre on July 9, assuming that the rate ofisolation remains constant, and is given by the value on the x axis. PECTRAL CLUSTERING APPROACH TO COVID-19 15

We note that a simple calculations shows that, while the number of susceptible individuals ismuch higher than the number of individuals in the other classes, isolation would need to beabove 55% to keep the eﬀective reproductive number of the disease is below 1.Even though we have opted to plot the evolution of the disease from May 26 to avoidintrinsic errors coming from initial conditions where the number of infected individuals wasvery small, we should mention that the isolation data were successful at explaining the ups-and-downs in the number of cases in the ﬁrst weeks of the pandemic in Porto Alegre. Accordingto the simulations, the number of cases remained stable between March 31 and the week ofMay 26, and started growing rapidly since then. State data report that the number was stableuntil early June, and grew rapidly since then. (Speciﬁc data are in the ancillary ﬁles.)5.

Concluding remarks

In this paper, we looked at the evolution of the COVID-19 pandemic in Rio Grande doSul using graph theory. We applied spectral clustering techniques on weighted graphs deﬁnedon the set of 167 municipalities in the state with population 10,000 or more, using oﬃcialdata provided by government agencies and isolation data by In Loco. Results related withour ﬁrst measure, based on pendulum migration, reiterate that pendulum migration is animportant means of spreading the disease. Moreover, given the speciﬁc context of the ﬂagsystem in Rio Grande do Sul, our main aﬃnity measure is based on the availability of ICUbeds. Our results suggest that considering a ﬂexible approach to the regions themselves mightbe a useful additional tool in giving more leeway to cities with lower incidence rates, whilekeeping the focus on public safety. However, this is just a ﬁrst step in evaluating the adequacyof such an approach. More data (in a municipal level) would be necessary to perform a directcomparison with the government system. Moreover, state and local authorities would need toassess whether periodic changes to the regions could be readily met with changes in patienttransfer protocols.In a diﬀerent direction, we have observed that disease information from the literature, com-bined with the isolation data, have provided a coherent qualitative description of the evolutionof the pandemic in Rio Grande do Sul using a simple discrete SEIR model. Extrapolatingfrom this, we conclude that isolation measures have been very important in slowing down thespread of the disease. Of course, better results would be achieved with a better understandingof the behavior of the disease and with a model that takes more information into account.

Acknowledgments

The authors are particularly indebted to In Loco for providing data about self-isolationin the cities of Rio Grande do Sul and to Prof. M´arcia Barbian for sharing her data aboutavailability and occupancy of ICU beds in the state. The authors also thank Alisson MatheusFachini Soares, Guilherme Tadewald Varella and Lucas da Rocha Schwengber for helpfuldiscussions leading to this paper.

References [1] “Timeline of WHO’s response to COVID-19.” ” ”, 2020.[2] N. Zhu, D. Zhang, W. Wang, X. Li, B. Yang, J. Song, X. Zhao, B. Huang, W. Shi, R. Lu, P. Niu, F. Zhan,X. Ma, D. Wang, W. Xu, G. Wu, and G. Gao, “A novel coronavirus from patients with pneumonia inChina, 2019,”

New England Journal of Medicine , vol. 382, 01 2020.[3] Y. Wang, J. Tong, Y. Qin, T. Xie, J. Li, J. Li, J. Xiang, Y. Cui, E. S. Higgs, J. Xiang, and Y. He,“Characterization of an asymptomatic cohort of SARS-COV-2 infected individuals outside of Wuhan,China,”

Clinical Infectious Diseases , 05 2020.[4] D. Cyranoski, “What China’s coronavirus response can teach the rest of the world,”

Nature , vol. 579,no. 7800, pp. 479–480, 2020. [5] “Di´ario Oﬁcial da Uni˜ao, Portaria 454, 20 de mar¸co de 2020, Minist´erio da Sa´ude,” March 2020.[6] “Painel Coronav´ırus RS- Secretaria Estadual de Sa´ude,” 2020.[7] “Di´ario Oﬁcial do Estado do Rio Grande do Sul, Decreto 55.128, 19 de mar¸co de 2020,” March 2020.[8] “Di´ario Oﬁcial do Estado do Rio Grande do Sul, Decreto 55.240, 10 de maio de 2020,” May 2020.[9] “Di´ario Oﬁcial da Uni˜ao, Decis˜ao A¸c˜ao Direta de Inconstitucionalidade 6343, 1 de junho de 2020, SupremoTribunal Federal,” June 2020.[10] M. Y. Li, H. L. Smith, and L. Wang, “Global dynamics of an seir epidemic model with vertical transmis-sion,”

SIAM Journal on Applied Mathematics , vol. 62, no. 1, pp. 58–69, 2001.[11] K. Linka, M. Peirlinck, F. Sahli Costabal, and E. Kuhl, “Outbreak dynamics of covid-19 in europe and theeﬀect of travel restrictions,”

Computer Methods in Biomechanics and Biomedical Engineering , pp. 1–8,2020.[12] N. Ajzenman, T. Cavalcanti, and D. Da Mata, “More than words: Leaders’ speech and risky behaviorduring a pandemic,”

SSRN , 04 2020.[13] P. S. Peixoto, D. R. Marcondes, C. M. Peixoto, L. Queiroz, R. Gouveia, A. Delgado, and S. M. Oliva,“Potential dissemination of epidemics based on brazilian mobile geolocation data. part i: Populationdynamics and future spreading of infection in the states of sao paulo and rio de janeiro during the pandemicof covid-19.,” medRxiv , 2020.[14] U. Von Luxburg, “A tutorial on spectral clustering,”

Statistics and computing , vol. 17, no. 4, pp. 395–416,2007.[15] Jianbo Shi and J. Malik, “Normalized cuts and image segmentation,”

IEEE Transactions on PatternAnalysis and Machine Intelligence , vol. 22, no. 8, pp. 888–905, 2000.[16] J. Magnus and H. Neudecker,

Matrix Diﬀerential Calculus with Applications in Statistics and Econometrics(Revised Edition) . John Wiley & Sons Ltd, 1999.[17] J. Macqueen, “Some methods for classiﬁcation and analysis of multivariate observations,” in

In 5-thBerkeley Symposium on Mathematical Statistics and Probability , pp. 281–297, 1967.[18] A. Y. Ng, M. I. Jordan, and Y. Weiss, “On spectral clustering: Analysis and an algorithm,” in

Proceedingsof the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic ,(Cambridge, MA, USA), pp. 849–856, MIT Press, 2001.[19] B. Nadler, S. Lafon, R. R. Coifman, and I. G. Kevrekidis, “Diﬀusion maps, spectral clustering andeigenfunctions of fokker-planck operators,” in

Proceedings of the 18th International Conference on NeuralInformation Processing Systems , NIPS’05, (Cambridge, MA, USA), pp. 955–962, MIT Press, 2005.[20] U. von Luxburg, M. Belkin, and O. Bousquet, “Consistency of spectral clustering,”

Ann. Statist. , vol. 36,pp. 555–586, 04 2008.[21] P. J. S. Silva, T. Pereira, and L. G. Nonato, “Robot dance: a city-wise automatic control of covid-19mitigation levels,” medRxiv , 2020.[22] B. Adhikari, L. Fischer, B. Greening, S. Jeon, E. Kahn, G. Kang, G. Rainisch, M. Meltzer, and M. Washing-ton, “COVID19Surge: a manual to assist state and local public health oﬃcials and hospital administratorsin estimating the impact of a novel coronavirus pandemic on hospital surge capacity,” 2020.[23] S. Ma, J. Zhang, M. Zeng, Q. Yun, W. Guo, Y. Zheng, S. Zhao, M. H. Wang, and Z. Yang, “Epidemiologicalparameters of coronavirus disease 2019: a pooled analysis of publicly reported individual data of 1155 casesfrom seven countries,” medRxiv , 2020.[24] C. M. Peak, R. Kahn, Y. H. Grad, L. M. Childs, R. Li, M. Lipsitch, and C. O. Buckee, “Comparativeimpact of individual quarantine vs. active monitoring of contacts for the mitigation of covid-19: a modellingstudy,” medRxiv , 2020.[25] “Report 9: Impact of non-pharmaceutical interventions to reduce COVID-19 mortality and healthcaredemand, Imperial College COVID-19 Response Team,” 2020.[26] “Coronavirus disease 2019 (COVID-19) Situation Report 46, World Health Organization,” 2020.

E-mail address : [email protected] E-mail address : [email protected] E-mail address : [email protected] E-mail address : [email protected]@ufrgs.br