Clustering Future Scenarios Based on Predicted Range Maps

Abstract

Predictions of biodiversity trajectories under climate change are crucial in order to act effectively in maintaining the diversity of species. In many ecological applications, future predictions are made under various global warming scenarios as described by a range of different climate models. The outputs of these various predictions call for a reliable interpretation. We propose an interpretable and flexible two-step methodology to measure the similarity between predicted species range maps and to cluster the future scenario predictions using a spectral clustering technique. We find that clustering based on ecological impact (predicted species range maps) is mainly driven by the amount of warming. We contrast this with clustering based only on predicted climate features, which is driven mainly by climate models. The differences between these clusterings illustrate that it is crucial to incorporate ecological information to understand the relevant differences between climate models. The findings of this work can be used to better synthesize forecasts of biodiversity loss across the wide spectrum of results that emerge when considering potential future biodiversity loss.


Davidow, Matthew; Merow, Cory; Che-Castaldo, Judy; Schafer, Toryn L. J.; Düker, Marie-Christine; Corcoran, Derek; Matteson, David S.

November 2020

Summary

1. Predictions of biodiversity trajectories under climate change are crucial in order to act effectively in maintaining the diversity of species. In many ecological applications, future predictions are made under various global warming scenarios as described by a range of different climate models. The outputs of these various predictions call for a reliable interpretation.
2. We propose an interpretable and flexible two-step methodology to measure the similarity between predicted species range maps and to cluster the future scenario predictions using a spectral clustering technique.
3. We find that clustering based on ecological impact (predicted species range maps) is mainly driven by the amount of warming. We contrast this with clustering based only on predicted climate features, which is driven mainly by climate models.
4. The differences between these clusterings illustrate that it is crucial to incorporate ecological information to understand the relevant differences between climate models. The findings of this work can be used to better synthesize forecasts of biodiversity loss across the wide spectrum of results that emerge when considering potential future biodiversity loss.

Key-words: biodiversity; clustering; similarity measures; future scenarios; animal species; climate change.

Predicting ecological responses to a rapidly changing climate is essential to enact effective conservation policies (Parry et al., 2007; Hannah et al., 2013, 2020). Future range maps of various species can be predicted based on predicted future patterns of climate (Burrows et al., 2014; Jones and Cheung, 2015; Molinos et al., 2016). However, predicting future patterns of climate is a challenging task due to the many sources of uncertainty, and a plethora of climate predictions are possible that are consistent with various unknown factors (e.g. differing human policy responses, climate models). We are interested in comparing the outputs of these various predictions and deducing common patterns among them.

We make use of climate predictions from the Coupled Model Intercomparison Project 6 (CMIP6). This project provides multiple climate predictions which vary by the underlying global climate model (GCM) and by which representative concentration pathway (RCP) is used. An RCP is a greenhouse gas concentration trajectory (Eyring et al., 2016). Four such trajectories are included in CMIP6, varying in the quantity of greenhouse gas emissions to capture the uncertainty of future emissions. We refer to the RCP trajectory of least emission as “optimistic”, and to the most pessimistic trajectory as the “extreme” scenario. In this work we refer to a scenario as a (GCM, RCP) pair; these scenarios represent uncertainty both in the evolution of climate and in future greenhouse gas emissions.

In order to interpret the differences among climate predictions, we propose a methodology to cluster the scenarios. We create such a clustering both from the climate features and from predicted range maps for 1101 mammalian species. These predicted range maps are based on the predicted climate features; the details of these range maps are discussed in Section 2.
These clusterings reveal the important differences between climate models, such as whether the global climate model or the RCP differences are the most salient features. The clustering will also group the scenarios into interpretable collections, such as an “optimistic” collection of scenarios of lesser ecological impact, and an “extreme” collection of scenarios of greater ecological impact.

Clustering the scenarios based on these predicted range maps is a difficult task due to the discrete and high dimensional nature of the range maps. Although Principal Component Analysis (PCA) is a common tool to analyze data, it has two significant drawbacks in this setting. First, PCA implicitly measures distance between range maps in Euclidean space, whereas we present a flexible alternative, allowing for any similarity or distance between range maps. Second, it is not clear how to incorporate information from multiple species into a PCA-based approach.

Currently, a popular metric of change in species richness due to climate change is climate velocity (Burrows et al., 2014; Jones and Cheung, 2015; Molinos et al., 2016). However, these climate velocities rely heavily on climate information while ignoring ecological data. Such ecological information, for example species' exposure to climate conditions not found in their current niches, is an important factor in predicting the species' future ranges (Trisos et al., 2020). We incorporate such ecological data by clustering on predicted range maps, which are derived from models of historical ranges.

We will contrast this ecologically driven clustering with a climate driven clustering to emphasise the importance of incorporating ecological information. This climate driven clustering will use only predicted climate features taken from CMIP6, such as annual mean temperature and annual mean precipitation. By comparing the climate driven clustering and the ecological one obtained from the range map similarities, we demonstrate the need for incorporating ecological information. Clustering based only on climate features is a similar but distinct task to the ecological clustering; the climate features are continuous whereas the ecological range maps are binary. Previously these scenarios have been clustered using the climate features by averaging them over global regions (Giorgi and Francisco, 2000; Cannon, 2015). However, spatially averaging the climate features in this way loses significant information about their spatial variability.

We propose a flexible two-step approach to cluster both predicted species range maps and predicted climate features. The first step is to measure the pairwise similarity between the prediction maps. The second step is to use the pairwise similarities for spectral clustering, whose implementation is discussed in Section 2. This two-step procedure is highly flexible: any similarity measure can be used between range maps.

We propose to measure the similarity between range maps as the cosine similarity of the range maps. This choice allows for considerable flexibility; the modeller can weight absences and presences separately, and give certain sets of cells higher importance. In addition, the cosine similarity is interpretable and comparable across species; it is always in the range $[-1, 1]$.

We first describe prior modelling work to predict the future range maps given climate information. Then we discuss our methodology for clustering the scenarios, first based on predicted range maps, and second based on predicted climate features. These clusterings reveal interpretable relationships between the scenarios, and the contrast between the two clusterings demonstrates the importance of incorporating ecological information.

A set of 9 GCMs from CMIP6 is used (Eyring et al., 2016; Stouffer et al., 2017), spanning four different levels of RCP. Five climate features are chosen from the set of 19 commonly used in WorldClim (Fick and Hijmans, 2017); they were chosen to minimize correlations. These five climate features are annual mean temperature, temperature seasonality (standard deviation of temperature), annual precipitation, precipitation seasonality (standard deviation), and precipitation in the driest quarter of a year.

The five chosen climate features are used to fit a Poisson point process model to explain the present day range maps (Merow et al., 2013; Elith et al., 2011). The present day occurrences of 1101 mammals are obtained from Miller (2020), whose range maps contain at least 10 unique presence cells on a 10 km grid. This Poisson point process model was used to predict future spatial occurrence based on these five predicted climate values. Binary maps are obtained from these abundance maps by thresholding based on the 5th percentile of predicted values at training presences. This approach is used to make predictions on all 1101 mammals using the 34 different sets of predicted climate features. We make use of these predicted range maps to measure the similarities between the 34 different climate scenarios.
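The thresholding step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `presence_threshold` and `binarize` are hypothetical, the predicted values are assumed to be stored as NumPy arrays on the grid, and treating cells at or above the threshold as presences is an assumption.

```python
import numpy as np

def presence_threshold(present_pred, train_presence, percentile=5.0):
    """Return the given percentile of predicted values at training presence cells."""
    present_pred = np.asarray(present_pred, dtype=float)
    train_presence = np.asarray(train_presence, dtype=bool)
    return float(np.percentile(present_pred[train_presence], percentile))

def binarize(pred, threshold):
    """Binary range map: 1 where the predicted value reaches the threshold (assumed >=)."""
    return (np.asarray(pred, dtype=float) >= threshold).astype(int)

# Toy example on a 10 x 10 grid with hypothetical predicted values.
present_pred = np.arange(100, dtype=float).reshape(10, 10) / 100.0
train_presence = present_pred >= 0.5        # pretend these cells are training presences
thr = presence_threshold(present_pred, train_presence)

future_pred = present_pred[::-1, ::-1]      # stand-in for one scenario's prediction
future_map = binarize(future_pred, thr)     # binary map B_s for that scenario
```

The same threshold, estimated once from the present day fit, is applied to every scenario so that the resulting binary maps are comparable.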

We present a novel methodology to cluster scenarios based on binary presence maps such as those shown in Figure 1b. We now discuss the procedure for achieving this scenario clustering. This clustering illuminates the important similarities between scenarios, for instance whether GCM or RCP is the main differentiating feature among scenarios, and additionally gives insight into the variation among scenarios.

For notational simplicity we focus the presentation of the methodology on a single species. Furthermore, we suppose our maps are on an $n_r$ by $n_c$ grid represented as $(r, c)$ with $r = 1, \dots, n_r$ and $c = 1, \dots, n_c$. We denote by $B_s$ the binary presence map of this species according to scenario $s$; more precisely, $B_s(r, c) = 1$ if the cell at $(r, c)$ represents a presence, and $B_s(r, c) = 0$ otherwise. It is important to consider how these range maps differ from the present day, as will be discussed further. For this reason we denote by $P$ the present day map. As for $B_s(r, c)$, we write $P(r, c) = 1$ if the cell at $(r, c)$ is presently occupied, and $0$ otherwise. We let the binary map $A$ (same size as $P$ and each $B_s$) denote the background set, where valid absences may occur. For instance, $A$ can take the value 1 only where there is land, or alternatively only on the same continent or regional areas as $P$. We have chosen to take $A$ as the union of the present day map and all scenarios; thus $A(r, c) = 1$ if $P(r, c) = 1$, or if there is a scenario $s$ such that $B_s(r, c) = 1$.

In order to interpret the differences between range map predictions, a similarity or distance measure is called for between pairs of range maps. Quantifying the similarity or distance between range maps allows us to cluster the scenarios, which yields a desirably interpretable result.

There exists a plethora of alternative measures to quantify the similarity or distance between these presence maps (Visser and De Nijs, 2006; Wilson, 2011; Hagen, 2002; Hill et al., 2013; Gritti et al., 2013). The Hellinger distance and Kullback-Leibler divergence rely on a probabilistic interpretation, and thus it is difficult to incorporate absences into these measures (Wilson, 2011). The Kappa statistic (Hagen, 2002) can be used to measure similarities between categorical maps; however, it is not as clear how to weight important cells, such as novel absences.
In addition, the Kappa statistic does not directly incorporate the relative frequencies of absences and presences. For instance, if two maps both predict all presences except a single absence cell, these two maps will have a negative Kappa statistic if their one absence cell differs. However, for our purposes we would like to consider such a pair of maps very similar. The Wasserstein distance (or Earth Mover's Distance) (Peyré et al., 2019; Kranstauber et al., 2017) is an attractive alternative which captures the idea of movement of a species. However, the Wasserstein distance does not model the disappearance of regions (as opposed to the shift or movement of regions), and computing the distance via optimal discrete transport (a linear program of quadratic size in the number of presence cells) proved too slow or even infeasible (no solution found to the linear program) in all but the smallest maps (the maps with the smallest number of presence cells).

We chose the cosine similarity function, which has several attractive features. It is flexible; the modeller can weight absences and presences separately, and can give certain sets of cells higher importance. The cosine similarity is also interpretable and comparable across species, as we describe below.

We compute the pairwise similarity between range maps as the cosine similarity of their weighted range maps. The motivation and details for weighting the range maps are discussed in Section 2.2.3. In short, we construct a weighted map $W_s$ based on the binary range map $B_s$ and the present day map $P$. For a pair of scenarios $s, s'$ with corresponding binary range maps $B_s, B_{s'}$ and present day map $P$, we measure their similarity as the cosine similarity of their weighted maps, $\mathrm{CS}(W_s, W_{s'})$. We suppose that a map $G$ on an $n_r$ by $n_c$ grid can be interpreted as a vector of length $n_r \times n_c$.
Then the cosine similarity between two vectors $W_s, W_{s'}$ can be written as

$$\mathrm{CS}(W_s, W_{s'}) = \frac{W_s \cdot W_{s'}}{\|W_s\| \, \|W_{s'}\|} = \frac{\sum_{r=1}^{n_r} \sum_{c=1}^{n_c} W_s(r, c) \, W_{s'}(r, c)}{\|W_s\| \, \|W_{s'}\|}, \quad \text{with} \quad \|W_s\| = \Big( \sum_{r=1}^{n_r} \sum_{c=1}^{n_c} W_s(r, c)^2 \Big)^{1/2}. \tag{1}$$

Weighting the cells of the binary maps is required to quantitatively measure the similarity between range maps. For each scenario $s$, we denote the corresponding weighted map by $W_s$. The value of $W_s(r, c)$ is 0 if $A(r, c) = 0$, but when $A(r, c) = 1$ we choose one of four values for $W_s(r, c)$ depending on whether the cell represents a presence or absence for the present day, and a presence or absence for the scenario. These four choices are shown in Table 1.

| | $P(r, c) = 1$ | $P(r, c) = 0$ |
|---|---|---|
| $B(r, c) = 1$ | $p_{\text{keep}}$ | $p_{\text{new}}$ |
| $B(r, c) = 0$ | $a_{\text{new}}$ | $a_{\text{keep}}$ |

Table 1: Cell weighting values for $W(r, c)$ given $A(r, c) = 1$.

The choice of weightings in Table 1 is dependent on the use of cosine similarity. For instance, using cosine similarity but representing presences with the value 1 and absences with 0 is inappropriate because any two maps will have non-negative similarity, and this choice will also have the property that absences and presences are counted differently in the denominator of the cosine similarity.

These four cases (reading row by row) represent unchanged (kept) presences, new presences, new absences, and unchanged absences respectively, where “keep” and “new” are with respect to the present day distribution, $P$. For example, a cell corresponding to $p_{\text{keep}}$ is a cell that is present in $P$, and is kept present according to scenario $B_s$.
Presences are given positive weights and absences negative weights, and we choose $|a_{\text{new}}| > |a_{\text{keep}}|$ to emphasize those cells whose ecological suitability for this species is vanishing. These “lost” cells corresponding to novel absences are particularly important; they represent cells where the climate is changing so drastically that the species cannot continue to live there. In addition, these cells should be weighted higher because we are more confident about their prediction; by definition they exist within the training data's presences. By contrast, the cells corresponding to $p_{\text{new}}$ represent regions where the species is predicted to move to; however, such predictions are more uncertain, as the movement of a species is complex and not directly taken into account by the Poisson point process model. Thus we also make the choice $|p_{\text{keep}}| > |p_{\text{new}}|$. We make the choice $|a_{\text{new}}| = |p_{\text{keep}}| = 1$, $|a_{\text{keep}}| = |p_{\text{new}}| = 0.$[...] $P(r, c) = 1$). However, we emphasize the flexibility of our model; alternative choices can be made depending on the modeller's goals. A visualization of this weighting scheme is shown in Figure 1.

The computation and resultant cosine similarity using these weighted maps are highly interpretable; when two scenarios agree on the presence or absence of a cell, this cell has a positive contribution to the cosine similarity, whereas the cell has a negative contribution when the two scenarios disagree. The resulting similarity is always in the range $[-1, 1]$ (for any choice of weightings), and takes the value 1 when the two maps are identical and $-1$ [...] $|a_{\text{new}}| = |p_{\text{keep}}|$, $|a_{\text{keep}}| = |p_{\text{new}}|$, which was argued for previously.

Figure 1: Illustration of weighted presences and absences: (a) present day range map, (b) scenario range map, (c) overlap of present day and scenario. Presences are shown in blue, absences in red/pink. The overlap with the present day is shown in darker regions, which represent more significant cells: dark blue kept presences and dark red new absences, which represent “lost” cells. A strength of our proposed methodology is the flexibility to weight these cells differently.
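The weighting scheme of Table 1 and the cosine similarity of eq. (1) can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, and the value 0.5 used for the two smaller weights is a placeholder assumption, since the value in the text is not recoverable here. The larger weights follow the stated choice $|a_{\text{new}}| = |p_{\text{keep}}| = 1$.

```python
import numpy as np

def background_union(P, scenario_maps):
    """Background set A: union of the present day map and all scenario maps."""
    A = np.asarray(P).astype(bool)
    for B in scenario_maps:
        A = A | np.asarray(B).astype(bool)
    return A.astype(int)

def weighted_map(B, P, A, p_keep=1.0, p_new=0.5, a_new=-1.0, a_keep=-0.5):
    """Weighted map W_s from Table 1; 0.5 is a placeholder, not the paper's value."""
    B, P, A = (np.asarray(x).astype(bool) for x in (B, P, A))
    W = np.zeros(B.shape, dtype=float)      # cells outside A keep weight 0
    W[A & P & B] = p_keep                   # kept presences
    W[A & ~P & B] = p_new                   # new presences
    W[A & P & ~B] = a_new                   # new absences ("lost" cells)
    W[A & ~P & ~B] = a_keep                 # kept absences
    return W

def cosine_similarity(W1, W2):
    """Cosine similarity of two maps, each flattened to a vector (eq. 1)."""
    v1, v2 = np.ravel(W1), np.ravel(W2)
    denom = np.linalg.norm(v1) * np.linalg.norm(v2)
    if denom == 0:
        raise ValueError("cosine similarity is undefined for an all-zero map")
    return float(v1 @ v2 / denom)

# Toy 1 x 4 grid: two scenarios that disagree on every background cell.
P = np.array([[1, 1, 0, 0]])
B1 = np.array([[1, 0, 1, 0]])
B2 = np.array([[0, 1, 0, 0]])
A = background_union(P, [B1, B2])
W1, W2 = weighted_map(B1, P, A), weighted_map(B2, P, A)
cs_opposite = cosine_similarity(W1, W2)
cs_same = cosine_similarity(W1, W1)
```

In this toy case the two weighted maps are exact negatives of each other, so the similarity is $-1$, while a map compared with itself gives 1. The last cell lies outside $A$ and contributes nothing.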

For all species we compute the pairwise scenario similarity matrix by computing the cosine similarity, eq. (1), on each pair of scenarios. That is, for each species $m$, we construct the $n_s$-by-$n_s$ matrix $S_m(s, s') = \mathrm{CS}(W^m_s, W^m_{s'})$, where $n_s$ is the number of scenarios and $W^m_s$ is the weighted map for species $m$ under scenario $s$. Spectral clustering is well suited to cluster the scenarios based on this similarity measure (Von Luxburg, 2007).

The properties of spectral clustering are understood from a graph theory perspective. The similarity matrix $S_m$ can be thought of as an undirected graph whose nodes are the scenarios, with the edge weight between a pair of scenarios $s$ and $s'$ given by $S_m(s, s')$. Spectral clustering performs best on sparse graphs; thus, for the dense similarity matrix $S_m$, the first step is to sparsify it by taking the $k$-nearest neighbor graph, that is, retaining an edge from $s$ to $s'$ only if $s'$ is within the top $k$ neighbors of $s$ (i.e. it is among the $k$ nodes of maximal similarity to $s$). However, this would lead to a directed graph, as this definition of nearest neighbor is not symmetric. Thus we retain the undirected edge between $s$ and $s'$ if either $s'$ is within the top $k$ neighbors of $s$, or vice versa. The retained edges are still weighted by the similarity of their endpoints. We denote by $E$ this matrix of retained weights.

The main tool of spectral clustering is the graph Laplacian $L = D - E$, where $D$ is the diagonal matrix of node degrees, $D_{ii} = \sum_{j=1}^{n_s} E(i, j)$. We use the random-walk normalized graph Laplacian, $L_{\text{rw}} = D^{-1} L$, as suggested in Von Luxburg (2007).
Both $L$ and $L_{\text{rw}}$ have several nice properties: they are positive semi-definite, and the multiplicity of their zero eigenvalue is the number of connected components of the graph. For most real-world graphs, including the sparsified cosine similarity matrix $E$, the graph is connected and thus has a single connected component. When this is the case, the eigenvectors corresponding to the smallest non-zero eigenvalues can be used as an embedding. This can be thought of from a perturbation perspective: if the graph had true clusters as connected components, then these eigenvectors would have the same span as the cluster membership indicator vectors. However, in real-world graphs there are a few “noisy” edges between clusters, so these first few eigenvectors are instead nearly piecewise constant on the clusters.

Drawing on this insight, the space associated with the eigenvectors of $L_{\text{rw}}$, which we denote as the columns of a matrix $U$, is used as a “spectral embedding”. We use the second and third columns of $U$ (corresponding to the second and third smallest eigenvalues) as the spectral embedding; that is, scenario $s$ is represented by $(U_{s,2}, U_{s,3})$. We choose a simple clustering algorithm, single-linkage clustering, to cluster in this embedded space, as it performs well according to the Davies-Bouldin criterion, a common criterion for how well separated clusters are (Davies and Bouldin, 1979).

This clustering can be performed for a single species using only $S_m$; for example, it is performed for four different species in Figure 6. These results will be discussed in the next section. One way to combine information across animals is to average the similarity matrices across all $n_m = 1101$ animals, as in $S = (1/n_m) \sum_{m=1}^{n_m} S_m$. The result of spectral clustering using this averaged similarity matrix, $S$, is shown in Figure 5b.
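The pipeline described above (symmetric $k$-nearest neighbor sparsification, random-walk Laplacian, two-dimensional embedding, single-linkage clustering) can be sketched with NumPy and SciPy. This is a hedged sketch rather than the authors' implementation: the function names, the toy data, and the values of `k` and `n_clusters` are hypothetical. It also assumes that the retained edge weights are positive so that all node degrees are positive; the text does not say how negative cosine similarities are treated at this step.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.cluster.hierarchy import linkage, fcluster

def average_similarity(similarity_matrices):
    """Average the per-species similarity matrices S_m into one matrix S."""
    return np.mean(np.stack(similarity_matrices), axis=0)

def knn_sparsify(S, k):
    """Keep edge (s, s') if either node is among the other's k most similar nodes."""
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    ranked = S.copy()
    np.fill_diagonal(ranked, -np.inf)             # a node is not its own neighbor
    top_k = np.argsort(-ranked, axis=1)[:, :k]    # k largest similarities per row
    mask = np.zeros((n, n), dtype=bool)
    mask[np.arange(n)[:, None], top_k] = True
    mask = mask | mask.T                          # "either direction" rule
    E = np.where(mask, S, 0.0)
    np.fill_diagonal(E, 0.0)
    return E

def spectral_embedding(E):
    """Second and third eigenvectors of L_rw = D^{-1} (D - E)."""
    degrees = E.sum(axis=1)
    if np.any(degrees <= 0):
        raise ValueError("every node needs positive degree (assumption of this sketch)")
    D = np.diag(degrees)
    L = D - E
    # The generalized problem L u = lambda D u has the same eigenpairs as L_rw.
    _, U = eigh(L, D)                             # eigenvalues in ascending order
    return U[:, 1:3]

def cluster_scenarios(S, k, n_clusters):
    """Single-linkage clustering of the scenarios in the spectral embedding."""
    X = spectral_embedding(knn_sparsify(S, k))
    Z = linkage(X, method="single")
    return fcluster(Z, t=n_clusters, criterion="maxclust")

# Toy similarity matrix: 9 "scenarios" in three tight groups of three.
centers = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 3.5]])
offsets = np.array([[0.0, 0.0], [0.3, 0.0], [0.0, 0.3]])
points = (centers[:, None, :] + offsets[None, :, :]).reshape(-1, 2)
sq_dist = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
S_toy = np.exp(-sq_dist / 2.0)
labels = cluster_scenarios(S_toy, k=4, n_clusters=3)
```

Solving the generalized symmetric eigenproblem avoids forming the non-symmetric matrix $D^{-1} L$ explicitly. With three well separated groups, the second and third eigenvectors are nearly constant within each group, which is what makes the simple single-linkage step sufficient in this toy case.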
One can get an overall sense of the changes predicted in the range maps from Figure 2, which shows how the diversity of mammals is spread spatially across the globe. We see from both Figures 2c and 2d that significant diversity losses are predicted around the equator in South America and Africa, whereas there are some diversity increases farther north, consistent with previous findings (Chen et al., 2011).

One way to qualitatively measure the performance of the clustering is to look at the spatial overlap of presences for each scenario. That is, we define the frequency map $F_m$ based on the scenario maps:

$$F_m(r, c) := \sum_{s=1}^{n_s} B^m_s(r, c).$$

For example, the overlap of all 34 scenarios for the African cheetah is shown in Figure 8a.

Figure 2: Visualization of the mammal richness in the dataset over space: (a) present day mammal variety, (b) predicted mammal variety, (c) net change, (d) fraction change. White cells correspond to locations with no recorded presences. Our findings are consistent with Chen et al. (2011), who find that species are moving poleward and towards higher elevations; there is loss around the equator and some increase in diversity towards the northern pole.

Principal Component Analysis

Principal component analysis is a useful tool to understand the main directions of variation in data. We use PCA to visualize the correlations of presence cells. This can be performed for a single species by considering each scenario as an observation, with $n_r \times n_c$ binary features corresponding to the vectorized (flattened) binary range map.

We now discuss the process of clustering using only the five climate features that were used to predict the range maps. This clustering, when contrasted with the ecologically driven clustering, demonstrates the importance of incorporating ecological information. Each of these five globally distributed features is predicted across the 34 scenarios. In order to compare directly with our ecologically based clustering, we use the climate features to perform a clustering in a similar fashion. However, for continuous data the cosine similarity is not appropriate, as two maps that are scaled versions of each other would be considered very similar, which is not desirable. For instance, if one map predicted two degrees warmer everywhere than another, the cosine similarity between these two maps would be very high, even though the two maps represent significantly different predictions. Instead we use the $L_2$ distance between maps, which effectively uses both the difference between the means of the maps and the differences in their spatial variation. To incorporate all five climate features, each feature is normalized before applying the $L_2$ distance. We denote by $T^f_s(r, c)$ the value of feature $f$ according to scenario $s$ at location $(r, c)$. The feature-scaled $L_2$ distance between a pair of scenarios $s$ and $s'$ using all five features is given by

$$H(s, s') = \sum_{f=1}^{5} \sum_{r=1}^{n_r} \sum_{c=1}^{n_c} \big[ (T^f_s(r, c) - T^f_{s'}(r, c)) / \sigma_f \big]^2.$$
Here $\sigma_f$ is the standard deviation of feature $f$ measured across all locations and scenarios; that is, with $\mu_f$ the mean of feature $f$ across all locations and scenarios,

$$\sigma_f^2 = (n_s \cdot n_r \cdot n_c)^{-1} \sum_{s=1}^{n_s} \sum_{r=1}^{n_r} \sum_{c=1}^{n_c} \big[ T^f_s(r, c) - \mu_f \big]^2 \quad \text{with} \quad \mu_f = (n_s \cdot n_r \cdot n_c)^{-1} \sum_{s=1}^{n_s} \sum_{r=1}^{n_r} \sum_{c=1}^{n_c} T^f_s(r, c). \tag{2}$$

Spectral clustering requires a similarity matrix (instead of a dissimilarity or distance matrix), so we transform the pairwise $L_2$ distances into similarities with a monotonically decreasing function, such as $x \mapsto 1/x$. An important step in spectral clustering is to sparsify the similarity matrix by keeping only the largest similarities (discussed above), and so any monotonically decreasing function will produce the same spectral embedding and clusters (Von Luxburg, 2007). Thus we use $L(s, s') := 1/H(s, s')$ as the similarity matrix for spectral clustering, the result of which is shown in Figure 5a.

Figure 3: Spectral clustering of scenarios using individual climate features: (a) annual mean temperature, (b) temperature seasonality, (c) annual precipitation, (d) precipitation seasonality (one of the five features is not shown for reasons of space). The annual temperature clustering is most similar to the ecological clustering of Figure 5b, clustering mainly by RCP. The other features cluster mainly by GCM. This is why the climate driven clustering of Figure 5a is driven mainly by GCM: most of the features are. The ecological clustering is important to discern which of these features are most ecologically relevant; these plots show that annual temperature contains the most ecologically relevant differences between climate models.
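The feature-scaled distance $H$ and its conversion to a similarity can be sketched as below. This is an illustrative sketch under assumptions: the function names and the array layout `(n_features, n_scenarios, n_r, n_c)` are hypothetical, the distance is computed as a sum of squared scaled differences, and the standard deviation is the population one taken over all scenarios and cells. Setting the self-similarity to 0 is also an assumption, made because the diagonal of $H$ is zero and self-edges are not used by the nearest-neighbor step.

```python
import numpy as np

def feature_scaled_distance(T):
    """Pairwise distance H(s, s') for T of shape (n_features, n_scenarios, n_r, n_c)."""
    T = np.asarray(T, dtype=float)
    # Per-feature standard deviation over all scenarios and locations.
    sigma = T.std(axis=(1, 2, 3), keepdims=True)
    Z = T / sigma
    n_s = T.shape[1]
    # One row per scenario, with all scaled features and cells concatenated.
    flat = np.moveaxis(Z, 1, 0).reshape(n_s, -1)
    diff = flat[:, None, :] - flat[None, :, :]
    return (diff ** 2).sum(axis=-1)

def distance_to_similarity(H):
    """Similarity 1 / H off the diagonal; identical scenario pairs would give inf."""
    H = np.asarray(H, dtype=float)
    sim = np.zeros_like(H)
    off_diag = ~np.eye(H.shape[0], dtype=bool)
    sim[off_diag] = 1.0 / H[off_diag]
    return sim

# Toy data: one feature, three scenarios, a 1 x 2 grid (hypothetical values).
T_toy = np.array([[[[0.0, 0.0]], [[1.0, 1.0]], [[3.0, 3.0]]]])
H_toy = feature_scaled_distance(T_toy)
L_sim = distance_to_similarity(H_toy)
```

Because each feature is divided by its own standard deviation, rescaling a feature (for example changing temperature units by a constant factor) leaves $H$ unchanged, which is what allows the five features to be summed on a common footing.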

We summarize some overall patterns of the projected range maps by counting both presences that differ from the present day range map and absences that differ from the present day range map (cells corresponding to $p_{\text{new}}$ and $a_{\text{new}}$ respectively). The fraction of each of these cell types is averaged over species for each scenario. The resulting average fractions of $a_{\text{new}}$ and $p_{\text{new}}$ type cells are shown in Figure 4, which shows that these novel absences and novel presences are correlated.

In this section we discuss the scenario clusterings, with emphasis on the difference between the climate based and ecological based clustering. We then demonstrate that we have detected meaningful clusters, with range maps within clusters similar to each other but different from range maps in other clusters.

Range Shifts

By comparing the fraction of cells corresponding to $a_{\text{new}}$ and $p_{\text{new}}$ we get a sense of the changes of the range maps compared to the present day range map. This is shown in Figure 4, which demonstrates that these range maps tend to predict range shifts, as opposed to range expansions or contractions. We conclude this because the numbers of new presences and new absences grow together, as would occur under range shifts, instead of, say, absences growing as presences shrink, which would be indicative of an overall range decrease.

On average there are significantly more novel absences (average 38% (28% s.d.) of current range) than novel presences (average 25% (26% s.d.)), which implies that the species ranges are shrinking (average 13% net loss (31% s.d.)) due to the changing climate.

Figure 4: Predicted range maps tend to be shifted, that is, lost and new cells grow together, with losses being larger than gains, as the scenarios are above the 45 degree dashed black line.

The spectral clustering results using the species-specific similarity matrix are shown in Figure 6 for four species, illustrating the main types of patterns observed. We see an interesting mix of RCP and GCM dependence. It appears that RCP is a major driving factor that separates these clusters; in most of these clusterings the far left (“optimistic”) cluster contains mainly scenarios with low RCP, and the far right (“extreme”) cluster only scenarios with high RCP. However, the clustering for the slender treeshew (Tupaia gracilis) is driven mainly by GCM. Clustering mainly by GCM was found in many species (30%), although the most common trend is RCP dependence (70% of species). The variability among animals is further evidence supporting the importance of the climate-ecology relationship: the most important difference between climate models (GCM or RCP) varies depending on the individual species' response to the climate.

The spectral clustering results from using the similarity matrix averaged over all 1101 mammals are shown in Figure 5b. We see in the bottom cluster of Figure 5b an interesting mix of varying RCP and GCM. This suggests that RCP alone does not account for the variation; the GCM is also important. However, there are still variabilities among the GCMs. For example, the “ca” climate model appears twice in the far right cluster.

Figure 5: Scenario embedding and clustering: (a) climate feature based, (b) predicted range map based. In the climate based clustering (a), the clusters are mainly determined by GCM. In the range map based clustering (b), the clusters are mainly driven by RCP. However, there are still some relationships among the GCMs; for instance the red “ca” GCM predicts more extreme outcomes, and at each level of RCP the green, black and yellow (cc, ce, ip) climate models are together. The two clusterings are significantly different, which demonstrates the importance of incorporating ecological information.

Instead of averaging over all species, we also performed clustering for only the species most at risk, defined as those species whose fraction of area lost is among the highest 10%. This loss in area can be used to approximate a loss in population abundance using the techniques in He (2012) and Che-Castaldo and Neel (2016). The result of spectral clustering using the average of the similarity matrices of this subset of animals at risk is shown in Figure 7. This clustering puts a stronger emphasis on RCP; that is, the animals most at risk are more sensitive to RCP.
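The per-species quantities used here, the fractions of novel absences and novel presences relative to the current range and the selection of the species most at risk, can be sketched as follows. The function names are hypothetical, and two points are assumptions of this sketch: that “fraction of area lost” means the novel-absence fraction, and that each species is summarized by a single loss value (for instance an average over scenarios) before taking the top 10%.

```python
import numpy as np

def range_change_fractions(P, B):
    """Fractions of lost (a_new) and gained (p_new) cells, relative to current range."""
    P = np.asarray(P).astype(bool)
    B = np.asarray(B).astype(bool)
    current = P.sum()
    if current == 0:
        raise ValueError("present day range is empty")
    lost = (P & ~B).sum() / current      # novel absences
    gained = (~P & B).sum() / current    # novel presences
    return float(lost), float(gained)

def most_at_risk(lost_fraction, top=0.10):
    """Indices of species whose loss is at or above the (1 - top) quantile."""
    lost_fraction = np.asarray(lost_fraction, dtype=float)
    cutoff = np.quantile(lost_fraction, 1.0 - top)
    return np.flatnonzero(lost_fraction >= cutoff)

# Toy example: one species on a 1 x 6 grid, then ten hypothetical species losses.
P = np.array([[1, 1, 1, 1, 0, 0]])
B = np.array([[1, 1, 0, 0, 1, 0]])
lost, gained = range_change_fractions(P, B)
at_risk = most_at_risk(np.arange(10) / 10.0)
```

In the toy map half of the current range is lost while a quarter of its size is gained elsewhere, the kind of point that would sit above the 45 degree line in Figure 4.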

Figure 6: Spectral clustering of scenarios using individual species: (a) Acinonyx jubatus (cheetah), (b) Zapus princeps, (c) Procyon lotor, (d) Tupaia gracilis. We see mainly grouping by RCP for the first three, but a starkly different, mainly GCM driven clustering for the treeshew. This variety was found among animals: most species driven clusterings are strongly connected with RCP, but some are more connected with GCM. These differences between species further demonstrate that ecological information is important to interpret the differences between scenarios.

Figure 7: Scenario embedding and clustering using only animals in the riskiest quantiles. Clustering based only on these species most at risk puts an even higher emphasis on RCP than the clustering in Figure 5b. This further demonstrates the importance of accounting for ecological information; different subsets of ecological populations emphasise RCP even more strongly.

One way to qualitatively measure the performance of the clustering is to look at the spatial overlap of presences for each scenario. For example, the overlap of all 34 scenarios is shown in Figure 8a. We see that the cluster associated with the smallest RCP scenarios (“optimistic” scenarios) accounts for most of the presences in the discrepant regions, whereas the cluster associated with the highest RCP (“extreme” scenarios) accounts for many of the absences. This demonstrates that we have discovered meaningful clusters; there is agreement within clusters but disagreement across clusters.

We visualize the correlations of presence cells over scenarios using principal component analysis. This can be performed for a single species by considering each scenario as an observation, with $n_r \times n_c$ binary features corresponding to the vectorized (flattened) binary range map. The result of performing PCA in this way is shown for the cheetah in Figure 8b. The mix of positive and negative coordinates illustrates the fact that the most extreme scenarios do not only “lose” certain cells, but also predict novel presences.
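The two diagnostics discussed here, the overlap (sum) of presences across scenarios and a PCA with scenarios as observations and flattened cells as features, can be sketched with NumPy alone. The function names and the toy maps are hypothetical, and computing PCA via a singular value decomposition of the centered data is one standard choice rather than necessarily the one used for Figure 8.

```python
import numpy as np

def frequency_map(scenario_maps):
    """Number of scenarios predicting a presence in each cell."""
    return np.sum(np.stack(scenario_maps), axis=0)

def pca_scenarios(scenario_maps, n_components=1):
    """PCA with one observation per scenario and one feature per grid cell."""
    maps = np.stack(scenario_maps).astype(float)       # shape (n_s, n_r, n_c)
    X = maps.reshape(maps.shape[0], -1)                # flatten each map
    Xc = X - X.mean(axis=0)                            # center each cell
    U, sv, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * sv[:n_components]   # scenario coordinates
    loadings = Vt[:n_components].reshape((n_components,) + maps.shape[1:])
    explained = sv[:n_components] ** 2 / np.sum(sv ** 2)
    return scores, loadings, explained

# Toy example: four scenarios on a 2 x 2 grid, in two identical pairs.
map_a = np.array([[1, 1], [0, 0]])
map_b = np.array([[0, 1], [1, 0]])
toy_maps = [map_a, map_a, map_b, map_b]
F = frequency_map(toy_maps)
scores, loadings, explained = pca_scenarios(toy_maps)
```

The loading map of the first component, reshaped back to the grid, plays the role of the “First PC” panel: cells with loadings of opposite sign are those that one group of scenarios keeps and the other loses, or vice versa.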

We see that the climate-based clustering is significantly different from the ecological one: it is driven mainly by GCM, whereas the ecological clustering is driven mainly by RCP. This is evidence for the importance of considering the specific climate niche occupied by a species in relation to how those conditions are projected to change. Although these same five features are used to predict the ecological range maps, the predicted range maps paint a different picture of the scenario clustering because of how the animals are influenced by the climate features. In fact, we can get a sense of feature importance by clustering based on individual climate features, shown in Figure 3. We see that clustering using only the annual temperature creates the clustering most similar to the ecologically driven one, suggesting that annual temperature is the most important of these five features.

Figure 8: Sum of presences over scenarios: (a) All Scenarios, (b) First PC, (c) Optimistic Scenarios, (d) Extreme Scenarios. A meaningful clustering should have strong similarities within clusters and differences across clusters. The circled regions show that we have indeed discovered meaningful clusters; there is agreement within clusters in these regions, but differences across clusters.
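The sums of presences in Figure 8 could be computed along the following lines. This is a hedged sketch under stated assumptions: the helper names `presence_counts` and `discrepant_cells`, the toy 2 × 2 grid, and the cluster labels are all hypothetical, chosen only to show the bookkeeping. Binary maps are summed over all scenarios and over the scenarios within each cluster, and a cell is flagged as discrepant when some but not all scenarios predict a presence.

```python
import numpy as np


def presence_counts(range_maps, labels):
    """Sum binary range maps over all scenarios and within each cluster."""
    range_maps = np.asarray(range_maps, dtype=int)
    labels = np.asarray(labels)
    total = range_maps.sum(axis=0)
    per_cluster = {
        int(k): range_maps[labels == k].sum(axis=0) for k in np.unique(labels)
    }
    return total, per_cluster


def discrepant_cells(total, n_scenarios):
    """Cells where scenarios disagree: present in some, but not in all."""
    return (total > 0) & (total < n_scenarios)


# Toy example: 4 scenarios on a 2 x 2 grid; labels are illustrative only
# (0 stands in for an "optimistic" cluster, 1 for an "extreme" cluster).
maps = np.array([
    [[1, 1], [0, 1]],
    [[1, 1], [0, 0]],
    [[1, 0], [0, 0]],
    [[1, 0], [0, 0]],
])
labels = np.array([0, 0, 1, 1])

total, per_cluster = presence_counts(maps, labels)
mask = discrepant_cells(total, n_scenarios=len(maps))
print(total)           # overall sum of presences per cell
print(per_cluster[0])  # presences contributed by cluster 0
print(mask)            # True where the scenarios disagree
```

In this toy case all presences in the discrepant cells come from cluster 0, mirroring the qualitative check in the text: agreement within a cluster and disagreement across clusters.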

Conclusion

We have proposed a novel methodology to cluster scenarios based on ecological range maps. The presented approach is interpretable, flexible, and fast. We have demonstrated different patterns of clustering depending on which subsets of species are included. For instance, the animals most at risk group many of the higher RCP scenarios together. The differences between the climate-based clustering and the ecological-based clustering highlight the importance of considering ecological response; the interaction of climate and ecology is essential to understand the ecologically most important differences between future scenario predictions. An interesting direction to explore further is to uncover subsets of animals that respond differently than others. For instance, it may be the case that rodents tend to fare worse under the "bc" climate model, compared to other mammals. A similar area of future research is to determine why some species, like the slender treeshrew, cluster more by GCM instead of following the more common pattern of clustering by RCP. Are these species less sensitive to temperature changes? Another extension is to consider how to combine information across animals in a more holistic manner. For instance, Dong et al. (2013) present a methodology to cluster according to many graphs, which could be applied in our use case to the scenario graphs from each species.

References

Martin L Parry, Osvaldo Canziani, Jean Palutikof, Paul Van der Linden, and Clair Hanson. Climate change 2007-impacts, adaptation and vulnerability: Working group II contribution to the fourth assessment report of the IPCC, volume 4. Cambridge University Press, 2007.

Lee Hannah, Makihiko Ikegami, David G Hole, Changwan Seo, Stuart HM Butchart, A Townsend Peterson, and Patrick R Roehrdanz. Global climate change adaptation priorities for biodiversity and food security. PLoS one, 8(8):e72590, 2013.

Lee Hannah, Patrick R Roehrdanz, Pablo A Marquet, Brian J Enquist, Guy Midgley, Wendy Foden, Jon C Lovett, Richard T Corlett, Derek Corcoran, Stuart HM Butchart, et al. 30% land conservation and climate action reduces tropical extinction risk by more than 50%. Ecography, 2020.

Michael T Burrows, David S Schoeman, Anthony J Richardson, Jorge Garcia Molinos, Ary Hoffmann, Lauren B Buckley, Pippa J Moore, Christopher J Brown, John F Bruno, Carlos M Duarte, et al. Geographical limits to species-range shifts are suggested by climate velocity. Nature, 507(7493):492–495, 2014.

Miranda C Jones and William WL Cheung. Multi-model ensemble projections of climate change effects on global marine biodiversity. ICES Journal of Marine Science, 72(3):741–752, 2015.

Jorge García Molinos, Benjamin S Halpern, David S Schoeman, Christopher J Brown, Wolfgang Kiessling, Pippa J Moore, John M Pandolfi, Elvira S Poloczanska, Anthony J Richardson, and Michael T Burrows. Climate velocity and the future global redistribution of marine biodiversity. Nature Climate Change, 6(1):83–88, 2016.

Veronika Eyring, Sandrine Bony, Gerald A Meehl, Catherine A Senior, Bjorn Stevens, Ronald J Stouffer, and Karl E Taylor. Overview of the coupled model intercomparison project phase 6 (CMIP6) experimental design and organization. Geoscientific Model Development, 9(5):1937–1958, 2016.

Christopher H Trisos, Cory Merow, and Alex L Pigot. The projected timing of abrupt ecological disruption from climate change. Nature, 580(7804):496–501, 2020.

Filippo Giorgi and Raquel Francisco. Uncertainties in regional climate change prediction: a regional analysis of ensemble simulations with the hadcm2 coupled aogcm. Climate Dynamics, 16(2-3):169–182, 2000.

Alex J Cannon. Selecting gcm scenarios that span the range of changes in a multimodel ensemble: application to cmip5 climate extremes indices. Journal of Climate, 28(3):1260–1267, 2015.

Ronald J Stouffer, Veronika Eyring, Gerald A Meehl, Sandrine Bony, Cath Senior, Bjorn Stevens, and KE Taylor. CMIP5 scientific gaps and recommendations for CMIP6. Bulletin of the American Meteorological Society, 98(1):95–105, 2017.

Stephen E Fick and Robert J Hijmans. Worldclim 2: new 1-km spatial resolution climate surfaces for global land areas. International journal of climatology, 37(12):4302–4315, 2017.

Cory Merow, Matthew J Smith, and John A Silander Jr. A practical guide to maxent for modeling species' distributions: what it does, and why inputs and settings matter. Ecography, 36(10):1058–1069, 2013.

Jane Elith, Steven J Phillips, Trevor Hastie, Miroslav Dudík, Yung En Chee, and Colin J Yates. A statistical explanation of maxent for ecologists. Diversity and distributions, 17(1):43–57, 2011.

Joe Miller. Gbif home page, 2020. URL .

Hans Visser and T De Nijs. The map comparison kit. Environmental Modelling & Software, 21(3):346–358, 2006.

Peter D Wilson. Distance-based methods for the analysis of maps produced by species distribution models. Methods in Ecology and Evolution, 2(6):623–633, 2011.

Alex Hagen. Multi-method assessment of map similarity. In Proceedings of the 5th AGILE Conference on Geographic Information Science, pages 171–182. Universitat de les Illes Balears Palma, Spain, 2002.

Mark O Hill, Colin A Harrower, and Christopher D Preston. Spherical k-means clustering is good for interpreting multivariate species occurrence data. Methods in Ecology and Evolution, 4(6):542–551, 2013.

Emmanuel S Gritti, Anne Duputie, Francois Massol, and Isabelle Chuine. Estimating consensus and associated uncertainty between inherently different species distribution models. Methods in Ecology and Evolution, 4(5):442–452, 2013.

Gabriel Peyré, Marco Cuturi, et al. Computational optimal transport: With applications to data science. Foundations and Trends® in Machine Learning, 11(5-6):355–607, 2019.

Bart Kranstauber, Marco Smolla, and Kamran Safi. Similarity in spatial utilization distributions measured by the earth mover's distance. Methods in Ecology and Evolution, 8(2):155–160, 2017.

Ulrike Von Luxburg. A tutorial on spectral clustering. Statistics and computing, 17(4):395–416, 2007.

David L Davies and Donald W Bouldin. A cluster separation measure. IEEE transactions on pattern analysis and machine intelligence, 2(2):224–227, 1979.

I-Ching Chen, Jane K Hill, Ralf Ohlemüller, David B Roy, and Chris D Thomas. Rapid range shifts of species associated with high levels of climate warming. Science, 333(6045):1024–1026, 2011.

Fangliang He. Area-based assessment of extinction risk. Ecology, 93(5):974–980, 2012.

Judy P Che-Castaldo and Maile C Neel. Species-level persistence probabilities for recovery and conservation status assessment. Conservation Biology, 30(6):1297–1306, 2016.

Xiaowen Dong, Pascal Frossard, Pierre Vandergheynst, and Nikolai Nefedov. Clustering on multi-layer graphs via subspace analysis on Grassmann manifolds.