Geospatial distributions reflect rates of evolution of features of language
Henri Kauhanen, Deepthi Gopal, Tobias Galla, Ricardo Bermúdez-Otero
Henri Kauhanen *a,b, Deepthi Gopal a,c, Tobias Galla d, and Ricardo Bermúdez-Otero a

a Linguistics and English Language, School of Arts, Languages and Cultures, The University of Manchester, Oxford Road, Manchester, M13 9PL, UK
b Department of Linguistics, University of Konstanz, Universitätsstraße 10, 78464 Konstanz, Germany
c Department of Theoretical and Applied Linguistics, University of Cambridge, Sidgwick Avenue, Cambridge, CB3 9DA, UK
d Theoretical Physics, School of Physics and Astronomy, The University of Manchester, Oxford Road, Manchester, M13 9PL, UK
January 31, 2018
Abstract
Different structural features of human language change at different rates and thus exhibit different temporal stabilities. Existing methods of linguistic stability estimation depend upon the prior genealogical classification of the world's languages into language families; these methods result in unreliable stability estimates for features which are sensitive to horizontal transfer between families and whenever data are aggregated from families of divergent time depths. To overcome these problems, we describe a method of stability estimation without family classifications, based on mathematical modelling and the analysis of contemporary geospatial distributions of linguistic features. Regressing the estimates produced by our model against those of a genealogical method, we report broad agreement but also important differences. In particular, we show that our approach is not liable to some of the false positives and false negatives incurred by the genealogical method. Our results suggest that the historical evolution of a linguistic feature leaves a footprint in its global geospatial distribution, and that rates of evolution can be recovered from these distributions by treating language dynamics as a spatially extended stochastic process.
Keywords: linguistic typology; stability estimation; complex systems

* Corresponding author: [email protected]

1 Language change and linguistic stability
Languages differ from each other in respect of a finite number of structural features. These features determine how individual words are formed, how words are combined into phrases and sentences, and which sounds and sound sequences are available in a language. For example, some languages place the verb before the object (e.g. English Mary loves John), while others place the object before the verb (e.g. Turkish Mary John'u seviyor). Similarly, some languages employ a marker of definiteness (e.g. English the in the car) whereas others have no such device (e.g. Finnish ∅ auto). The set of all linguistic features defines the variation space of human language, with each individual language occupying a specific point in this space.

Languages are not immutable entities, however, but rather change over time through complex processes of cultural evolution. For instance, a number of languages have undergone change from object–verb to verb–object order [1–3], and several languages have independently innovated a definite marker [4]. Languages thus sometimes adopt features they formerly lacked, or lose features they formerly possessed. These processes of change may be either vertical (from ancestor to descendant, e.g. from Old English to Modern English) or horizontal (between contemporary geographically close languages in extensive contact, e.g. Norman French and Middle English), loosely paralleling the distinction between inheritance and horizontal gene transfer in biological evolution [5].

One of the most important findings of modern linguistics is that different linguistic features evolve at widely disparate rates. This fact calls for an explanation: it is, for example, not predicted by replicator-neutral models of cultural evolution, according to which cultural change is largely driven by random processes akin to genetic drift and neutral evolution in biology [6, 7]. Yet substantial evidence exists that certain linguistic features are more stable (harder to lose and harder to innovate) than others [8]. Although the causes of these differences remain poorly understood, efforts have been made to develop techniques for estimating the relative stabilities of individual features.
Current methods of stability estimation, however, depend upon assumptions of genealogy (linguistic relatedness) that incur serious problems, resulting in unreliable stability estimates, as we argue in detail below.

In this paper, we put forward the proposition that significant information about the rate of evolution of a linguistic feature is encoded in its global geospatial distribution. Building on early, qualitative work in linguistic typology [9], we suggest that features which are present in geographically scattered samples of languages are unstable (rapidly changing), while features which cluster together in geographical space are stable (slow to change). Thus, the contemporary geospatial distribution of a linguistic feature carries a signal about its past. Based on this idea, we develop a technique of linguistic stability estimation from geospatial distributions alone, using methods from statistical physics and without recourse to assumptions of genealogical language relatedness. Comparing the predictions of our technique against those of a genealogical method, we report broad agreement but also important differences: specifically, we show that our method is not liable to some of the false positives and false negatives incurred by the latter. We thus conclude that genealogical methods not only incorrectly predict the stability of certain problematic features, but may also be unnecessary: a model that relies solely on directly observable geospatial information fares no worse. More generally, our results demonstrate that significant information about the evolution of a linguistic feature is retained in its current geospatial distribution, and that it is possible to tap into this signal by treating language dynamics as a spatially extended stochastic process.

2 Problems of genealogical stability estimation
The majority of existing linguistic stability estimation techniques rely on the genealogical grouping of the world's languages into language families such as Indo-European, Uralic and Austronesian [8, 10–13]. These families are established using standard comparative reconstruction techniques [14]. Although implementation details vary, all genealogical stability estimation methods work on the basic assumption that stable linguistic features ought to be conserved within language families, unstable features exhibiting within-family variation instead. For instance, the basic order of verb and object is verb–object in all existing Romance languages [15]; verb–object order may thus be considered stable within the Romance family. On the other hand, the expression of the subject exhibits considerable variation among the Romance languages: many of these languages allow null subjects (e.g. Spanish ∅ voy 'I go'), but this option has been partially or fully lost in French, Rhaeto-Romance, Provençal, Northern Italo-Romance and Brazilian Portuguese, which require a pronoun or pronoun-like element (e.g. French je vais 'I go'). The possibility of dropping pronominal subjects is then a relatively unstable feature within the Romance languages. Moreover, aggregating data from several language families suggests that this conclusion holds universally: the basic order of verb and object tends to be a stable, and the expression of subjects an unstable, feature among the world's languages [8].

In general, the genealogical classification procedures used in modern linguistics are highly reliable: the family trees they yield are rarely in dispute, except in respect of the fine structure of otherwise uncontroversial families, or the very distant putative kinship relationships lying beyond the reach of the standard comparative method of linguistic reconstruction [14]. There is, in other words, little uncertainty as to which languages ought to belong to which language family.
The high reliability of linguistic genealogies, however, does not by itself render them an appropriate tool for estimating the stability of linguistic features. We here outline two limitations shared by all genealogical methods: the problem of time depths, and the problem of horizontal transfer.

Firstly, no agreement exists on how to estimate the time depths (or ages) of otherwise undisputed language families [16–18]. We illustrate this problem with the Uralic and Germanic family trees in Fig. 1. In each tree, each branching point corresponds to at least one linguistic innovation ('mutation'), whereby the daughter languages diverge from their ancestor in the specification of at least one linguistic feature. The degree of variation among the surviving members of a family must then increase with the age of the family (unless later changes precisely undo the effects of earlier ones, but this is rare). In our example trees, this is illustrated by the fact that while all surviving members of the Germanic family employ a definite marker, only about half of the surviving Uralic languages do so, reflecting the greater time depth of the latter family. The problem for genealogical stability estimation techniques arises from the fact that no agreed methods exist for establishing the ages of individual families; no way exists for controlling that the families employed in the estimation are of similar time depths. Some studies [8] have attempted to overcome this problem by making comparisons across the so-called genera identified in the World Atlas of Language Structures (WALS) [19]. This genealogical classification into genera is intended to yield groupings of comparable time depths, and indeed WALS treats Germanic as a single genus, whereas Uralic (like Indo-European) is seen as comprising several genera.
This expedient, however, mitigates but does not solve the problem: the WALS editors themselves describe WALS genera as 'highly tentative', 'based on meagre initial impressions' and consisting of no more than 'educated guesses' [19, 20].

[Figure 1 appears here: family trees of selected Uralic languages (Samoyed: Nenets, †Kamassian; Permic: Udmurt, Komi; Finnic: Modern Estonian, †Old Written Estonian, Finnish; Sami; Mordvin; Ugric: Hungarian, Mansi) and Germanic languages (East Germanic: †Gothic; West Germanic: German, English, Dutch; North Germanic: Icelandic, Swedish), annotated for definite marker vs. no definite marker and for vertical descent vs. horizontal transfer; † marks extinct languages.]

Figure 1. Some languages of the Uralic and Germanic families with respect to the presence of a definite marker (such as English the).

Secondly, genealogical stability estimation techniques necessarily miss out the effect of horizontal transfer (the borrowing of a feature from one family to another) on a feature's stability. For example, Old Written Estonian was in extensive contact with German and borrowed a definite marker from it [21]; this relationship is visualized as the purple arrow in Fig. 1. The problem for genealogical stability estimation methods arises from the fact that some linguistic features (e.g. inflectional markers) are more resistant to horizontal transfer than others [22], while some (e.g. case systems) are highly vulnerable to simplification in contact situations involving large numbers of second-language learners [23]. Combining genealogical and areal groupings [11] is not a solution, however, as no agreed methods exist for delimiting linguistic areas or for estimating the time depths of areal relationships.
The above criticisms motivate the search for a stability measure that reflects the relatedness of languages without presorting them into predefined groupings and can take horizontal transfer effects into account. Here, we propose such a technique by modelling language dynamics as a stochastic process on a spatial substrate; this model can be studied in computer simulations and mathematically. While the model dynamics continue indefinitely, the statistical properties of the distribution of features over physical space become stationary after the simulation has been run for a sufficiently long time. Using techniques from statistical physics, this stationary state can be characterized mathematically. In what follows, we show how this analytical solution can be utilized to estimate the tendency of individual linguistic features to change based solely on their contemporary geospatial distributions, measured from the WALS atlas [19].

Following an early but underexplored proposal of Greenberg's [9], we treat the evolution of binary linguistic features as a memoryless stochastic process. The dynamics of each feature are then given by a Markov chain with two parameters, p_I and p_E (Fig. 2).

[Figure 2 appears here: two-state Markov chain with transitions ingress (p_I), egress (p_E), non-ingress (1 − p_I) and non-egress (1 − p_E).]

Figure 2. Markov chain dynamics of a single linguistic feature. F: feature present; −F: feature absent.

The former parameter gives the probability of a language adopting the feature in question; we call it the feature's ingress rate. The latter parameter, in turn, gives the probability of a language losing the feature; we will refer to it as the feature's egress rate. We assume language communities to be distributed on a spatial substrate which, for reasons of mathematical tractability, we take to be a square lattice, i.e. language communities reside at regularly spaced positions over physical space. Each community is assumed to be subject to an ingress–egress dynamics as described by the Markov chain model of Fig. 2. To account for horizontal effects, we assume the existence of an interaction process between spatially contiguous language communities. This interaction process is based on the so-called voter model [24–28] and operates as follows. In each interaction event, a 'target' community is chosen, and the presence or absence of a feature in a randomly selected neighbouring community is copied into the target community. All in all, the model is iterated as follows:

1. Initialize the lattice in a random state (for each feature F and community C, F is present in C with probability 0.5).
2. Pick a random community C and a random feature F.
3. Execute one of the following steps:
   3a. with probability q: pick a random lattice neighbour C′ of C, and set the value for feature F in C to that in C′; or
   3b. with probability 1 − q: if F is absent from C, acquire F with probability p_I (ingress); if F is present in C, lose F with probability p_E (egress).
4. Go to 2.

Inevitably, this model idealizes away from many of the complexities of real-world language dynamics. What matters for present purposes is that the model should be able to capture two key elements contributing to language change.
First, the evolution of a linguistic community over time is subject to both vertical and horizontal effects: the vertical effects arise mainly from the transmission of linguistic knowledge across generations through language acquisition; the horizontal effects reflect not only contact between speakers of different languages, but also contact between speakers of varieties of the same language [29]. Secondly, both vertical and horizontal effects can result in the faithful transmission of a feature or in a mutation. For example, contact between speakers of different languages can result in the simple transfer of a feature (borrowing), but it can also result in mutation, as when the interaction between two languages with different inflectional systems leads to the emergence of a simplified system that is different from both its predecessors [23]. In some instances of mutation, horizontal and vertical effects interact in highly intricate ways, e.g. during the formation of creole languages [30]. Our model reflects this state of affairs: faithful transmission occurs both vertically, with probability (1 − q)(1 − p_I − p_E), and horizontally, with probability q. In turn, the ingress–egress dynamics (probabilities p_I and p_E) covers processes of mutation and is agnostic about their causes (vertical or horizontal).

[Figure 3 appears here: three lattice snapshots.]

Figure 3. In the model, linguistic features evolve on a square lattice with periodic boundary conditions. The figure shows three ad hoc illustrations (red: feature present; blue: feature absent). In each case the feature frequency ρ is the same; the isogloss density σ, defined as the proportion of disagreeing lattice interfaces (yellow dots), depends on the spatial distribution of the feature. It is low when a feature is present throughout extended domains, and larger when a feature is scattered.

The statistical properties of the stationary distribution of features at long times depend on the model parameters p_I, p_E and q (for further details see the SI). For the purposes of stability estimation, we are interested in two quantities in particular (illustrated in Fig. 3): the frequency, ρ, with which a particular feature is present across the lattice, and the feature's associated isogloss density. This latter quantity indicates the probability of finding a dialect boundary (an isogloss) between two neighbouring communities such that the feature is found on one side of the boundary but not on the other. We define this as the proportion of pairs of adjacent lattice cells that differ in the feature value, and denote it by σ; similar quantities are sometimes also found as the 'density of reactive interfaces' in the literature on interacting particle systems [25, 31]. The frequency of a feature in the stationary state is given by

    ρ = p_I / (p_I + p_E).    (1)

This can be demonstrated mathematically (see SI), but is also clear intuitively: in the stationary state ingress and egress events balance, (1 − ρ) p_I = ρ p_E, so the higher the ingress rate p_I of a feature is in relation to its egress rate p_E, the more prevalent the feature will be. Obtaining the stationary isogloss density is more intricate mathematically (see again the SI). We find

    σ = H(τ) ρ(1 − ρ)    (2)

with

    H(τ) = π(1 + τ) / K(1/(1 + τ)) − 2τ    (3)

and

    τ = (1 − q)(p_I + p_E) / q.    (4)

The function K(·) denotes the complete elliptic integral of the first kind (see SI for full technical details). Thus, from Eq. (2), the stationary-state isogloss density σ is found to be a parabolic function of the feature's overall frequency ρ. The height of this parabola is controlled by the parameter τ (Fig. 4A); this parameter gives the relative rate of ingress–egress events (i.e., mutations) over faithful transmission (Eq. 4). For example, a value of τ = 10^(−1) would indicate that faithful copying events between neighbouring communities are 10 times more frequent on average than mutations.

[Figure 4 appears here.]

Figure 4. (A) Statistical properties of the model. At long times the state of the lattice is characterized by the two quantities feature frequency ρ and isogloss density σ. We show computer simulations (markers) and analytical solution (curves) for different values of τ (determined by the combination of p_I, p_E and q). Simulation snapshots are shown for two different values of τ. (B) Empirical measurements of feature frequency ρ and isogloss density σ for 35 linguistic features, identified by their WALS feature IDs (see Methods for details).
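The lattice dynamics described above are straightforward to simulate. The following is a minimal single-feature sketch (since features evolve independently in the model, simulating one feature at a time loses nothing); all function and parameter names are ours, for illustration only, and do not reproduce the authors' own code:

```python
import random

def simulate(L=24, p_in=0.3, p_out=0.3, q=0.5, steps=400_000, seed=7):
    """Iterate the lattice dynamics for a single binary feature.

    Each step picks a random community; with probability q it copies the
    feature value of a random lattice neighbour (voter-model interaction),
    and with probability 1 - q it undergoes an ingress/egress ('mutation')
    event. Boundaries are periodic, as in the text."""
    rng = random.Random(seed)
    grid = [[rng.random() < 0.5 for _ in range(L)] for _ in range(L)]
    neighbours = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    for _ in range(steps):
        i, j = rng.randrange(L), rng.randrange(L)
        if rng.random() < q:
            di, dj = rng.choice(neighbours)
            grid[i][j] = grid[(i + di) % L][(j + dj) % L]
        elif grid[i][j]:
            if rng.random() < p_out:
                grid[i][j] = False  # egress
        elif rng.random() < p_in:
            grid[i][j] = True  # ingress
    return grid

def frequency(grid):
    """Feature frequency rho: fraction of communities possessing the feature."""
    return sum(map(sum, grid)) / float(len(grid) ** 2)

def isogloss_density(grid):
    """Isogloss density sigma: fraction of adjacent lattice pairs that disagree."""
    L = len(grid)
    diff = sum(grid[i][j] != grid[(i + 1) % L][j] for i in range(L) for j in range(L))
    diff += sum(grid[i][j] != grid[i][(j + 1) % L] for i in range(L) for j in range(L))
    return diff / (2.0 * L * L)
```

With p_I = p_E, Eq. (1) predicts a stationary frequency of 1/2 regardless of q, and the measured isogloss density can then be checked against the parabola σ = H(τ) ρ(1 − ρ).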
τ can thus be interpreted as a temperature, measuring the amount of noise in the dynamics: lower values of τ signify a stable feature, higher values indicating instability.

We next describe how the above mathematical solution may be used to obtain estimates of the stability of linguistic features using contemporary geospatial data. Our particular aim is to estimate the temperature parameter τ for individual features, using geospatial information contained in the WALS database [19], focusing on 35 binary features (see Methods). For each feature, the frequency ρ is given by the proportion of languages possessing that feature in the WALS language sample. The isogloss density σ is calculated as follows: for each language, we first establish its nearest geographical neighbour in the relevant language sample. The empirical isogloss density is then given by the proportion of nearest-neighbour language pairs differing in their values for the feature in question. Our data are summarized in Fig. 4B, which supplies ρ and σ for each of the 35 features; a lower value of isogloss density σ signals a geographically clustered feature, whilst a higher value implies a feature with a scattered geographical distribution. Fig. 5 illustrates this difference with two features, definite marker (WALS feature 37A) and order of object and verb (WALS feature 83A).

[Figure 5 appears here.]

Figure 5. Empirical geospatial distributions of two linguistic features on the hemisphere from 30° W to 150° E (red: feature present, blue: feature absent): (A) definite marker (WALS feature 37A), (B) object–verb order (WALS feature 83A). Shown are both individual empirical data points (languages, as given by WALS coordinates) and a spatial interpolation (inverse distance weighting) on these points. Map projection: Albers equal-area.

Table 1. The seven least stable and the seven most stable features in our dataset, ranked by decreasing temperature τ, as identified from WALS data using our model (see text; τ estimates for the full dataset appear in Table S2 in the SI).

Least stable:
  verbal person marking (100A)
  definite marker (37A)
  question particle (92A)
  gender in independent personal pronouns (44A)
  indefinite marker (38A)
  lateral consonants (8A)
  front rounded vowels (11A)
Most stable:
  velar nasal (9A)
  tone (13A)
  order of adjective and noun is AdjN (87A)
  order of subject and verb is SV (82A)
  order of genitive and noun is GenN (86A)
  order of numeral and noun is NumN (89A)
  order of object and verb is OV (83A)

For a given feature frequency ρ, the isogloss density σ is fixed by the value of H(τ) (Eq. 2); this quantity itself is an increasing function of τ (Eq. 3). Since each of our 35 empirical features lies on a unique parabola in the space spanned by ρ and σ (Fig. 4), estimating its temperature is now a matter of inverting the function H(τ). For each feature, we measure ρ and σ as described above. From Eq. (2), we then obtain the value of H(τ) and invert this to recover τ. Although the elliptic integral in Eq. (3) cannot be expressed in terms of elementary functions and H(τ) thus cannot be inverted analytically, the inversion can be performed numerically. Using this procedure we obtain an estimate of τ for any feature for which empirical measurements of frequency ρ and isogloss density σ exist. Table 1 supplies these estimates for the least stable and most stable features in our dataset (for a full listing of τ estimates for the entire dataset, see Table S2 in the SI).

The technique proposed by Dediu [12] represents the state of the art in genealogy-based stability estimation.
Using a Bayesian phylogenetic algorithm, this method produces a posterior distribution of rates of evolution for each linguistic feature within a predefined genealogical grouping. Dediu tests two phylogenetic algorithms and draws data from two sources, WALS and Ethnologue [32], to control for implementation effects. His stability estimates are then expressed as the additive inverse of the first component (PC1) of a principal component analysis on the stability ranks predicted by each combination of phylogenetic algorithm and dataset (i.e. the higher the PC1 value, the less stable the feature).

In Fig. 6A, we consider the 24 features which are both in our list of 35 features and in Dediu's list. Regressing our estimates for τ against Dediu's PC1 (red regression line), we find a moderate correlation between the estimates predicted by the two methods (Fig. 6A). To detect outliers, we then pruned the dataset recursively, at each step removing the data point that contributes most to the regression error, quantified as the sum of squared residuals; Fig. 6B gives the reduction of error at each step of this pruning.

[Figure 6 appears here.]

Figure 6. Regression of our temperature (τ) estimates against Dediu's PC1. (A) Regression with all 24 data points (red line) and with the five outliers removed (black line). (B) Outliers were detected by pruning the dataset recursively for those data points that contribute most to the regression error, quantified as the sum of squared residuals. This identified features 11A, 107A, 8A, 44A and 57A as outliers (see text).

The reduction profile prompts us to classify as outliers the first five data points, corresponding to the following WALS features: 11A (front rounded vowels), 107A (passive construction), 8A (lateral consonants), 44A (gender in independent personal pronouns) and 57A (possessive affixes). Regressing the pruned dataset (Fig. 6A, black regression line), we find a high correlation between our τ estimates and Dediu's PC1.

We suggest that, rather than representing different views on stability, these outliers are false positives and negatives of the genealogical method. We illustrate this with the case of (the presence or absence of) front rounded vowels (WALS feature 11A), i.e. the vowels /y/ (e.g. Finnish kyy), /ø/ (German schön), /œ/ (French bœuf) and /ɶ/ (Danish grøn). This is one of the most stable features in the genealogical analysis* but one of the least stable features in ours (Table 1); we argue that evidence from both language change and language acquisition supports our position.
On the one hand, front rounded vowels are frequently innovated: historical fronting of the back rounded vowel /u/ to [y] (with or without subsequent phonemicization to /y/) has been documented in a number of languages, including but not limited to Armenian, Attic-Ionic Greek, French, Frisian, Lithuanian, Old Scandinavian, Oscan, Parachi, Umbrian, West Syriac, Yiddish, Zuberoan Basque, and numerous dialects of English [33–39]; additionally, front rounded vowels can arise through the influence of /i/ or /j/ on a neighbouring back rounded vowel [33, 40]. On the other hand, front rounded vowels are difficult to acquire in situations of language contact: there is experimental evidence that second-language learners whose first language lacks these vowels perceive them as more similar to back vowels than front vowels [41]. This perceptual assimilation is mirrored in speech production: productions of /y/ by second-language learners are far less advanced in […]

[…] (e.g. Finnish auto-ni 'my car', auto-si 'your car', etc.), and the appearance of this system of possession can be dated back to Pre-Proto-Uralic by standard reconstructive techniques [43]. Possessive affixes are also old in the unrelated Turkic language family [44]. There is, then, reason to believe that WALS feature 57A on possessive affixes is a false negative of the genealogical method, which classifies possessive affixes as one of the least stable features (Fig. 6).

* Front rounded vowels are the fourth most stable feature (out of 86) in Dediu's study (Ref. [12], Table S4) and the second most stable (out of 62) in Dediu and Cysouw's metastudy (Ref. [8], Table 7).
The challenge of linguistic stability estimation arises, essentially, from having to work with a poor signal. Although evolutionary and anthropological evidence suggests that human language in its modern form has existed for at least 100,000 years [45, 46], the historical evolution of languages is (necessarily) poorly documented. Such documentation only captures a few thousand years for languages with the best coverage, cannot in principle go beyond the introduction of the first writing systems, and does not exist at all for the majority of the world's languages. The rest of the historical evolution of language must be reconstructed based on available data. In this paper, we have argued that stability estimation methods relying on even the most accurate reconstructive methods have their shortcomings: there is no guarantee that the genealogical classifications assumed in such estimation reflect equivalent time depths between different language families, and the methods do not control for horizontal transfer between languages belonging to different families. We have introduced a stability estimation procedure that does without genealogies and infers stability estimates from contemporary geospatial distributions of linguistic features alone. The method is relatively easy to implement: all that is needed are measures of feature frequency and isogloss density for a large enough sample of languages, and inversion of Eq. (3).

We have offered some evidence in support of our method, in the sense that this method is not liable to some of the false positives and false negatives incurred by genealogical methods. Turning now to its limitations, we note that our approach currently only applies to binary features, i.e. features which are either present in or absent from a language community. Most genealogical methods do not suffer from this limitation: Dediu's [12] procedure, in particular, can be applied to polyvalent as well as binary features.
Interestingly, however, Dediu finds a correlation between estimates generated for polyvalent and binary (or binarized) features. This suggests that the resolution at which the values of a linguistic variable are recorded may be a minor issue: after all, any polyvalent classification can be reduced to a hierarchy of binary oppositions. Another limitation of our technique, shared by all existing methods, is that it treats the evolution of individual features independently of the rest. Feature interactions are known to exist, however; for example, a language which places objects before verbs is far more likely to also place adverbs before verbs, rather than after them [47, 48]. It is in principle possible to generalize our method to cater for polyvalent features and feature interactions, by extending the lattice model in the direction of the Axelrod model of cultural dissemination [49]. It is at the moment unclear, however, whether the behaviour of such a generalized model can be solved analytically so that stability estimates may be derived in the way we have described.

While these remain tasks for future research, the present study enables us to conclude that, though hitherto underexploited, geospatial distributions provide one of the best sources of evidence on the rates of evolution of features of human language.
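To make the implementation burden concrete, the full estimation pipeline (nearest geographical neighbour per language, empirical ρ and σ, numerical inversion of H) can be sketched in a few dozen lines. This is an illustration under the stationary solution of Eqs. (2)–(4), with H(τ) = π(1 + τ)/K(1/(1 + τ)) − 2τ and the complete elliptic integral K evaluated via the arithmetic-geometric mean; all function names are ours, not the authors':

```python
import math

def ellip_k(k, tol=1e-12):
    """Complete elliptic integral of the first kind K(k), modulus k,
    via the arithmetic-geometric mean: K(k) = pi / (2 * AGM(1, sqrt(1 - k^2)))."""
    a, b = 1.0, math.sqrt(1.0 - k * k)
    while abs(a - b) > tol:
        a, b = (a + b) / 2.0, math.sqrt(a * b)
    return math.pi / (2.0 * a)

def H(tau):
    """Height of the isogloss-density parabola (Eq. 3)."""
    return math.pi * (1.0 + tau) / ellip_k(1.0 / (1.0 + tau)) - 2.0 * tau

def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, assuming a perfectly spherical earth."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2.0) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2.0) ** 2)
    return 2.0 * 6371.0 * math.asin(math.sqrt(a))

def rho_sigma(langs):
    """langs: list of (lat, lon, has_feature) triples.
    Returns the feature frequency rho and the empirical isogloss density sigma,
    i.e. the proportion of nearest-neighbour language pairs that disagree."""
    rho = sum(1 for _, _, v in langs if v) / len(langs)
    disagree = 0
    for i, (lat, lon, v) in enumerate(langs):
        nn = min((j for j in range(len(langs)) if j != i),
                 key=lambda j: haversine(lat, lon, langs[j][0], langs[j][1]))
        disagree += (v != langs[nn][2])
    return rho, disagree / len(langs)

def estimate_tau(rho, sigma, lo=1e-9, hi=1e9, iters=200):
    """Solve sigma = H(tau) * rho * (1 - rho) for tau. H is strictly
    increasing, so bisection converges; the midpoint is geometric because
    tau spans many orders of magnitude. Targets outside (H(lo), H(hi))
    are clamped to the bracket ends."""
    target = sigma / (rho * (1.0 - rho))
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if H(mid) < target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)
```

A round trip (pick τ, generate σ from Eq. (2), re-estimate τ) recovers the input, and feeding in WALS-style coordinate/value triples yields a τ estimate per feature.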
Methods
Data preparation
The WALS Online database [19] was downloaded on 18 July 2017 and used as the empirical basis for measures of feature frequency ρ and isogloss density σ. Since WALS employs a polyvalent coding for most features, a manual pass was made, retaining only those features that are binary or binarizable on uncontroversial linguistic grounds. Features with fewer than 300 languages in their language sample were discarded to ensure statistically robust results. This procedure resulted in 35 binary features (see Table S1 in the SI for a listing, together with full information about our binarization scheme). Nearest neighbours of languages were determined by the great-circle distance, calculated from WALS coordinate data using the haversine formula with the assumption of a perfectly spherical earth.

Analysis
To eliminate any possible effect the differing language sample sizes of different WALS features might have on our statistics, a fixed number of 300 languages was considered for each feature, with languages sampled uniformly at random from the feature's language sample. This procedure was repeated 10,000 times for each feature to generate the bootstrap averages reported in Fig. 4B. For comparison with the genealogical method, Table S1 in the SI to Ref. [12] was consulted and only those features were selected for comparison for which our binarization schemes agreed; the PC1 values for the intersecting features were then gathered from Table S4. Correlation strengths were measured using the Pearson correlation coefficient; significance was tested with a two-tailed t-test. The analytical solution of the lattice model (Eqs. 1–3) appears in the SI.

Data availability
WALS is freely available at http://wals.info; the binarization scheme used to prepare the data appears in the SI.
Code availability
Computer code for data analysis, stability estimation and numerical simulations may be obtained from the corresponding author.
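As the analysis code itself is available on request, the core statistics described above can be sketched in a few lines. The following is an illustration rather than the published implementation: it assumes each language is a (binary value, latitude, longitude) triple, takes σ to be the fraction of languages whose nearest neighbour (by great-circle distance) carries the opposite feature value — the empirical analogue of the lattice-model definition — and runs on synthetic data.

```python
import math
import random

def haversine(lat1, lon1, lat2, lon2, radius=6371.0):
    """Great-circle distance in km, assuming a perfectly spherical earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))

def rho_sigma(sample):
    """Feature frequency rho and isogloss density sigma for a list of
    (value, lat, lon) triples with value in {0, 1}; sigma is taken to be
    the fraction of languages whose nearest neighbour disagrees with them."""
    n = len(sample)
    rho = sum(v for v, _, _ in sample) / n
    disagree = 0
    for i, (v, lat, lon) in enumerate(sample):
        # brute-force nearest-neighbour search by great-circle distance
        j = min((k for k in range(n) if k != i),
                key=lambda k: haversine(lat, lon, sample[k][1], sample[k][2]))
        disagree += v != sample[j][0]
    return rho, disagree / n

def bootstrap(languages, n=300, reps=100, seed=1):
    """Average rho and sigma over `reps` uniform subsamples of size n."""
    rng = random.Random(seed)
    stats = [rho_sigma(rng.sample(languages, n)) for _ in range(reps)]
    return (sum(r for r, _ in stats) / reps,
            sum(s for _, s in stats) / reps)

if __name__ == "__main__":
    # synthetic stand-in for one WALS feature's language sample
    rng = random.Random(0)
    langs = [(int(rng.random() < 0.5), rng.uniform(-60.0, 70.0),
              rng.uniform(-180.0, 180.0)) for _ in range(400)]
    print(bootstrap(langs, n=300, reps=3))
```

The paper's 10,000 bootstrap repetitions are reduced to 3 here to keep the toy run fast, and the O(n²) neighbour search would be replaced by a spatial index at scale.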
References

[1] Hróarsdóttir, T. Word order change in Icelandic: from OV to VO (John Benjamins, Amsterdam, 2000).
[2] Pintzuk, S. & Taylor, A. The loss of OV order in the history of English. In van Kemenade, A. & Los, B. (eds.) The handbook of the history of English, 249–278 (Blackwell, Malden, MA, 2006).
[3] Zaring, L. On the nature of OV and VO order in Old French. Lingua, 1831–1852 (2011).
[4] De Mulder, W. & Carlier, A. The grammaticalization of definite articles. In Narrog, H. & Heine, B. (eds.) The Oxford handbook of grammaticalization, 522–534 (Oxford University Press, Oxford, 2011).
[5] Ochman, H., Lawrence, J. G. & Groisman, E. A. Lateral gene transfer and the nature of bacterial innovation. Nature, 299–304 (2000).
[6] Bentley, R. A., Hahn, M. W. & Shennan, S. J. Random drift and culture change. Proceedings of the Royal Society of London, Series B, 1443–1450 (2004).
[7] Bentley, R. A., Lipo, C. P., Herzog, H. A. & Hahn, M. W. Regular rates of popular culture change reflect random copying. Evolution and Human Behavior, 151–158 (2007).
[8] Dediu, D. & Cysouw, M. Some structural aspects of language are more stable than others: a comparison of seven methods. PLOS One, e55009 (2013).
[9] Greenberg, J. H. Diachrony, synchrony, and language universals. In Greenberg, J. H. (ed.) Universals of human language, vol. 1, 61–91 (Stanford University Press, Stanford, CA, 1978).
[10] Maslova, E. Dinamika tipologičeskix raspredelenij i stabil'nost' jazykovyx tipov [The dynamics of typological distributions and the stability of language types]. Voprosy jazykoznanija, 3–16 (2004).
[11] Parkvall, M. Which parts of language are the most stable? STUF – Language Typology and Universals, 234–250 (2008).
[12] Dediu, D. A Bayesian phylogenetic approach to estimating the stability of linguistic features and the genetic biasing of tone. Proceedings of the Royal Society B, 474–479 (2011).
[13] Greenhill, S. J. et al. Evolutionary dynamics of language systems. Proceedings of the National Academy of Sciences, E8822–E8829 (2017).
[14] Bowern, C. & Evans, B. (eds.) The Routledge handbook of historical linguistics (Routledge, Abingdon, 2014).
[15] Ledgeway, A. Syntactic and morphosyntactic typology and change. In The Cambridge history of the Romance languages, vol. 1, 382–471 (Cambridge University Press, Cambridge, 2010).
[16] McMahon, A. & McMahon, R. Language classification by numbers (Oxford University Press, Oxford, 2005).
[17] Pereltsvaig, A. & Lewis, M. W. The Indo-European controversy: facts and fallacies in historical linguistics (Cambridge University Press, Cambridge, 2015).
[18] Bowern, C. The Indo-European controversy and Bayesian phylogenetic methods. Diachronica, 421–436 (2017).
[19] Dryer, M. S. & Haspelmath, M. (eds.) WALS Online (Max Planck Institute for Evolutionary Anthropology, Leipzig, 2013). URL http://wals.info/.
[20] Dryer, M. S. Large linguistic areas and language sampling. Studies in Language, 257–292 (1989).
[21] Metslang, H. Some grammatical innovations in the development of Estonian and Finnish: forced grammaticalization. Linguistica Uralica, 241–256 (2011).
[22] Gardani, F., Arkadiev, P. & Amiridze, N. (eds.) Borrowed morphology (De Gruyter, Berlin, 2014).
[23] Bentz, C. & Winter, B. Languages with more second language learners tend to lose nominal case. Language Dynamics and Change, 1–27 (2013).
[24] Clifford, P. & Sudbury, A. A model for spatial conflict. Biometrika, 581–588 (1973).
[25] Krapivsky, P. L. Kinetics of monomer-monomer surface catalytic reactions. Physical Review A, 1067–1072 (1992).
[26] Liggett, T. M. Stochastic models of interacting systems. The Annals of Probability, 1–29 (1997).
[27] Castellano, C., Fortunato, S. & Loreto, V. Statistical physics of social dynamics. Reviews of Modern Physics, 591–646 (2009).
[28] Fernández-Gracia, J., Suchecki, K., Ramasco, J. J., San Miguel, M. & Eguíluz, V. M. Is the voter model a model for voters? Physical Review Letters, 158701 (2014).
[29] Labov, W. Transmission and diffusion. Language, 344–387 (2007).
[30] DeGraff, M. Language acquisition in creolization and, thus, language change: some Cartesian–Uniformitarian boundary conditions. Language and Linguistics Compass, 888–971 (2009).
[31] Krapivsky, P. L., Redner, S. & Ben-Naim, E. A kinetic view of statistical physics (Cambridge University Press, Cambridge, 2010).
[32] Lewis, M. P., Simons, G. F. & Fennig, C. D. (eds.) Ethnologue: languages of the world (SIL International, Dallas, TX, 2016).
[33] Ohala, J. J. The listener as a source of sound change. In Masek, C. S., Hendrick, R. A. & Miller, M. F. (eds.) Papers from the parasession on language and behavior, Chicago Linguistic Society 17, 178–203 (Chicago Linguistic Society, Chicago, IL, 1981).
[34] Labov, W., Yaeger, M. & Steiner, R. A quantitative study of sound change in progress (U.S. Regional Survey, Philadelphia, PA, 1972).
[35] Dressler, W. U. Diachronic puzzles for Natural Phonology. In Bruck, A., Fox, R. A. & La Galy, M. W. (eds.) Papers from the Parasession on Natural Phonology, 95–102 (Chicago Linguistic Society, Chicago, IL, 1974).
[36] Lass, R. Vowel shifts, great and otherwise: remarks on Stockwell and Minkova. In Kastovsky, D. & Bauer, G. (eds.) Luick revisited, 395–410 (Gunter Narr, Tübingen, 1988).
[37] Labov, W. Principles of linguistic change, vol. 1 (Blackwell, Oxford, 1994).
[38] Egurtzegi, A. Phonetically conditioned sound change: contact induced /u/-fronting in Zuberoan Basque. Diachronica, 331–367 (2017).
[39] Samuels, B. D. Vocalic shifts in Attic-Ionic Greek. Papers in Historical Phonology, 88–115 (2017).
[40] Iverson, G. K. & Salmons, J. C. The ingenerate motivation of sound change. In Hickey, R. (ed.) Motives for language change, 199–212 (Cambridge University Press, Cambridge, 2003).
[41] Strange, W., Levy, E. S. & Law, F. F., II. Cross-language categorization of French and German vowels by naïve American listeners. The Journal of the Acoustical Society of America, 1461–1476 (2009).
[42] Rochet, B. L. Perception and production of second-language speech sounds by adults. In Strange, W. (ed.) Speech perception and linguistic experience: issues in cross-language research, 379–410 (York Press, Baltimore, MD, 1995).
[43] Janhunen, J. On the structure of Proto-Uralic. Finnisch-Ugrische Forschungen, 23–42 (1982).
[44] Erdal, M. A grammar of Old Turkic (Brill, Leiden, 2004).
[45] Bickerton, D. Language evolution: a brief guide for linguists. Lingua, 510–526 (2007).
[46] Tallerman, M. Protolanguage. In Tallerman, M. & Gibson, K. R. (eds.) The Oxford handbook of language evolution, 479–491 (Oxford University Press, Oxford, 2012).
[47] Greenberg, J. H. Some universals of grammar with particular reference to the order of meaningful elements. In Greenberg, J. H. (ed.) Universals of human language, 73–113 (MIT Press, Cambridge, MA, 1963).
[48] Dryer, M. S. Significant and non-significant implicational universals. Linguistic Typology, 108–128 (2003).
[49] Axelrod, R. The dissemination of culture: a model with local convergence and global polarization. Journal of Conflict Resolution, 203–226 (1997).
[50] de Oliveira, M. J. Linear Glauber model. Physical Review E, 066101 (2003).
[51] Morita, T. Useful procedure for computing the lattice Green's function: square, tetragonal, and bcc lattices. Journal of Mathematical Physics, 1744–1747 (1971).
[52] Fernow, R. C. Principles of magnetostatics (Cambridge University Press, Cambridge, 2016).
Author contributions
Designed the study: HK, DG, TG and RB-O. Analysed the data: HK, DG, TG and RB-O. Solved the mathematical model: HK, DG and TG. Wrote the paper: HK, DG, TG and RB-O. Wrote visualization routines: HK and DG. Wrote the data analysis and simulation code: HK.
Author information
The authors declare no conflict of interest and no competing financial interests. Correspondence and requests for materials should be addressed to [email protected].
Acknowledgements
We thank Dan Dediu, Danna Gifford and George Walkden for comments and discussions. HK acknowledges financial support from the Emil Aaltonen Foundation and The Ella and Georg Ehrnrooth Foundation.

Supplementary information
A Features consulted
Table S1 provides a listing of the 35 WALS [19] features consulted in this study, together with our scheme for feature binarization. Each WALS feature is a variable of either nominal or ordinal level, whose possible values are recorded in the WALS database using integer labels. The meanings of these labels are explained at length in Section D, below; Table S1 indicates which values of each variable were folded into the 'feature absent' category and which into the 'feature present' category in our binarization.
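Concretely, the binarization can be implemented as a lookup from WALS integer value labels to a binary value, with excluded labels (the 'N/A' cases of Section D) dropping a language from the feature's sample. The sketch below is illustrative rather than the authors' code; the two entries shown follow Table S1, and the language values in the usage example are hypothetical.

```python
# For each WALS feature, the set of value labels folded into 'absent' (0)
# and the set folded into 'present' (1); labels in neither set are excluded.
BINARIZATION = {
    # 48A ('Person Marking on Adpositions'): value 1 = 'No adpositions'.
    "48A": ({1}, {2, 3, 4}),
    # 83A ('Order of Object and Verb'): 1 = OV, 2 = VO,
    # 3 = 'No dominant order' (excluded, hence in neither set).
    "83A": ({2}, {1}),
}

def binarize(feature, value):
    """Return 1 ('present'), 0 ('absent'), or None (excluded)."""
    absent, present = BINARIZATION[feature]
    if value in present:
        return 1
    if value in absent:
        return 0
    return None

def feature_sample(feature, languages):
    """Keep only languages with a usable binarized value."""
    out = {}
    for lang, value in languages.items():
        b = binarize(feature, value)
        if b is not None:
            out[lang] = b
    return out
```

For instance, `feature_sample("83A", {"lang1": 2, "lang2": 1, "lang3": 3})` keeps lang1 (VO, feature absent) and lang2 (OV, feature present) and drops lang3 ('no dominant order').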
B Temperature estimates
Table S2 gives the temperature (τ) estimates found by our method for the 35 features.

C Analytical solution of lattice model
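Before the derivation itself, it may help to see its end product in executable form: Eqs. (S52)–(S53) below give σ = H(τ)ρ(1 − ρ), so a temperature estimate can be obtained by numerically inverting H at the measured value of σ/(ρ(1 − ρ)). The sketch below is not the authors' estimation code (their procedure may differ, e.g. by operating on bootstrap distributions); it evaluates the complete elliptic integral via the arithmetic–geometric mean so that no external libraries are needed.

```python
import math

def ellipk(k):
    """Complete elliptic integral of the first kind with modulus k
    (0 <= k < 1), via the arithmetic-geometric mean:
    K(k) = pi / (2 * agm(1, sqrt(1 - k^2)))."""
    a, b = 1.0, math.sqrt(1.0 - k * k)
    while abs(a - b) > 1e-15:
        a, b = (a + b) / 2.0, math.sqrt(a * b)
    return math.pi / (2.0 * a)

def H(tau):
    """Parabola height of Eq. (S53): H(tau) = pi(1 + tau)/K(1/(1 + tau)) - 2 tau.
    H increases monotonically from 0 (tau -> 0) to 2 (tau -> infinity)."""
    return math.pi * (1.0 + tau) / ellipk(1.0 / (1.0 + tau)) - 2.0 * tau

def estimate_tau(rho, sigma, lo=1e-300, hi=1e6):
    """Invert sigma = H(tau) * rho * (1 - rho) for tau; bisection is done
    on a log scale, since temperature estimates span many decades."""
    target = sigma / (rho * (1.0 - rho))
    if not 0.0 < target < 2.0:
        raise ValueError("sigma / (rho * (1 - rho)) must lie in (0, 2)")
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if H(mid) < target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)
```

Applied to the first row of Table S2 (ρ = 0.50352113, σ = 0.12087912), this inversion returns a τ of the order of 10⁻⁵, in line with the tabulated estimate.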
We will treat the model as a spin system on a two-dimensional regular square lattice with N = L × L sites and periodic boundary conditions (for comparable approaches to the voter model without an ingress–egress dynamics, see Refs. [25, 31]). Our model is conceptually similar to a voter model with noise, which has been treated with similar methods in Ref. [50]. We write s(x) ∈ {−1, +1} for the spin at lattice site x = (x₁, x₂); S(x) = ⟨s(x)⟩ for the average spin of x (over realizations of the stochastic process); and m = Σ_x S(x)/N for the mean magnetization over the entire lattice. The feature frequency ρ, or fraction of up-spins in the system, is related to m by the identity ρ = (m + 1)/2. We further write S(x, y) = ⟨s(x)s(y)⟩ for the pair correlation of s(x) and s(y). In summations, x′ is understood to index the set of von Neumann neighbours of site x, i.e. the set

    {(x₁ − 1, x₂), (x₁ + 1, x₂), (x₁, x₂ − 1), (x₁, x₂ + 1)}.    (S1)

C.1 Spin flip probability

Central to our analytical derivations is the spin flip probability, i.e. the probability with which the spin at site x changes its state from −1 to +1 or vice versa:

    w(x) = (1 − q) A(x) + q B(x),    (S2)

where A(x) is the contribution of the ingress–egress process and B(x) the contribution of the spatial (voter) process. These are

    A(x) = [(1 − s(x))/2] p_I + [(1 + s(x))/2] p_E = (1/2)[p_I + p_E + (p_E − p_I)s(x)]    (S3)

and

    B(x) = Σ_{x′} [1 − s(x)s(x′)]/8 = (1/2)[1 − (1/4) Σ_{x′} s(x)s(x′)],    (S4)

where x′ runs over the four nearest neighbours of x. Hence we have

    w(x) = [(1 − q)/2][p_I + p_E + (p_E − p_I)s(x)] + (q/2)[1 − (1/4) Σ_{x′} s(x)s(x′)].    (S5)

C.2 Stationary-state feature frequency ρ

The spin at x changes by the amount −2s(x) with probability (1/N)w(x), the prefactor 1/N representing the probability of x being picked for update. Consequently, the mean spin S(x) = ⟨s(x)⟩ evolves as

    S(x, t + Δt) − S(x, t) = (1/N)⟨−2 w(x) s(x)⟩,    (S6)

where Δt is the time step associated with each attempted spin flip. Bearing in mind that s(x)s(x) = 1, this yields

    S(x, t + Δt) − S(x, t) = (1/N){(1 − q)[p_I − p_E − (p_I + p_E)S(x)] + q[−S(x) + (1/4) Σ_{x′} S(x′)]}.    (S7)

Taking the sum over all sites x, one has

    m(t + Δt) − m(t) = (1/N){(1 − q)(p_I − p_E) − [(1 − q)(p_I + p_E)/N] Σ_x S(x) − (q/N) Σ_x S(x) + [q/(4N)] Σ_x Σ_{x′} S(x′)}.    (S8)

Now Σ_x Σ_{x′} S(x′) = 4 Σ_x S(x), since the LHS is the sum over the four von Neumann neighbours of all lattice sites, so that each site, having four neighbours, gets counted four times. Using m = Σ_x S(x)/N, we then have

    m(t + Δt) − m(t) = (1/N)(1 − q)[p_I − p_E − (p_I + p_E)m].    (S9)

With the standard choice Δt = 1/N, and taking the limit N → ∞ (i.e. the continuous-time limit Δt → 0), this becomes

    dm/dt = (1 − q)[p_I − p_E − (p_I + p_E)m].    (S10)

Hence the mean magnetization in the stationary state (dm/dt = 0) is

    m = (p_I − p_E)/(p_I + p_E).    (S11)

From this, using ρ = (m + 1)/2, we find

    ρ = p_I/(p_I + p_E)    (S12)

for the fraction of up-spins in the stationary state.

C.3 Pair correlation function

To compute the pair correlation S(x, y) = ⟨s(x)s(y)⟩, we note that s(x)s(y) changes by the amount −2s(x)s(y) if either x or y flips spin. Assuming x ≠ y, and working directly in the continuous-time limit, we have

    dS(x, y)/dt = ⟨−2[w(x) + w(y)] s(x)s(y)⟩.    (S13)

After some algebra we find

    dS(x, y)/dt = (1 − q){(p_I − p_E)[S(x) + S(y)] − 2(p_I + p_E)S(x, y)} + q[−2S(x, y) + (1/4) Σ_{x′} S(x′, y) + (1/4) Σ_{y′} S(x, y′)],    (S14)

where the summation over y′ is over the four nearest neighbours of y. We now assume translation invariance and write C(r) = C(x − y) = S(x, y). Then, for r ≠ 0,

    dC(r)/dt = (1 − q){(p_I − p_E)[S(x) + S(y)] − 2(p_I + p_E)C(r)} + q[−2C(r) + (1/4) Σ_{x′} C(x′ − y) + (1/4) Σ_{y′} C(x − y′)].    (S15)

Due to translation invariance the two summations on the RHS coincide, and we have (always restricting r ≠ 0)

    dC(r)/dt = (1 − q){(p_I − p_E)[S(x) + S(y)] − 2(p_I + p_E)C(r)} + (q/2) ΔC(r),    (S16)

where Δ is the lattice Laplace operator

    Δf(x) = −4f(x) + Σ_{x′} f(x′).    (S17)

We note that we always have the boundary condition C(0, t) = 1 for all t (self-correlation is 1 at all times, as s(x)² = 1).

C.4 Stationary-state isogloss density σ

If C(e₁, t) were known, where e₁ is the unit vector (1, 0), the isogloss density could be obtained via the identity

    σ(t) = (1/2)[1 − C(e₁, t)].    (S18)

Thus, knowing the limiting (t → ∞) value of C(e₁) would yield the stationary-state isogloss density σ. At the stationary state, S(x) + S(y) = 2m and dC(r)/dt = 0. Assuming r ≠ 0, Eq. (S16) then implies

    0 = 2(1 − q)[(p_I − p_E)m − (p_I + p_E)C(r)] + (q/2) ΔC(r).    (S19)

In other words,

    0 = 2(1 − q)[(p_I − p_E)m − (p_I + p_E)C(r)] + (q/2) ΔC(r)
      = 2(1 − q)(p_I + p_E){[(p_I − p_E)/(p_I + p_E)]m − C(r)} + (q/2) ΔC(r)
      = 2(1 − q)(p_I + p_E)[m² − C(r)] + (q/2) ΔC(r).    (S20)

In the special case q = 0, this yields C(r) = m² for all r ≠ 0. Thus for any two sites x ≠ y, ⟨s(x)s(y)⟩ = m² = ⟨s(x)⟩⟨s(y)⟩, indicating that spins at different sites are fully uncorrelated. This is what one would expect, as all sites operate independently when q = 0. If q = 1, Eq. (S20) reduces to ΔC(r) = 0 for r ≠ 0. We also have C(0) = 1. This implies C(r) = 1 for all r: in the pure voter limit the lattice orders completely. Now assume 0 < q < 1. Dividing Eq. (S20) by q/2 we obtain

    0 = [4(1 − q)(p_I + p_E)/q][m² − C(r)] + ΔC(r)
      = [4(q − 1)(p_I + p_E)/q][C(r) − m²] + ΔC(r)
      = αC(r) − αm² + ΔC(r),    (S21)

where we write

    α = 4(q − 1)(p_I + p_E)/q.    (S22)

Now let

    c(r) = C(r) − m².    (S23)

We note that ΔC(r) = Δc(r). To solve Eq. (S21), it then suffices to solve (for r ≠ 0)

    Δc(r) + αc(r) = 0,    (S24)
    c(0) = 1 − m².    (S25)

We now first focus on the equation

    ΔG(r) + αG(r) = −δ_{r,0},    (S26)

for all r (including r = 0), and where δ_{x,y} is the Kronecker delta. Let G_α(r) be a solution of Eq. (S26). Then

    c(r) = (1 − m²) G_α(r)/G_α(0)    (S27)

is a solution of Eqs. (S24) and (S25). This can be seen as follows: first, from Eq. (S27),

    c(0) = (1 − m²) G_α(0)/G_α(0) = 1 − m²,    (S28)

so the condition in Eq. (S25) is met. Second, we need to show that Eqs. (S26) and (S27) imply Δc(r) + αc(r) = 0 for r ≠ 0. For r ≠ 0 we have

    ΔG(r) + αG(r) = 0.    (S29)

Furthermore, c(r) in Eq. (S27) is proportional to G_α(r) with a proportionality constant (1 − m²)/G_α(0) which is independent of r. Given that G_α(r) fulfills Eq. (S29) for r ≠ 0, it is then clear that c(r) fulfills Δc(r) + αc(r) = 0 for r ≠ 0, i.e. Eq. (S24). So we are left with the task of finding a solution of

    ΔG(r) + αG(r) = −δ_{r,0}.    (S30)

Let us write G(r) in Fourier representation:

    G(r) = [1/(2π)²] ∫_{−π}^{π} ∫_{−π}^{π} dk₁ dk₂ e^{ik·r} Ĝ(k),    (S31)

where k = (k₁, k₂) and Ĝ(k) is the Fourier transform of G(r), i.e.

    Ĝ(k) = Σ_r e^{−ik·r} G(r) = Σ_x Σ_y e^{−ixk₁ − iyk₂} G(x, y).    (S32)

Applying this to both sides of Eq. (S30) leads to

    Σ_{x,y} e^{−ixk₁ − iyk₂} [ΔG(r) + αG(r)] = −1.    (S33)

Next, notice

    Σ_{x,y} e^{−ixk₁ − iyk₂} ΔG(x, y)
      = Σ_{x,y} e^{−ixk₁ − iyk₂} [G(x + 1, y) + G(x − 1, y) + G(x, y + 1) + G(x, y − 1) − 4G(x, y)]
      = Σ_{x,y} [e^{−i(x−1)k₁ − iyk₂} + e^{−i(x+1)k₁ − iyk₂} + e^{−ixk₁ − i(y−1)k₂} + e^{−ixk₁ − i(y+1)k₂} − 4e^{−ixk₁ − iyk₂}] G(x, y)
      = Σ_{x,y} [e^{ik₁} + e^{−ik₁} + e^{ik₂} + e^{−ik₂} − 4] e^{−ixk₁ − iyk₂} G(x, y)
      = [2 cos k₁ + 2 cos k₂ − 4] Ĝ(k).    (S34)

Using this in Eq. (S33) gives

    [(2 cos k₁ + 2 cos k₂ − 4) + α] Ĝ(k) = −1,    (S35)

in other words

    Ĝ(k) = 1/[4 − α − 2(cos k₁ + cos k₂)].    (S36)

Using Eq. (S31) we then have

    G_α(r) = [1/(2π)²] ∫_{−π}^{π} ∫_{−π}^{π} e^{ik·r} dk₁ dk₂ / [4 − α − 2(cos k₁ + cos k₂)].    (S37)

Hence

    G_α(0) = [1/(2π)²] ∫_{−π}^{π} ∫_{−π}^{π} dk₁ dk₂ / [4 − α − 2(cos k₁ + cos k₂)].    (S38)

The integrand is symmetric with respect to k₁ ↔ −k₁ and k₂ ↔ −k₂ respectively, due to the identity cos(k) = cos(−k). Hence

    G_α(0) = (1/π²) ∫₀^π ∫₀^π dk₁ dk₂ / [4 − α − 2(cos k₁ + cos k₂)]
           = [1/(4 − α)] (1/π²) ∫₀^π ∫₀^π dk₁ dk₂ / {1 − [2/(4 − α)](cos k₁ + cos k₂)}.    (S39)

This expression is related to the complete elliptic integral of the first kind, see for example Ref. [51]. We use the following notation:

    (2/π) K(z) = (1/π²) ∫₀^π du ∫₀^π dv / [1 − (z/2)(cos u + cos v)],    (S40)

where K(z) is the complete elliptic integral of the first kind with modulus z. Hence we have

    G_α(0) = {2/[π(4 − α)]} K(4/(4 − α)) = 2K_α/[π(4 − α)],    (S41)

where we abbreviate

    K_α = K(4/(4 − α))    (S42)

for convenience. Next, we write the Laplacian ΔG_α(0) in full:

    ΔG_α(0) = −4G_α(0) + [G_α(1, 0) + G_α(−1, 0) + G_α(0, 1) + G_α(0, −1)].    (S43)

Assuming isotropy, each term in the square brackets is equal to G_α(e₁) = G_α(1, 0). Hence Eq. (S30), evaluated at r = 0, takes the form

    4G_α(e₁) + (α − 4)G_α(0) = −1,    (S44)

in other words

    G_α(0) − G_α(e₁) = (1/4)[1 + αG_α(0)].    (S45)

From this we find

    1 − G_α(e₁)/G_α(0) = (1/4)[α + 1/G_α(0)].    (S46)

Now using Eqs. (S23) and (S27),

    C(r) = m² + c(r) = m² + (1 − m²) G_α(r)/G_α(0).    (S47)

The stationary-state isogloss density σ is then

    σ = (1/2)[1 − C(e₁)]
      = (1/2)[1 − m² − (1 − m²) G_α(e₁)/G_α(0)]
      = (1/2)(1 − m²)[1 − G_α(e₁)/G_α(0)]
      = (1/8)(1 − m²)[α + 1/G_α(0)]
      = (1/8)(1 − m²)[α + π(4 − α)/(2K_α)],    (S48)

where we have used Eq. (S41). On the other hand,

    1 − m² = 1 − (p_I − p_E)²/(p_I + p_E)² = 4 [p_I/(p_I + p_E)][p_E/(p_I + p_E)] = 4ρ(1 − ρ),    (S49)

so that

    σ = (1/2)ρ(1 − ρ)[α + π(4 − α)/(2K_α)].    (S50)

Recalling that

    α = −4τ = −4(1 − q)(p_I + p_E)/q,    (S51)

see Eq. (S22), and defining τ = −α/4, i.e. K_α = K_{−4τ} = K(1/(1 + τ)), yields our final equation for the stationary-state isogloss density:

    σ = H(τ)ρ(1 − ρ),    (S52)

where

    H(τ) = π(1 + τ)/K(1/(1 + τ)) − 2τ.    (S53)

The expression in Eq. (S52) is a downward-opening parabola in ρ whose height is fixed by H(τ), where

    τ = (1 − q)(p_I + p_E)/q.    (S54)

C.5 Sanity check: the limits τ → 0 and τ → ∞

The complete elliptic integral K(z) has the following known properties [52]:

    lim_{z→0} K(z) = π/2,    lim_{z→1} K(z) = ∞.    (S55)

Now consider the limit τ → 0. This limit is relevant when either q → 1, or both p_E → 0 and p_I → 0, see Eq. (S54). These are situations in which the spatial (voter) process dominates the ingress–egress dynamics. In this limit 1/(1 + τ) → 1, so that K_{−4τ} = K(1/(1 + τ)) → ∞. Noting that Eq. (S53) can be written in the form

    H(τ) = 2τ[π/(2K_{−4τ}) − 1] + π/K_{−4τ},    (S56)

we then find that H(τ) → 0. Thus, in the limit where the spatial (voter) process completely overtakes the ingress–egress process, the stationary-state isogloss density is zero, indicating that all sites agree in their spin.

Next consider τ → ∞. This limit occurs when p_I + p_E > 0 and q → 0, i.e. when the ingress–egress process dominates. Then 1/(1 + τ) → 0, so that K_{−4τ} → π/2. From Eq. (S56), we then find that H(τ) → 2, whence σ = 2ρ(1 − ρ), indicating complete independence of the individual spins.

Table S1. The 35 binary features considered in this study. The 'WALS' column gives the WALS feature mined; values indicated in the 'abs.' column were folded into our 'binary feature absent' value, whilst values indicated in the 'pres.' column were folded into our 'binary feature present' value (see Section D, below). The final column gives the size of the WALS language sample (number of languages) for each feature.

description  WALS  abs.  pres.  N
1. adpositions  48A  1  2–4  378
2. definite marker  37A  4–5  1–3  620
3. hand and arm identical  129A  2  1  617
4. hand and finger(s) identical  130A  2  1  593
5. front rounded vowels  11A  1  2–4  562
6. gender in independent personal pronouns  44A  6  1–5  378
7. glottalized consonants  7A  1  2–8  567
8. grammatical evidentials  77A  1  2–3  418
9. indefinite marker  38A  4–5  1–3  534
10. inflectional morphology  26A  1  2–6  969
11. inflectional optative  73A  2  1  319
12. lateral consonants  8A  1  2–5  567
13. morphological second-person imperative  70A  5  1–4  547
14. order of adjective and noun is AdjN  87A  2  1  1366
15. order of degree word and adjective is DegAdj  91A  2  1  481
16. order of genitive and noun is GenN  86A  2  1  1249
17. order of numeral and noun is NumN  89A  2  1  1153
18. order of object and verb is OV  83A  2  1  1519
19. order of subject and verb is SV  82A  2  1  1497
20. ordinal numerals  53A  1  2–8  321
21. passive construction  107A  2  1  373
22. plural  33A  9  1–8  1066
23. possessive affixes  57A  4  1–3  902
24. postverbal negative morpheme  143F  4  1–3  1324
25. preverbal negative morpheme  143E  4  1–3  1324
26. productive reduplication  27A  3  1–2  368
27. question particle  92A  6  1–5  884
28. shared encoding of nominal and locational predication  119A  1  2  386
29. tense-aspect inflection  69A  5  1–4  1131
30. tone  13A  1  2–3  527
31. uvular consonants  6A  1  2–4  567
32. velar nasal  9A  3  1–2  469
33. verbal person marking  100A  1  2–6  380
34. voicing contrast  4A  1  2–4  567
35. zero copula for predicate nominals  120A  1  2  386

Table S2. Feature frequency ρ, isogloss density σ and estimated temperature τ for the 35 features considered in this study (to eight decimal places), ordered by increasing τ (from stable to unstable).

feature  ρ  σ  τ
1. order of object and verb is OV  0.50352113  0.12087912  0.00001910
2. order of numeral and noun is NumN  0.44097222  0.12454212  0.00003306
3. order of genitive and noun is GenN  0.59420290  0.12204724  0.00003339
4. order of subject and verb is SV  0.86071429  0.07722008  0.00049325
5. order of adjective and noun is AdjN  0.29856115  0.14843750  0.00120923
6. tone  0.41666667  0.17333333  0.00128340
7. velar nasal  0.50000000  0.18333333  0.00183200
8. order of degree word and adjective is DegAdj  0.54135338  0.21212121  0.00569870
9. uvular consonants  0.17000000  0.13000000  0.00979939
10. ordinal numerals  0.89666667  0.08666667  0.01181972
11. shared encoding of nominal and locational predication  0.30333333  0.21000000  0.01732702
12. glottalized consonants  0.27666667  0.21000000  0.02605203
13. inflectional optative  0.15000000  0.14333333  0.04120610
14. inflectional morphology  0.85333333  0.14000000  0.04215630
15. possessive affixes  0.71333333  0.23666667  0.04784846
16. passive construction  0.43333333  0.29333333  0.05949459
17. tense-aspect inflection  0.86666667  0.13666667  0.05994843
18. hand and arm identical  0.37000000  0.28000000  0.06211254
19. productive reduplication  0.85000000  0.15333333  0.06274509
20. voicing contrast  0.68000000  0.26333333  0.06769913
21. adpositions  0.83333333  0.17000000  0.06908503
22. postverbal negative morpheme  0.46333333  0.30333333  0.06961203
23. hand and finger(s) identical  0.12000000  0.13000000  0.07032095
24. morphological second-person imperative  0.77666667  0.21666667  0.08655531
25. preverbal negative morpheme  0.70666667  0.27333333  0.11292895
26. plural  0.91000000  0.11000000  0.12817720
27. zero copula for predicate nominals  0.45333333  0.33333333  0.13213325
28. grammatical evidentials  0.56666667  0.33333333  0.14401769
29. front rounded vowels  0.06666667  0.08666667  0.19468340
30. lateral consonants  0.83333333  0.20000000  0.21489842
31. indefinite marker  0.44666667  0.35666667  0.22952822
32. gender in independent personal pronouns  0.33000000  0.32333333  0.24577576
33. question particle  0.59666667  0.36666667  0.34336306
34. definite marker  0.60666667  0.36333333  0.35396058
35. verbal person marking  0.78000000  0.27666667  0.50590667

D WALS feature levels
The following list gives the values of the WALS features mined; the part in parentheses after each value gives its value in our binarization (present for 'feature present', absent for 'feature absent', and N/A if the WALS value was excluded from the binarization as irrelevant). Languages attesting irrelevant values were excluded from the corresponding feature language sample for the purposes of calculating our statistics.
1. adpositions
– WALS feature mined: ‘Person Marking on Adpositions’ (48A)
– Values: 1: No adpositions (absent); 2: No person marking (present); 3: Pronouns only (present); 4: Pronouns and nouns (present)

2. definite marker
– WALS feature mined: ‘Definite Articles’ (37A)
– Values: 1: Definite word distinct from demonstrative (present); 2: Demonstrative word used as definite article (present); 3: Definite affix (present); 4: No definite, but indefinite article (absent); 5: No definite or indefinite article (absent)

3. hand and arm identical
– WALS feature mined: ‘Hand and Arm’ (129A)
– Values: 1: Identical (present); 2: Different (absent)

4. hand and finger(s) identical
– WALS feature mined: ‘Finger and Hand’ (130A)
– Values: 1: Identical (present); 2: Different (absent)

5. front rounded vowels
– WALS feature mined: ‘Front Rounded Vowels’ (11A)
– Values: 1: None (absent); 2: High and mid (present); 3: High only (present); 4: Mid only (present)

6. gender in independent personal pronouns
– WALS feature mined: ‘Gender Distinctions in Independent Personal Pronouns’ (44A)
– Values: 1: In 3rd person + 1st and/or 2nd person (present); 2: 3rd person only, but also non-singular (present); 3: 3rd person singular only (present); 4: 1st or 2nd person but not 3rd (present); 5: 3rd person non-singular only (present); 6: No gender distinctions (absent)

7. glottalized consonants
– WALS feature mined: ‘Glottalized Consonants’ (7A)
– Values: 1: No glottalized consonants (absent); 2: Ejectives only (present); 3: Implosives only (present); 4: Glottalized resonants only (present); 5: Ejectives and implosives (present); 6: Ejectives and glottalized resonants (present); 7: Implosives and glottalized resonants (present); 8: Ejectives, implosives, and glottalized resonants (present)

8. grammatical evidentials
– WALS feature mined: ‘Semantic Distinctions of Evidentiality’ (77A)
– Values: 1: No grammatical evidentials (absent); 2: Indirect only (present); 3: Direct and indirect (present)

9. indefinite marker
– WALS feature mined: ‘Indefinite Articles’ (38A)
– Values: 1: Indefinite word distinct from ‘one’ (present); 2: Indefinite word same as ‘one’ (present); 3: Indefinite affix (present); 4: No indefinite, but definite article (absent); 5: No definite or indefinite article (absent)

10. inflectional morphology
– WALS feature mined: ‘Prefixing vs. Suffixing in Inflectional Morphology’ (26A)
– Values: 1: Little affixation (absent); 2: Strongly suffixing (present); 3: Weakly suffixing (present); 4: Equal prefixing and suffixing (present); 5: Weakly prefixing (present); 6: Strong prefixing (present)

11. inflectional optative
– WALS feature mined: ‘The Optative’ (73A)
– Values: 1: Inflectional optative present (present); 2: Inflectional optative absent (absent)

12. lateral consonants
– WALS feature mined: ‘Lateral Consonants’ (8A)
– Values: 1: No laterals (absent); 2: /l/, no obstruent laterals (present); 3: Laterals, but no /l/, no obstruent laterals (present); 4: /l/ and lateral obstruent (present); 5: No /l/, but lateral obstruents (present)

13. morphological second-person imperative
– WALS feature mined: ‘The Morphological Imperative’ (70A)
– Values: 1: Second singular and second plural (present); 2: Second singular (present); 3: Second plural (present); 4: Second person number-neutral (present); 5: No second-person imperatives (absent)

14. order of adjective and noun is AdjN
– WALS feature mined: ‘Order of Adjective and Noun’ (87A)
– Values: 1: Adjective-Noun (present); 2: Noun-Adjective (absent); 3: No dominant order (N/A); 4: Only internally-headed relative clauses (N/A)

15. order of degree word and adjective is DegAdj
– WALS feature mined: ‘Order of Degree Word and Adjective’ (91A)
– Values: 1: Degree word-Adjective (present); 2: Adjective-Degree word (absent); 3: No dominant order (N/A)

16. order of genitive and noun is GenN
– WALS feature mined: ‘Order of Genitive and Noun’ (86A)
– Values: 1: Genitive-Noun (present); 2: Noun-Genitive (absent); 3: No dominant order (N/A)

17. order of numeral and noun is NumN
– WALS feature mined: ‘Order of Numeral and Noun’ (89A)
– Values: 1: Numeral-Noun (present); 2: Noun-Numeral (absent); 3: No dominant order (N/A); 4: Numeral only modifies verb (N/A)

18. order of object and verb is OV
– WALS feature mined: ‘Order of Object and Verb’ (83A)
– Values: 1: OV (present); 2: VO (absent); 3: No dominant order (N/A)

19. order of subject and verb is SV
– WALS feature mined: ‘Order of Subject and Verb’ (82A)
– Values: 1: SV (present); 2: VS (absent); 3: No dominant order (N/A)

20. ordinal numerals
– WALS feature mined: ‘Ordinal Numerals’ (53A)
– Values: 1: None (absent); 2: One, two, three (present); 3: First, two, three (present); 4: One-th, two-th, three-th (present); 5: First/one-th, two-th, three-th (present); 6: First, two-th, three-th (present); 7: First, second, three-th (present); 8: Various (present)

21. passive construction
– WALS feature mined: ‘Passive Constructions’ (107A)
– Values: 1: Present (present); 2: Absent (absent)

22. plural
– WALS feature mined: ‘Coding of Nominal Plurality’ (33A)
– Values: 1: Plural prefix (present); 2: Plural suffix (present); 3: Plural stem change (present); 4: Plural tone (present); 5: Plural complete reduplication (present); 6: Mixed morphological plural (present); 7: Plural word (present); 8: Plural clitic (present); 9: No plural (absent)

23. possessive affixes
– WALS feature mined: ‘Position of Pronominal Possessive Affixes’ (57A)
– Values: 1: Possessive prefixes (present); 2: Possessive suffixes (present); 3: Prefixes and suffixes (present); 4: No possessive affixes (absent)

24. postverbal negative morpheme
– WALS feature mined: ‘Postverbal Negative Morphemes’ (143F)
– Values: 1: VNeg (present); 2: [V-Neg] (present); 3: VNeg&[V-Neg] (present); 4: None (absent)

25. preverbal negative morpheme
– WALS feature mined: ‘Preverbal Negative Morphemes’ (143E)
– Values: 1: NegV (present); 2: [Neg-V] (present); 3: NegV&[Neg-V] (present); 4: None (absent)

26. productive reduplication
– WALS feature mined: ‘Reduplication’ (27A)
– Values: 1: Productive full and partial reduplication (present); 2: Full reduplication only (present); 3: No productive reduplication (absent)

27. question particle
– WALS feature mined: ‘Position of Polar Question Particles’ (92A)
– Values: 1: Initial (present); 2: Final (present); 3: Second position (present); 4: Other position (present); 5: In either of two positions (present); 6: No question particle (absent)

28. shared encoding of nominal and locational predication
– WALS feature mined: ‘Nominal and Locational Predication’ (119A)
– Values: 1: Different (absent); 2: Identical (present)

29. tense-aspect inflection
– WALS feature mined: ‘Position of Tense-Aspect Affixes’ (69A)
– Values: 1: Tense-aspect prefixes (present); 2: Tense-aspect suffixes (present); 3: Tense-aspect tone (present); 4: Mixed type (present); 5: No tense-aspect inflection (absent)

30. tone
– WALS feature mined: ‘Tone’ (13A)
– Values: 1: No tones (absent); 2: Simple tone system (present); 3: Complex tone system (present)

31. uvular consonants
– WALS feature mined: ‘Uvular Consonants’ (6A)
– Values: 1: None (absent); 2: Uvular stops only (present); 3: Uvular continuants only (present); 4: Uvular stops and continuants (present)

32. velar nasal
– WALS feature mined: ‘The Velar Nasal’ (9A)
– Values: 1: Initial velar nasal (present); 2: No initial velar nasal (present); 3: No velar nasal (absent)

33. verbal person marking
– WALS feature mined: ‘Alignment of Verbal Person Marking’ (100A)
– Values: 1: Neutral (absent); 2: Accusative (present); 3: Ergative (present); 4: Active (present); 5: Hierarchical (present); 6: Split (present)

34. voicing contrast
– WALS feature mined: ‘Voicing in Plosives and Fricatives’ (4A)
– Values: 1: No voicing contrast (absent); 2: In plosives alone (present); 3: In fricatives alone (present); 4: In both plosives and fricatives (present)

35. zero copula for predicate nominals
– WALS feature mined: ‘Zero Copula for Predicate Nominals’ (120A)
– Values: 1: Impossible (absent); 2: Possible (present)