Cultural Shift or Linguistic Drift? Comparing Two Computational Measures of Semantic Change
CCultural Shift or Linguistic Drift? Comparing TwoComputational Measures of Semantic Change
William L. Hamilton, Jure Leskovec, Dan Jurafsky
Department of Computer Science, Stanford University, Stanford CA, 94305 wleif,jure,[email protected]
Abstract
Words shift in meaning for many reasons,including cultural factors like new technolo-gies and regular linguistic processes like sub-jectification. Understanding the evolution oflanguage and culture requires disentanglingthese underlying causes. Here we show howtwo different distributional measures can beused to detect two different types of seman-tic change. The first measure, which has beenused in many previous works, analyzes globalshifts in a word’s distributional semantics; itis sensitive to changes due to regular pro-cesses of linguistic drift, such as the semanticgeneralization of promise (“I promise.” → “Itpromised to be exciting.”). The second mea-sure, which we develop here, focuses on localchanges to a word’s nearest semantic neigh-bors; it is more sensitive to cultural shifts, suchas the change in the meaning of cell (“prisoncell” → “cell phone”). Comparing measure-ments made by these two methods allows re-searchers to determine whether changes aremore cultural or linguistic in nature, a distinc-tion that is essential for work in the digital hu-manities and historical linguistics. Distributional methods of embedding words in vec-tor spaces according to their co-occurrence statis-tics are a promising new tool for diachronic seman-tics (Gulordava and Baroni, 2011; Jatowt and Duh,2014; Kulkarni et al., 2014; Xu and Kemp, 2015;Hamilton et al., 2016). Previous work, however,does not consider the underlying causes of seman- tic change or how to distentangle different types ofchange.We show how two computational measures canbe used to distinguish between semantic changescaused by cultural shifts (e.g., technological ad-vancements) and those caused by more regular pro-cesses of semantic change (e.g., grammaticalizationor subjectification). This distinction is essential forresearch on linguistic and cultural evolution. Detect-ing cultural shifts in language use is crucial to com-putational studies of history and other digital hu-manities projects. By contrast, for advancing histor-ical linguistics, cultural shifts amount to noise andonly the more regular shifts matter.Our work builds on two intuitions: that dis-tributional models can highlight syntagmatic ver-sus paradigmatic relations with neighboring words(Schutze and Pedersen, 1993) and that nouns aremore likely to undergo changes due to irregular cul-tural shifts while verbs more readily participate inregular processes of semantic change (Gentner andFrance, 1988; Traugott and Dasher, 2001). We usethis noun vs. verb mapping as a proxy to compareour two measures’ sensitivities to cultural vs. lin-guistic shifts. Sensitivity to nominal shifts indi-cates a propensity to capture irregular cultural shiftsin language, such as those due to technological ad-vancements (Traugott and Dasher, 2001). Sensitiv-ity to shifts in verbs (and other predicates) indicatesa propensity to capture regular processes of linguis-tic drift (Gentner and France, 1988; Kintsch, 2000;Traugott and Dasher, 2001).The first measure we analyze is based uponchanges to a word’s local semantic neighborhood; a r X i v : . [ c s . C L ] S e p esbian homosexualheterosexualqueerwoman gay (1990s) hispanicdaftwittybrilliantmerry gay (1900s) frolicsomejoyous Global measure of change Local neighborhood measure of change
Figure 1: Two different measures of semantic change.
With the global measure of change, we measure how far a word hasmoved in semantic space between two time-periods. This measure is sensitive to subtle shifts in usage and also global effects dueto the entire semantic space shifting. For example, this captures how actually underwent subjectification during the 20th century,shifting from uses in objective statements about the world (“actually did try”) to subjective statements of attitude (“I actually agree”;see Traugott and Dasher, 2001 for details). In contrast, with the local neighborhood measure of change, we measure changes in aword’s nearest neighbors, which captures drastic shifts in core meaning, such as gay ’s shift in meaning over the 20th century. we show that it is more sensitive to changes in thenominal domain and captures changes due to unpre-dictable cultural shifts. Our second measure relieson a more traditional global notion of change; weshow that it better captures changes, like those inverbs, that are the result of regular linguistic drift.Our analysis relies on a large-scale statisticalstudy of six historical corpora in multiple lan-guages, along with case-studies that illustrate thefine-grained differences between the two measures.
We use the diachronic word2vec embeddings con-structed in our previous work (Hamilton et al., 2016)to measure how word meanings change betweenconsecutive decades. In these representations eachword w i has a vector representation w ( t ) (Turneyand Pantel, 2010) at each time point, which capturesits co-occurrence statistics for that time period. Thevectors are constructed using the skip-gram withnegative sampling (SGNS) algorithm (Mikolov etal., 2013) and post-processed to align the semanticspaces between years. Measuring the distance be-tween word vectors for consecutive decades allowsus to compute the rate at which the different words http://nlp.stanford.edu/projects/histwords/ .This URL also links to detailed dataset descriptions and the codeneeded to replicate the experiments in this paper. change in meaning (Gulordava and Baroni, 2011).We analyzed the decades from 1800 to 1990 usingvectors derived from the Google N-gram datasets(Lin et al., 2012) that have large amounts of his-torical text (English, French, German, and EnglishFiction). We also used vectors derived from the Cor-pus of Historical American English (COHA), whichis smaller than Google N-grams but was carefullyconstructed to be genre balanced and contains wordlemmas as well as surface forms (Davies, 2010). Weexamined all decades from 1850 through 2000 usingthe COHA dataset and used the part-of-speech tagsprovided with the corpora. We examine two different ways to measure semanticchange (Figure 1).
Global measure
The first measure analyzes global shifts in aword’s vector semantics and is identical to the mea-sure used in most previous works (Gulordava andBaroni, 2011; Jatowt and Duh, 2014; Kim et al.,2014; Hamilton et al., 2016). We simply takea word’s vectors for two consecutive decades andmeasure the cosine distance between them, i.e. d G ( w ( t ) i , w ( t +1) i ) = cos-dist ( w ( t ) i , w ( t +1) i ) . (1) nglish (All) English (Fic.) German French COHA (word) COHA (lemma) − . − . − . − . . . . ( V e r b - noun ) c hange Global measureLocal measure
Figure 2: The global measure is more sensitive to semantic changes in verbs while the local neighborhood measure is moresensitive to noun changes.
Examining how much nouns change relative to verbs (using coefficients from mixed-model regressions)reveals that the two measures are sensitive to different types of semantic change. Across all languages, the local neighborhoodmeasure always assigns relatively higher rates of change to nouns (i.e., the right/green bars are lower than the left/blue bars for allpairs), though the results vary by language (e.g., French has high noun change-rates overall). confidence intervals are shown.
Local neighborhood measure
The second measure is based on the intuition thatonly a word’s nearest semantic neighbors are rele-vant. For this measure, we first find word w i ’s set of k nearest-neighbors (according to cosine-similarity)within each decade, which we denote by the orderedset N k ( w ( t ) i ) . Next, to measure the change betweendecades t and t + 1 , we compute a “second-order”similarity vector for w ( t ) i from these neighbor setswith entries defined as s ( t ) ( j ) = cos-sim ( w ( t ) i , w ( t ) j ) ∀ w j ∈ N k ( w ( t ) i ) ∪ N k ( w ( t +1) i ) , (2)and we compute an analogous vector for w ( t +1) i .The second-order vector, s ( t ) i , contains the cosinesimilarity of w i and the vectors of all w i ’s near-est semantic neighbors in the the time-periods t and t + 1 . Working with variants of these second-ordervectors has been a popular approach in many recentworks, though most of these works define these vec-tors against the full vocabulary and not just a word’snearest neighbors (del Prado Martin and Brendel,2016; Eger and Mehler, 2016; Rodda et al., 2016).Finally, we compute the local neighborhoodchange as d L ( w ( t ) i , w ( t +1) i ) = cos-dist ( s ( t ) i , s ( t +1) i ) . (3)This measures the extent to which w i ’s similaritywith its nearest neighbors has changed.The local neighborhood measure defined in (3)captures strong shifts in a word’s paradigmatic re-lations but is less sensitive to global shifts in syntag-matic contexts (Schutze and Pedersen, 1993). We Dataset Table 1: Number of nouns and verbs tested in each dataset. used k = 25 in all experiments (though we foundthe results to be consistent for k ∈ [10 , ). To test whether nouns or verbs change more accord-ing to our two measures of change, we build onour previous work and used a linear mixed modelapproach (Hamilton et al., 2016). This approachamounts to a linear regression where the model alsoincludes “random” effects to account for the fact thatthe measurements for individual words will be cor-related across time (McCulloch and Neuhaus, 2001).We ran two regressions per datatset: one with theglobal d G values as the dependent variables (DVs)and one with the local neighborhood d L values. Inboth cases we examined the change between all con-secutive decades and normalized the DVs to zero-mean and unit variance. We examined nouns/verbswithin the top-10000 words by frequency rank andremoved all words that occurred < times inthe smaller COHA dataset. The independent vari-ables are word frequency, the decade of the change(represented categorically), and variable indicating ord 1850s context 1990s contextactually “...dinners which you have actually eaten.” “With that, I actually agree.”must “O, George, we must have faith.” “Which you must have heard ten years ago...”promise “I promise to pay you...’ “...the day promised to be lovely.”gay “Gay bridals and other merry-makings of men.” “...the result of gay rights demonstrations.”virus “This young man is...infected with the virus.” “...a rapidly spreading computer virus.”cell “The door of a gloomy cell...” “They really need their cell phones.” Table 2: Example case-studies of semantic change.
The first three words are examples of regular linguistic shifts, while the latterthree are examples of words that shifted due to exogenous cultural factors. Contexts are from the COHA data (Davies, 2010). actually must promise gay virus cell − . − . . . G l oba l - l o c a l c hange Regular linguistic shifts Irregular cultural shifts
Figure 3: The global measure captures classic examples of linguistic drift while the local measure captures example culturalshifts.
Examining the semantic distance between the 1850s and 1990s shows that the global measure is more sensitive to regularshifts (and vice-versa for the local measure). The plot shows the difference between the measurements made by the two methods. whether a word is a noun or a verb (proper nounsare excluded, as in Hamilton et al., 2016). Our results show that the two seemingly relatedmeasures actually result in drastically different no-tions of semantic change.
The local neighborhood measure assigns far higherrates of semantic change to nouns across all lan-guages and datasets while the opposite is true forthe global distance measure, which tends to assignhigher rates of change to verbs (Figure 2).We focused on verbs vs. nouns since they arethe two major parts-of-speech and previous researchhas shown that verbs are more semantically mutablethan nouns and thus more likely to undergo linguis-tic drift (Gentner and France, 1988), while nounsare far more likely to change due to cultural shiftslike new technologies (Traugott and Dasher, 2001).However, some well-known regular linguistic shiftsinclude rarer parts of speech like adverbs (includedin our case studies below). Thus we also confirmed Frequency was included since it is known to strongly influ-ence the distributional measures (Hamilton et al., 2016). that the differences shown in Figure 2 also holdwhen adverbs and adjectives are included along withthe verbs. This modified analysis showed analogoussignificant trends, which fits with previous researcharguing that adverbial and adjectival modifiers arealso often the target of regular linguistic changes(Traugott and Dasher, 2001).The results of this large-scale regression analy-sis show that the local measure is more sensitive tochanges in the nominal domain, a domain in whichchange is known to be driven by cultural factors.In contrast, the global measure is more sensitive tochanges in verbs, along with adjectives and adverbs,which are known to be the targets of many regularprocesses of linguistic change (Traugott and Dasher,2001; Hopper and Traugott, 2003)
We examined six case-study words grouped into twosets. These case studies show that three examples ofwell-attested regular linguistic shifts (set A) changedmore according to the global measure, while threewell-known examples of cultural changes (set B)change more according to the local neighborhoodmeasure. Table 2 lists these words with some rep-resentative historical contexts (Davies, 2010).et A contains three words that underwent at-tested regular linguistic shifts detailed in Traugottand Dasher (2001): actually , must , and promise .These three words represent three different types ofregular linguistic shifts: actually is a case of subjec-tification (detailed in Figure 1); must shifted froma deontic/obligation usage (“you must do X”) to aepistemic one (“X must be the case”), exemplifyinga regular pattern of change common to many modalverbs; and promise represents the class of shift-ing “performative speech acts” that undergo richchanges due to their pragmatic uses and subjectifi-cation (Traugott and Dasher, 2001). The contextslisted in Table 2 exemplify these shifts.Set B contains three words that were selectedbecause they underwent well-known cultural shiftsover the last 150 years: gay , virus , and cell .These words gained new meanings due to uses incommunity-specific vernacular ( gay ) or technolog-ical advances ( virus , cell ). The cultural shifts un-derlying these changes in usage — e.g., the devel-opment of the mobile “cell phone” — were unpre-dictable in the sense that they were not the result ofregularities in human linguistic systems.Figure 3 shows how much the meaning of theseword changed from the 1850s to the 1990s accordingto the two different measures on the English Googledata. We see that the words in set A changed morewhen measurements were made using the globalmeasure, while the opposite holds for set B. Our results show that our novel local neighborhoodmeasure of semantic change is more sensitive tochanges in nouns, while the global measure is moresensitive to changes in verbs. This mapping alignswith the traditional distinction between irregular cul-tural shifts in nominals and more regular cases oflinguistic drift (Traugott and Dasher, 2001) and isfurther reinforced by our six case studies.This finding emphasizes that researchers must de-velop and use measures of semantic change thatare tuned to specific tasks. For example, a cul-tural change-point detection framework would bemore successful using our local neighborhood mea-sure, while an empirical study of grammaticalizationwould be better off using the traditional global dis- tance approach. Comparing measurements made bythese two approaches also allows researchers to as-sess the extent to which semantic changes are lin-guistic or cultural in nature.
Acknowledgements
The authors thank C. Manning, V. Prabhakaran,S. Kumar, and our anonymous reviewers for theirhelpful comments. This research has been sup-ported in part by NSF CNS-1010921, IIS-1149837,IIS-1514268 NIH BD2K, ARO MURI, DARPAXDATA, DARPA SIMPLEX, Stanford Data Sci-ence Initiative, SAP Stanford Graduate Fellowship,NSERC PGS-D, Boeing, Lightspeed, and Volkswa-gen.
References [Davies2010] Mark Davies. 2010. The Corpus of Histor-ical American English: 400 million words, 1810-2009.http://corpus.byu.edu/coha/.[del Prado Martin and Brendel2016] Fermin Moscoso delPrado Martin and Christian Brendel. 2016. Case andCause in Icelandic: Reconstructing Causal Networksof Cascaded Language Changes. In
Proc. ACL .[Eger and Mehler2016] Steffen Eger and AlexanderMehler. 2016. On the Linearity of Semantic Change:Investigating Meaning Variation via Dynamic GraphModels. In
Proc. ACL .[Gentner and France1988] Dedre Gentner and Ilene M.France. 1988. The verb mutability effect: Studies ofthe combinatorial semantics of nouns and verbs.
Lexi-cal ambiguity resolution: Perspectives from psycholin-guistics, neuropsychology, and artificial intelligence ,pages 343–382.[Gulordava and Baroni2011] Kristina Gulordava andMarco Baroni. 2011. A distributional similarityapproach to the detection of semantic change inthe Google Books Ngram corpus. In
Proc. GEMS2011 Workshop on Geometrical Models of NaturalLanguage Semantics , pages 67–71. Association forComputational Linguistics.[Hamilton et al.2016] William L. Hamilton, JureLeskovec, and Dan Jurafsky. 2016. DiachronicWord Embeddings Reveal Statistical Laws of Seman-tic Change. In
Proc. ACL .[Hopper and Traugott2003] Paul J Hopper and Eliza-beth Closs Traugott. 2003.
Grammaticalization .Cambridge University Press, Cambridge, UK.[Jatowt and Duh2014] Adam Jatowt and Kevin Duh.2014. A framework for analyzing semantic change ofwords across time. In
Proc. 14th ACM/IEEE-CS Conf.on Digital Libraries , pages 229–238. IEEE Press.Kim et al.2014] Yoon Kim, Yi-I. Chiu, Kentaro Hanaki,Darshan Hegde, and Slav Petrov. 2014. Temporalanalysis of language through neural language models. arXiv preprint arXiv:1405.3515 .[Kintsch2000] Walter Kintsch. 2000. Metaphor compre-hension: A computational theory.
Psychon. Bull. Rev. ,7(2):257–266.[Kulkarni et al.2014] Vivek Kulkarni, Rami Al-Rfou,Bryan Perozzi, and Steven Skiena. 2014. Statisticallysignificant detection of linguistic change. In
Proc.24th WWW Conf. , pages 625–635.[Lin et al.2012] Yuri Lin, Jean-Baptiste Michel,Erez Lieberman Aiden, Jon Orwant, Will Brock-man, and Slav Petrov. 2012. Syntactic annotations forthe google books ngram corpus. In
Proc. ACL SystemDemonstrations .[McCulloch and Neuhaus2001] Charles E McCullochand John M Neuhaus. 2001.
Generalized linearmixed models . Wiley-Interscience, Hoboken, NJ.[Mikolov et al.2013] Tomas Mikolov, Ilya Sutskever, KaiChen, Greg S. Corrado, and Jeff Dean. 2013. Dis-tributed representations of words and phrases and theircompositionality. In
NIPS .[Rodda et al.2016] Martina Rodda, Marco Senaldi, andAlessandro Lenci. 2016. Panta rei: Tracking Seman-tic Change with Distributional Semantics in AncientGreek. In
Italian Conference of Computational Lin-guistics .[Schutze and Pedersen1993] Hinrich Schutze and JanPedersen. 1993. A vector model for syntagmatic andparadigmatic relatedness. In
Proc. 9th Annu. Conf. ofthe UW Centre for the New OED and Text Research ,pages 104–113. Citeseer.[Traugott and Dasher2001] Elizabeth Closs Traugott andRichard B Dasher. 2001.
Regularity in SemanticChange . Cambridge University Press, Cambridge,UK.[Turney and Pantel2010] Peter D. Turney and PatrickPantel. 2010. From frequency to meaning: Vec-tor space models of semantics.
J. Artif. Intell. Res. ,37(1):141–188.[Xu and Kemp2015] Yang Xu and Charles Kemp. 2015.A computational evaluation of two laws of semanticchange. In,37(1):141–188.[Xu and Kemp2015] Yang Xu and Charles Kemp. 2015.A computational evaluation of two laws of semanticchange. In