Long-term word frequency dynamics derived from Twitter are corrupted: A bespoke approach to detecting and removing pathologies in ensembles of time series
P. S. Dodds, J. R. Minot, M. V. Arnold, T. Alshaabi, J. L. Adams, D. R. Dewhurst, A. J. Reagan, C. M. Danforth
Computational Story Lab, Vermont Complex Systems Center, MassMutual Center of Excellence for Complex Systems and Data Science, Vermont Advanced Computing Core, University of Vermont, Burlington, VT 05401.
Department of Mathematics & Statistics, University of Vermont, Burlington, VT 05401.
Charles River Analytics, 625 Mount Auburn Street, Cambridge, MA 02138.
MassMutual Data Science, Amherst, MA 01002.
∗ Corresponding author: [email protected]
(Dated: August 31, 2020)

Maintaining the integrity of long-term data collection is an essential scientific practice. As a field evolves, so too will that field's measurement instruments and data storage systems, as they are invented, improved upon, and made obsolete. For data streams generated by opaque sociotechnical systems, which may have episodic and unknown internal rule changes, detecting and accounting for shifts in historical datasets requires vigilance and creative analysis. Here, we show that around 10% of day-scale word usage frequency time series for Twitter, collected in real time for a set of roughly 10,000 frequently used words over more than 10 years, come from tweets with, in effect, corrupted language labels. We describe how we uncovered problematic signals while comparing word usage over varying time frames. We locate time points where Twitter switched on or off different kinds of language identification algorithms, and where data formats may have changed. We then show how we create a statistic for identifying and removing words with pathological time series. While our resulting process for removing 'bad' time series from ensembles of time series is particular, the approach leading to its construction may be generalizable.
I. INTRODUCTION
The successful collection, cleaning, and storage of data through time requires a stability of data sources, measurement instruments, and data storage taxonomy [1–8]. Of course, such stability has hardly been the norm for any developing area of measurement. Indeed, consider, over the full arc of science, the measuring and recording of time itself: Thousands of years led to the establishment of a settled calendar, with its quadracentennial leap-year exception to an exception to an exception [9, 10]. Accurate clocks first appeared with chronometers in the 1600s [11], and timekeeping's achievements are now perhaps best manifested by the Global Positioning System (GPS), which requires general relativity to function.

For internet data, sources go through episodic upgrades as formats are reconfigured and expanded. In the case of Twitter, our focus here, just a few of the features that have been added include: retweets as formalized entities, images and video, local time, and tweet and user language. The data object behind any given tweet, whose format began as XML and changed to JSON, has correspondingly grown in size, and the format has evolved somewhat biologically. The JSON for a "quote tweet" contains simplified JSON for the retweeted tweet. And the expansion from 140 to 280 characters was accomplished not by expanding an existing entry field but by adding a second one, which must be combined with the old one for "long tweets" (we sketch this kind of stitching below). Data providers and APIs have also changed, most recently to GNIP as the data provider, with a completely different JSON schema.
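As an illustration of the kind of stitching this evolution forces on downstream consumers, the following minimal Python sketch shows how a tweet's full text must be assembled differently depending on the payload's generation. The field names follow Twitter's documented streaming-API schema from around 2017 onward; this is an illustrative sketch, not the code used in this study, and real archives mixing several generations of formats need more care.

```python
def tweet_full_text(tweet: dict) -> str:
    """Return the best-available text for a tweet payload.

    A minimal sketch: post-2017 streaming payloads carry text beyond
    140 characters in a separate 'extended_tweet' object rather than
    in the original 'text' field.
    """
    if tweet.get("truncated") and "extended_tweet" in tweet:
        return tweet["extended_tweet"]["full_text"]
    # Some endpoints instead return 'full_text' at the top level;
    # the earliest payloads have only 'text'.
    return tweet.get("full_text", tweet.get("text", ""))
```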
Over time, and not without setbacks, Twitter has become an important global social media service. Amplifying and reflecting real-world stories, Twitter is globally entrained with politics and news, sports, music, and culture, and also performs as a distributed sensor system for natural disasters and emergencies [12–28]. Like any scientific enterprise, empirical research involving Twitter, and social media in general, depends fundamentally on the quality of data [7]. Because of Twitter's now sprawling platform across time and language, great care must be taken to ensure such integrity.

In this short report, we describe: (1) How we uncovered anomalies in word usage time series derived from Twitter (Sec. II), and (2) One approach to identifying and removing corrupted time series (Sec. III). We offer concluding thoughts in Sec. IV.

We emphasize that we are not attempting to clean individual time series, a common statistical practice, but rather we are cleaning ensembles of time series by removing problematic, unsalvageable time series. Our work would be suitable for any many-component complex system where abundances of components are recorded over time.

Our approach is intended to be used for ensembles of time series which can only be taken as they are, i.e., they cannot be rebuilt from more primary data sets. The work we present here has inspired a ground-up re-identification of language for our Twitter data set [30], which in turn has led to the building of our n-gram time series for Twitter project, Storywrangler [31, 32], and the revision and expansion of our Hedonometer instrument [29, 33, 34] (more below). Our work is also connected to our studies of how the COVID-19 pandemic has been discussed across languages on Twitter [35, 36], as well as story turbulence and chronopathy in connection with Trump [37].

Throughout, we do not attempt to reverse engineer any of Twitter's proprietary algorithms, but rather contend with derived data and changing formats only. Neither do we suggest that there is any fault of Twitter in changing language identification methods, or indeed any aspect of their service, over time. We also acknowledge that some data artifacts may have been introduced by our own struggle with the complexities of consistently processing formats that have changed many times.

FIG. 1. A–C. Three Jensen-Shannon divergence (D_JS) time series for date-to-date comparisons of word usage frequency distributions for the labMT word list [29] over time scales of 1 day, 1 week, and 1 year. In the second panel (B), for example, each point gives the D_JS for the labMT word frequency distributions of a day and of the same day a week earlier. Increases in story turbulence are suggested by the D_JS time series for the day and week comparisons (A and B), which both show slow increases from around 2011/2012 on. The day-to-same-date-a-year-before D_JS time series (C) has peculiar jumps indicating that the underlying time series of labMT words are corrupted in some way. D. Raw counts in the labMT data set as a function of date, derived from an approximately 10% feed of tweets from Twitter. There are clear jumps and drops in volume, and these reflect changes in the feed rather than the collection process. None of the dates of sharp transitions correspond with those presented by the Jensen-Shannon divergence for year-scale comparisons in panel C. One explainable spike that does occur in A, B, and C is due to Twitter's entire system failing for around a week in May of 2019. The jumps in D_JS turn out to be due to changes in Twitter's language detection algorithm.

FIG. 2. A. Normalized usage frequency time series for the Dutch word 'niet', an example of a word strongly showing the effect of Twitter episodically altering their language detection algorithm. Dutch words were especially susceptible to being misclassified as English, giving rise to corrupted time series. We expand the time series for 'niet' in the two regions shaded in gray in panel A and present them in panels B and C. The jumps in the time series in panel B appear to be due to Twitter putting into place a series of language identification algorithms (which we do not attempt to reverse engineer in any way). The second jump in panel B seems to be due to the initial algorithm being switched off. The time series for 'niet' stays roughly two orders of magnitude lower for over three years before one last major adjustment in late 2016, shown in panel C.

II. UNCOVERING THE PRESENCE OF CORRUPTED TIME SERIES
The instigation of our work here came from first noticing, in June of 2018, that our Hedonometer's happiness time series for English Twitter [29, 33, 34] had begun to show apparently increasing turbulence from the year 2016 on. While a weekly cycle had always been a feature of our measure of Twitter's day-scale happiness (Saturday had typically been happiest, Tuesday the least happy), its strength appeared to be waning.

Deciding that this observation deserved further investigation, we began to conceive of ways to measure lexical and story turbulence [37, 38].

Our Hedonometer functions by averaging individual, offline-crowd-sourced happiness scores. At that point in time, we were using a "lexical lens" of 10,222 words to create a single score for each day [29]. In brief, our method ultimately derived from Osgood et al.'s work on the measurement of meaning [1]. Through semantic differentials, Osgood et al. found that valence (happiness-sadness) was the first dimension of the experience of meaning, followed by excitement and dominance. Using a double-Likert scale, we improved upon earlier efforts to score individual words [39], drawing on the most common words used for various time periods of Twitter, Google Books, the New York Times, and music lyrics [29]. We scored 10,222 words using Amazon's Mechanical Turk crowd-sourcing service, calling the resulting data set labMT (language assessment by Mechanical Turk). To run the Hedonometer, we created a usage frequency distribution for this set of 10,222 labMT words, doing so for each day (according to Coordinated Universal Time) using tweets identified as English by Twitter.

For an initial attempt to quantify turbulence on Twitter, we left the Hedonometer part aside and focused on the underlying labMT word frequency distributions. We used Jensen-Shannon divergence (JSD) to compare frequency distributions between dates over different time scales, with the distributions normalized as probabilities (or rates). Our choice of Jensen-Shannon divergence was not crucial, but rather something to try, and we later developed alternate kinds of divergences (see Refs. [40, 41]).
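As a concrete illustration, the JSD between two days' word-count distributions can be computed as follows. This is a minimal sketch rather than our analysis pipeline, and the word counts in the usage example are hypothetical.

```python
import numpy as np

def jensen_shannon_divergence(counts_1: dict, counts_2: dict) -> float:
    """Base-2 Jensen-Shannon divergence between two word-count
    distributions, each given as a map from word to raw count."""
    vocab = sorted(set(counts_1) | set(counts_2))
    p = np.array([counts_1.get(w, 0) for w in vocab], dtype=float)
    q = np.array([counts_2.get(w, 0) for w in vocab], dtype=float)
    p /= p.sum()  # normalize counts to probabilities (rates)
    q /= q.sum()
    m = (p + q) / 2.0  # mixture of the two distributions

    def entropy(dist):
        nz = dist[dist > 0]  # 0 log 0 = 0 by convention
        return -np.sum(nz * np.log2(nz))

    # D_JS = H(M) - [H(P) + H(Q)] / 2, bounded in [0, 1] for base 2
    return entropy(m) - (entropy(p) + entropy(q)) / 2.0

# Hypothetical example: a day's counts versus the same day a week before
day = {"the": 9120, "happy": 312, "niet": 4}
week_before = {"the": 8890, "happy": 298, "niet": 410}
print(jensen_shannon_divergence(day, week_before))
```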
In Fig. 1A–C, we show three JSD time series representing comparisons between a date and A. the previous day, B. the same day of the week, a week before, and C. the same date, a year before. We first plotted just the panels in Fig. 1A and Fig. 1B, and saw that these JSD time series, after trending down from 2009 through to 2011, were both increasing from 2012 on, in agreement with our visual observations of the Hedonometer.

In seeking to further develop our analysis of lexical turbulence, we then examined JSD over longer time scales between dates, including the year scale of Fig. 1C. And it was here that we first clearly saw there were problems with our word distributions. In late 2012, through 2013, and into 2014, we see striking jumps in year-scale JSD. We see more isolated jumps at the ends of 2015, 2016, and 2017. Because we are comparing across years, we expect the anomalous patterns to appear twice with a year's separation: once for a problematic date looking back a year, and then again a year later, looking back at the same problematic date.

We were able to say something immediately about what these anomalies are not. They are not due to isolated corrupted dates, something we would have to contend with in collecting any form of streaming data, as we would see these as spikes in the JSD. Some aspect of the distributions was being switched and maintained. Nor are the changes somehow volume dependent, as Fig. 1D makes clear. While we do have some inconsistencies and changes in the volume of labMT words collected over time, they do not line up with the jumps in the year-scale JSD time series. While Twitter is ever-changing in content, we nevertheless expect to find reasonable consistencies in the aggregate word usage patterns we may derive.

Upon visual inspection of individual frequency time series for Twitter around the dates of the jumps in the year-scale JSD time series, we find some corresponding peculiar jump sequences. (In the following section, we develop a systematic approach to identifying such anomalous time series.)

For an individual example, in Fig. 2A, we show how the normalized usage frequency for the Dutch word 'niet' (English: 'not/no') exhibits a number of sharp jumps (shaded regions). The word usage rate for 'niet' increases or drops over several orders of magnitude around certain dates. Expanding the shaded regions of Fig. 2A, Fig. 2B shows four jumps occurring at the end of 2012 and in 2013, and Fig. 2C shows one in late 2016.

We have the suggestion, then, that individual tweets (and hence words) were being differentially classified by a sequence of language identification algorithms employed by Twitter. Overall, from Fig. 2, the example word 'niet' seems to be initially identified as coming from English tweets; then, after several months of algorithms switching on or off, it appears to have been excluded from English for several years until the end of 2016, or appears so due to a change in the tweet distribution system provided by GNIP.

For the Hedonometer, for which these time series were prepared, we had accepted tweets for processing unless they were identified as being in a language other than English, or the user as a speaker of a language other than English (in other words, we kept all tweets that were not identified as not being English).
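To make that double negative concrete, here is a minimal sketch of the acceptance rule. The field names assume the historical top-level 'lang' attributes on tweet and user objects (with 'und' for undetermined); this is an illustrative helper, not our production code.

```python
def accept_for_english_hedonometer(tweet: dict) -> bool:
    """Keep a tweet unless it is positively identified as non-English.

    A sketch of the 'not not English' rule: tweets with missing or
    undetermined language labels pass through.
    """
    tweet_lang = tweet.get("lang")  # e.g., 'en', 'nl', 'und', or absent
    user_lang = tweet.get("user", {}).get("lang")
    for lang in (tweet_lang, user_lang):
        if lang is not None and lang != "und" and not lang.startswith("en"):
            return False  # positively identified as non-English
    return True  # not (identified as not English)
```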
We note that we had not noticed any of the year-scale JSD artifacts in our Hedonometer signal, which is itself a day-scale average.

Word usage distributions are of course determined within the context of all words for each day. Given the behavior of year-scale JSD in Fig. 1C, we must expect the time series of more words to follow the specific form of 'niet'. We should also expect that these corrupted time series would in turn corrupt the time series of basic function words in English (e.g., 'the').

III. BESPOKE DETECTION AND REMOVAL OF CORRUPTED TIME SERIES
Clearly we do not want to involve poorly sampled time series in any of our analyses. And because we have observed that some words follow the 'niet' pattern while the majority track well (i.e., largely continuous if noisy, and with jumps that have historical explanations), we can hope to remove this particular set of poorly sampled words. We are thus able to overcome Twitter's hidden shifts in algorithmic classification, at least in this most essential task of extracting basic word usage frequency time series.

We construct a specialized method for identifying corrupted time series as follows. For the five jumps overall for 'niet' in Fig. 2, we notice that adjacent and interstitial time periods are relatively quiescent. Observing that similar patterns hold for other words, we construct a "jump statistic" to measure the degree to which a word's time series locally tracks the shapes in Fig. 2B and Fig. 2C.

For the four jumps in the first time period of change (Fig. 2B), we choose five similar-length time ranges within which we expect words to be relatively similar in abundance on a logarithmic scale:

2012-10-19 to 2012-12-08 (51 days),
2012-12-15 to 2013-02-03 (51 days),
2013-02-10 to 2013-03-22 (41 days),
2013-03-29 to 2013-06-04 (68 days), and
2013-06-11 to 2013-07-31 (51 days).

Again referring to the behavior of 'niet', we expect the transitions of corrupted words between these time periods to be down, up, down, and down.

For the second time period (Fig. 2C), we bound the one jump with two periods:

2016-10-15 to 2016-12-04 (51 days), and
2016-12-11 to 2017-01-30 (51 days).

We expect corrupted words to jump up across this single transition.

For each word w in our set of 10,222 words, we construct a jump statistic J by averaging differences of the logarithms of normalized frequency P_{w,d} over all possible pairs of dates across each transition point. We incorporate the expected transition direction for corrupted time series by multiplying by +1 (up) or −1 (down), as appropriate. By using sums of differences of logarithms, we are equivalently computing ratios of normalized frequencies and taking their geometric mean.

A simpler estimate might be to take the average probability of a word in each region and sum the signed differences across the transition points. However, comparing each pair of dates around each transition point generates a distribution of J values, allowing us to estimate other statistics, such as a variance.
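Written out, one plausible formalization of this recipe (our notation; a hedged reconstruction, as the construction is described above only in words) is

\[
J_w = \sum_{t=1}^{5} \frac{s_t}{|A_t|\,|B_t|} \sum_{d \in A_t} \sum_{d' \in B_t} \left[ \log_{10} P_{w,d'} - \log_{10} P_{w,d} \right],
\]

where \(A_t\) and \(B_t\) are the date ranges immediately before and after transition \(t\), and \(s_t = -1, +1, -1, -1, +1\) encodes the expected jump directions (down, up, down, down, up) for corrupted words, so that corrupted time series accumulate large positive \(J_w\).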
We compute a variance for each word by creating a distribution of values for each component of J around each transition point (one value for each pair of dates). For example, the first two time periods of 51 days each give us 2601 possible date pairs. We use these to estimate variances for individual jumps. We then sum variances over all five transition points to obtain an overall variance for J; we denote the corresponding standard deviation simply by σ.
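A minimal Python sketch of the J and σ computation follows. This is illustrative only, not the original pipeline; it assumes a word's daily normalized frequencies are held in a pandas Series indexed by date, with nonzero values throughout the chosen ranges (zero counts would need smoothing).

```python
import numpy as np
import pandas as pd

# Boundaries of the relatively quiescent date ranges from the text;
# label-based .loc slicing on a DatetimeIndex is inclusive at both ends.
RANGES = [("2012-10-19", "2012-12-08"), ("2012-12-15", "2013-02-03"),
          ("2013-02-10", "2013-03-22"), ("2013-03-29", "2013-06-04"),
          ("2013-06-11", "2013-07-31"),
          ("2016-10-15", "2016-12-04"), ("2016-12-11", "2017-01-30")]
# Transitions between ranges, with the jump direction expected of
# corrupted words: down, up, down, down across the 2012-2013 ranges,
# and up across the single late-2016 transition.
TRANSITIONS = [(0, 1, -1), (1, 2, +1), (2, 3, -1), (3, 4, -1), (5, 6, +1)]

def jump_statistic(freq: pd.Series) -> tuple:
    """Return (J, sigma) for one word's daily normalized-frequency
    series. J sums signed mean log10 jumps over the five transitions;
    sigma is the square root of the summed per-jump variances."""
    log_chunks = [np.log10(freq.loc[a:b].to_numpy()) for a, b in RANGES]
    J, var = 0.0, 0.0
    for i, j, sign in TRANSITIONS:
        # All pairwise differences across the transition: one value per
        # (date-before, date-after) pair, e.g., 51 x 51 = 2601 pairs.
        diffs = sign * (log_chunks[j][None, :] - log_chunks[i][:, None])
        J += diffs.mean()
        var += diffs.var()
    return J, np.sqrt(var)

# Hypothetical usage, for freq indexed by daily pandas Timestamps:
# J, sigma = jump_statistic(freq)
# corrupted = (J - 2 * sigma) > 0
```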
We compute J and σ for each word w. Sorting by descending values of J, the main plot in Fig. 3A shows these values of J for all 10,222 labMT words. Annotated disks along the curve give example words. We see that for positive values of J, the words that track with the corrupted form are non-English words ('zijn', 'kalo', 'gak', etc.) and come from a range of languages. We also find corrupted time series for common words that tend to be used across languages, such as 'hahaha'.

FIG. 3. A. Sorted jump statistic J with example words annotated. Words with high values of J track the corrupted pattern of Fig. 2. B. The same plot as A but now for the 1,500 words with the highest values of J and with the 95% confidence interval [J − 2σ, J + 2σ] marked in gray. C. The same plot as B with words re-ordered by descending values of J − 2σ. We take J − 2σ > 0 as our criterion for deeming a time series corrupted.

Visually, it appears that many of the words have J close to 0 (between −1/2 and 1/2, say). These are non-corrupted words ('coke', 'britain', and 'varying'). We will firm up our measure of closeness below.

Some words go strongly against the trend of word corruption (J < −1), with 'clinton' and 'hillary' being prominent examples. Twitter changed their language identification algorithm about a month after the US presidential election, and Clinton's loss led to her name dropping in prevalence, going against the grain of the upward jump for corrupted word time series (Fig. 2C).

Now, having a distribution of J values for each word, we can move beyond a simple threshold on J to craft a better criterion. In Fig. 3B, we show the first 1,500 words ordered again by decreasing J, but now with the range J − 2σ to J + 2σ shaded. We observe words with J > 0 whose confidence intervals nevertheless extend below 0, so we do not simply require J > 0. We instead take our criterion for a time series to be corrupted as J − 2σ > 0. In Fig. 3C, we re-order words so that they are descending according to the lower limit of their 95% confidence interval, J − 2σ. We preserve the example labeled words from Fig. 3B to show how they move around.

With this criterion, we find that the time series of 9,030 of our 10,222 words are relatively unaffected by the five major changes in Twitter's language detection algorithm we have identified. We deem 1,192 words to be sufficiently problematic that we should exclude them.

With these words removed, we return to our JSD calculations and examine how the year-scale JSD now behaves. We find that the jumps that appeared to be due to Twitter's language detection algorithm changes have all been eliminated.

However, one last peculiar structure remained, due to anomalous word frequency changes in 2015 and 2016. We were able to find that two words, 'weather' and 'channel', were unusually prominent during this time, per their time series in Fig. 4. We are unsure exactly why this artifact appeared for our labMT data set. We note that in our Twitter n-grams project, Storywrangler, we do not see any anomalous behavior for 'weather', 'channel', or 'weather channel' in English [31, 32]. (For Storywrangler, whose development was directly motivated by the findings of our present paper, we used FastText for language identification of tweets [30].)

FIG. 4. Two evidently connected words, 'weather' and 'channel', maintained an anomalous elevation in usage frequency for around a year, ending in November 2016. These words proved to induce an anomaly in the JSD time series for dates separated by one year, warranting their removal.

Finally, in Fig. 5, we show the year-scale JSD time series for our labMT data set with corrupted words removed. Compared with Fig. 1C, we now see a noisy time series more in keeping with the 1-day and 1-week time scale JSD time series in Figs. 1A and 1B.

FIG. 5. A. For ease of comparison, a repeat of the year-scale JSD for labMT words presented in panel C of Fig. 1. B. The same year-scale JSD time series, but now computed on the labMT Zipf distributions with corrupted words removed.

While we cannot be sure that there are no other problems for our labMT word list, we have at least been able to systematically contend with the time series corruptions induced by changes in Twitter's language detection algorithms.

IV. CONCLUDING REMARKS
We have shown that certain kinds of time series for individual words on Twitter may be functionally corrupted due to changes in how Twitter has deployed language detection algorithms over the last decade, coupled with the difficulties of constantly needing to recognize and adapt to data format changes. In the absence of the ability to rebuild these problematic time series from original primary data, we have demonstrated how a systematic, if bespoke, method can be developed to generate a 'clean' ensemble of time series. We repeat that we do not clean individual time series but rather remove them entirely from an ensemble.

Anomalies within ensembles of interrelated time series may in general be difficult to discern. While pursuing other research directions may have uncovered the same time series problems (our original research interest concerned lexical turbulence [37, 38]), measuring the divergence between Zipf distributions for days proved powerful here. Our stumbling upon aberrant time series was helped by Jensen-Shannon divergence being just one of many divergences that would have worked, though evidence of time series problems only arose when we looked beyond short time scales.

We believe our findings should elicit some measure of concern, as they suggest that existing work based on language-specific time series derived from Twitter may need to be re-examined. More generally, our work would support the very reasonable concern any researcher might have about the long-term integrity of data collected on the fly from social media and other internet services. Indeed, our investigations have led us to rebuild our Twitter database, resulting in important upgrades for our happiness measurement instrument, Hedonometer, and the development of our Twitter n-gram viewer, Storywrangler.

ACKNOWLEDGMENTS
The authors are grateful for the computing resources provided by the Vermont Advanced Computing Core, which was supported in part by NSF award No. OAC-1827314, and for financial support from the Massachusetts Mutual Life Insurance Company and Google.

[1] C. Osgood, G. Suci, and P. Tannenbaum, The Measurement of Meaning (University of Illinois, Urbana, IL, 1957).
[2] Louis L. Thurstone, The Measurement of Values (University of Chicago Press, 1959).
[3] Graeme Gooday, "Precision measurement and the genesis of physics teaching laboratories in Victorian Britain," The British Journal for the History of Science, 25–51 (1990).
[4] Stephen Jay Gould, The Mismeasure of Man (W. W. Norton & Company, 1996).
[5] Peter J. Mohr, Barry N. Taylor, and David B. Newell, "CODATA recommended values of the fundamental physical constants: 2006," Journal of Physical and Chemical Reference Data, 633–1284 (2008).
[6] Jan Golinski, Making Natural Knowledge: Constructivism and the History of Science, with a new preface (University of Chicago Press, 2008).
[7] Jürgen Pfeffer, Katja Mayer, and Fred Morstatter, "Tampering with Twitter's sample API," EPJ Data Science 7, 50 (2018).
[8] Jessica Kay Flake and Eiko I. Fried, "Measurement schmeasurement: Questionable measurement practices and how to avoid them," (2019).
[9] Umberto Eco, Foucault's Pendulum (trans. William Weaver; Secker & Warburg, London, 1989).
[10] Stephen Jay Gould, Questioning the Millennium: A Rationalist's Guide to a Precisely Arbitrary Countdown (Revised Edition) (Crown, 1999).
[11] Dava Sobel, Longitude: The True Story of a Lone Genius Who Solved the Greatest Scientific Problem of His Time (Bloomsbury Publishing, US, 2007).
[12] Dhiraj Murthy, Twitter (Polity Press, Cambridge, UK, 2018).
[13] Takeshi Sakaki, Makoto Okazaki, and Yutaka Matsuo, "Earthquake shakes Twitter users: Real-time event detection by social sensors," in Proceedings of the 19th International Conference on World Wide Web (2010) pp. 851–860.
[14] Vasileios Lampos and Nello Cristianini, "Tracking the flu pandemic by monitoring the social web," in 2010 2nd International Workshop on Cognitive Information Processing (IEEE, 2010) pp. 411–416.
[15] Aron Culotta, "Towards detecting influenza epidemics by analyzing Twitter messages," in Proceedings of the First Workshop on Social Media Analytics, SOMA '10 (Association for Computing Machinery, New York, NY, USA, 2010) pp. 115–122.
[16] Sounman Hong and Daniel Nadler, "Does the early bird move the polls? The use of the social media tool 'Twitter' by US politicians and its impact on public opinion," in Proceedings of the 12th Annual International Digital Government Research Conference: Digital Government Innovation in Challenging Times (2011) pp. 182–186.
[17] Arjumand Younus, M. Atif Qureshi, Fiza Fatima Asar, Muhammad Azam, Muhammad Saeed, and Nasir Touheed, "What do the average twitterers say: A Twitter model for public opinion analysis in the face of major political events," in 2011 International Conference on Advances in Social Networks Analysis and Mining (IEEE, 2011) pp. 618–623.
[18] J. Bollen, H. Mao, and X.-J. Zeng, "Twitter mood predicts the stock market," Journal of Computational Science 2, 1–8 (2011).
[19] Galen Pickard, Wei Pan, Iyad Rahwan, Manuel Cebrian, Riley Crane, Anmol Madan, and Alex Pentland, "Time-critical social mobilization," Science 334, 509–512 (2011).
[20] Huiji Gao, Geoffrey Barbier, and Rebecca Goolsby, "Harnessing the crowdsourcing power of social media for disaster relief," IEEE Intelligent Systems, 10–14 (2011).
[21] Ugur Kursuncu, Manas Gaur, Usha Lokala, Krishnaprasad Thirunarayan, Amit Sheth, and I. Budak Arpinar, "Predictive analysis on Twitter: Techniques and applications," in Emerging Research Challenges and Opportunities in Computational Social Network Analysis and Mining (Springer, 2019) pp. 67–104.
[22] Nir Grinberg, Kenneth Joseph, Lisa Friedland, Briony Swire-Thompson, and David Lazer, "Fake news on Twitter during the 2016 US presidential election," Science 363, 374–378 (2019).
[23] Ryan J. Gallagher, Elizabeth Stowell, Andrea G. Parker, and Brooke Foucault Welles, "Reclaiming stigmatized narratives: The networked disclosure landscape of #MeToo," Proceedings of the ACM on Human-Computer Interaction 3, 1–30 (2019).
[24] Preslav Nakov, Alan Ritter, Sara Rosenthal, Fabrizio Sebastiani, and Veselin Stoyanov, "SemEval-2016 task 4: Sentiment analysis in Twitter," arXiv preprint arXiv:1912.01973 (2019).
[25] Sarah J. Jackson, Moya Bailey, and Brooke Foucault Welles, #HashtagActivism: Networks of Race and Gender Justice (MIT Press, 2020).
[26] Timothy R. Tangherlini, Shadi Shahsavari, Behnam Shahbazi, Ehsan Ebrahimzadeh, and Vwani Roychowdhury, "An automated pipeline for the discovery of conspiracy and conspiracy theory narrative frameworks: Bridgegate, Pizzagate and storytelling on the web," PLoS ONE 15, e0233879 (2020).
[27] Jennifer Allen, Baird Howland, Markus Mobius, David Rothschild, and Duncan J. Watts, "Evaluating the fake news problem at the scale of the information ecosystem," Science Advances 6, eaay3539 (2020).
[28] Zachary C. Steinert-Threlkeld, Delia Mocanu, Alessandro Vespignani, and James Fowler, "Online social networks and offline protest," EPJ Data Science 4, 19 (2015).
[29] P. S. Dodds, K. D. Harris, I. M. Kloumann, C. A. Bliss, and C. M. Danforth, "Temporal patterns of happiness and information in a global social network: Hedonometrics and Twitter," PLoS ONE 6, e26752 (2011).
[30] Thayer Alshaabi, David R. Dewhurst, Joshua R. Minot, Michael V. Arnold, Jane L. Adams, Christopher M. Danforth, and Peter Sheridan Dodds, "The growing amplification of social media: Measuring temporal and social contagion dynamics for over 150 languages on Twitter for 2009–2020," (2020), available online at http://arxiv.org/abs/2003.03667.
[31] Thayer Alshaabi, Jane L. Adams, Michael V. Arnold, Joshua R. Minot, David R. Dewhurst, Andrew J. Reagan, Christopher M. Danforth, and Peter Sheridan Dodds, "Storywrangler: A massive exploratorium for sociolinguistic, cultural, socioeconomic, and political timelines using Twitter," (2020).
[32] storywrangling.org, accessed August 23, 2020.
[33] P. S. Dodds, E. M. Clark, S. Desu, M. R. Frank, A. J. Reagan, J. R. Williams, L. Mitchell, K. D. Harris, I. M. Kloumann, J. P. Bagrow, K. Megerdoomian, M. T. McMahon, B. F. Tivnan, and C. M. Danforth, "Human language reveals a universal positivity bias," Proc. Natl. Acad. Sci. 112, 2389–2394 (2015).
[34] hedonometer.org.
[35] Thayer Alshaabi et al., "How the world's collective attention is being paid to a pandemic: COVID-19 related n-gram time series for 24 languages on Twitter," (2020), available online at http://arxiv.org/abs/2003.12614.
[36] David Rushing Dewhurst, Thayer Alshaabi, Michael V. Arnold, Joshua R. Minot, Christopher M. Danforth, and Peter Sheridan Dodds, "Divergent modes of online collective attention to the COVID-19 pandemic are associated with future caseload variance," (2020), available online at http://arxiv.org/abs/2004.03516.
[37] Peter Sheridan Dodds, Joshua R. Minot, Michael V. Arnold, Thayer Alshaabi, Jane L. Adams, Andrew J. Reagan, and Christopher M. Danforth, "Computational timeline reconstruction of the stories surrounding Trump: Story turbulence, narrative control, and collective chronopathy," (2020), available online at https://arxiv.org/abs/2008.07301.
[38] Eitan A. Pechenick, Christopher M. Danforth, and Peter Sheridan Dodds, "Is language evolution grinding to a halt? The scaling of lexical turbulence in English fiction suggests it is not," Journal of Computational Science, 24–37 (2017), available online at http://arxiv.org/abs/1503.03512.
[39] M. M. Bradley and P. J. Lang, Affective Norms for English Words (ANEW): Stimuli, instruction manual and affective ratings, Technical report C-1 (University of Florida, Gainesville, FL, 1999).
[40] Peter Sheridan Dodds, Joshua R. Minot, Michael V. Arnold, Thayer Alshaabi, Jane Lydia Adams, David Rushing Dewhurst, Tyler J. Gray, Morgan R. Frank, Andrew J. Reagan, and Christopher M. Danforth, "Allotaxonometry and rank-turbulence divergence: A universal instrument for comparing complex systems," (2020), available online at https://arxiv.org/abs/2002.09770.
[41] Peter Sheridan Dodds et al.