Counting Protests in News Articles: A Dataset and Semi-Automated Data Collection Pipeline
Tommy Leung*, L. Nathan Perkins*
*Authors contributed equally
Independent Scholars, Cambridge, MA, USA
[email protected], [email protected]

Abstract
Between January 2017 and January 2021, thousands of local news sources in the United States reported on over 42,000 protests about topics such as civil rights, immigration, guns, and the environment. Given the vast number of local journalists that report on protests daily, extracting these events as structured data to understand temporal and geographic trends can empower civic decision-making. However, the task of extracting events from news articles presents well-known challenges to the NLP community in the fields of domain detection, slot filling, and coreference resolution. To help improve the resources available for extracting structured data from news stories, our contribution is three-fold. We 1) release a manually labeled dataset of news article URLs, dates, locations, crowd size estimates, and 494 discrete descriptive tags corresponding to 42,347 reported protest events in the United States between January 2017 and January 2021; 2) describe the semi-automated data collection pipeline used to discover, sort, and review the 138,826 English articles that comprise the dataset; and 3) benchmark a long short-term memory (LSTM) low-dimensional classifier that demonstrates the utility of processing news articles based on syntactic structures, such as paragraphs and sentences, to count the number of reported protest events.
1. Introduction

Protests have played a notable role in social movements and political outcomes in the United States, including the Civil Rights and Anti-War movements that started in the 1960s (McAdam and Su, 2002; Soule and Earl, 2005; Andrews and Biggs, 2006) and the rise of the Tea Party in American politics starting in 2009 (Madestam et al., 2013). Researchers that study the relationships between protests, political outcomes, and social movements often rely on time- and labor-intensive, manually coded datasets based on local newspaper reports.

Since January 2017, millions of Americans have protested about topics such as civil rights, immigration, guns, healthcare, collective bargaining, and the environment; and journalists have recorded much of this protest activity in news articles in local papers. However, for these event data to be contemporaneously useful for deeper analysis, citizens, policymakers, journalists, activists, and researchers must be able to ask structured questions of the news articles in aggregate, such as, "How often did people protest against police brutality?"

The task of associating details with events in natural language processing is often referred to as coreference resolution or multi-slot filling and constitutes an active area of research (Soderland et al., 1999; Hendrickx et al., 2009; Mesnil et al., 2015; Hakkani-Tür et al., 2016). Multi-slot filling and coreference resolution in the context of open-ended news articles are sufficiently complex problems that they have recently inspired the definition of their own natural language processing (NLP) research task (Postma et al., 2018). To illustrate some of the specific NLP challenges related to extracting protest event details from news articles, here are example excerpts from candidate articles that contain variations of the keywords "protest," "rally," "demonstration," or "march":

1. "Protestors marched for greater gun control in Torrance." (one protest for gun control)
2. "A few dozen white nationalists held a rally; they were outnumbered by hundreds of counterprotestors." (two protests, one for and one against white supremacy)
3. "Residents in Corrales held a local demonstration to mirror the national rally in Washington, DC; rallies also occurred in New York, Boston, Seattle, Chicago, and Albuquerque." (seven protests, protest objective not described in this sentence)
4. "Teachers rallied for more funding last year, and they will do so again this year at the Capitol." (two protests a year apart, against systemic underfunding)
5. "Dow rallies 100 points." (zero protests)

To help improve the resources available for extracting structured data from news stories, our contribution is three-fold. We 1) present a manually labeled dataset of news URLs corresponding to 42,347 reported protests in the United States between January 2017 and January 2021; 2) describe the semi-automated data collection pipeline used to compile the dataset nightly with a team of two researchers; and 3) benchmark a long short-term memory (LSTM) low-dimensional classifier to count the number of protest events reported in a news article.

In Section 2, we describe key statistics about the dataset. In Section 3, we describe our data collection pipeline and coding process, including key automation techniques for article discovery, similarity sorting, and classification. In Section 4, we discuss the results of using the dataset to train and test a series of fully-supervised, LSTM-based neural networks to predict the number of protest events reported in a news article.

2. Dataset Statistics
The dataset that we present in this paper contains 138,826 URLs corresponding to 42,347 unique protests reported in the United States between January 2017 and January 2021. In the data, we differentiate between past references to protests that have occurred and future references to protests that are planned. In total, there are 15,756 future event references and 100,963 past event references.

We first highlight other news-based civil unrest/violence datasets of social and journalistic import that may also be useful as NLP language resources for event extraction from news articles:

1. the Dynamics of Collective Action database (https://web.stanford.edu/group/collectiveaction), which thoroughly documents approximately 23,000 unique, historic protest events reported in the New York Times between 1960 and 1995 and includes code fields for details such as protest targets, participation by organizations, the presence and nature of any related violence, and the number of people arrested;
2. the Event Status Corpus, which contains 4,500 news articles about civil unrest in English and Spanish with explicit annotations describing the temporal status of each reported event (Huang et al., 2016);
3. and the Gun Violence Database, which collects reports of gun violence incidents in the United States by combining an automated crawling pipeline with crowd-sourced article annotation (Pavlick et al., 2016).

These datasets differ from the protest events dataset presented in this paper by the types of civil unrest events documented, breadth of included news sources, contemporaneity of data collection, time period covered, volume of articles, and annotation taxonomy.

To find candidate articles for the dataset, we automatically crawled 3,410 news sources on a nightly basis. Of these news sources, 2,683 sources reported on at least one protest in the dataset; 2,363 sources reported on at least two protests; and 2,167 sources reported on at least three protests. Because the crawler automatically discovers candidate news articles using keywords, only 72,083 of the 138,826 URLs in the dataset actually describe protest events. The other 66,743 URLs are negative examples with articles about cancelled protests, stock markets, sports games, and other topics. Figure 1 shows a histogram of the event count distribution for all crawled articles: 95.5% of the articles in the dataset have either zero, one, or two protest events.

We do not have permission to directly distribute the corpus due to intellectual property rights regarding the use of news articles. Following (Sharoff, 2005), we release a list of web URLs (https://github.com/count-love/protest-data) and example code (https://github.com/count-love/crawler) that can be used to recreate the corpus. Given the transient nature of internet URLs, not all URLs remain available; however, where possible, multiple sources that report about the same event have been noted in the dataset.

Figure 1: Distribution of the number of protest events reported per article in the dataset between January 2017 and January 2021.

In addition to enumerating all protests reported in a single news article, the dataset also enumerates all articles that report about the same protest event; this happens frequently for prominent protests, as well as when local newspapers syndicate stories from wire services such as the Associated Press or Reuters.
Category               Events   Articles
Civil Rights           16,130   31,230
Collective Bargaining   1,932    2,821
Education               2,305    4,110
Environment             1,940    3,209
Executive               3,750    7,880
Guns                    3,917    5,262
Healthcare              2,106    4,290
Immigration             3,937    6,163
International             747      908
Judicial                  426      684
Legislative               531      733
Other                   5,156    7,589

Table 1: Protest categories as of January 31, 2021.

Each protest event in the dataset contains details about that protest's (1) date, (2) location, (3) number of attendees (if reported), (4) article references in the dataset, and (5) tags corresponding to the category and reasons for protest. (For event attendee counts, we report the most specific, conservative estimate and choose a single source as ground truth if multiple articles provide different estimates. We interpret "a dozen" as 10, "dozens" as 20, "hundreds" as 100, "a couple hundred" as 200, etc.) For protest tags, as of January 31, 2021, the dataset contains 494 unique tags covering the 12 general categories shown in Table 1; 359 positions for or against a topic (e.g., "For greater gun control" or "Against white supremacy"); and 23 details, including the names of national demonstrations (e.g., "Women's March") and common themes (e.g., "Police").

Every event is labeled with at least one category and one position tag. Table 1 shows the representation of category tags in the dataset as of January 31, 2021. Tags are not mutually exclusive, and an event can have multiple category, position, and detail tags.

A unique set of tags denotes a specific, semantically unique protest or protest objective; for example, "Civil Rights; For racial justice; For greater accountability; Police" denotes protests against racially-motivated police brutality. As of January 31, 2021, the dataset had 2,395 unique combinations of tags. Table 2 shows the top five tag sets by event count. We developed the protest taxonomy initially by manually coding events with open-ended fields. Six months later, we reviewed and binned the open-ended entries into recurring category, position, and detail tags.

Tag Set                                                                 Events   Articles
Civil Rights; For racial justice; For greater accountability; Police     8,353   16,864
Guns; For greater gun control; National Walkout Day                      1,468      779
Guns; For greater gun control                                            1,208    1,622
Civil Rights; For women's rights; Women's March                          1,021    1,502
Healthcare; Coronavirus; Against pandemic intervention                     978    2,821

Table 2: Top five most frequent protest tag sets as of January 31, 2021. Each unique tag set denotes a distinct protest objective or set of protests.

As our primary contribution in this paper, we release the Count Love Protest Dataset just described. The dataset documents protests in the United States reported by local news sources and is available in an archival GitHub repository at https://github.com/count-love/protest-data.
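Because a unique tag combination identifies a semantically unique protest objective, tag-set bookkeeping reduces to counting hashable sets. A minimal sketch follows; the event records below are illustrative, not drawn from the dataset:

```python
# Count semantically unique protests by their tag sets. A frozenset makes a
# tag combination hashable so it can serve as a Counter key; illustrative
# records stand in for real dataset entries.
from collections import Counter

events = [
    {"tags": {"Civil Rights", "For racial justice", "Police"}},
    {"tags": {"Guns", "For greater gun control"}},
    {"tags": {"Civil Rights", "For racial justice", "Police"}},
]

# frozenset(...) -> hashable key; Counter tallies each unique combination
tag_set_counts = Counter(frozenset(e["tags"]) for e in events)

most_common_set, count = tag_set_counts.most_common(1)[0]
```

The same pattern scales to the full dataset, where it yields the 2,395 unique combinations reported above.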
3. Data Collection Pipeline

When we started reviewing and labeling news reports in February 2017, we ran into several issues: 1) the candidate articles that we crawled often contained protest keywords even though the article itself did not describe protest events; 2) the crawler found redundant coverage about the same protest event; and 3) the crawler found exact copies of syndicated articles. We observed these issues on a recurring basis due to the volume of our review load: between February 2017 and January 2021, on average, we manually reviewed 99 articles each night with a peak of 1,296 articles. To make article discovery and review possible with a team of two researchers, we built a data collection pipeline that relies heavily on automation and broadly follows this sequence:

1. each night, automatically crawl news sites to identify candidate articles with protest keywords;
2. automatically deduplicate syndicated articles and group similar articles by topic;
3. automatically score candidate articles to determine if the article describes at least one protest event (the domain detection task in NLP);
4. automatically predict relevant protest details, such as the total number of reported protest events in an article, to use as suggestions during manual review; and
5. manually read articles to label protest events.

We describe key components of our data collection pipeline and review process in greater detail in the remainder of this section.
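The five-step nightly sequence above can be summarized as a skeleton driver. Every function body below is a hypothetical stand-in stub (the real components are the crawler, deduplicator, and BiLSTM scorers described later in this section); only the control flow reflects the pipeline:

```python
# Skeleton of the nightly pipeline. All function names and bodies are
# illustrative stubs, not the authors' code; only the stage ordering
# mirrors the five steps described in the text.

SKIP_THRESHOLD = 0.1  # hypothetical domain-score cutoff for title-only review

def crawl_for_keywords(sources):           # step 1: keyword-filtered crawl
    return [f"article from {s}" for s in sources]

def deduplicate_and_group(articles):       # step 2: drop exact duplicates
    return sorted(set(articles))

def domain_score(article):                 # step 3: stand-in for the BiLSTM score
    return 0.9 if "protest" in article else 0.05

def suggest_event_count(article):          # step 4: stand-in for count suggestion
    return 1

def nightly_pipeline(sources):
    """Return the articles queued for full manual review (step 5)."""
    articles = deduplicate_and_group(crawl_for_keywords(sources))
    return [a for a in articles if domain_score(a) >= SKIP_THRESHOLD]
```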
To find news articles that potentially contain reports of protest events, we first compiled a list of 3,410 URLs for local news sources in the United States. This list contains both home pages and local/metro sections. We assembled this list of news sources starting with newspapers linked in Wikipedia results for the query "[state] newspapers." We also periodically augmented our list of news sources by adding new sources found during validation of local protest events planned as part of national advocacy efforts, such as the March for Our Lives protests for greater gun control and the Families Belong Together protests for more compassionate immigration policies.

We crawl each news source on a nightly basis and follow article links that contain the stem words "march," "demonstration," "rally," or "protest." Restricting our crawl to titles that contain stem words limits the scope of articles that we can automatically discover but also makes the crawl tractable. For each downloaded article, we heuristically determine the text content by parsing the article using BeautifulSoup and heuristics modeled after Readability. After downloading a candidate article, the text extractor scores every text element in the document object model based on characteristics such as length and punctuation and returns the most likely article text. We published the crawler's core logic at https://github.com/count-love/crawler.

Many local news outlets will publish syndicated articles, articles that are fully, or nearly, a duplicate of another, from national services such as Associated Press and Reuters. Additionally, on any given day, news outlets often report about the same set of national topics, even if those outlets do not use syndicated content. Grouping semantically similar articles together before manual review simplifies the task of identifying duplicate news coverage of a protest event.
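The stem-word filter that decides which article links to follow can be sketched with a standard-library regular expression; the exact matching logic in the released crawler may differ:

```python
# Sketch of the keyword-stem link filter described above. The four stems
# come from the text; the regex variations are an approximation of how
# inflected forms ("marched", "rallies", "protesting") might be caught.
import re

STEM_PATTERN = re.compile(
    r"\b(march\w*|demonstrat\w*|rall(?:y|ies|ied)\w*|protest\w*)\b",
    re.IGNORECASE,
)

def is_candidate_link(link_text: str) -> bool:
    """True if a link's text contains a variation of a protest stem word."""
    return bool(STEM_PATTERN.search(link_text))
```

Note that a purely lexical filter admits false positives such as "Dow rallies 100 points," which is precisely why the domain detection stage follows.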
(Readability source: https://github.com/mozilla/readability)

After crawling each night, to detect potential duplicates and group similar articles together, we apply two locality-sensitive hashing techniques: a Nilsimsa-based detector (commonly also used to detect email spam) to identify similar paragraphs (Damiani et al., 2004) and a document signature approach based on comparing sets of text shingle hashes (Broder, 1997) to calculate the Jaccard similarity coefficient between two arbitrary articles (Jaccard, 1912). The Jaccard similarity coefficient ranges from 0 for completely different articles to 1 for completely identical articles. For pairs of articles with high Jaccard similarity coefficients, if we have already reviewed one of the two, then we run a "diff" operation to highlight inserted and deleted words in the new article. If a pair of articles has both a high Jaccard similarity coefficient and a similar text "diff," we automatically associate all events found in the original article with the new article.

By using the Jaccard similarity coefficients as the distance metric between all pairs of unreviewed articles, we can also generalize the problem of grouping similar articles together to the traveling salesman problem (Dantzig et al., 1954): given a list of cities and the distances between all pairs of cities, what is the shortest path that visits each city once? Empirically in our dataset, by sorting articles based on the shortest path, news articles about the same topic (even those that are not syndicated) share enough text shingles that they often appear near one another during review.
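The shingle-based similarity check can be sketched as follows. The shingle size k = 4 is an assumption (the paper does not state the value used), and production implementations compare hashed shingles rather than raw word tuples:

```python
# Sketch of shingle-based Jaccard similarity (Broder, 1997). Shingle size
# k=4 is an assumed value; real systems hash the shingles for compactness,
# but raw word tuples suffice to illustrate the coefficient.

def shingles(text: str, k: int = 4) -> set:
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def jaccard(a: str, b: str) -> float:
    """Jaccard coefficient |A & B| / |A | B|: 0 for disjoint texts, 1 for identical."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)
```

Treating 1 minus this coefficient as a pairwise distance is what lets the grouping step be posed as a traveling-salesman-style shortest-path ordering over unreviewed articles.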
In addition to flagging duplicates and grouping similar articles together to reduce the time required for manual review, we also train and utilize fully supervised, bidirectional long short-term memory networks (BiLSTMs) (Graves and Schmidhuber, 2005) to help suggest decisions for the tasks of domain detection (does the article actually contain a protest event) and event classification (assigning category, protest position, and detail tags to protest events). We describe these networks in greater detail in the following section.
For our neural network inputs, we rely on GloVe for word embeddings (Pennington et al., 2014). To prepare each article's text for training and scoring, we make out-of-vocabulary substitutions to the crawler text for frequent, relevant phrases that lack corresponding GloVe embeddings. To date, the most common out-of-vocabulary words are compound words related to counterprotests, such as "counterprotested" or "counterdemonstration." Given that the subwords in these compound words do have GloVe embeddings, we hyphenate the compound words after "counter." After making these substitutions, we use the spaCy library (https://spacy.io/) to tokenize and generate embedded inputs to feed into all of our neural network models, including those for domain detection and event classification.

To assist with the task of domain detection, determining whether a candidate article actually describes a protest or an out-of-domain topic such as a stock market rally, we trained a fully-supervised, single-layer, 64-unit BiLSTM with a binary cross-entropy loss function and 50% dropout to score new articles. Once scored, we manually review the titles for the lowest scoring articles each night and decide which stories to completely skip without review. Importantly, not all articles are eligible for skipping. We minimize false positives, the error of skipping articles that actually contain at least one protest event, by setting the threshold score for articles eligible for title-only review such that the false positive rate is less than 1.7%, based on the BiLSTM classifier's receiver operating characteristic curve.

Similarly, to assist with event classification, determining the category, protest position, and detail tags to assign to a particular protest event, we trained a fully supervised, single-layer, 256-unit BiLSTM with a binary cross-entropy loss function and 50% dropout to suggest potential event tags for every candidate article.
However, given that the tags that describe a protest event are generally not mutually exclusive (with the exception of opposite positional tags, such as "For greater gun control" and "Against greater gun control"), the tag network has a 494-dimensional output, one dimension for each tag. This event classification network output suggests candidate tags for each article, encouraging tag reuse and a minimal protest taxonomy.
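The "counter" hyphenation substitution described earlier in this section can be sketched with a regular expression. This version assumes the text has already been lowercased (GloVe vocabularies are typically lowercase), and the list of protest stems it guards against is an assumption:

```python
# Sketch of the out-of-vocabulary substitution for "counter*" compounds:
# insert a hyphen after "counter" so that each subword has a GloVe
# embedding. Assumes lowercased input; the guarded stem list is assumed.
import re

COUNTER_COMPOUND = re.compile(r"\bcounter(?=protest|demonstrat|rall|march)")

def hyphenate_counter_compounds(text: str) -> str:
    """Rewrite e.g. 'counterprotested' as 'counter-protested'."""
    return COUNTER_COMPOUND.sub("counter-", text)
```

Restricting the lookahead to protest-related stems keeps unrelated words such as "countertop" or the bare word "counter" untouched.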
After automatic crawling, deduplication, sorting, domain detection, and classification each night, we manually read each candidate article to make final event coding decisions. We also manually deduplicate events by date, location proximity, and protest tags.

To date, we, as a team of two researchers, have coded the entire dataset since February 2017. Given the article review volume, we typically assign one researcher per article and jointly adjudicate ambiguities in real time. Importantly for data accuracy and consistency, the dataset released in this paper does not contain unreviewed classifier outputs. Although the machine learning models described in this section for domain detection and event classification guide our manual data entry, they do not yet perform with sufficient recall and precision for fully automatic data entry (and consequently we do not utilize them in this fashion).
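The manual event-deduplication criteria (same date, nearby location, matching tags) can be sketched as below. The 5 km proximity threshold and the record layout are assumptions for illustration, not the authors' actual rule:

```python
# Sketch of event deduplication by date, location proximity, and tags.
# The 5 km threshold and record fields are illustrative assumptions.
import math

def distance_km(p, q):
    """Great-circle (haversine) distance between two (lat, lon) points, in km."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 6371 * 2 * math.asin(math.sqrt(a))

def same_event(a, b, max_km=5.0):
    """True if two records plausibly describe the same protest event."""
    return (a["date"] == b["date"]
            and a["tags"] == b["tags"]
            and distance_km(a["location"], b["location"]) <= max_km)
```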
4. Counting Protest Events

In Sections 2 and 3, we presented a protest dataset based on news articles and described the semi-automated discovery and review process used to build the dataset, including the use of standard LSTM classification networks to aid with initial article domain detection and protest event classification. In this section, we benchmark the performance of a token-, sentence-, and paragraph-based low-dimensional classification network trained and tested with the dataset to automatically count the number of protest events in each article under fully supervised training. Counting the number of reported protests in a news article represents a useful step toward automatically extracting other relevant protest details because an event count can constrain the total number of subsequent slot filling tasks (e.g., if an article contains two protests, then we must also identify two dates, locations, and reasons for protest). Motivated by the distribution of protests per article in the dataset (see Figure 1), we recast the problem of counting reported protest events as a low-dimensional classification problem where the final output can be one of four categories: zero, one, two, or more than two protest events.
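The recasting step amounts to bucketing a raw event count into one of four class labels:

```python
# Bucket an article's protest-event count into the four output classes
# described above: 0, 1, 2, or 3+ events.

LABELS = ("0", "1", "2", "3+")

def count_to_class(n_events: int) -> str:
    """Map a raw event count to the classifier's output class label."""
    return LABELS[min(n_events, 3)]
```

This mapping is what turns training labels into targets for the softmax output described next.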
To implement a low-dimensional classifier to predict the number of protest events in an article, we used a two-layer BiLSTM as shown in Figure 2.

Figure 2: Given the length of most news articles, the network design for predicting the number of reported protest events first separates text into individual syntactic units, such as sentences or paragraphs, to help mitigate the long-term dependencies credit assignment problem.

Given the length of most news articles, to help mitigate the fundamental challenge of credit assignment with long-term dependencies (Moniz and Krueger, 2017), the first layer of 128-unit BiLSTMs encodes syntactic units of text, such as sentences or paragraphs. This intermediate encoding allows the neural network to summarize sections of an article, either at the sentence or the paragraph level. For an article with n syntactic units, the first layer outputs a matrix of encodings with size n × 256. The second layer of 64-unit BiLSTMs operates over all n syntactic units, producing an output vector of length 128. This output vector passes through a dense (fully connected) layer, followed by a final softmax layer, to predict whether the article describes 0, 1, 2, or 3+ protest events.

To benchmark the performance gains of intermediate syntactic encoding, we also trained and tested a baseline, token-based network with no intermediate encoding. The network feeds each token through two BiLSTM layers using the same number of layers, units, and weights as the paragraph and sentence variants (see Figure 3).

For training and evaluation, we randomly split our corpus into three groups: 70% for training, 15% for validation, and 15% for testing. We used the same document groups to test token, sentence, and paragraph syntactic units.

We trained each network using the Adam solver (Kingma and Ba, 2015) by minimizing the categorical cross-entropy loss and passing in batches of 12 articles at a time over 10,000 training iterations.
Figure 3: For benchmarking, the token-based network contains the same number of layers, units, and weights as the sentence- and paragraph-based networks, but processes articles without intermediate encoding.

Consistent with the events-per-article distribution shown in Figure 1, we selected articles using stratified sampling such that each batch contained 6 articles with no protest events, 4 articles with one protest event, 1 article with two protest events, and 1 article with three or more protest events. We tried training variations with larger network sizes and did not observe substantial differences in the final benchmark metrics. In addition, we measured performance on the validation dataset every 500 steps to avoid overfitting.
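The stratified 6/4/1/1 batch composition can be sketched as below; the per-class article pools are illustrative stand-ins for the real training split:

```python
# Sketch of stratified batch sampling: each batch of 12 articles draws a
# fixed number per event-count class, matching the 6/4/1/1 mix in the text.
# The article pools here are illustrative placeholders.
import random

BATCH_MIX = {"0": 6, "1": 4, "2": 1, "3+": 1}  # class label -> draws per batch

def sample_batch(pools: dict, rng: random.Random) -> list:
    """Draw one stratified training batch from per-class article pools."""
    batch = []
    for label, k in BATCH_MIX.items():
        batch.extend(rng.sample(pools[label], k))
    rng.shuffle(batch)  # avoid ordering the batch by class
    return batch
```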
After training, we evaluated the network on the withheld documents in the test set. Table 3 shows the results for the low-dimensional counting classification network under all three syntactic input variations: the token model with no intermediate encoding, the sentence-based model, and the paragraph-based model.

All three model variants performed best at identifying articles with zero protest events, that is, identifying articles that are outside the domain of protests. All three variants performed worst at identifying articles with two protest events, most commonly classifying these articles as containing only a single event. These errors likely reflect the similarities in representation between news articles that contain only one or two protest events, as well as the diversity of representations for reporting about two protest events (examples of which include reporting about a protest and counter-protest; a prior and current protest; or a current and future protest).

Compared to each other, the token model slightly outperforms the sentence and paragraph models in identifying articles with no protest events. However, both the sentence and the paragraph models performed better at distinguishing between articles with one or more events, suggesting that the intermediate encoding of syntactic units can be helpful in mitigating long-term dependencies in the credit assignment problem for event counting.

In summary, the F1 scores show that regardless of the syntactic unit used, recasting the counting problem to a categorization problem can be a useful simplification for counting protest events in news articles. The performance of the network with sentence- and paragraph-based intermediate encoding also suggests that larger syntactic text structures are meaningful units of analysis and can incrementally improve the performance of existing LSTM networks.

5. Conclusion

Local news journalists contribute to the body of civic data by documenting community-specific events.
Table 3: Results for the counting-as-classification network under three different syntactic units of analysis: tokens, sentences, and paragraphs. Results include precision (P), recall (R), and F1 score for each model. The sentence model performed best, closely followed by the paragraph model.

Given the volume and frequency of their reporting work, extracting structured data about these events and enabling citizens, policymakers, journalists, activists, and researchers to query these data in aggregate can empower civic decision-making with fact-based, longitudinal analyses and narratives.

To help improve the resources available for extracting structured data from news stories, in this paper we introduced a manually labeled dataset of 42,347 protest events reported in local news sources in the United States between January 2017 and January 2021. The dataset contains metainformation about each protest, such as when, where, and why the protest occurred; a crowd size estimate; and a list of URLs to news articles that describe the protest. The dataset also contains negative examples of URLs to news articles whose titles contain protest stem words, but do not actually describe a protest event. We described the semi-automated data collection pipeline that we built to find, sort, and review news articles. Lastly, we benchmarked the performance of a low-dimensional counting classification network, trained and tested using the dataset, to count the number of protest events reported in a news article.

Acknowledgments

We thank Marek Šuppa, William Li, and Emily Prud'hommeaux for their encouragement and advice!
References

Andrews, K. T. and Biggs, M. (2006). The Dynamics of Protest Diffusion: Movement Organizations, Social Networks, and News Media in the 1960 Sit-Ins. American Sociological Review, 71(5):752-777.

Broder, A. (1997). On the resemblance and containment of documents. In Proceedings. Compression and Complexity of SEQUENCES 1997, pages 21-29. IEEE Computer Society.

Damiani, E., di Vimercati, S. D. C., Paraboschi, S., and Samarati, P. (2004). An Open Digest-based Technique for Spam Detection. Proceedings of the 2004 International Workshop on Security in Parallel and Distributed Systems, 1(1):559-564.

Dantzig, G., Fulkerson, R., and Johnson, S. (1954). Solution of a Large-Scale Traveling-Salesman Problem. Journal of the Operations Research Society of America, 2(4):393-410.

Graves, A. and Schmidhuber, J. (2005). Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks, 18(5-6):602-610.

Hakkani-Tür, D., Tür, G., Celikyilmaz, A., Chen, Y.-N., Gao, J., Deng, L., and Wang, Y.-Y. (2016). Multi-Domain Joint Semantic Frame Parsing Using Bi-Directional RNN-LSTM. Interspeech, pages 715-719.

Hendrickx, I., Kim, S. N., Kozareva, Z., Nakov, P., Séaghdha, D. O., Padó, S., Pennacchiotti, M., Romano, L., and Szpakowicz, S. (2009). SemEval-2010 Task 8: Multi-Way Classification of Semantic Relations Between Pairs of Nominals. Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions, pages 94-99.

Huang, R., Cases, I., Jurafsky, D., Condoravdi, C., and Riloff, E. (2016). Distinguishing Past, On-going, and Future Events: The EventStatus Corpus. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 44-54, Stroudsburg, PA, USA. Association for Computational Linguistics.

Jaccard, P. (1912). The Distribution of the Flora in the Alpine Zone. New Phytologist, 11(2):37-50.

Kingma, D. P. and Ba, J. (2015). Adam: A Method for Stochastic Optimization. In International Conference on Learning Representations.

Madestam, A., Shoag, D., Veuger, S., and Yanagizawa-Drott, D. (2013). Do Political Protests Matter? Evidence from the Tea Party Movement. The Quarterly Journal of Economics, 128(4):1633-1685.

McAdam, D. and Su, Y. (2002). The War at Home: Antiwar Protests and Congressional Voting, 1965 to 1973. American Sociological Review, 67(5):696.

Mesnil, G., Dauphin, Y., Yao, K., Bengio, Y., Deng, L., Hakkani-Tur, D., He, X., Heck, L., Tur, G., Yu, D., and Zweig, G. (2015). Using recurrent neural networks for slot filling in spoken language understanding. IEEE Transactions on Audio, Speech and Language Processing, 23(3):530-539.

Moniz, J. R. A. and Krueger, D. (2017). Nested LSTMs. Proceedings of the Ninth Asian Conference on Machine Learning, pages 1-15.

Pavlick, E., Ji, H., Pan, X., and Callison-Burch, C. (2016). The Gun Violence Database: A new task and data set for NLP. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1018-1024, Stroudsburg, PA, USA. Association for Computational Linguistics.

Pennington, J., Socher, R., and Manning, C. (2014). GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532-1543, Stroudsburg, PA, USA. Association for Computational Linguistics.

Postma, M., Ilievski, F., and Vossen, P. (2018). SemEval-2018 Task 5: Counting Events and Participants in the Long Tail. In Proceedings of The 12th International Workshop on Semantic Evaluation, pages 70-80, Stroudsburg, PA, USA. Association for Computational Linguistics.

Sharoff, S. (2005). Open-Source Corpora: Using the Net to Fish for Linguistic Data. International Journal of Corpus Linguistics, 11:435-462.

Soderland, S., Cardie, C., and Mooney, R. (1999). Learning Information Extraction Rules for Semi-Structured and Free Text. Machine Learning, 34:233-272.

Soule, S. A. and Earl, J. (2005). A movement society evaluated: Collective protest in the United States, 1960-1986.