Temporal Event Knowledge Acquisition via Identifying Narratives
Wenlin Yao and Ruihong Huang
Department of Computer Science and Engineering, Texas A&M University
{wenlinyao, huangrh}@tamu.edu

Abstract
Inspired by the double temporality characteristic of narrative texts, we propose a novel approach for acquiring rich temporal “before/after” event knowledge across sentences in narrative stories. The double temporality states that a narrative story often describes a sequence of events following the chronological order and therefore, the temporal order of events matches their textual order. We explored narratology principles and built a weakly supervised approach that identifies 287k narrative paragraphs from three large text corpora. We then extracted rich temporal event knowledge from these narrative paragraphs. Such event knowledge is shown useful to improve temporal relation classification and outperforms several recent neural network models on the narrative cloze task.
Occurrences of events, referring to changes and actions, show regularities. Specifically, certain events often co-occur, and in a particular temporal order. For example, people often go to work after graduating with a degree. Such “before/after” temporal event knowledge can be used to recognize temporal relations between events in a document even when their local contexts do not indicate any temporal relations. Temporal event knowledge is also useful to predict an event given several other events in the context. Improving event temporal relation identification and event prediction capabilities can benefit various NLP applications, including event timeline generation, text summarization and question answering. While being in high demand, temporal event
Michael Kennedy graduated with a bachelor's degree from Harvard University in 1980. He married his wife, Victoria, in 1981 and attended law school at the University of Virginia. After receiving his law degree, he briefly worked for a private law firm before joining Citizens Energy Corp. He took over management of the corporation, a non-profit firm that delivered heating fuel to the poor, from his brother Joseph in 1988. Kennedy expanded the organization goals and increased fund raising.

Beth paid the taxi driver. She jumped out of the taxi and headed towards the door of her small cottage. She reached into her purse for keys. Beth entered her cottage and got undressed. Beth quickly showered, deciding a bath would take too long. She changed into a pair of jeans, a tee shirt, and a sweater. Then, she grabbed her bag and left the cottage.
Figure 1: Two narrative examples

knowledge is lacking and difficult to obtain. Existing knowledge bases, such as Freebase (Bollacker et al., 2008) or Probase (Wu et al., 2012), often contain rich knowledge about entities, e.g., the birthplace of a person, but contain little event knowledge. Several approaches have been proposed to acquire temporal event knowledge from a text corpus, by either utilizing textual patterns (Chklovski and Pantel, 2004) or building a temporal relation identifier (Yao et al., 2017). However, most of these approaches are limited to identifying temporal relations within one sentence.

Inspired by the double temporality characteristic of narrative texts, we propose a novel approach for acquiring rich temporal “before/after” event knowledge across sentences via identifying narrative stories. The double temporality states that a narrative story often describes a sequence of events following the chronological order and therefore, the temporal order of events matches their textual order (Walsh, 2001; Riedl and Young, 2010; Grabes, 2013). Therefore, we can easily distill temporal event knowledge if we have identified a large collection of narrative texts. Consider the two narrative examples in Figure 1, where the top one is from a news article of the New York Times and the bottom one is from a novel. From the top one, we can easily extract one chronologically ordered event sequence {graduated, marry, attend, receive, work, take over, expand, increase}, with all events related to the main character Michael Kennedy.
While some parts of the event sequence are specific to this story, the event sequence contains regular event temporal relations, e.g., people often {graduate} first and then get {married}, or {take over} a role first and then {expand} a goal. Similarly, from the bottom one, we can easily extract another event sequence {pay, jump out, head, reach into, enter, undress, shower, change, grab, leave} that contains routine actions when people take a shower and change clothes.

There has been recent research on narrative identification from blogs by building a text classifier in a supervised manner (Gordon and Swanson, 2009; Ceran et al., 2012). However, narrative texts are common in other genres as well, including news articles and novel books, where little annotated data is readily available. Therefore, in order to identify narrative texts from rich sources, we develop a weakly supervised method that can quickly adapt and identify narrative texts from different genres, by heavily exploring the principles that are used to characterize narrative structures in narratology studies. It is generally agreed in narratology (Forster, 1962; Mani, 2012; Pentland, 1999; Bal, 2009) that a narrative is a discourse presenting a sequence of events arranged in their time order (the plot) and involving specific characters (the characters). First, we derive specific grammatical and entity coreference rules to identify narrative paragraphs that each contain a sequence of sentences sharing the same actantial syntax structure (i.e., NP VP, describing a character did something) (Greimas, 1971) and mentioning the same character. Then, we train a classifier using the initially identified seed narrative texts and a collection of grammatical, coreference and linguistic features that capture the two key principles and other textual devices of narratives. Next, the classifier is applied back to identify new narratives from raw texts.
The newly identified narratives are used to augment the seed narratives, and the bootstrapping learning process iterates until not enough new narratives can be found.

Then, by leveraging the double temporality characteristic of narrative paragraphs, we distill general temporal event knowledge. Specifically, we extract event pairs as well as longer event sequences consisting of strongly associated events that often appear in a particular textual order in narrative paragraphs, by calculating Causal Potential (Beamer and Girju, 2009; Hu et al., 2013) between events.

Specifically, we obtained 19k event pairs and 25k event sequences with three to five events from the 287k narrative paragraphs we identified across three genres: news articles, novel books and blogs. Our evaluation shows that both the automatically identified narrative paragraphs and the extracted event knowledge are of high quality. Furthermore, the learned temporal event knowledge is shown to yield additional performance gains when used for temporal relation identification and the narrative cloze task. The acquired event temporal knowledge and the knowledge acquisition system are publicly available at http://nlp.cs.tamu.edu/resources.html.

Several previous works have focused on acquiring temporal event knowledge from texts. VerbOcean (Chklovski and Pantel, 2004) used predefined lexico-syntactic patterns (e.g., “X and then Y”) to acquire event pairs with the temporal happens-before relation from the Web. Yao et al. (2017) simultaneously trained a temporal “before/after” relation classifier and acquired event pairs that are regularly in a temporal relation, by exploring the observation that some event pairs tend to show the same temporal relation regardless of contexts. Note that these prior works are limited to identifying temporal relations within individual sentences. In contrast, our approach is designed to acquire temporal relations across sentences in a narrative paragraph. Interestingly, only 195 (1%) out of the 19k event pairs acquired by our approach can be found in VerbOcean or among the regular event pairs learned by the previous two approaches.

Our design of the overall event knowledge acquisition also benefits from recent progress on narrative identification. Gordon and Swanson (2009) annotated a small set of paragraphs presenting stories in the ICWSM Spinn3r Blog corpus (Burton et al., 2009) and trained a classifier using bag-of-words features to identify more stories. Ceran et al. (2012) trained a narrative classifier using semantic triplet features on the CSC Islamic Extremist corpus. Our weakly supervised narrative identification method is closely related to Eisenberg and Finlayson (2017), which also explored the two key elements of narratives, the plot and the characters, in designing features with the goal of obtaining a generalizable story detector. But different from this work, our narrative identification method does not require any human annotations and can quickly adapt to new text sources.

Temporal event knowledge acquisition is related to script learning (Chambers and Jurafsky, 2008), where a script consists of a sequence of events that are often temporally ordered and represent a typical scenario. However, most of the existing approaches to script learning (Chambers and Jurafsky, 2009; Pichotta and Mooney, 2016; Granroth-Wilding and Clark, 2016) were designed to identify clusters of closely related events, not to learn the temporal order between events. For example, Chambers and Jurafsky (2008, 2009) learned event scripts by first identifying closely related events that share an argument and then recognizing their partial temporal orders with a separate temporal relation classifier trained on the small labeled dataset TimeBank (Pustejovsky et al., 2003). Using the same method to obtain training data, Jans et al. (2012), Granroth-Wilding and Clark (2016), Pichotta and Mooney (2016) and Wang et al.
(2017) applied neural networks to learn event embeddings and predict the following event in a context. Distinguished from the previous script learning works, we focus on acquiring event pairs or longer script-like event sequences with events arranged in a complete temporal order. In addition, recent works (Regneri et al., 2010; Modi et al., 2016) collected script knowledge by directly asking Amazon Mechanical Turk (AMT) workers to write down typical temporally ordered event sequences in a given scenario (e.g., shopping or cooking). Interestingly, our evaluation shows that our approach can yield temporal event knowledge that covers 48% of human-provided script knowledge.

It is generally agreed in narratology (Forster, 1962; Mani, 2012; Pentland, 1999; Bal, 2009) that a narrative presents a sequence of events arranged in their time order (the plot) and involving specific characters (the characters).
Plot. The plot consists of a sequence of closely related events. According to Bal (2009), an event in a narrative often describes a “transition from one state to another state, caused or experienced by actors”. Moreover, as Mani (2012) illustrates, a narrative is often “an account of past events in someone's life or in the development of something”. These prior studies suggest that sentences containing a plot event are likely to have the actantial syntax “NP VP” (Greimas, 1971) with the main verb in the past tense.

Character. A narrative usually describes events caused or experienced by actors. Therefore, a narrative story often has one or two main characters, called protagonists, who are involved in multiple events and tie the events together. The main character can be a person or an organization.
Other Textual Devices. A narrative may contain peripheral contents other than events and characters, including time, place, the emotional and psychological states of characters, etc., which do not advance the plot but provide essential information for the interpretation of the events (Pentland, 1999). We use rich Linguistic Inquiry and Word Count (LIWC) (Pennebaker et al., 2015) features to capture a variety of textual devices used to describe such contents.
In order to acquire rich temporal event knowledge, we first develop a weakly supervised approach that can quickly adapt to identify narrative paragraphs from various text sources.
The weakly supervised method is designed to capture key elements of narratives in each of two stages. As shown in Figure 2, in the first stage, we identify the initial batch of narrative paragraphs that satisfy strict rules and the key principles of narratives. Then, in the second stage, we train a statistical classifier using the initially identified seed narrative texts and a collection of soft features for capturing the same key principles and other textual devices of narratives. Next, the classifier is applied to identify new narratives from raw texts again. The newly identified narratives are used to augment the seed narratives, and the bootstrapping learning process iterates until not enough (specifically, fewer than 2,000) new narratives can be found. Here, in order to specialize the statistical classifier to each genre, we conduct the learning process on news, novels and blogs separately. (NP is Noun Phrase and VP is Verb Phrase.)

Figure 2: Overview of the Narrative Learning System

Guided by the prior narratology studies (Greimas, 1971; Mani, 2012) and our observations, we use context-free grammar production rules to identify sentences that describe an event in an actantial syntax structure. Specifically, we use three sets of grammar rules to specify the overall syntactic structure of a sentence. First, we require a sentence to have the basic active-voiced structure “S → NP VP” or one of the more complex sentence structures that are derived from the basic structure considering Coordinating Conjunction (CC), Adverbial Phrase (ADVP) or Prepositional Phrase (PP) attachments.
For example, in the narrative of Figure 1, the sentence “Michael Kennedy earned a bachelor's degree from Harvard University in 1980.” has the basic sentence structure “S → NP VP”, where the “NP” governs the character mention of “Michael Kennedy” and the “VP” governs the rest of the sentence and describes a plot event. In addition, considering that a narrative is usually “an account of past events in someone's life or in the development of something” (Mani, 2012; Dictionary, 2007), we require the head word of the VP to be in the past tense. Furthermore, the subject of the sentence is meant to represent a character. Therefore, we specify 12 grammar rules to require the sentence subject noun phrase to have a simple structure and have a proper noun or pronoun as its head word. (We manually identified 14 top-level sentence production rules, for example, “S → NP ADVP VP”, “S → PP , NP VP” and “S → S CC S”; the Appendix shows all the rules. Example NP rules include “NP → NNP”, “NP → NP CC NP” and “NP → DT NNP”.)

For seed narratives, we consider paragraphs containing at least four sentences, and we require 60% or more of the sentences to satisfy the sentence structure specified above. We also require a narrative paragraph to contain no more than 20% sentences that are interrogative, exclamatory or dialogue, which normally do not contain any plot events. The specific parameter settings are mainly determined based on our observations and analysis of narrative samples. The threshold of 60% for “sentences with actantial structure” was set to reflect the observation that sentences in a narrative paragraph usually (over half) have an actantial structure. A small portion (20%) of interrogative, exclamatory or dialogue sentences is allowed to reflect the observation that many paragraphs are overall narratives even though they may contain one or two such sentences, so that we achieve good coverage in narrative identification.
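As a minimal sketch, the paragraph-level seed thresholds above can be checked as follows. The per-sentence flags (whether a sentence matches an actantial production rule with a past-tense head verb and a proper-noun/pronoun subject, and whether it is interrogative, exclamatory or dialogue) are assumed to come from a parser; the names are illustrative, not the authors' actual implementation.

```python
def is_seed_narrative(sentences):
    """Check the paragraph-level seed thresholds described in the text.

    `sentences` is a list of dicts with two precomputed booleans:
      'actantial' - sentence matches an "S -> NP VP"-style rule
      'non_plot'  - sentence is interrogative, exclamatory or dialogue
    """
    n = len(sentences)
    if n < 4:                                  # at least four sentences
        return False
    actantial = sum(s['actantial'] for s in sentences)
    non_plot = sum(s['non_plot'] for s in sentences)
    # >= 60% actantial sentences, <= 20% non-plot sentences
    return actantial / n >= 0.6 and non_plot / n <= 0.2
```

A paragraph passing this filter would still need to satisfy the character rule below before being accepted as a seed narrative.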
The Character Rule. A narrative usually has a protagonist character who appears in multiple sentences and ties a sequence of events together; therefore, we also specify a rule requiring a narrative paragraph to have a protagonist character. Concretely, inspired by Eisenberg and Finlayson (2017), we applied the named entity recognizer (Finkel et al., 2005) and entity coreference resolver (Lee et al., 2013) from the CoreNLP toolkit (Manning et al., 2014) to identify the longest entity chain in a paragraph that has at least one mention recognized as a Person or Organization, or a gendered pronoun. Then we calculate the normalized length of this entity chain by dividing the number of entity mentions by the number of sentences in the paragraph. We require the normalized length of this longest entity chain to be ≥ 0.4, meaning that 40% or more of the sentences in a narrative mention a character.

Using the seed narrative paragraphs identified in the first stage as positive instances, we train a statistical classifier to continue to identify more narrative paragraphs that may not satisfy the specific rules. We also prepare negative instances to compete with the positive narrative paragraphs in training. Negative instances are paragraphs that are not likely to be narratives and do not present a plot or protagonist character, but are similar to seed narratives in other aspects. Specifically, similar to seed narratives, we require a non-narrative paragraph to contain at least four sentences with no more than 20% of the sentences being interrogative, exclamatory or dialogue; but in contrast to seed narratives, a non-narrative paragraph should contain 30% or fewer sentences that have the actantial sentence structure, and its longest character entity chain should not span over 20% of the sentences. We randomly sample non-narrative paragraphs so that they are five times as many as the narrative paragraphs.

In addition, since it is infeasible to apply the trained classifier to all the paragraphs in a large text corpus, such as the Gigaword corpus (Graff and Cieri, 2003), we identify candidate narrative paragraphs and only apply the statistical classifier to these candidates. Specifically, we require a candidate paragraph to satisfy all the constraints used for identifying seed narrative paragraphs, but to contain only 30% or more sentences with an actantial structure and to have the longest character entity chain spanning over 20% or more of the sentences. We choose Maximum Entropy (Berger et al., 1996) as the classifier.
Specifically, we use the MaxEnt model implementation in the LIBLINEAR library (Fan et al., 2008) with default parameter settings. (Notes on thresholds: 40% was chosen to reflect that a narrative paragraph often contains a main character that is commonly mentioned across sentences, i.e., in half or a bit less than half of all the sentences. We used the skewed pos:neg ratio of 1:5 in all bootstrapping iterations to reflect the observation that there are generally many more non-narrative paragraphs than narrative paragraphs in a document. The 30% and 20% values for candidate paragraphs are each half of the corresponding thresholds used for identifying seed narrative paragraphs.) Next, we describe the features used to capture the key elements of narratives.

Features for Identifying Plot Events:
Realizing that grammar production rules are effective in identifying sentences that contain a plot event, we encode all the production rules as features in the statistical classifier. Specifically, for each narrative paragraph, we use the frequency of all syntactic production rules as features. Note that the bottom-level syntactic production rules have the form POS tag → WORD and contain a lexical word, which makes these rules dependent on the specific contexts of a paragraph. Therefore, we exclude these bottom-level production rules from the feature set in order to model generalizable narrative elements rather than the specific contents of a paragraph.

In addition, to capture potential event sequence overlaps between new narratives and the already learned narratives, we build a verb bigram language model using verb sequences extracted from the learned narrative paragraphs and calculate the perplexity score (as a feature) of the verb sequence in a candidate narrative paragraph. Specifically, we calculate the perplexity score of an event sequence, normalized by the number of events:

PP(e_1, \ldots, e_N) = \sqrt[N]{\prod_{i=1}^{N} \frac{1}{P(e_i \mid e_{i-1})}},

where N is the total number of events in a sequence and e_i is an event word. We approximate P(e_i \mid e_{i-1}) = \frac{C(e_{i-1}, e_i)}{C(e_{i-1})}, where C(e_{i-1}) is the number of occurrences of e_{i-1} and C(e_{i-1}, e_i) is the number of co-occurrences of e_{i-1} and e_i. C(e_{i-1}, e_i) and C(e_{i-1}) are calculated based on all event sequences from known narrative paragraphs.

Features for the Protagonist Characters:
We consider the longest three coreferent entity chains in a paragraph that have at least one mention recognized as a Person or Organization, or a gendered pronoun. Similar to the seed narrative identification stage, we obtain the normalized length of each entity chain by dividing the number of entity mentions by the number of sentences in the paragraph. In addition, we also observe that a protagonist character often appears frequently in the surrounding paragraphs as well; therefore, we calculate the normalized length of each entity chain based on its presence in the target paragraph as well as one preceding paragraph and one following paragraph. We use 6 normalized lengths (3 from the target paragraph and 3 from the surrounding paragraphs) as features.

Table 1: Number of new narratives generated after each bootstrapping iteration

           (Seeds)     1      2     3    4   Total
  News       20k      40k    12k    5k   1k    78k
  Novels     75k      82k    24k    6k   2k   189k
  Blogs       6k      10k     3k    1k    -    20k
  Sum       101k     132k    39k   12k   3k   287k

Other Writing Style Features:
We create a feature for each semantic category in the Linguistic Inquiry and Word Count (LIWC) dictionary (Pennebaker et al., 2015), and the feature value is the total number of occurrences of all words in that category. These LIWC features capture the presence of certain types of words, such as words denoting relativity (e.g., motion, time, space) and words referring to psychological processes (e.g., emotion and cognition). In addition, we encode Part-of-Speech (POS) tag frequencies as features as well, which have been shown effective in identifying text genres and writing styles.
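The verb-bigram perplexity feature described above can be sketched as follows. The bigram and unigram counts are assumed to be collected from event sequences in the already learned narratives, and the smoothing floor for unseen bigrams is an illustrative choice of this sketch, not the paper's.

```python
import math

def verb_bigram_perplexity(verbs, bigram_counts, unigram_counts, floor=0.5):
    """Perplexity of a candidate paragraph's verb sequence under a bigram
    model: PP = (prod_i 1/P(e_i | e_{i-1}))^(1/N), with
    P(e_i | e_{i-1}) ~= C(e_{i-1}, e_i) / C(e_{i-1})."""
    n = len(verbs)
    log_pp = 0.0
    for prev, cur in zip(verbs, verbs[1:]):
        # floor unseen bigram counts so the log stays finite (illustrative smoothing)
        p = bigram_counts.get((prev, cur), floor) / max(unigram_counts.get(prev, 1), 1)
        log_pp -= math.log(p)
    return math.exp(log_pp / n)
```

A low perplexity indicates that the candidate's verb sequence resembles event sequences already seen in learned narratives.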
Our weakly supervised system is based on principles shared across all narratives, so it can be applied to different text sources for identifying narratives. We considered three types of texts. (1) News Articles: news articles contain narrative paragraphs that describe the background of an important figure or provide details of a significant event. We use English Gigaword 5th edition (Graff and Cieri, 2003; Napoles et al., 2012), which contains 10 million news articles. (2) Novel Books: novels contain rich narratives describing actions by characters. BookCorpus (Zhu et al., 2015) is a large collection of free novel books written by unpublished authors, which contains 11,038 books of 16 different sub-genres (e.g., Romance, Historical, Adventure, etc.). (3) Blogs: vast publicly accessible blogs also contain narratives, because “personal life and experiences” is a primary topic of blog posts (Lenhart, 2006). We use the Blog Authorship Corpus (Schler et al., 2006), collected from the blogger.com website, which consists of 680k posts written by thousands of authors. (The 6 normalized lengths are, specifically, those of the longest, second longest and third longest entity chains.) We applied the Stanford CoreNLP tools (Manning et al., 2014) to the three text corpora to obtain POS tags, parse trees, named entities, coreference chains, etc.

In order to combat semantic drift (McIntosh and Curran, 2009) in bootstrapping learning, we set the initial selection confidence score produced by the statistical classifier at 0.5 and increase it by 0.05 after each iteration. The bootstrapping system runs for four iterations and learns 287k narrative paragraphs in total. Table 1 shows the number of narratives that were obtained in the seeding stage and in each bootstrapping iteration from each text corpus.
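The bootstrapping schedule described above can be sketched as below. The `train` and `classify` callables stand in for the MaxEnt training and scoring steps and are assumptions of this sketch, not the released system.

```python
def bootstrap(seeds, candidates, train, classify, min_new=2000):
    """Bootstrapping schedule from the text: the selection confidence
    threshold starts at 0.5 and rises by 0.05 each iteration; learning
    stops when fewer than `min_new` new narratives are found."""
    narratives, threshold = list(seeds), 0.5
    while True:
        model = train(narratives, candidates)
        new = [p for p in candidates
               if p not in narratives and classify(model, p) >= threshold]
        if len(new) < min_new:
            break
        narratives.extend(new)
        threshold += 0.05       # tighten selection to combat semantic drift
    return narratives
```

Raising the threshold each round is what counters semantic drift: later iterations only admit paragraphs the classifier is increasingly confident about.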
Narratives obtained from the first phase may describe specific stories and contain uncommon events or event transitions. Therefore, we apply Pointwise Mutual Information (PMI) based statistical metrics to measure the strengths of event temporal relations, in order to identify general knowledge that is not specific to any particular story. Our goal is to learn event pairs and longer event chains with events completely ordered in the temporal “before/after” relation.

First, by leveraging the double temporality characteristic of narratives, we only consider event pairs and longer event chains with 3-5 events that have occurred as a segment in at least one event sequence extracted from a narrative paragraph. Specifically, we extract the event sequence (the plot) from a narrative paragraph by finding the main event in each sentence and chaining the main events according to their textual order. (We only consider main events that are in base verb forms or in the past tense, by requiring their POS tags to be VB, VBP, VBZ or VBD.)

Then we rank candidate event pairs based on two factors: how strongly associated two events are, and how commonly they appear in a particular temporal order. We adopt an existing metric, Causal Potential (CP), which has been applied to acquire causally related events (Beamer and Girju, 2009) and exactly measures these two aspects. Specifically, the CP score of an event pair is calculated using the following equation:

cp(e_i, e_j) = pmi(e_i, e_j) + \log \frac{P(e_i \rightarrow e_j)}{P(e_j \rightarrow e_i)}   (1)

where the first part refers to the Pointwise Mutual Information (PMI) between the two events and the second part measures the relative ordering of the two events. P(e_i \rightarrow e_j) refers to the probability that e_i occurs before e_j in a text, which is proportional to the raw frequency of the pair.
PMI measures the association strength of two events; formally, pmi(e_i, e_j) = \log \frac{P(e_i, e_j)}{P(e_i)P(e_j)}, with P(e_i) = \frac{C(e_i)}{\sum_x C(e_x)} and P(e_i, e_j) = \frac{C(e_i, e_j)}{\sum_x \sum_y C(e_x, e_y)}, where x and y range over all the events in the corpus, C(e_i) is the number of occurrences of e_i, and C(e_i, e_j) is the number of co-occurrences of e_i and e_j.

While each candidate pair of events should have appeared consecutively as a segment in at least one narrative paragraph, when calculating the CP score we consider event co-occurrences even when two events are not consecutive in a narrative paragraph but have one or two other events in between. Specifically, the same as in Hu and Walker (2017), we calculate separate CP scores based on event co-occurrences with zero (consecutive), one or two events in between, and use the weighted average CP score for ranking an event pair; formally, CP(e_i, e_j) = \sum_{d=1}^{3} \frac{cp_d(e_i, e_j)}{d}.

Then we rank longer event sequences based on the CP scores of the individual event pairs included in an event sequence. However, an event sequence of length n is more than n-1 event pairs with any two consecutive events as a pair. We prefer event sequences that are coherent overall, where events that are one or two events away are highly related as well. Therefore, we define the following metric to measure the quality of an event sequence:

CP(e_1, e_2, \cdots, e_n) = \frac{\sum_{d=1}^{3} \sum_{j=1}^{n-d} \frac{CP(e_j, e_{j+d})}{d}}{n-1}   (2)

From all the learned narrative paragraphs, we randomly selected 150 texts, with 25 texts selected from narratives learned in each of the two stages (i.e., seed narratives and bootstrapped narratives) using each of the three text corpora (i.e., news, novels, and blogs).
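The CP ranking described above can be sketched as follows (Eq. 1 for an ordered pair, Eq. 2 for a sequence). The event and pair counts are assumed to have already been collected from the narrative corpus; all names are illustrative.

```python
import math

def causal_potential(pair_count, total_pairs, count_i, count_j, total_events,
                     n_i_before_j, n_j_before_i):
    """Eq. (1): PMI of the two events plus the log-ratio of how often
    they occur in each textual order."""
    pmi = math.log((pair_count / total_pairs) /
                   ((count_i / total_events) * (count_j / total_events)))
    return pmi + math.log(n_i_before_j / n_j_before_i)

def chain_cp(events, pair_cp):
    """Eq. (2): pairwise CPs at distances 1-3, each discounted by the
    distance, averaged over n - 1 positions. `pair_cp(a, b)` returns the
    (already distance-weighted) CP score of an ordered pair."""
    n = len(events)
    total = sum(pair_cp(events[j], events[j + d]) / d
                for d in (1, 2, 3) for j in range(max(n - d, 0)))
    return total / (n - 1)
```

Discounting by the distance d keeps consecutive pairs dominant while still rewarding sequences whose non-adjacent events are also strongly associated.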
Following the definition “a story is a narrative of events arranged in their time sequence” (Forster, 1962; Gordon and Swanson, 2009), two human adjudicators were asked to judge whether each text is a narrative or a non-narrative. In order to obtain high inter-agreement, before the official annotations we trained the two annotators for several iterations. Note that the texts we used in training the annotators are different from the final texts we used for evaluation purposes. The overall kappa inter-agreement between the two annotators is 0.77.

Table 2: Precision of narratives based on human annotation

  Narratives    Seed   Bootstrapped
  News          0.84       0.72
  Novels        0.88       0.92
  Blogs         0.92       0.88
  AVG           0.88       0.84

Table 2 shows the precision of the narratives learned in the two stages using the three corpora. We determined that a text is a correct narrative if both annotators labeled it as a narrative. We can see that on average, the rule-based classifier achieves a precision of 88% on initializing seed narratives and the statistical classifier achieves a precision of 84% on bootstrapping new ones. Using narratology-based features enables the statistical classifier to extensively learn new narratives while maintaining a high precision.

Table 3: Examples of event pairs and chains (with CP scores); → represents the before relation

  pairs:  graduate → teach (5.7), meet → marry (5.3),
          pick up → carry (6.3), park → get out (7.3),
          turn around → face (6.5), dial → ring (6.3)
  chains: drive → park → get out (7.8)
          toss → fly → land (5.9)
          grow up → attend → graduate → marry (6.9)
          contact → call → invite → accept (4.2)
          knock → open → reach → pull out → hold (6.0)

To evaluate the quality of the extracted event pairs and chains, we randomly sampled 20 pairs (2%) from every 1,000 event pairs up to the top 18,929 pairs with CP score above a cut-off, and sampled 250 chains in total from the top-ranked event chains. The average CP scores of all the event pairs and all the event chains we considered are 2.9 and 5.1 respectively.
Two human adjudicators were asked to judge whether or not the events are likely to occur in the temporal order shown. For event chains, we have one additional criterion requiring that the events form a coherent sequence overall. An event pair/chain is deemed correct if both annotators labeled it as correct. The two annotators achieved kappa inter-agreement scores of 0.71 and 0.66 on annotating event pairs and event chains respectively. (It turns out that many event chains have a high CP score close to 5.0, so we decided not to use a cut-off CP score for event chains but simply chose to evaluate the top 25,000 event chains.)

Figure 3: Top-ranked event pairs evaluation

Table 4: Precision of top-ranked event chains

As we know, the coverage of acquired knowledge is often hard to evaluate because we do not have a complete knowledge base to compare against. Thus, we propose a pseudo recall metric to evaluate the coverage of the event knowledge we acquired. Regneri et al. (2010) collected Event Sequence Descriptions (ESDs) of several types of human activities (e.g., baking a cake, going to the theater, etc.) using crowdsourcing. Our first pseudo recall score is calculated based on how many consecutive event pairs in human-written scripts can be found among our top-ranked event pairs. Figure 3 illustrates the precision of the top-ranked pairs based on human annotation and the pseudo recall score based on ESDs. We can see that about 75% of the top 19k event pairs are correct, and they capture 48% of the human-written script knowledge in ESDs. In addition, Table 4 shows the precision of top-ranked event chains with 3 to 5 events: among the top 25k event chains, about 70% are correctly ordered with the temporal “after” relation. Table 3 shows several examples of event pairs and chains.
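The pseudo recall metric can be sketched as below. Here an ESD script is represented simply as an ordered list of event words, an assumption of this sketch rather than the original data format.

```python
def pseudo_recall(scripts, learned_pairs):
    """Share of consecutive event pairs in human-written ESD scripts
    that also appear among our top-ranked "before/after" pairs."""
    gold = {(a, b) for s in scripts for a, b in zip(s, s[1:])}
    if not gold:
        return 0.0
    return len(gold & set(learned_pairs)) / len(gold)
```

This is a pseudo recall because the ESDs cover only a handful of everyday scenarios, not the full space of temporal event knowledge.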
To find out whether the learned temporal eventknowledge can help with improving temporal re- Models Acc.(%)Choubey and Huang (2017) 51.2+ CP score
Table 5: Results on TimeBank corpusMethod Acc.(%)(Chambers and Jurafsky, 2008) 30.92(Granroth-Wilding and Clark, 2016) 43.28(Pichotta and Mooney, 2016) 43.17(Wang et al., 2017) 46.67Our Results
Table 6: Results on MCNC tasklation classification performance, we conductedexperiments on a benchmark dataset - TimeBankcorpus v1.2, which contains 2308 event pairs thatare annotated with 14 temporal relations .To facilitate direct comparisons, we used thesame state-of-the-art temporal relation classifica-tion system as described in our previous workChoubey and Huang (2017) and considered allthe 14 relations in classification. Choubey andHuang (2017) forms three sequences (i.e., wordforms, POS tags, and dependency relations) ofcontext words that align with the dependency pathbetween two event mentions and uses three bi-directional LSTMs to get the embedding of eachsequence. The final fully connected layer mapsthe concatenated embeddings of all sequences to14 fine-grained temporal relations. We applied thesame model here, but if an event pair appears inour learned list of event pairs, we concatenated theCP score of the event pair as additional evidence inthe final layer. To be consistent with Choubey andHuang (2017), we used the same train/test split-ting, the same parameters for the neural networkand only considered intra-sentence event pairs.Table 5 shows that by incorporating our learnedevent knowledge, the overall prediction accuracywas improved by 1.1%. Not surprisingly, out ofthe 14 temporal relations, the performance on therelation before was improved the most by 4.9%. Multiple Choice version of the Narrative Clozetask (MCNC) proposed by Granroth-Wilding andClark (2016); Wang et al. (2017), aims to eval- Specifically, the 14 relations are simultaneous, before,after, ibefore, iafter, begins, begun by, ends, ended by, in-cludes, is included, during, during inv, identity ate understanding of a script by predicting thenext event given several context events. 
Given a chain of contextual events e_1, e_2, ..., e_{n-1}, the task is to select the next event from five event candidates, one of which is correct and the others are randomly sampled elsewhere in the corpus. Following the same settings as Wang et al. (2017) and Granroth-Wilding and Clark (2016), we adapted the dataset (test set) of Chambers and Jurafsky (2008) to the multiple choice setting. The dataset contains 69 documents and 349 multiple choice questions.

We calculated a PMI score between a candidate event and each context event e_1, e_2, ..., e_{n-1} based on event sequences extracted from our learned 287k narratives, and we chose the candidate event that has the highest sum of all individual PMI scores. Since the prediction accuracy on the 349 multiple choice questions depends on the random sampling of the four negative candidate events, we ran the experiment 10 times and took the average accuracy as the final performance.

Table 6 shows the comparisons of our results with the performance of several previous models, which were all trained with 1,500k event chains extracted from the NYT portion of the Gigaword corpus (Graff and Cieri, 2003). Each event chain consists of a sequence of verbs sharing an actor within a news article. Except for Chambers and Jurafsky (2008), the more recent models utilized increasingly sophisticated neural language models. Granroth-Wilding and Clark (2016) proposed a two-layer neural network model that learns embeddings of event predicates and their arguments for predicting the next event. Pichotta and Mooney (2016) introduced an LSTM-based language model for event prediction. Wang et al. (2017) used dynamic memory as attention in an LSTM for prediction. It is encouraging that by using event knowledge extracted from automatically identified narratives, we achieved the best event prediction performance, which is 2.2% higher than the best neural network model.
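The candidate-ranking step described above can be sketched as follows. This is a minimal illustration of summed-PMI scoring, not the paper's exact estimator; the count normalization and the handling of unseen pairs are simplifying assumptions.

```python
import math
from collections import Counter
from itertools import combinations

def build_pmi(chains):
    """Estimate a PMI function over event pairs from event chains,
    where each chain is a list of event words (e.g., verbs)."""
    uni, pair = Counter(), Counter()
    for chain in chains:
        uni.update(chain)
        # count each unordered co-occurring pair within a chain
        for a, b in combinations(chain, 2):
            pair[frozenset((a, b))] += 1
    n_uni = sum(uni.values())
    n_pair = sum(pair.values())

    def pmi(a, b):
        c = pair[frozenset((a, b))]
        if c == 0:
            return float("-inf")  # never co-occurred in the corpus
        p_ab = c / n_pair
        p_a, p_b = uni[a] / n_uni, uni[b] / n_uni
        return math.log(p_ab / (p_a * p_b))

    return pmi

def predict_next(context, candidates, pmi):
    """Choose the candidate whose summed PMI with all context events is highest."""
    return max(candidates, key=lambda c: sum(pmi(e, c) for e in context))
```

For example, with chains learned from narratives such as `["graduate", "marry", "work"]`, a context event `graduate` would select a candidate like `work` over an unrelated candidate that never co-occurs with it.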
This paper presents a novel approach for leveraging the double temporality characteristic of narrative texts and acquiring temporal event knowledge across sentences in narrative paragraphs. We developed a weakly supervised system that explores narratology principles and identifies narrative texts from three text corpora of distinct genres. The temporal event knowledge distilled from narrative texts was shown useful for improving temporal relation classification and outperforms several neural language models on the narrative cloze task. For future work, we plan to expand event temporal knowledge acquisition by dealing with event sense disambiguation and event synonym identification (e.g., drag, pull and haul).
Acknowledgments
We thank our anonymous reviewers for providing insightful review comments.
References
Mieke Bal. 2009. Narratology: Introduction to the theory of narrative. University of Toronto Press.
Brandon Beamer and Roxana Girju. 2009. Using a bigram event model to predict causal potential. In CICLing. Springer, pages 430–441.
Adam L Berger, Vincent J Della Pietra, and Stephen A Della Pietra. 1996. A maximum entropy approach to natural language processing. Computational Linguistics.
Kurt Bollacker, Colby Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. ACM, pages 1247–1250.
Kevin Burton, Akshay Java, and Ian Soboroff. 2009. The ICWSM 2009 Spinn3r dataset. In Third Annual Conference on Weblogs and Social Media (ICWSM 2009). AAAI.
Betul Ceran, Ravi Karad, Steven Corman, and Hasan Davulcu. 2012. A hybrid model and memory based story classifier. In Proceedings of the 3rd Workshop on Computational Models of Narrative. pages 58–62.
Nathanael Chambers and Dan Jurafsky. 2009. Unsupervised learning of narrative schemas and their participants. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2. Association for Computational Linguistics, pages 602–610.
Nathanael Chambers and Daniel Jurafsky. 2008. Unsupervised learning of narrative event chains. In ACL. volume 94305, pages 789–797.
Timothy Chklovski and Patrick Pantel. 2004. VerbOcean: Mining the web for fine-grained semantic verb relations. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing.
Prafulla Kumar Choubey and Ruihong Huang. 2017. A sequential model for classifying temporal relations between intra-sentence events. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. pages 1796–1802.
Oxford English Dictionary. 2007. Oxford English Dictionary online.
Joshua Eisenberg and Mark Finlayson. 2017. A simpler and more generalizable story detector using verb and character features. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. pages 2698–2705.
Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. 2008. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research.
Jenny Rose Finkel, Trond Grenager, and Christopher Manning. 2005. Incorporating non-local information into information extraction systems by Gibbs sampling. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics, pages 363–370.
Edward Morgan Forster. 1962. Aspects of the novel. 1927. Ed. Oliver Stallybrass.
Andrew Gordon and Reid Swanson. 2009. Identifying personal stories in millions of weblog entries. In Third International Conference on Weblogs and Social Media, Data Challenge Workshop, San Jose, CA. volume 46.
Hebert Grabes. 2013. Sequentiality. Handbook of Narratology.
David Graff and Christopher Cieri. 2003. English Gigaword. Linguistic Data Consortium.
Mark Granroth-Wilding and Stephen Clark. 2016. What happens next? Event prediction using a compositional neural network model. In AAAI. pages 2727–2733.
Algirdas Julien Greimas. 1971. Narrative grammar: Units and levels. MLN.
In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. pages 369–379.
Zhichao Hu and Marilyn Walker. 2017. Inferring narrative causality between event pairs in films. In Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue. pages 342–351.
Bram Jans, Steven Bethard, Ivan Vulić, and Marie Francine Moens. 2012. Skip n-grams and ranking functions for predicting script events. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, pages 336–344.
Heeyoung Lee, Angel Chang, Yves Peirsman, Nathanael Chambers, Mihai Surdeanu, and Dan Jurafsky. 2013. Deterministic coreference resolution based on entity-centric, precision-ranked rules. Computational Linguistics.
Amanda Lenhart and Susannah Fox. 2006. Bloggers: A portrait of the internet's new storytellers. Pew Internet & American Life Project.
Inderjeet Mani. 2012. Computational modeling of narrative. Synthesis Lectures on Human Language Technologies.
Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In ACL (System Demonstrations). pages 55–60.
Tara McIntosh and James R Curran. 2009. Reducing semantic drift with bagging and distributional similarity. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1. Association for Computational Linguistics, pages 396–404.
Ashutosh Modi, Tatjana Anikina, Simon Ostermann, and Manfred Pinkal. 2016. InScript: Narrative texts annotated with script information. In LREC. pages 3485–3493.
Courtney Napoles, Matthew Gormley, and Benjamin Van Durme. 2012. Annotated Gigaword. In Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction. Association for Computational Linguistics, pages 95–100.
James W Pennebaker, Ryan L Boyd, Kayla Jordan, and Kate Blackburn. 2015. The development and psychometric properties of LIWC2015. Technical report.
Brian T Pentland. 1999. Building process theory with narrative: From description to explanation. Academy of Management Review.
Karl Pichotta and Raymond J. Mooney. 2016. Learning statistical scripts with LSTM recurrent neural networks. In AAAI. pages 2800–2806.
James Pustejovsky, Patrick Hanks, Roser Sauri, Andrew See, Robert Gaizauskas, Andrea Setzer, Dragomir Radev, Beth Sundheim, David Day, Lisa Ferro, et al. 2003. The TimeBank corpus. In Corpus Linguistics. Lancaster, UK., volume 2003, page 40.
Michaela Regneri, Alexander Koller, and Manfred Pinkal. 2010. Learning script knowledge with web experiments. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, pages 979–988.
Mark O Riedl and Robert Michael Young. 2010. Narrative planning: Balancing plot and character. Journal of Artificial Intelligence Research.
Jonathan Schler, Moshe Koppel, Shlomo Argamon, and James W. Pennebaker. 2006. Effects of age and gender on blogging. In AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs. volume 6, pages 199–205.
Richard Walsh. 2001. Fabula and fictionality in narrative theory. Style.
Zhongqing Wang, Yue Zhang, and Ching-Yun Chang. 2017. Integrating order information and event relation for script event prediction. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. pages 57–67.
Wentao Wu, Hongsong Li, Haixun Wang, and Kenny Q Zhu. 2012. Probase: A probabilistic taxonomy for text understanding. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. ACM, pages 481–492.
Wenlin Yao, Saipravallika Nettyam, and Ruihong Huang. 2017. A weakly supervised approach to train temporal relation classifiers and acquire regular event pairs simultaneously. In Proceedings of the 2017 Conference on Recent Advances in Natural Language Processing. pages 803–812.
Yukun Zhu, Ryan Kiros, Rich Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. 2015. Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. In Proceedings of the IEEE International Conference on Computer Vision. pages 19–27.
A Appendix
Here is the full list of grammar rules for identifying plot events in the seeding stage (Section 4.2).

Sentence rules (14):
S → S CC S
S → S PRN CC S
S → NP VP
S → NP ADVP VP
S → NP VP ADVP
S → CC NP VP
S → PP NP VP
S → NP PP VP
S → PP NP ADVP VP
S → ADVP S NP VP
S → ADVP NP VP
S → SBAR NP VP
S → SBAR ADVP NP VP
S → CC ADVP NP VP

Noun Phrase rules (12):
NP → PRP
NP → NNP
NP → NNS
NP → NNP NNP
NP → NNP CC NNP
NP → NP CC NP
NP → DT NN
NP → DT NNS
NP → DT NNP
NP → DT NNPS
NP → NP NNP
NP →
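In practice, rules like these can be applied as a simple membership test over the top-level constituent labels produced by a constituency parser. The sketch below illustrates the idea; the rule inventory is abbreviated and the parser integration (how the label sequences are obtained) is an assumption, not part of the paper.

```python
# Abbreviated subsets of the seed grammar rules above, encoded as label tuples.
SENTENCE_RULES = {
    ("NP", "VP"), ("NP", "ADVP", "VP"), ("NP", "VP", "ADVP"),
    ("CC", "NP", "VP"), ("PP", "NP", "VP"), ("NP", "PP", "VP"),
    ("SBAR", "NP", "VP"),
}
NP_RULES = {
    ("PRP",), ("NNP",), ("NNS",), ("NNP", "NNP"),
    ("DT", "NN"), ("DT", "NNS"),
}

def is_plot_sentence(children):
    """children: sequence of top-level constituent labels directly under S,
    e.g. taken from a constituency parse tree."""
    return tuple(children) in SENTENCE_RULES

def has_seed_np(np_children):
    """Check whether an NP expands by one of the seed NP rules."""
    return tuple(np_children) in NP_RULES
```

For instance, a parse whose top level is `S → NP VP` with the subject NP expanding as `NP → PRP` ("She jumped out of the taxi") would pass both tests.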