Implicit Argument Prediction with Event Knowledge
Pengxiang Cheng
Department of Computer Science, The University of Texas at Austin, [email protected]
Katrin Erk
Department of Linguistics, The University of Texas at Austin, [email protected]
Abstract
Implicit arguments are not syntactically connected to their predicates, and are therefore hard to extract. Previous work has used models with large numbers of features, evaluated on very small datasets. We propose to train models for implicit argument prediction on a simple cloze task, for which data can be generated automatically at scale. This allows us to use a neural model, which draws on narrative coherence and entity salience for predictions. We show that our model has superior performance on both synthetic and natural data.

Introduction

When parts of an event description in a text are missing, this event cannot be easily extracted, and it cannot easily be found as the answer to a question. This is the case with implicit arguments, as in this example from the reading comprehension dataset of Hermann et al. (2015):
Text:
More than 2,600 people have been infected by Ebola in Liberia, Guinea, Sierra Leone and Nigeria since the outbreak began in December, according to the World Health Organization. Nearly 1,500 have died.

Question:
The X outbreak has killed nearly 1,500.

In this example, it is Ebola that broke out, and Ebola was also the cause of nearly 1,500 people dying, but the text does not state this explicitly. Ebola is an implicit argument of both outbreak and die, which is crucial to answering the question. We are particularly interested in implicit arguments that, like
Ebola in this case, do appear in the text, but not as syntactic arguments of their predicates. (Our code is available at https://github.com/pxch/event_imp_arg.) Event knowledge is key to determining implicit arguments. In our example, diseases are maybe the single most typical things to break out, and diseases also typically kill people.

The task of identifying implicit arguments was first addressed by Gerber and Chai (2010) and Ruppenhofer et al. (2010). However, the datasets for the task were very small, and to our knowledge there has been very little further development on the task since then.

In this paper, we address the data issue by training models for implicit argument prediction on a simple cloze task, similar to the narrative cloze task (Chambers and Jurafsky, 2008), for which data can be generated automatically at scale. This allows us to train a neural network to perform the task, building on two insights. First, event knowledge is crucial for implicit argument detection. Therefore we build on models for narrative event prediction (Granroth-Wilding and Clark, 2016; Pichotta and Mooney, 2016a), using them to judge how coherent the narrative would be when we fill in a particular entity as the missing (implicit) argument. Second, the omitted arguments tend to be salient, as Ebola is in the text from which the above example is taken. So in addition to narrative coherence, our model takes into account entity salience (Dunietz and Gillick, 2014).

In an evaluation on a large automatically generated dataset, our model clearly outperforms even strong baselines, and we find salience features to be important to the success of the model. We also evaluate against a variant of the Gerber and Chai (2012) model that does not rely on gold features, finding that our simple neural model outperforms their much more complex model.

Our paper thus makes two major contributions. 1) We propose an argument cloze task to generate synthetic training data at scale for implicit argument prediction. 2) We show that neural event models for narrative schema prediction can be used on implicit argument prediction, and that a straightforward combination of event knowledge and entity salience can do well on the task.

Related Work

While dependency parsing and semantic role labeling only deal with arguments that are available in the syntactic context of the predicate, implicit argument labeling seeks to find arguments that are not syntactically connected to their predicates, like
Ebola in our introductory example.

The most relevant work on implicit argument prediction came from Gerber and Chai (2010), who built an implicit arguments dataset by selecting 10 nominal predicates from NomBank (Meyers et al., 2004) and manually annotating implicit arguments for all occurrences of these predicates. In an analysis of their data they found implicit arguments to be very frequent, as their annotation added 65% more arguments to NomBank. Gerber and Chai (2012) also trained a linear classifier for the task relying on many hand-crafted features, including gold features from FrameNet (Baker et al., 1998), PropBank (Palmer et al., 2005) and NomBank. This classifier has, to the best of our knowledge, not been outperformed by follow-up work (Laparra and Rigau, 2013; Schenk and Chiarcos, 2016; Do et al., 2017). We evaluate on the Gerber and Chai dataset below. Ruppenhofer et al. (2010) also introduced an implicit argument dataset, but we do not evaluate on it as it is even smaller and much more complex than Gerber and Chai (2010). More recently, Modi et al. (2017) introduced the referent cloze task, in which they predicted a manually removed discourse referent from a human-annotated narrative text. This task is closely related to our argument cloze task.

Since we intend to exploit event knowledge in predicting implicit arguments, we now turn to recent work on statistical script learning, started by Chambers and Jurafsky (2008, 2009). They introduced the idea of using statistical information on coreference chains to induce prototypical sequences of narrative events and participants, which is related to the classical notion of a script (Schank and Abelson, 1977). They also proposed the narrative cloze evaluation, in which one event is removed at random from a sequence of narrative events, and the missing event is then predicted given all context events. We use a similar trick to define a cloze task for implicit argument prediction, discussed in Section 3.

Many follow-up papers on script learning have used neural networks. Rudinger et al. (2015) showed that sequences of events can be efficiently modeled by a log-bilinear language model. Pichotta and Mooney (2016a,b) used an LSTM to model a sequence of events. Granroth-Wilding and Clark (2016) built a network that produces an event representation by composing its components. To do the cloze task, they select the most probable event based on pairwise event coherence scores. For our task we want to do something similar: we want to predict how coherent a narrative would be with a particular entity candidate filling the implicit argument position. So we take the model of Granroth-Wilding and Clark (2016) as our starting point.

The Hermann et al. (2015) reading comprehension task, like our cloze task, requires systems to guess a removed entity. However, in their case the entity is removed in a summary, not in the main text. In their case, the task typically amounts to finding a main-text passage that paraphrases the sentence with the removed entity; this is not the case in our cloze task.
Argument Cloze Task

We present the argument cloze task, which allows us to automatically generate large-scale data for training (Section 6.1) and evaluation (Section 5.1). In this task, we randomly remove an entity from an argument position of one event in the text. The entity in question needs to appear in at least one other place in the text. The task is then for the model to pick, from all entities appearing in the text, the one that has been removed. We first define what we mean by an event, then what we mean by an entity.

Like Pichotta and Mooney (2016a) and Granroth-Wilding and Clark (2016), we define an event e as consisting of a verbal predicate v, a subject s, a direct object o, and a prepositional object p (along with the preposition). Here we only allow one prepositional argument in the structure, to avoid variable-length input in the event composition model. (In case of multiple prepositional objects, we select the one that is closest to the predicate.) By an entity, we mean a coreference chain with a length of at least two; that is, the entity needs to appear at least twice in the text.

For example, from a piece of raw text (Figure 1a), we automatically extract a sequence of events from a dependency parse, and a list of entities from coreference chains. In Figure 1b, e1 ~ e5 are events and x1 ~ x3 are entities. The arguments electricity-dobj and energy-dobj are not in coreference chains and are thus not candidates for removal. An example of the argument cloze task is shown in Figure 1c. Here the prep_to argument of e2 has been removed.

Figure 1: Example of automatically extracted events and entities and an argument cloze task.
(a) A piece of raw text from the OntoNotes corpus: "Manville Corp. said it will build a $24 million power plant to provide electricity to its Igaras pulp and paper mill in Brazil. The company said the plant will ensure that it has adequate energy for the mill and will reduce the mill's energy costs."
(b) Extracted events (e1 ~ e5) and entities (x1 ~ x3), using gold annotations from OntoNotes:
x1 = The company, x2 = mill, x3 = power plant
e1: (build-pred, x1-subj, x3-dobj, —)
e2: (provide-pred, —, electricity-dobj, x2-prep_to)
e3: (ensure-pred, x3-subj, —, —)
e4: (has-pred, x1-subj, energy-dobj, x2-prep_for)
e5: (reduce-pred, x3-subj, cost-dobj, —)
(c) Example of an argument cloze task for the prep_to argument of e2:
e1, e3, e4, e5: same as above
e2: (provide-pred, —, electricity-dobj, ??-prep_to)
x1 = The company, x2 = mill, x3 = power plant

Coreference resolution is very noisy. Therefore we use gold coreference annotation for creating evaluation data, but automatically generated coreference chains for creating training data.

Model

We model implicit argument prediction as selecting the entity that, when filled in as the implicit argument, makes the overall most coherent narrative. Suppose we are trying to predict the direct object argument of some target event e_t. Then we complete e_t by putting an entity candidate into the direct object argument position, and check the coherence of the resulting event with the rest of the narrative. Say we have a sequence of events e_1, e_2, ..., e_n in a narrative, and a list of entity candidates x_1, x_2, ..., x_m. Then for any candidate x_j, we first complete the target event to be

$e_t(j) = (v_t, s_t, x_j, p_t), \quad j = 1, \ldots, m$    (1)

where v_t, s_t, and p_t are the predicate, subject, and prepositional object of e_t respectively, and x_j is filled in as the direct object.
(Event completion for omitted subjects and prepositional objects is analogous.) Then we compute the narrative coherence score S_j of the candidate x_j by

$S_j = \max_{c = 1, \ldots, n;\; c \neq t} \mathrm{coh}\big(\vec{e}_t(j), \vec{e}_c\big), \quad j = 1, \ldots, m$    (2)

where $\vec{e}_t(j)$ and $\vec{e}_c$ are representations for the completed target event e_t(j) and one context event e_c, and coh is a function computing a coherence score between two events, both depending on the model being used. (We have also tried using the sum instead of the maximum, but it did not perform as well across different models and datasets.) The candidate x_j with the highest score S_j is then selected as our prediction.
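To make the selection procedure of Equations 1 and 2 concrete, the following is a minimal sketch of how a candidate entity could be scored; the event tuple layout and the helper names (complete_event, score_candidate) are illustrative assumptions, and coh stands for whatever trained coherence model is in use, not the released implementation.

```python
# Sketch of candidate scoring (Eqs. 1-2); `coh` is assumed to map two events
# to a coherence score in [0, 1].
from typing import Callable, List, NamedTuple, Optional

class Event(NamedTuple):
    verb: str
    subj: Optional[str]   # entity id or head lemma, None if missing
    dobj: Optional[str]
    pobj: Optional[str]   # single prepositional object (with its preposition)

def complete_event(target: Event, candidate: str, position: str) -> Event:
    """Fill the candidate entity into the missing argument position (Eq. 1)."""
    return target._replace(**{position: candidate})

def score_candidate(target: Event, position: str, candidate: str,
                    context_events: List[Event],
                    coh: Callable[[Event, Event], float]) -> float:
    """Narrative coherence score S_j: max coherence with any context event (Eq. 2)."""
    completed = complete_event(target, candidate, position)
    return max(coh(completed, e_c) for e_c in context_events)

def predict_argument(target: Event, position: str, candidates: List[str],
                     context_events: List[Event],
                     coh: Callable[[Event, Event], float]) -> str:
    """Pick the entity whose completion yields the most coherent narrative."""
    return max(candidates,
               key=lambda x: score_candidate(target, position, x, context_events, coh))
```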
To model coherence (coh) between a context event and a target event, we build an event composition model consisting of three parts, as shown in Figure 2: event components are represented through event-based word embeddings, which encode event knowledge in word representations; the argument composition network combines the components to produce event representations; and the pair composition network computes a coherence score for two event representations. This basic architecture is as in the model of Granroth-Wilding and Clark (2016). However, our model is designed for a different task, argument cloze rather than narrative cloze, and for our task entity-specific information is more important. We therefore create the training data in a different way, as described in Section 4.2.1. We now discuss the three parts of the model in more detail.

Figure 2: Diagram for the event composition model. Input: a context event and a target event. Event-Based Word Embeddings: embeddings for the components of both events that encode event knowledge. Argument Composition Network: produces an event representation from its components. Pair Composition Network: computes a coherence score coh from two event representations. Extra Features: argument index and entity salience features as additional input to the pair composition network.

Event-Based Word Embeddings

The model takes word embeddings of both predicates and arguments as input to compute event representations. To better encode event knowledge at the word level, we train an SGNS (skip-gram with negative sampling) word2vec model (Mikolov et al., 2013) with event-specific information. For each extracted event sequence, we create a sentence with the predicates and arguments of all events in the sequence. An example of such a training sentence is given in Figure 3.

build-pred company-subj plant-dobj provide-pred electricity-dobj mill-prep_to ensure-pred plant-subj has-pred company-subj energy-dobj mill-prep_for reduce-pred plant-subj cost-dobj
Figure 3: Event-based word2vec training sentence, constructed from events and entities in Figure 1b.
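As a rough illustration of this step, the sketch below builds such role-lemma training sentences and trains a skip-gram model with Gensim; the helper functions and the hyperparameter values shown are illustrative assumptions, not the settings used in the paper.

```python
# Sketch: build event-based "sentences" of role-lemma tokens (as in Figure 3)
# and train a skip-gram word2vec model on them with Gensim (>= 4.0 API).
from gensim.models import Word2Vec

def event_to_tokens(event):
    """Turn one (verb, subj, dobj, pobj) event into role-tagged tokens."""
    verb, subj, dobj, pobj = event
    tokens = [f"{verb}-pred"]
    if subj is not None:
        tokens.append(f"{subj}-subj")
    if dobj is not None:
        tokens.append(f"{dobj}-dobj")
    if pobj is not None:                      # e.g. ("mill", "to")
        head, prep = pobj
        tokens.append(f"{head}-prep_{prep}")
    return tokens

def sequence_to_sentence(events):
    """Concatenate the tokens of all events in one narrative sequence."""
    return [tok for event in events for tok in event_to_tokens(event)]

# `event_sequences` is assumed to be an iterable of extracted event sequences.
sentences = [sequence_to_sentence(seq) for seq in event_sequences]
model = Word2Vec(sentences, vector_size=300, window=10, sg=1, negative=10,
                 min_count=1, workers=4)   # illustrative hyperparameters
model.save("event_word2vec.model")
```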
Argument Composition Network
The argument composition network (dark blue area in Figure 2) is a two-layer feedforward neural network that composes an event representation from the embeddings of its components. Non-existent argument positions are filled with zeros.
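A minimal sketch of such a composition network is given below, using the layer sizes reported in Section 6.1 (600 and 300); the framework (PyTorch), the embedding size, and the tanh nonlinearity are not specified above and are assumptions.

```python
# Sketch of the argument composition network: a two-layer feedforward net that
# maps the concatenated component embeddings (verb, subj, dobj, pobj) to an
# event representation. Missing argument positions are passed as zero vectors.
import torch
import torch.nn as nn

class ArgumentComposition(nn.Module):
    def __init__(self, emb_dim: int = 300, hidden: int = 600, out: int = 300):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(4 * emb_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, out), nn.Tanh(),
        )

    def forward(self, verb, subj, dobj, pobj):
        # Concatenate the four component embeddings into one input vector.
        return self.net(torch.cat([verb, subj, dobj, pobj], dim=-1))
```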
Pair Composition Network
The pair composition network (light blue area in Figure 2) computes a coherence score coh between 0 and 1, given the vector representations of a context event and a target event. The coherence score should be high when the target event contains the correct argument, and low otherwise. So we construct the training objective function to distinguish the correct argument from wrong ones, as described in Equation 3.
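The following sketch pairs such a network with the triple-based cross-entropy objective described next (Equation 3); the layer sizes 400 and 200 follow Section 6.1, while the sigmoid output and the input layout (concatenated event vectors plus extra features) are modeling assumptions.

```python
# Sketch of the pair composition network and the triple loss of Eq. 3.
import torch
import torch.nn as nn

class PairComposition(nn.Module):
    def __init__(self, event_dim: int = 300, extra_dim: int = 0,
                 hidden1: int = 400, hidden2: int = 200):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * event_dim + extra_dim, hidden1), nn.Tanh(),
            nn.Linear(hidden1, hidden2), nn.Tanh(),
            nn.Linear(hidden2, 1), nn.Sigmoid(),   # coherence score in (0, 1)
        )

    def forward(self, context_vec, target_vec, extra=None):
        parts = [context_vec, target_vec] if extra is None \
            else [context_vec, target_vec, extra]
        return self.net(torch.cat(parts, dim=-1)).squeeze(-1)

def triple_loss(coh_pos: torch.Tensor, coh_neg: torch.Tensor) -> torch.Tensor:
    """Cross-entropy over (context, positive) and (context, negative) pairs (Eq. 3)."""
    eps = 1e-7
    return (-torch.log(coh_pos + eps) - torch.log(1 - coh_neg + eps)).mean()
```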
To train the model to pick the correct candidate, we automatically construct training samples as event triples consisting of a context event e_c, a positive event e_p, and a negative event e_n. The context event and positive event are randomly sampled from an observed sequence of events, while the negative event is generated by replacing one argument of the positive event by a random entity in the narrative, as shown in Figure 4.

Figure 4: Example of an event triple constructed from events and entities in Figure 1b. Context: (build-pred, x1-subj, x3-dobj, —); Positive: (reduce-pred, x3-subj, cost-dobj, —); Negative: same as the positive event, but with a different entity from the narrative substituted as the subject.

We want the coherence score between e_c and e_p to be close to 1, while the score for e_c and e_n should be close to 0. Therefore, we train the model to minimize the cross-entropy

$\frac{1}{m} \sum_{i=1}^{m} \Big[ -\log \mathrm{coh}(e_{c_i}, e_{p_i}) - \log\big(1 - \mathrm{coh}(e_{c_i}, e_{n_i})\big) \Big]$    (3)

where e_{c_i}, e_{p_i}, and e_{n_i} are the context, positive, and negative events of the i-th training sample, respectively.

Implicit arguments tend to be salient entities in the document. So we extend our model by entity salience features, building on recent work by Dunietz and Gillick (2014), who introduced a simple model with several surface-level features for entity salience detection. Among the features they used, we discard those that require external resources, and only use the remaining three features, as illustrated in Table 1. Dunietz and Gillick found mentions to be the most powerful indicator of entity salience among all features. We expect similar results in our experiments; however, we include all three features in our event composition model for now, and conduct an ablation test afterwards.
Feature       Description
first loc     Index of the sentence where the first mention of the entity appears
head count    Number of times the head word of the entity appears
mentions      A vector containing the numbers of named, nominal, pronominal, and total mentions of the entity

Table 1: Entity salience features from Dunietz and Gillick (2014).

The entity salience features are directly passed into the pair composition network as additional input. We also add an extra feature for the argument position index (encoding whether the missing argument is a subject, direct object, or prepositional object), as shown in the red area in Figure 2.
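As a rough sketch, the three features of Table 1 could be computed from a coreference chain as follows; the mention representation (a list of (sentence index, mention type, head word) tuples) and the document-level head counts are assumed input formats, not the paper's actual data structures.

```python
# Sketch: compute the three entity salience features of Table 1 for one entity.
# Each mention is assumed to be a (sentence_index, mention_type, head_word)
# tuple, with mention_type in {"named", "nominal", "pronominal"}.
from collections import Counter

def salience_features(mentions, document_head_counts):
    sent_indices = [sent_idx for sent_idx, _, _ in mentions]
    types = Counter(m_type for _, m_type, _ in mentions)
    entity_head = mentions[0][2]                     # head word of the first mention

    first_loc = min(sent_indices)                    # sentence of the first mention
    head_count = document_head_counts[entity_head]   # head-word count in the document
    mention_vec = [types["named"], types["nominal"],
                   types["pronominal"], len(mentions)]
    return [first_loc, head_count] + mention_vec
```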
Evaluation Datasets

Previous implicit argument datasets were very small. To overcome that limitation, we automatically create a large and comprehensive evaluation dataset, following the argument cloze task setting in Section 3.

Since the events and entities are extracted from dependency labels and coreference chains, we do not want to introduce systematic error into the evaluation from imperfect parsing and coreference algorithms. Therefore, we create the evaluation set from OntoNotes (Hovy et al., 2006), which contains human-labeled dependency and coreference annotation for a large corpus. So the extracted events and entities in the evaluation set are gold. Note that this is only for evaluation; in training we do not rely on any gold annotations (Section 6.1).

There are four English sub-corpora in OntoNotes Release 5.0 (LDC Catalog No. LDC2013T19) that are annotated with dependency labels and coreference chains. Three of them, which are mainly from broadcast news, share similar statistics in document length, so we combine them into a single dataset and name it ON-SHORT, as it consists mostly of short documents. The fourth sub-corpus is from the Wall Street Journal and has significantly longer documents. We call this sub-corpus ON-LONG and evaluate on it separately. Some statistics are shown in Table 2.

Table 2: Statistics on the ON-SHORT and ON-LONG argument cloze datasets.
The implicit argument dataset from Gerber and Chai (2010) (referred to as G&C henceforth) consists of 966 human-annotated implicit argument instances for 10 nominal predicates.

To evaluate our model on G&C, we convert the annotations to the input format of our model as follows: we map nominal predicates to their verbal form, and semantic role labels to syntactic argument types based on the NomBank frame definitions. One of the examples (after mapping semantic role labels) is as follows:

[Participants]_subj will be able to transfer [money]_dobj to [other investment funds]_prep_to. The [investment]_pred choices are limited to [a stock fund and a money-market fund]_prep_to.

For the nominal predicate investment, there are three arguments missing (subj, dobj, prep_to). The model first needs to determine that each of those argument positions in fact has an implicit filler. Then, from a list of candidates (not shown here), it needs to select
Participants as the implicit subj argument, money as the implicit dobj argument, and either other investment funds or a stock fund and a money-market fund as the implicit prep_to argument.

Experiments

We train our neural model using synthetic data as described in Section 3. For creating the training data, we do not use gold parses or gold coreference chains. We use the 20160901 dump of English Wikipedia (https://dumps.wikimedia.org/enwiki/), with 5,228,621 documents in total. For each document, we extract plain text and break it into paragraphs, while discarding all structured data like lists and tables (using the WikiExtractor tool at https://github.com/attardi/wikiextractor). We construct a sequence of events and entities from each paragraph, by running Stanford CoreNLP (Manning et al., 2014) to obtain dependency parses and coreference chains. We lemmatize all verbs and arguments. We incorporate negation and particles in verbs, and normalize passive constructions. We represent all arguments by their entity indices if they exist, otherwise by their head lemmas. We keep verbs and arguments with counts over 500, together with the 50 most frequent prepositions, leading to a vocabulary of 53,345 tokens; all other words are replaced with an out-of-vocabulary token. The most frequent verbs (with counts over 100,000) are down-sampled.

For training the event-based word embeddings, we create pseudo-sentences (Section 4.2) from all events of all sequences (approximately 87 million events) as training samples, and train an SGNS word2vec model using the Gensim package (Řehůřek and Sojka, 2010).

For training the event composition model, we follow the procedure described in Section 4.2.1, and extract approximately 40 million event triples as training samples. (We only sample one negative event for each pair of context and positive events for fast training, though more training samples are easily accessible.) We use a two-layer feedforward neural network with layer sizes 600 and 300 for the argument composition network, and another two-layer network with layer sizes 400 and 200 for the pair composition network. We use cross-entropy loss with ℓ2 regularization of 0.01.
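A rough sketch of the argument normalization described above (entity index if the argument is in a coreference chain, head lemma otherwise, with an out-of-vocabulary cutoff) might look as follows; the argument object layout and the vocabulary set are illustrative assumptions.

```python
# Sketch: normalize an extracted argument to either its entity index or its
# head lemma, applying the vocabulary cutoff described above.
OOV = "<unk>"

def normalize_argument(arg, vocab):
    """arg: object with .entity_idx (int or None) and .head_lemma (str)."""
    if arg is None:
        return None                          # missing argument position
    if arg.entity_idx is not None:
        return f"entity-{arg.entity_idx}"    # argument is part of a coreference chain
    if arg.head_lemma in vocab:
        return arg.head_lemma                # frequent head lemma kept as-is
    return OOV                               # rare words become the OOV token
```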
We train the model using stochastic gradient descent (SGD) with a learning rate of 0.01 and a batch size of 100 for 20 epochs. To study how the size of the training set affects performance, we downsample the 40 million training samples to another set of 8 million training samples. We refer to the resulting models as EVENTCOMP-8M and EVENTCOMP-40M.

For the synthetic argument cloze task, we compare our model with three baselines.

RANDOM: Randomly select one entity from the candidate list.

MOSTFREQ: Always select the entity with the highest number of mentions.

EVENTWORDVEC: Use the event-based word embeddings described in Section 4.2 for predicates and arguments. The representation of an event e is the sum of the embeddings of its components, i.e.,

$\vec{e} = \vec{v} + \vec{s} + \vec{o} + \vec{p}$    (4)

where $\vec{v}, \vec{s}, \vec{o}, \vec{p}$ are the embeddings of the verb, subject, object, and prepositional object, respectively. The coherence score of two events in this baseline model is their cosine similarity. As in our main model, the coherence score of a candidate is then the maximum pairwise coherence score, as described in Section 4.1.
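A minimal sketch of the EVENTWORDVEC baseline (Equation 4) is given below, assuming the event-based embeddings are available as a Gensim KeyedVectors object; the token naming follows Figure 3, but the helper functions themselves are illustrative.

```python
# Sketch of the EVENTWORDVEC baseline: an event vector is the sum of its
# component embeddings (Eq. 4), and coherence is cosine similarity.
import numpy as np

def event_vector(tokens, keyed_vectors):
    """tokens: role-lemma tokens of one event, e.g. ['build-pred', 'company-subj']."""
    vecs = [keyed_vectors[t] for t in tokens if t in keyed_vectors]
    return np.sum(vecs, axis=0) if vecs else np.zeros(keyed_vectors.vector_size)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def baseline_score(completed_target_tokens, context_token_lists, keyed_vectors):
    """Max cosine similarity between the completed target event and any context event."""
    target_vec = event_vector(completed_target_tokens, keyed_vectors)
    return max(cosine(target_vec, event_vector(ctx, keyed_vectors))
               for ctx in context_token_lists)
```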
The evaluation results on the ON-SHORT dataset are shown in Table 3. The EVENTWORDVEC baseline is much stronger than the other two, achieving an accuracy of 38.40%. In fact, EVENTCOMP-8M by itself does not do better than EVENTWORDVEC, but adding entity salience greatly boosts performance. Using more training data (EVENTCOMP-40M) helps by a substantial margin both with and without entity salience features.

To see which of the entity salience features are important, we conduct an ablation test with the EVENTCOMP-8M model on ON-SHORT. From the results in Table 4, we can see that in our task, as in Dunietz and Gillick (2014), the entity mentions features, i.e., the numbers of named, nominal, pronominal, and total mentions of the entity, are most helpful. In fact, the other two features even decrease performance slightly.
Figure 5: Performance (accuracy, %) of EVENTCOMP-40M (with and without entity salience), MOSTFREQ, and EVENTWORDVEC by (a) argument type (subj, dobj, pobj), (b) part-of-speech of the head word of the entity (noun vs. pronoun), and (c) entity frequency.
Model                          Accuracy (%)
RANDOM
MOSTFREQ
EVENTWORDVEC                   38.40
EVENTCOMP-8M                   38.26
 + entity salience             45.05
EVENTCOMP-40M                  41.89
 + entity salience

Table 3: Evaluation on ON-SHORT.

Features                       Accuracy (%)
no entity salience features    38.26
 – mentions
 – head count
all entity salience features   45.05

Table 4: Ablation test on entity salience features. (Using EVENTCOMP-8M on ON-SHORT.)
We take a closer look at several of the models in Figure 5. Figure 5a breaks down the results by the argument type of the removed argument. On subjects, the EVENTWORDVEC baseline matches the performance of EVENTCOMP, but not on direct objects and prepositional objects. Subjects are semantically much less diverse than the other argument types, as they are very often animate. A similar pattern is apparent in Figure 5b, which has results by the part-of-speech tag of the head word of the removed entity. Note that an entity is a coreference chain, not a single mention; so when the head word is a pronoun, this is an entity which has only pronoun mentions. A pronoun entity provides little semantic content beyond, again, animacy. And again, EVENTWORDVEC performs well on pronoun entities, but less so on entities described by a noun. (As shown in Figure 3, the "words" for which embeddings are computed are role-lemma pairs.) It seems that EVENTWORDVEC can pick up on a coarse-grained pattern such as animate/inanimate, but not on the more fine-grained distinctions needed to select the right noun, or to select a fitting direct object or prepositional object. This matches the fact that EVENTWORDVEC gets a less clear signal on the task, in two respects: it gets much less information than EVENTCOMP on the distinction between argument positions, and it only looks at overall event similarity while EVENTCOMP is trained to detect narrative coherence. Entity salience contributes greatly across all argument types and parts of speech, but more strongly on subjects and pronouns. This is again because subjects, and pronouns, are semantically less distinct, so they can only be distinguished by relative salience.

Figure 5c analyzes results by the frequency of the removed entity, that is, by its number of mentions. The MOSTFREQ baseline, unsurprisingly, only does well when the removed entity is a highly frequent one. The EVENTCOMP model is much better than MOSTFREQ at picking out the right entity when it is a rare one, as it can look at the semantic content of the entity as well as its frequency. Entity salience boosts the performance of EVENTCOMP in particular for frequent entities.

The ON-LONG dataset, as discussed in Section 5.1, consists of OntoNotes data with much longer documents than found in ON-SHORT. Evaluation results on ON-LONG are shown in Table 5. Although the overall numbers are lower than those for ON-SHORT, we are selecting from more than three times as many candidates on average as for ON-SHORT. Considering how low the accuracy of randomly selecting an entity is with that many candidates, the performance of our best performing model is quite good.
Model                          Accuracy (%)
RANDOM
MOSTFREQ
EVENTWORDVEC
EVENTCOMP-8M                   18.79
 + entity salience             26.23
EVENTCOMP-40M                  21.79
 + entity salience

Table 5: Evaluation on ON-LONG.
The G&C data differs from the argument cloze data in two respects. First, not every argument position that seems to be open needs to be filled: the model must additionally make a fill / no-fill decision. Whether a particular argument position is typically filled is highly predicate-specific. As the small G&C dataset does not provide enough data to train our neural model on this task, we instead train a simple logistic classifier, the fill / no-fill classifier, with a small subset of the shallow lexical features used in Gerber and Chai (2012), to make the decision. These features describe the syntactic context of the predicate. We use only 14 features; the original Gerber and Chai model had more than 80 features, and our re-implementation, described below, has around 60.

The second difference is that in G&C, an event may have multiple open argument positions. In that case, the task is not just to select a candidate entity, but also to determine which of the open argument positions it should fill. So the model must do multi implicit argument prediction. We can flexibly adapt our method for training data generation to this case. In particular, we create extra negative training events, in which an argument of the positive event has been moved to another argument position in the same event, as shown in Figure 6 (and in the sketch below). We can then simply train our EVENTCOMP model on this extended training data. We refer to the extra training process as multi-arg training.

Figure 6: Event triples for training multi implicit argument prediction, constructed from events and entities in Figure 1b. Context: (build-pred, x1-subj, x3-dobj, —); Positive: (reduce-pred, x3-subj, cost-dobj, —); Negative: (reduce-pred, —, cost-dobj, x3-prep), i.e., the subject of the positive event has been moved into the prepositional object position.
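A minimal sketch of this extra negative-example generation, under the event layout assumed earlier; the dictionary representation and the random choices are illustrative, not the paper's exact sampling procedure.

```python
# Sketch: generate an extra negative event for multi-arg training by moving
# one filled argument of the positive event into a different argument position.
import random

ARG_POSITIONS = ["subj", "dobj", "pobj"]

def make_multi_arg_negative(positive_event: dict) -> dict:
    """positive_event maps 'verb', 'subj', 'dobj', 'pobj' to fillers (or None)."""
    filled = [p for p in ARG_POSITIONS if positive_event.get(p) is not None]
    src = random.choice(filled)                         # argument to move
    empty = [p for p in ARG_POSITIONS if positive_event.get(p) is None]
    dst = random.choice(empty or [p for p in ARG_POSITIONS if p != src])
    negative = dict(positive_event)
    negative[dst] = negative[src]                       # entity moved to a new position
    negative[src] = None                                # original position left empty
    return negative
```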
We compare our models to that of Gerber and Chai (2012). However, their original logistic regression model used many features based on gold annotation from FrameNet, PropBank and NomBank. To create a more realistic evaluation setup, we re-implement a variant of their original model by removing gold features, and name it GCAUTO. Results from GCAUTO are directly comparable to our models, as both are trained on automatically generated features.

Model                          P      R      F1
Gerber and Chai (2012)         57.9   44.5   50.3
GCAUTO
EVENTCOMP-8M                   8.9    27.9   13.5
 + fill / no-fill classifier   22.0   22.3   22.1
 + multi-arg training          43.5   44.1   43.8
 + entity salience             45.7   46.4
EVENTCOMP-40M                  9.4    30.3   14.3
 + fill / no-fill classifier   23.7   24.0   23.9
 + multi-arg training          46.7   47.3   47.0
 + entity salience             49.3   49.9   49.6

Table 6: Evaluation on the G&C dataset.

We present the evaluation results in Table 6.
The original EVENTCOMP models do not perform well, which is as expected since the model is not designed to make the fill / no-fill decision and do multi implicit argument prediction as described above. With the fill / no-fill classifier, precision rises by around 13 points because this classifier prevents many false positives. (To be fair, we also tested adding the fill / no-fill classifier to GCAUTO. However, the classifier only increases precision at the cost of reducing recall, and GCAUTO already has higher precision than recall. The resulting F score is actually worse, and is thus not reported here.) With additional multi-arg training, the F score improves by another 22-23 points. At this point, our model achieves a performance comparable to the much more complex G&C re-implementation GCAUTO. Adding entity salience features further boosts both precision and recall, showing that implicit arguments do tend to be filled by salient entities, as we had hypothesized. Again, more training data substantially benefits the task. Our best performing model, at 49.6 F, clearly outperforms GCAUTO, and is comparable with the original Gerber and Chai (2012) model trained with gold features. (We also tried fine-tuning our model on the G&C dataset with cross-validation, but the model severely overfit, possibly due to the very small size of the dataset.)

Conclusion

In this paper we have addressed the task of implicit argument prediction. To support training at scale, we have introduced a simple cloze task for which data can be generated automatically. We have introduced a neural model, which frames implicit argument prediction as the task of selecting the textual entity that completes the event in a maximally narratively coherent way. The model prefers salient entities, where salience is mainly defined through the number of mentions. Evaluating on synthetic data from OntoNotes, we find that our model clearly outperforms even strong baselines, that salience is important throughout for performance, and that event knowledge is particularly useful for the (more verb-specific) object and prepositional object arguments. Evaluating on the naturally occurring data from Gerber and Chai, we find that in a comparison without gold features, our model clearly outperforms the previous state-of-the-art model, where again salience information is important.

The current paper takes a first step towards predicting implicit arguments based on narrative coherence. We currently use a relatively simple model for local narrative coherence; in the future we will turn to models that can test global coherence for an implicit argument candidate. We also plan to investigate how the extracted implicit arguments can be integrated into a downstream task that makes use of event information; in particular, we would like to experiment with reading comprehension.
Acknowledgments
This research was supported by NSF grant IIS 1523637. We also acknowledge the Texas Advanced Computing Center for providing grid resources that contributed to these results, and we would like to thank the anonymous reviewers for their valuable feedback.
References
Collin F. Baker, Charles J. Fillmore, and John B. Lowe. 1998. The Berkeley FrameNet project.

Nathanael Chambers and Dan Jurafsky. 2008. Unsupervised learning of narrative event chains. In Proceedings of ACL-08: HLT, pages 789–797. Association for Computational Linguistics.

Nathanael Chambers and Dan Jurafsky. 2009. Unsupervised learning of narrative schemas and their participants. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 602–610. Association for Computational Linguistics.

Quynh Ngoc Thi Do, Steven Bethard, and Marie-Francine Moens. 2017. Improving implicit semantic role labeling by predicting semantic frame arguments. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 90–99. Asian Federation of Natural Language Processing.

Jesse Dunietz and Daniel Gillick. 2014. A new entity salience task with millions of training examples. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, Volume 2: Short Papers, pages 205–209. Association for Computational Linguistics. https://doi.org/10.3115/v1/E14-4040

Matthew Gerber and Joyce Chai. 2010. Beyond NomBank: A study of implicit arguments for nominal predicates. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 1583–1592. Association for Computational Linguistics.

Matthew Gerber and Joyce Y. Chai. 2012. Semantic role labeling of implicit arguments for nominal predicates. Computational Linguistics. https://doi.org/10.1162/COLI_a_00110

Mark Granroth-Wilding and Stephen Clark. 2016. What happens next? Event prediction using a compositional neural network model. In AAAI Conference on Artificial Intelligence, pages 2727–2733.

Karl Moritz Hermann, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. 2015. Teaching machines to read and comprehend. In Advances in Neural Information Processing Systems, pages 1693–1701.

Eduard Hovy, Mitchell Marcus, Martha Palmer, Lance Ramshaw, and Ralph Weischedel. 2006. OntoNotes: The 90% solution. In Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers.

Egoitz Laparra and German Rigau. 2013. ImpAr: A deterministic algorithm for implicit semantic role labelling. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1180–1189. Association for Computational Linguistics.

Christopher Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 55–60. Association for Computational Linguistics. https://doi.org/10.3115/v1/P14-5010

Adam Meyers, Ruth Reeves, Catherine Macleod, Rachel Szekely, Veronika Zielinska, Brian Young, and Ralph Grishman. 2004. The NomBank project: An interim report. In Proceedings of the Workshop Frontiers in Corpus Annotation at HLT-NAACL 2004, pages 24–31.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111–3119.

Ashutosh Modi, Ivan Titov, Vera Demberg, Asad Sayeed, and Manfred Pinkal. 2017. Modelling semantic expectation: Using script knowledge for referent prediction. Transactions of the Association for Computational Linguistics.

Martha Palmer, Daniel Gildea, and Paul Kingsbury. 2005. The Proposition Bank: An annotated corpus of semantic roles. Computational Linguistics.

Karl Pichotta and Raymond J. Mooney. 2016a. Learning statistical scripts with LSTM recurrent neural networks. In AAAI Conference on Artificial Intelligence, pages 2800–2806.

Karl Pichotta and Raymond J. Mooney. 2016b. Using sentence-level LSTM language models for script inference. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 279–289. Association for Computational Linguistics. https://doi.org/10.18653/v1/P16-1027

Radim Řehůřek and Petr Sojka. 2010. Software framework for topic modelling with large corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pages 45–50. ELRA, Valletta, Malta. http://is.muni.cz/publication/884893/en

Rachel Rudinger, Pushpendre Rastogi, Francis Ferraro, and Benjamin Van Durme. 2015. Script induction as language modeling. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1681–1686. Association for Computational Linguistics. https://doi.org/10.18653/v1/D15-1195

Josef Ruppenhofer, Caroline Sporleder, Roser Morante, Collin Baker, and Martha Palmer. 2010. SemEval-2010 Task 10: Linking events and their participants in discourse. In Proceedings of the 5th International Workshop on Semantic Evaluation, pages 45–50. Association for Computational Linguistics.

Roger C. Schank and Robert Abelson. 1977. Scripts, goals, plans, and understanding.

Niko Schenk and Christian Chiarcos. 2016. Unsupervised learning of prototypical fillers for implicit semantic role labeling. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1473–1479. Association for Computational Linguistics. https://doi.org/10.18653/v1/N16-1173

Supplemental Material
Nominal Predicate   Verbal Form   arg0   arg1       arg2       arg3       arg4
bid                 bid           subj   prep_for   dobj       –          –
sale                sell          subj   dobj       prep_to    prep_for   prep
loan                loan          subj   dobj       prep_to    prep       prep_at
cost                cost          –      subj       dobj       prep_to    prep
plan                plan          subj   dobj       prep_for   prep_for   –
investor            invest        subj   dobj       prep_in    –          –
price               price         subj   dobj       prep_at    prep       –
loss                lose          subj   dobj       prep_to    prep_on    –
investment          invest        subj   dobj       prep_in    –          –
fund                fund          subj   dobj       prep       prep_on    –

Table 7: Mappings from the 10 nominal predicates to their verbal forms, and mappings from the semantic role labels of each predicate to the corresponding dependency labels, as discussed in Section 5.2.

1. p itself.
2. p & p's morphological suffix.
3. p & iarg_n.
4. Verbal form of p & iarg_n.
5. Frequency of p within the document.
6. p & the stemmed content words in a one-word window around p.
7. p & the stemmed content words in a two-word window around p.
8. p & the stemmed content words in a three-word window around p.
9. p & whether p is before a passive verb.
10. p & the head of the following prepositional phrase's object.
11. p & the syntactic parse tree path from p to the nearest passive verb.
12. p & the part-of-speech of p's parent's head word.
13. p & the last word of p's right sibling.
14. Whether or not p's left sibling is a quantifier (many, most, all, etc.).

Table 8: Features used in the fill / no-fill classifier, as discussed in Section 6.3. This is a subset of the features used by Gerber and Chai (2012). Here, p is the nominal predicate and iarg_n is the integer n