Multi-Field Structural Decomposition for Question Answering
Tomasz Jurczyk
Mathematics and Computer Science, Emory University, Atlanta, GA 30322, USA
[email protected]
Jinho D. Choi
Mathematics and Computer Science, Emory University, Atlanta, GA 30322, USA
[email protected]
Abstract
This paper presents a precursory yet novel approach to the question answering task using structural decomposition. Our system first generates linguistic structures such as syntactic and semantic trees from text, decomposes them into multiple fields, then indexes the terms in each field. For each question, it decomposes the question into multiple fields, measures the relevance score of each field to the indexed ones, then ranks all documents by their relevance scores and the weights associated with the fields, where the weights are learned through statistical modeling. Our final model gives an absolute improvement of over 40% over the baseline approach, which uses simple search, for detecting documents containing answers.
1 Introduction

Towards machine reading, question answering has recently gained much interest among researchers from both natural language processing (Moschitti and Quarteroni, 2011; Yih et al., 2013; Hixon et al., 2015) and information retrieval (Schiffman et al., 2007; Kolomiyets and Moens, 2011). Researchers from these two fields, NLP and IR, have shown tremendous progress on question answering, yet only a few efforts have been made to adapt technologies from both sides. The NLP side often tackles the task by analyzing linguistic aspects, whereas the IR side tackles it by searching likely patterns.

While these two approaches perform well individually, more sophisticated solutions are needed to handle a wide range of questions. By considering linguistic structures such as syntactic and semantic trees, QA systems can infer deeper meaning of the context and handle more complex questions. However, extracting answers from these structures through either graph matching or predicate logic is not necessarily scalable when the size of the context is large. On the other hand, searching patterns is scalable for large data, especially when coupled with indexing, although it does not always concern the actual meaning of the context.

We present a multi-field weighted indexing approach for question answering that combines good aspects of both NLP and IR. We begin by describing how linguistic structures are decomposed into multiple fields (Section 3.3), and explain how the decomposed fields are used to rank documents containing answers through statistical learning (Sections 3.4 and 3.5). We evaluate our approach on 8 types of questions; our final model shows significant improvement over the baseline model using simple search (Section 4).
2 Related Work

Shen and Lapata (2007) assessed the contribution of semantic roles to factoid question answering and showed promising results. Pizzato and Mollá (2008) proposed a question prediction language model providing rich information and achieved improved speed and accuracy. Although related, our work is distinguished from theirs because we consider multiple fields whereas they consider only one field representing semantic roles. Ferrucci et al. (2010) presented IBM Watson, taking a hybrid approach between NLP and IR, and advanced the question answering task to another level.

Fader et al. (2013) proposed a paraphrase-driven perceptron learning approach using a seed lexicon. Our learning process is similar; however, it is distinguished in that we learn weights for individual fields instead of lexicons. Yih et al. (2014) introduced a semantic parsing framework for open domain question answering, which used convolutional neural networks for measuring similarities between decomposed entities. Weston et al. (2015) presented the Memory Networks models, designed to memorize information about known objects and actors. Our work is related to theirs; however, memory networks are designed to store and manipulate information about specific types of objects, while our framework is generalizable to any type of objects induced from the context.

Figure 1: The overall framework of our question answering system.
3 Approach

3.1 Overall Framework

Figure 1 shows the overall framework. Our system is designed in a modular way, so any further extension of fields can be easily integrated. The system takes input documents, generates linguistic structures using NLP tools, decomposes them into multiple fields, and indexes those fields. Questions are processed in the same way. To answer a question, the system queries the index for each field extracted from the question and measures the relevance score. All documents are ranked with respect to the relevance scores and the weights associated with the fields, and the document with the highest score is selected as the answer.
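As a rough end-to-end summary, the indexing and querying flows in Figure 1 can be sketched as follows. This is a minimal sketch rather than our implementation: the parameters parse, extract_fields, add_to_index, relevance, and doc_ids are hypothetical stand-ins for the modules described in Section 3.2.

```python
from typing import Callable, Dict, List

Fields = Dict[str, List[str]]  # field name -> list of indexed terms

def index_documents(docs: List[str],
                    parse: Callable[[str], object],
                    extract_fields: Callable[[object], Fields],
                    add_to_index: Callable[[int, Fields], None]) -> None:
    """Indexing flow: generate linguistic structures, decompose, index."""
    for doc_id, text in enumerate(docs):
        add_to_index(doc_id, extract_fields(parse(text)))

def answer(question: str,
           parse: Callable[[str], object],
           extract_fields: Callable[[object], Fields],
           relevance: Callable[[str, List[str], int], float],
           doc_ids: List[int],
           weights: Dict[str, float]) -> int:
    """Querying flow: rank documents by weighted per-field relevance."""
    q_fields = extract_fields(parse(question))
    def score(d: int) -> float:
        return sum(weights.get(f, 1.0) * relevance(f, terms, d)
                   for f, terms in q_fields.items())
    return max(doc_ids, key=score)  # document with the highest overall score
```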
3.2 Modules

Our system consists of several modules that are closely connected and together provide a fully working solution to the answer selection task in question answering.
Documents provide the context from which questions find their answers. Each document can contain one or more sentences, in which the answers to upcoming questions are annotated for training. Documents may simply be Wikipedia articles, news articles, fictional stories, etc. Questions are treated as regular documents containing only one sentence.
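As an illustration only, the document and question representations described above could be modeled as below; the class and attribute names are our own, not part of the system.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Document:
    """A context document; one or more sentences."""
    doc_id: int
    sentences: List[str]

@dataclass
class Question:
    """A question is a regular one-sentence document; for training, the
    document containing the answer is annotated as the oracle."""
    text: str
    answer: Optional[str] = None          # gold answer string (training only)
    oracle_doc_id: Optional[int] = None   # id of the document containing it
```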
For the generation of syntactic and semantic structures, we used the part-of-speech tagger (Choi and Palmer, 2012), the dependency parser (Choi and McCallum, 2013), the semantic role labeler (Choi and Palmer, 2011), and the coreference resolution tool in ClearNLP. Ensuring good and robust accuracy for these NLP tools is important because all the following modules depend on their output.

The field extractor takes the linguistic structures from the NLP tools and decomposes them into multiple fields (Section 3.3). All fields extracted from the documents are passed to the index engine, whereas fields extracted from the questions are sent directly to the answer ranker module.
The index engine is a search server that receives a list of fields decomposed by the field extractor, indexes the terms in those fields, and responds to the queries generated from questions with relevance scores. We used Elastic Search, as it provides a distributed, multi-tenancy-capable search (we set the Elastic Search results limit to 20).

The answer ranker takes the decomposed fields extracted from a question, converts them into queries, and builds a matrix of documents with their relevance scores across all fields through the index engine (Section 3.4). It also uses different weights for individual fields, trained by statistical modeling (Section 3.5).
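A minimal sketch of this interaction, assuming a local Elastic Search node reachable through the official Python client (pre-8.x body-style API); the index name "sentences" and the space-joined term encoding are our assumptions, not the system's actual schema.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch()  # assumes a node at the default http://localhost:9200

def index_sentence(doc_id: int, fields: dict) -> None:
    # One Elastic Search document per sentence; each field holds its
    # space-joined terms, e.g. {"syntax": "julie_nsubj is_root ..."}.
    es.index(index="sentences", id=doc_id, body=fields)

def field_relevance(field_name: str, query_terms: str) -> dict:
    # Relevance scores of documents for one field of the question;
    # the result list is capped at 20, mirroring our setup.
    res = es.search(index="sentences", size=20,
                    body={"query": {"match": {field_name: query_terms}}})
    return {hit["_id"]: hit["_score"] for hit in res["hits"]["hits"]}
```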
3.3 Structural Decomposition

Figure 2: The flow of the sentence "Julie is either in the school or the cinema" through our system.

Each sentence is represented by the index engine as a document with multiple fields grouped into categories. Figure 2 shows an example of how the sentence is decomposed into multiple fields consisting of syntactic and semantic structures: lexical terms such as {julie, is, either, in, the, school, ...}, syntactic terms such as {julie_nsubj, is_root, either_preconj, ...}, and semantic terms such as {julie_A1_is, school_A2_is, cinema_A2_is}. Due to the extensible nature of our field extractor, additional groups and fields can be easily integrated. Currently, our system supports 24 fields grouped into the following three categories; a sketch of the decomposition follows the list:

• Lexical fields (e.g., word-forms, lemmas).
• Syntactic fields (e.g., dependency labels).
• Semantic fields (e.g., semantic roles, distances between predicates).
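To make the decomposition concrete, the sketch below hand-codes annotations for the Figure 2 sentence (standing in for the ClearNLP output) and derives one representative field per category. The term shapes follow Figure 2; the dependency labels and semantic roles not shown in the figure are illustrative.

```python
# Hand-coded stand-ins for the NLP tools' output on the Figure 2 sentence.
tokens = ["julie", "is", "either", "in", "the", "school", "or", "the", "cinema"]
deps = [("julie", "nsubj"), ("is", "root"), ("either", "preconj"),
        ("in", "prep"), ("school", "pobj"), ("cinema", "conj")]
srl = [("julie", "A1", "is"), ("school", "A2", "is"), ("cinema", "A2", "is")]

fields = {
    "lexical": tokens,                                      # word-forms
    "syntax": [f"{w}_{label}" for w, label in deps],        # word + dependency label
    "semantics": [f"{a}_{role}_{p}" for a, role, p in srl]  # argument + role + predicate
}
print(fields["semantics"])  # ['julie_A1_is', 'school_A2_is', 'cinema_A2_is']
```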
3.4 Document Ranking

When a question q is asked, it is decomposed into n fields. Each field is transformed into a query where certain words are replaced with wildcards (e.g., {where_a1, is_pred, she_a2} → {*_a1, is_pred, she_a2}). Then, the relevance score r is measured between each field in the question and the same field in each document d_t ∈ D by the index engine. The products of the relevance scores and the individual field weights are summed, and the document d̂ with the highest overall score f is taken as the answer. Note that in our dataset, each document contains only one sentence, so retrieving a document is equivalent to retrieving a sentence. The following equations describe how the document d̂ is selected by measuring the overall score f(q, d_t) from the relevance scores r(q_i, d_{t_i}) and the weights λ_i:

\hat{d} = \arg\max_{d_t \in D} f(q, d_t)

f(q, d_t) = \sum_{i=1}^{n} \lambda_i \cdot r(q_i, d_{t_i})

r(q_i, d_{t_i}) = \sum_{v \in q_i \cap d_{t_i}} \mathrm{tf}_{t_i}(v) \cdot \mathrm{idf}_i(v) \cdot \mathrm{norm}_{t_i}(v)
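The ranking above can be sketched as follows; the tf · idf · norm computation here is a simplified stand-in for the index engine's actual relevance function, and the weights λ default to 1 before they are learned (Section 3.5).

```python
import math
from collections import Counter
from typing import Dict, List

Fields = Dict[str, List[str]]  # field name -> terms

def relevance(q_terms: List[str], d_terms: List[str],
              df: Counter, n_docs: int) -> float:
    """r(q_i, d_ti): tf * idf * norm summed over terms shared by both fields."""
    tf = Counter(d_terms)
    norm = 1.0 / math.sqrt(len(d_terms)) if d_terms else 0.0
    return sum(tf[v] * math.log(n_docs / (1.0 + df[v])) * norm
               for v in set(q_terms) & set(d_terms))

def overall_score(q: Fields, d: Fields, weights: Dict[str, float],
                  df: Dict[str, Counter], n_docs: int) -> float:
    """f(q, d_t) = sum_i lambda_i * r(q_i, d_ti)."""
    return sum(weights.get(i, 1.0) * relevance(q[i], d.get(i, []), df[i], n_docs)
               for i in q)

def best_document(q: Fields, docs: Dict[int, Fields], weights: Dict[str, float],
                  df: Dict[str, Counter], n_docs: int) -> int:
    """d_hat = argmax over d_t in D of f(q, d_t)."""
    return max(docs, key=lambda t: overall_score(q, docs[t], weights, df, n_docs))
```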
3.5 Weight Learning

Algorithm 1 shows how the weights for all fields are learned during training. We adapt the averaged perceptron algorithm, which has been widely used for many NLP tasks. All weights λ are initialized to 1. For each question q ∈ Q, the algorithm predicts the document d̂ that most likely contains the answer. If d̂ is incorrect, it compares the relevance score r between (q, d̂) and (q, d) for each field, and updates the weights accordingly, where d is the true document from the oracle. This procedure is repeated through multiple iterations. Finally, the algorithm returns the averaged weights, where each dimension represents the weight of one field.

Algorithm 1: Averaged perceptron training.
Input:  D: document set, Q: question set,
        M: max number of iterations, α: learning rate.
Output: the averaged weight vector.
 1: λ ← λ′ ← 1
 2: for iter ∈ [1, M] do
 3:     foreach q ∈ Q do
 4:         d̂ = argmax_{d_t ∈ D} f(q, d_t)
 5:         if d̂ ≠ d then                  ▷ d is the oracle
 6:             foreach i ∈ [1, n] do
 7:                 δ ← α · sign[r(q_i, d_i) − r(q_i, d̂_i)]
 8:                 λ_i ← λ_i + δ
 9:         λ′ ← λ′ + λ
10: return λ′ / (M · |Q|)

              Lexical                    Lexical + Syntax           Lexical + Syntax + Semantics
Type          λ = 1        λ learned     λ = 1        λ learned     λ = 1        λ learned
              MAP / MRR    MAP / MRR     MAP / MRR    MAP / MRR     MAP / MRR    MAP / MRR
1 (qa1)       39.62/61.73  39.62/61.73   29.90/48.05  40.50/61.47   72.60/85.07  …
Avg.          44.45/61.25  44.63/61.37   45.16/60.34  48.41/63.76   59.60/73.70  …

Table 1: Results from our question-answering system on 8 types of questions in the bAbI tasks.

All hyper-parameters were optimized on the development sets and evaluated on the test sets. For our experiments, we set the maximum number of iterations to M = 40; the learning rate α was likewise tuned on the development sets.
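To make Algorithm 1 concrete, here is a minimal sketch in Python. It reuses a relevance function of the shape defined in Section 3.4 and represents each training question as a pair of its decomposed fields and the oracle document id; the default α value is a placeholder, not the tuned one.

```python
from typing import Callable, Dict, List, Tuple

Fields = Dict[str, List[str]]

def sign(x: float) -> int:
    return (x > 0) - (x < 0)

def train_weights(questions: List[Tuple[Fields, int]],
                  docs: Dict[int, Fields],
                  relevance: Callable[[List[str], List[str]], float],
                  field_names: List[str],
                  M: int = 40, alpha: float = 0.1) -> Dict[str, float]:
    """Averaged perceptron of Algorithm 1 (alpha default is a placeholder)."""
    def r(q: Fields, d: Fields, f: str) -> float:
        return relevance(q.get(f, []), d.get(f, []))

    lam = {f: 1.0 for f in field_names}       # line 1: weights initialized to 1
    lam_avg = {f: 1.0 for f in field_names}
    for _ in range(M):                        # line 2
        for q, gold in questions:             # line 3
            pred = max(docs, key=lambda t: sum(   # line 4: d_hat = argmax f(q, d_t)
                lam[f] * r(q, docs[t], f) for f in field_names))
            if pred != gold:                  # line 5: gold is the oracle d
                for f in field_names:         # lines 6-8: step toward the oracle
                    lam[f] += alpha * sign(r(q, docs[gold], f) - r(q, docs[pred], f))
            lam_avg = {f: lam_avg[f] + lam[f] for f in field_names}  # line 9
    return {f: lam_avg[f] / (M * len(questions)) for f in field_names}  # line 10
```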
4 Experiments

Our approach is evaluated on a subset of the bAbI tasks (Weston et al., 2015). The original data contains 20 tasks, where each task represents a different kind of question answering challenge. We select 8 tasks, in which the answer to a single question is located within a single sentence. For consistency and replicability, we follow the same training, development, and evaluation splits as provided, where every set contains 1,000 questions.

For the evaluation metrics, we use mean average precision (MAP) and mean reciprocal rank (MRR) over the top-3 predictions. The mean average precision is measured by counting the questions for which the sentences containing the answers are correctly selected as the best predictions. The reciprocal rank of a query response is the multiplicative inverse of the rank of the first correct answer; mean reciprocal rank is the average of the reciprocal ranks over all question queries.
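A small sketch of the two metrics as used here, assuming each question yields a ranked list of predicted document ids and a single gold id; under the description above, MAP reduces to the fraction of questions whose best prediction is correct.

```python
from typing import List, Tuple

def map_mrr(predictions: List[Tuple[List[int], int]], k: int = 3) -> Tuple[float, float]:
    """MAP and MRR over the top-k predictions; each item pairs a ranked
    list of document ids with the gold document id."""
    ap_sum, rr_sum = 0.0, 0.0
    for ranked, gold in predictions:
        top_k = ranked[:k]
        if top_k and top_k[0] == gold:
            ap_sum += 1.0                            # best prediction is correct
        if gold in top_k:
            rr_sum += 1.0 / (top_k.index(gold) + 1)  # inverse rank of first hit
    n = len(predictions)
    return ap_sum / n, rr_sum / n

# e.g. map_mrr([([2, 5, 7], 2), ([4, 1, 3], 1)]) -> (0.5, 0.75)
```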
Table 1 shows the results from our system on different types of questions. MAP and MRR show a clear correlation with the number of active fields. For the majority of tasks, using only the lexical fields does not perform well: the fictional stories included in this data often contain multiple occurrences of the same lexicons, and the lexical fields alone are not able to select the correct answer. The significantly lower accuracy on the last task is due to the fact that, although each answer is located within a single sentence, multiple passages must be considered for a single question in order to correctly locate the sentence with the answer. Lexical fields coupled with only syntactic fields do not perform much better. This may be because the syntactic fields, containing ordinary dependency labels, do not provide sufficient context-wise information, so they do not generate enough features for statistical learning to capture specific characteristics of the context. A significant improvement is reached, however, when the semantic fields are added, as they provide a deeper understanding of the context.

Note that this data set has also been used for evaluating the Memory Networks approach to question answering (Weston et al., 2015). The authors achieved high accuracy, reaching 100% on several tasks; however, our work still finds its own value because our approach is completely data-driven, such that it can be easily adapted or extended to other types of questions. As a matter of fact, we use the same system for all tasks with different trained models, yet are still able to achieve high accuracy for most tasks we evaluate on.
5 Conclusion

This paper presents a multi-field weighted indexing approach for question answering. Our system decomposes linguistic structures into multiple fields, indexes the terms of individual fields, and retrieves the documents containing the answers with respect to per-field weighted relevance scores. We observe significant improvement as we add more semantic fields and apply averaged perceptron learning to statistically designate weights for the fields. In the future, we plan to extend our work by integrating additional layers of fields (e.g., Freebase, WordNet). Furthermore, we plan to improve our NLP tools to enable an even deeper understanding of the context for more complex question answering.
References
Jinho D. Choi and Andrew McCallum. 2013. Transition-based Dependency Parsing with Selectional Branching. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL'13, pages 1052–1062, Sofia, Bulgaria, August.

Jinho D. Choi and Martha Palmer. 2011. Transition-based Semantic Role Labeling Using Predicate Argument Clustering. In Proceedings of the ACL Workshop on Relational Models of Semantics, RELMS'11, pages 37–45.

Jinho D. Choi and Martha Palmer. 2012. Fast and Robust Part-of-Speech Tagging Using Dynamic Model Selection. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), ACL'12, pages 363–367, Jeju Island, Korea, July.

Anthony Fader, Luke Zettlemoyer, and Oren Etzioni. 2013. Paraphrase-Driven Learning for Open Question Answering. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, ACL'13, pages 1608–1618.

David A. Ferrucci, Eric W. Brown, Jennifer Chu-Carroll, James Fan, David Gondek, Aditya Kalyanpur, Adam Lally, J. William Murdock, Eric Nyberg, John M. Prager, Nico Schlaefer, and Christopher A. Welty. 2010. Building Watson: An Overview of the DeepQA Project. AI Magazine, 31(3):59–79.

Ben Hixon, Peter Clark, and Hannaneh Hajishirzi. 2015. Learning Knowledge Graphs for Question Answering through Conversational Dialog. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL'15, pages 851–861.

Oleksandr Kolomiyets and Marie-Francine Moens. 2011. A Survey on Question Answering Technology from an Information Retrieval Perspective. Information Sciences, 181(24):5412–5434.

Alessandro Moschitti and Silvia Quarteroni. 2011. Linguistic Kernels for Answer Re-ranking in Question Answering Systems. Information Processing and Management, 47(6):825–842.

Luiz Augusto Pizzato and Diego Mollá. 2008. Indexing on Semantic Roles for Question Answering. In Proceedings of the 2nd Workshop on Information Retrieval for Question Answering, IR4QA'08, pages 74–81.

Barry Schiffman, Kathleen McKeown, Ralph Grishman, and James Allan. 2007. Question Answering Using Integrated Information Retrieval and Information Extraction. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics, NAACL'07, pages 532–539.

Dan Shen and Mirella Lapata. 2007. Using Semantic Roles to Improve Question Answering. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL'07, pages 12–21.

Jason Weston, Antoine Bordes, Sumit Chopra, and Tomas Mikolov. 2015. Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks. arXiv:1502.05698.

Wen-tau Yih, Ming-Wei Chang, Christopher Meek, and Andrzej Pastusiak. 2013. Question Answering Using Enhanced Lexical Semantic Models. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, ACL'13, pages 1744–1753.

Wen-tau Yih, Xiaodong He, and Christopher Meek. 2014. Semantic Parsing for Single-Relation Question Answering. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), ACL'14, pages 643–648.