Multi-Field Structural Decomposition for Question Answering
Tomasz Jurczyk
Mathematics and Computer Science, Emory University, Atlanta, GA 30322, USA
[email protected]
Jinho D. Choi
Mathematics and Computer Science, Emory University, Atlanta, GA 30322, USA
[email protected]
Abstract
This paper presents a precursory yet novel approach to the question answering task using structural decomposition. Our system first generates linguistic structures such as syntactic and semantic trees from text, decomposes them into multiple fields, then indexes the terms in each field. For each question, it decomposes the question into multiple fields, measures the relevance score of each field to the indexed ones, then ranks all documents by their relevance scores and the weights associated with the fields, where the weights are learned through statistical modeling. Our final model gives an absolute improvement of over 40% over the baseline approach, which uses simple search, for detecting documents containing answers.
1 Introduction

Towards machine reading, question answering has recently gained much interest among researchers from both natural language processing (Moschitti and Quarteroni, 2011; Yih et al., 2013; Hixon et al., 2015) and information retrieval (Schiffman et al., 2007; Kolomiyets and Moens, 2011). Researchers from these two fields, NLP and IR, have shown tremendous progress on question answering, yet only a few efforts have been made to adapt technologies from both sides. The NLP side often tackles the task by analyzing linguistic aspects, whereas the IR side tackles it by searching likely patterns.

While these two approaches perform well individually, more sophisticated solutions are needed to handle a wide range of questions. By considering linguistic structures such as syntactic and semantic trees, QA systems can infer deeper meaning of the context and handle more complex questions. However, extracting answers from these structures through either graph matching or predicate logic is not necessarily scalable when the size of the context is large. On the other hand, searching patterns is scalable for large data, especially when coupled with indexing, although it does not always concern the actual meaning of the context.

We present a multi-field weighted indexing approach for question answering that combines good aspects of both NLP and IR. We begin by describing how linguistic structures are decomposed into multiple fields (Section 3.3), and explain how the decomposed fields are used to rank documents containing answers through statistical learning (Sections 3.4 and 3.5). We evaluate our approach on 8 types of questions; our final model shows significant improvement over the baseline model using simple search (Section 4).
2 Related Work

Shen and Lapata (2007) assessed the contribution of semantic roles to factoid question answering and showed promising results. Pizzato and Mollá (2008) proposed a question prediction language model providing rich information and achieved improved speed and accuracy. Although related, our work is distinguished from theirs because we consider multiple fields whereas they consider only one field representing semantic roles. Ferrucci et al. (2010) presented IBM Watson, taking a hybrid approach between NLP and IR, and advanced the question answering task to another level.

Fader et al. (2013) proposed a paraphrase-driven perceptron learning approach using a seed lexicon. Our learning process is similar; however, it is distinguished in that we learn weights for individual fields instead of lexicons. Yih et al. (2014) introduced a semantic parsing framework for open domain question answering, which used convolutional neural networks for measuring similarities between decomposed entities. Weston et al. (2015) presented the Memory Networks models, designed to memorize information about known objects and actors. Our work is related to theirs; however, memory networks are designed to store and manipulate information about specific types of objects, while our framework is generalizable to any type of objects induced from the context.

Figure 1: The overall framework of our question answering system.
3 Approach

3.1 Overall Framework

Figure 1 shows the overall framework. Our system is designed in a modular way, so any further extension of fields can be easily integrated. The system takes input documents, generates linguistic structures using NLP tools, decomposes them into multiple fields, and indexes those fields. Questions are processed in the same way. To answer a question, the system queries the index for each field extracted from the question and measures the relevance score. All documents are ranked with respect to the relevance scores and the weights associated with the fields, and the document with the highest score is selected as the answer.
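As a rough end-to-end summary, the indexing and querying flows in Figure 1 can be sketched as follows. This is a minimal sketch rather than our implementation: the parameters parse, extract_fields, add_to_index, relevance, and doc_ids are hypothetical stand-ins for the modules described in Section 3.2.

```python
from typing import Callable, Dict, List

Fields = Dict[str, List[str]]  # field name -> list of indexed terms

def index_documents(docs: List[str],
                    parse: Callable[[str], object],
                    extract_fields: Callable[[object], Fields],
                    add_to_index: Callable[[int, Fields], None]) -> None:
    """Indexing flow: generate linguistic structures, decompose, index."""
    for doc_id, text in enumerate(docs):
        add_to_index(doc_id, extract_fields(parse(text)))

def answer(question: str,
           parse: Callable[[str], object],
           extract_fields: Callable[[object], Fields],
           relevance: Callable[[str, List[str], int], float],
           doc_ids: List[int],
           weights: Dict[str, float]) -> int:
    """Querying flow: rank documents by weighted per-field relevance."""
    q_fields = extract_fields(parse(question))
    def score(d: int) -> float:
        return sum(weights.get(f, 1.0) * relevance(f, terms, d)
                   for f, terms in q_fields.items())
    return max(doc_ids, key=score)  # document with the highest overall score
```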
3.2 Modules

Our system consists of several modules that are closely connected and together provide a fully working solution to the answer selection task in question answering.
Documents provide the context from which questions find their answers. Each document can contain one or more sentences, in which the answers to upcoming questions are annotated for training. Documents may simply be Wikipedia articles, news articles, fictional stories, etc. Questions are treated as regular documents containing only one sentence.
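As an illustration only, the document and question representations described above could be modeled as below; the class and attribute names are our own, not part of the system.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Document:
    """A context document; one or more sentences."""
    doc_id: int
    sentences: List[str]

@dataclass
class Question:
    """A question is a regular one-sentence document; for training, the
    document containing the answer is annotated as the oracle."""
    text: str
    answer: Optional[str] = None          # gold answer string (training only)
    oracle_doc_id: Optional[int] = None   # id of the document containing it
```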
For the generation of syntactic and semantic structures, we used the part-of-speech tagger (Choi and Palmer, 2012), the dependency parser (Choi and McCallum, 2013), the semantic role labeler (Choi and Palmer, 2011), and the coreference resolution tool in ClearNLP. Ensuring good and robust accuracy for these NLP tools is important because all the following modules depend on their output.

The field extractor takes the linguistic structures from the NLP tools and decomposes them into multiple fields (Section 3.3). All fields extracted from the documents are passed to the index engine, whereas fields extracted from the questions are sent directly to the answer ranker module.
The index engine is a search server that receives a list of fields decomposed by the field extractor, indexes the terms in those fields, and responds to the queries generated from questions with relevance scores. We used Elastic Search, as it provides a distributed, multi-tenancy-capable search (we set the Elastic Search results limit to 20).

The answer ranker takes the decomposed fields extracted from a question, converts them into queries, and builds a matrix of documents with their relevance scores across all fields through the index engine (Section 3.4). It also uses different weights for individual fields, trained by statistical modeling (Section 3.5).
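A minimal sketch of this interaction, assuming a local Elastic Search node reachable through the official Python client (pre-8.x body-style API); the index name "sentences" and the space-joined term encoding are our assumptions, not the system's actual schema.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch()  # assumes a node at the default http://localhost:9200

def index_sentence(doc_id: int, fields: dict) -> None:
    # One Elastic Search document per sentence; each field holds its
    # space-joined terms, e.g. {"syntax": "julie_nsubj is_root ..."}.
    es.index(index="sentences", id=doc_id, body=fields)

def field_relevance(field_name: str, query_terms: str) -> dict:
    # Relevance scores of documents for one field of the question;
    # the result list is capped at 20, mirroring our setup.
    res = es.search(index="sentences", size=20,
                    body={"query": {"match": {field_name: query_terms}}})
    return {hit["_id"]: hit["_score"] for hit in res["hits"]["hits"]}
```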
3.3 Structural Decomposition

Figure 2: The flow of the sentence "Julie is either in the school or the cinema" through our system.

Each sentence is represented by the index engine as a document with multiple fields grouped into categories. Figure 2 shows an example of how the sentence is decomposed into multiple fields consisting of syntactic and semantic structures: lexical terms such as {julie, is, either, in, the, school, ...}, syntactic terms such as {julie_nsubj, is_root, either_preconj, ...}, and semantic terms such as {julie_A1_is, school_A2_is, cinema_A2_is}. Due to the extensible nature of our field extractor, additional groups and fields can be easily integrated. Currently, our system supports 24 fields grouped into the following three categories; a sketch of the decomposition follows the list:

• Lexical fields (e.g., word-forms, lemmas).
• Syntactic fields (e.g., dependency labels).
• Semantic fields (e.g., semantic roles, distances between predicates).
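To make the decomposition concrete, the sketch below hand-codes annotations for the Figure 2 sentence (standing in for the ClearNLP output) and derives one representative field per category. The term shapes follow Figure 2; the dependency labels and semantic roles not shown in the figure are illustrative.

```python
# Hand-coded stand-ins for the NLP tools' output on the Figure 2 sentence.
tokens = ["julie", "is", "either", "in", "the", "school", "or", "the", "cinema"]
deps = [("julie", "nsubj"), ("is", "root"), ("either", "preconj"),
        ("in", "prep"), ("school", "pobj"), ("cinema", "conj")]
srl = [("julie", "A1", "is"), ("school", "A2", "is"), ("cinema", "A2", "is")]

fields = {
    "lexical": tokens,                                      # word-forms
    "syntax": [f"{w}_{label}" for w, label in deps],        # word + dependency label
    "semantics": [f"{a}_{role}_{p}" for a, role, p in srl]  # argument + role + predicate
}
print(fields["semantics"])  # ['julie_A1_is', 'school_A2_is', 'cinema_A2_is']
```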
3.4 Document Ranking

When a question q is asked, it is decomposed into n fields. Each field is transformed into a query where certain words are replaced with wildcards (e.g., {where_a1, is_pred, she_a2} → {*_a1, is_pred, she_a2}). Then, the relevance score r is measured between each field in the question and the same field in each document d_t ∈ D by the index engine. The products of the relevance scores and the individual field weights are summed, and the document d̂ with the highest overall score f is taken as the answer. Note that in our dataset, each document contains only one sentence, so retrieving a document is equivalent to retrieving a sentence. The following equations describe how the document d̂ is selected by measuring the overall score f(q, d_t) from the relevance scores r(q_i, d_{t_i}) and the weights λ_i:

\hat{d} = \arg\max_{d_t \in D} f(q, d_t)

f(q, d_t) = \sum_{i=1}^{n} \lambda_i \cdot r(q_i, d_{t_i})

r(q_i, d_{t_i}) = \sum_{v \in q_i \cap d_{t_i}} \mathrm{tf}_{t_i}(v) \cdot \mathrm{idf}_i(v) \cdot \mathrm{norm}_{t_i}(v)
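The ranking above can be sketched as follows; the tf · idf · norm computation here is a simplified stand-in for the index engine's actual relevance function, and the weights λ default to 1 before they are learned (Section 3.5).

```python
import math
from collections import Counter
from typing import Dict, List

Fields = Dict[str, List[str]]  # field name -> terms

def relevance(q_terms: List[str], d_terms: List[str],
              df: Counter, n_docs: int) -> float:
    """r(q_i, d_ti): tf * idf * norm summed over terms shared by both fields."""
    tf = Counter(d_terms)
    norm = 1.0 / math.sqrt(len(d_terms)) if d_terms else 0.0
    return sum(tf[v] * math.log(n_docs / (1.0 + df[v])) * norm
               for v in set(q_terms) & set(d_terms))

def overall_score(q: Fields, d: Fields, weights: Dict[str, float],
                  df: Dict[str, Counter], n_docs: int) -> float:
    """f(q, d_t) = sum_i lambda_i * r(q_i, d_ti)."""
    return sum(weights.get(i, 1.0) * relevance(q[i], d.get(i, []), df[i], n_docs)
               for i in q)

def best_document(q: Fields, docs: Dict[int, Fields], weights: Dict[str, float],
                  df: Dict[str, Counter], n_docs: int) -> int:
    """d_hat = argmax over d_t in D of f(q, d_t)."""
    return max(docs, key=lambda t: overall_score(q, docs[t], weights, df, n_docs))
```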
3.5 Weight Learning

Algorithm 1 shows how the weights for all fields are learned during training. We adapt the averaged perceptron algorithm, which has been widely used for many NLP tasks. All weights λ are initialized to 1. For each question q ∈ Q, the algorithm predicts the document d̂ that most likely contains the answer. If d̂ is incorrect, it compares the relevance score r between (q, d̂) and (q, d) for each field, and updates the weights accordingly, where d is the true document from the oracle. This procedure is repeated through multiple iterations. Finally, the algorithm returns the averaged weights, where each dimension represents the weight of one field.

Algorithm 1: Averaged perceptron training.
Input:  D: document set, Q: question set,
        M: max number of iterations, α: learning rate.
Output: the averaged weight vector.
 1: λ ← λ′ ← 1
 2: for iter ∈ [1, M] do
 3:     foreach q ∈ Q do
 4:         d̂ = argmax_{d_t ∈ D} f(q, d_t)
 5:         if d̂ ≠ d then                  ▷ d is the oracle
 6:             foreach i ∈ [1, n] do
 7:                 δ ← α · sign[r(q_i, d_i) − r(q_i, d̂_i)]
 8:                 λ_i ← λ_i + δ
 9:         λ′ ← λ′ + λ
10: return λ′ / (M · |Q|)

              Lexical                    Lexical + Syntax           Lexical + Syntax + Semantics
Type          λ = 1        λ learned     λ = 1        λ learned     λ = 1        λ learned
              MAP / MRR    MAP / MRR     MAP / MRR    MAP / MRR     MAP / MRR    MAP / MRR
1 (qa1)       39.62/61.73  39.62/61.73   29.90/48.05  40.50/61.47   72.60/85.07  …
Avg.          44.45/61.25  44.63/61.37   45.16/60.34  48.41/63.76   59.60/73.70  …

Table 1: Results from our question-answering system on 8 types of questions in the bAbI tasks.

All hyper-parameters were optimized on the development sets and evaluated on the test sets. For our experiments, we set the maximum number of iterations to M = 40; the learning rate α was likewise tuned on the development sets.
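To make Algorithm 1 concrete, here is a minimal sketch in Python. It reuses a relevance function of the shape defined in Section 3.4 and represents each training question as a pair of its decomposed fields and the oracle document id; the default α value is a placeholder, not the tuned one.

```python
from typing import Callable, Dict, List, Tuple

Fields = Dict[str, List[str]]

def sign(x: float) -> int:
    return (x > 0) - (x < 0)

def train_weights(questions: List[Tuple[Fields, int]],
                  docs: Dict[int, Fields],
                  relevance: Callable[[List[str], List[str]], float],
                  field_names: List[str],
                  M: int = 40, alpha: float = 0.1) -> Dict[str, float]:
    """Averaged perceptron of Algorithm 1 (alpha default is a placeholder)."""
    def r(q: Fields, d: Fields, f: str) -> float:
        return relevance(q.get(f, []), d.get(f, []))

    lam = {f: 1.0 for f in field_names}       # line 1: weights initialized to 1
    lam_avg = {f: 1.0 for f in field_names}
    for _ in range(M):                        # line 2
        for q, gold in questions:             # line 3
            pred = max(docs, key=lambda t: sum(   # line 4: d_hat = argmax f(q, d_t)
                lam[f] * r(q, docs[t], f) for f in field_names))
            if pred != gold:                  # line 5: gold is the oracle d
                for f in field_names:         # lines 6-8: step toward the oracle
                    lam[f] += alpha * sign(r(q, docs[gold], f) - r(q, docs[pred], f))
            lam_avg = {f: lam_avg[f] + lam[f] for f in field_names}  # line 9
    return {f: lam_avg[f] / (M * len(questions)) for f in field_names}  # line 10
```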
4 Experiments

Our approach is evaluated on a subset of the bAbI tasks (Weston et al., 2015). The original data contains 20 tasks, where each task represents a different kind of question answering challenge. We select 8 tasks, in which the answer to a single question is located within a single sentence. For consistency and replicability, we follow the same training, development, and evaluation splits as provided, where every set contains 1,000 questions.

For the evaluation metrics, we use mean average precision (MAP) and mean reciprocal rank (MRR) over the top-3 predictions. The mean average precision is measured by counting the questions for which the sentences containing the answers are correctly selected as the best predictions. The reciprocal rank of a query response is the multiplicative inverse of the rank of the first correct answer; mean reciprocal rank is the average of the reciprocal ranks over all question queries.
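A small sketch of the two metrics as used here, assuming each question yields a ranked list of predicted document ids and a single gold id; under the description above, MAP reduces to the fraction of questions whose best prediction is correct.

```python
from typing import List, Tuple

def map_mrr(predictions: List[Tuple[List[int], int]], k: int = 3) -> Tuple[float, float]:
    """MAP and MRR over the top-k predictions; each item pairs a ranked
    list of document ids with the gold document id."""
    ap_sum, rr_sum = 0.0, 0.0
    for ranked, gold in predictions:
        top_k = ranked[:k]
        if top_k and top_k[0] == gold:
            ap_sum += 1.0                            # best prediction is correct
        if gold in top_k:
            rr_sum += 1.0 / (top_k.index(gold) + 1)  # inverse rank of first hit
    n = len(predictions)
    return ap_sum / n, rr_sum / n

# e.g. map_mrr([([2, 5, 7], 2), ([4, 1, 3], 1)]) -> (0.5, 0.75)
```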
Table 1 shows the results from our system on different types of questions. MAP and MRR show a clear correlation with the number of active fields. For the majority of tasks, using only the lexical fields does not perform well: the fictional stories included in this data often contain multiple occurrences of the same lexicons, and the lexical fields alone are not able to select the correct answer. The significantly lower accuracy on the last task is due to the fact that, although each answer is located within a single sentence, multiple passages must be considered for a single question in order to correctly locate the sentence with the answer. Lexical fields coupled with only syntactic fields do not perform much better. This may be because the syntactic fields, containing ordinary dependency labels, do not provide sufficient context-wise information, so they do not generate enough features for statistical learning to capture specific characteristics of the context. A significant improvement is reached, however, when the semantic fields are added, as they provide a deeper understanding of the context.

Note that this data set has also been used for evaluating the Memory Networks approach to question answering (Weston et al., 2015). The authors achieved high accuracy, reaching 100% on several tasks; however, our work still finds its own value because our approach is completely data-driven, such that it can be easily adapted or extended to other types of questions. As a matter of fact, we use the same system for all tasks with different trained models, yet are still able to achieve high accuracy for most tasks we evaluate on.
5 Conclusion

This paper presents a multi-field weighted indexing approach for question answering. Our system decomposes linguistic structures into multiple fields, indexes the terms of individual fields, and retrieves the documents containing the answers with respect to per-field weighted relevance scores. We observe significant improvement as we add more semantic fields and apply averaged perceptron learning to statistically designate weights for the fields. In the future, we plan to extend our work by integrating additional layers of fields (e.g., Freebase, WordNet). Furthermore, we plan to improve our NLP tools to enable an even deeper understanding of the context for more complex question answering.
References
Jinho D. Choi and Andrew McCallum. 2013. Transition-based Dependency Parsing with Selectional Branching. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL'13, pages 1052–1062, Sofia, Bulgaria, August.

Jinho D. Choi and Martha Palmer. 2011. Transition-based Semantic Role Labeling Using Predicate Argument Clustering. In Proceedings of the ACL Workshop on Relational Models of Semantics, RELMS'11, pages 37–45.

Jinho D. Choi and Martha Palmer. 2012. Fast and Robust Part-of-Speech Tagging Using Dynamic Model Selection. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), ACL'12, pages 363–367, Jeju Island, Korea, July.

Anthony Fader, Luke Zettlemoyer, and Oren Etzioni. 2013. Paraphrase-Driven Learning for Open Question Answering. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, ACL'13, pages 1608–1618.

David A. Ferrucci, Eric W. Brown, Jennifer Chu-Carroll, James Fan, David Gondek, Aditya Kalyanpur, Adam Lally, J. William Murdock, Eric Nyberg, John M. Prager, Nico Schlaefer, and Christopher A. Welty. 2010. Building Watson: An Overview of the DeepQA Project. AI Magazine, 31(3):59–79.

Ben Hixon, Peter Clark, and Hannaneh Hajishirzi. 2015. Learning Knowledge Graphs for Question Answering through Conversational Dialog. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL'15, pages 851–861.

Oleksandr Kolomiyets and Marie-Francine Moens. 2011. A Survey on Question Answering Technology from an Information Retrieval Perspective. Information Sciences, 181(24):5412–5434.

Alessandro Moschitti and Silvia Quarteroni. 2011. Linguistic Kernels for Answer Re-ranking in Question Answering Systems. Information Processing and Management, 47(6):825–842.

Luiz Augusto Pizzato and Diego Mollá. 2008. Indexing on Semantic Roles for Question Answering. In Proceedings of the 2nd Workshop on Information Retrieval for Question Answering, IR4QA'08, pages 74–81.

Barry Schiffman, Kathleen McKeown, Ralph Grishman, and James Allan. 2007. Question Answering Using Integrated Information Retrieval and Information Extraction. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics, NAACL'07, pages 532–539.

Dan Shen and Mirella Lapata. 2007. Using Semantic Roles to Improve Question Answering. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL'07, pages 12–21.

Jason Weston, Antoine Bordes, Sumit Chopra, and Tomas Mikolov. 2015. Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks. arXiv:1502.05698.

Wen-tau Yih, Ming-Wei Chang, Christopher Meek, and Andrzej Pastusiak. 2013. Question Answering Using Enhanced Lexical Semantic Models. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, ACL'13, pages 1744–1753.

Wen-tau Yih, Xiaodong He, and Christopher Meek. 2014. Semantic Parsing for Single-Relation Question Answering. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), ACL'14, pages 643–648.