Jori Mur
University of Groningen
Publication
Featured research published by Jori Mur.
international conference on computational linguistics | 2004
Valentin Jijkoun; Maarten de Rijke; Jori Mur
We investigate the impact of the precision/recall trade-off of information extraction on the performance of an offline corpus-based question answering (QA) system. One of our findings is that, because of the robust final answer selection mechanism of the QA system, recall is more important. We show that the recall of the extraction component can be improved using syntactic parsing instead of more common surface text patterns, substantially increasing the number of factoid questions answered by the QA system.
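The precision/recall trade-off mentioned in this abstract can be made concrete with a small sketch. The function name and the toy fact sets below are illustrative assumptions, not taken from the paper; the code only shows how the two measures are computed for an extraction component.

```python
from typing import Set, Tuple


def precision_recall(extracted: Set[str], gold: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) of an extracted fact set against a gold set."""
    correct = len(extracted & gold)
    # Precision: share of extracted facts that are correct.
    precision = correct / len(extracted) if extracted else 0.0
    # Recall: share of gold facts that were extracted.
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data: four extracted facts, three gold facts, two in common.
extracted_facts = {"a", "b", "c", "d"}
gold_facts = {"a", "b", "e"}
p, r = precision_recall(extracted_facts, gold_facts)
```

An extractor tuned for recall accepts more candidate facts, which typically lowers precision; the abstract's point is that a robust answer selection step downstream can tolerate that noise.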
cross language evaluation forum | 2005
Gosse Bouma; Jori Mur; Gertjan van Noord; Lonneke van der Plas; Jörg Tiedemann
Joost is a question answering system for Dutch which makes extensive use of dependency relations. It answers questions either by table look-up, or by searching for answers in paragraphs returned by IR. Syntactic similarity is used to identify and rank potential answers. Tables were constructed by mining the CLEF corpus, which has been syntactically analyzed in full.
international conference on computational linguistics | 2008
Jörg Tiedemann; Jori Mur
Passage retrieval is used in QA to filter large document collections in order to find text units relevant for answering given questions. In our QA system we apply standard IR techniques and index-time passaging in the retrieval component. In this paper we investigate several ways of dividing documents into passages. In particular we look at semantically motivated approaches (using coreference chains and discourse clues) compared with simple window-based techniques. We evaluate retrieval performance and the overall QA performance in order to study the impact of the different segmentation approaches. From our experiments we can conclude that the simple techniques using fixed-sized windows clearly outperform the semantically motivated approaches, which indicates that uniformity in size seems to be more important than semantic coherence in our setup.
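As a rough illustration of the window-based techniques this abstract refers to, the sketch below splits a document into fixed-size sentence windows. The function name, the parameters `size` and `step`, and the choice of sentences as the unit are assumptions made for this example; the paper's actual segmentation settings are not given here.

```python
from typing import List


def window_passages(sentences: List[str], size: int = 3, step: int = 3) -> List[List[str]]:
    """Split a document (a list of sentences) into fixed-size windows.

    step == size yields disjoint windows; step < size yields overlapping
    (sliding) windows.
    """
    if size < 1 or step < 1:
        raise ValueError("size and step must be positive")
    passages = []
    for start in range(0, len(sentences), step):
        passages.append(sentences[start:start + size])
        if start + size >= len(sentences):
            break  # the last window already reaches the end of the document
    return passages


doc = ["s1", "s2", "s3", "s4", "s5"]
disjoint = window_passages(doc, size=2, step=2)
sliding = window_passages(doc, size=3, step=1)
```

Each resulting passage would then be indexed as its own retrieval unit, which is what index-time passaging amounts to in this simplified view.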
cross language evaluation forum | 2008
Gosse Bouma; G. Kloosterman; Jori Mur; Gertjan van Noord; Lonneke van der Plas; Jörg Tiedemann
We describe our system for the monolingual Dutch and multilingual English to Dutch QA tasks. We describe the preprocessing of Wikipedia, inclusion of query expansion in IR, anaphora resolution in follow-up questions, and a question classification module for the multilingual task. Our best runs achieved 25.5% accuracy for the Dutch monolingual task, and 13.5% accuracy for the multilingual task.
Archive | 2010
Lonneke van der Plas; Gosse Bouma; Jori Mur; Chu-Ren Huang; Nicoletta Calzolari; Aldo Gangemi; Alessandro Lenci; Alessandro Oltramari; Laurent Prévot
Lexico-semantic knowledge is becoming increasingly important within the area of natural language processing, especially for applications such as Word Sense Disambiguation, Information Extraction and Question Answering (QA). Although the coverage of handmade resources, such as WordNet (Fellbaum, 1998), is in general impressive, coverage problems still exist for applications involving specific domains or languages other than English. We are interested in using lexico-semantic knowledge in an open-domain question answering system for Dutch. Obtaining such knowledge from existing resources is possible, but only to a certain extent. The most important resource for our research is the Dutch portion of EuroWordNet (Vossen, 1998); however, its size is only half that of the English WordNet. Therefore, many of the lexical items used in the QA task of the Cross Language Evaluation Forum (CLEF) for Dutch cannot be found in EuroWordNet. In addition, information regarding the classes to which named entities belong, e.g. Narvik IS-A harbour, has been shown to be useful for QA, but such information is typically absent from hand-built resources. For these reasons, we are interested in investigating methods which acquire lexico-semantic knowledge automatically from text corpora.
Theory and Applications of Natural Language Processing | 2011
Gosse Bouma; I. Fahmi; Jori Mur
One of the most accurate methods in Question Answering (QA) uses off-line information extraction to find answers for frequently asked questions. It requires automatic extraction from text of all relation instances for relations that users frequently ask about. In this chapter, two methods are presented for learning relation instances for relations relevant in an open-domain and a closed-domain (medical) QA system. Both methods try to automatically learn dependency paths that typically connect two arguments of a given relation. The first (lightly supervised) method starts from a seed list of argument instances, and extracts dependency paths from all sentences in which a seed pair occurs. This method works well for large text collections and for seeds which are easily identified, such as named entities, and is well-suited for open-domain QA. A second experiment concentrates on medical relation extraction for the question answering module of the IMIX system. The IMIX corpus is relatively small and relation instances may contain complex noun phrases that do not occur frequently in the exact same form in the corpus. In this case, learning from annotated data is necessary. Dependency patterns enriched with semantic concept labels are shown to give accurate results for relations that are relevant for a medical QA system. Both methods improve the performance of the Dutch QA system Joost.
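The lightly supervised, seed-based method described in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions: the input is assumed to be already parsed into hypothetical (argument, dependency path, argument) triples, and the names `learn_paths`, `extract_instances`, `min_count`, and the toy path labels are invented for this example rather than taken from the chapter.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (argument 1, dependency path, argument 2)
Pair = Tuple[str, str]


def learn_paths(triples: Iterable[Triple], seeds: Set[Pair], min_count: int = 2) -> Set[str]:
    """Collect dependency paths that connect known seed pairs often enough."""
    counts = Counter(path for a1, path, a2 in triples if (a1, a2) in seeds)
    return {path for path, count in counts.items() if count >= min_count}


def extract_instances(triples: Iterable[Triple], paths: Set[str]) -> Set[Pair]:
    """Apply the learned paths to find relation instances, including new ones."""
    return {(a1, a2) for a1, path, a2 in triples if path in paths}


# Hypothetical toy corpus for a capital-of relation.
corpus = [
    ("Paris", "subj<-be->obj:capital->of", "France"),
    ("Oslo", "subj<-be->obj:capital->of", "Norway"),
    ("Paris", "subj<-lie->in", "France"),
    ("Rome", "subj<-be->obj:capital->of", "Italy"),
]
seed_pairs = {("Paris", "France"), ("Oslo", "Norway")}
learned = learn_paths(corpus, seed_pairs)
found = extract_instances(corpus, learned)
```

In this toy run the path seen with two seed pairs is kept, the path seen only once is discarded, and applying the kept path yields one pair that was not among the seeds.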
cross language evaluation forum | 2006
Gosse Bouma; I. Fahmi; Jori Mur; Gertjan van Noord; Lonneke van der Plas; Jörg Tiedemann
We describe the system of the University of Groningen for the monolingual Dutch and multilingual English to Dutch QA tasks. First, we give a brief outline of the architecture of our QA system, which makes heavy use of syntactic information. Next, we describe the modules that were improved or developed especially for the CLEF tasks, including the incorporation of syntactic knowledge in IR, the incorporation of lexical equivalences and coreference resolution, and a baseline multilingual (English to Dutch) QA system, which uses a combination of Systran and Wikipedia (for term recognition and translation) for question translation. For non-list questions, 31% (20%) of the highest ranked answers returned by the monolingual (multilingual) system were correct.
Traitement Automatique des Langues (TAL) | 2005
Gosse Bouma; I. Fahmi; Jori Mur; Gerardus van Noord; M.L.E. van der Plas; Jörg Tiedemann
CLEF (Working Notes) | 2006
Jori Mur; M.L.E. van der Plas; Jörg Tiedemann
Archive | 1995
Jori Mur; Lonneke van der Plas