
Xabier Arregi

University of the Basque Country

Publication

Featured research published by Xabier Arregi.

Knowledge and Information Systems | 2015

Using knowledge-based relatedness for information retrieval

Arantxa Otegi; Xabier Arregi; Olatz Ansa; Eneko Agirre

Traditional information retrieval (IR) systems use keywords to index and retrieve documents. The limitations of keywords have been recognized since the early days, especially when different but closely related words are used in the query and the relevant document. Query expansion techniques such as pseudo-relevance feedback (PRF) and document clustering techniques rely on the target document set to bridge the gap between those words. This paper explores the use of knowledge-based semantic relatedness techniques to overcome the vocabulary mismatch between the query and documents, both in IR and in passage retrieval for question answering. We performed query expansion and document expansion using WordNet, with positive effects over a language modeling baseline on three datasets, and over PRF on two of those datasets. Our analysis shows that our models and PRF are complementary: PRF is better for easy queries, whereas our models are stronger for difficult queries. Our models also generalize better to other collections and are more robust to parameter adjustments. In addition, we show that our method has a positive impact on an end-to-end question answering system for Basque and that it can be readily applied to other knowledge bases, as our good results using Wikipedia show, paving the way for the use of other knowledge structures such as medical ontologies and linked data repositories.
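To make the idea of query expansion concrete, here is a minimal sketch. It is not the method from the paper: the `RELATED` table, its scores, and the `threshold` parameter are hypothetical stand-ins for relatedness values that would, in practice, be derived from a knowledge base such as WordNet.

```python
from collections import Counter

# Hypothetical relatedness table (an assumption for illustration only);
# a real system would compute these scores from a knowledge base.
RELATED = {
    "car": {"automobile": 0.9, "vehicle": 0.6},
    "doctor": {"physician": 0.9, "hospital": 0.5},
}


def expand_query(terms, related=RELATED, threshold=0.5):
    """Return a term -> weight map: original terms weigh 1.0,
    related terms weigh their relatedness score."""
    weights = {t: 1.0 for t in terms}
    for t in terms:
        for candidate, score in related.get(t, {}).items():
            if score >= threshold:
                # Never lower the weight of a term that is already present.
                weights[candidate] = max(weights.get(candidate, 0.0), score)
    return weights


def score_document(doc_tokens, weights):
    """Weighted term-frequency score of a tokenized document."""
    counts = Counter(doc_tokens)
    return sum(w * counts[t] for t, w in weights.items())
```

With this sketch, a document containing only "automobile" receives a non-zero score for the query "car", which plain keyword matching would miss.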

International Conference on Computational Linguistics | 2000

A word-grammar based morphological analyzer for agglutinative languages

Itziar Aduriz; Eneko Agirre; Izaskun Aldezabal; Iñaki Alegria; Xabier Arregi; Jose Mari Arriola; Xabier Artola; Koldo Gojenola; A. Maritxalar; Kepa Sarasola; Miriam Urkia

Agglutinative languages have rich morphology, and some applications require deep analysis at the word level. The work presented here proposes a model for designing a full morphological analyzer. The model integrates the two-level formalism and a unification-based formalism. In contrast to other work, we propose separating the treatment of sequential and non-sequential morphotactic constraints. Sequential constraints are applied in the segmentation phase, and non-sequential ones in the final feature-combination phase. Early application of sequential morphotactic constraints during the segmentation process makes an efficient implementation of the full morphological analyzer feasible. The result of this research is the design and implementation of a full morphosyntactic analysis procedure for each word in unrestricted Basque texts.

Conference of the European Chapter of the Association for Computational Linguistics | 1993

A morphological analysis based method for spelling correction

Itziar Aduriz; Eneko Agirre; Iñaki Alegria; Xabier Arregi; Jose Mari Arriola; Xabier Artola; A. Díaz de Ilarraza; Nerea Ezeiza; Montse Maritxalar; Kepa Sarasola; Miriam Urkia

Xuxen is a spelling checker/corrector for Basque which is going to be commercialized next year. The checker recognizes a word-form if a correct morphological breakdown is allowed. The morphological analysis is based on two-level morphology. The correction method distinguishes between orthographic errors and typographical errors:
• Typographical errors (or mistypings) are non-cognitive errors which do not follow linguistic criteria.
• Orthographic errors are cognitive errors which occur when the writer does not know or has forgotten the correct spelling of a word. They are more persistent because of their cognitive nature, they leave a worse impression and, finally, their treatment is an interesting application for language standardization purposes.
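As a rough illustration of how typographical errors might be handled, the sketch below generates all single-edit variants of a misspelled word and keeps those accepted by a validity check. This is a generic technique, not Xuxen's actual algorithm: the `is_valid` callback is a hypothetical stand-in for the morphological analyzer, and the ASCII alphabet is an assumption.

```python
import string


def single_edit_candidates(word, alphabet=string.ascii_lowercase):
    """All strings one deletion, transposition, replacement,
    or insertion away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in alphabet}
    inserts = {a + c + b for a, b in splits for c in alphabet}
    return (deletes | transposes | replaces | inserts) - {word}


def correct(word, is_valid):
    """Return [word] if it is accepted; otherwise the sorted list of
    accepted single-edit candidates (possibly empty)."""
    if is_valid(word):
        return [word]
    return sorted(c for c in single_edit_candidates(word) if is_valid(c))
```

For example, with a toy lexicon `{"etxea", "kalea"}` used as the validity check, `correct("etxae", lexicon.__contains__)` proposes `"etxea"`. Orthographic (cognitive) errors would need language-specific rules that this sketch does not cover.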

Literary and Linguistic Computing | 2001

An Assistant Tool for Verse-Making in Basque Based on Two-Level Morphology

Bertol Arrieta; Iñaki Alegria; Xabier Arregi

In this paper we present a specialized word generator designed as an assistant tool for Basque troubadours. The tool allows verse-writers to generate all the words that match a given word ending. We deal with some interesting aspects, namely the size of the generated list and the need to establish an order of relevance among the listed items. This work can be seen as a way of reusing computational linguistic tools in the context of Basque cultural means of expression. The technical foundations of the tool lie in a two-level morphological processor. The way in which words must be generated (starting from the end of the word) leads us to invert the generation process.
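The idea of generating words from their ending can be illustrated with a much simpler device than a two-level morphological processor: a lexicon indexed by reversed word forms. The sketch below is only an analogy under that assumption; the function names and the sample word list are hypothetical, and it looks words up rather than generating them morphologically.

```python
from bisect import bisect_left


def build_reverse_index(words):
    """Sorted list of reversed word forms, so that words sharing an
    ending become neighbours sharing a prefix."""
    return sorted(w[::-1] for w in set(words))


def words_ending_with(index, ending):
    """Return all indexed words that end with `ending`."""
    key = ending[::-1]
    start = bisect_left(index, key)
    matches = []
    for reversed_word in index[start:]:
        if not reversed_word.startswith(key):
            break  # sorted order: no later entry can match
        matches.append(reversed_word[::-1])
    return matches
```

Given the sample words `["etxea", "kalea", "mendia", "etxe"]`, asking for the ending `"ea"` returns `"kalea"` and `"etxea"`. Ranking the results by relevance, which the paper identifies as a key issue, is left out of this sketch.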

Cross-Language Evaluation Forum | 2009

Elhuyar-IXA: semantic relatedness and cross-lingual passage retrieval

Eneko Agirre; Olatz Ansa; Xabier Arregi; Maddalen Lopez de Lacalle; Arantxa Otegi; Xabier Saralegi; Hugo Zaragoza

This article describes the participation of the joint Elhuyar-IXA group in the ResPubliQA exercise at QA@CLEF. In particular, we participated in the English-English monolingual task and in the Basque-English crosslingual one. Our focus has been threefold: (1) to check to what extent information retrieval (IR) can achieve good results in passage retrieval without question analysis and answer validation, (2) to check Machine Readable Dictionary (MRD) techniques for Basque-to-English retrieval given the lack of parallel corpora for Basque in this domain, and (3) to check the contribution of semantic relatedness based on WordNet when expanding the passages with related words. Our results show that IR provides good results in the monolingual task, that our crosslingual system performs worse than the monolingual runs, and that semantic relatedness improves the results in both tasks (by 6 and 2 points, respectively).

Natural Language Engineering | 1996

Constructing an intelligent dictionary help system

Eneko Agirre; Xabier Arregi; Xabier Artola; A. Díaz de Ilarraza; Kepa Sarasola; Aitor Soroa

This paper discusses different issues in the construction and knowledge representation of an intelligent dictionary help system. The Intelligent Dictionary Help System (IDHS) is conceived as a monolingual (explanatory) dictionary system for human use (Artola and Evrard, 1992). The fact that it is intended for people instead of automatic processing distinguishes it from other systems dealing with the acquisition of semantic knowledge from conventional dictionaries. The system provides various ways of accessing the data, allowing implicit knowledge to be deduced from the explicit dictionary information. IDHS deals with reasoning mechanisms analogous to those used by humans when they consult a dictionary. The user-level functionality of the system has been specified and a prototype has been implemented (Agirre et al., 1994a). A methodology for the extraction of semantic knowledge from a conventional dictionary is described. The method followed in the construction of the phrasal pattern hierarchies required by the parser (Alshawi, 1989) is based on an empirical study carried out on the structure of definition sentences. The results of its application to a real dictionary have shown that the parsing method is particularly suited to the analysis of short definition sentences, as was the case with the source dictionary. As a result of this process, the characterization of the different lexical-semantic relations between senses is established by means of semantic rules (attached to the patterns); these rules are used for the initial construction of the Dictionary Knowledge Base (DKB). The representation schema proposed for the DKB (Agirre et al., 1994b) is basically a semantic network of frames representing word senses. After construction of the initial DKB, several enrichment processes are performed on the DKB to add new facts to it; these processes are based on the exploitation of the properties of lexical-semantic relations, and also on specially conceived deduction mechanisms. The results of the enrichment processes show the suitability of the chosen representation schema for deducing implicit knowledge. Erroneous deductions are mainly due to incorrect word sense disambiguation.

Natural Language Engineering | 1999

MLDS: A translator-oriented MultiLingual dictionary system

Eneko Agirre; Xabier Arregi; Xabier Artola; A. Díaz de Ilarraza; Kepa Sarasola; Aitor Soroa

This paper focuses on the design methodology of the MultiLingual Dictionary-System (MLDS), a human-oriented tool that assists translators in translating lexical units and was conceived from studies carried out with translators. We describe the model adopted for the representation of multilingual dictionary-knowledge. Such a model allows an enriched exploitation of the lexical-semantic relations extracted from dictionaries. In addition, MLDS is supplied with knowledge about the use of the dictionaries in the process of lexical translation, which was elicited by means of empirical methods and specified in a formal language. The dictionary-knowledge, together with the task-oriented knowledge, is used to offer the translator active, anticipatory and intelligent assistance.

Language and Technology Conference | 2009

Valuable language resources and applications supporting the use of Basque

Iñaki Alegria; Maxux J. Aranzabe; Xabier Arregi; Xabier Artola; Arantza Díaz de Ilarraza; Aingeru Mayor; Kepa Sarasola

We present some Language Technology applications and resources that have proven to be valuable tools for promoting the use of Basque, a low-density language. We also present the strategy we have followed for almost twenty years to develop those tools and derived applications on top of an integrated environment of language resources, language tools and other applications. In our opinion, if Basque is now in a fairly good position in Language Technology, it is because those guidelines have been followed.

Flexible Query Answering Systems | 2006

An XML framework for a Basque question answering system

Olatz Ansa; Xabier Arregi; Arantxa Otegi; Andoni Valverde

This paper presents a general platform for a Basque monolingual question answering (QA) system. It focuses on the architecture of the platform, paying special attention to: 1) the integration of the development and evaluation environments, and 2) the systematic use of XML declarative files to control the execution of the modules and the communication between them. Moreover, a first pilot experiment is discussed.

Proceedings of the 2nd Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2017) | 2017

Enriching Basque Coreference Resolution System using Semantic Knowledge sources

Ander Soraluze; Olatz Arregi; Xabier Arregi; Arantza Díaz de Ilarraza

In this paper we present a Basque coreference resolution system enriched with semantic knowledge. An error analysis revealed the system's deficiencies in resolving coreference cases in which semantic or world knowledge is needed. We attempt to address these deficiencies using two semantic knowledge sources, specifically Wikipedia and WordNet.
