Iñaki Alegria
University of the Basque Country
Publications
Featured research published by Iñaki Alegria.
Cross Language Evaluation Forum | 2008
Pamela Forner; Anselmo Peñas; Eneko Agirre; Iñaki Alegria; Corina Forăscu; Nicolas Moreau; Petya Osenova; Prokopis Prokopidis; Paulo Rocha; Bogdan Sacaleanu; Richard F. E. Sutcliffe; Erik F. Tjong Kim Sang
The general aim of the third CLEF Multilingual Question Answering Track was to set up a common and replicable evaluation framework to test both monolingual and cross-language Question Answering (QA) systems that process queries and documents in several European languages. Nine target languages and ten source languages were exploited to enact 8 monolingual and 73 cross-language tasks. Twenty-four groups participated in the exercise. Overall results showed a general increase in performance in comparison to last year. The best performing monolingual system irrespective of target language answered 64.5% of the questions correctly (in the monolingual Portuguese task), while the average of the best performances for each target language was 42.6%. The cross-language step instead entailed a considerable drop in performance. In addition to accuracy, the organisers also measured the relation between the correctness of an answer and a system’s stated confidence in it, showing that the best systems did not always provide the most reliable confidence score. We provide an overview of the 2005 QA track, detail the procedure followed to build the test sets and present a general analysis of the results.
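The two measures discussed above, plain accuracy and a score relating correctness to self-reported confidence, can be sketched as follows. This is an illustrative reimplementation in the spirit of the CLEF confidence-weighted evaluation, with invented toy data, not the track's official scorer.

```python
# Illustrative only: accuracy and a confidence-weighted score that
# rewards systems whose stated confidence is reliable.

def accuracy(judgements):
    """Fraction of correctly answered questions (1 = right, 0 = wrong)."""
    return sum(judgements) / len(judgements)

def confidence_weighted_score(judgements, confidences):
    """Sort answers by decreasing self-reported confidence, then average
    the running precision over the ranking: correct answers placed at
    high confidence count more than correct answers at low confidence."""
    ranked = [j for _, j in sorted(zip(confidences, judgements),
                                   key=lambda pair: -pair[0])]
    correct_so_far, total = 0, 0.0
    for i, j in enumerate(ranked, start=1):
        correct_so_far += j
        total += correct_so_far / i
    return total / len(ranked)
```

With this score, two systems with identical accuracy differ if one concentrates its correct answers among its most confident predictions.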
Cross Language Evaluation Forum | 2009
Anselmo Peñas; Pamela Forner; Richard F. E. Sutcliffe; Álvaro Rodrigo; Corina Forăscu; Iñaki Alegria; Danilo Giampiccolo; Nicolas Moreau; Petya Osenova
This paper describes the first round of ResPubliQA, a Question Answering (QA) evaluation task over European legislation, proposed at the Cross Language Evaluation Forum (CLEF) 2009. The exercise consists of extracting a relevant paragraph of text that completely satisfies the information need expressed by a natural language question. The general goals of this exercise are (i) to study whether current QA technologies tuned for newswire collections and Wikipedia can be adapted to a new domain (law in this case); (ii) to move to a more realistic scenario, considering people close to the law as users, and paragraphs as system output; (iii) to compare current QA technologies with pure Information Retrieval (IR) approaches; and (iv) to introduce into QA systems the Answer Validation technologies developed in the past three years. The paper describes the task in more detail, presenting the different types of questions, the methodology for the creation of the test sets and the new evaluation measure, and analyzing the results obtained by systems and the most successful approaches. Eleven groups participated with 28 runs. In addition, we evaluated 16 baseline runs (2 per language) based only on a pure IR approach, for comparison purposes. Considering accuracy, scores were generally higher than in previous QA campaigns.
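The pure-IR comparison point mentioned in goal (iii) amounts to ranking paragraphs by lexical similarity with the question. Below is a minimal TF-IDF sketch of that kind of baseline, with toy tokenization and data; it is not the actual ResPubliQA baseline.

```python
# A minimal pure-IR paragraph ranker: score each paragraph by TF-IDF
# cosine similarity with the question and return the best one's index.
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def rank_paragraphs(question, paragraphs):
    docs = [tokenize(p) for p in paragraphs]
    n = len(docs)
    # Document frequency of each term over the paragraph collection.
    df = Counter(term for d in docs for term in set(d))

    def vec(tokens):
        tf = Counter(tokens)
        return {t: tf[t] * math.log((1 + n) / (1 + df[t])) for t in tf}

    def cosine(a, b):
        dot = sum(a[t] * b.get(t, 0.0) for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = vec(tokenize(question))
    scores = [cosine(q, vec(d)) for d in docs]
    return max(range(n), key=lambda i: scores[i])
```

A QA system must beat this kind of baseline to justify its extra linguistic machinery, which is exactly the comparison the task set up.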
MWE '04 Proceedings of the Workshop on Multiword Expressions: Integrating Processing | 2004
Iñaki Alegria; Olatz Ansa; Xabier Artola; Nerea Ezeiza; Koldo Gojenola; Ruben Urizar
This paper describes the representation of Basque Multiword Lexical Units and the automatic processing of Multiword Expressions. After discussing and stating which kind of multiword expressions we consider to be processed at the current stage of the work, we present the representation schema of the corresponding lexical units in a general-purpose lexical database. Due to its expressive power, the schema can deal not only with fixed expressions but also with morphosyntactically flexible constructions. It also allows us to lemmatize word combinations as a unit and yet to parse the components individually if necessary. Moreover, we describe HABIL, a tool for the automatic processing of these expressions, and we give some evaluation results. This work must be placed in a general framework of written Basque processing tools, which currently ranges from the tokenization and segmentation of single words up to the syntactic tagging of general texts.
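The key property described above, lemmatizing a word combination as a unit while keeping its components individually analysable, can be illustrated with a small sketch. This is not HABIL; the entry and the token analyses are invented for illustration.

```python
# Hedged illustration: recognize a multiword lexical unit whose
# components may inflect, by matching at the lemma level, so the
# combination gets a single unit lemma while surface forms survive.

# Hypothetical MWE entry: component lemma sequence -> unit lemma.
MWE_LEMMAS = {("kontu", "hartu"): "kontu hartu"}

def match_mwe(analysed_tokens):
    """analysed_tokens: list of (surface, lemma) pairs.
    Returns the list with contiguous MWE components merged into one unit."""
    out, i = [], 0
    while i < len(analysed_tokens):
        matched = False
        for comp, unit_lemma in MWE_LEMMAS.items():
            k = len(comp)
            window = analysed_tokens[i:i + k]
            if len(window) == k and tuple(l for _, l in window) == comp:
                surface = " ".join(s for s, _ in window)
                out.append((surface, unit_lemma))  # lemmatized as a unit
                i += k
                matched = True
                break
        if not matched:
            out.append(analysed_tokens[i])
            i += 1
    return out
```

Because matching is done on lemmas rather than surface forms, morphosyntactically flexible constructions (inflected components) are still recognized.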
International Conference on Computational Linguistics | 2009
Iñaki Alegria; Arantza Díaz de Ilarraza; Gorka Labaka; Mikel Lersundi; Aingeru Mayor; Kepa Sarasola
We present an open architecture we have designed in a project for rule-based machine translation from Spanish into Basque. The main objective has been the construction of an open, reusable and interoperable framework which can be improved in the near future by combining it with the statistical model. The MT architecture reuses several open tools and is based on a unique XML format for the flow between the different modules, which makes the interaction among different developers of tools and resources easier. Basque being a resource-poor language, this is a key feature in our aim for future improvements and extensions of the engine. The result is open-source software which can be downloaded from matxin.sourceforge.net, and we think it could be adapted to translate between other languages with few resources.
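The idea of a single XML format flowing between modules can be sketched as follows: each module reads the shared document, enriches it, and hands it on. The element and attribute names here are invented for illustration and are not Matxin's actual format.

```python
# Sketch of an XML-mediated pipeline: analysis builds the document,
# transfer enriches it, generation reads it back out.
import xml.etree.ElementTree as ET

def analysis_module(sentence):
    """Build an XML sentence with one NODE per word (toy analysis)."""
    root = ET.Element("SENTENCE")
    for w in sentence.split():
        ET.SubElement(root, "NODE", form=w)
    return root

def transfer_module(root, lexicon):
    """Add a target-language attribute to each node (toy lexical transfer)."""
    for node in root.iter("NODE"):
        node.set("target", lexicon.get(node.get("form"), node.get("form")))
    return root

def generation_module(root):
    """Read the enriched document and emit target text."""
    return " ".join(n.get("target") for n in root.iter("NODE"))

tree = analysis_module("casa nueva")
tree = transfer_module(tree, {"casa": "etxe", "nueva": "berri"})
output = generation_module(tree)
```

Because every module only depends on the shared document format, any single module can be replaced (e.g. by a statistical component) without touching the others, which is the interoperability the paragraph argues for.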
Machine Translation | 2011
Aingeru Mayor; Iñaki Alegria; Arantza Díaz de Ilarraza; Gorka Labaka; Mikel Lersundi; Kepa Sarasola
We present the first publicly available machine translation (MT) system for Basque. The fact that Basque is both a morphologically rich and less-resourced language makes the use of statistical approaches difficult, and raises the need to develop a rule-based architecture which can be combined in the future with statistical techniques. The MT architecture proposed reuses several open-source tools and is based on a unique XML format to facilitate the flow between the different modules, which eases the interaction among different developers of tools and resources. The result is the rule-based Matxin MT system, an open-source toolkit, whose first implementation translates from Spanish to Basque. We have performed innovative work on the following tasks: construction of a dependency analyser for Spanish, use of rich linguistic information to translate prepositions and syntactic functions (such as subject and object markers), construction of an efficient module for verbal chunk transfer, and design and implementation of modules for ordering words and phrases, independently of the source language.
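One transfer step named above, translating Spanish prepositions into Basque case markers, can be pictured with a toy fragment. The mapping below is a hypothetical illustration chosen for clarity, not the system's actual rules, and it deliberately returns nothing when richer context would be needed.

```python
# Toy fragment of preposition-to-case transfer (Spanish -> Basque).
PREP_TO_CASE = {
    "en": "-n",     # inessive, e.g. "en la casa" -> "etxean"
    "con": "-ekin", # comitative
    "de": "-ko",    # locative genitive; "de" has several other readings
}

def transfer_preposition(prep):
    """Return a Basque case suffix for a Spanish preposition, or None
    when the (context-dependent) choice needs richer linguistic
    information than a bare lookup, as the paper's rich-information
    approach addresses."""
    return PREP_TO_CASE.get(prep)
```

The paper's point is precisely that a bare table like this is insufficient: the same preposition maps to different cases depending on syntactic function, which is why rich linguistic information is used.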
International Conference on Computational Linguistics | 2000
Itziar Aduriz; Eneko Agirre; Izaskun Aldezabal; Iñaki Alegria; Xabier Arregi; Jose Mari Arriola; Xabier Artola; Koldo Gojenola; A. Maritxalar; Kepa Sarasola; Miriam Urkia
Agglutinative languages present rich morphology and for some applications they need deep analysis at word level. The work here presented proposes a model for designing a full morphological analyzer. The model integrates the two-level formalism and a unification-based formalism. In contrast to other works, we propose to separate the treatment of sequential and non-sequential morphotactic constraints. Sequential constraints are applied in the segmentation phase, and non-sequential ones in the final feature-combination phase. Early application of sequential morphotactic constraints during the segmentation process makes feasible an efficient implementation of the full morphological analyzer. The result of this research has been the design and implementation of a full morphosyntactic analysis procedure for each word in unrestricted Basque texts.
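The two-phase design can be illustrated with a toy analyzer: sequential morphotactics are enforced during segmentation via continuation classes, and features are combined afterwards. The two-entry lexicon and its features are invented mini-examples, not the real system's.

```python
# Toy two-phase morphological analysis for an agglutinative word.

LEXICON = {
    # morph: (category, features, allowed continuation classes)
    "etxe": ("noun", {"lemma": "etxe"}, {"case"}),
    "an":   ("case", {"case": "inessive"}, set()),
}
CLASS_OF = {"etxe": "stem", "an": "case"}

def segment(word):
    """Phase 1: greedy longest-match segmentation applying SEQUENTIAL
    constraints: each morph must belong to a class the previous morph
    allows as a continuation."""
    morphs, allowed, i = [], None, 0
    while i < len(word):
        for j in range(len(word), i, -1):
            m = word[i:j]
            if m in LEXICON and (allowed is None or CLASS_OF[m] in allowed):
                morphs.append(m)
                allowed = LEXICON[m][2]
                i = j
                break
        else:
            return None  # no valid segmentation
    return morphs

def combine_features(morphs):
    """Phase 2: merge feature structures (a stand-in for unification),
    where NON-sequential constraints could still reject bad combinations."""
    feats = {}
    for m in morphs:
        feats.update(LEXICON[m][1])
    return feats
```

Applying the sequential constraints already in phase 1 prunes invalid segmentations early, which is the efficiency argument the abstract makes.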
Applied Soft Computing | 2011
Ana Zelaia; Iñaki Alegria; Olatz Arregi; Basilio Sierra
Abstract: This article presents a multiclassifier approach for multiclass/multilabel document categorization problems. For the categorization process, we use a reduced vector representation obtained by SVD for training and testing documents, and a set of k-NN classifiers to predict the category of test documents; each k-NN classifier uses a reduced database subsampled from the original training database. To perform multilabeling classifications, a new approach based on Bayesian weighted voting is also presented. The good results obtained in the experiments give an indication of the potential of the proposed approach.
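The ensemble idea, several k-NN classifiers each trained on a subsample and combined by voting, can be sketched as follows. This is a plain Euclidean k-NN with uniform vote weights on toy vectors; the SVD reduction and the Bayesian weighting of the paper are noted but not reproduced.

```python
# Minimal multiclassifier sketch: subsampled k-NN members plus voting.
import random
from collections import Counter

def knn_predict(train, x, k=3):
    """train: list of (vector, label). Majority label of the k nearest
    points by squared Euclidean distance."""
    nearest = sorted(train,
                     key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def multiclassifier_predict(train, x, n_classifiers=5, sample_size=6, k=3, seed=0):
    """Each member classifier sees a random subsample of the training
    database; members then vote. Uniform weights here; the paper
    weights the votes in a Bayesian fashion instead."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_classifiers):
        subsample = rng.sample(train, min(sample_size, len(train)))
        votes[knn_predict(subsample, x, k)] += 1
    return votes.most_common(1)[0][0]
```

Subsampling keeps each member cheap and decorrelates their errors, which is what makes combining them worthwhile.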
International Conference on Implementation and Application of Automata | 2001
Iñaki Alegria; Maxux J. Aranzabe; Nerea Ezeiza; Aitzol Ezeiza; Ruben Urizar
This paper describes the components used in the design and implementation of NLP tools for Basque. These components are based on finite state technology and are devoted to the morphological analysis of Basque, an agglutinative pre-Indo-European language. We think that our design can be interesting for the treatment of other languages. The main components developed are a general and robust morphological analyser/generator and a spelling checker/corrector for Basque named Xuxen. The analyser is a basic tool for current and future work on NLP of Basque, such as the lemmatiser/tagger Euslem, an Intranet search engine or an assistant for verse-making.
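The checker/corrector pattern can be illustrated with a small sketch: accept a word the lexicon recognizes, otherwise propose edit-distance-1 candidates. Xuxen itself works over full morphological analysis rather than a flat word list, so this toy lexicon is an assumption made purely for illustration.

```python
# Illustrative speller: recognition against a lexicon, correction by
# generating all single-edit candidates and intersecting with the lexicon.

ALPHABET = "abcdefghijklmnñopqrstuvwxyz"

def edits1(word):
    """All strings one edit away: deletion, replacement, insertion, swap."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    replaces = {a + c + b[1:] for a, b in splits if b for c in ALPHABET}
    inserts = {a + c + b for a, b in splits for c in ALPHABET}
    swaps = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    return deletes | replaces | inserts | swaps

def check(word, lexicon):
    """Return (is_correct, sorted suggestions)."""
    if word in lexicon:
        return True, []
    return False, sorted(edits1(word) & lexicon)
```

For an agglutinative language the flat-lexicon assumption breaks down quickly, since each lemma yields a very large number of valid word-forms; replacing the set membership test with the morphological analyser is exactly the design the paper describes.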
Finite-State Methods and Natural Language Processing | 2005
Iñaki Alegria; Arantza Díaz de Ilarraza; Gorka Labaka; Mikel Lersundi; Aingeru Mayor; Kepa Sarasola
We are developing a Spanish-Basque MT system using the traditional transfer model and based on shallow and dependency parsing. The project builds on the previous work of our group, integrated into the OpenTrad initiative [2]. This abstract summarizes the current status of development of an FST grammar for the structural transfer of verb chains. This task is quite complex due to the large distance between the two languages. In the current implementation we are using the XRCE Finite State Tools [1].
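The kind of rewriting such a transfer grammar performs can be pictured with a toy rule over tag sequences: a Spanish auxiliary-plus-participle chain becomes a Basque main-verb-plus-auxiliary pattern. The tag names and the single rule are invented stand-ins; the real grammar is expressed as finite-state rules over much richer descriptions.

```python
# Toy verb-chain transfer: rewrite a source tag sequence by rule lookup.
RULES = {
    # "ha comido" (haber.PRES + participle) -> "jan du" (perfective + ukan.PRES)
    ("haber.pres", "V.part"): ("V.perf", "ukan.pres"),
}

def transfer_verb_chain(tags):
    """Apply a whole-chain rewrite rule if one matches, else pass through."""
    key = tuple(tags)
    return list(RULES.get(key, key))
```

An FST implementation composes many such rules so that they apply to arbitrary chains in one pass, rather than by exhaustive table lookup as in this sketch.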
Conference of the European Chapter of the Association for Computational Linguistics | 1993
Itziar Aduriz; Eneko Agirre; Iñaki Alegria; Xabier Arregi; Jose Mari Arriola; Xabier Artola; A. Díaz de Ilarraza; Nerea Ezeiza; Montse Maritxalar; Kepa Sarasola; Miriam Urkia
Xuxen is a spelling checker/corrector for Basque which is going to be commercialized next year. The checker recognizes a word-form if a correct morphological breakdown is allowed. The morphological analysis is based on two-level morphology. The correction method distinguishes between orthographic errors and typographical errors.
• Typographical errors (or mistypings) are non-cognitive errors which do not follow linguistic criteria.
• Orthographic errors are cognitive errors which occur when the writer does not know or has forgotten the correct spelling of a word. They are more persistent because of their cognitive nature, they leave a worse impression and, finally, their treatment is an interesting application for language standardization purposes.
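The distinction drawn above suggests two correction strategies: generic edit-distance candidates for typographical errors, and explicit, linguistically motivated rules for orthographic ones. The sketch below shows the rule-based side with a single plausible toy rule (a dropped "h", which is silent in some Basque varieties); it is an invented example, not Xuxen's actual rule set.

```python
# Rule-based candidates for a hypothesized orthographic error:
# the writer may have omitted a letter named in the rule list.

ORTHOGRAPHIC_RULES = ["h"]  # toy rule: "h" is often wrongly dropped

def orthographic_candidates(word, lexicon):
    """Try inserting each rule letter at every position and keep the
    candidates the lexicon accepts."""
    cands = set()
    for letter in ORTHOGRAPHIC_RULES:
        for i in range(len(word) + 1):
            cand = word[:i] + letter + word[i:]
            if cand in lexicon:
                cands.add(cand)
    return cands
```

Because orthographic errors follow linguistic patterns, a handful of targeted rules like this can rank the intended word ahead of the blind edit-distance candidates that suit typographical errors.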