Mikel Iruskieta
University of the Basque Country
Publication
Featured research published by Mikel Iruskieta.
Language Resources and Evaluation | 2015
Mikel Iruskieta; Iria da Cunha; Maite Taboada
Explaining why the same passage may have different rhetorical structures when conveyed in different languages remains an open question. Starting from a trilingual translation corpus, this paper aims to provide a new qualitative method for the comparison of rhetorical structures in different languages and to specify why translated texts may differ in their rhetorical structures. To achieve these aims we have carried out a contrastive analysis, comparing a corpus of parallel English, Spanish and Basque texts, using Rhetorical Structure Theory. We propose a method to describe the main linguistic differences among the rhetorical structures of the three languages in the two annotation stages (segmentation and rhetorical analysis). We show a new type of comparison that has important advantages with regard to the quantitative method usually employed: it provides an accurate measurement of inter-annotator agreement, and it pinpoints sources of disagreement among annotators. With the use of this new method, we show how translation strategies affect discourse structure.
Discourse Studies | 2010
Iria da Cunha; Mikel Iruskieta
This article reports the results of comparing rhetorical trees in two different languages, built by two annotators within the framework of Rhetorical Structure Theory (RST). Furthermore, we investigate the methodology for a suitable evaluation, both quantitative and qualitative, of these trees. Our corpus contains abstracts of medical research articles written in both Spanish and Basque, extracted from Gaceta Médica de Bilbao (‘Medical Journal of Bilbao’). The results demonstrate that almost half of the annotator disagreement is due to the use of translation strategies that notably affect rhetorical structures.
Corpus Linguistics and Linguistic Theory | 2015
Mikel Iruskieta; Arantza Díaz de Ilarraza; Mikel Lersundi
This article presents a discourse annotation methodology based on Rhetorical Structure Theory and an empirical study of annotating a corpus of specialized medical texts in Basque. The annotation process includes two phases: segmentation and annotation of rhetorical relations. Phase one entails an initial study which leads to establishing linguistic criteria for sentence-based segmentation; a second phase focuses on annotation of rhetorical relations. After establishing discourse segments and rhetorical relations, the annotation process is analyzed and evaluated by means of the method commonly used in RST (Marcu 2000). Inconsistencies detected in the evaluation method lead the authors to redefine some criteria of the evaluation method. As a result of this work, a small annotated Basque-language corpus is provided to the scientific community.
International Conference on Computational Linguistics | 2009
Larraitz Uria; Ainara Estarrona; Izaskun Aldezabal; María Jesús Aranzabe; Arantza Díaz de Ilarraza; Mikel Iruskieta
The aim of this work is to evaluate the dependency-based annotation of EPEC (the Reference Corpus for the Processing of Basque) by means of an experiment: two annotators have syntactically tagged a sample of this corpus in order to evaluate the agreement rate between them and to identify the issues that need to be improved in the syntactic annotation process. In this article we present the quantitative and qualitative results of this evaluation.
Procesamiento del Lenguaje Natural | 2018
Kepa Bengoetxea; Mikel Iruskieta; Juliano D. Antonio
Understanding or properly writing the main idea, or Central Unit (CU), of a text is a very important task in exams, so detecting the CU automatically may be of interest in language evaluation tasks. This paper presents a CU detector based on machine learning techniques for argumentative answer texts in Brazilian Portuguese. Results show that detecting CUs in argumentative answer texts with machine learning techniques works better than detecting them with rules.
Calidoscopio | 2010
Mikel Iruskieta; Iria da Cunha
This study presents our research on the potential of using rhetorical relations, and the surface markers that signal them, to discriminate among specialized texts from different domains but with a high level of specialization, in two very different languages, Basque and Spanish. For our analysis, we employ Rhetorical Structure Theory (RST). We compiled a parallel corpus of Spanish-Basque specialized texts that contains two subcorpora, of medical and terminological texts. We annotated these texts with RST rhetorical relations and identified the discourse markers that signal them. Finally, we found that certain types of rhetorical relations and the number of discourse markers used allow us to differentiate among specialized texts of different domains in both Spanish and Basque. Key words: Rhetorical Structure Theory, rhetorical relations, discourse markers, annotation, specialized text, contrastive study, Spanish, Basque.
International Conference on Computational Linguistics | 2014
Mikel Iruskieta; Arantza Díaz de Ilarraza; Mikel Lersundi
International Conference on Computational Linguistics | 2018
Shuyuan Cao; Iria da Cunha; Mikel Iruskieta
Procesamiento del Lenguaje Natural | 2018
Kepa Bengoetxea; Mikel Iruskieta
Procesamiento del Lenguaje Natural | 2017
Arantxa Otegi; Oier Imaz; Arantza Díaz de Ilarraza; Mikel Iruskieta; Larraitz Uria