Mikel Iruskieta | Researchain

Publication

Featured research published by Mikel Iruskieta.

Language Resources and Evaluation | 2015

A qualitative comparison method for rhetorical structures: identifying different discourse structures in multilingual corpora

Mikel Iruskieta; Iria da Cunha; Maite Taboada

Explaining why the same passage may have different rhetorical structures when conveyed in different languages remains an open question. Starting from a trilingual translation corpus, this paper aims to provide a new qualitative method for the comparison of rhetorical structures in different languages and to specify why translated texts may differ in their rhetorical structures. To achieve these aims we have carried out a contrastive analysis, comparing a corpus of parallel English, Spanish and Basque texts, using Rhetorical Structure Theory. We propose a method to describe the main linguistic differences among the rhetorical structures of the three languages in the two annotation stages (segmentation and rhetorical analysis). We show a new type of comparison that has important advantages with regard to the quantitative method usually employed: it provides an accurate measurement of inter-annotator agreement, and it pinpoints sources of disagreement among annotators. With the use of this new method, we show how translation strategies affect discourse structure.

Discourse Studies | 2010

Comparing rhetorical structures in different languages: The influence of translation strategies

Iria da Cunha; Mikel Iruskieta

This article reports the results of comparing rhetorical trees from two different languages, a comparison carried out by two annotators working within Rhetorical Structure Theory (RST). Furthermore, we investigate the methodology for a suitable evaluation, both quantitative and qualitative, of these trees. Our corpus contains abstracts of medical research articles written in both Spanish and Basque, extracted from Gaceta Médica de Bilbao (‘Medical Journal of Bilbao’). The results demonstrate that almost half of the annotator disagreement is due to the use of translation strategies that notably affect rhetorical structures.

Corpus Linguistics and Linguistic Theory | 2015

Establishing criteria for RST-based discourse segmentation and annotation for texts in Basque

Mikel Iruskieta; Arantza Díaz de Ilarraza; Mikel Lersundi

This article presents a discourse annotation methodology based on Rhetorical Structure Theory and an empirical study of annotating a corpus of specialized medical texts in Basque. The annotation process includes two phases: segmentation and annotation of rhetorical relations. Phase one entails an initial study which leads to establishing linguistic criteria for sentence-based segmentation; phase two focuses on the annotation of rhetorical relations. After establishing discourse segments and rhetorical relations, the annotation process is analyzed and evaluated by means of the method commonly used in RST (Marcu 2000). Inconsistencies detected in the evaluation method lead the authors to redefine some of its criteria. As a result of this work, a small annotated Basque-language corpus is provided to the scientific community.

International Conference on Computational Linguistics | 2009

Evaluation of the Syntactic Annotation in EPEC, the Reference Corpus for the Processing of Basque

Larraitz Uria; Ainara Estarrona; Izaskun Aldezabal; María Jesús Aranzabe; Arantza Díaz de Ilarraza; Mikel Iruskieta

The aim of this work is to evaluate the dependency-based annotation of EPEC (the Reference Corpus for the Processing of Basque) by means of an experiment: two annotators syntactically tagged a sample of the corpus in order to measure the agreement rate between them and to identify the issues that need to be improved in the syntactic annotation process. In this article we present the quantitative and qualitative results of this evaluation.

Procesamiento Del Lenguaje Natural | 2018

Detecting the Central Units of Brazilian Portuguese argumentative answer texts

Kepa Bengoetxea; Mikel Iruskieta; Juliano D. Antonio

Correctly understanding or writing the main idea, or Central Unit (CU), of a text is a very important task in exams, so automatic CU detection may be of interest in language evaluation tasks. This paper presents a CU detector based on machine learning techniques for argumentative answer texts in Brazilian Portuguese. Results show that detecting CUs in argumentative answer texts with machine learning techniques works better than detecting them with rules.

Calidoscopio | 2010

El potencial de las relaciones retóricas para la discriminación de textos especializados de diferentes dominios en euskera y español

Mikel Iruskieta; Iria da Cunha

This study presents our research on the potential of using rhetorical relations, and the superficial marks that signal them, to discriminate among specialized texts from different domains but with a high level of specialization, in two very different languages: Basque and Spanish. For our analysis, we employ Rhetorical Structure Theory (RST). We compiled a parallel corpus of Spanish-Basque specialized texts that contains two subcorpora of medical and terminological texts. We annotated these texts with RST rhetorical relations and identified the discourse markers that signal them. Finally, we found that certain types of rhetorical relations and the number of discourse markers used allow us to differentiate among specialized texts of different domains in both Spanish and Basque. Key words: Rhetorical Structure Theory, rhetorical relations, discourse markers, annotation, specialized text, contrastive study, Spanish, Basque.

International Conference on Computational Linguistics | 2014