Iria da Cunha
Pompeu Fabra University
Publications
Featured research published by Iria da Cunha.
Mexican International Conference on Artificial Intelligence | 2007
Iria da Cunha; Silvia Fernández; Patricia Velázquez Morales; Jorge Vivaldi; Eric SanJuan; Juan-Manuel Torres-Moreno
In this article we present a hybrid approach to the automatic summarization of Spanish medical texts. Many systems for automatic summarization use statistics or linguistics, but only a few combine both techniques. Our idea is that to produce a good summary we need to use the linguistic aspects of texts, but we should also benefit from the advantages of statistical techniques. We have integrated the Cortex (Vector Space Model) and Enertex (statistical physics) systems, coupled with the Yate term extractor and the Disicosum system (linguistics). We have compared these systems and then integrated them into a hybrid approach. Finally, we have applied this hybrid system to a corpus of medical articles and evaluated its performance, obtaining good results.
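The Cortex/Enertex/Disicosum integration is not publicly documented in detail, but the general idea of mixing a statistical sentence score with a linguistic one can be sketched as follows. This is a minimal illustration under stated assumptions: `hybrid_rank`, the frequency-based score, and the domain-term list are stand-ins, not the actual pipeline.

```python
from collections import Counter

def statistical_score(sentence, doc_freq):
    # Average corpus frequency of the sentence's words (vector-space proxy)
    words = sentence.lower().split()
    return sum(doc_freq[w] for w in words) / max(len(words), 1)

def hybrid_rank(sentences, domain_terms, weight=0.5):
    """Rank sentences by a weighted mix of a statistical and a linguistic score."""
    doc_freq = Counter(w for s in sentences for w in s.lower().split())
    scored = []
    for s in sentences:
        stat = statistical_score(s, doc_freq)
        # Linguistic component (proxy): fraction of domain terms present,
        # standing in for the term-extractor and discourse-based scores
        ling = sum(t in s.lower() for t in domain_terms) / max(len(domain_terms), 1)
        scored.append((weight * stat + (1 - weight) * ling, s))
    return [s for _, s in sorted(scored, reverse=True)]
```

The `weight` parameter controls how much the statistical component dominates; the hybrid system described above would tune this balance empirically.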
Polibits | 2010
Juan-Manuel Torres-Moreno; Horacio Saggion; Iria da Cunha; Eric SanJuan; Patricia Velázquez-Morales
We study a new content-based method for the evaluation of text summarization systems without human models, which is used to produce system rankings. The research is carried out using a new content-based evaluation framework, called Fresa, to compute a variety of divergences among probability distributions. We apply our comparison framework to various well-established content-based evaluation measures in text summarization, such as COVERAGE, RESPONSIVENESS, PYRAMIDS and ROUGE, studying their associations in various text summarization tasks, including generic multi-document summarization in English and French, focus-based multi-document summarization in English, and generic single-document summarization in French and Spanish.
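Fresa is built around divergences between the word distributions of a summary and its source. A minimal sketch of that idea, using Jensen-Shannon divergence over unigram distributions; the function names and the smoothing constant are illustrative assumptions, not Fresa's actual implementation.

```python
import math
from collections import Counter

def unigram_dist(text, vocab):
    """Smoothed unigram probability distribution over a shared vocabulary."""
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocab) or 1
    # Tiny smoothing keeps the divergence finite for unseen words
    return {w: (counts[w] + 1e-9) / (total + 1e-9 * len(vocab)) for w in vocab}

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions."""
    def kl(a, b):
        return sum(a[w] * math.log2(a[w] / b[w]) for w in a)
    m = {w: 0.5 * (p[w] + q[w]) for w in p}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def divergence_score(source, summary):
    vocab = set(source.lower().split()) | set(summary.lower().split())
    return js_divergence(unigram_dist(source, vocab), unigram_dist(summary, vocab))
```

A summary whose word distribution stays close to the source scores near 0; an off-topic one approaches 1, which is how model-free rankings can be produced.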
Mexican International Conference on Artificial Intelligence | 2010
Iria da Cunha; Eric SanJuan; Juan-Manuel Torres-Moreno; Marina Lloberes; Irene Castellón
Discourse parsing is currently a very prominent research topic. However, there is no discourse parser for Spanish texts. The first stage in developing such a tool is discourse segmentation. In this work we present DiSeg, the first discourse segmenter for Spanish, which uses the framework of Rhetorical Structure Theory and is based on lexical and syntactic rules. We describe the system and evaluate its performance against a gold-standard corpus, obtaining promising results.
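A rule-based segmenter in the spirit of DiSeg can be sketched with a small list of Spanish discourse markers. The marker list and splitting logic here are hypothetical simplifications; DiSeg's actual rules are richer and also syntactic.

```python
import re

# Hypothetical marker list; DiSeg's real rules are lexical AND syntactic
MARKERS = ["porque", "aunque", "sin embargo", "es decir", "para que"]

def segment(sentence):
    """Split a Spanish sentence into candidate discourse segments at markers."""
    pattern = r"\s*,?\s*\b(" + "|".join(map(re.escape, MARKERS)) + r")\b"
    pieces = re.split(pattern, sentence, flags=re.IGNORECASE)
    segments, i = [], 0
    while i < len(pieces):
        if pieces[i].lower() in MARKERS:
            # Re-attach the marker to the clause it introduces
            segments.append(pieces[i] + " " + pieces[i + 1].strip())
            i += 2
        else:
            if pieces[i].strip():
                segments.append(pieces[i].strip())
            i += 1
    return segments
```

Each returned span approximates an elementary discourse unit; evaluation against a gold standard would then compare these boundaries with human-annotated ones.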
Language Resources and Evaluation | 2015
Mikel Iruskieta; Iria da Cunha; Maite Taboada
Explaining why the same passage may have different rhetorical structures when conveyed in different languages remains an open question. Starting from a trilingual translation corpus, this paper aims to provide a new qualitative method for the comparison of rhetorical structures in different languages and to specify why translated texts may differ in their rhetorical structures. To achieve these aims we have carried out a contrastive analysis, comparing a corpus of parallel English, Spanish and Basque texts, using Rhetorical Structure Theory. We propose a method to describe the main linguistic differences among the rhetorical structures of the three languages in the two annotation stages (segmentation and rhetorical analysis). We show a new type of comparison that has important advantages with regard to the quantitative method usually employed: it provides an accurate measurement of inter-annotator agreement, and it pinpoints sources of disagreement among annotators. With the use of this new method, we show how translation strategies affect discourse structure.
Expert Systems with Applications | 2012
Iria da Cunha; Eric San Juan; Juan-Manuel Torres-Moreno; Marina Lloberes; Irene Castellón
Discourse parsing is currently a very prominent research topic. However, there is no discourse parser for Spanish texts. The first stage in developing such a tool is discourse segmentation. In this work we present DiSeg, the first discourse segmenter for Spanish, which uses the framework of Rhetorical Structure Theory and is based on lexical and syntactic rules. We describe the system and evaluate its performance against a gold-standard corpus, divided into a medical and a terminological subcorpus. We obtain promising results, which shows that discourse segmentation is possible using shallow parsing.
Mexican International Conference on Artificial Intelligence | 2011
Alejandro Molina; Juan-Manuel Torres-Moreno; Eric SanJuan; Iria da Cunha; Gerardo Sierra; Patricia Velázquez-Morales
Earlier studies have raised the possibility of summarizing at the sentence level. This simplification should help adapt textual content to a limited space. Sentence compression is therefore an important resource for automatic summarization systems. However, few studies consider sentence-level discourse segmentation for the compression task; to our knowledge, none in Spanish. In this paper we study the relationship between discourse segmentation and compression for sentences in Spanish. We use a discourse segmenter and observe to what extent the passages deleted by annotators fit the discourse structures detected by the system. The main idea is to verify whether automatic discourse segmentation can serve as a basis for identifying the segments to be eliminated in the sentence compression task. We show that discourse segmentation could be a solid first step towards a sentence compression system.
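The core check described above, whether a human deletion coincides with whole automatically detected discourse segments, can be sketched as follows. This is an illustrative helper under assumed inputs, not the authors' evaluation code.

```python
def deletion_matches_segments(segments, deleted):
    """True if a human-deleted passage aligns with a run of whole discourse segments."""
    deleted = deleted.strip()
    # Try every contiguous run of segments and compare it to the deletion
    for i in range(len(segments)):
        for j in range(i, len(segments)):
            if " ".join(segments[i:j + 1]) == deleted:
                return True
    return False
```

Aggregating this boolean over an annotated corpus gives the kind of fit statistic the study uses to argue that segment boundaries are good candidate deletion points.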
Discourse Studies | 2010
Iria da Cunha; Mikel Iruskieta
The study we report in this article addresses the results of comparing rhetorical trees from two different languages, produced by two annotators within the framework of Rhetorical Structure Theory (RST). Furthermore, we investigate a methodology for a suitable evaluation, both quantitative and qualitative, of these trees. Our corpus contains abstracts of medical research articles written in both Spanish and Basque, extracted from Gaceta Médica de Bilbao (‘Medical Journal of Bilbao’). The results demonstrate that almost half of the annotator disagreement is due to the use of translation strategies that notably affect rhetorical structures.
International Conference on Computational Linguistics | 2013
Alejandro Molina; Juan-Manuel Torres-Moreno; Eric SanJuan; Iria da Cunha; Gerardo Sierra Martínez
This paper presents a method for automatic summarization by deleting intra-sentence discourse segments. First, each sentence is divided into elementary discourse units and, then, the less informative segments are deleted. The informativeness of each segment is calculated using textual energy, a method that has shown good results in automatic summarization. To analyze the results, we set up an annotation campaign, through which we found interesting aspects regarding the elimination of discourse segments as an alternative to the sentence compression task. Results show that the degree of disagreement in determining the optimal compressed sentence is high and increases with the complexity of the sentence. However, there is some agreement on the decision to delete discourse segments.
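Textual energy can be sketched as follows: build a binary sentence-term matrix S and take E = S·Sᵀ, so that a segment's score is the total strength of its term-sharing interactions. This is a minimal pure-Python illustration of the idea, not the Enertex implementation.

```python
def textual_energy(sentences):
    """Score text units by textual energy: E = S·Sᵀ over a binary term matrix."""
    tokenized = [set(s.lower().split()) for s in sentences]
    vocab = sorted(set().union(*tokenized))
    # Binary sentence-term matrix S
    S = [[1 if w in toks else 0 for w in vocab] for toks in tokenized]
    n = len(sentences)
    # Energy E[i][j] = dot product of rows i and j (shared-term count)
    E = [[sum(S[i][k] * S[j][k] for k in range(len(vocab))) for j in range(n)]
         for i in range(n)]
    # A unit's informativeness is the total energy of its interactions
    return [sum(row) for row in E]
```

Units with low total energy interact weakly with the rest of the text, which is the signal used to mark them as candidates for deletion.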
INEX'10: Proceedings of the 9th International Conference on Initiative for the Evaluation of XML Retrieval: Comparative Evaluation of Focused Retrieval | 2010
Jorge Vivaldi; Iria da Cunha; Javier Ramírez
In this paper we present REG, a graph-based approach to a fundamental problem of Natural Language Processing: the automatic summarization of documents. The algorithm models a document as a graph to obtain weighted sentences. We applied this approach to the INEX@QA 2010 question-answering task. To do so, we extracted the terms and named entities from the queries, in order to obtain a list of terms and named entities related to the main topic of the question. Using this strategy, REG obtained good results regarding performance (measured with the automatic evaluation system FRESA) and readability (measured with human evaluation), being one of the seven best systems in the task.
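REG's exact graph model is not described here, but a graph-based sentence ranker of the same family can be sketched with a PageRank-style iteration over a word-overlap graph. This is a TextRank-like stand-in; the edge weighting, damping factor, and iteration count are assumptions.

```python
def rank_sentences(sentences, damping=0.85, iters=50):
    """Rank sentences by a PageRank-style walk on a word-overlap graph."""
    toks = [set(s.lower().split()) for s in sentences]
    n = len(sentences)
    # Edge weight: word overlap normalized by the two sentence lengths
    w = [[len(toks[i] & toks[j]) / (len(toks[i]) + len(toks[j])) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    scores = [1.0] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            s = 0.0
            for j in range(n):
                out = sum(w[j])
                if w[j][i] and out:
                    # Sentence j passes score proportional to the edge weight
                    s += w[j][i] / out * scores[j]
            new.append((1 - damping) + damping * s)
        scores = new
    # Indices of sentences, best first
    return sorted(range(n), key=lambda i: -scores[i])
```

Sentences well connected to the rest of the document accumulate score and rise to the top; the top-weighted sentences then form the summary.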
International Workshop of the Initiative for the Evaluation of XML Retrieval | 2011
Jorge Vivaldi; Iria da Cunha
In this paper, our strategy and results for the INEX@QA 2011 question-answering task are presented. In this task, a set of 50 documents is provided by the search engine Indri for each query. The initial queries are titles associated with tweets. These queries are reformulated using terminological and named-entity information. To design the queries, the full process is divided into two steps: (a) both titles and tweets are POS-tagged, and (b) queries are expanded or reformulated using terms and named entities included in the title, terms and named entities found in the related tweet, and Wikipedia redirects of the terms and named entities included in the title. The automatic summarization system REG is then used to summarize the 50 documents obtained with these queries. The algorithm models a document as a graph to obtain weighted sentences. A single document is generated and considered the answer to the query. This strategy, combining summarization and query reformulation, obtains good results regarding informativeness and readability.