Irene Castellón
University of Barcelona
Publications
Featured research published by Irene Castellón.
Computers and The Humanities | 1998
Antonietta Alonge; Nicoletta Calzolari; Piek Vossen; Laura Bloksma; Irene Castellón; Maria Antònia Martí; Wim Peters
In this paper the linguistic design of the database under construction within the EuroWordNet project is described. This is mainly structured along the same lines as the Princeton WordNet, although some changes have been made to the WordNet overall design due to both theoretical and practical reasons. The most important reasons for such changes are the multilinguality of the EuroWordNet database and the fact that it is intended to be used in Language Engineering applications. Thus, i) some relations have been added to those identified in WordNet; ii) some labels have been identified which can be added to the relations in order to make their implications more explicit and precise; iii) some relations, already present in the WordNet design, have been modified in order to specify their role more clearly.
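The added, labelled relations described in point (ii) can be pictured with a toy data structure. The class, relation, and label names below are illustrative assumptions, not EuroWordNet's actual schema:

```python
# Sketch of a wordnet-style synset whose relations carry optional labels.
# Names ("has_hyperonym", the Synset class) are illustrative only.

class Synset:
    def __init__(self, words):
        self.words = words
        self.relations = []  # list of (relation_type, target_synset, labels)

    def add_relation(self, rel_type, target, labels=()):
        # Optional labels make the implications of a relation more explicit
        # and precise, in the spirit of point (ii) above.
        self.relations.append((rel_type, target, set(labels)))

animal = Synset(["animal"])
dog = Synset(["dog", "domestic dog"])
dog.add_relation("has_hyperonym", animal)

hyperonyms = [t for rel, t, _ in dog.relations if rel == "has_hyperonym"]
print(hyperonyms[0].words[0])  # -> animal
```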
Machine Translation | 1995
Ann Copestake; Ted Briscoe; Piek Vossen; Alicia Ageno; Irene Castellón; Francesc Ribas; German Rigau; Horacio Rodríguez; Anna Samiotou
In this paper we present a methodology for extracting information about lexical translation equivalences from the machine-readable versions of conventional dictionaries (MRDs), and describe a series of experiments on the semi-automatic construction of a linked multilingual lexical knowledge base for English, Dutch, and Spanish. We discuss the advantages and limitations of using MRDs that this has revealed, and some strategies we have developed to cover gaps where no direct translation can be found.
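The linking idea can be illustrated with a toy structure in which monolingual senses are connected through translation equivalences extracted from bilingual entries. The sense identifiers and data below are invented:

```python
# Invented fragment of a linked multilingual lexical knowledge base:
# tuples of (source_sense, target_language, translation_sense).
links = [
    ("bank_1_en", "es", "banco_1"),
    ("bank_2_en", "es", "orilla_1"),
    ("bank_1_en", "nl", "bank_1"),
]

def equivalents(sense, lang):
    """Return the translation equivalences recorded for a sense in a language."""
    return [t for s, l, t in links if s == sense and l == lang]

print(equivalents("bank_1_en", "es"))  # -> ['banco_1']
print(equivalents("bank_2_en", "nl"))  # -> [] (a gap with no direct translation)
```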
Mexican International Conference on Artificial Intelligence | 2010
Iria da Cunha; Eric SanJuan; Juan-Manuel Torres-Moreno; Marina Lloberes; Irene Castellón
Nowadays discourse parsing is a very prominent research topic; however, no discourse parser exists for Spanish texts. The first stage in developing such a tool is discourse segmentation. In this work we present DiSeg, the first discourse segmenter for Spanish, which uses the framework of Rhetorical Structure Theory and is based on lexical and syntactic rules. We describe the system and evaluate its performance against a gold-standard corpus, obtaining promising results.
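A minimal, purely illustrative segmenter in the spirit of a lexical-rule approach can be sketched as follows. The marker list and rule are invented for the example and are not DiSeg's actual rule set:

```python
import re

# Propose a segment boundary before each of a small list of Spanish
# discourse markers (toy rule set, for illustration only).
MARKERS = ["aunque", "porque", "sin embargo", "por lo tanto"]
pattern = re.compile(r"\s*,?\s*\b(" + "|".join(MARKERS) + r")\b", re.IGNORECASE)

def segment(sentence):
    """Split at each marker and return the resulting discourse segments."""
    pieces = pattern.split(sentence)
    # re.split keeps the captured markers; reattach each to the text after it.
    segments = [pieces[0].strip()]
    for marker, rest in zip(pieces[1::2], pieces[2::2]):
        segments.append((marker + rest).strip())
    return [s for s in segments if s]

print(segment("Llegó tarde porque perdió el tren, aunque salió pronto."))
# -> ['Llegó tarde', 'porque perdió el tren', 'aunque salió pronto.']
```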
Conference on Applied Natural Language Processing | 1992
Alicia Ageno; Irene Castellón; Maria Antònia Martí; German Rigau; Francesc Ribas; Horacio Rodríguez; Mariona Taulé; Felisa Verdejo
Knowledge acquisition constitutes a major problem in the development of real knowledge-based systems, and it has been dealt with in a variety of ways. One of the most promising paradigms is based on the use of already existing sources in order to extract knowledge from them semi-automatically, which is then used in knowledge-based applications. The Acquilex project, within which we are working, follows this paradigm. The basic aim of Acquilex is the development of techniques and methods for using Machine Readable Dictionaries (MRDs) to build lexical components for Natural Language Processing systems. SEISD (Sistema de Extracción de Información Semántica de Diccionarios) is an environment for extracting semantic information from MRDs [Ageno et al. 91b]. The system takes as its input a Lexical Database (LDB) where all the information contained in the MRD has been stored in a structured format. The extraction process is not fully automatic: to some extent, the choices made by the system must be validated and confirmed by a human expert, so an interactive environment is needed to perform this task. One of the main contributions of our system lies in the way it guides the interactive process, focusing on the choice points and providing access to the information relevant to decision making. System performance is controlled by a set of weighted heuristics that compensates for the lack or vagueness of algorithmic criteria at several crucial decision points. The most important characteristics of our system can be summarized as follows:
• An underlying methodology for semantic extraction from lexical sources has been developed, taking into account the characteristics of the LDB and the semantic features intended to be extracted.
• The environment has been conceived as a support for the methodology.
• The environment allows both interactive and batch modes of operation.
• Great attention has been paid to reusability.
The design and implementation of the system has involved an intensive
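The weighted-heuristics control described above can be illustrated with a small sketch. The heuristics, weights, and candidate features below are invented for the example, not SEISD's actual criteria:

```python
# Toy illustration: rank candidate choices at a decision point with a set
# of weighted heuristics; a human expert then validates the top proposal.
HEURISTICS = [
    # (weight, function scoring a candidate in [0, 1]) -- all invented
    (0.5, lambda c: 1.0 if c["same_domain"] else 0.0),
    (0.3, lambda c: c["frequency_score"]),
    (0.2, lambda c: 1.0 if c["first_sense"] else 0.0),
]

def rank(candidates):
    """Order candidates by their weighted heuristic score, best first."""
    def score(c):
        return sum(w * h(c) for w, h in HEURISTICS)
    return sorted(candidates, key=score, reverse=True)

candidates = [
    {"id": "sense_1", "same_domain": True, "frequency_score": 0.4, "first_sense": True},
    {"id": "sense_2", "same_domain": False, "frequency_score": 0.9, "first_sense": False},
]
print(rank(candidates)[0]["id"])  # -> sense_1
```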
Computers and The Humanities | 1998
Piek Vossen; Laura Bloksma; Antonietta Alonge; Elisabetta Marinai; Carol Peters; Irene Castellón; Antònia Marti; German Rigau
This paper describes how the EuroWordNet project established a maximum level of consensus in the interpretation of relations without losing the possibility of encoding language-specific lexicalizations. Problematic cases arise because each site re-used different resources and because the core vocabulary of the wordnets shows complex properties. Many of these cases are discussed with respect to language-internal and equivalence relations. Possible solutions are given in the form of additional criteria.
Lecture Notes in Computer Science | 2002
Laura Alonso; Irene Castellón; Karina Gibert; Lluís Padró
The problem of capturing discourse structure for complex NLP tasks has often been addressed by exploiting surface clues that can yield a partial structure of discourse. Discourse Markers (DMs) are among the most popular of these clues because they are both highly informative of discourse structure and have a very low processing cost. However, they present two main problems: first, there is a general lack of consensus about their appropriate characterisation for NLP applications, and secondly, their potential as an inexpensive source of discourse knowledge is weakened by the fact that the information associated with them is usually hand-encoded. In this paper we show how a combination of clustering techniques provides empirical evidence for a characterisation of DMs. This data-driven methodology yields generalisations that help reduce the cost of encoding the information associated with DMs, while increasing the consistency of their characterisation.
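The data-driven idea can be pictured with a toy clustering of markers by feature vectors. The features and values below are invented, and the single-link procedure is only one simple choice of clustering technique:

```python
from math import dist

# Invented feature vectors for four Spanish DMs (e.g. continuation score,
# revision score, syntactic mobility); clusters then suggest a grouping.
dms = {
    "además":      (0.90, 0.10, 0.7),
    "asimismo":    (0.85, 0.15, 0.6),
    "sin embargo": (0.10, 0.90, 0.5),
    "no obstante": (0.15, 0.85, 0.4),
}

def single_link_clusters(points, threshold):
    """Greedy single-link clustering: merge groups with any pair closer than threshold."""
    clusters = [{name} for name in points]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if any(dist(points[a], points[b]) < threshold
                       for a in clusters[i] for b in clusters[j]):
                    clusters[i] |= clusters.pop(j)
                    merged = True
                    break
            if merged:
                break
    return [sorted(c) for c in clusters]

print(single_link_clusters(dms, threshold=0.3))
# -> [['además', 'asimismo'], ['no obstante', 'sin embargo']]
```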
Digital Scholarship in the Humanities | 2016
Elisabet Comelles; Victoria Arranz; Irene Castellón
Machine translation (MT) has become increasingly important and popular in the past decade, leading to the development of MT evaluation metrics that aim to assess MT output automatically. Most of these metrics use reference translations to compare system output; they should therefore not only detect MT errors but also be able to identify correct equivalent expressions, so as not to penalize them when they do not appear in the reference translations. With the aim of improving MT evaluation metrics, a study has been carried out of a wide panorama of linguistic features and their implications. For that purpose, Spanish and English corpora containing hypothesis and reference translations have been analysed from a linguistic point of view, so that common errors can be detected and positive equivalences highlighted. This article focuses on this qualitative analysis, describing the linguistic phenomena that should be considered when developing an automatic MT evaluation metric. The results of this analysis have been used to develop an automatic MT evaluation metric that takes into account different dimensions of language. A brief review of the metric and its evaluation is also provided.
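The point about not penalizing correct equivalent expressions can be shown with a deliberately small reference-based metric: token F1 with an invented synonym table standing in for the kind of linguistic knowledge the article argues for. This is not the article's metric, only a sketch of the principle:

```python
# Toy equivalence table (invented); a real metric would draw on richer
# linguistic resources.
SYNONYMS = {"car": "automobile", "begin": "start"}

def normalize(tokens):
    return [SYNONYMS.get(t.lower(), t.lower()) for t in tokens]

def token_f1(hypothesis, reference):
    """Token-overlap F1 between hypothesis and reference after normalization."""
    hyp, ref = normalize(hypothesis.split()), normalize(reference.split())
    overlap = sum(min(hyp.count(t), ref.count(t)) for t in set(hyp))
    if not overlap:
        return 0.0
    precision, recall = overlap / len(hyp), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

# 'car' is not penalized even though the reference says 'automobile'.
print(round(token_f1("the car stopped", "the automobile stopped"), 2))  # -> 1.0
```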
RLA: Revista de Lingüística Teórica y Aplicada | 2012
Irene Castellón; Salvador Climent; Marta Coll-Florit; Marina Lloberes; German Rigau
This article details the methodology and development of the project for the semantic disambiguation of the argumental heads of SenSem, a balanced corpus consisting of 100 sentences for each of the 250 most frequent Spanish verbs. The result, together with earlier developments of the project, is a corpus richly annotated with syntactic and semantic information, connected to a database that gathers the relevant information for each verb sense, making the resulting resource suitable for verb-centred empirical studies. As an outcome of the process, the article also presents a critical analysis of the Spanish WordNet 1.6 as a resource for lexical-semantic corpus annotation, together with a set of annotation guidelines, both useful for similar WordNet-based tagging tasks.
Corpus Linguistics and Linguistic Theory | 2018
Lara Gil-Vallejo; Marta Coll-Florit; Irene Castellón; Jordi Turmo
Similarity, which plays a key role in fields like cognitive science, psycholinguistics and natural language processing, is a broad and multifaceted concept. In this work we analyse how two approaches that belong to different perspectives, the corpus view and the psycholinguistic view, articulate similarity between verb senses in Spanish. Specifically, we compare the similarity between verb senses based on their argument structure, which is captured through semantic roles, with their similarity defined by word associations. We address the question of whether verb argument structure, which reflects the expression of the events, and word associations, which are related to the speakers’ organization of the mental lexicon, shape similarity between verbs in a congruent manner, a topic which has not been explored previously. While we find significant correlations between verb sense similarities obtained from these two approaches, our findings also highlight some discrepancies between them and the importance of the degree of abstraction of the corpus annotation and psycholinguistic representations.
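One simple way to test whether two similarity views agree is to rank-correlate the scores they assign to the same verb-sense pairs. The pairs and scores below are invented, and Spearman's rho is used here only as an illustrative choice of correlation:

```python
# Invented similarity scores for the same verb-sense pairs under two views.
pairs = ["comprar-vender", "comprar-pagar", "comprar-dormir"]
corpus_sim = [0.8, 0.6, 0.1]   # e.g. semantic-role overlap
assoc_sim  = [0.7, 0.5, 0.2]   # e.g. word-association strength

def ranks(values):
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(xs, ys):
    """Spearman's rho via the difference-of-ranks formula (assumes no ties)."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

print(spearman(corpus_sim, assoc_sim))  # -> 1.0 (identical orderings)
```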
Conference of the European Chapter of the Association for Computational Linguistics | 2003
Antoni Oliver; Irene Castellón; Lluís Màrquez
This paper presents a methodology for the automatic acquisition of lexical and morpho-syntactic information from raw corpora. The system uses information about inflectional morphology declared by rules and is based on the co-occurrence of different forms of the same paradigm in the corpus. A direct application of this methodology gives very poor precision rates due to rule interaction between paradigms. We present a rule analysis algorithm that solves this problem, yielding considerably better precision rates, although recall decreases dramatically. Finally, we investigate some techniques to raise recall, achieving recall rates of around 67% with a precision of 92%.
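The paradigm co-occurrence idea can be sketched minimally: accept a candidate stem only if several of its predicted inflected forms actually occur in the corpus. The suffix list below is a tiny invented fragment of Spanish -ar verb morphology, not the paper's rule set, and the threshold is an arbitrary assumption:

```python
# Invented fragment of an -ar verb paradigm (present indicative + infinitive).
AR_SUFFIXES = ["o", "as", "a", "amos", "an", "ar"]

def predicted_forms(stem):
    return {stem + suf for suf in AR_SUFFIXES}

def accept(stem, corpus_tokens, min_forms=3):
    """Heuristic filter: require several distinct paradigm forms in the corpus."""
    attested = predicted_forms(stem) & set(corpus_tokens)
    return len(attested) >= min_forms

corpus = "cantamos cantan canta el perro ladra".split()
print(accept("cant", corpus))   # -> True  (cantamos, cantan, canta attested)
print(accept("ladr", corpus))   # -> False (only ladra attested)
```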