Publication


Featured research published by Mariona Taulé.


meeting of the association for computational linguistics | 2007

SemEval-2007 Task 09: Multilevel Semantic Annotation of Catalan and Spanish

Lluís Màrquez; Lluis Villarejo; Maria Antònia Martí; Mariona Taulé

In this paper we describe SemEval-2007 task number 9 (Multilevel Semantic Annotation of Catalan and Spanish). In this task, we aim at evaluating and comparing automatic systems for the annotation of several semantic linguistic levels for Catalan and Spanish. Three semantic levels are considered: noun sense disambiguation, named entity recognition, and semantic role labeling.


conference on applied natural language processing | 1992

SEISD: An environment for extraction of Semantic Information from on-line dictionaries

Alicia Ageno; Irene Castellón; Maria Antònia Martí; German Rigau; Francesc Ribas; Horacio Rodríguez; Mariona Taulé; Felisa Verdejo

Knowledge Acquisition constitutes a main problem as regards the development of real Knowledge-based systems. This problem has been dealt with in a variety of ways. One of the most promising paradigms is based on the use of already existing sources in order to extract knowledge from them semiautomatically, which will then be used in Knowledge-based applications. The Acquilex Project, within which we are working, follows this paradigm. The basic aim of Acquilex is the development of techniques and methods in order to use Machine Readable Dictionaries (MRD) for building lexical components for Natural Language Processing Systems. SEISD (Sistema de Extracción de Información Semántica de Diccionarios) is an environment for extracting semantic information from MRDs [Ageno et al. 91b]. The system takes as its input a Lexical Database (LDB) where all the information contained in the MRD has been stored in a structured format. The extraction process is not fully automatic. To some extent, the choices made by the system must be both validated and confirmed by a human expert. Thus, an interactive environment must be used for performing such a task. One of the main contributions of our system lies in the way it guides the interactive process, focusing on the choice points and providing access to the information relevant to decision making. System performance is controlled by a set of weighted heuristics that compensates for the lack of algorithmic criteria or their vagueness at several crucial decision points. We will now summarize the most important characteristics of our system:
• An underlying methodology for semantic extraction from lexical sources has been developed taking into account the characteristics of the LDB and the intended semantic features to be extracted.
• The Environment has been conceived as a support for the Methodology.
• The Environment allows both interactive and batch modes of performance.
• Great attention has been paid to reusability.
The design and implementation of the system has involved an intensive


Archive | 2009

First-mention definites: More than exceptional cases

Marta Recasens; M. Antònia Martí; Mariona Taulé

Traditional linguistic theories of definiteness have characterized the definite article in terms of uniqueness or familiarity, inclusiveness or identifiability. From this perspective, anaphoric uses of definite noun phrases (NPs) are seen as the paradigm case, while non-anaphoric or first-mention uses are treated as exceptions deserving no special attention. The main weaknesses of such an approach are its tendency to be based on constructed examples and its focus on one single language, English. When natural data is taken into account, classical treatments of definites collapse.


language resources and evaluation | 2012

Annotating the argument structure of deverbal nominalizations in Spanish

Aina Peris; Mariona Taulé

Over recent years, there has been a growing interest in the computational treatment of nominalized Noun Phrases due to the rich semantic information they contain. These Noun Phrases can be understood as verbal paraphrases and, just like them, they can also denote argument and thematic-role relations. This paper presents the methodology followed to annotate the argument structure of deverbal nominalizations in the Spanish AnCora-Es corpus. We focus on the automated annotation process that is mostly based on the semantic information specified in a verbal lexicon but also on the syntactic and semantic information annotated in the corpus. The heuristic rules that make use of this information rely on linguistic assumptions that are also evaluated as we evaluate the reliability of the automated process. The automated annotation was manually checked in order to ensure the accuracy of the final resource. We demonstrate its feasibility (77% F-measure) and show that it facilitates corpus annotation, which is always a time-consuming and costly process. The result is the enrichment of the AnCora-Es corpus with the argument structure and thematic roles of deverbal nominalizations. It is the first Spanish corpus with this kind of information that is freely available.


cross language evaluation forum | 2015

Language Variety Identification Using Distributed Representations of Words and Documents

Marc Franco-Salvador; Francisco Rangel; Paolo Rosso; Mariona Taulé; M. Antònia Martí

Language variety identification is an author profiling subtask which aims to detect lexical and semantic variations in order to classify different varieties of the same language. In this work we focus on the use of distributed representations of words and documents using the continuous Skip-gram model. We compare this model with three recent approaches: Information Gain Word-Patterns, TF-IDF graphs and Emotion-labeled Graphs, in addition to several baselines. We evaluate the models introducing the Hispablogs dataset, a new collection of Spanish blogs from five different countries: Argentina, Chile, Mexico, Peru and Spain. Experimental results show state-of-the-art performance in language variety identification. In addition, our empirical analysis provides interesting insights on the use of the evaluated approaches.


Archive | 2016

Linguistic Correlates of Text Quality from Childhood to Adulthood

Naymé Salas; Anna Llauradó; Cristina Castillo; Mariona Taulé; M. Antònia Martí

Holistic scoring of written texts is a highly favored procedure for evaluating text quality in both the teaching and research of writing. However, the text properties that educators take into account to perform those evaluations have rarely been investigated. In this paper we examined the extent to which a series of linguistic markers obtained from written narrative texts contributed to explaining variation in the holistic scores assigned by independent raters. The written texts were produced by 80 participants divided into four age groups (9-, 12-, and 16-year-olds, and adults), who were asked to write about the topic of a silent video showing conflicts at school. Linguistic markers were organized into three domains: syntactic complexity, cohesion, and vocabulary use. Our findings suggest that linguistic features are fundamental to perceptions of text quality in Spanish, though only a few text-based measures contributed significantly to the models for each age group. Educators took into account modality and genre constraints, and adjusted their criteria to the educational level of the writers.


international conference on web engineering | 2015

Spanish Treebank Annotation of Informal Non-standard Web Text

Mariona Taulé; M. Antònia Martí; Ann Bies; Montserrat Nofre; Aina Garí; Zhiyi Song; Stephanie M. Strassel; Joe Ellis

This paper presents the Latin American Spanish Discussion Forum Treebank (LAS-DisFo). This corpus consists of 50,291 words and 2,846 sentences that are part-of-speech tagged, lemmatized and syntactically annotated with constituents and functions. We describe how it was built and the methodology followed for its annotation, the annotation scheme and criteria applied for dealing with the most problematic phenomena commonly encountered in this kind of informal unedited web text. This is the first available Latin American Spanish corpus of non-standard language that has been morphologically and syntactically annotated. It is a valuable linguistic resource that can be used for the training and evaluation of parsers and PoS taggers.


international conference on computational linguistics | 2013

LIARc: Labeling Implicit ARguments in Spanish deverbal nominalizations

Aina Peris; Mariona Taulé; Horacio Rodríguez; Manuel Bertran Ibarz

This paper deals with the automatic identification and annotation of the implicit arguments of deverbal nominalizations in Spanish. We present the first version of the LIAR system, focusing on its classifier component. We have built a supervised, feature-based Machine Learning model that uses a subset of AnCora-Es as a training corpus. We have built four different models and the overall F-Measure is 89.9%, an increase of approximately 35 points over the baseline (55%). However, a detailed analysis of the feature performance is still needed. Future work will focus on using LIAR to automatically annotate the implicit arguments in the whole of AnCora-Es.


Computational Linguistics | 2012

Empirical methods for the study of denotation in nominalizations in Spanish

Aina Peris; Mariona Taulé; Horacio Rodríguez

This article deals with deverbal nominalizations in Spanish; concretely, we focus on the denotative distinction between event and result nominalizations. The goal of this work is twofold: first, to detect the most relevant features for this denotative distinction; and, second, to build an automatic classification system of deverbal nominalizations according to their denotation. We have based our study on theoretical hypotheses dealing with this semantic distinction and we have analyzed them empirically by means of Machine Learning techniques, which are the basis of the ADN-Classifier. This is the first tool that aims to automatically classify deverbal nominalizations into event, result, or underspecified denotation types in Spanish. The ADN-Classifier has helped us to quantitatively evaluate the validity of our claims regarding deverbal nominalizations. We set up a series of experiments in order to test the ADN-Classifier with different models and in different realistic scenarios depending on the knowledge resources and natural language processors available. The ADN-Classifier achieved good results (87.20% accuracy).


language resources and evaluation | 2018

SFU ReviewSP-NEG: a Spanish corpus annotated with negation for sentiment analysis. A typology of negation patterns

Salud María Jiménez-Zafra; Mariona Taulé; M. Teresa Martín-Valdivia; L. Alfonso Ureña-López; M. Antònia Martí

In this paper, we present SFU ReviewSP-NEG, the first freely available, wide-coverage Spanish corpus annotated with negation. We describe the methodology applied in the annotation of the corpus, including the tagset, the linguistic criteria and the inter-annotator agreement tests. We also include a complete typology of negation patterns in Spanish. This typology has the advantage that it is easy to express in terms of a tagset for corpus annotation: the types are clearly defined, which avoids ambiguity in the annotation process, and they provide wide coverage (i.e. they resolved all the cases occurring in the corpus). We use the SFU ReviewSP corpus as the base for the annotations. The corpus consists of 400 reviews, 221,866 words and 9455 sentences, out of which 3022 sentences contain at least one negation structure.

Collaboration


Dive into Mariona Taulé's collaborations.

Top Co-Authors

Aina Peris

University of Barcelona

Lluís Màrquez

Polytechnic University of Catalonia

Horacio Rodríguez

Polytechnic University of Catalonia

Paolo Rosso

Polytechnic University of Valencia

German Rigau

University of the Basque Country

Gerard Escudero

Polytechnic University of Catalonia
