Maria Antònia Martí
University of Barcelona
Publication
Featured research published by Maria Antònia Martí.
conference on computational natural language learning | 2009
Jan Hajič; Massimiliano Ciaramita; Richard Johansson; Daisuke Kawahara; Maria Antònia Martí; Lluís Màrquez; Adam Meyers; Joakim Nivre; Sebastian Padó; Jan Štěpánek; Pavel Straňák; Mihai Surdeanu; Nianwen Xue; Yi Zhang
For the 11th straight year, the Conference on Computational Natural Language Learning has been accompanied by a shared task whose purpose is to promote natural language processing applications and evaluate them in a standard setting. In 2009, the shared task was dedicated to the joint parsing of syntactic and semantic dependencies in multiple languages. This shared task combines the shared tasks of the previous five years under a unique dependency-based formalism similar to the 2008 task. In this paper, we define the shared task, describe how the data sets were created and show their quantitative properties, report the results and summarize the approaches of the participating systems.
Computational Linguistics | 2013
Alberto Barrón-Cedeño; Marta Vila; Maria Antònia Martí; Paolo Rosso
Although paraphrasing is the linguistic mechanism underlying many plagiarism cases, little attention has been paid to its analysis in the framework of automatic plagiarism detection. Therefore, state-of-the-art plagiarism detectors find it difficult to detect cases of paraphrase plagiarism. In this article, we analyze the relationship between paraphrasing and plagiarism, paying special attention to which paraphrase phenomena underlie acts of plagiarism and which of them are detected by plagiarism detection systems. With this aim in mind, we created the P4P corpus, a new resource that uses a paraphrase typology to annotate a subset of the PAN-PC-10 corpus for automatic plagiarism detection. The results of the Second International Competition on Plagiarism Detection were analyzed in the light of this annotation. The presented experiments show that (i) more complex paraphrase phenomena and a high density of paraphrase mechanisms make plagiarism detection more difficult, (ii) lexical substitutions are the paraphrase mechanisms used the most when plagiarizing, and (iii) paraphrase mechanisms tend to shorten the plagiarized text. For the first time, the paraphrase mechanisms behind plagiarism have been analyzed, providing critical insights for the improvement of automatic plagiarism detection systems.
Computers and the Humanities | 1998
Antonietta Alonge; Nicoletta Calzolari; Piek Vossen; Laura Bloksma; Irene Castellón; Maria Antònia Martí; Wim Peters
In this paper the linguistic design of the database under construction within the EuroWordNet project is described. This is mainly structured along the same lines as the Princeton WordNet, although some changes have been made to the WordNet overall design due to both theoretical and practical reasons. The most important reasons for such changes are the multilinguality of the EuroWordNet database and the fact that it is intended to be used in Language Engineering applications. Thus, i) some relations have been added to those identified in WordNet; ii) some labels have been identified which can be added to the relations in order to make their implications more explicit and precise; iii) some relations, already present in the WordNet design, have been modified in order to specify their role more clearly.
meeting of the association for computational linguistics | 2007
Lluís Màrquez; Lluis Villarejo; Maria Antònia Martí; Mariona Taulé
In this paper we describe SemEval-2007 task number 9 (Multilevel Semantic Annotation of Catalan and Spanish). In this task, we aim at evaluating and comparing automatic systems for the annotation of several semantic linguistic levels for Catalan and Spanish. Three semantic levels are considered: noun sense disambiguation, named entity recognition, and semantic role labeling.
international conference natural language processing | 2006
Montserrat Civit; Maria Antònia Martí; Núria Bufí
In this paper we present the conversion of two treebanks (Cat3LB for Catalan, and Cast3LB for Spanish) from their original constituent format into dependencies. The conversion is automatic, but it relies on manually written head and function tables. The process has also been used to improve the quality of the first annotation and to modify the annotation for further extensions of the treebanks. Treebanks in both formats are freely available for research purposes.
conference on applied natural language processing | 1992
Alicia Ageno; Irene Castellón; Maria Antònia Martí; German Rigau; Francesc Ribas; Horacio Rodríguez; Mariona Taulé; Felisa Verdejo
Knowledge Acquisition is a major problem in the development of real Knowledge-based systems. This problem has been dealt with in a variety of ways. One of the most promising paradigms is based on the use of already existing sources in order to extract knowledge from them semiautomatically, which will then be used in Knowledge-based applications. The Acquilex Project, within which we are working, follows this paradigm. The basic aim of Acquilex is the development of techniques and methods for using Machine Readable Dictionaries (MRDs) to build lexical components for Natural Language Processing Systems. SEISD (Sistema de Extracción de Información Semántica de Diccionarios) is an environment for extracting semantic information from MRDs [Ageno et al. 91b]. The system takes as its input a Lexical Database (LDB) where all the information contained in the MRD has been stored in a structured format. The extraction process is not fully automatic. To some extent, the choices made by the system must be both validated and confirmed by a human expert. Thus, an interactive environment must be used for performing such a task. One of the main contributions of our system lies in the way it guides the interactive process, focusing on the choice points and providing access to the information relevant to decision making. System performance is controlled by a set of weighted heuristics that compensates for the lack of algorithmic criteria, or their vagueness, at several crucial decision points. We will now summarize the most important characteristics of our system:
• An underlying methodology for semantic extraction from lexical sources has been developed, taking into account the characteristics of the LDB and the intended semantic features to be extracted.
• The Environment has been conceived as a support for the Methodology.
• The Environment allows both interactive and batch modes of performance.
• Great attention has been paid to reusability.
The design and implementation of the system has involved an intensive
conference on intelligent text processing and computational linguistics | 2004
Iulia Nica; Maria Antònia Martí; Andrés Montoyo; Sonia Vázquez
In this paper we propose a mixed method for Word Sense Disambiguation, which combines lexical knowledge from EuroWordNet (EWN) with corpora. The method tries to give a partial solution to the problem of the gap between lexicon and corpus by bringing the corpus closer to the lexicon. On the basis of the interaction that holds in natural language between the syntagmatic and the paradigmatic axes, we extract implicit paradigmatic information from the corpus. We then process the information thus obtained using the information, also paradigmatic, contained in EWN. We evaluate the method and interpret the results.
applications of natural language to data bases | 2004
Iulia Nica; Andrés Montoyo; Sonia Vázquez; Maria Antònia Martí
The increasing flow of information requires advanced free text filtering. An important part of this task consists in eliminating word occurrences with an inappropriate sense, which corresponds to a Word Sense Disambiguation (WSD) operation. In this paper we propose a completely automatic WSD method for Spanish – restricted to nouns – to be used as a module in a Natural Language Processing system for unlimited text. We call it the Commutative Test. This method exploits an adaptation of EuroWordNet, Sense Discriminators, that implicitly keeps all lexical-semantic relations of its nominal hierarchy. The only requirement is the availability of a large corpus and a part-of-speech tagger, without any need for prior sense tagging. The method has been evaluated on the Senseval test corpus. It can be easily adapted to other languages for which a corpus, a WordNet component and a part-of-speech tagger are available.
international conference natural language processing | 2004
Lluís Màrquez; Mariona Taulé; Lluís Padró; Luis Villarejo; Maria Antònia Martí
Word Sense Disambiguation (WSD) systems are usually evaluated by comparing their absolute performance, in a fixed experimental setting, to other alternative algorithms and methods. However, little attention has been paid to analyzing the lexical resources and the corpora defining the experimental settings and their possible interactions with the overall results obtained. In this paper we present some experiments supporting the hypothesis that the quality of lexical resources used for tagging the training corpora of WSD systems partly determines the quality of the results. In order to verify this initial hypothesis we have designed two kinds of experiments. At the linguistic level, we have tested the quality of lexical resources in terms of the degree of inter-annotator agreement. From the computational point of view, we have evaluated how those different lexical resources affect the accuracy of the resulting WSD classifiers. We have carried out these experiments using three different lexical resources as sense inventories and a fixed WSD system based on Support Vector Machines.
language resources and evaluation | 2008
Mariona Taulé; Maria Antònia Martí; Marta Recasens