Carlos Ramisch
Universidade Federal do Rio Grande do Sul
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Carlos Ramisch.
language resources and evaluation | 2010
Helena Medeiros de Caseli; Carlos Ramisch; Maria das Graças Volpe Nunes; Aline Villavicencio
Due to idiosyncrasies in their syntax, semantics or frequency, Multiword Expressions (MWEs) have received special attention from the NLP community, as the methods and techniques developed for the treatment of simplex words are not necessarily suitable for them. This is certainly the case for the automatic acquisition of MWEs from corpora. A lot of effort has been directed to the task of automatically identifying them, with considerable success. In this paper, we propose an approach for the identification of MWEs in a multilingual context, as a by-product of a word alignment process, that not only deals with the identification of possible MWE candidates, but also associates some multiword expressions with semantics. The results obtained indicate the feasibility and low costs in terms of tools and resources demanded by this approach, which could, for example, facilitate and speed up lexicographic work.
processing of the portuguese language | 2010
Carlos Ramisch; Helena de Medeiros Caseli; Aline Villavicencio; André Machado; Maria José Bocorny Finatto
Considerable attention has been given to the problem of Multiword Expression (MWE) identification and treatment, for NLP tasks like parsing and generation, to improve the quality of results. Statistical methods have been often employed for MWE identification, as an inexpensive and language independent way of finding co-occurrence patterns. On the other hand, more linguistically motivated methods for identification, which employ information such as POS filters and lexical alignment between languages, can produce more targeted candidate lists. In this paper we propose a hybrid approach that combines the strenghts of different sources of information using a machine learning algorithm to produce more robust and precise results. Automatic evaluation on gold standards shows that the performance of our hybrid method is superior to the individual results of statistical and alignment-based MWE extraction approaches for Portuguese and for English. This method can be used to aid lexicographic work by providing a more targeted MWE candidate list.
conference on computational natural language learning | 2008
Carlos Ramisch; Aline Villavicencio; Leonardo F. S. Moura; Marco Idiart
This paper investigates, in a first stage, some methods for the automatic acquisition of verb-particle constructions (VPCs) taking into account their statistical properties and some regular patterns found in productive combinations of verbs and particles. Given the limited coverage provided by lexical resources, such as dictionaries, and the constantly growing number of VPCs, possible ways of automatically identifying them are crucial for any NLP task that requires some degree of semantic interpretation. In a second stage we also study whether the combination of statistical and linguistic properties can provide some indication of the degree of idiomaticity of a given VPC. The results obtained show that such combination can successfully be used to detect VPCs and distinguish idiomatic from compositional cases.
meeting of the association for computational linguistics | 2016
Silvio Cordeiro; Carlos Ramisch; Marco Idiart; Aline Villavicencio
Distributional semantic models (DSMs) are often evaluated on artificial similarity datasets containing single words or fully compositional phrases. We present a large-scale multilingual evaluation of DSMs for predicting the degree of semantic compositionality of nominal compounds on 4 datasets for English and French. We build a total of 816 DSMs and perform 2,856 evaluations using word2vec, GloVe, and PPMI-based models. In addition to the DSMs, we compare the impact of different parameters, such as level of corpus preprocessing, context window size and number of dimensions. The results obtained have a high correlation with human judgments, being comparable to or outperforming the state of the art for some datasets (Spearmans ρ=.82 for the Reddy dataset).
empirical methods in natural language processing | 2014
Muntsa Padró; Marco Idiart; Aline Villavicencio; Carlos Ramisch
Much attention has been given to the impact of informativeness and similarity measures on distributional thesauri. We investigate the effects of context filters on thesaurus quality and propose the use of cooccurrence frequency as a simple and inexpensive criterion. For evaluation, we measure thesaurus agreement with WordNet and performance in answering TOEFL-like questions. Results illustrate the sensitivity of distributional thesauri to filters.
north american chapter of the association for computational linguistics | 2015
Alexandre C. Rondon; Helena de Medeiros Caseli; Carlos Ramisch
This paper introduces NEMWEL, a system that performs Never-Ending MultiWord Expressions Learning. Instead of using a static corpus and classifier, NEMWEL applies supervised learning on automatically crawled news texts. Moreover, it uses its own results to periodically retrain the classifier, bootstrapping on its own results. In addition to a detailed description of the system’s architecture and its modules, we report the results of a manual evaluation. It shows that NEMWEL is capable of learning new expressions over time with improved precision.
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017) | 2017
Manon Scholivet; Carlos Ramisch
We present a simple and efficient tagger capable of identifying highly ambiguous multiword expressions (MWEs) in French texts. It is based on conditional random fields (CRF), using local context information as features. We show that this approach can obtain results that, in some cases, approach more sophisticated parser-based MWE identification methods without requiring syntactic trees from a tree-bank. Moreover, we study how well the CRF can take into account external information coming from a lexicon.
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017) | 2017
Natalie Vargas; Carlos Ramisch; Helena de Medeiros Caseli
We propose a method for joint unsu-pervised discovery of multiword expressions (MWEs) and their translations from parallel corpora. First, we apply independent monolingual MWE extraction in source and target languages simultaneously. Then, we calculate translation probability , association score and distributional similarity of co-occurring pairs. Finally, we rank all translations of a given MWE using a linear combination of these features. Preliminary experiments on light verb constructions show promising results.
north american chapter of the association for computational linguistics | 2016
Silvio Cordeiro; Carlos Ramisch; Aline Villavicencio
This paper presents our approach towards the SemEval-2016 Task 10 - Detecting Minimal Semantic Units and their Meanings. Systems are expected to provide a representation of lexical semantics by (1) segmenting tokens into words and multiword units and (2) providing a supersense tag for segments that function as nouns or verbs. Our pipeline rule-based system uses no external resources and was implemented using the mwetoolkit. First, we extract and filter known MWEs from the training corpus. Second, we group input tokens of the test corpus based on this lexicon, with special treatment for non-contiguous expressions. Third, we use an MWE-aware predominant-sense heuristic for supersense tagging. We obtain an F-score of 51.48% for MWE identification and 49.98% for supersense tagging.
meeting of the association for computational linguistics | 2016
Carlos Ramisch; Silvio Cordeiro; Aline Villavicencio
This paper analyzes datasets with numerical scores that quantify the semantic compositionality of MWEs. We present the results of our analysis of crowdsourced compositionality judgments for noun compounds in three languages. Our goals are to look at the characteristics of the annotations in different languages; to examine intrinsic quality measures for such data; and to measure the impact of filters proposed in the literature on these measures. The cross-lingual results suggest that greater agreement is found for the extremes in the compositionality scale, and that outlier annotation removal is more effective than outlier annotator removal.