German Rigau
University of the Basque Country
Publications
Featured research published by German Rigau.
international conference on computational linguistics | 1996
Eneko Agirre; German Rigau
This paper presents a method for the resolution of lexical ambiguity of nouns and its automatic evaluation over the Brown Corpus. The method relies on the use of the wide-coverage noun taxonomy of WordNet and the notion of conceptual distance among concepts, captured by a Conceptual Density formula developed for this purpose. This fully automatic method requires no hand-coding of lexical entries, no hand-tagging of text, nor any kind of training process. The results of the experiments have been automatically evaluated against SemCor, the sense-tagged version of the Brown Corpus.
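The Conceptual Density idea can be pictured compactly: a candidate subhierarchy scores the sum of nhyp^i for i = 0..m-1, divided by its number of descendant concepts, where m is how many context-word senses the subhierarchy covers and nhyp is the mean number of hyponyms per node. Below is a minimal, hypothetical sketch using NLTK's WordNet interface; the constant nhyp, the climbing depth, and the toy context are illustrative assumptions, not the paper's implementation (which used WordNet 1.4 and a per-subhierarchy estimate of nhyp).

```python
# Illustrative sketch only: a Conceptual Density-style score over NLTK's
# WordNet.  nhyp is a guessed constant here.
from nltk.corpus import wordnet as wn

def descendants(synset):
    """All synsets in the subhierarchy below `synset`."""
    below = set()
    for hypo in synset.hyponyms():
        below.add(hypo)
        below |= descendants(hypo)
    return below

def conceptual_density(root, context_words, nhyp=1.5):
    """CD = sum_{i<m} nhyp**i / |descendants|, with m the number of
    context-word noun senses falling inside the subhierarchy."""
    sub = descendants(root) | {root}
    m = sum(1 for w in context_words
            for s in wn.synsets(w, pos='n') if s in sub)
    if m == 0:
        return 0.0
    return sum(nhyp ** i for i in range(m)) / len(sub)

def score_sense(sense, context_words, levels=4):
    """Climb the hierarchy, keeping the densest subhierarchy found."""
    frontier, best = [sense], 0.0
    for _ in range(levels):
        best = max([best] + [conceptual_density(r, context_words)
                             for r in frontier])
        frontier = [h for s in frontier for h in s.hypernyms()]
    return best

# Toy disambiguation of "bank": context words near one sense's region of
# the taxonomy should pull the choice toward that sense.
context = ["hillside", "ascent"]
best = max(wn.synsets("bank", pos='n'), key=lambda s: score_sense(s, context))
print(best.name(), best.definition())
```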
european conference on machine learning | 2000
Gerard Escudero; Lluís Màrquez; German Rigau
In this paper, Schapire and Singer's AdaBoost.MH boosting algorithm is applied to the Word Sense Disambiguation (WSD) problem. Initial experiments on a set of 15 selected polysemous words show that the boosting approach surpasses Naive Bayes and exemplar-based approaches, which represent state-of-the-art accuracy on supervised WSD. In order to make boosting practical for a real learning domain of thousands of words, several ways of accelerating the algorithm by reducing the feature space are studied. The best variant, which we call LazyBoosting, is tested on the largest sense-tagged corpus available, containing 192,800 examples of the 191 most frequent and ambiguous English words. Again, boosting compares favourably to the other benchmark algorithms.
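The LazyBoosting trick is easy to state: at each boosting round, only a small random fraction of the attributes is inspected when choosing the weak rule. The sketch below shows the idea on binary labels with one-feature decision stumps; it is an illustrative reduction, not the paper's multiclass AdaBoost.MH implementation, and all names and constants are assumptions.

```python
# Illustrative sketch of the LazyBoosting idea: AdaBoost with one-feature
# decision stumps, inspecting only a random sample of features per round.
import numpy as np

def lazy_boost(X, y, rounds=50, sample_frac=0.1, seed=0):
    """X: (n, d) binary feature matrix; y: labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.full(n, 1.0 / n)                  # example weights
    stumps = []                              # (feature, polarity, alpha)
    k = max(1, int(sample_frac * d))
    for _ in range(rounds):
        feats = rng.choice(d, size=k, replace=False)   # the "lazy" step
        best = None
        for f in feats:
            for pol in (+1, -1):
                pred = np.where(X[:, f] == 1, pol, -pol)
                err = w[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, f, pol, pred)
        err, f, pol, pred = best
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)          # weak-rule weight
        w *= np.exp(-alpha * y * pred)                 # reweight examples
        w /= w.sum()
        stumps.append((f, pol, alpha))
    return stumps

def predict(stumps, X):
    score = sum(a * np.where(X[:, f] == 1, p, -p) for f, p, a in stumps)
    return np.sign(score)

# Toy usage: feature 3 fully determines the label.
rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(200, 40))
y = np.where(X[:, 3] == 1, 1, -1)
print((predict(lazy_boost(X, y), X) == y).mean())      # close to 1.0
```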
international conference on computational linguistics | 2014
Eneko Agirre; Carmen Banea; Claire Cardie; Daniel M. Cer; Mona T. Diab; Aitor Gonzalez-Agirre; Weiwei Guo; Rada Mihalcea; German Rigau; Janyce Wiebe
In Semantic Textual Similarity, systems rate the degree of semantic equivalence between two text snippets. This year, the participants were challenged with new data sets for English, as well as the introduction of Spanish as a new language in which to assess semantic similarity. For the English subtask, we exposed the systems to a diversity of testing scenarios by preparing additional OntoNotes-WordNet sense mappings and news headlines, as well as introducing new genres, including image descriptions, DEFT discussion forums, DEFT newswire, and tweet-newswire headline mappings. For Spanish, since, to our knowledge, this is the first time that official evaluations have been conducted, we used well-formed text, featuring sentences extracted from encyclopedic content and newswire. The annotations for both tasks leveraged crowdsourcing. The Spanish subtask engaged 9 teams participating with 22 system runs, and the English subtask attracted 15 teams with 38 system runs.
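The task interface itself is simple: a system maps a sentence pair to a graded 0-5 similarity score, which the evaluation then correlates with the crowdsourced gold ratings. As a purely illustrative stand-in for a participating system, here is a trivial token-overlap baseline on that interface:

```python
# Illustrative baseline on the STS interface: map a sentence pair to a
# 0-5 score.  Token overlap stands in for a real system.
def sts_score(s1: str, s2: str) -> float:
    t1, t2 = set(s1.lower().split()), set(s2.lower().split())
    if not t1 or not t2:
        return 0.0
    jaccard = len(t1 & t2) / len(t1 | t2)    # overlap in [0, 1]
    return 5.0 * jaccard                     # rescale to the 0-5 STS scale

print(sts_score("A man is playing a guitar.",
                "A man plays the guitar."))  # crude: no lemmas, punctuation kept
```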
meeting of the association for computational linguistics | 1997
German Rigau; Jordi Atserias; Eneko Agirre
This paper presents a method to combine a set of unsupervised algorithms that can accurately disambiguate word senses in a large, completely untagged corpus. Although most of the techniques for word sense resolution have been presented as stand-alone, it is our belief that full-fledged lexical ambiguity resolution should combine several information sources and techniques. The set of techniques has been applied in a combined way to disambiguate the genus terms of two machine-readable dictionaries (MRDs), enabling us to construct complete taxonomies for Spanish and French. Tested accuracy is above 80% overall and 95% for two-way ambiguous genus terms, showing that taxonomy building is not limited to structured dictionaries such as LDOCE.
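The combination scheme can be pictured as weighted voting among stand-alone heuristics that may each abstain. The sketch below is a hypothetical minimal version: the two toy heuristics stand in for the paper's unsupervised methods, and the weights are arbitrary.

```python
# Illustrative combination by weighted voting; the heuristics here are
# hypothetical stand-ins for the paper's unsupervised methods.
from collections import Counter

def combine(heuristics, word, context):
    """heuristics: list of (function, weight); each function returns a
    sense identifier or None (abstain)."""
    votes = Counter()
    for h, weight in heuristics:
        sense = h(word, context)
        if sense is not None:
            votes[sense] += weight
    return votes.most_common(1)[0][0] if votes else None

first_sense = lambda w, c: f"{w}#1"                    # most-frequent-sense proxy
money_cue   = lambda w, c: f"{w}#2" if "money" in c else None
print(combine([(first_sense, 1.0), (money_cue, 2.0)], "bank", ["money"]))
```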
meeting of the association for computational linguistics | 2000
Jordi Daudé; Lluís Padró; German Rigau
We present a robust approach for linking already existing lexical/semantic hierarchies. We use a constraint satisfaction algorithm (relaxation labeling) to select, among a set of candidates, the node in a target taxonomy that best matches each node in a source taxonomy. In particular, we use it to map the nominal part of WordNet 1.5 onto WordNet 1.6, with very high precision and very low remaining ambiguity.
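Relaxation labeling itself follows a simple iterative update: each source node holds a probability distribution over candidate target nodes, and each iteration multiplies the probabilities by (1 + support) from neighbouring constraints and renormalizes. A minimal, assumption-laden sketch (the real constraints come from hypernym/hyponym structure, not the fixed toy support used here):

```python
# Illustrative relaxation-labelling core: multiply each candidate's
# probability by (1 + support) and renormalize, iterating to a fixpoint.
import numpy as np

def relax(p, support, iters=20):
    """p: dict node -> probability vector over its candidate targets.
    support(node, p) -> support vector (>= -1) for that node's candidates,
    normally computed from neighbouring nodes' current probabilities."""
    for _ in range(iters):
        new = {}
        for node, probs in p.items():
            upd = probs * (1.0 + support(node, p))   # reinforce supported labels
            new[node] = upd / upd.sum()
        p = new
    return p

# Toy: one node, two candidate targets; constraints favour candidate 0.
p0 = {"a": np.array([0.5, 0.5])}
toy_support = lambda node, p: np.array([0.8, -0.2])
print(relax(p0, toy_support)["a"])          # mass shifts toward candidate 0
```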
north american chapter of the association for computational linguistics | 2015
Eneko Agirre; Carmen Banea; Claire Cardie; Daniel M. Cer; Mona T. Diab; Aitor Gonzalez-Agirre; Weiwei Guo; Iñigo Lopez-Gazpio; Montse Maritxalar; Rada Mihalcea; German Rigau; Larraitz Uria; Janyce Wiebe
In semantic textual similarity (STS), systems rate the degree of semantic equivalence between two text snippets. This year, the participants were challenged with new datasets in English and Spanish. The annotations for both subtasks leveraged crowdsourcing. The English subtask attracted 29 teams with 74 system runs, and the Spanish subtask engaged 7 teams participating with 16 system runs. In addition, this year we ran a pilot task on interpretable STS, where systems needed to add an explanatory layer: they had to align the chunks in the sentence pair, explicitly annotating the kind of relation and the score of each chunk pair. The training and test data were manually annotated by an expert and included headline and image sentence pairs from previous years. 7 teams participated with 29 runs.
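The interpretable-STS output format can be illustrated with a toy greedy aligner: pair up chunks by similarity, then attach a relation label and a 0-5 score to each aligned pair. The similarity function and thresholds below are placeholders, and the labels only loosely follow the task's tag set.

```python
# Illustrative greedy chunk aligner for the interpretable-STS format.
# Similarity, thresholds, and labels are toy placeholders.
def align_chunks(chunks1, chunks2, sim):
    pairs = sorted(((sim(c1, c2), c1, c2)
                    for c1 in chunks1 for c2 in chunks2), reverse=True)
    used1, used2, alignment = set(), set(), []
    for score, c1, c2 in pairs:
        if c1 in used1 or c2 in used2:
            continue
        used1.add(c1); used2.add(c2)
        label = "EQUI" if score > 0.8 else "SIMI" if score > 0.3 else "NOALI"
        alignment.append((c1, c2, label, round(5 * score, 1)))
    return alignment

token_sim = lambda a, b: len(set(a.split()) & set(b.split())) / max(
    len(set(a.split()) | set(b.split())), 1)
print(align_chunks(["a man", "plays guitar"],
                   ["the man", "is playing a guitar"], token_sim))
```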
empirical methods in natural language processing | 2000
Gerard Escudero; Lluís Màrquez; German Rigau
This paper describes a set of experiments carried out to explore the domain dependence of alternative supervised Word Sense Disambiguation algorithms. The aim of the work is threefold: studying the performance of these algorithms when tested on a corpus different from the one they were trained on; exploring their ability to tune to new domains; and demonstrating empirically that the LazyBoosting algorithm outperforms state-of-the-art supervised WSD algorithms in both of the previous situations.
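The experimental protocol amounts to a train-on-one-domain, test-on-another loop, plus a tuning condition that mixes in some target-domain data. A schematic sketch with a deliberately trivial stand-in learner (everything here is illustrative):

```python
# Schematic cross-corpus protocol with a trivial majority-label learner;
# corpora are toy (example, label) lists, everything illustrative.
from collections import Counter

def train(examples):
    majority = Counter(label for _, label in examples).most_common(1)[0][0]
    return lambda x: majority                     # placeholder classifier

def accuracy(model, examples):
    return sum(model(x) == label for x, label in examples) / len(examples)

source_train = [("s", "A")] * 80 + [("s", "B")] * 20
target_train = [("t", "B")] * 90 + [("t", "A")] * 10
target_test  = [("t", "B")] * 8  + [("t", "A")] * 2

model = train(source_train)
print("transfer:", accuracy(model, target_test))  # hurt by the domain shift
tuned = train(source_train + target_train)        # "tuning": add target data
print("tuned:   ", accuracy(tuned, target_test))  # recovers on the new domain
```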
Journal of Artificial Intelligence Research | 2005
Andrés Montoyo; Armando Suárez; German Rigau; Manuel Palomar
In this paper we concentrate on the resolution of the lexical ambiguity that arises when a given word has several different meanings. This specific task is commonly referred to as word sense disambiguation (WSD). The task of WSD consists of assigning the correct sense to words using an electronic dictionary as the source of word definitions. We present two WSD methods based on the two main methodological approaches in this research area: a knowledge-based method and a corpus-based method. Our hypothesis is that word sense disambiguation requires several knowledge sources in order to solve the semantic ambiguity of words. These sources can be of different kinds: for example, syntagmatic, paradigmatic, or statistical information. Our approach combines various sources of knowledge through combinations of the two WSD methods mentioned above. The paper concentrates mainly on how to combine these methods and sources of information in order to achieve good disambiguation results. Finally, it presents a comprehensive study and experimental work on the evaluation of the methods and their combinations.
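A minimal way to picture such a combination is to let a knowledge-based scorer and a corpus-based scorer each rank the candidate senses, then mix their normalized scores. The sketch below assumes NLTK's WordNet as the electronic dictionary, a simplified Lesk overlap as the knowledge-based side, and WordNet's SemCor-derived sense counts as a stand-in for a trained statistical model; the linear mix is an illustrative choice, not the paper's scheme.

```python
# Illustrative combination of a knowledge-based and a corpus-based sense
# scorer; the linear mix and all constants are assumptions.
from nltk.corpus import wordnet as wn

def lesk_score(sense, context):
    """Knowledge-based side: gloss/context overlap (simplified Lesk)."""
    gloss = set(sense.definition().lower().split())
    return len(gloss & {w.lower() for w in context})

def corpus_score(sense):
    """Corpus-based stand-in: SemCor-derived sense frequencies shipped
    with WordNet play the role of a trained statistical model."""
    return sum(l.count() for l in sense.lemmas())

def disambiguate(word, context, alpha=0.5):
    senses = wn.synsets(word, pos='n')
    def norm(scores):
        total = sum(scores) or 1.0
        return [s / total for s in scores]
    kb = norm([lesk_score(s, context) for s in senses])
    cb = norm([corpus_score(s) for s in senses])
    mixed = [alpha * k + (1 - alpha) * c for k, c in zip(kb, cb)]
    return senses[mixed.index(max(mixed))]

print(disambiguate("bank", ["money", "deposit", "account"]))
```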
north american chapter of the association for computational linguistics | 2016
Eneko Agirre; Carmen Banea; Daniel M. Cer; Mona T. Diab; Aitor Gonzalez-Agirre; Rada Mihalcea; German Rigau; Janyce Wiebe
Paper presented at the 10th International Workshop on Semantic Evaluation (SemEval-2016), held on June 16-17, 2016, in San Diego, California.
empirical methods in natural language processing | 2006
Montse Cuadros; German Rigau
This paper presents an empirical evaluation of the quality of publicly available large-scale knowledge resources. The study includes a wide range of manually and automatically derived large-scale knowledge resources. In order to establish a fair and neutral comparison, the quality of each knowledge resource is indirectly evaluated using the same method on a Word Sense Disambiguation task. The evaluation framework selected is the Senseval-3 English Lexical Sample Task. The study empirically demonstrates that automatically acquired knowledge resources surpass manually derived ones in both precision and recall, and that the combination of the knowledge contained in these resources comes very close to the most frequent sense classifier. As far as we know, this is the first time such a quality assessment has been performed, providing a clear picture of the current state of the art in publicly available wide-coverage semantic resources.
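The indirect-evaluation setup can be sketched as follows: each resource contributes a bag of related words per sense, a fixed resource-agnostic WSD rule scores senses by overlap with the context, and precision/recall on a sense-tagged sample then reflects the quality of the resource rather than the algorithm. The data structures and the toy example below are assumptions.

```python
# Illustrative indirect evaluation: a fixed overlap-based WSD rule is
# applied to whatever "topic signatures" a resource provides, so the
# resulting precision/recall reflects the resource, not the algorithm.
def signature_wsd(signatures, word, context):
    """signatures: dict (word, sense) -> set of related words."""
    scores = {sense: len(sig & set(context))
              for (w, sense), sig in signatures.items() if w == word}
    if not scores or max(scores.values()) == 0:
        return None                  # abstain: lowers recall, not precision
    return max(scores, key=scores.get)

def precision_recall(signatures, test_items):
    """test_items: list of (word, context, gold_sense) triples."""
    attempted = correct = 0
    for word, context, gold in test_items:
        answer = signature_wsd(signatures, word, context)
        if answer is not None:
            attempted += 1
            correct += (answer == gold)
    precision = correct / attempted if attempted else 0.0
    recall = correct / len(test_items) if test_items else 0.0
    return precision, recall

# Toy resource and a one-item test set (all hypothetical):
sigs = {("bank", "bank#1"): {"money", "loan"}, ("bank", "bank#2"): {"river"}}
print(precision_recall(sigs, [("bank", ["money"], "bank#1")]))  # (1.0, 1.0)
```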