Armando Suárez | Researchain

Publication

Featured research published by Armando Suárez.

Journal of Artificial Intelligence Research | 2005

Combining knowledge- and corpus-based word-sense-disambiguation methods

Andrés Montoyo; Armando Suárez; German Rigau; Manuel Palomar

In this paper we concentrate on the resolution of the lexical ambiguity that arises when a given word has several different meanings. This specific task is commonly referred to as word sense disambiguation (WSD). The task of WSD consists of assigning the correct sense to words using an electronic dictionary as the source of word definitions. We present two WSD methods based on two main methodological approaches in this research area: a knowledge-based method and a corpus-based method. Our hypothesis is that word sense disambiguation requires several knowledge sources in order to resolve the semantic ambiguity of words. These sources can be of different kinds, for example syntagmatic, paradigmatic or statistical information. Our approach combines various sources of knowledge through combinations of the two WSD methods mentioned above. The paper concentrates mainly on how to combine these methods and sources of information in order to achieve good disambiguation results. Finally, this paper presents a comprehensive study and experimental evaluation of the methods and their combinations.

Data and Knowledge Engineering | 2007

Combining data-driven systems for improving Named Entity Recognition

Zornitsa Kozareva; Óscar Ferrández; Andrés Montoyo; Rafael Muñoz; Armando Suárez; Jaime Gómez

The increasing flow of digital information requires the extraction, filtering and classification of pertinent information from large volumes of text. All these tasks benefit greatly from involving a Named Entity Recognizer (NER) in the preprocessing stage. This paper proposes a completely automatic NER system. The NER task involves not only the identification of proper names (Named Entities) in natural language text, but also their classification into a set of predefined categories, such as names of persons, organizations (companies, government organizations, committees, etc.), locations (cities, countries, rivers, etc.) and miscellaneous (movie titles, sport events, etc.). Throughout the paper, we examine the differences between language models learned by different data-driven classifiers confronted with the same NLP task, as well as ways to exploit these differences to yield a higher accuracy than the best individual classifier. Three machine learning classifiers (Hidden Markov Model, Maximum Entropy and Memory Based Learning) are trained on the same corpus in order to resolve the NE task. After comparison, their outputs are combined using voting strategies. A comprehensive study and experimental evaluation of our system, as well as a comparison with other systems, have been carried out within the framework of two specialized scientific competitions for NER, CoNLL-2002 and HAREM-2005. Finally, this paper describes the integration of our NER system in different NLP applications, specifically Geographic Information Retrieval and Conceptual Modelling.
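
The abstract above does not say which voting strategies were used, so the following is only a minimal sketch of one common option, per-token majority voting over the outputs of three taggers. The function name, the tag sequences and the tie-breaking rule (fall back to the first classifier) are all assumptions made for illustration, not details from the paper.

```python
from collections import Counter

def majority_vote(predictions, fallback_index=0):
    """Combine tag sequences from several classifiers, token by token.

    predictions: list of tag sequences (one per classifier, equal lengths).
    fallback_index: which classifier to trust when no tag wins a strict
    majority (an assumed tie-breaking rule, not taken from the paper).
    """
    combined = []
    for tags in zip(*predictions):
        top_tag, top_count = Counter(tags).most_common(1)[0]
        if top_count > len(tags) / 2:
            combined.append(top_tag)               # strict majority wins
        else:
            combined.append(tags[fallback_index])  # no majority: fall back
    return combined

# Made-up outputs standing in for the HMM, Maximum Entropy and
# memory-based taggers on a three-token sentence.
hmm_tags = ["B-PER", "O", "B-LOC"]
maxent_tags = ["B-PER", "O", "B-ORG"]
mbl_tags = ["B-ORG", "O", "B-MISC"]

combined_tags = majority_vote([hmm_tags, maxent_tags, mbl_tags])
```

On the third token all three taggers disagree, so the sketch keeps the first classifier's tag. A real system might instead weight each vote by the classifier's accuracy on held-out data.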

International Conference on Computational Linguistics | 2002

A maximum entropy-based word sense disambiguation system

Armando Suárez; Manuel Palomar

In this paper, a supervised learning system for word sense disambiguation is presented. It is based on conditional maximum entropy models. This system acquires linguistic knowledge from an annotated corpus, and this knowledge is represented in the form of features. Several types of features have been analyzed using the SENSEVAL-2 data for the Spanish lexical sample task. This analysis shows that, instead of training with the same kind of information for all words, each word is learned more effectively using a different set of features. This per-word feature selection is used to build several systems based on different maximum entropy classifiers, as well as a voting system assisted by a knowledge-based method.
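
As a rough illustration of what a conditional maximum entropy classifier computes, the sketch below scores each sense by summing the weights of the binary features active in a context and normalizes the exponentiated scores. The feature names, sense labels and weight values are invented for the example; in the system described above the weights would be estimated from an annotated corpus.

```python
import math

def maxent_sense_probs(active_features, weights, senses):
    """Return p(sense | context) for a conditional maximum entropy model.

    active_features: set of feature names that fire in this context.
    weights: dict mapping (feature, sense) to a learned weight; pairs
    that are missing count as 0.0.
    """
    scores = {
        sense: sum(weights.get((feature, sense), 0.0) for feature in active_features)
        for sense in senses
    }
    z = sum(math.exp(score) for score in scores.values())  # normalizer Z(x)
    return {sense: math.exp(score) / z for sense, score in scores.items()}

# Hypothetical weights for a toy two-sense word.
weights = {
    ("prev_word=river", "bank_shore"): 1.2,
    ("prev_word=river", "bank_finance"): -0.4,
    ("topic=money", "bank_finance"): 0.9,
}
senses = ["bank_finance", "bank_shore"]

probs = maxent_sense_probs({"prev_word=river"}, weights, senses)
best = max(probs, key=probs.get)
```

Selecting a different feature set per word, as the abstract describes, would amount to changing which feature names are extracted into `active_features` for each target word.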

Meeting of the Association for Computational Linguistics | 2009

An Empirical Study on Class-Based Word Sense Disambiguation

Rubén Izquierdo; Armando Suárez; German Rigau

As empirically demonstrated by the last SensEval exercises, assigning the appropriate meaning to words in context has resisted all attempts to be successfully addressed. One possible reason could be the use of an inappropriate set of meanings. In fact, WordNet has been used as a de facto standard repository of meanings. However, to our knowledge, the meanings represented by WordNet have only been used for WSD at a very fine-grained sense level or at a very coarse-grained class level. We suspect that the appropriate level of abstraction could lie between these two levels. We use a very simple method for deriving a small set of appropriate meanings using basic structural properties of WordNet. We also empirically demonstrate that this automatically derived set of meanings groups senses at an adequate level of abstraction for class-based Word Sense Disambiguation, allowing accuracy figures over 80%.

International Conference Natural Language Processing | 2005

Combining data-driven systems for improving named entity recognition

Zornitsa Kozareva; Óscar Ferrández; Andrés Montoyo; Rafael Muñoz; Armando Suárez

The increasing flow of digital information requires the extraction, filtering and classification of pertinent information from large volumes of text. An important preprocessing tool for these tasks is Named Entity Recognition (NER). In this paper we propose a completely automatic NER system, which involves the identification of proper names in texts and their classification into a set of predefined categories of interest, such as Person names, Organizations (companies, government organizations, committees, etc.) and Locations (cities, countries, rivers, etc.). We examined the differences in language models learned by different data-driven systems performing the same NLP tasks and how they can be exploited to yield a higher accuracy than the best individual system. Three NE classifiers (Hidden Markov Models, Maximum Entropy and a Memory-based learner) are trained on the same corpus data and, after comparison, their outputs are combined using a voting strategy. Results are encouraging: 98.5% accuracy for recognition and 84.94% accuracy for classification of NEs were achieved for Spanish.

Meeting of the Association for Computational Linguistics | 2007

GPLSI: Word Coarse-grained Disambiguation aided by Basic Level Concepts

Rubén Izquierdo; Armando Suárez; German Rigau

We present a corpus-based supervised learning system for coarse-grained sense disambiguation. In addition to the usual features for training in word sense disambiguation, our system also uses Base Level Concepts automatically obtained from WordNet. Base Level Concepts are synsets that generalize a hyponymy sub-hierarchy and provide an extra level of abstraction as well as relevant information about the context of a word to be disambiguated. Our experiments showed that using this type of feature results in a significant improvement in precision. Our system achieved almost 0.8 F1 (fifth place) in the coarse-grained English all-words task using a very simple set of features plus Base Level Concepts annotation.

Mexican International Conference on Artificial Intelligence | 2006

Spanish all-words semantic class disambiguation using Cast3LB corpus

Rubén Izquierdo-Beviá; Lorenza Moreno-Monteagudo; Borja Navarro; Armando Suárez

In this paper, an approach to semantic disambiguation based on machine learning and semantic classes for Spanish is presented. A critical issue in a corpus-based approach to Word Sense Disambiguation (WSD) is the lack of wide-coverage resources from which to learn the linguistic information automatically. In particular, all-words sense-annotated corpora such as SemCor do not have enough examples for many senses when used in a machine learning method. Using semantic classes instead of senses makes it possible to collect a larger number of examples for each class while reducing polysemy, which improves the accuracy of semantic disambiguation. Cast3LB, a SemCor-like corpus manually annotated with Spanish WordNet 1.5 senses, has been used in this paper to perform semantic disambiguation based on several sets of classes: lexicographer files of WordNet, WordNet Domains, and the SUMO ontology.
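
The claim that class labels pool training examples and reduce polysemy can be seen on a toy example. The sketch below relabels sense annotations with semantic classes and compares labels per word and examples per label before and after. The sense identifiers and the sense-to-class mapping are made up for illustration (the class names only imitate the style of WordNet lexicographer files) and are not taken from Cast3LB or WordNet.

```python
from collections import Counter

def to_class_level(annotations, sense_to_class):
    """Replace each sense label by its semantic class."""
    return [(word, sense_to_class[sense]) for word, sense in annotations]

def polysemy(annotations):
    """Number of distinct labels observed for each word."""
    labels = {}
    for word, label in annotations:
        labels.setdefault(word, set()).add(label)
    return {word: len(found) for word, found in labels.items()}

def examples_per_label(annotations):
    """Number of training examples available for each label."""
    return Counter(label for _, label in annotations)

# Hypothetical annotated occurrences: (word, sense id).
sense_annotations = [
    ("bank", "bank%1"), ("bank", "bank%2"), ("bank", "bank%3"),
    ("shore", "shore%1"),
]
# Hypothetical mapping from senses to coarse classes.
sense_to_class = {
    "bank%1": "noun.group", "bank%2": "noun.object",
    "bank%3": "noun.group", "shore%1": "noun.object",
}

class_annotations = to_class_level(sense_annotations, sense_to_class)
sense_polysemy = polysemy(sense_annotations)
class_polysemy = polysemy(class_annotations)
class_counts = examples_per_label(class_annotations)
```

In this toy data the word with three senses ends up with two classes, and each class gathers two examples where each sense had only one.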

Text, Speech and Dialogue | 2004

Identifying Semantic Roles Using Maximum Entropy Models

Paloma Moreda; Manuel Fernández; Manuel Palomar; Armando Suárez

In this paper, a supervised learning method for semantic role labeling is presented. It is based on maximum entropy conditional probability models. This method acquires linguistic knowledge from an annotated corpus, and this knowledge is represented in the form of features. Several types of features have been analyzed for a few words selected from sections of the Wall Street Journal part of the Penn Treebank corpus.

International Conference on Computational Linguistics | 2002

Feature Selection Analysis for Maximum Entropy-Based WSD

Armando Suárez; Manuel Palomar

Supervised learning in a corpus-based Word Sense Disambiguation (WSD) system uses a previously classified set of linguistic contexts. To train the system, it is usual to define a set of functions that report the presence of a linguistic feature in each example. It is also usual to look for the same kind of information for each word, at least for words of the same part of speech. In this paper, a study of feature selection in a corpus-based supervised learning method for WSD, Maximum Entropy conditional probability models, is presented. For a few words selected from the DSO corpus, the behaviour of several types of features has been analyzed in order to identify their contribution to gains in accuracy and to determine the influence of sense frequency in that corpus. This paper shows that not all words are better disambiguated with the same combination of features. Moreover, an improved definition of features that increases efficiency is presented as well.

Journal of Artificial Intelligence Research | 2015

Word vs. class-based word sense disambiguation

Rubén Izquierdo; Armando Suárez; German Rigau

As empirically demonstrated by the Word Sense Disambiguation (WSD) tasks of the last SensEval/SemEval exercises, assigning the appropriate meaning to words in context has resisted all attempts to be successfully addressed. Many authors argue that one possible reason could be the use of inappropriate sets of word meanings. In particular, WordNet has been used as a de facto standard repository of word meanings in most of these tasks. Thus, instead of using the word senses defined in WordNet, some approaches have derived semantic classes representing groups of word senses. However, the meanings represented by WordNet have only been used for WSD at a very fine-grained sense level or at a very coarse-grained semantic class level (also called SuperSenses). We suspect that an appropriate level of abstraction could lie between these two levels. The contributions of this paper are manifold. First, we propose a simple method to automatically derive semantic classes at intermediate levels of abstraction covering all nominal and verbal WordNet meanings. Second, we empirically demonstrate that our automatically derived semantic classes outperform classical approaches based on word senses and more coarse-grained sense groupings. Third, we demonstrate that our supervised WSD system benefits from using these new semantic classes as additional semantic features while reducing the number of training examples. Finally, we demonstrate the robustness of our supervised semantic class-based WSD system when tested on an out-of-domain corpus.