Horacio Rodríguez | Researchain

Publication

Featured research published by Horacio Rodríguez.

Computers and the Humanities | 1998

The top-down strategy for building EuroWordNet: vocabulary coverage, base concepts and top ontology

Horacio Rodríguez; Salvador Climent; Piek Vossen; Laura Bloksma; Wim Peters; Antonietta Alonge; Francesca Bertagna; Adriana Roventini

This paper describes two fundamental aspects of the process of building the EuroWordNet database. In EuroWordNet we have opted for a flexible design in which local wordnets are built relatively independently as language-specific structures, which are linked to an Inter-Lingual-Index (ILI). To ensure compatibility between the wordnets, a core set of common concepts has been defined that has to be covered by every language. Furthermore, these concepts have been classified via the ILI in terms of a Top Ontology of 63 fundamental semantic distinctions used in various semantic theories and paradigms. This paper first discusses the process leading to the definition of the set of Base Concepts, and the structure and the rationale of the Top Ontology.

European Conference on Machine Learning | 2001

Improving Term Extraction by System Combination Using Boosting

Jordi Vivaldi; Lluís Màrquez; Horacio Rodríguez

Term extraction is the task of automatically detecting, from textual corpora, lexical units that designate concepts in thematically restricted domains (e.g. medicine). Current systems for term extraction integrate linguistic and statistical cues to perform the detection of terms. The best results have been obtained when some kind of combination of simple base term extractors is performed [14]. In this paper it is shown that this combination can be further improved by posing an additional learning problem of how to find the best combination of base term extractors. Empirical results, using AdaBoost in the metalearning step, show that the ensemble constructed surpasses the performance of all individual extractors and simple voting schemes, obtaining significantly better accuracy figures at all levels of recall.
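As a rough illustration of the metalearning idea in this abstract, the sketch below treats the yes/no votes of three hypothetical base term extractors as features and learns a weighted combination with a small hand-written AdaBoost over one-feature stumps. The toy votes, the labels, and every function name are assumptions made for illustration only; they are not the extractors, data, or implementation used in the paper.

```python
import math

def stump(votes, j, polarity):
    """Weak rule: trust (polarity=1) or invert (polarity=-1) extractor j's vote."""
    return polarity if votes[j] == 1 else -polarity

def train_adaboost(X, y, rounds=3):
    """X: tuples of 0/1 votes from base extractors; y: +1 (term) / -1 (not a term)."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []  # list of (alpha, extractor index, polarity)
    for _ in range(rounds):
        best = None
        # Pick the stump with the lowest weighted error on the current weights.
        for j in range(len(X[0])):
            for polarity in (1, -1):
                err = sum(w for w, x, label in zip(weights, X, y)
                          if stump(x, j, polarity) != label)
                if best is None or err < best[0]:
                    best = (err, j, polarity)
        err, j, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, polarity))
        # Re-weight examples: misclassified ones gain weight, correct ones lose it.
        weights = [w * math.exp(-alpha * label * stump(x, j, polarity))
                   for w, x, label in zip(weights, X, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, votes):
    score = sum(alpha * stump(votes, j, p) for alpha, j, p in ensemble)
    return 1 if score >= 0 else -1

def majority_vote(votes):
    """Simple voting baseline: a candidate is a term if most extractors say so."""
    return 1 if 2 * sum(votes) > len(votes) else -1

def accuracy(classify, X, y):
    return sum(1 for x, label in zip(X, y) if classify(x) == label) / len(X)

# Hypothetical toy data: votes of three base extractors on six candidate terms.
X = [(1, 1, 1), (1, 0, 0), (0, 1, 1), (0, 0, 0), (1, 1, 0), (0, 0, 1)]
y = [1, 1, -1, -1, 1, -1]

model = train_adaboost(X, y)
boosted_acc = accuracy(lambda x: predict(model, x), X, y)
vote_acc = accuracy(majority_vote, X, y)
print(boosted_acc, vote_acc)
```

In this toy set the first extractor happens to be the most reliable, so the learned combination leans on it where unweighted voting is misled by the other two. Real extractors would be noisier, and accuracy would be measured on held-out data rather than on the training candidates.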

Machine Translation | 1995

Acquisition of lexical translation relations from MRDs

Ann Copestake; Ted Briscoe; Piek Vossen; Alicia Ageno; Irene Castellón; Francesc Ribas; German Rigau; Horacio Rodríguez; Anna Samiotou

In this paper we present a methodology for extracting information about lexical translation equivalences from the machine-readable versions of conventional dictionaries (MRDs), and describe a series of experiments on the semi-automatic construction of a linked multilingual lexical knowledge base for English, Dutch, and Spanish. We discuss the advantages and limitations of using MRDs that this work has revealed, and some strategies we have developed to cover gaps where no direct translation can be found.

Meeting of the Association for Computational Linguistics | 1998

Building Accurate Semantic Taxonomies from Monolingual MRDs

German Rigau; Horacio Rodríguez; Eneko Agirre

This paper presents a method that combines a set of unsupervised algorithms in order to accurately build large taxonomies from any machine-readable dictionary (MRD). Our aim is to profit from conventional MRDs, with no explicit semantic coding. We propose a system that 1) performs fully automatic extraction of taxonomic links from MRD entries and 2) ranks the extracted relations in a way that allows selective manual refinement. Tested accuracy can reach around 100% depending on the degree of coverage selected, showing that taxonomy building is not limited to structured dictionaries such as LDOCE.
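The trade-off between coverage and accuracy mentioned in this abstract can be sketched as follows: rank the extracted links by a confidence score, keep only the top fraction, and measure accuracy on what is kept. The scores, the correctness flags, and the function name below are hypothetical and serve only to illustrate the idea; they are not results or code from the paper.

```python
def accuracy_at_coverage(scored_links, coverage):
    """Accuracy of the top-ranked fraction of links.

    scored_links: list of (confidence, is_correct) pairs.
    coverage: fraction of the ranked links to keep, in (0, 1].
    """
    ranked = sorted(scored_links, key=lambda link: link[0], reverse=True)
    kept = ranked[:max(1, round(coverage * len(ranked)))]
    return sum(1 for _, is_correct in kept if is_correct) / len(kept)

# Hypothetical extracted taxonomic links: (confidence, manually judged correct?)
links = [(0.95, True), (0.90, True), (0.80, True),
         (0.60, False), (0.55, True), (0.30, False)]

half = accuracy_at_coverage(links, 0.5)   # keep only the three most confident links
full = accuracy_at_coverage(links, 1.0)   # keep every extracted link
print(half, full)
```

Lowering the coverage keeps only the most confident links, which is what makes selective manual refinement of the remaining, lower-ranked links practical.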

European Conference on Machine Learning | 1998

Part-of-Speech Tagging Using Decision Trees

Lluís Màrquez; Horacio Rodríguez

We have applied inductive learning of statistical decision trees to the Natural Language Processing (NLP) task of morphosyntactic disambiguation (Part Of Speech Tagging). Previous work showed that the acquired language models are independent enough to be easily incorporated, as a statistical core of rules, in any flexible tagger. They are also complete enough to be used directly as sets of POS disambiguation rules. We have implemented a simple and fast tagger that has been tested and evaluated on the Wall Street Journal (WSJ) corpus with remarkable accuracy. In this paper we address the problem of tagging when only limited training material is available, which is crucial in any process of constructing an annotated corpus from scratch. We show that quite high accuracy can be achieved with our system in this situation. We also address the problem of dealing with unknown words under the same conditions of scarce training examples, and report some comparative results and comments on closely related work. This research has been partially funded by the Spanish Research Department (CICYT's ITEM project TIC96-1243-C03-02), by the EU Commission (EuroWordNet LE4003) and by the Catalan Research Department (CIRIT's quality research group 1995SGR 00566). All tags appearing in the paper are from the Penn Treebank tag set. They are described in figure 2. For a complete description see for instance (Marcus et al. 93).

Machine Learning | 2000

A Machine Learning Approach to POS Tagging

Lluís Màrquez; Lluís Padró; Horacio Rodríguez

We have applied the inductive learning of statistical decision trees and relaxation labeling to the Natural Language Processing (NLP) task of morphosyntactic disambiguation (Part Of Speech Tagging). The learning process is supervised and obtains a language model oriented to resolving POS ambiguities, consisting of a set of statistical decision trees expressing the distribution of tags and words in some relevant contexts. The acquired decision trees have been directly used in a tagger that is both relatively simple and fast, and which has been tested and evaluated on the Wall Street Journal (WSJ) corpus with competitive accuracy. However, better results can be obtained by translating the trees into rules to feed a flexible tagger based on relaxation labeling. In this direction we describe a tagger which is able to use information of any kind (n-grams, automatically acquired constraints, linguistically motivated manually written constraints, etc.), and in particular to incorporate the machine-learned decision trees. Simultaneously, we address the problem of tagging when only limited training material is available, which is crucial in any process of constructing an annotated corpus from scratch. We show that high levels of accuracy can be achieved with our system in this situation, and report some results obtained when using it to develop a 5.5-million-word Spanish corpus from scratch.

Meeting of the Association for Computational Linguistics | 2007

Support Vector Machines for Query-focused Summarization trained and evaluated on Pyramid data

María del Mar Moya Fuentes; Enrique Alfonseca; Horacio Rodríguez

This paper presents the use of Support Vector Machines (SVM) to detect relevant information to be included in a query-focused summary. Several SVMs are trained using information from pyramids of summary content units. Their performance is compared with the best performing systems in DUC-2005, using both ROUGE and autoPan, an automatic scoring method for pyramid evaluation.

International Conference on Machine Learning | 2005

The “FAME” interactive space

Florian Metze; Petra Gieselmann; Hartwig Holzapfel; Tobias Kluge; Ivica Rogina; Alex Waibel; Matthias Wölfel; James L. Crowley; Patrick Reignier; Dominique Vaufreydaz; François Bérard; Bérangère Cohen; Joëlle Coutaz; Sylvie Rouillard; Victoria Arranz; Manuel Bertran; Horacio Rodríguez

This paper describes the FAME multi-modal demonstrator, which integrates multiple communication modes - vision, speech and object manipulation - by combining the physical and virtual worlds to provide support for multi-cultural or multi-lingual communication and problem solving. The major challenges are automatic perception of human actions and understanding of dialogs between people from different cultural or linguistic backgrounds. The system acts as an information butler, which demonstrates context awareness using computer vision, speech and dialog modeling. The integrated computer-enhanced human-to-human communication has been publicly demonstrated at the FORUM2004 in Barcelona and at IST2004 in The Hague. Specifically, the Interactive Space described features an Augmented Table for multi-cultural interaction, which allows several users at the same time to perform multi-modal, cross-lingual document retrieval of audio-visual documents previously recorded by an Intelligent Cameraman during a week-long seminar.

Meeting of the Association for Computational Linguistics | 2007

Machine Learning with Semantic-Based Distances Between Sentences for Textual Entailment

Daniel Ferrés; Horacio Rodríguez

This paper describes our experiments on Textual Entailment in the context of the Third Pascal Recognising Textual Entailment (RTE-3) Evaluation Challenge. Our system uses a Machine Learning approach with Support Vector Machines and AdaBoost to deal with the RTE challenge. We perform a lexical, syntactic, and semantic analysis of the entailment pairs. From this information we compute a set of semantic-based distances between sentences. The results look promising, especially for the QA entailment task.

MLQA '06 Proceedings of the Workshop on Multilingual Question Answering | 2006

Experiments adapting an open-domain question answering system to the geographical domain using scope-based resources

Daniel Ferrés; Horacio Rodríguez

This paper describes an approach to adapting an existing multilingual Open-Domain Question Answering (ODQA) system for factoid questions to a Restricted Domain, the Geographical Domain. The adaptation of this ODQA system involved modifying some of its components: Question Processing, Passage Retrieval and Answer Extraction. The new system uses external resources such as the GNS Gazetteer for Named Entity (NE) Classification, and Wikipedia or Google to obtain relevant documents for this domain. The system focuses on a Geographical Scope: given a region or country and a language, we can semi-automatically obtain multilingual geographical resources (e.g. gazetteers, trigger words, groups of place names, etc.) for this scope. The system has been trained and evaluated for Spanish in the scope of Spanish Geography. The evaluation reveals that the use of scope-based Geographical resources is a good approach to multilingual Geographical Domain Question Answering.
