Juan Otero | Researchain

Publication

Featured research published by Juan Otero.

Information Processing and Management | 2011

Managing misspelled queries in IR applications

Jesús Vilares; Manuel Vilares; Juan Otero

Our work concerns the design of robust information retrieval environments that can successfully handle queries containing misspelled words. Our aim is to perform a comparative analysis of the efficacy of two possible strategies that can be adopted. A first strategy involves those approaches based on correcting the misspelled query, thus requiring the integration of linguistic information in the system. This solution has been studied from complementary standpoints, according to whether contextual information of a linguistic nature is integrated in the process or not, the former implying a higher degree of complexity. A second strategy involves the use of character n-grams as the basic indexing unit, which guarantees the robustness of the information retrieval process whilst at the same time eliminating the need for a specific query correction stage. This is a knowledge-light and language-independent solution which requires no linguistic information for its application. Both strategies have been subjected to experimental testing, with Spanish being used as the case in point. This is a language which, unlike English, has a great variety of morphological processes, making it particularly sensitive to spelling errors. The results obtained demonstrate that stemming-based approaches are highly sensitive to misspelled queries, particularly with short queries. However, such a negative impact can be effectively reduced by the use of correction mechanisms during querying, particularly in the case of context-based correction, since more classical approaches introduce too much noise when query length is increased. On the other hand, our n-gram based strategy shows a remarkable robustness, with average performance losses appreciably smaller than those for stemming.
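The n-gram strategy described above can be illustrated with a minimal sketch. It is not the authors' system: the function names, the choice of n = 4, and the Dice overlap measure are assumptions made for illustration only. The point is that a misspelled term still shares most of its character n-grams with the intended term, so matching on n-grams degrades gracefully where exact term matching fails outright.

```python
def char_ngrams(text, n=4):
    """Split each word of `text` into overlapping character n-grams.

    Words no longer than n are kept whole. The value n=4 is an
    illustrative assumption, not a figure taken from the abstract.
    """
    grams = []
    for word in text.lower().split():
        if len(word) <= n:
            grams.append(word)
        else:
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


def dice_overlap(a, b, n=4):
    """Dice coefficient between the n-gram sets of two strings (hypothetical helper)."""
    ga, gb = set(char_ngrams(a, n)), set(char_ngrams(b, n))
    if not ga and not gb:
        return 1.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


# A transposition error in a Spanish query term (unaccented for simplicity).
correct = "recuperacion"
misspelled = "recuperacoin"

print(correct == misspelled)              # exact matching fails
print(dice_overlap(correct, misspelled))  # most n-grams still match
```

Because no dictionary or tagger is involved, the same code applies unchanged to any language, which is the knowledge-light property the abstract emphasizes.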

Computer Aided Systems Theory | 2007

Contextual spelling correction

Juan Otero; Jorge Graña; Manuel Vilares

Spelling correction is commonly a critical task for a variety of NLP tools. Some systems assist users by offering a set of possible corrections for a given misspelt word. An automatic spelling correction system would be able to choose only one or, at least, to rank them according to a certain criterion. We present a dynamic framework which allows us to combine spelling correction and Part-of-Speech tagging tasks in an efficient way. The result is a system capable of ranking the set of possible corrections taking the context of the erroneous words into account.
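One common way to rank candidate corrections by context is to place them in a lattice alongside their possible tags and pick the most probable path with the Viterbi algorithm. The sketch below assumes that formulation; the abstract does not specify the framework's internals, and the function name, the tag set, and all probabilities are hypothetical toy values.

```python
import math


def best_path(lattice, trans, emit, floor=1e-6):
    """Viterbi search over a lattice of (word, tag) candidates.

    lattice[i] lists the candidates for position i; a misspelled word
    contributes one entry per proposed correction. `trans` and `emit`
    are toy tag-bigram and word-given-tag probabilities (assumed), and
    `floor` smooths unseen events.
    """
    # scores maps each candidate to (best log-probability, best path so far)
    scores = {}
    for word, tag in lattice[0]:
        lp = math.log(trans.get(("<s>", tag), floor)) + math.log(emit.get((word, tag), floor))
        scores[(word, tag)] = (lp, [(word, tag)])
    for position in lattice[1:]:
        new_scores = {}
        for word, tag in position:
            e = math.log(emit.get((word, tag), floor))
            lp, path = max(
                ((prev_lp + math.log(trans.get((prev[1], tag), floor)) + e, prev_path)
                 for prev, (prev_lp, prev_path) in scores.items()),
                key=lambda item: item[0],
            )
            new_scores[(word, tag)] = (lp, path + [(word, tag)])
        scores = new_scores
    return max(scores.values(), key=lambda item: item[0])[1]


# Hypothetical input "I cna swim": the corrector proposes "can" and "cat".
lattice = [
    [("I", "PRP")],
    [("can", "MD"), ("cat", "NN")],
    [("swim", "VB")],
]
trans = {("<s>", "PRP"): 0.5, ("PRP", "MD"): 0.4, ("PRP", "NN"): 0.01,
         ("MD", "VB"): 0.6, ("NN", "VB"): 0.05}
emit = {("I", "PRP"): 0.3, ("can", "MD"): 0.2, ("cat", "NN"): 0.01,
        ("swim", "VB"): 0.05}

print(best_path(lattice, trans, emit))
```

Keeping the full score table rather than only the winning path would give a ranking of all candidates instead of a single choice.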

String Processing and Information Retrieval | 2004

On Asymptotic Finite-State Error Repair

Manuel Vilares; Juan Otero; Jorge Graña

A major issue when defining the efficiency of a spelling corrector is how far we need to examine the input string to validate the repairs. We claim that regional techniques provide a performance and quality comparable to that attained by global criteria, with a significant saving in time and space.

International Conference on Implementation and Application of Automata | 2004

Regional finite-state error repair

Manuel Vilares; Juan Otero; Jorge Graña

We describe an algorithm to deal with error repair over finite-state architectures. Such a technique is of interest in spelling correction as well as approximate string matching in a variety of applications related to natural language processing, such as information extraction/recovery or answer searching, where error-tolerant recognition allows misspelled input words to be integrated in the computational process. Our proposal relies on a regional least-cost repair strategy, dynamically gathering all relevant information in the context of the error location. The system guarantees asymptotic equivalence with global repair strategies.
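For orientation, the sketch below shows the kind of global least-cost repair that the regional strategy is compared against. It is not the authors' regional algorithm. It searches a dictionary stored as a trie (a simple stand-in for a finite-state recognizer) and returns every word within a given edit cost of the input. The function names, the unit costs, and the restriction to insertion, deletion, and substitution are assumptions.

```python
def build_trie(words):
    """Store the dictionary as nested dicts; "$" marks the end of a word."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["$"] = w
    return root


def repairs(trie, word, max_cost):
    """Return {dictionary_word: edit_cost} for all words within max_cost.

    Global strategy: every branch of the trie is explored against the
    whole input, pruning only when the cost bound is exceeded.
    """
    found = {}
    first_row = list(range(len(word) + 1))

    def walk(node, ch, prev_row):
        # One dynamic-programming row per trie transition.
        row = [prev_row[0] + 1]
        for i in range(1, len(word) + 1):
            row.append(min(
                row[i - 1] + 1,                          # skip an input character
                prev_row[i] + 1,                         # skip a dictionary character
                prev_row[i - 1] + (word[i - 1] != ch),   # match or substitute
            ))
        if "$" in node and row[-1] <= max_cost:
            found[node["$"]] = row[-1]
        if min(row) <= max_cost:  # prune branches that cannot succeed
            for nxt, child in node.items():
                if nxt != "$":
                    walk(child, nxt, row)

    for ch, child in trie.items():
        if ch != "$":
            walk(child, ch, first_row)
    return found


trie = build_trie(["casa", "cosa", "caso", "perro"])
print(repairs(trie, "cassa", 1))
```

A regional strategy, as the abstract describes it, would instead confine the search to the context of the error location rather than exploring repairs across the whole input.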

International Conference Natural Language Processing | 2004

Automatic Spelling Correction in Galician

Manuel Vilares; Juan Otero; Fco. Mario Barcala; E. Domínguez

We describe a proposal for spelling correction intended to be applied to Galician, a Romance language. Our aim is to demonstrate the flexibility of a novel technique that provides a quality equivalent to global strategies, but at a significantly lower computational cost. To do so, we take advantage of the grammatical background present in the recognizer, which allows us to dynamically gather information to the right and to the left of the point at which the recognition halts in a word, as long as this information can be considered relevant for the repair process. The experimental tests prove the validity of our approach in relation to previous ones, focusing on both performance and costs.

Improving Non-English Web Searching | 2008

Corrupted queries in Spanish text retrieval: error correction vs. N-Grams

Juan Otero; Jesús Vilares; Manuel Vilares Ferro

In this paper, we propose and evaluate two different alternatives to deal with degraded queries in Spanish IR applications. The first is an n-gram-based strategy which does not depend on the degree of available linguistic knowledge. As the second alternative, we propose two spelling correction techniques, one of which depends strongly on a stochastic model that must be built beforehand from a POS-tagged corpus. In order to study their validity, a testing framework has been formally designed and applied to both approaches.

Ibero-American Conference on AI | 2008

Text Retrieval through Corrupted Queries

Juan Otero; Jesús Vilares; Manuel Vilares

Our work relies on the design and evaluation of experimental information retrieval systems able to cope with textual misspellings in queries. In contrast to previous proposals, commonly based on the consideration of spelling correction strategies and a word language model, we also report on the use of character n-grams as indexing support.

International Conference on Implementation and Application of Automata | 2005

Robust spelling correction

Manuel Vilares; Juan Otero; Jesús Vilares

The paper introduces a robust spelling correction technique to deal with ill-formed input strings, including unknown parts of unknown length. In contrast to previous work, we benefit from a finer dynamic programming construction, which takes advantage of the underlying grammatical structure, leading to improved computational behavior and error repair quality. The formal description applies a deductive approach in order to simplify this task, separating it from the interpretation strategy, and including cut-off facilities.

International Conference on Computational Linguistics | 2005

Regional versus global finite-state error repair

Manuel Vilares; Juan Otero; Jorge Graña

We focus on the domain of a regional least-cost strategy in order to illustrate the viability of non-global repair models over finite-state architectures. Our interest is justified by the difficulty, shared by all repair proposals, of determining how far to validate. A short validation may fail to gather sufficient information, and in a long one most of the effort can be wasted. The goal is to prove that our approach can provide, in practice, a performance and quality comparable to that attained by global criteria, with a significant saving in time and space. To the best of our knowledge, this is the first discussion of its kind.

Computer Aided Systems Theory | 2005

Spelling correction on technical documents

Manuel Vilares; Juan Otero; Jorge Graña

We describe a novel approach to spelling correction applied to technical documents, a task that requires a number of specific properties such as efficiency, safety and maintenance. In contrast to previous work, we explore the region close to the point at which the recognition halts, gathering all relevant information for the repair process in order to avoid the phenomenon of cascading errors. Our approach seems to reach the same quality provided by the best-performing classic techniques, but with a significant reduction in both time and space costs.
