Juan Otero
University of Vigo
Publication
Featured research published by Juan Otero.
Information Processing and Management | 2011
Jesús Vilares; Manuel Vilares; Juan Otero
Our work concerns the design of robust information retrieval environments that can successfully handle queries containing misspelled words. Our aim is to perform a comparative analysis of the efficacy of two possible strategies that can be adopted. A first strategy involves those approaches based on correcting the misspelled query, thus requiring the integration of linguistic information in the system. This solution has been studied from complementary standpoints, according to whether contextual information of a linguistic nature is integrated in the process or not, the former implying a higher degree of complexity. A second strategy involves the use of character n-grams as the basic indexing unit, which guarantees the robustness of the information retrieval process whilst at the same time eliminating the need for a specific query correction stage. This is a knowledge-light and language-independent solution which requires no linguistic information for its application. Both strategies have been subjected to experimental testing, with Spanish being used as the case in point. This is a language which, unlike English, has a great variety of morphological processes, making it particularly sensitive to spelling errors. The results obtained demonstrate that stemming-based approaches are highly sensitive to misspelled queries, particularly with short queries. However, such a negative impact can be effectively reduced by the use of correction mechanisms during querying, particularly in the case of context-based correction, since more classical approaches introduce too much noise when query length is increased. On the other hand, our n-gram based strategy shows a remarkable robustness, with average performance losses appreciably smaller than those for stemming.
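As a rough illustration of why character n-grams tolerate misspellings without a correction stage, the following minimal sketch compares a correctly spelled Spanish word with a misspelled variant. It is not the authors' system: the function names `char_ngrams` and `dice`, the choice of trigrams, the Dice coefficient, and the example words are all assumptions made for this illustration.

```python
def char_ngrams(text, n=3):
    """Return the set of overlapping character n-grams of each word in text."""
    grams = set()
    for word in text.lower().split():
        if len(word) < n:
            # Words shorter than n are kept whole (an assumption of this sketch).
            grams.add(word)
        else:
            grams.update(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


def dice(a, b):
    """Dice coefficient between two sets (1.0 = identical, 0.0 = disjoint)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical example: "bibloteca" is a misspelling of "biblioteca".
correct = char_ngrams("biblioteca")
misspelled = char_ngrams("bibloteca")
shared = correct & misspelled

# Whole-word matching fails outright, but most trigrams still match.
print("exact match:", "biblioteca" == "bibloteca")
print("shared trigrams:", sorted(shared))
print("Dice similarity:", round(dice(correct, misspelled), 3))
```

With whole words as the indexing unit, the misspelled query term matches nothing. With trigrams, the single error only disturbs the n-grams that overlap it, so the remaining n-grams can still match the indexed text.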
computer aided systems theory | 2007
Juan Otero; Jorge Graña; Manuel Vilares
Spelling correction is commonly a critical task for a variety of NLP tools. Some systems assist users by offering a set of possible corrections for a given misspelt word. An automatic spelling correction system would be able to choose only one or, at least, to rank them according to a certain criterion. We present a dynamic framework which allows us to combine spelling correction and Part-of-Speech tagging tasks in an efficient way. The result is a system capable of ranking the set of possible corrections taking the context of the erroneous words into account.
string processing and information retrieval | 2004
Manuel Vilares; Juan Otero; Jorge Graña
A major issue when defining the efficiency of a spelling corrector is how far we need to examine the input string to validate the repairs. We claim that regional techniques provide a performance and quality comparable to that attained by global criteria, with a significant saving in time and space.
international conference on implementation and application of automata | 2004
Manuel Vilares; Juan Otero; Jorge Graña
We describe an algorithm to deal with error repair over finite-state architectures. Such a technique is of interest in spelling correction as well as approximate string matching in a variety of applications related to natural language processing, such as information extraction/recovery or answer searching, where error-tolerant recognition allows misspelled input words to be integrated in the computational process. Our proposal relies on a regional least-cost repair strategy, dynamically gathering all relevant information in the context of the error location. The system guarantees asymptotic equivalence with global repair strategies.
international conference natural language processing | 2004
Manuel Vilares; Juan Otero; Fco. Mario Barcala; E. Domínguez
We describe a proposal for spelling correction intended to be applied to Galician, a Romance language. Our aim is to demonstrate the flexibility of a novel technique that provides a quality equivalent to that of global strategies, but at a significantly lower computational cost. To do so, we take advantage of the grammatical background present in the recognizer, which allows us to dynamically gather information to the right and to the left of the point at which recognition halts in a word, as long as this information can be considered relevant for the repair process. The experimental tests confirm the validity of our approach in relation to previous ones, focusing on both performance and costs.
improving non english web searching | 2008
Juan Otero; Jesús Vilares; Manuel Vilares Ferro
In this paper, we propose and evaluate two different alternatives for dealing with degraded queries in Spanish IR applications. The first is an n-gram-based strategy which does not depend on the degree of available linguistic knowledge. The second comprises two spelling correction techniques, one of which depends strongly on a stochastic model that must be built beforehand from a POS-tagged corpus. In order to study their validity, a testing framework has been formally designed and applied to both approaches.
ibero american conference on ai | 2008
Juan Otero; Jesús Vilares; Manuel Vilares
Our work relies on the design and evaluation of experimental information retrieval systems able to cope with textual misspellings in queries. In contrast to previous proposals, commonly based on the consideration of spelling correction strategies and a word language model, we also report on the use of character n-grams as indexing support.
international conference on implementation and application of automata | 2005
Manuel Vilares; Juan Otero; Jesús Vilares
The paper introduces a robust spelling correction technique to deal with ill-formed input strings, including unknown parts of unknown length. In contrast to previous work, we benefit from a finer dynamic programming construction, which takes advantage of the underlying grammatical structure, leading to improved computational behavior and error repair quality. The formal description applies a deductive approach in order to simplify this task, separating it from the interpretation strategy, and including cut-off facilities.
international conference on computational linguistics | 2005
Manuel Vilares; Juan Otero; Jorge Graña
We focus on the domain of a regional least-cost strategy in order to illustrate the viability of non-global repair models over finite-state architectures. Our interest is justified by the difficulty, shared by all repair proposals, to determine how far to validate. A short validation may fail to gather sufficient information, and in a long one most of the effort can be wasted. The goal is to prove that our approach can provide, in practice, a performance and quality comparable to that attained by global criteria, with a significant saving in time and space. To the best of our knowledge, this is the first discussion of its kind.
computer aided systems theory | 2005
Manuel Vilares; Juan Otero; Jorge Graña
We describe a novel approach to spelling correction applied to technical documents, a task that requires a number of specific properties such as efficiency, safety and maintenance. In contrast to previous work, we explore the region close to the point at which the recognition halts, gathering all relevant information for the repair process in order to avoid the phenomenon of cascading errors. Our approach seems to reach the same quality provided by the best-performing classic techniques, but with a significant reduction in both time and space costs.