Jesús Vilares | Researchain

Publication

Featured research published by Jesús Vilares.

Information Retrieval | 2009

Current research issues and trends in non-English Web searching

Fotis Lazarinis; Jesús Vilares; John Tait; Efthimis N. Efthimiadis

With growing numbers of non-English language web searchers, the efficient handling of non-English Web documents and user queries is becoming a major issue for search engines. The main aim of this review paper is to make researchers aware of the existing problems in monolingual non-English Web retrieval by providing an overview of open issues. A significant number of papers are reviewed and the research issues investigated in these studies are categorized in order to identify the research questions and solutions proposed in these papers. Further research is proposed at the end of each section.

International Conference on Computational Linguistics | 2001

Applying Productive Derivational Morphology to Term Indexing of Spanish Texts

Jesús Vilares; David Cabrero; Miguel A. Alonso

This paper deals with the application of natural language processing techniques to the field of information retrieval. To be precise, we propose the application of morphological families for single term conflation in order to reduce the linguistic variety of indexed documents written in Spanish. A system for automatic generation of morphological families by means of Productive Derivational Morphology is discussed. The main characteristics of this system are the use of a minimum of linguistic resources, a low computational cost, and the independence with respect to the indexing engine.

Database and Expert Systems Applications | 2002

Tokenization and proper noun recognition for information retrieval

F.M. Barcala; Jesús Vilares; Miguel A. Alonso; Jorge Graña; Manuel Vilares

In this paper we consider a set of natural language processing techniques that can be used to analyze large amounts of texts, focusing on the advanced tokenizer which accounts for a number of complex linguistic phenomena, as well as for pre-tagging tasks such as proper noun recognition. We also show the results of several experiments performed in order to study the impact of the strategy chosen for the recognition of proper nouns.

Information Processing and Management | 2011

Managing misspelled queries in IR applications

Jesús Vilares; Manuel Vilares; Juan Otero

Our work concerns the design of robust information retrieval environments that can successfully handle queries containing misspelled words. Our aim is to perform a comparative analysis of the efficacy of two possible strategies that can be adopted. A first strategy involves those approaches based on correcting the misspelled query, thus requiring the integration of linguistic information in the system. This solution has been studied from complementary standpoints, according to whether contextual information of a linguistic nature is integrated in the process or not, the former implying a higher degree of complexity. A second strategy involves the use of character n-grams as the basic indexing unit, which guarantees the robustness of the information retrieval process whilst at the same time eliminating the need for a specific query correction stage. This is a knowledge-light and language-independent solution which requires no linguistic information for its application. Both strategies have been subjected to experimental testing, with Spanish being used as the case in point. This is a language which, unlike English, has a great variety of morphological processes, making it particularly sensitive to spelling errors. The results obtained demonstrate that stemming-based approaches are highly sensitive to misspelled queries, particularly with short queries. However, such a negative impact can be effectively reduced by the use of correction mechanisms during querying, particularly in the case of context-based correction, since more classical approaches introduce too much noise when query length is increased. On the other hand, our n-gram based strategy shows a remarkable robustness, with average performance losses appreciably smaller than those for stemming.
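The second strategy above can be illustrated with a minimal sketch of character n-gram tokenization. This is not the authors' implementation: the n-gram length of 4, the function names `char_ngrams` and `ngram_overlap`, and the use of Jaccard overlap as a matching score are all assumptions made for illustration, since the abstract does not specify them.

```python
def char_ngrams(text: str, n: int = 4) -> list:
    """Split each word of `text` into overlapping character n-grams.

    Words no longer than n characters are kept whole.
    The default n=4 is an illustrative assumption.
    """
    grams = []
    for word in text.lower().split():
        if len(word) <= n:
            grams.append(word)
        else:
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


def ngram_overlap(a: str, b: str, n: int = 4) -> float:
    """Jaccard overlap between the n-gram sets of two strings.

    Jaccard is a stand-in here for a real retrieval scoring function.
    """
    grams_a, grams_b = set(char_ngrams(a, n)), set(char_ngrams(b, n))
    if not grams_a and not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


if __name__ == "__main__":
    print(char_ngrams("potatoes"))
    # A misspelled query still shares some n-grams with the correct term,
    # whereas an exact word match would fail outright.
    print(ngram_overlap("potatoes", "potetoes"))
```

Because a misspelling only damages the n-grams that span the erroneous character, the remaining n-grams can still match the index, which is the intuition behind the robustness reported in the abstract.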

Information Processing and Management | 2008

Extraction of complex index terms in non-English IR: A shallow parsing based approach

Jesús Vilares; Miguel A. Alonso; Manuel Vilares

The performance of information retrieval systems is limited by the linguistic variation present in natural language texts. Word-level natural language processing techniques have been shown to be useful in reducing this variation. In this article, we summarize our work on the extension of these techniques for dealing with phrase-level variation in European languages, taking Spanish as a case in point. We propose the use of syntactic dependencies as complex index terms in an attempt to solve the problems deriving from both syntactic and morpho-syntactic variation and, in this way, to obtain more precise index terms. Such dependencies are obtained through a shallow parser based on cascades of finite-state transducers in order to reduce as far as possible the overhead due to this parsing process. The use of different sources of syntactic information, queries or documents, has also been studied, as has the restriction of the dependencies applied to those obtained from noun phrases. Our approaches have been tested using the CLEF corpus, obtaining consistent improvements with regard to classical word-level non-linguistic techniques. Results show, on the one hand, that syntactic information extracted from documents is more useful than that from queries. On the other hand, it has been demonstrated that by restricting dependencies to those corresponding to noun phrases, important reductions of storage and management costs can be achieved, albeit at the expense of a slight reduction in performance.

Database and Expert Systems Applications | 2004

Phrase Similarity through the Edit Distance

Manuel Vilares; Francisco J. Ribadas; Jesús Vilares

This work intends to capture the concept of similarity between phrases. The algorithm is based on a dynamic programming approach integrating both the edit distance between parse trees and single-term similarity. Our work stresses the use of the underlying grammatical structure, which serves as a guide in the computation of semantic similarity between words. This proposal allows us to obtain a more accurate notion of semantic proximity at sentence level, without increasing the complexity of the pattern-matching algorithm on which it is based.
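The idea of combining edit distance with single-term similarity can be sketched in a deliberately simplified form. The paper works on parse trees; the sketch below operates on flat token sequences instead, which is a substitution made for brevity and not the authors' algorithm. The names `phrase_distance` and `toy_similarity`, the unit insertion and deletion costs, and the hand-written synonym table are all hypothetical.

```python
from typing import Callable, Sequence


def phrase_distance(
    p: Sequence[str],
    q: Sequence[str],
    sim: Callable[[str, str], float],
) -> float:
    """Token-level edit distance with similarity-weighted substitution.

    Insertions and deletions cost 1; substituting token a by token b
    costs 1 - sim(a, b), where sim returns a value in [0, 1].
    """
    m, n = len(p), len(q)
    dist = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = float(i)
    for j in range(1, n + 1):
        dist[0][j] = float(j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            substitution = 1.0 - sim(p[i - 1], q[j - 1])
            dist[i][j] = min(
                dist[i - 1][j] + 1.0,               # delete p[i-1]
                dist[i][j - 1] + 1.0,               # insert q[j-1]
                dist[i - 1][j - 1] + substitution,  # substitute or match
            )
    return dist[m][n]


def toy_similarity(a: str, b: str) -> float:
    """Hypothetical single-term similarity: 1 for equal words,
    0.8 for listed synonyms, 0 otherwise."""
    if a == b:
        return 1.0
    synonyms = {frozenset({"car", "automobile"})}
    return 0.8 if frozenset({a, b}) in synonyms else 0.0


if __name__ == "__main__":
    print(phrase_distance(
        ["the", "red", "car"],
        ["the", "red", "automobile"],
        toy_similarity,
    ))
```

With a plain edit distance the two example phrases would differ by a full substitution; weighting the substitution by term similarity brings semantically close phrases nearer, without changing the dynamic programming scheme itself.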

Database and Expert Systems Applications | 2004

Morphological and Syntactic Processing for Text Retrieval

Jesús Vilares; Miguel A. Alonso; Manuel Vilares

This article describes the application of lemmatization and shallow parsing as a linguistically-based alternative to stemming in Text Retrieval, with the aim of managing linguistic variation at both word level and phrase level. Several alternatives for selecting the index terms among the syntactic dependencies detected by the parser are evaluated. Though this article focuses on Spanish, this approach is extensible to other languages by simply adapting the grammar used by the parser.

Database and Expert Systems Applications | 2001

Towards the Development of Heuristics for Automatic Query Expansion

Jesús Vilares; Manuel Vilares Ferro; Miguel A. Alonso

In this paper we study the performance of linguistically motivated conflation techniques for Information Retrieval in Spanish. In particular, we have studied the application of productive derivational morphology for single word term conflation and the extraction of syntactic dependency pairs for multi-word term conflation. These techniques have been tested on several search engines implementing different indexing models. The aim of this study is to find the strong and weak points of each technique in order to develop heuristics for automatic query expansion.

Information Processing and Management | 2016

Studying the effect and treatment of misspelled queries in Cross-Language Information Retrieval

Jesús Vilares; Miguel A. Alonso; Yerai Doval; Manuel Vilares

We study the effects of misspelled queries on the performance of CLIR systems. Word-based approaches (as both indexing and translation units) are highly sensitive to the presence of misspellings. The use of correction mechanisms can significantly reduce their negative effects. Classical techniques are suitable for shorter queries while context-based corrections are suitable for longer queries. Our approach based on character n-grams (as both indexing and translation units) shows remarkable strength.

In contrast with their monolingual counterparts, little attention has been paid to the effects that misspelled queries have on the performance of Cross-Language Information Retrieval (CLIR) systems. The present work makes a first attempt to fill this gap by extending our previous work on monolingual retrieval in order to study the impact that the progressive addition of misspellings to input queries has, this time, on the output of CLIR systems. Two approaches for dealing with this problem are analyzed in this paper. The first is the use of automatic spelling correction techniques, for which, in turn, we consider two algorithms: one for the correction of isolated words and another for correction based on the linguistic context of the misspelled word. The second approach is the use of character n-grams both as index terms and translation units, seeking to take advantage of their inherent robustness and language-independence. All these approaches have been tested on a Spanish-to-English CLIR system, that is, Spanish queries on English documents. Real, user-generated spelling errors have been used under a methodology that allows us to study the effectiveness of the different approaches and their behavior when confronted with different error rates. The results obtained show the great sensitivity of classic word-based approaches to misspelled queries, although spelling correction techniques can mitigate such negative effects. On the other hand, the use of character n-grams provides great robustness against misspellings.

String Processing and Information Retrieval | 2004

Dealing with Syntactic Variation Through a Locality-Based Approach

Jesús Vilares; Miguel A. Alonso

To date, attempts to apply syntactic information in the dominant document-based retrieval model have led to little practical improvement, mainly due to the problems associated with the integration of this kind of information into the model. In this article we propose the use of a locality-based retrieval model for reranking, which deals with syntactic linguistic variation through similarity measures based on the distance between words. We study two approaches whose effectiveness has been evaluated on the CLEF corpus of Spanish documents.
