Rodrigo Paredes | Researchain

Publication

Featured research published by Rodrigo Paredes.

WEA'06 Proceedings of the 5th international conference on Experimental Algorithms | 2006

Practical construction of k-nearest neighbor graphs in metric spaces

Rodrigo Paredes; Edgar Chávez; Karina Figueroa; Gonzalo Navarro

Let $U$ be a set of elements and $d$ a distance function defined among them. Let $NN_k(u)$ be the $k$ elements in $U \setminus \{u\}$ having the smallest distance to $u$. The k-nearest neighbor graph (kNNG) is a weighted directed graph $G(U, E)$ such that $E = \{(u, v) : v \in NN_k(u)\}$. Several kNNG construction algorithms are known, but they are not suitable for general metric spaces. We present a general methodology to construct kNNGs that exploits several features of metric spaces. Experiments suggest that it yields costs of the form $c_1 n^{1.27}$ distance computations for low- and medium-dimensional spaces, and $c_2 n^{1.90}$ for high-dimensional ones.
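To make the kNNG definition concrete, here is a minimal sketch of the naive quadratic construction that such methodologies aim to beat; it is not the authors' method. It assumes only a symmetric distance function `dist` and uses $n(n-1)/2$ distance computations, reusing each one for both endpoints. The function name `knng_brute_force` is hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def knng_brute_force(
    objs: Sequence[T], k: int, dist: Callable[[T, T], float]
) -> Dict[int, List[Tuple[float, int]]]:
    """Naive kNNG baseline (hypothetical helper, not the paper's algorithm).

    Returns, for each index u, its k nearest neighbors as (distance, index)
    pairs sorted by increasing distance, i.e. the out-edges of u in the kNNG.
    """
    n = len(objs)
    # One bounded max-heap per node, stored as (-distance, index) so that
    # heap[0] holds the current farthest of the k candidate neighbors.
    heaps: List[List[Tuple[float, int]]] = [[] for _ in range(n)]

    def offer(u: int, v: int, d: float) -> None:
        h = heaps[u]
        if len(h) < k:
            heapq.heappush(h, (-d, v))
        elif d < -h[0][0]:
            heapq.heapreplace(h, (-d, v))  # drop the farthest candidate

    for u in range(n):
        for v in range(u + 1, n):
            d = dist(objs[u], objs[v])  # symmetry: one computation serves u and v
            offer(u, v, d)
            offer(v, u, d)
    return {u: sorted((-nd, v) for nd, v in heaps[u]) for u in range(n)}
```

On the points `[0, 1, 3, 7]` with `k = 1` and `abs` distance, each element is linked to its single closest neighbor (element `3` points to `1`, element `7` points to `3`).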

WEA'06 Proceedings of the 5th international conference on Experimental Algorithms | 2006

On the least cost for proximity searching in metric spaces

Karina Figueroa; Edgar Chávez; Gonzalo Navarro; Rodrigo Paredes

Proximity searching consists of retrieving from a database those elements that are similar to a query. As the distance is usually expensive to compute, the goal is to satisfy queries with as few distance computations as possible. Indexes use precomputed distances among database elements to speed up queries. The baseline is AESA, which stores all the distances among database objects and, despite this space cost, has remained unbeaten in query performance for 20 years. In this paper we show that it is possible to improve upon AESA by using a radically different method to select promising database elements to compare against the query. Our experiments show improvements of up to 75% in document databases. We also explore the use of our method as a probabilistic algorithm that may lose relevant answers. On a database of faces, where any exact algorithm must examine virtually all elements, our probabilistic version obtains 85% of the correct answers by scanning only 10% of the database.
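The AESA baseline mentioned above can be sketched as follows for range queries. This is a minimal illustration, assuming a precomputed full distance matrix `D` and choosing each next pivot as the candidate with the smallest accumulated sum of lower bounds (the rule described for AESA in the iAESA abstract further down); the name `aesa_range` is hypothetical, and the paper's improved selection method is not shown.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def aesa_range(
    objs: Sequence[T],
    D: Sequence[Sequence[float]],  # D[i][j] = dist(objs[i], objs[j]), computed offline
    dist: Callable[[T, T], float],
    q: T,
    r: float,
) -> Tuple[List[int], int]:
    """AESA-style range search (hypothetical sketch).

    Returns (indices within distance r of q, number of distance computations).
    """
    alive = set(range(len(objs)))
    lb_sum = {u: 0.0 for u in alive}  # accumulated lower bounds per candidate
    result: List[int] = []
    evals = 0
    while alive:
        # Next pivot: the candidate whose lower bounds sum to the smallest value.
        p = min(alive, key=lambda u: (lb_sum[u], u))
        alive.remove(p)
        dqp = dist(q, objs[p])
        evals += 1
        if dqp <= r:
            result.append(p)
        for u in list(alive):
            # Triangle inequality: d(q, u) >= |d(q, p) - d(p, u)|.
            lb = abs(dqp - D[p][u])
            if lb > r:
                alive.remove(u)  # u cannot be within distance r of q
            else:
                lb_sum[u] += lb
    return sorted(result), evals
```

Only the query-to-pivot distances are computed online; every discard uses stored entries of `D`, which is why AESA needs quadratic space.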

String Processing and Information Retrieval | 2005

Using the k-nearest neighbor graph for proximity searching in metric spaces

Rodrigo Paredes; Edgar Chávez

Proximity searching consists of retrieving from a database the objects that are close to a query. For this type of searching problem, the most general model is the metric space, where proximity is defined in terms of a distance function. A solution for this problem consists of building an offline index to quickly satisfy online queries. Since the distance is considered expensive to compute, the ultimate goal is to satisfy queries with as few distance computations as possible. Proximity searching is central to several applications, ranging from multimedia indexing and querying to data compression and clustering. In this paper we present a new approach to the proximity searching problem. Our solution indexes the database with the k-nearest neighbor graph (knng), a directed graph connecting each element to its k closest neighbors. We present two search algorithms for both range and nearest neighbor queries that use navigational and metric features of the knng. We show that our approach is competitive against current ones. For instance, in the document metric space our nearest neighbor search algorithms perform 30% more distance evaluations than AESA while using only 0.25% of its space requirement. In the same space, the pivot-based technique is completely useless.
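The following is a simplified sketch of how metric information stored on knng edges can prune a range query; it is not either of the paper's two algorithms. It assumes the graph stores each edge weight $d(u, v)$ and uses two triangle-inequality rules: an edge bound $|d(q,u) - d(u,v)| > r$ discards neighbor $v$, and if the query ball lies strictly inside the ball of radius $cr(u)$ (distance to the $k$-th neighbor), only $u$'s neighbors can be answers. Names such as `build_knng` and `knng_range` are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Graph = Dict[int, List[Tuple[float, int]]]  # u -> [(d(u, v), v)], sorted by distance

def build_knng(objs: Sequence[T], k: int, dist: Callable[[T, T], float]) -> Graph:
    """Offline step: brute-force kNNG (quadratic, acceptable for a sketch)."""
    n = len(objs)
    return {
        u: sorted((dist(objs[u], objs[v]), v) for v in range(n) if v != u)[:k]
        for u in range(n)
    }

def knng_range(
    objs: Sequence[T], g: Graph, dist: Callable[[T, T], float], q: T, r: float
) -> Tuple[List[int], int]:
    """Range query {u : d(q, u) <= r}, pruning candidates with kNNG edges only."""
    candidates = set(range(len(objs)))
    result: List[int] = []
    evals = 0
    while candidates:
        u = min(candidates)  # simple, arbitrary visiting order
        candidates.remove(u)
        dqu = dist(q, objs[u])
        evals += 1
        if dqu <= r:
            result.append(u)
        nbrs = g[u]
        for w, v in nbrs:
            # Triangle inequality along edge (u, v): d(q, v) >= |d(q, u) - d(u, v)|.
            if abs(dqu - w) > r:
                candidates.discard(v)
        if nbrs:
            cr = nbrs[-1][0]  # covering radius: distance to the k-th neighbor
            if dqu + r < cr:
                # Query ball lies strictly inside ball(u, cr): answers are neighbors of u.
                candidates &= {v for _, v in nbrs}
    return sorted(result), evals
```

Both rules only discard elements that provably lie outside the query ball, so the result matches a linear scan; the savings come from distance computations that are skipped.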

Journal of Discrete Algorithms | 2009

Solving similarity joins and range queries in metric spaces with the list of twin clusters

Rodrigo Paredes; Nora Reyes

The metric space model abstracts many proximity or similarity problems, where the most frequently considered primitives are range and k-nearest neighbor search, leaving out the similarity join, an extremely important primitive. In fact, despite the great attention that this primitive has received in traditional and even multidimensional databases, little has been done for general metric databases. We solve two variants of the similarity join problem: (1) range joins: given two sets of objects and a distance threshold r, find all the object pairs (one from each set) at distance at most r; and (2) k-closest pair joins: find the k closest object pairs (one from each set). To this end, we devise a new metric index, coined List of Twin Clusters (LTC), which indexes both sets jointly, instead of the natural approach of indexing one or both sets independently. Finally, we show how to use the LTC to solve classical range queries. Our results show significant speedups over the basic quadratic-time naive alternative for both join variants, and that the LTC is competitive with the original list of clusters when solving range queries. Furthermore, we show that our technique has great potential for improvements.
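For reference, the "basic quadratic-time naive alternative" for both join variants can be sketched as below; the LTC itself is not reproduced. Both functions compare every object of one set against every object of the other, costing $|A| \cdot |B|$ distance computations. The function names are hypothetical.

```python
import heapq
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def range_join(
    A: Sequence[T], B: Sequence[T], dist: Callable[[T, T], float], r: float
) -> List[Tuple[int, int]]:
    """Naive range join: all index pairs (i, j) with dist(A[i], B[j]) <= r."""
    return [(i, j) for i, a in enumerate(A) for j, b in enumerate(B) if dist(a, b) <= r]

def k_closest_pairs(
    A: Sequence[T], B: Sequence[T], dist: Callable[[T, T], float], k: int
) -> List[Tuple[float, int, int]]:
    """Naive k-closest pair join: the k (distance, i, j) triples with the
    smallest distances, one object from each set, in ascending order."""
    all_pairs = ((dist(a, b), i, j) for i, a in enumerate(A) for j, b in enumerate(B))
    return heapq.nsmallest(k, all_pairs)
```

With `A = [0, 5]`, `B = [1, 4, 9]` and `abs` distance, `range_join(A, B, abs_dist, 1)` pairs `0` with `1` and `5` with `4`.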

String Processing and Information Retrieval | 2002

t-Spanners as a Data Structure for Metric Space Searching

Gonzalo Navarro; Rodrigo Paredes; Edgar Chávez

A t-spanner, a subgraph that approximates graph distances within a precision factor t, is a well known concept in graph theory. In this paper we use it in a novel way, namely as a data structure for searching metric spaces. The key idea is to consider the t-spanner as an approximation of the complete graph of distances among the objects, and use it as a compact device to simulate the large matrix of distances required by successful search algorithms like AESA [Vidal 1986]. The t-spanner provides a time-space tradeoff where full AESA is just one extreme. We show that the resulting algorithm is competitive against current approaches, e.g., 1.5 times the time cost of AESA using only 3.21% of its space requirement, in a metric space of strings; and 1.09 times the time cost of AESA using only 3.83% of its space requirement, in a metric space of documents. We also show that t-spanners provide better space-time tradeoffs than classical alternatives such as pivot-based indexes. Furthermore, we show that the concept of t-spanners has potential for large improvements.
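One standard way to build a t-spanner is the classic greedy construction, sketched below; it is not necessarily the construction the authors found best. Pairs are scanned by increasing distance, and edge $(u, v)$ is added only if the spanner built so far cannot already connect $u$ and $v$ within $t \cdot d(u, v)$, checked with a bounded Dijkstra search. The names `greedy_t_spanner` and `spanner_dist` are hypothetical.

```python
import heapq
import math
from itertools import combinations
from typing import Callable, Dict, Sequence, TypeVar

T = TypeVar("T")
Adj = Dict[int, Dict[int, float]]  # u -> {v: d(u, v)} for spanner edges

def spanner_dist(adj: Adj, s: int, target: int, bound: float) -> float:
    """Shortest-path distance from s to target in the spanner (Dijkstra),
    or math.inf if it exceeds bound (the search stops early)."""
    best = {s: 0.0}
    heap = [(0.0, s)]
    while heap:
        d, x = heapq.heappop(heap)
        if d > bound:
            break  # every remaining path is longer than bound
        if x == target:
            return d
        if d > best[x]:
            continue  # stale heap entry
        for y, w in adj[x].items():
            nd = d + w
            if nd < best.get(y, math.inf):
                best[y] = nd
                heapq.heappush(heap, (nd, y))
    return math.inf

def greedy_t_spanner(objs: Sequence[T], dist: Callable[[T, T], float], t: float) -> Adj:
    """Classic greedy t-spanner over the complete graph of distances."""
    n = len(objs)
    edges = sorted((dist(objs[u], objs[v]), u, v) for u, v in combinations(range(n), 2))
    adj: Adj = {u: {} for u in range(n)}
    for d_uv, u, v in edges:
        # Add the edge only if the current spanner stretches this pair beyond t.
        if spanner_dist(adj, u, v, t * d_uv) > t * d_uv:
            adj[u][v] = d_uv
            adj[v][u] = d_uv
    return adj
```

Because edge weights are real distances, the spanner distance $d_G(u, v)$ satisfies $d(u, v) \le d_G(u, v) \le t \cdot d(u, v)$, so a search algorithm can use $d_G / t$ and $d_G$ as lower and upper estimates in place of AESA's stored matrix entries.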

ACM Journal of Experimental Algorithms | 2009

Speeding up spatial approximation search in metric spaces

Karina Figueroa; Edgar Chávez; Gonzalo Navarro; Rodrigo Paredes

Proximity searching consists of retrieving from a database those elements that are similar to a query object. The usual model for proximity searching is a metric space where the distance, which models the proximity, is expensive to compute. An index uses precomputed distances to speed up query processing. Among all known indices, AESA has been the performance baseline for about 20 years. This index uses an iterative procedure where, at each iteration, it first chooses the next promising element (“pivot”) to compare to the query, and then discards the database elements that the pivot proves not relevant to the query. AESA chooses the next pivot as the one minimizing the sum of the lower bounds on its distance to the query proved by previous pivots. In this article, we introduce the new index iAESA, which establishes a new performance baseline for metric space searching. iAESA differs from AESA in the method used to select the next pivot: each candidate sorts the previous pivots by closeness to it, and the next pivot is the candidate whose order is most similar to that of the query. We also propose a modification that turns AESA-like algorithms into probabilistic algorithms. Our empirical results confirm a consistent improvement in query performance. For example, in a database of documents, a very important and difficult real-life instance of the problem, we perform as few as 60% of the distance evaluations of AESA. For the probabilistic algorithm, in a database of faces we perform as few as 40% of the comparisons made by the best alternative algorithm to retrieve the same percentage of the correct answers. Based on the empirical results, we conjecture that the new probabilistic AESA-like algorithms will become, as AESA had been for exact algorithms, a reference point establishing, in practice, a lower bound on how good a probabilistic proximity search algorithm can be.
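The iAESA pivot-selection rule described above can be sketched as follows. This covers only the selection step, not the full search. The abstract does not name the order-similarity measure; this sketch assumes the Spearman footrule (sum of absolute rank differences) and a precomputed distance matrix `D`. All function names are hypothetical.

```python
from typing import Dict, List, Sequence

def pivot_order(dists: Sequence[float]) -> List[int]:
    """Positions 0..m-1 of the pivots, sorted by increasing distance (ties by position)."""
    return sorted(range(len(dists)), key=lambda i: (dists[i], i))

def footrule(order_a: Sequence[int], order_b: Sequence[int]) -> int:
    """Assumed similarity measure: Spearman footrule, sum of |rank_a - rank_b|."""
    rank_b: Dict[int, int] = {p: i for i, p in enumerate(order_b)}
    return sum(abs(i - rank_b[p]) for i, p in enumerate(order_a))

def next_pivot_iaesa(
    candidates: Sequence[int],
    pivots: Sequence[int],
    dq: Sequence[float],
    D: Sequence[Sequence[float]],
) -> int:
    """Choose the candidate whose ordering of previous pivots best matches the query's.

    candidates: indices not yet discarded; pivots: indices already compared with q;
    dq[j] = d(q, pivots[j]); D: precomputed matrix of object-to-object distances.
    """
    q_order = pivot_order(dq)

    def score(u: int) -> int:
        # Order previous pivots by their stored distance to candidate u.
        u_order = pivot_order([D[p][u] for p in pivots])
        return footrule(u_order, q_order)

    return min(candidates, key=lambda u: (score(u), u))
```

For objects `[0, 1, 3, 7, 10]`, previous pivots `0` and `10`, and a query at `8`, the query sees pivot `10` before pivot `0`; only candidate `7` orders them the same way, so it is selected next.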

Similarity Search and Applications | 2010

Enlarging nodes to improve dynamic spatial approximation trees

Marcelo Barroso; Nora Reyes; Rodrigo Paredes

The metric space model allows abstracting many similarity search problems. Similarity search has multiple applications, especially in multimedia databases. The idea is to index the database so as to accelerate similarity queries. Although there are several promising indices, few of them are dynamic; that is, once created, very few allow insertions and deletions of elements at a reasonable cost. Dynamic Spatial Approximation Trees (DSA-trees) have been shown to be a suitable data structure for searching high-dimensional metric spaces or for queries with low selectivity (i.e., large radius), and they are also fully dynamic. The performance of DSA-trees is directly related to the amount of backtracking at search time. A sufficient condition to boost the performance of this data structure is to keep elements that are close to each other in the same nodes. In this work we propose a new data structure for searching metric spaces, based on DSA-trees, which keeps their virtues, takes advantage of the element clusters present in many metric spaces, and can also make better use of available memory to improve searches. In fact, we use these element clusters to improve the spatial approximation.

Similarity Search and Applications | 2013

List of Clustered Permutations for Proximity Searching

Karina Figueroa; Rodrigo Paredes

The permutation-based algorithm has proven unbeatable in high-dimensional spaces, requiring $O(|\mathbb{P}|)$ distance evaluations to solve similarity queries, where $\mathbb{P}$ is the set of permutants. However, it needs $n$ evaluations of the permutant distance to compute the order in which to review the metric dataset, requires $O(n|\mathbb{P}|)$ space, and does not benefit much from low dimensionality. There have been several proposals to avoid the $n$ computations of the permutant distance; however, all of them lose precision. Inspired by the list of clusters, in this paper we group the permutations and establish a criterion to discard whole clusters according to the permutations of their centers. As a consequence, we reduce not only the space of the index and the number of distance evaluations but also the CPU time required to compare the permutations themselves. We can also use the permutations in low dimensions.
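The basic permutation index that this paper starts from can be sketched as below; the clustered variant and its discard criterion are not shown. Each object stores the order in which it sees the permutants ($|\mathbb{P}|$ integers per object, $O(n|\mathbb{P}|)$ total), a query pays $|\mathbb{P}|$ distance evaluations to compute its own permutation, and then $n$ permutant-distance evaluations rank the dataset. Using the Spearman footrule as the permutant distance and a fixed review `budget` are assumptions; all function names are hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

def permutation(x: T, perms: Sequence[T], dist: Callable[[T, T], float]) -> List[int]:
    """Permutant indices sorted by distance to x: |P| distance evaluations."""
    ds = [dist(x, p) for p in perms]
    return sorted(range(len(perms)), key=lambda i: (ds[i], i))

def inverse(perm: Sequence[int]) -> List[int]:
    """inv[i] = position of permutant i in perm."""
    inv = [0] * len(perm)
    for pos, i in enumerate(perm):
        inv[i] = pos
    return inv

def footrule(inv_a: Sequence[int], inv_b: Sequence[int]) -> int:
    """Assumed permutant distance: Spearman footrule over inverse permutations."""
    return sum(abs(a - b) for a, b in zip(inv_a, inv_b))

def build_index(objs: Sequence[T], perms: Sequence[T], dist: Callable[[T, T], float]) -> List[List[int]]:
    """Offline: store each object's inverse permutation (n * |P| integers)."""
    return [inverse(permutation(o, perms, dist)) for o in objs]

def review_order(index: List[List[int]], q: T, perms: Sequence[T], dist: Callable[[T, T], float]) -> List[int]:
    """Order in which to review the dataset: n footrule computations, no metric distances."""
    inv_q = inverse(permutation(q, perms, dist))
    return sorted(range(len(index)), key=lambda u: (footrule(index[u], inv_q), u))

def approx_knn(objs: Sequence[T], index: List[List[int]], q: T, perms: Sequence[T],
               dist: Callable[[T, T], float], k: int, budget: int) -> List[int]:
    """Approximate k-NN: compare q only with the first `budget` reviewed objects."""
    order = review_order(index, q, perms, dist)[:budget]
    return sorted(order, key=lambda u: (dist(q, objs[u]), u))[:k]
```

With objects `[0, 2, 5, 9, 14]`, permutants `[0, 6, 14]` and query `10`, object `9` sees the permutants in exactly the query's order, so it is reviewed first and returned as the nearest neighbor after only two actual distance evaluations against dataset objects.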

Mexican Conference on Pattern Recognition | 2014

An Effective Permutant Selection Heuristic for Proximity Searching in Metric Spaces

Karina Figueroa; Rodrigo Paredes

The permutation-based index has proven very effective in medium- and high-dimensional metric spaces, even for difficult problems such as solving reverse k-nearest neighbor queries. Nevertheless, there is currently no study of which features are desirable in a permutant set, or of how to select good permutants. As in the case of pivots, our experimental results show that, compared with a randomly chosen set, a good permutant set yields faster query responses or reduces the amount of space used by the index. In this paper, we start by characterizing permutants and studying their predictive power; then we propose an effective heuristic to select a good set of permutant candidates. We also show empirical evidence that supports our technique.

Mexican Conference on Pattern Recognition | 2011

Efficient group of permutants for proximity searching

Karina Figueroa Mora; Rodrigo Paredes; Roberto Rangel

Modeling proximity searching problems in a metric space allows one to approach many problems in different areas, e.g., pattern recognition, multimedia search, or clustering. The recently proposed permutation-based approach is a novel technique that is unbeatable in practice but difficult to compress. In this article we introduce an improvement on that metric space search data structure. Our technique shows that we can compress the permutation-based algorithm without losing precision. We show experimentally that our technique is competitive with the original idea and improves on it by up to 46% in real databases.
