Karina Figueroa

Universidad Michoacana de San Nicolás de Hidalgo

Publication

Featured research published by Karina Figueroa.

WEA'06 Proceedings of the 5th international conference on Experimental Algorithms | 2006

Practical construction of k-nearest neighbor graphs in metric spaces

Rodrigo Paredes; Edgar Chávez; Karina Figueroa; Gonzalo Navarro

Let $U$ be a set of elements and $d$ a distance function defined among them. Let $NN_k(u)$ be the $k$ elements in $U \setminus \{u\}$ having the smallest distance to $u$. The k-nearest neighbor graph (kNNG) is a weighted directed graph $G(U,E)$ such that $E = \{(u,v) : v \in NN_k(u)\}$. Several kNNG construction algorithms are known, but they are not suitable for general metric spaces. We present a general methodology to construct kNNGs that exploits several features of metric spaces. Experiments suggest that it yields costs of the form $c_1 n^{1.27}$ distance computations for low and medium dimensional spaces, and $c_2 n^{1.90}$ for high dimensional ones.
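To make the definition concrete, the following is a minimal brute-force sketch of kNNG construction. It is not the methodology of the paper: it simply evaluates all n(n-1) distances, which is the baseline cost that metric-space techniques try to avoid. The function name `brute_force_knng` and the toy data are illustrative assumptions.

```python
import heapq


def brute_force_knng(elements, dist, k):
    """Map each element u to its k nearest neighbors as (distance, v) pairs.

    Assumes the elements are distinct and hashable. This is a naive
    baseline that evaluates every pair, not the paper's construction.
    """
    graph = {}
    for u in elements:
        # Candidate edges (u, v) for every v in U \ {u}, weighted by d(u, v).
        candidates = ((dist(u, v), v) for v in elements if v != u)
        graph[u] = heapq.nsmallest(k, candidates)
    return graph


# Toy metric space: points on the real line with d(a, b) = |a - b|.
points = [0.0, 1.0, 3.0, 7.0]
knng = brute_force_knng(points, lambda a, b: abs(a - b), k=2)
print(knng[0.0])  # [(1.0, 1.0), (3.0, 3.0)]
```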

WEA'06 Proceedings of the 5th international conference on Experimental Algorithms | 2006

On the least cost for proximity searching in metric spaces

Karina Figueroa; Edgar Chávez; Gonzalo Navarro; Rodrigo Paredes

Proximity searching consists of retrieving from a database those elements that are similar to a query. As the distance is usually expensive to compute, the goal is to use as few distance computations as possible to satisfy queries. Indexes use precomputed distances among database elements to speed up queries. The baseline is AESA, which stores all the distances among database objects and has been unbeaten in query performance for 20 years. In this paper we show that it is possible to improve upon AESA by using a radically different method to select promising database elements to compare against the query. Our experiments show improvements of up to 75% in document databases. We also explore the usage of our method as a probabilistic algorithm that may lose relevant answers. On a database of faces where any exact algorithm must examine virtually all elements, our probabilistic version obtains 85% of the correct answers by scanning only 10% of the database.

Mexican International Conference on Artificial Intelligence | 2005

Proximity searching in high dimensional spaces with a proximity preserving order

Edgar Chávez; Karina Figueroa; Gonzalo Navarro

Kernel-based methods (such as k-nearest neighbors classifiers) for AI tasks translate the classification problem into a proximity search problem, in a space that is usually very high dimensional. Unfortunately, no proximity search algorithm does well in high dimensions. An alternative to overcome this problem is the use of approximate and probabilistic algorithms, which trade accuracy for time. In this paper we present a new probabilistic proximity search algorithm. Its main idea is to order a set of samples based on their distance to each element. It turns out that the closeness between the order produced by an element and that produced by the query is an excellent predictor of the relevance of the element to answer the query. The performance of our method is unparalleled. For example, for a full 128-dimensional dataset, it is enough to review 10% of the database to obtain 90% of the answers, and to review less than 1% to get 80% of the correct answers. The result is more impressive if we realize that a full 128-dimensional dataset may span thousands of dimensions of clustered data. Furthermore, the concept of proximity preserving order opens a totally new approach for both exact and approximate proximity searching.

Mexican International Conference on Artificial Intelligence | 2004

Faster Proximity Searching in Metric Data

Edgar Chávez; Karina Figueroa

A number of problems in computer science can be solved efficiently with the so-called memory-based or kernel methods. Among these problems (relevant to the AI community) are multimedia indexing, clustering, unsupervised learning and recommendation systems. The common ground of these problems is satisfying proximity queries with an abstract metric database.

Similarity Search and Applications | 2009

Speeding Up Permutation Based Indexing with Indexing

Karina Figueroa; Kimmo Fredriksson

A recent probabilistic approach for searching in high dimensional metric spaces is based on predicting the distances between database elements according to how they order their distances towards some set of distinguished elements, called permutants. In the preprocessing phase a set of permutants is chosen, and for every database element the permutants are sorted (permuted) by their distance to that element. The permutations form the index. When a query is given, its corresponding permutation is computed and, since similar elements will (probably) have similar permutations, the database is compared in the order induced by the similarity between permutations. This works well but has relatively high CPU time due to computing the distances between permutations and (partially) sorting the database by the similarity. We improve this by identifying and solving this as another metric space problem. This avoids many distance computations between the permutants. The experimental results show that this works extremely well in practice.
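The basic permutation index described in this abstract can be sketched as follows. This shows only the plain scheme (the part whose CPU cost the paper aims to reduce), not the paper's improvement. The abstract does not name the similarity measure between permutations; the Spearman footrule used here is one common choice and is an assumption, as are all function names and the toy data.

```python
def permutation_of(obj, permutants, dist):
    """Return ranks[i] = position of permutant i when sorted by distance to obj."""
    order = sorted(range(len(permutants)),
                   key=lambda i: (dist(obj, permutants[i]), i))
    ranks = [0] * len(permutants)
    for position, permutant_id in enumerate(order):
        ranks[permutant_id] = position
    return ranks


def spearman_footrule(a, b):
    """Sum of absolute rank differences (assumed similarity measure)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def review_order(query, database, permutants, dist):
    """Indices of database elements, most promising first."""
    # Preprocessing: one permutation per database element forms the index.
    index = [permutation_of(obj, permutants, dist) for obj in database]
    # Query time: compute the query permutation, then sort by similarity.
    q_perm = permutation_of(query, permutants, dist)
    return sorted(range(len(database)),
                  key=lambda i: (spearman_footrule(index[i], q_perm), i))


database = [0.0, 2.0, 5.0, 9.0, 10.0]
permutants = [1.0, 4.0, 8.0]
order = review_order(9.5, database, permutants, lambda a, b: abs(a - b))
print(order)  # [3, 4, 2, 0, 1]
```

In this toy run the two elements closest to the query (9.0 and 10.0) come first, so a search that reviews only a prefix of the order would still find them.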

Similarity Search and Applications | 2013

Extreme Pivots for Faster Metric Indexes

Guillermo Ruiz; Edgar Chávez; Karina Figueroa; Eric Sadit Tellez

Pivot tables are popular for exact metric indexing. It is well known that a large pivot table produces faster indexes. The rule of thumb is to use as many pivots as the available memory allows for a given application. To further speed up searches, redundant pivots can be eliminated, or the scope of the pivots (the number of database objects covered by a pivot) can be reduced. In this paper, we apply a different technique to speed up searches. We assign objects to pivots while, at the same time, enforcing proper coverage of the database objects. This increases the discarding power of pivots and in turn leads to faster searches. The central idea is to select a set of essential pivots, without redundancy, covering the entire database. We call our technique extreme pivoting (EP). A nice additional property of EP is that it balances performance and memory usage. For example, using the same amount of memory, EP is faster than the List of Clusters and the Spatial Approximation Tree. Moreover, EP is faster than LAESA even when it uses less memory. The EP technique was formally modeled, allowing performance prediction without an actual implementation. Performance and memory usage depend on two parameters of EP, which are characterized with a wide range of experiments. Also, we provide automatic selection of one parameter when the other is fixed. The formal model was empirically tested with real-world and synthetic datasets, finding high consistency between the predicted and the actual performance.
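For readers unfamiliar with pivot tables, the sketch below shows the generic filtering step they rely on: by the triangle inequality, |d(q,p) - d(o,p)| is a lower bound on d(q,o) for any pivot p, so an object can be discarded without computing d(q,o) when that bound exceeds the search radius. This is the plain pivot-table scheme, not the EP technique; the function names and toy data are illustrative assumptions.

```python
def build_pivot_table(database, pivots, dist):
    """Precompute d(o, p) for every database object o and pivot p."""
    return [[dist(obj, p) for p in pivots] for obj in database]


def range_search(query, radius, database, pivots, table, dist):
    """Return (objects within radius, number of query-time distance evaluations)."""
    q_to_pivots = [dist(query, p) for p in pivots]
    evaluations = len(pivots)
    results = []
    for obj, row in zip(database, table):
        # Best lower bound on d(query, obj) over all pivots.
        lower_bound = max(abs(dq - do) for dq, do in zip(q_to_pivots, row))
        if lower_bound > radius:
            continue  # discarded without evaluating d(query, obj)
        evaluations += 1
        if dist(query, obj) <= radius:
            results.append(obj)
    return results, evaluations


def metric(a, b):
    return abs(a - b)


database = [0.0, 2.0, 5.0, 9.0, 10.0]
pivots = [0.0, 10.0]
table = build_pivot_table(database, pivots, metric)
found, evaluations = range_search(4.5, 1.0, database, pivots, table, metric)
print(found, evaluations)  # [5.0] 3
```

More pivots tighten the lower bound at the price of a larger table, which is the memory trade-off the abstract refers to.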

ACM Journal of Experimental Algorithms | 2009

Speeding up spatial approximation search in metric spaces

Karina Figueroa; Edgar Chávez; Gonzalo Navarro; Rodrigo Paredes

Proximity searching consists of retrieving from a database those elements that are similar to a query object. The usual model for proximity searching is a metric space where the distance, which models the proximity, is expensive to compute. An index uses precomputed distances to speed up query processing. Among all the known indices, the baseline for performance for about 20 years has been AESA. This index uses an iterative procedure, where at each iteration it first chooses the next promising element (“pivot”) to compare to the query, and then it discards database elements that can be proved not relevant to the query using the pivot. The next pivot in AESA is chosen as the one minimizing the sum of lower bounds to the distance to the query proved by previous pivots. In this article, we introduce the new index iAESA, which establishes a new performance baseline for metric space searching. The difference with AESA is the method to select the next pivot. In iAESA, each candidate sorts previous pivots by closeness to it, and the next pivot is chosen as the candidate whose order is most similar to that of the query. We also propose a modification to AESA-like algorithms to turn them into probabilistic algorithms. Our empirical results confirm a consistent improvement in query performance. For example, we perform as few as 60% of the distance evaluations of AESA in a database of documents, a very important and difficult real-life instance of the problem. For the probabilistic algorithm, we perform in a database of faces up to 40% of the comparisons made by the best alternative algorithm to retrieve the same percentage of the correct answers. Based on the empirical results, we conjecture that the new probabilistic AESA-like algorithms will become, as AESA had been for exact algorithms, a reference point establishing, in practice, a lower bound on how good a probabilistic proximity search algorithm can be.
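The AESA loop as described in this abstract (choose a pivot, compare it to the query, discard what can be ruled out) can be sketched as below. This is a simplified reading of the description, with the next pivot chosen by the smallest accumulated sum of lower bounds; it does not implement the iAESA pivot selection or the probabilistic variant. The function name, the tie-breaking by index, and the toy data are assumptions.

```python
def aesa_range_search(query, radius, database, matrix, dist):
    """Range search using a full precomputed distance matrix.

    matrix[i][j] must hold d(database[i], database[j]).
    Returns (objects within radius, number of query-time distance evaluations).
    """
    alive = set(range(len(database)))
    lower_sum = [0.0] * len(database)  # sum of lower bounds on d(query, obj)
    results = []
    evaluations = 0
    while alive:
        # Next pivot: remaining candidate with the smallest sum of lower bounds.
        pivot = min(alive, key=lambda i: (lower_sum[i], i))
        alive.remove(pivot)
        d_qp = dist(query, database[pivot])
        evaluations += 1
        if d_qp <= radius:
            results.append(database[pivot])
        for i in list(alive):
            bound = abs(d_qp - matrix[pivot][i])  # triangle inequality
            if bound > radius:
                alive.remove(i)  # provably outside the query ball
            else:
                lower_sum[i] += bound
    return results, evaluations


def metric(a, b):
    return abs(a - b)


database = [0.0, 2.0, 5.0, 9.0, 10.0]
matrix = [[metric(a, b) for b in database] for a in database]
aesa_results, aesa_evaluations = aesa_range_search(4.5, 1.0, database, matrix, metric)
print(aesa_results, aesa_evaluations)  # [5.0] 2
```

The quadratic `matrix` is what makes AESA expensive in space, which motivates the space-time trade-offs studied in other entries of this list.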

Similarity Search and Applications | 2013

List of Clustered Permutations for Proximity Searching

Karina Figueroa; Rodrigo Paredes

The permutation based algorithm has been proved unbeatable in high dimensional spaces, requiring $O(|P|)$ distance evaluations when solving similarity queries, where $P$ is the set of permutants; however, it needs $n$ evaluations of the permutant distance to compute the order in which to review the metric dataset, requires $O(n|P|)$ space, and does not benefit much from low dimensionality. There have been several proposals to avoid the $n$ computations of the permutant distance, but all of them lose precision. Inspired by the List of Clusters, in this paper we group the permutations and establish a criterion to discard whole clusters according to the permutation of their centers. As a consequence of our proposal, we reduce not only the space of the index and the number of distance evaluations but also the CPU time required when comparing the permutations themselves. Also, we can use the permutations in low dimensions.

WEA'07 Proceedings of the 6th international conference on Experimental algorithms | 2007

Simple space-time trade-offs for AESA

Karina Figueroa; Kimmo Fredriksson

We consider indexing and range searching in metric spaces. The best method known is AESA, in practice requiring the fewest number of distance evaluations to answer range queries. The problem with AESA is its space complexity, requiring storage for $\Theta(n^2)$ distance values to index $n$ objects. We give several methods to reduce this cost. The main observation is that exact distance values are not needed, but lower and upper bounds suffice. The simplest of our methods needs only $\Theta(n^2)$ bits (as opposed to words) of storage, but the price to pay is more distance evaluations, the exact cost depending on the dimension, as compared to AESA. To reduce this efficiency gap we extend our method to use $b$ distance bounds, requiring $\Theta(n^2 \log_2 b)$ bits of storage. The scheme also uses $\Theta(b)$ or $\Theta(bn)$ words of auxiliary space. We experimentally show that using $b \in \{1, \ldots, 16\}$ (depending on the problem instance) gives good results. Our preprocessing and side computation costs are the same as for AESA. We propose several improvements, achieving e.g. $O(n^{1+\alpha})$ construction cost for some $0 < \alpha < 1$, and a variant using even less space.
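The observation that bounds suffice follows directly from the triangle inequality. The derivation below is a sketch under assumed notation: $q$ is the query, $r$ the search radius, $p$ a pivot, $u$ a database object, and $\ell \le d(p,u) \le h$ are the stored lower and upper bounds (the symbols $\ell$ and $h$ are not from the abstract).

```latex
\begin{align*}
d(q,u) &\ge d(q,p) - d(p,u) \ge d(q,p) - h, \\
d(q,u) &\ge d(p,u) - d(q,p) \ge \ell - d(q,p), \\
\text{so } u \text{ can be discarded whenever } &\max\bigl(d(q,p) - h,\ \ell - d(q,p)\bigr) > r .
\end{align*}
```

The filter is weaker than with exact distances because $h \ge d(p,u) \ge \ell$, which is consistent with the abstract's remark that the space saving is paid for with more distance evaluations.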

Mexican Conference on Pattern Recognition | 2014

An Effective Permutant Selection Heuristic for Proximity Searching in Metric Spaces

Karina Figueroa; Rodrigo Paredes

The permutation based index has been shown to be very effective in medium and high dimensional metric spaces, even in difficult problems such as solving reverse k-nearest neighbor queries. Nevertheless, there is currently no study of which features are desirable in a permutant set, or of how to select good permutants. Similar to the case of pivots, our experimental results show that, compared with a randomly chosen set, a good permutant set yields faster query response or reduces the amount of space used by the index. In this paper, we start by characterizing permutants and studying their predictive power; then we propose an effective heuristic to select a good set of permutant candidates. We also show empirical evidence that supports our technique.
