Josep Aguilar-saborit

Publication

Featured research published by Josep Aguilar-saborit.

international conference on management of data | 2006

Dynamic count filters

Josep Aguilar-saborit; Pedro Trancoso; Victor Muntés-Mulero; Josep-lluis Larriba-pey

Bloom filters cannot handle deletes and inserts on multisets over time. This matters in many situations where streamed data evolve rapidly and change patterns frequently. Counting Bloom Filters (CBF) have been proposed to overcome this limitation and allow for the dynamic evolution of Bloom filters. The only dynamic approach to a compact and efficient representation of CBF is the Spectral Bloom Filter (SBF). In this paper we propose the Dynamic Count Filters (DCF) as a new dynamic and space-time efficient representation of CBF. Although DCF does not make a compact use of memory, it proves to be faster and more space efficient than any previous proposal. Results show that the proposed data structure is more efficient independently of the incoming data characteristics.
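As a rough illustration of the counting idea the abstract builds on, here is a minimal sketch of a plain Counting Bloom Filter. It is not the DCF structure proposed in the paper: the class name, the default sizes, and the double-hashing scheme over SHA-256 are all assumptions made for this example, and the counters are ordinary unbounded Python integers rather than a compact representation.

```python
import hashlib


class CountingBloomFilter:
    """Plain counting Bloom filter (illustrative sketch, not the paper's DCF)."""

    def __init__(self, num_counters=1024, num_hashes=3):
        # num_counters is assumed to be a power of two so that the
        # double-hashing step below yields distinct positions per item.
        self.m = num_counters
        self.k = num_hashes
        self.counters = [0] * num_counters

    def _positions(self, item):
        # Derive k counter positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def insert(self, item):
        for p in self._positions(item):
            self.counters[p] += 1

    def count(self, item):
        # The minimum counter is an upper bound on the true multiplicity.
        return min(self.counters[p] for p in self._positions(item))

    def delete(self, item):
        # Deleting an item that was never inserted would corrupt the counters,
        # so refuse when the estimate is already zero.
        if self.count(item) == 0:
            raise ValueError("item not present")
        for p in self._positions(item):
            self.counters[p] -= 1


cbf = CountingBloomFilter()
for _ in range(3):
    cbf.insert("a")
count_after_inserts = cbf.count("a")
cbf.delete("a")
count_after_delete = cbf.count("a")
```

Because only one distinct item is ever inserted here, the estimate equals the true multiplicity; with many items, collisions can make `count` overestimate but never underestimate.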

international conference on computational science | 2006

CGO: a sound genetic optimizer for cyclic query graphs

Victor Muntés-Mulero; Josep Aguilar-saborit; Calisto Zuzarte; Josep-lluis Larriba-pey

The increasing number of applications requiring the use of large join queries reinforces the search for good methods to determine the best execution plan. This is especially true when the large number of joins occurring in a query prevents traditional optimizers from using dynamic programming. In this paper we present the Carquinyoli Genetic Optimizer (CGO). CGO is a sound optimizer based on genetic programming that uses a subset of the cost model of IBM® DB2® Universal Database™ (DB2 UDB) for selection in order to produce new generations of query plans. Our study shows that CGO is very competitive either as a standalone optimizer or as a fast post-optimizer. In addition, CGO takes into account the inherent characteristics of query plans, such as their cyclic nature.

data and knowledge engineering | 2008

Dynamic adaptive data structures for monitoring data streams

Josep Aguilar-saborit; Pedro Trancoso; Victor Muntés-Mulero; Josep-lluis Larriba-pey

The monitoring of data streams is a very important issue in many different areas. Aspects such as accuracy, the speed of response, the use of memory and the adaptability to the changing nature of data may vary in importance depending on the situation. Web page access monitoring, approximate aggregation in relational queries and IP message routing are clear examples of the varied range of those needs. There are different data structures that deal with this problem, such as the counting Bloom filters, the spectral Bloom filters and the dynamic count filters. Those data structures range from static to complex dynamic representations of the data stream that keep an approximate count of the number of occurrences for each data value. In this paper, we focus on three main aspects. First, we analyze the problem in perspective and review the existing static and dynamic solutions. Second, we propose and analyze in depth a simple yet powerful partitioning strategy that reinforces the advantages of the methods proposed up to now while solving most of their drawbacks. Finally, using real executions and mathematical models, we evaluate the existing methods alone and in combination with our partitioning strategy. We show that with our partitioning strategy, it is possible to reduce the memory requirements and average response time, improving the adaptiveness to changing data characteristics and leaving the accuracy of the partitioned dynamic data structures intact.

european conference on parallel processing | 2003

Pushing Down Bit Filters in the Pipelined Execution of Large Queries

Josep Aguilar-saborit; Victor Muntés-Mulero; Josep-lluis Larriba-pey

We propose a new strategy for using Bit Filters in complex pipelined queries on large databases, which we call Pushed Down Bit Filters. The objective of the strategy is to make use of the Bit Filters already created for upper nodes of the execution plan in the leaves of the plan, so as to reduce the traffic between the nodes of the execution plan. When traffic is reduced, the amount of CPU work is reduced and, in most cases, I/O is also reduced. In addition, this technique shows no degradation in cases where the filters have little effectiveness.
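The general idea of applying a join's bit filter at the leaf scan can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's execution engine: the function names, the 64-bit filter size, the tuple layout, and the use of Python's built-in `hash` on small integer keys are all hypothetical choices for this example.

```python
def build_bit_filter(keys, num_bits=64):
    """Set one bit per join key of the build relation."""
    bits = 0
    for key in keys:
        bits |= 1 << (hash(key) % num_bits)
    return bits


def may_match(bits, key, num_bits=64):
    """False means the key cannot join; True means it might."""
    return ((bits >> (hash(key) % num_bits)) & 1) == 1


def scan(rows, key_index, bits=None, num_bits=64):
    """Leaf scan; with a pushed-down filter it drops rows before they
    travel up the plan."""
    for row in rows:
        if bits is None or may_match(bits, row[key_index], num_bits):
            yield row


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Simple in-memory hash join; also removes filter false positives."""
    table = {}
    for row in build_rows:
        table.setdefault(row[build_key], []).append(row)
    return [b + p for p in probe_rows for b in table.get(p[probe_key], [])]


# Hypothetical data: a small build relation and a larger probe relation.
dim = [(3, "red"), (10, "blue")]
fact = [(3, 1.0), (4, 2.0), (10, 3.0), (67, 4.0)]

bits = build_bit_filter(k for k, _ in dim)
survivors = list(scan(fact, 0, bits))      # key 4 is dropped at the leaf
joined = hash_join(dim, survivors, 0, 0)   # key 67 is a false positive
```

In this sketch the row with key 67 passes the filter because it maps to the same bit as key 3, and the join itself discards it. The filter therefore only reduces traffic and never changes the result.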

conference on information and knowledge management | 2008

Cache-aware load balancing for question answering

David Dominguez-Sal; Mihai Surdeanu; Josep Aguilar-saborit; Josep Lluis Larriba-Pey

The need for high-performance, high-throughput Question Answering (QA) systems demands their migration to distributed environments. However, even in such cases it is necessary to provide the distributed system with cooperative caches and load balancing facilities in order to achieve the desired goals. Until now, the literature on QA has not considered such a complex system as a whole. Currently, the load balancer regulates the assignment of tasks based only on the CPU and I/O loads without considering the status of the system cache. This paper investigates the load balancing problem, proposing two novel algorithms that take into account the distributed cache status in addition to the CPU and I/O load in each processing node. We have implemented and tested the proposed algorithms in a fully fledged distributed QA system. The two algorithms show that using the status of the cache was decisive in achieving good performance and high throughput for QA systems.

data and knowledge engineering | 2007

Star join revisited: Performance internals for cluster architectures

Josep Aguilar-saborit; Victor Muntés-Mulero; Calisto Zuzarte; Josep-lluis Larriba-pey

Data warehouse workloads are crucial for the support of on-line analytical processing (OLAP). The strategy to cope with OLAP queries on such huge amounts of data calls for the use of large parallel computers. The trend today is to use cluster architectures that show a reasonable balance between cost and performance. In such cases, it is necessary to tune the applications in order to minimize the amount of I/O and communication, such that the global execution time is reduced as much as possible. In this paper, we model and analyze the most up-to-date strategies for ad hoc star join query processing in a cluster of computers. We show that, for ad hoc query processing and assuming a limited amount of resources available, these strategies still have room for improvement both in terms of I/O and inter-node data traffic communication. Our analysis concludes with the proposal of a hybrid solution that improves these two aspects compared to the previous techniques, and shows near optimal results in a broad spectrum of cases.

parallel, distributed and network-based processing | 2006

Dynamic out of core join processing in symmetric multiprocessors

Josep Aguilar-saborit; Victor Muntés-Mulero; Calisto Zuzarte; Adriana Zubiri; Josep-lluis Larriba-pey

The use of clusters of symmetric multiprocessor (SMP) configurations in database processing has become a key factor in allowing greater scalability. It has also posed many challenges in the implementation of one of the most costly operations within relational algebra: the join operation. When massive data is involved, usually the join cannot be performed in-memory and is processed out of core. In this case, performance depends on an effective use of the memory hierarchy, such that I/O and memory contention are minimized. In this paper, we propose a parallel algorithm for out of core join processing that dynamically adapts its behavior to the resources available in the system. We evaluate and compare our proposal against other parallel approaches in a real SMP cluster in a major commercial database, the IBM® DB2® Universal Database™ product (DB2 UDB). Results show that our proposal outperforms previous work significantly.

database and expert systems applications | 2006

Parameterizing a genetic optimizer

Victor Muntés-Mulero; Marta Pérez-Casany; Josep Aguilar-saborit; Calisto Zuzarte; Josep-lluis Larriba-pey

Genetic programming has been proposed as a possible although still intriguing approach for query optimization. Two main aspects are still unclear and need further investigation, namely, the quality of the results and the speed of convergence to an optimum solution. In this paper we tackle the first aspect and present and validate a statistical model that, for the first time in the literature, lets us state that the average cost of the best query execution plan (QEP) obtained by a genetic optimizer is predictable. It also allows us to analyze the parameters that are most important for obtaining the QEP with the best possible cost. As a consequence of this analysis, we demonstrate that it is possible to extract general rules in order to parameterize a genetic optimizer independently of the random effects of the initial population.

british national conference on databases | 2006

Analyzing the genetic operations of an evolutionary query optimizer

Victor Muntés-Mulero; Josep Aguilar-saborit; Calisto Zuzarte; Volker Markl; Josep-lluis Larriba-pey

In this paper we analyze the importance of the operations in a genetic programming-based optimizer. Among the several conclusions, we show that crossover operations have a larger impact on the quality of the best obtained execution plan than mutation operations.

data warehousing and knowledge discovery | 2005

Ad hoc star join query processing in cluster architectures

Josep Aguilar-saborit; Victor Muntés-Mulero; Calisto Zuzarte; Josep-lluis Larriba-pey

Processing of large amounts of data in data warehouses is increasingly being done in cluster architectures to achieve scalability. In this paper we look into the problem of ad hoc star join query processing in cluster architectures. We propose a new technique, the Star Hash Join (SHJ), which exploits a combination of multiple bit filter strategies in such architectures. SHJ is a generalization of the Pushed Down Bit Filters for clusters. The objectives of the technique are to reduce (i) the amount of data communicated, (ii) the amount of data spilled to disk during the execution of intermediate joins in the query plan, and (iii) the amount of memory used by auxiliary data structures such as bit filters.
