
Publication


Featured research published by Alexandre A. B. Lima.


Distributed and Parallel Databases | 2009

Parallel OLAP query processing in database clusters with data replication

Alexandre A. B. Lima; Camille Furtado; Patrick Valduriez; Marta Mattoso

We consider the problem of improving the performance of OLAP applications in a database cluster (DBC), which is a low-cost and effective parallel solution for query processing. Current DBC solutions for OLAP query processing provide intra-query parallelism only, at the cost of full replication of the database. In this paper, we propose more efficient distributed database design alternatives that combine physical/virtual partitioning with partial replication. We also propose a new load balancing strategy that takes advantage of adaptive virtual partitioning to redistribute the load among the replicas. Our experimental validation is based on the implementation of our solution on the SmaQSS DBC middleware prototype. Our experimental results using the TPC-H benchmark and a 32-node cluster show very good speedup.
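The core idea can be sketched in Python: a query over a partially replicated table is rewritten into range-bounded subqueries on a partitioning attribute and spread over the nodes that hold a replica. This is only an illustrative sketch, not the SmaQSS middleware; the table, attribute, and node names are assumptions.

```python
# Illustrative sketch (not the SmaQSS API): rewrite an OLAP query into
# range-bounded subqueries over a partitioning attribute and assign them
# round-robin to the nodes holding a replica of the target partition.

def virtual_partitions(lo, hi, n_parts):
    """Split the [lo, hi) key range of the partitioning attribute."""
    step = (hi - lo) // n_parts
    bounds = [lo + i * step for i in range(n_parts)] + [hi]
    return list(zip(bounds[:-1], bounds[1:]))

def rewrite(query_template, attr, ranges):
    """Produce one subquery per virtual partition by adding a range predicate."""
    return [query_template.format(pred=f"{attr} >= {a} AND {attr} < {b}")
            for a, b in ranges]

def assign(subqueries, replica_nodes):
    """Distribute subqueries over the replicas of the partition (round-robin)."""
    plan = {node: [] for node in replica_nodes}
    for i, q in enumerate(subqueries):
        plan[replica_nodes[i % len(replica_nodes)]].append(q)
    return plan

template = ("SELECT l_returnflag, SUM(l_quantity) FROM lineitem "
            "WHERE {pred} GROUP BY l_returnflag")
subqs = rewrite(template, "l_orderkey", virtual_partitions(1, 6_000_001, 8))
print(assign(subqs, ["node1", "node2", "node3", "node4"]))
```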


Symposium on Computer Architecture and High Performance Computing | 2005

Physical and virtual partitioning in OLAP database clusters

Camille Furtado; Alexandre A. B. Lima; Esther Pacitti; Patrick Valduriez; Marta Mattoso

On-line analytical processing (OLAP) applications require high-performance database support to achieve good response time, which is crucial for decision making. Database clusters provide a cost-effective alternative to parallel database systems. For OLAP applications, which typically use heavy-weight queries, intra-query parallelism yields better performance as it reduces the execution time of individual queries. Intra-query parallelism is based on processing the same query on different subsets of the queried table. Combining physical and virtual partitioning to define table subsets provides flexibility in intra-query parallelism while optimizing disk space usage and data availability. Experiments with our partitioning technique using TPC-H benchmark queries on a cluster of 32 dual-processor nodes gave linear and super-linear speedup, significantly reducing the execution time of typical heavy-weight OLAP queries.
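The sketch below illustrates, with hypothetical fragment and node names, how physical fragments can be further cut into virtual sub-partitions at query time so that every node in a group processes the same query on a smaller subset; it is not the paper's implementation.

```python
# Hypothetical illustration: each physical fragment of a table lives on a node
# group; at query time the fragment's key range is further cut into virtual
# sub-partitions so each node processes the same query on a smaller subset.

PHYSICAL_FRAGMENTS = {               # fragment -> (key range, hosting nodes)
    "orders_p0": ((1, 3_000_001), ["n1", "n2"]),
    "orders_p1": ((3_000_001, 6_000_001), ["n3", "n4"]),
}

def sub_partitions(lo, hi, k):
    """Cut [lo, hi) into k contiguous virtual sub-partitions."""
    step = (hi - lo) // k
    return [(lo + i * step, lo + (i + 1) * step if i < k - 1 else hi)
            for i in range(k)]

def plan(query_template, attr, subs_per_node=2):
    """Build (node, fragment, subquery) tasks for every virtual sub-partition."""
    tasks = []
    for frag, ((lo, hi), nodes) in PHYSICAL_FRAGMENTS.items():
        ranges = sub_partitions(lo, hi, len(nodes) * subs_per_node)
        for i, (a, b) in enumerate(ranges):
            node = nodes[i % len(nodes)]
            sql = query_template.format(pred=f"{attr} >= {a} AND {attr} < {b}")
            tasks.append((node, frag, sql))
    return tasks

for task in plan("SELECT COUNT(*) FROM orders WHERE {pred}", "o_orderkey"):
    print(task)
```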


High Performance Distributed Computing | 2010

Data parallelism in bioinformatics workflows using Hydra

Fábio Coutinho; Eduardo S. Ogasawara; Daniel de Oliveira; Vanessa Braganholo; Alexandre A. B. Lima; Alberto M. R. Dávila; Marta Mattoso

Large-scale bioinformatics experiments are usually composed of a set of data flows generated by a chain of activities (programs or services) that may be modeled as scientific workflows. Current Scientific Workflow Management Systems (SWfMS) are used to orchestrate these workflows, controlling and monitoring the whole execution. Bioinformatics experiments commonly process very large datasets, so data parallelism is a common approach to increase performance and reduce overall execution time. However, most current SWfMS still lack support for parallel execution in high performance computing (HPC) environments. Additionally, keeping track of provenance data in distributed environments remains an open and important problem. Recently, the Hydra middleware was proposed to bridge the gap between the SWfMS and the HPC environment by providing a transparent way for scientists to parallelize workflow executions while capturing distributed provenance. This paper analyzes data parallelism scenarios in the bioinformatics domain and presents an extension to Hydra through a specific cartridge that promotes data parallelism in bioinformatics workflows. Experimental results using workflows with BLAST show performance gains with the additional benefit of distributed provenance support.
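The data-parallelism pattern described above can be sketched as follows. This is an illustrative stand-in with a placeholder activity instead of a real BLAST invocation, not the Hydra cartridge itself.

```python
# Minimal sketch of the data-parallelism pattern: split an input dataset into
# fragments, run the same activity on each fragment in parallel, and record
# a simple provenance record per task.

from concurrent.futures import ProcessPoolExecutor
from datetime import datetime

def split(sequences, n_fragments):
    """Partition the input sequences into roughly equal fragments."""
    return [sequences[i::n_fragments] for i in range(n_fragments)]

def run_activity(fragment):
    """Stand-in for a real activity such as a BLAST run on one fragment."""
    result = [seq.upper() for seq in fragment]          # placeholder work
    return {"input_size": len(fragment), "output": result,
            "finished_at": datetime.utcnow().isoformat()}

if __name__ == "__main__":
    sequences = ["acgt", "ttga", "ccat", "gggc", "atat", "cgcg"]
    provenance = []                                      # one record per task
    with ProcessPoolExecutor(max_workers=3) as pool:
        for record in pool.map(run_activity, split(sequences, 3)):
            provenance.append(record)
    print(provenance)
```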


European Conference on Parallel Processing | 2004

OLAP Query Processing in a Database Cluster

Alexandre A. B. Lima; Marta Mattoso; Patrick Valduriez

The efficient execution of OLAP queries, which are typically read-only and heavy-weight, is a hard problem that has traditionally been solved with tightly-coupled multiprocessors. Considering a database cluster as a cost-effective alternative, we propose an efficient yet simple solution to parallel OLAP query processing, called fine-grained virtual partitioning. We designed this solution for a shared-nothing database cluster architecture that can scale up to very large configurations and supports black-box DBMSs using simple, non-intrusive techniques. To validate our solution, we implemented a Java prototype on a 16-node cluster and ran experiments with typical queries of the TPC-H benchmark. The results show that our solution yields linear, and sometimes super-linear, speedup. With 16 nodes, it outperforms traditional virtual partitioning by a factor of 6.
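A minimal sketch of the fine-grained idea, assuming hypothetical node names and a synthetic key range: many more subqueries than nodes are generated and pulled on demand, so faster nodes absorb more work. It is not the Java prototype.

```python
# Sketch of fine-grained virtual partitioning: generate many more range-bounded
# subqueries than nodes; each node pulls the next subquery as soon as it
# finishes, so skew between nodes is absorbed dynamically.

import queue
import random
import threading
import time

def make_subqueries(lo, hi, n):
    step = (hi - lo) // n
    return [f"SELECT ... WHERE key >= {lo + i*step} AND key < {lo + (i+1)*step}"
            for i in range(n)]

def node_worker(name, tasks, done):
    while True:
        try:
            sql = tasks.get_nowait()
        except queue.Empty:
            return
        time.sleep(random.uniform(0.01, 0.05))   # stand-in for executing sql
        done.append((name, sql))

tasks, done = queue.Queue(), []
for q in make_subqueries(0, 1_000_000, 64):       # 64 subqueries for 4 nodes
    tasks.put(q)
threads = [threading.Thread(target=node_worker, args=(f"node{i}", tasks, done))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(done), "subqueries executed")
```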


High Performance Computing for Computational Science (Vector and Parallel Processing) | 2008

High-Performance Query Processing of a Real-World OLAP Database with ParGRES

Melissa Paes; Alexandre A. B. Lima; Patrick Valduriez; Marta Mattoso

Typical OLAP queries take a long time to be processed, so speeding up the execution of each individual query is imperative for decision making. ParGRES is an open-source database cluster middleware for high-performance OLAP query processing. By exploiting intra-query parallelism on PC clusters, ParGRES has shown excellent performance on the TPC-H benchmark. In this paper, we evaluate ParGRES on a real-world OLAP database. Through adaptive virtual partitioning of the database, ParGRES yields linear and very often super-linear speedup for frequent queries. This shows that ParGRES is a very cost-effective solution for OLAP query processing in real settings.


Concurrency and Computation: Practice and Experience | 2011

Many task computing for orthologous genes identification in protozoan genomes using Hydra

Fábio Coutinho; Eduardo S. Ogasawara; Daniel de Oliveira; Vanessa Braganholo; Alexandre A. B. Lima; Alberto M. R. Dávila; Marta Mattoso

One of the main advantages of using a scientific workflow management system (SWfMS) is to orchestrate data flows among scientific activities and register provenance of the whole workflow execution. Nevertheless, the execution control of distributed activities in high performance computing environments by a SWfMS presents challenges such as steering control and provenance gathering. Such challenges can be hard to address in bioinformatics experiments, particularly in Many Task Computing scenarios. This paper presents a data parallelism solution for a bioinformatics experiment supported by Hydra, a middleware that bridges SWfMS and high performance computing to enable workflow parallelization with provenance gathering. Hydra's Many Task Computing parallelization strategies can be registered and reused, and provenance can be gathered uniformly. We have evaluated Hydra using an Orthologous Gene Identification workflow. Experimental results show that a systematic approach for distributing parallel activities is viable, saving scientists' time and reducing operational errors, with the additional benefit of distributed provenance support.
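A hypothetical sketch of what registering and reusing parallelization strategies can look like; the decorator and strategy names are assumptions, not Hydra's API.

```python
# Illustrative strategy registry: parallelization strategies are registered
# once under a name and reused across experiments.

STRATEGIES = {}

def strategy(name):
    """Decorator that registers a parallelization strategy under a name."""
    def register(fn):
        STRATEGIES[name] = fn
        return fn
    return register

@strategy("fragment_by_sequence")
def fragment_by_sequence(dataset, n_tasks):
    """Split a sequence dataset into n_tasks many-task activations."""
    return [dataset[i::n_tasks] for i in range(n_tasks)]

@strategy("fragment_by_size")
def fragment_by_size(dataset, chunk):
    """Split a dataset into fixed-size chunks."""
    return [dataset[i:i + chunk] for i in range(0, len(dataset), chunk)]

# Reuse a registered strategy for a new experiment:
tasks = STRATEGIES["fragment_by_sequence"](["seqA", "seqB", "seqC", "seqD"], 2)
print(tasks)
```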


RED'10: Proceedings of the Third International Conference on Resource Discovery | 2010

Athena: text mining based discovery of scientific workflows in disperse repositories

Flavio Costa; Daniel de Oliveira; Eduardo S. Ogasawara; Alexandre A. B. Lima; Marta Mattoso

Scientific workflows are abstractions used to model and execute in silico scientific experiments. They represent key resources for scientists and are enacted and managed by engines called Scientific Workflow Management Systems (SWfMS). Each SWfMS has a particular workflow language. This heterogeneity of languages and formats creates a complex scenario for scientists who want to search for or discover workflows in distributed repositories for reuse. The workflows in these repositories can be used to identify and build families (clusters) of workflows that aim at a particular goal. However, it is hard to compare the structure of these workflows since they are modeled in different formats. An alternative is to compare workflow metadata, such as the natural language descriptions usually found in workflow repositories, instead of comparing workflow structure. In this scenario, we expect that the effective use of classical text mining techniques can cluster a set of workflows into families, offering scientists the possibility of finding and reusing existing workflows, which may decrease the complexity of modeling a new experiment. This paper presents Athena, a cloud-based approach that supports workflow clustering from disperse repositories using their natural language descriptions, thus integrating these repositories and providing an easier way to search for and reuse workflows.
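A minimal sketch of the kind of text-mining pipeline the approach relies on, using scikit-learn rather than Athena's own code; the workflow descriptions below are made up for illustration.

```python
# Cluster workflows into families by the TF-IDF similarity of their
# natural-language descriptions (classical text mining, as described above).

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

descriptions = {
    "wf1": "aligns protein sequences with BLAST and filters orthologous hits",
    "wf2": "sequence alignment pipeline using BLAST for protein comparison",
    "wf3": "renders climate simulation output as animated maps",
    "wf4": "visualizes weather model results on geographic maps",
}

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(descriptions.values())

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(tfidf)
for name, label in zip(descriptions, kmeans.labels_):
    print(name, "-> family", label)
```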


Extending Database Technology | 2006

Apuama: combining intra-query and inter-query parallelism in a database cluster

Bernardo Miranda; Alexandre A. B. Lima; Patrick Valduriez; Marta Mattoso

Database clusters provide a cost-effective solution for high-performance query processing. By using either inter- or intra-query parallelism on replicated data, they can accelerate individual queries and increase throughput. However, there is no database cluster that combines inter- and intra-query parallelism while supporting intensive update transactions. C-JDBC is a successful database cluster that offers inter-query parallelism and controls database replica consistency, but it cannot accelerate individual heavy-weight queries, typical of OLAP. In this paper, we propose the Apuama Engine, which adds intra-query parallelism to C-JDBC. The result is an open-source package that supports both OLTP and OLAP applications. We validated Apuama on a 32-node cluster running OLAP queries of the TPC-H benchmark on top of PostgreSQL. Our tests show that the Apuama Engine yields super-linear speedup and scale-up in read-only environments. Furthermore, it yields excellent performance under data update operations.
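The sketch below illustrates the combination of the two forms of parallelism with hypothetical replica names: light statements are routed whole to one replica, while a heavy query is rewritten for all replicas and its partial aggregates merged. It is not Apuama or C-JDBC code.

```python
# Illustrative routing: inter-query parallelism for light statements and
# updates, intra-query parallelism (one virtual partition per replica plus a
# merge of partial aggregates) for heavy OLAP queries.

import itertools

REPLICAS = ["replica1", "replica2", "replica3"]
_rr = itertools.count()                                  # round-robin counter

def route(statement, is_heavy):
    if not is_heavy:
        node = REPLICAS[next(_rr) % len(REPLICAS)]
        return {node: [statement]}                       # whole statement, one node
    return {node: [f"{statement} /* virtual partition {i} */"]
            for i, node in enumerate(REPLICAS)}

def merge_sums(partials):
    """Merge partial SUM() results coming back from each replica."""
    totals = {}
    for rows in partials:
        for group, value in rows:
            totals[group] = totals.get(group, 0) + value
    return totals

print(route("UPDATE customer SET ... WHERE c_custkey = 42", is_heavy=False))
print(route("SELECT n_name, SUM(o_totalprice) FROM ...", is_heavy=True))
print(merge_sums([[("BRAZIL", 10.0)], [("BRAZIL", 7.5), ("FRANCE", 3.0)]]))
```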


International Conference on Data Management in Grid and P2P Systems | 2010

Continuous timestamping for efficient replication management in DHTs

Reza Akbarinia; Mounir Tlili; Esther Pacitti; Patrick Valduriez; Alexandre A. B. Lima

Distributed Hash Tables (DHTs) provide an efficient solution for data location and lookup in large-scale P2P systems. However, it is up to the applications to deal with the availability of the data they store in the DHT, e.g., via replication. To improve data availability, most DHT applications rely on data replication. However, efficient replication management is quite challenging, in particular because of concurrent and missed updates. In this paper, we propose an efficient solution to data replication in DHTs. We propose a new service, called Continuous Timestamp based Replication Management (CTRM), which deals with the efficient storage, retrieval and updating of replicas in DHTs. To perform updates on replicas, we propose a new protocol that stamps update actions with timestamps generated in a distributed fashion. Timestamps are not only monotonically increasing but also continuous, i.e., without gaps. Monotonicity allows applications to determine a total order on updates; continuity enables them to deal with missed updates. We evaluated the performance of our solution through simulation and experimentation. The results show its effectiveness for replication management in DHTs.
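A small sketch of why gap-free timestamps help with missed updates, under the simplifying assumption of a single per-key generator; CTRM generates timestamps in a distributed fashion, which this sketch does not attempt.

```python
# Continuous (gap-free) timestamps let a replica detect a missed update: if
# the next timestamp is not exactly last + 1, at least one update was lost.

class TimestampGenerator:
    """Hands out consecutive integers per key: monotonic and without gaps."""
    def __init__(self):
        self.counters = {}
    def next(self, key):
        self.counters[key] = self.counters.get(key, 0) + 1
        return self.counters[key]

class Replica:
    def __init__(self):
        self.applied = {}            # key -> last applied timestamp
        self.value = {}
    def apply(self, key, ts, update):
        last = self.applied.get(key, 0)
        if ts != last + 1:           # a gap means at least one missed update
            missing = list(range(last + 1, ts))
            raise RuntimeError(f"missed updates {missing} for key {key!r}")
        self.applied[key] = ts
        self.value[key] = update

gen, replica = TimestampGenerator(), Replica()
replica.apply("k", gen.next("k"), "v1")      # ts = 1, applied
gen.next("k")                                # ts = 2 generated but "lost"
try:
    replica.apply("k", gen.next("k"), "v3")  # ts = 3 exposes the gap
except RuntimeError as err:
    print(err)
```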


IEEE International Conference on High Performance Computing, Data and Analytics | 2008

Adaptive hybrid partitioning for OLAP query processing in a database cluster

Camille Furtado; Alexandre A. B. Lima; Esther Pacitti; Patrick Valduriez; Marta Mattoso

We consider the use of a database cluster for high-performance support of Online Analytical Processing (OLAP) applications. OLAP intra-query parallelism can be obtained by partitioning the database tables across cluster nodes. We propose to combine physical and virtual partitioning into a partitioning scheme called Adaptive Hybrid Partitioning (AHP). AHP requires less disk space while allowing for load balancing. We developed a prototype for OLAP parallel query processing in database clusters using AHP. Our experiments on a 32-node database cluster using the TPC-H benchmark demonstrate linear and super-linear speedup. Thus, AHP can significantly reduce the execution time of typical OLAP queries.
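An illustrative sketch of the adaptive aspect, with made-up thresholds and sizes: a node shrinks its virtual-partition size when subqueries run slow and grows it when they run fast. It is not the prototype's code.

```python
# Adaptive sizing of virtual partitions: adjust the number of tuples covered by
# the next subquery based on how long the previous subquery took.

def adapt(size, elapsed, target=1.0, min_size=1_000, max_size=500_000):
    """Halve the partition size when a subquery is too slow, grow it when fast."""
    if elapsed > 2 * target:
        size = max(min_size, size // 2)
    elif elapsed < target / 2:
        size = min(max_size, size * 2)
    return size

size = 100_000
for elapsed in [3.1, 2.6, 0.9, 0.3, 0.4]:    # measured subquery times (seconds)
    size = adapt(size, elapsed)
    print(f"next virtual partition covers {size} tuples")
```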

Collaboration


Dive into Alexandre A. B. Lima's collaborations.

Top Co-Authors

Marta Mattoso
Federal University of Rio de Janeiro

Vanessa Braganholo
Federal Fluminense University

Camille Furtado
Federal University of Rio de Janeiro

Daniel de Oliveira
Federal University of Rio de Janeiro

Eduardo S. Ogasawara
Centro Federal de Educação Tecnológica de Minas Gerais

Daniele El-Jaick
Federal University of Rio de Janeiro

Fábio Coutinho
Federal University of Rio de Janeiro

Melissa Paes
Brazilian Institute of Geography and Statistics

Nelson Kotowski
Federal University of Rio de Janeiro