Publications


Featured research published by Jorge Buenabad-Chávez.


Very Large Data Bases | 2009

Autonomic query parallelization using non-dedicated computers: an evaluation of adaptivity options

Norman W. Paton; Jorge Buenabad-Chávez; Mengsong Chen; Vijayshankar Raman; Garret Swart; Inderpal Narang; Daniel M. Yellin; Alvaro A. A. Fernandes

Writing parallel programs that can take advantage of non-dedicated processors is much more difficult than writing such programs for networks of dedicated processors. In a non-dedicated environment such programs must use autonomic techniques to respond to the unpredictable load fluctuations that prevail in the computational environment. In adaptive query processing (AQP), several techniques have been proposed for dynamically redistributing processor load assignments throughout a computation to take account of varying resource capabilities, but we know of no previous study that compares their performance. This paper presents a simulation-based evaluation of these autonomic parallelization techniques in a uniform environment and compares how well they improve the performance of the computation. Four published strategies are compared with a new algorithm that seeks to overcome some weaknesses identified in the existing approaches. In addition, we explore the use of techniques from online algorithms to provide a firm foundation for determining when to adapt in two of the existing algorithms. The evaluations identify situations in which each strategy may be used effectively and in which it should be avoided.
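To make the adaptivity problem concrete, here is a minimal sketch of one generic strategy in this family: periodically reassign the tuples remaining in a partitioned operator in proportion to each processor's recently observed throughput. The `Node` record, the `rebalance` helper and the proportional policy are all assumptions for illustration, not the paper's algorithms.

```python
# Illustrative sketch only (hypothetical names, not the paper's strategies):
# rebalance remaining work in proportion to recently observed throughput.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    remaining: int      # tuples still assigned to this node
    throughput: float   # tuples/sec observed over the last monitoring window

def rebalance(nodes):
    """Reassign remaining work proportionally to each node's recent rate."""
    total_work = sum(n.remaining for n in nodes)
    total_rate = sum(n.throughput for n in nodes)
    for n in nodes:
        # Rounding may shift a tuple or two; fine for a sketch.
        n.remaining = round(total_work * n.throughput / total_rate)

nodes = [Node("a", 9000, 50.0), Node("b", 1000, 450.0)]  # "b" is now 9x faster
rebalance(nodes)
print([(n.name, n.remaining) for n in nodes])  # most remaining work moves to "b"
```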


Workshop on Logic Programming | 2013

A Datalog Engine for GPUs

Carlos Alberto Martinez-Angeles; Inês de Castro Dutra; Vítor Santos Costa; Jorge Buenabad-Chávez

We present the design and evaluation of a Datalog engine for execution in Graphics Processing Units (GPUs). The engine evaluates recursive and non-recursive Datalog queries using a bottom-up approach based on typical relational operators. It includes a memory management scheme that automatically swaps data between memory in the host platform (a multicore) and memory in the GPU in order to reduce the number of memory transfers. To evaluate the performance of the engine, four Datalog queries were run on the engine and on a single CPU in the multicore host. One query runs up to 200 times faster on the (GPU) engine than on the CPU.
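The bottom-up, operator-based evaluation strategy the engine implements can be illustrated on the CPU. Below is a minimal semi-naive evaluation of the classic path/2 program, with set comprehensions standing in for the GPU's relational join and projection kernels; it is a sketch of the general technique, not the engine's code.

```python
# Semi-naive bottom-up evaluation of:
#   path(X,Y) :- edge(X,Y).
#   path(X,Z) :- path(X,Y), edge(Y,Z).
edge = {(1, 2), (2, 3), (3, 4)}

path = set(edge)          # facts derived so far
delta = set(edge)         # facts new in the last iteration
while delta:
    # Join delta with edge on the shared variable Y, then project to (X, Z).
    new = {(x, z) for (x, y) in delta for (y2, z) in edge if y == y2}
    delta = new - path    # keep only genuinely new facts (the semi-naive step)
    path |= delta

print(sorted(path))  # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```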


International Conference on Data Engineering | 2008

Probabilistic adaptive load balancing for parallel queries

Daniel M. Yellin; Jorge Buenabad-Chávez; Norman W. Paton

In the context of adaptive query processing (AQP), several techniques have been proposed for dynamically redistributing processor load assignments throughout a computation to take account of varying resource capabilities. The effectiveness of these techniques depends heavily on when and to what they adapt processor load assignments, particularly in the presence of varying load imbalance. This paper presents a probabilistic approach for deciding when and to what to adapt processor load assignments. Using a simulation-based evaluation, it is compared to two previously reported approaches. These two approaches are simpler in their decision making than the probabilistic approach, but the latter performs better under several scenarios of load imbalance.
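The abstract does not spell out the paper's probabilistic model, so the following is only a generic sketch of the shape such a decision rule can take: adapt when the expected time saved, discounted by the probability that the observed imbalance persists, exceeds the cost of adapting. The function name, parameters and cost model are all assumptions.

```python
# Hedged sketch of a when-to-adapt test; not the paper's actual model.
def should_adapt(t_unbalanced, t_balanced, p_persist, adaptation_cost):
    """Adapt iff the expected saving from rebalancing outweighs its cost."""
    expected_saving = p_persist * (t_unbalanced - t_balanced)
    return expected_saving > adaptation_cost

# Transient spike (imbalance unlikely to persist): stay put.
print(should_adapt(120, 80, p_persist=0.1, adaptation_cost=10))  # False
# Sustained imbalance: adapt.
print(should_adapt(120, 80, p_persist=0.9, adaptation_cost=10))  # True
```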


International Journal of Parallel Programming | 2016

Relational Learning with GPUs: Accelerating Rule Coverage

Carlos Alberto Martinez-Angeles; Haicheng Wu; Inês de Castro Dutra; Vítor Santos Costa; Jorge Buenabad-Chávez

Relational learning algorithms mine complex databases for interesting patterns. Usually, the search space of patterns grows very quickly with data size, making it impractical to solve important problems. In this work we present the design of a relational learning system that takes advantage of graphics processing units (GPUs) to perform the most time-consuming function of the learner, rule coverage. To evaluate performance, we use four applications: a widely used relational learning benchmark for predicting carcinogenesis in rodents, an application in chemo-informatics, an application in opinion mining, and an application in mining health record data. We compare results using a single CPU and multiple CPUs in a multicore host against the GPU version. Results show that the GPU version of the learner is up to eight times faster than the best CPU version.
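Rule coverage itself is simple to state, which is what makes it amenable to massive parallelism: every example can be tested against a candidate rule independently. A toy sketch follows, with hypothetical attribute tests rather than the system's actual rule representation.

```python
# Illustrative only: a candidate rule as a conjunction of attribute tests;
# coverage counts the examples that satisfy all of them. Each example is
# independent, which is what makes this step a good fit for a GPU.
examples = [
    {"atoms": 12, "ring": True,  "label": 1},
    {"atoms": 30, "ring": False, "label": 0},
    {"atoms": 25, "ring": True,  "label": 1},
]

rule = [lambda e: e["atoms"] > 10, lambda e: e["ring"]]

def coverage(rule, examples):
    """Return (positives covered, negatives covered) for a conjunctive rule."""
    covered = [e for e in examples if all(test(e) for test in rule)]
    pos = sum(e["label"] for e in covered)
    return pos, len(covered) - pos

print(coverage(rule, examples))  # (2, 0)
```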


Lecture Notes in Computer Science | 2004

Easing Message-Passing Parallel Programming Through a Data Balancing Service

Graciela Román-Alonso; Miguel A. Castro-García; Jorge Buenabad-Chávez

The message-passing model is now widely used for parallel computing, but is still difficult to use with some applications. Explicit data distribution or the explicit dynamic creation of parallel tasks can require a complex algorithm. In this paper, in order to avoid explicit data distribution, we propose a programming approach based on a data load balancing service for MPI-C. Using a parallel version of the merge sort algorithm, we show how our service avoids explicit data distribution completely, easing parallel programming. Some performance results are presented that compare our approach to a version of merge sort with explicit data distribution.
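The programming model such a service enables can be sketched as follows: the application only inserts unsorted sublists and supplies the per-item work, while workers pull from a shared pool with no explicit assignment of data to workers. This Python sketch uses threads and a shared queue as loose stand-ins for the MPI-C service; it is not the paper's implementation.

```python
# Merge sort without explicit data distribution: workers pull sublists from
# a shared pool, sort them, and the sorted runs are merged at the end.
from concurrent.futures import ThreadPoolExecutor
from queue import Queue, Empty
import heapq

pool = Queue()
for chunk in ([5, 2], [9, 1], [7, 3], [8, 6]):   # unsorted sublists
    pool.put(chunk)

def worker():
    # Fetch sublists until the pool is empty; the application never codes
    # an explicit mapping of sublists to workers.
    runs = []
    while True:
        try:
            chunk = pool.get_nowait()
        except Empty:
            return runs
        runs.append(sorted(chunk))

with ThreadPoolExecutor(max_workers=2) as ex:
    futures = [ex.submit(worker) for _ in range(2)]
    sorted_runs = [run for f in futures for run in f.result()]

print(list(heapq.merge(*sorted_runs)))  # [1, 2, 3, 5, 6, 7, 8, 9]
```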


Parallel Computing | 2003

Virtual Memory on Data Diffusion Architectures

Jorge Buenabad-Chávez; Henk L. Muller; Paul W. A. Stallard; David H. D. Warren

Data diffusion architectures (also known as cache-only memory architectures) provide a shared address space on top of distributed memory. Their distinctive feature is that data diffuses, or migrates and replicates, in main memory according to whichever processors are using it. This requires an associative organisation of main memory, which decouples each address and its data item from any physical location. A data item can thus be placed and replicated wherever it is needed. Also, the physical address space does not have to be fixed and contiguous: it can be any set of addresses within the address range of the processors, possibly varying over time, provided it is smaller than the size of main memory. This flexibility is similar to that of a virtual address space, and offers new possibilities for organising a virtual memory system.

We present an analysis of possible organisations of virtual memory on such architectures, and propose two main alternatives: traditional virtual memory (TVM) is organised around a fixed and contiguous physical address space using a traditional mapping; associative memory virtual memory (AMVM) is organised around a variable and non-contiguous physical address space using a simpler mapping.

To evaluate TVM and AMVM, we extended a multiprocessor emulation of a data diffusion architecture to include part of the Mach operating system virtual memory. This extension implements TVM; a slightly modified version implements AMVM. On the applications tested, AMVM shows a marginal performance gain over TVM. We argue that AMVM will offer greater advantages with higher degrees of parallelism or larger data sets.
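As a toy illustration of the contrast (not the Mach-based implementation), TVM can be pictured as a page table mapping virtual pages onto a fixed set of contiguous physical frames, while AMVM exploits the associative main memory so that the resident set is just a map keyed by the page address itself, with no fixed frame numbering. All names below are hypothetical.

```python
# Toy contrast only; all names are illustrative assumptions.
FRAMES = 4

# TVM: virtual pages map onto a fixed, contiguous array of physical frames.
frames = [None] * FRAMES
page_table = {}                             # virtual page -> frame index

def tvm_load(vpage, data):
    frame = len(page_table) % FRAMES        # trivial placement policy
    frames[frame] = data
    page_table[vpage] = frame

# AMVM: main memory is associative, so the resident set is simply a map
# keyed by the page address itself; no fixed frame numbering exists.
resident = {}

def amvm_load(vpage, data):
    if len(resident) >= FRAMES:
        resident.pop(next(iter(resident)))  # evict an arbitrary page
    resident[vpage] = data

tvm_load(0x10, "page-A"); amvm_load(0x10, "page-A")
print(frames[page_table[0x10]], resident[0x10])   # page-A page-A
```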


International Conference on Electrical Engineering, Computing Science and Automatic Control | 2011

Reducing communication overhead under parallel list processing in multicore clusters

Jorge Buenabad-Chávez; Miguel A. Castro-García; Jose Luis Quiroz-Fabian; Edgar F. Hernández-Ventura; Graciela Román-Alonso; Daniel M. Yellin; Manuel Aguilar-Cornejo

The Data List Management Library (DLML) processes data lists in parallel, balancing the workload transparently to programmers. Its first design targeted clusters of uniprocessor nodes and was based on multiprocess parallelism and message-passing communication. This paper presents a multithreaded design of DLML aimed at clusters of multicore nodes, to better capitalise on intra-node parallelism. On the applications tested, MultiCore DLML runs twice as fast as DLML when message-passing communication is not excessive. Good performance was achieved only after addressing issues relating to MPI communication overhead, cache locality and memory consumption.
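A minimal sketch of the multithreaded idea, assuming a hypothetical fetch_remote stand-in for the MPI request to another node: cores on a node share one local list, and message passing is only involved when the whole node runs out of work, so intra-node balancing costs no MPI traffic.

```python
# Illustrative sketch (hypothetical names, not the DLML source).
import threading
from queue import Queue, Empty

local_list = Queue()
for item in range(20):
    local_list.put(item)

def fetch_remote():
    """Stand-in for the MPI request to another node; returns no work here."""
    return []

def core_worker(results):
    while True:
        try:
            item = local_list.get_nowait()
        except Empty:
            for item in fetch_remote():     # node-level, not per-core, fetch
                local_list.put(item)
            if local_list.empty():
                return
            continue
        results.append(item * item)         # the application's work function

results = []
threads = [threading.Thread(target=core_worker, args=(results,)) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(len(results))  # 20
```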


International Conference of the IEEE Engineering in Medicine and Biology Society | 2007

Segmentation of Brain Image Volumes Using the Data List Management Library

Graciela Román-Alonso; J.R. Jimenez-Alaniz; Jorge Buenabad-Chávez; Miguel A. Castro-García; A.H. Vargas-Rodriguez

The segmentation of head images is useful for detecting neuroanatomical structures and for following and quantifying the evolution of several brain lesions. Each 2D image corresponds to a brain slice; the more images are used, the higher the resolution obtained, but more processing power is required and parallelism becomes desirable. We present a new approach to the segmentation of brain image volumes using DLML (Data List Management Library), a tool developed by our team. We organise the integer numbers identifying the images into a list, and our DLML version processes them in parallel and with dynamic load balancing, transparently to the programmer. We compare the performance of our DLML version to other typical parallel approaches developed with MPI (master-slave and static data distribution), using cluster configurations with 4-32 processors.
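The decomposition described can be sketched as follows, with a hypothetical segment_slice standing in for the real segmentation routine: the work items are just the integer indices of the 2D slices, handed out dynamically so that faster workers take more slices.

```python
# Illustrative sketch; segment_slice is a placeholder, not the real routine.
from concurrent.futures import ThreadPoolExecutor

def segment_slice(index):
    """Placeholder for segmenting brain slice `index`; returns a dummy result."""
    return (index, f"labels-for-slice-{index}")

slice_ids = list(range(128))              # one volume = 128 axial slices

with ThreadPoolExecutor(max_workers=8) as pool:
    # map hands indices to workers as they free up: dynamic load balancing
    results = list(pool.map(segment_slice, slice_ids))

print(len(results), results[0])  # 128 (0, 'labels-for-slice-0')
```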


Inductive Logic Programming | 2015

Processing Markov Logic Networks with GPUs: Accelerating Network Grounding

Carlos Alberto Martinez-Angeles; Inês de Castro Dutra; Vítor Santos Costa; Jorge Buenabad-Chávez

Markov Logic is an expressive and widely used knowledge representation formalism that combines logic and probabilities, providing a powerful framework for inference and learning tasks. Most Markov Logic implementations perform inference by transforming the logic representation into a set of weighted propositional formulae that encode a Markov network, the ground Markov network. Probabilistic inference is then performed over the grounded network.
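Grounding can be shown in miniature: a weighted first-order clause is instantiated with every combination of domain constants, producing the weighted propositional clauses of the ground network. The sketch below uses naive string substitution over a three-constant domain; the predicate names, weight and domain are illustrative assumptions, not from the paper. This enumeration over all constant tuples is the expensive, data-parallel step that a GPU can accelerate.

```python
# Miniature grounding of a weighted clause over a finite domain.
from itertools import product

domain = ["alice", "bob", "carol"]
weight, clause = 1.5, ("!Smokes(x)", "!Friends(x,y)", "Smokes(y)")

ground_clauses = []
for x, y in product(domain, repeat=2):
    # Naive textual substitution of constants for variables (sketch only).
    ground = tuple(lit.replace("x", x).replace("y", y) for lit in clause)
    ground_clauses.append((weight, ground))

print(len(ground_clauses))   # 9 ground clauses for a 3-constant domain
print(ground_clauses[1])     # (1.5, ('!Smokes(alice)', '!Friends(alice,bob)', 'Smokes(bob)'))
```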


International Conference on Electrical Engineering, Computing Science and Automatic Control | 2013

Greedily using GPU capacity for data list processing in multicore-GPU platforms

Carlos Alberto Martinez-Angeles; Jorge Buenabad-Chávez; Miguel A. Castro-García; Jose Luis Quiroz-Fabian

We have designed data list processing for multicore-GPU platforms and significantly improved the performance of both numerical and symbolic applications. For the latter, a novel aspect of our design was the management and processing of new data dynamically generated within GPUs. This paper presents various optimisations to our first design [1], aimed at making greater use of the GPU by reducing communication between the host (a multicore) and the GPU, in order to improve performance further. We present experimental results for three applications with different granularities and access patterns. Performance was again improved, significantly in some cases; however, using multicore-GPU platforms efficiently may involve complex changes to software.
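One optimisation of this general kind, sketched here under an assumed fixed-overhead cost model rather than the paper's measured numbers, is batching: accumulating list items on the host and transferring one large batch amortises the per-transfer overhead across many items.

```python
# Generic cost-model sketch (assumed numbers, not the paper's results):
# each host-to-GPU transfer pays a fixed overhead, so fewer, larger
# transfers reduce total communication cost.
def transfers_needed(items, batch_size, per_transfer_cost=1.0):
    batches = -(-items // batch_size)      # ceiling division
    return batches, batches * per_transfer_cost

for batch_size in (1, 64, 1024):
    batches, cost = transfers_needed(items=100_000, batch_size=batch_size)
    print(f"batch={batch_size:5d}: {batches:6d} transfers, overhead={cost:.0f}")
# batch=    1: 100000 transfers, overhead=100000
# batch=   64:   1563 transfers, overhead=1563
# batch= 1024:     98 transfers, overhead=98
```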

Collaboration


Dive into Jorge Buenabad-Chávez's collaborations.

Top Co-Authors

Miguel A. Castro-García
Universidad Autónoma Metropolitana

Graciela Román-Alonso
Universidad Autónoma Metropolitana

Manuel Aguilar-Cornejo
Universidad Autónoma Metropolitana