Henrique Mongelli
Federal University of Mato Grosso do Sul
Publications
Featured research published by Henrique Mongelli.
International Conference on Conceptual Structures | 2014
Henrique Fingler; Edson Norberto Cáceres; Henrique Mongelli; Siang W. Song
The Multidimensional Knapsack Problem (MKP) is a generalization of the basic Knapsack Problem with two or more constraints. It is an important optimization problem with many real-life applications. To solve this NP-hard problem, we use a metaheuristic algorithm based on ant colony optimization (ACO). Since several steps of the algorithm can be carried out concurrently, we propose a parallel implementation under the GPGPU paradigm (General-Purpose computing on Graphics Processing Units) using CUDA. To use the algorithm presented in this paper, one must balance the number of ants, the number of rounds, and whether local search is used, according to the desired solution quality. In other words, there is a trade-off between running time and solution quality. We obtained very promising experimental results and compared our implementation with those in the literature. The results show that, with the parallel approach, ant colony optimization is a viable way to solve the MKP efficiently, even for large instances.
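The ant-by-ant solution construction can be sketched in a few lines. What follows is a minimal sequential sketch of a generic ACO scheme for the MKP, not the authors' CUDA implementation. Several choices here are illustrative assumptions: the heuristic (profit divided by average relative resource use), the pheromone update rule, the parameter names (`n_ants`, `n_rounds`, `alpha`, `beta`, `rho`) and the function name `aco_mkp`. Local search is omitted, and profits and capacities are assumed to be positive.

```python
import random


def aco_mkp(profits, weights, capacities, n_ants=20, n_rounds=50,
            alpha=1.0, beta=2.0, rho=0.1, seed=0):
    """Hypothetical sequential sketch of ant colony optimization for the MKP.

    profits[j]: profit of item j
    weights[i][j]: amount of resource i consumed by item j
    capacities[i]: capacity of constraint i
    Returns (sorted chosen items, total profit) of the best solution found.
    """
    rng = random.Random(seed)
    n, m = len(profits), len(capacities)
    tau = [1.0] * n  # pheromone level of each item
    # Assumed heuristic: profit per unit of average relative resource use.
    eta = []
    for j in range(n):
        relative_use = sum(weights[i][j] / capacities[i] for i in range(m)) / m
        eta.append(profits[j] / (relative_use + 1e-9))
    total_profit = sum(profits)
    best_items, best_profit = [], 0
    for _ in range(n_rounds):
        # Ants are independent within a round, so this loop is the natural
        # candidate for parallelization (e.g. one GPU thread per ant).
        for _ in range(n_ants):
            load = [0] * m
            chosen = []
            remaining = list(range(n))
            while True:
                # Items that still fit in every one of the m constraints.
                feasible = [j for j in remaining
                            if all(load[i] + weights[i][j] <= capacities[i]
                                   for i in range(m))]
                if not feasible:
                    break
                # Probabilistic choice weighted by pheromone and heuristic.
                scores = [tau[j] ** alpha * eta[j] ** beta + 1e-12
                          for j in feasible]
                j = rng.choices(feasible, weights=scores)[0]
                chosen.append(j)
                remaining.remove(j)
                for i in range(m):
                    load[i] += weights[i][j]
            profit = sum(profits[j] for j in chosen)
            if profit > best_profit:
                best_items, best_profit = sorted(chosen), profit
        # Pheromone update: evaporate everywhere, then reinforce the best solution.
        tau = [(1 - rho) * t for t in tau]
        for j in best_items:
            tau[j] += rho * best_profit / total_profit
    return best_items, best_profit
```

Raising `n_ants` or `n_rounds` spends more time in exchange for a better chance of a high-quality solution, which is the compromise described in the abstract.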
International Parallel Processing Symposium | 1999
Henrique Mongelli; Siang W. Song
Given an array of n real numbers $A = (a_1, a_2, \ldots, a_n)$, define $\mathrm{MIN}(i, j) = \min\{a_i, \ldots, a_j\}$. The range minima problem consists of preprocessing array A so that queries $\mathrm{MIN}(i, j)$, for any $1 \le i \le j \le n$, can be answered in constant time. Range minima is a basic problem that appears in many other important problems, such as lowest common ancestor, Euler tour, and pattern matching with scaling. In this work we present a parallel algorithm under the CGM (Coarse Grained Multicomputer) model that solves the range minima problem in O(n/p) time and a constant number of communication rounds.
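To give intuition for the constant-time query requirement, here is a standard sequential technique, the sparse table. It answers MIN(i, j) in O(1) after O(n log n) preprocessing. It is not the parallel CGM algorithm of the paper, which achieves O(n/p) time on p processors. The function names are hypothetical, and indices are 0-based rather than the 1-based indices used above.

```python
def build_sparse_table(a):
    """Hypothetical helper: table[k][i] holds min(a[i : i + 2**k])."""
    n = len(a)
    table = [list(a)]
    k = 1
    while (1 << k) <= n:
        prev = table[k - 1]
        half = 1 << (k - 1)
        # A window of length 2**k is the union of two windows of length 2**(k-1).
        table.append([min(prev[i], prev[i + half])
                      for i in range(n - (1 << k) + 1)])
        k += 1
    return table


def range_min(table, i, j):
    """MIN(i, j) for 0-based inclusive indices i <= j, in constant time."""
    k = (j - i + 1).bit_length() - 1  # largest k with 2**k <= j - i + 1
    # Two overlapping windows of length 2**k cover [i, j] exactly.
    return min(table[k][i], table[k][j - (1 << k) + 1])
```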
International Conference on High Performance Computing and Simulation | 2010
Edson Norberto Cáceres; Henrique Mongelli; Christiane Nishibe; Siang W. Song
Dehne et al. present a BSP/CGM algorithm for computing a spanning tree and the connected components of a graph that requires O(log p) communication rounds, where p is the number of processors. Their algorithm requires solving the Euler tour problem, which in turn relies on solving the list ranking problem. In this paper we present experimental results for a parallel algorithm that depends on neither the Euler tour nor the list ranking problem. The proposed algorithm has the practical advantage of avoiding list ranking. Instead, it is based on integer sorting, which can be implemented efficiently on the BSP/CGM model. We implemented the proposed algorithm on a Beowulf cluster and on a grid running the InteGrade middleware. We obtained encouraging, albeit modest, speedups on a small Beowulf cluster and expect good speedups on the grid for larger graphs and clusters.
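The abstract states that the algorithm is built on integer sorting rather than list ranking. As an illustration only, the following sketch simulates, sequentially, a generic BSP/CGM-style counting sort on p processors:

1. Each processor builds a local histogram.
2. The histograms are exchanged.
3. Every record is routed to its global position.

It is not the authors' spanning-tree algorithm. The function name `cgm_integer_sort` is hypothetical, and keys are assumed to be integers in `[0, key_range)`.

```python
def cgm_integer_sort(blocks, key_range, key=lambda x: x):
    """Hypothetical sketch: blocks[q] holds the records of processor q."""
    p = len(blocks)
    # Step 1 (local): each processor counts its records per key value.
    hist = [[0] * key_range for _ in range(p)]
    for q, block in enumerate(blocks):
        for rec in block:
            hist[q][key(rec)] += 1
    # Step 2 (communication): the p histograms are exchanged, and every
    # processor computes the same exclusive prefix sum in (key, processor) order.
    offset = [[0] * key_range for _ in range(p)]
    running = 0
    for k in range(key_range):
        for q in range(p):
            offset[q][k] = running
            running += hist[q][k]
    # Step 3 (communication): each record is sent to its final global position.
    output = [None] * running
    for q, block in enumerate(blocks):
        next_pos = list(offset[q])
        for rec in block:
            k = key(rec)
            output[next_pos[k]] = rec
            next_pos[k] += 1
    # Processor q ends up holding as many records as it started with.
    result, start = [], 0
    for block in blocks:
        result.append(output[start:start + len(block)])
        start += len(block)
    return result
```

Because the prefix sum runs in (key, processor) order, records with equal keys keep their processor and local order. The sort is therefore stable, a property often needed when sorting records such as edges by one of their fields.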
International Conference on Parallel Processing | 2012
E.N. Cáceres; H. Fingler; Henrique Mongelli; Siang W. Song
The NP-hard Quadratic Assignment Problem (QAP) was proposed in 1957. To this day, it remains one of the hardest problems to solve in a reasonable amount of time, even for small instances. Even with parallel computation and small instances, some naive and deterministic algorithms take too long to obtain the solution. In some cases an approximate solution is satisfactory, and heuristic and approximation algorithms have been proposed. In this paper we use heuristic techniques based on ant colony systems to find approximate solutions to the QAP using GPGPUs (General-Purpose computing on Graphics Processing Units). We review two methods for this problem: the Hybrid Ant System (HAS-QAP algorithm) and the cunning Ant System (cAS-QAP algorithm). We parallelize both algorithms to run on a GPU using CUDA. The parallelized HAS version (called CUDA-HAS) outperforms the parallel cAS algorithm, and we present performance results of CUDA-HAS with respect to its sequential counterpart. We used four well-known input instances (els19, nug30, sko72, and wil100), with sizes ranging from 19 to 100, whose best solutions are known. Our results show the power of CUDA-HAS when the required error margin is small (0.1%): for instance nug30, it achieved a speedup of 103 over the sequential execution time on the CPU alone. CUDA-HAS is therefore recommended for solving large QAP instances (sizes up to 100) when a solution close to the optimum is required.
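Both HAS-QAP and cAS-QAP evaluate the standard QAP objective: the sum, over all pairs of facilities, of the flow between them times the distance between their assigned locations. The sketch below shows that objective together with a plain first-improvement 2-exchange (swap) local search, the kind of local search that ant-system hybrids such as HAS-QAP combine with pheromone-guided moves. It is a sequential illustration with hypothetical function names, not the CUDA-HAS implementation. For clarity, it recomputes the full O(n^2) cost after each swap instead of the O(n) incremental update a practical implementation would use.

```python
def qap_cost(flow, dist, perm):
    """Cost when facility i is placed at location perm[i]:
    sum over i, j of flow[i][j] * dist[perm[i]][perm[j]]."""
    n = len(perm)
    return sum(flow[i][j] * dist[perm[i]][perm[j]]
               for i in range(n) for j in range(n))


def two_exchange_local_search(flow, dist, perm):
    """Hypothetical helper: swap pairs of assignments while a swap lowers the cost."""
    perm = list(perm)
    best = qap_cost(flow, dist, perm)
    improved = True
    while improved:  # terminates: the cost strictly decreases on every accepted swap
        improved = False
        for r in range(len(perm)):
            for s in range(r + 1, len(perm)):
                perm[r], perm[s] = perm[s], perm[r]
                cost = qap_cost(flow, dist, perm)
                if cost < best:
                    best, improved = cost, True  # keep the improving swap
                else:
                    perm[r], perm[s] = perm[s], perm[r]  # undo the swap
    return perm, best
```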
European Conference on Parallel Processing | 2004
Edson Norberto Cáceres; Frank K. H. A. Dehne; Henrique Mongelli; Siang W. Song; Jayme Luiz Szwarcfiter
Symposium on Computer Architecture and High Performance Computing | 2017
Jucele F. A. Vasconcellos; Edson Norberto Cáceres; Henrique Mongelli; Siang W. Song
Computing a minimum spanning tree (MST) of a graph is a fundamental problem in graph theory and arises as a subproblem in many applications. In this paper, we propose a parallel MST algorithm and implement it on a GPU (Graphics Processing Unit). Previous parallel MST algorithms rely heavily on parallel list ranking in one of their steps. Although list ranking is available in several parallel libraries, it is very time-consuming. Using a different graph decomposition, called strut, we devised a new parallel MST algorithm that does not use the list ranking procedure. Under the BSP/CGM model, we proved that our algorithm is correct and that it finds the MST after O(log p) iterations (communication and computation rounds). To show that our algorithm performs well on real parallel machines, we implemented it on a GPU. The way we designed the parallel algorithm allowed us to exploit the computing power of the GPU. Our experimental results confirmed the efficiency of the algorithm. For randomly constructed graphs with 10,000 to 30,000 vertices and densities between 0.02 and 0.2, the algorithm constructs an MST in at most six iterations. When the graph is not very sparse, our implementation achieved a speedup of more than 50, and as high as 296 for some instances, over a sequential minimum spanning tree algorithm previously proposed in the literature.
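The strut decomposition itself is not described here, so the sketch below substitutes a different, classic technique: Borůvka's algorithm. In each round, every component selects its lightest outgoing edge and the selected edges are contracted, which at least halves the number of components. The sketch only illustrates why round-based MST algorithms need few iterations; it is not the authors' strut-based algorithm. The function name is hypothetical, and ties between equal weights are broken by comparing whole `(weight, u, v)` tuples.

```python
def boruvka_mst(n, edges):
    """Hypothetical sketch. edges: list of (weight, u, v) on vertices 0..n-1.

    Returns (minimum spanning forest edges, number of rounds used).
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    mst, rounds = [], 0
    while True:
        # Each component selects its lightest outgoing edge (ties broken by tuple order).
        cheapest = {}
        for w, u, v in edges:
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            for r in (ru, rv):
                if r not in cheapest or (w, u, v) < cheapest[r]:
                    cheapest[r] = (w, u, v)
        if not cheapest:
            break  # no edge leaves any component: done
        rounds += 1
        # Contract: add each selected edge that still joins two components.
        for w, u, v in cheapest.values():
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                mst.append((w, u, v))
    return mst, rounds
```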
Parallel Processing Letters | 2001
Henrique Mongelli; Siang W. Song
Given a text and a pattern, the pattern matching problem consists of determining all positions in the text where the pattern occurs. When the text and the pattern are matrices, the matching is termed bidimensional. Some variations of this problem allow matching a somewhat modified pattern. The modification we allow is that the pattern can be scaled. We propose a new parallel algorithm for this problem under the CGM (Coarse Grained Multicomputer) model. The algorithm requires local computation time and memory linear in the input size and uses only one communication round, during which at most a linear amount of data is exchanged. To the best of our knowledge, no parallel algorithms for the bidimensional pattern matching problem with scaling have been proposed in the literature. The proposed algorithm was implemented in C using the PVM interface and executed on a Parsytec PowerXplorer parallel machine. The experimental results were very promising and showed significant speedups.
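The brute-force sketch below makes the problem statement concrete. It reports every position and integer scale factor k at which the pattern occurs in the text after each symbol is expanded into a k-by-k block. It illustrates the problem definition only and is far from the linear-time CGM algorithm described above. Restricting scaling to integer factors and representing matrices as lists of strings are assumptions of this sketch, and the function names are hypothetical.

```python
def occurs_scaled(text, pattern, k, r, c):
    """True if pattern, scaled by integer factor k, occurs with top-left corner (r, c)."""
    m_rows, m_cols = len(pattern), len(pattern[0])
    if r + k * m_rows > len(text) or c + k * m_cols > len(text[0]):
        return False
    # Text cell (r + y, c + x) must equal the pattern symbol whose block contains it.
    return all(text[r + y][c + x] == pattern[y // k][x // k]
               for y in range(k * m_rows) for x in range(k * m_cols))


def scaled_matches(text, pattern):
    """All (row, col, k) at which some integer scaling k >= 1 of pattern occurs."""
    n_rows, n_cols = len(text), len(text[0])
    m_rows, m_cols = len(pattern), len(pattern[0])
    max_k = min(n_rows // m_rows, n_cols // m_cols)
    return [(r, c, k)
            for k in range(1, max_k + 1)
            for r in range(n_rows - k * m_rows + 1)
            for c in range(n_cols - k * m_cols + 1)
            if occurs_scaled(text, pattern, k, r, c)]
```

For the text `["aabb", "aabb", "ccdd", "ccdd"]` and the pattern `["ab", "cd"]`, the pattern occurs unscaled at row 1, column 1, and scaled by 2 at the top-left corner.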
International Journal of High Performance Computing Applications | 2018
Jucele F. A. Vasconcellos; Edson Norberto Cáceres; Henrique Mongelli; Siang W. Song; Frank K. H. A. Dehne; Jayme Luiz Szwarcfiter
Computing a spanning tree (ST) and a minimum ST (MST) of a graph are fundamental problems in graph theory and arise as subproblems in many applications. In this article, we propose parallel algorithms for these problems. One of the steps of previous parallel MST algorithms relies heavily on parallel list ranking, which, though efficient in theory, is very time-consuming in practice. Using a different approach based on a graph decomposition, we devised new parallel algorithms that do not use the list ranking procedure. We proved that our algorithms are correct. We also proved that, for a graph $G = (V, E)$ with $|V| = n$ and $|E| = m$, they can be executed on the Bulk Synchronous Parallel/Coarse Grained Multicomputer (BSP/CGM) model using $O(\log p)$ communication rounds with $O((n + m)/p)$ computation time per round. To show that our algorithms perform well on real parallel machines, we implemented them on a graphics processing unit (GPU). The speedups obtained are competitive and show that the BSP/CGM model is suitable for designing general-purpose parallel algorithms.
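The sketch below illustrates an O(log p)-round BSP/CGM computation that needs no list ranking: a spanning forest. Each processor first reduces its edges to a local spanning forest with union-find. The forests are then merged pairwise along a binary tree, so processor 0 holds a spanning forest of the whole graph after ceil(log2 p) rounds. This is a well-known textbook pattern, not the decomposition-based algorithm of the article. In particular, each merge ships up to n - 1 edges, so its per-round cost does not meet the O((n + m)/p) bound stated above. Processors are simulated sequentially, and the function names are hypothetical.

```python
def local_spanning_forest(n, edges):
    """Union-find: keep exactly the edges that join two different trees."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    forest = []
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append((u, v))
    return forest


def cgm_spanning_forest(n, edge_blocks):
    """Hypothetical sketch: edge_blocks[q] holds the edges stored on processor q.

    Returns (spanning forest held by processor 0, number of merge rounds).
    """
    # Local phase: every processor reduces its edges to at most n - 1 edges.
    forests = [local_spanning_forest(n, block) for block in edge_blocks]
    p, step, rounds = len(forests), 1, 0
    while step < p:
        # Processor q + step sends its forest to processor q, which merges them.
        for q in range(0, p, 2 * step):
            if q + step < p:
                forests[q] = local_spanning_forest(n, forests[q] + forests[q + step])
        step *= 2
        rounds += 1
    return forests[0], rounds
```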
Lecture Notes in Computer Science | 2004
Erik J. Hanashiro; Henrique Mongelli; Siang W. Song
In many applications, NP-complete problems need to be solved exactly. One promising way to handle some intractable problems is parameterized complexity, which divides the problem input into a main part and a parameter. The main part of the input contributes polynomially to the total complexity of the problem, while the parameter is responsible for the combinatorial explosion. We consider the parallel FPT (fixed-parameter tractable) algorithm of Cheetham et al. for the k-Vertex Cover problem under the CGM model. Our contribution is a refined and improved implementation. In our parallel experiments we obtained better running times and, for some input data, smaller covers. The key ideas behind these results were the choice of good data structures and the use of backtracking. We used five graphs that represent conflict graphs of amino acids, the same graphs used by Cheetham et al. in their experiments. For two of these graphs our running times were approximately 115 times better, for one of them 16 times better, and for the remaining graphs slightly better. We emphasize that our computational environment was inferior to the one used in the experiments of Cheetham et al. Furthermore, for three graphs we obtained smaller covers.
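The bounded-search-tree idea behind FPT algorithms for k-Vertex Cover, together with the backtracking mentioned above, can be sketched sequentially as follows. This is a textbook formulation: Buss's kernelization rule plus branching on the two endpoints of an edge. It is not the parallel CGM algorithm of Cheetham et al. or the authors' implementation. Function names are hypothetical, and the graph is assumed to have no self-loops.

```python
def vertex_cover(edges, k):
    """Hypothetical sketch: a vertex cover of size <= k as a set, or None."""
    return _search({frozenset(e) for e in edges}, k, set())


def _search(edges, k, cover):
    if not edges:
        return cover
    if k == 0:
        return None
    # Buss's rule: a vertex of degree > k belongs to every cover of size <= k.
    degree = {}
    for e in edges:
        for v in e:
            degree[v] = degree.get(v, 0) + 1
    high = next((v for v, d in degree.items() if d > k), None)
    if high is not None:
        return _search({e for e in edges if high not in e}, k - 1, cover | {high})
    # Kernel bound: with maximum degree <= k, k vertices cover at most k*k edges.
    if len(edges) > k * k:
        return None
    # Bounded search tree: one endpoint of any edge must be in the cover.
    u, v = tuple(next(iter(edges)))
    for w in (u, v):
        result = _search({e for e in edges if w not in e}, k - 1, cover | {w})
        if result is not None:
            return result
    return None  # both branches failed: backtrack
```

Each branch decreases k by one, so the search tree has at most 2^k leaves. The exponential cost depends only on the parameter, while the work per node is polynomial in the input size.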
International Journal of Foundations of Computer Science | 1999
Henrique Mongelli; Siang W. Song
Given an array of n real numbers $A = (a_0, a_1, \ldots, a_{n-1})$, define $\mathrm{MIN}(i, j) = \min\{a_i, \ldots, a_j\}$. The range minima problem consists of preprocessing array A so that queries $\mathrm{MIN}(i, j)$, for any $0 \le i \le j \le n-1$, can be answered in constant time. Range minima is a basic problem that appears in many other important graph problems, such as lowest common ancestor and Euler tour. In this work we present a parallel algorithm under the CGM (Coarse Grained Multicomputer) model that solves the range minima problem in O(n/p) time and a constant number of communication rounds. The communication overhead involves the transmission of only p numbers (independent of n). We show promising experimental results, with speedup curves approaching the optimal for large n.
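The remark that only p numbers are communicated suggests the usual block decomposition. The sketch below assumes the array is split into p contiguous blocks, one per processor. Each processor computes prefix and suffix minima of its block, and only the p block minima need to be shared. A query spanning several blocks is then the minimum of three values: a suffix minimum, the block minima strictly between, and a prefix minimum. This is an illustration, not the paper's algorithm. For brevity, in-block queries and the minimum over block minima are computed by scanning, so queries here are not constant time. Processors are simulated sequentially, and the function names are hypothetical.

```python
def block_rmq_preprocess(a, p):
    """Split a into p blocks ("processors") with per-block prefix/suffix minima."""
    n = len(a)
    size = -(-n // p)  # ceil(n / p)
    blocks = [a[q * size:(q + 1) * size] for q in range(p)]
    prefix, suffix, block_min = [], [], []
    for b in blocks:
        pre, suf = list(b), list(b)
        for t in range(1, len(b)):
            pre[t] = min(pre[t - 1], b[t])
        for t in range(len(b) - 2, -1, -1):
            suf[t] = min(suf[t + 1], b[t])
        prefix.append(pre)
        suffix.append(suf)
        # Only these p values would be communicated between processors.
        block_min.append(pre[-1] if b else float("inf"))
    return size, blocks, prefix, suffix, block_min


def block_rmq_query(data, i, j):
    """MIN(i, j) for 0-based inclusive indices i <= j."""
    size, blocks, prefix, suffix, block_min = data
    bi, bj = i // size, j // size
    if bi == bj:  # query inside one block: scanned here for brevity
        return min(blocks[bi][i % size:j % size + 1])
    inner = min(block_min[bi + 1:bj], default=float("inf"))
    return min(suffix[bi][i % size], inner, prefix[bj][j % size])
```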