Luiz Gustavo Fernandes

Pontifícia Universidade Católica do Rio Grande do Sul

Publication

Featured research published by Luiz Gustavo Fernandes.

Symposium on Computer Architecture and High Performance Computing | 2009

Memory Affinity for Hierarchical Shared Memory Multiprocessors

Christiane Pousa Ribeiro; Jean-François Méhaut; Alexandre Carissimi; Márcio Castro; Luiz Gustavo Fernandes

Currently, parallel platforms based on large-scale hierarchical shared memory multiprocessors with Non-Uniform Memory Access (NUMA) are becoming a trend in scientific High Performance Computing (HPC). Due to their memory access constraints, these platforms require a very careful data distribution. Many solutions have been proposed to resolve this issue. However, most of these solutions do not include optimizations for numerical scientific data (array data structures) or address portability issues. Besides, these solutions provide a restricted set of memory policies to deal with data placement. In this paper, we describe a user-level interface named Memory Affinity interface (MAi), which allows memory affinity control on Linux-based cache-coherent NUMA (ccNUMA) platforms. Its main goals are fine-grained data control, flexibility, and portability. The performance of MAi is evaluated on three ccNUMA platforms using numerical scientific HPC applications, the NAS Parallel Benchmarks and a Geophysics application. The results show important gains (up to 31%) when compared to the Linux default solution.

Electronic Notes in Theoretical Computer Science | 2005

Performance Models For Master/Slave Parallel Programs

Lucas Baldo; Leonardo Brenner; Luiz Gustavo Fernandes; Paulo Fernandes; Afonso Sales

This paper proposes the use of Stochastic Automata Networks (SAN) to develop models that can be efficiently applied to a large class of parallel implementations: master/slave (m/s) programs. We focus our technique on the description of the communication between master and slave nodes, considering two standard behaviors: synchronous and asynchronous interactions. Although the SAN models may help the pre-analysis of implementations, the main contribution of this paper is to point out advantages and problems of the proposed modeling technique.

International Parallel and Distributed Processing Symposium | 2009

NUMA-ICTM: A parallel version of ICTM exploiting memory placement strategies for NUMA machines

Márcio Castro; Luiz Gustavo Fernandes; Christiane Pousa; Jean-François Méhaut; Marilton Sanchotene de Aguiar

In geophysics, the appropriate subdivision of a region into segments is extremely important. ICTM (Interval Categorizer Tesselation Model) is an application that categorizes geographic regions using information extracted from satellite images. The categorization of large regions is a computationally intensive problem, which justifies the proposal and development of parallel solutions in order to improve its applicability. Recent advances in multiprocessor architectures have led to the emergence of NUMA (Non-Uniform Memory Access) machines. In this work, we present NUMA-ICTM: a parallel solution of ICTM for NUMA machines. First, we parallelize ICTM using OpenMP. Then, we improve the OpenMP solution using the MAI (Memory Affinity Interface) library, which allows control of memory allocation on NUMA machines. The results show that the optimization of memory allocation leads to significant performance gains over the pure OpenMP parallel solution.

ACM Symposium on Applied Computing | 2006

High performance XSL-FO rendering for variable data printing

Fabio Giannetti; Luiz Gustavo Fernandes; Rogerio Timmers; Thiago Nunes; Mateus Raeder; Márcio Castro

High-volume print jobs are becoming more common due to the growing demand for personalized documents. In this context, Variable Data Printing (VDP) has become a useful tool for marketers who need to customize messages for each customer in promotional materials or marketing campaigns. VDP allows the creation of documents based on a template with variable and static portions. The rendering engine must be capable of transforming the variable portion into a resulting composed format, or PDL (Page Description Language), such as PDF, PS or SVG. The amount of variable content in a document is dependent on the publication layout. In addition, the features and the amount of the content to be rendered may vary according to the data loaded from the database. Therefore, the rendering process is invoked repeatedly and can quickly become a bottleneck, especially in a production environment, compromising the entire document generation. In this scenario, high performance techniques appear to be an interesting alternative to increase the throughput of the rendering phase. This paper introduces a portable and scalable parallel solution for Apache's rendering tool FOP (Formatting Objects Processor), which is used to render variable content expressed in XSL-FO (eXtensible Stylesheet Language-Formatting Objects). XSL-FO is extracted from a print job expressed in PPML (Personalized Print Markup Language), which is, in turn, obtained by merging variable data into a template. The VDP template is expressed using PPML/T (Personalized Print Markup Language Template).

European Conference on Parallel Processing | 2004

Parallel PEPS Tool Performance Analysis Using Stochastic Automata Networks

Lucas Baldo; Luiz Gustavo Fernandes; Paulo Roisenberg; Pedro Velho; Thais Webber

This paper presents a theoretical performance analysis of a parallel implementation of a tool called Performance Evaluation for Parallel Systems (PEPS). This software tool is used to analyze Stochastic Automata Networks (SAN) models. In its sequential version, the execution time becomes impractical when analyzing large SAN models. A parallel version of PEPS using distributed memory is proposed and modelled with the SAN formalism. Afterwards, the sequential PEPS itself is applied to predict the performance of this model.

Parallel, Distributed and Network-Based Processing | 2011

Analysis and Tracing of Applications Based on Software Transactional Memory on Multicore Architectures

Márcio Castro; Kiril Georgiev; Vania Marangozova-Martin; Jean-François Méhaut; Luiz Gustavo Fernandes; Miguel Santana

Transactional Memory (TM) is a new programming paradigm that offers an alternative to traditional lock-based concurrency mechanisms. It offers a higher-level programming interface and promises to greatly simplify the development of correct concurrent applications on multicore architectures. However, simplicity often comes with a significant performance deterioration, and given the variety of TM implementations it is still a challenge to know what kind of applications can really take advantage of TM. In order to gain some insight into these issues, helping developers to understand and improve the performance of TM applications, we propose a generic approach for collecting and tracing relevant information about transactions. Our solution can be applied to different Software Transactional Memory (STM) libraries and applications, as it modifies neither the target application nor the STM library source code. We show that the collected information can be helpful for understanding the performance of TM applications.

International Journal of Parallel Programming | 2008

Dense linear system: a parallel self-verified solver

Mariana Luderitz Kolberg; Luiz Gustavo Fernandes; Dalcidio Moraes Claudio

This article presents a parallel self-verified solver for dense linear systems of equations. This kind of solver is commonly used in many different kinds of real applications which deal with large matrices. Nevertheless, two key problems limit the use of linear system solvers in a more extensive range of real applications: solution correctness and high computational cost. In order to solve the first one, verified computing is an interesting choice. An algorithm that uses this concept is able to find a highly accurate and automatically verified result, providing more reliability. However, the performance of these algorithms quickly becomes a drawback. Aiming at better performance, parallel computing techniques were employed. Two main parts of this method were parallelized: the computation of the approximate inverse of matrix A and the preconditioning step. The results obtained show that these optimizations significantly increase the overall performance.

Parallel Computing | 2006

Optimizing a parallel self-verified method for solving linear systems

Mariana Luderitz Kolberg; Lucas Baldo; Pedro Velho; Luiz Gustavo Fernandes; Dalcidio Moraes Claudio

Solvers for linear equation systems are commonly used in many different kinds of real applications which deal with large matrices. Nevertheless, two key problems limit the use of linear system solvers in a more extensive range of real applications: computing power and solution correctness. In previous work, we proposed a method that employs high performance computing techniques together with verified computing techniques in order to eliminate the problems mentioned above. This paper presents an optimization of that previously proposed parallel self-verified method for solving dense linear systems of equations. The improvements relate to the way communication primitives are employed and to the identification of the points in the algorithm at which mathematical accuracy is needed to achieve reliable results.

PVM/MPI'07 Proceedings of the 14th European conference on Recent Advances in Parallel Virtual Machine and Message Passing Interface | 2007

An improved parallel XSL-FO rendering for personalized documents

Luiz Gustavo Fernandes; Thiago Nunes; Mateus Raeder; Fabio Giannetti; Alexis Cabeda; Guilherme Bedin

The use of personalized documents has become a helpful practice in the digital printing area. Automatic procedures to create and transform these documents have become necessary to deal with the growing market demand. Languages such as XSL-FO (eXtensible Stylesheet Language-Formatting Objects) and PPML (Personalized Print Markup Language) have been developed to facilitate the way variable content is inserted within a document. However, these languages have brought with them an increasing computational cost for rendering those documents. Considering that printing shops need to render jobs with thousands of personalized documents within a short period of time, high performance techniques appear as an interesting alternative to improve the throughput of this rendering process. In this work, we present improvements and new results of an MPI solution previously developed for the FOP (Formatting Objects Processor) tool. FOP is the Apache project rendering tool for personalized documents, and its parallel version was optimized in order to allow the parallel computation of larger input jobs composed of thousands of personalized documents.

Parallel, Distributed and Network-Based Processing | 2016

Private IaaS Clouds: A Comparative Analysis of OpenNebula, CloudStack and OpenStack

Adriano Vogel; Dalvan Griebler; Carlos A. F. Maron; Claudio Schepke; Luiz Gustavo Fernandes

Despite the evolution of cloud computing in recent years, the performance and comprehensive understanding of the available private cloud tools are still under research. This paper contributes to an analysis of the Infrastructure as a Service (IaaS) domain by mapping new insights and discussing the challenges for improving cloud services. The goal is to make a comparative analysis of the OpenNebula, OpenStack and CloudStack tools, evaluating their differences in support for flexibility and resiliency. Also, we aim at evaluating these three cloud tools when they are deployed using a common hypervisor (KVM) in order to discover new empirical insights. Our research results demonstrate that OpenStack is the most resilient and CloudStack is the most flexible for deploying an IaaS private cloud. Moreover, the performance experiments indicate some contrasts among the private IaaS cloud instances when running intensive workloads and scientific applications.
