
Publication


Featured research published by Gerardo Bandera.


international parallel processing symposium | 1999

Sparse matrix block-cyclic redistribution

Gerardo Bandera; Emilio L. Zapata

Run-time support for CYCLIC(k) redistribution under the SPMD computation model is currently of great relevance to the scientific community. This work focuses on characterizing sparse matrix redistribution and the problems it raises due to the use of compressed representations. Two main improvements, concerning buffering and coordinate calculation, modify the original algorithm. Our solution comprises a collecting, a communication and a mixing stage, whose influence on the execution time varies with the sparsity of the matrix and the number of processors. Experimental results were obtained on a Cray T3E for real matrices and different redistribution parameters.
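The paper's implementation is not reproduced here, but the index arithmetic behind a CYCLIC(k) mapping is standard. The C sketch below is an illustrative reconstruction under our own assumptions (a CRS-stored matrix whose rows are distributed CYCLIC(k)), not the authors' code: it computes the destination processor of each nonzero, which is the information a collecting stage needs before packing per-processor send buffers.

```c
#include <stdio.h>

/* Owner of global row `row` under a CYCLIC(k) distribution over P processors:
   rows are dealt out in blocks of k in round-robin order.                    */
static int cyclic_owner(int row, int k, int P) {
    return (row / k) % P;
}

int main(void) {
    /* Tiny CRS example (4 rows, 5 nonzeros); values are omitted because only
       the index arithmetic matters for the redistribution sketch.            */
    int row_ptr[] = {0, 2, 3, 3, 5};   /* row i spans [row_ptr[i], row_ptr[i+1]) */
    int col_idx[] = {0, 3, 1, 0, 2};
    int n_rows = 4, k = 2, P = 2;

    /* Collecting stage (sketch): count how many nonzeros each processor receives. */
    int send_count[2] = {0, 0};
    for (int i = 0; i < n_rows; i++) {
        int dest = cyclic_owner(i, k, P);
        for (int nz = row_ptr[i]; nz < row_ptr[i + 1]; nz++) {
            send_count[dest]++;
            printf("nonzero (%d,%d) -> processor %d\n", i, col_idx[nz], dest);
        }
    }
    for (int p = 0; p < P; p++)
        printf("processor %d receives %d nonzeros\n", p, send_count[p]);
    return 0;
}
```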


application-specific systems, architectures, and processors | 2002

Polynomial evaluation on multimedia processors

Julio Villalba; Gerardo Bandera; Mario A. González; Javier Hormigo; Emilio L. Zapata

In this paper we deal with polynomial evaluation on new processor architectures for multimedia applications. We introduce algorithms that take advantage of the new features of multimedia processors, such as VLIW (very long instruction word) and SIMD (single instruction, multiple data) architectures. Algorithms for polynomial evaluation based only on addition/shift operations, as well as algorithms using MAC (multiply-and-add) instructions, are analyzed and tailored to the subword-parallelism units of the new processors. Both potential instruction-level and machine-level parallelism are fully exploited through concurrent use of all functional units.
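As a rough illustration of the MAC-based alternative mentioned in the abstract, the following C sketch (our own, not code from the paper) evaluates a polynomial with Horner's rule using fused multiply-add, which maps one-to-one onto a MAC unit; an addition/shift-only variant would instead restrict the coefficients so that the multiplications become shifts.

```c
#include <math.h>
#include <stdio.h>

/* Horner's rule: p(x) = c[0] + x*(c[1] + x*(c[2] + ...)).
   Each step is one multiply-and-add, i.e. one MAC/fma instruction. */
static double horner(const double *c, int degree, double x) {
    double acc = c[degree];
    for (int i = degree - 1; i >= 0; i--)
        acc = fma(acc, x, c[i]);   /* acc = acc*x + c[i] in one rounded step */
    return acc;
}

int main(void) {
    /* p(x) = 1 + 2x + 3x^2 */
    double c[] = {1.0, 2.0, 3.0};
    printf("p(2) = %f\n", horner(c, 2, 2.0));   /* expect 17 */
    return 0;
}
```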


international parallel processing symposium | 1997

The sparse cyclic distribution against its dense counterparts

Gerardo Bandera; Manuel Ujaldon; María A. Trenas; Emilio L. Zapata

Several methods have been proposed in the literature for the distribution of data on distributed-memory machines, oriented to either dense or sparse structures. Many real applications, however, deal with both kinds of data jointly. The paper presents techniques for integrating dense and sparse array accesses in a way that optimizes locality and, further, allows efficient loop partitioning within a data-parallel compiler. The approach is evaluated through an experimental survey with several compilers and parallel platforms. The results demonstrate the benefits of the BRS sparse distribution when combined with CYCLIC in mixed algorithms, and the poor efficiency achieved by well-known distribution schemes when sparse elements appear in the source code.
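A minimal sketch of why BRS combines well with CYCLIC, written in C under our own simplifying assumption of k = 1 and a row-wise sparse matrix-vector product: the processor that owns row i of the sparse matrix under BRS is the same one that owns y[i] under CYCLIC, so the accumulation into y stays local.

```c
#include <stdio.h>

/* Under CYCLIC(1), element (or row) i belongs to processor i % P.  BRS applies
   the same mapping to the rows of a compressed sparse matrix, so it remains
   aligned with a CYCLIC-distributed dense vector.                             */
static int cyclic1_owner(int i, int P) { return i % P; }

int main(void) {
    int P = 4, n = 10;
    for (int i = 0; i < n; i++) {
        int owner_row = cyclic1_owner(i, P);   /* BRS owner of sparse row i   */
        int owner_y   = cyclic1_owner(i, P);   /* CYCLIC owner of dense y[i]  */
        printf("row %d: A-row on P%d, y[%d] on P%d -> %s\n",
               i, owner_row, i, owner_y,
               owner_row == owner_y ? "local update" : "remote update");
    }
    return 0;
}
```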


international parallel and distributed processing symposium | 2004

Evaluation of elementary functions using multimedia features

Gerardo Bandera; Mario A. González; Julio Villalba; Javier Hormigo; Emilio L. Zapata

Most current computers include multimedia features, and we use these extensions to compute elementary functions based on polynomial approximations. We present several alternatives that take advantage of the new attributes of multimedia processors, such as VLIW and SIMD architectures. Our algorithms support polynomial evaluation in two different ways: the first is based only on addition/shift operations, while the second uses MAC instructions. Both approaches are analyzed and tailored to the subword-parallelism units of the new processors. Potential instruction-level and machine-level parallelism are fully exploited through concurrent use of all functional units. A combined approach using MAC units together with additions and shifts is also presented as a third alternative, and two new instructions are proposed to improve the execution of some of our algorithms.
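To make the approach concrete, here is a hedged C sketch (not the paper's code; the degree-4 Taylor coefficients of exp(x) around 0 serve only as an illustrative approximation): the same MAC-based Horner recurrence is applied to a small packed block of arguments, the way a SIMD subword unit processes several operands per instruction.

```c
#include <math.h>
#include <stdio.h>

#define LANES 4   /* pretend subword width: 4 operands per "instruction" */

/* Evaluate the same polynomial on LANES packed arguments with Horner + fma;
   a SIMD unit would perform each inner loop body as a single packed MAC.  */
static void horner_packed(const double *c, int degree,
                          const double x[LANES], double y[LANES]) {
    for (int l = 0; l < LANES; l++) y[l] = c[degree];
    for (int i = degree - 1; i >= 0; i--)
        for (int l = 0; l < LANES; l++)
            y[l] = fma(y[l], x[l], c[i]);
}

int main(void) {
    /* Degree-4 Taylor coefficients of exp(x) around 0: 1, 1, 1/2, 1/6, 1/24. */
    double c[] = {1.0, 1.0, 1.0 / 2.0, 1.0 / 6.0, 1.0 / 24.0};
    double x[LANES] = {0.0, 0.1, 0.2, 0.3}, y[LANES];
    horner_packed(c, 4, x, y);
    for (int l = 0; l < LANES; l++)
        printf("exp(%.1f) approx= %f (libm: %f)\n", x[l], y[l], exp(x[l]));
    return 0;
}
```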


merged international parallel processing symposium and symposium on parallel and distributed processing | 1998

Local enumeration techniques for sparse algorithms

Gerardo Bandera; Pablo P. Trabado; Emilio L. Zapata

Several methods have been proposed in the literature for the local enumeration of dense references to arrays distributed with the CYCLIC(k) data distribution in High Performance Fortran. These methods deal only with loops without irregular references, and existing techniques are not sufficient when the code includes sparse references. In this work, methods for the enumeration of references are proposed and tested on several sparse linear algebra algorithms. We use the BRS(k) distribution for sparse matrices, which is a generalization of the dense CYCLIC(k) distribution. The efficiency of the proposed methods has been evaluated on different processors.
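The dense side of this enumeration problem is easy to sketch. The C fragment below is an illustration under our own assumptions, not the paper's method: it enumerates exactly the global indices a given processor owns under CYCLIC(k), block by block, together with their local indices, instead of scanning the whole index space and testing ownership.

```c
#include <stdio.h>

/* Enumerate the global indices owned by processor `me` under CYCLIC(k)
   over P processors, for a global index space of size n.               */
static void enumerate_local(int me, int k, int P, int n) {
    for (int blk_start = me * k; blk_start < n; blk_start += P * k) {
        for (int g = blk_start; g < blk_start + k && g < n; g++) {
            int local = (blk_start / (P * k)) * k + (g - blk_start);
            printf("P%d: global %d -> local %d\n", me, g, local);
        }
    }
}

int main(void) {
    int P = 3, k = 2, n = 13;
    for (int me = 0; me < P; me++) enumerate_local(me, k, P, n);
    return 0;
}
```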


international parallel and distributed processing symposium | 2001

Data locality exploitation in algorithms including sparse communications

Gerardo Bandera; Emilio L. Zapata

Complexity in real codes is sometimes due to the use of multi-vector data structures, but few compile-time approaches deal with this problem; moreover, current compilation techniques only analyze single vectors. This paper describes how performance can be improved if semantic bindings are taken into account during parallelization. Our approach is a first step toward converging from the data-parallel paradigm to automatic parallelization by reducing the number of directives in the code. We apply a multi-loop analysis and a sparse privatization to replace the owner-computes rule. Additionally, our support is able to parallelize loops with several levels of indirection on the left-hand side. We also present three alternatives for storing the sending information and two algorithms for calculating coordinates from pointers; both issues are of critical importance when the parallelized algorithm requires sparse communication.
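One way to picture the pointer-to-coordinate problem mentioned above is with CRS storage (an assumption on our part, not necessarily the paper's representation): the (row, column) coordinate of a nonzero can be recovered from its position in the compressed arrays by a binary search over the row pointers, as in the C sketch below.

```c
#include <stdio.h>

/* Given CRS row pointers, find the row that contains nonzero position `nz`
   (row_ptr[row] <= nz < row_ptr[row+1]) by binary search.                  */
static int row_of_nonzero(const int *row_ptr, int n_rows, int nz) {
    int lo = 0, hi = n_rows - 1;
    while (lo < hi) {
        int mid = (lo + hi + 1) / 2;
        if (row_ptr[mid] <= nz) lo = mid; else hi = mid - 1;
    }
    return lo;
}

int main(void) {
    int row_ptr[] = {0, 2, 3, 3, 5};   /* 4 rows, 5 nonzeros */
    int col_idx[] = {0, 3, 1, 0, 2};
    for (int nz = 0; nz < 5; nz++) {
        int row = row_of_nonzero(row_ptr, 4, nz);
        printf("nonzero %d -> coordinate (%d, %d)\n", nz, row, col_idx[nz]);
    }
    return 0;
}
```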


The Journal of Supercomputing | 2000

Compile and Run-Time Support for the Parallelization of Sparse Matrix Updating Algorithms

Gerardo Bandera; Manuel Ujaldon; Emilio L. Zapata

This work presents a survey of the capabilities that sparse computation offers for improving performance when parallelized, either automatically or through a data-parallel compiler. The characterization of a sparse code becomes more complicated as code length increases: access patterns change from loop to loop, making it necessary to redefine the parallelization strategy. While dense computation only offers the possibility of redistributing data structures, several other factors influence the performance of a code excerpt in the sparse field, such as the source data representation on file, the compressed data storage in memory, the creation of new nonzeros at run-time (fill-in) and the number of processors available. We analyze the alternatives that arise from each issue, providing a guideline for the underlying compilation work and illustrating our techniques with examples on the Cray T3E.
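Fill-in is the main run-time complication listed above. The short C sketch below (an illustration, not code from the paper) shows it at the smallest scale: an update A(i,j) += v either modifies an existing nonzero of a CRS matrix in place, or has to insert a new entry, shifting the compressed arrays and all subsequent row pointers.

```c
#include <stdio.h>
#include <string.h>

#define MAX_NNZ 16   /* assumed spare capacity in the compressed arrays */

/* Apply A(i,j) += v to a CRS matrix; returns 1 if a new nonzero (fill-in)
   had to be inserted, 0 if an existing entry was updated in place.        */
static int crs_update(int n_rows, int *row_ptr, int *col_idx, double *val,
                      int *nnz, int i, int j, double v) {
    for (int p = row_ptr[i]; p < row_ptr[i + 1]; p++)
        if (col_idx[p] == j) { val[p] += v; return 0; }

    /* Fill-in: shift every later entry one slot and bump later row pointers. */
    int pos = row_ptr[i + 1];
    memmove(&col_idx[pos + 1], &col_idx[pos], (*nnz - pos) * sizeof(int));
    memmove(&val[pos + 1], &val[pos], (*nnz - pos) * sizeof(double));
    col_idx[pos] = j;
    val[pos] = v;
    for (int r = i + 1; r <= n_rows; r++) row_ptr[r]++;
    (*nnz)++;
    return 1;
}

int main(void) {
    int row_ptr[4] = {0, 2, 3, 5};             /* 3-row example matrix */
    int col_idx[MAX_NNZ] = {0, 2, 1, 0, 2};
    double val[MAX_NNZ]  = {1, 2, 3, 4, 5};
    int nnz = 5;

    printf("update (0,2): fill-in = %d\n",
           crs_update(3, row_ptr, col_idx, val, &nnz, 0, 2, 1.0));
    printf("update (1,2): fill-in = %d\n",
           crs_update(3, row_ptr, col_idx, val, &nnz, 1, 2, 1.0));
    printf("nnz after updates = %d\n", nnz);
    return 0;
}
```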


international conference on computational science | 2004

Applying Loop Tiling and Unrolling to a Sparse Kernel Code

Ezequiel Herruzo; Gerardo Bandera; Oscar G. Plata

Code transformations for performance optimization work well where a very precise data-dependence analysis can be done at compile time. However, current compilers usually do not optimize irregular codes, because these contain input-dependent and/or dynamic memory access patterns. This paper shows how two representative loop transformations, tiling and unrolling, can be adapted to codes with irregular computations, obtaining a significant performance improvement over the original, non-transformed code. Experiments on our proposals are conducted on three different hardware platforms. A well-known sparse kernel code is used as an example to show the performance improvements.
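As a hedged illustration of the unrolling half of the paper (a generic CRS sparse matrix-vector product of our own, not the authors' kernel), the inner loop over the nonzeros of a row can be unrolled by four with a scalar cleanup loop for the remainder; tiling would additionally restrict the range of columns touched per pass.

```c
#include <stdio.h>

/* y = A*x with A in CRS; inner loop unrolled by 4 (plus cleanup). */
static void spmv_unroll4(int n_rows, const int *row_ptr, const int *col_idx,
                         const double *val, const double *x, double *y) {
    for (int i = 0; i < n_rows; i++) {
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        int p = row_ptr[i], end = row_ptr[i + 1];
        for (; p + 3 < end; p += 4) {          /* unrolled body */
            s0 += val[p]     * x[col_idx[p]];
            s1 += val[p + 1] * x[col_idx[p + 1]];
            s2 += val[p + 2] * x[col_idx[p + 2]];
            s3 += val[p + 3] * x[col_idx[p + 3]];
        }
        for (; p < end; p++)                   /* cleanup for leftover nonzeros */
            s0 += val[p] * x[col_idx[p]];
        y[i] = s0 + s1 + s2 + s3;
    }
}

int main(void) {
    int row_ptr[] = {0, 2, 3, 5};
    int col_idx[] = {0, 2, 1, 0, 2};
    double val[]  = {1, 2, 3, 4, 5};
    double x[] = {1, 1, 1}, y[3];
    spmv_unroll4(3, row_ptr, col_idx, val, x, y);
    for (int i = 0; i < 3; i++) printf("y[%d] = %f\n", i, y[i]);
    return 0;
}
```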


european conference on parallel processing | 2000

Improving the Sparse Parallelization Using Semantical Information at Compile-Time

Gerardo Bandera; Emilio L. Zapata

This work presents a novel strategy for the parallelization of applications containing sparse references. Our approach is a first step toward converging from data-parallel to automatic parallelization by taking into account the semantic relationship between the vectors composing a higher-level data structure. By applying a sparse privatization and a multi-loop analysis at compile time, we enhance performance and reduce the number of extra code annotations. The building and updating of a sparse matrix at run time is also studied, solving the problem of using pointers and several levels of indirection on the left-hand side. The strategy has been evaluated on a Cray T3E with the matrix transposition algorithm, using different temporary buffers for the sparse communication.
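Since the evaluation uses the matrix transposition algorithm, the following C code is a minimal sequential sketch of the standard two-pass CRS transpose (count nonzeros per column, then scatter); it is a generic reconstruction for reference, not the parallel, privatized version described in the paper.

```c
#include <stdio.h>
#include <string.h>

/* Transpose an n_rows x n_cols CRS matrix into the CRS form of its transpose:
   pass 1 counts nonzeros per column, pass 2 scatters the entries.            */
static void crs_transpose(int n_rows, int n_cols,
                          const int *row_ptr, const int *col_idx, const double *val,
                          int *t_row_ptr, int *t_col_idx, double *t_val) {
    int nnz = row_ptr[n_rows];
    memset(t_row_ptr, 0, (n_cols + 1) * sizeof(int));
    for (int p = 0; p < nnz; p++) t_row_ptr[col_idx[p] + 1]++;         /* column counts */
    for (int c = 0; c < n_cols; c++) t_row_ptr[c + 1] += t_row_ptr[c]; /* prefix sums   */

    int next[n_cols];          /* running insert position per column (C99 VLA) */
    memcpy(next, t_row_ptr, n_cols * sizeof(int));
    for (int i = 0; i < n_rows; i++)
        for (int p = row_ptr[i]; p < row_ptr[i + 1]; p++) {
            int q = next[col_idx[p]]++;
            t_col_idx[q] = i;                  /* entry (col, row) of the transpose */
            t_val[q] = val[p];
        }
}

int main(void) {
    int row_ptr[] = {0, 2, 3, 5};
    int col_idx[] = {0, 2, 1, 0, 2};
    double val[]  = {1, 2, 3, 4, 5};
    int t_row_ptr[4], t_col_idx[5];
    double t_val[5];
    crs_transpose(3, 3, row_ptr, col_idx, val, t_row_ptr, t_col_idx, t_val);
    for (int c = 0; c < 3; c++)
        for (int p = t_row_ptr[c]; p < t_row_ptr[c + 1]; p++)
            printf("A^T(%d,%d) = %f\n", c, t_col_idx[p], t_val[p]);
    return 0;
}
```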


parallel computing | 2005

Reducing Cache Misses by Loop Reordering

Ezequiel Herruzo; Gerardo Bandera; Oscar G. Plata; Emilio L. Zapata
