Eladio Gutiérrez | Researchain


Publication

Featured research published by Eladio Gutiérrez.

international conference on supercomputing | 2000

A compiler method for the parallel execution of irregular reductions in scalable shared memory multiprocessors

Eladio Gutiérrez; Oscar G. Plata; Emilio L. Zapata

This paper presents a new parallelization method for reductions of arrays with subscripted subscripts on scalable shared memory multiprocessors. The mapping of computations is based on grouping reduction loop iterations into sets that are then assigned to the cooperating threads of computation. Iterations belonging to the same set are chosen in such a way that they update different entries in the reduction array. That is, the loop distribution implies a conflict-free write distribution of the reduction array. The iteration sets are set up by building a loop-index prefetching data structure that allows the loop iterations to be reordered properly. The proposed method is general, scalable, and easy to implement in a compiler. In addition, it deals in a uniform way with one and multiple subscript arrays. In the case of multiple indirection arrays, writes on the reduction array affecting different sets are resolved by defining conflict-free supersets. A performance evaluation is presented. From the experimental results and performance analysis, the proposed method appears as a clear alternative to the array expansion and privatized buffer techniques used in state-of-the-art parallelizing compilers, such as Polaris or SUIF. The scalability problem that those techniques exhibit is absent from our method, as its memory overhead does not depend on the number of processors.
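The idea of a conflict-free write distribution can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a single indirection array, a block partition of the reduction array, and hypothetical function names (`build_iteration_sets`, `reduce_by_sets`). Each iteration is placed in the set of the thread that owns the entry it updates, so different sets never write to the same entry.

```python
def build_iteration_sets(idx, array_len, num_threads):
    """Group loop iterations by the owner of the reduction entry they update.

    Assumption: the reduction array is split into contiguous blocks,
    one block per thread.
    """
    block = (array_len + num_threads - 1) // num_threads  # entries per thread
    sets = [[] for _ in range(num_threads)]
    for i, target in enumerate(idx):
        sets[target // block].append(i)
    return sets


def reduce_by_sets(idx, values, array_len, num_threads):
    """Run the reduction set by set; sets write to disjoint blocks."""
    result = [0] * array_len
    for iteration_set in build_iteration_sets(idx, array_len, num_threads):
        # In a parallel run, each set would be executed by its own thread.
        for i in iteration_set:
            result[idx[i]] += values[i]
    return result


idx = [3, 0, 3, 1, 2, 0]      # subscript array
values = [1, 2, 3, 4, 5, 6]   # contribution of each iteration
print(build_iteration_sets(idx, 4, 2))    # [[1, 3, 5], [0, 2, 4]]
print(reduce_by_sets(idx, values, 4, 2))  # [8, 4, 5, 4]
```

In this sketch the only extra storage is the list of reordered iteration indices, whose size depends on the number of iterations rather than on the number of threads.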

Computers in Education | 2010

A new Moodle module supporting automatic verification of VHDL-based assignments

Eladio Gutiérrez; María A. Trenas; Julián Ramos; Francisco Corbera; Sergio Romero

This work describes a new Moodle module developed to support the practical content of a basic computer organization course. This module goes beyond the mere hosting of resources and assignments. It makes use of an automatic checking and verification engine that works on the VHDL designs submitted by the students. The module automatically keeps information about their state up to date, and significantly reduces the workload that continuous assessment places on the teacher. Additionally, this new module is designed to promote collaborative teamwork, allowing student teams to be defined in a more practical way than with built-in Moodle groups. The module has been designed according to the Moodle philosophy and its application can be extended to other similar subjects.

Computer Physics Communications | 2010

Quantum computer simulation using the CUDA programming model

Eladio Gutiérrez; Sergio Romero; María A. Trenas; Emilio L. Zapata

Quantum computing has emerged as a field of great theoretical interest. Its simulation is a problem with high memory and computational requirements, which makes the use of parallel platforms advisable. In this work we deal with the simulation of an ideal quantum computer on the Compute Unified Device Architecture (CUDA), as such a problem can benefit from the high computational capacity of Graphics Processing Units (GPUs). Modern GPUs are becoming very powerful computational architectures, which is causing a growing interest in their application to general-purpose computing. CUDA provides an execution model oriented towards a more general exploitation of the GPU, allowing it to be used as a massively parallel SIMT (Single-Instruction Multiple-Thread) multiprocessor. A simulator that takes memory reference locality into account is proposed, showing that achieving high performance depends strongly on the explicit exploitation of the memory hierarchy. Several strategies have been experimentally evaluated, obtaining good performance results in comparison with conventional platforms.
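To make the memory access pattern concrete, here is a minimal sequential Python sketch of state-vector simulation. It is an illustration only, not the CUDA simulator described in the abstract. It assumes qubit 0 is the least significant bit of the state index, and the names `apply_single_qubit_gate` and `HADAMARD` are chosen for this example.

```python
import math


def apply_single_qubit_gate(state, gate, target):
    """Apply a 2x2 gate to qubit `target` of a state vector, in place.

    Amplitudes are updated in pairs whose indices differ only in bit
    `target`, so the two elements of a pair are 2**target entries apart.
    """
    stride = 1 << target
    for base in range(0, len(state), 2 * stride):
        for offset in range(stride):
            i0 = base + offset
            i1 = i0 + stride
            a0, a1 = state[i0], state[i1]
            state[i0] = gate[0][0] * a0 + gate[0][1] * a1
            state[i1] = gate[1][0] * a0 + gate[1][1] * a1
    return state


S = 1.0 / math.sqrt(2.0)
HADAMARD = [[S, S], [S, -S]]

# Two qubits starting in |00>; a Hadamard on each qubit gives a
# uniform superposition over the four basis states.
state = [1 + 0j, 0j, 0j, 0j]
apply_single_qubit_gate(state, HADAMARD, 0)
apply_single_qubit_gate(state, HADAMARD, 1)
print(state)
```

The state vector holds 2^n amplitudes for n qubits, and the distance between paired amplitudes grows with the index of the target qubit, which is why reference locality matters when the pairs are distributed over a memory hierarchy.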

international conference on parallel architectures and compilation techniques | 2009

Improving Signatures by Locality Exploitation for Transactional Memory

Ricardo Quislant; Eladio Gutiérrez; Oscar G. Plata; Emilio L. Zapata

Writing multithreaded programs is a fairly complex task that poses a major obstacle to exploiting multicore processors. Transactional Memory (TM) has emerged as an alternative to conventional multithreaded programming to ease the writing of concurrent programs. Hardware Transactional Memory (HTM) implements most of the required mechanisms of TM at the core level, e.g. conflict detection. Signatures are designed to support the detection of conflicts amongst concurrent transactions, and are usually implemented as per-thread Bloom filters in HTM. Basically, signatures use fixed hardware to summarize an unbounded number of read and write memory addresses at the cost of false conflicts (detection of non-existent conflicts). In this paper, a novel signature design that exploits locality is proposed to reduce the number of false conflicts. We show how that reduction translates into a performance improvement in the execution of concurrent transactions. Our signatures are based on address mappings of the hash functions that reduce the number of bits inserted in the filter for addresses located near one another. This is especially favorable for large transactions, which usually exhibit some amount of spatial locality. Furthermore, the implementation does not require extra hardware. Our proposal was experimentally evaluated using the Wisconsin GEMS simulator and all codes from the STAMP benchmark suite. Results show a significant performance improvement in many cases, especially for those codes with long-running, large-data transactions.
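The trade-off between fixed storage and false conflicts can be seen in a small Python sketch of a generic Bloom-filter signature. This is not the locality-aware design proposed in the paper: the class name `Signature`, the filter size, and the multiplicative hash functions are assumptions made purely for illustration.

```python
class Signature:
    """Generic Bloom-filter summary of a set of memory addresses."""

    def __init__(self, num_bits=64, multipliers=(0x9E3779B1, 0x85EBCA6B)):
        self.num_bits = num_bits
        self.multipliers = multipliers  # one hypothetical hash per multiplier
        self.bits = 0                   # the filter, stored as an integer bitmask

    def _positions(self, address):
        # Illustrative hash: 32-bit multiplicative hash reduced to a bit index.
        return [((address * m) & 0xFFFFFFFF) % self.num_bits
                for m in self.multipliers]

    def insert(self, address):
        for p in self._positions(address):
            self.bits |= 1 << p

    def may_contain(self, address):
        # True for every inserted address; may also be True for others.
        return all((self.bits >> p) & 1 for p in self._positions(address))


sig = Signature()
for addr in (0x1000, 0x1008, 0x2040):
    sig.insert(addr)
print(all(sig.may_contain(a) for a in (0x1000, 0x1008, 0x2040)))  # True

# A deliberately tiny filter saturates quickly: once every bit is set,
# any address is reported as present, i.e. a false conflict.
tiny = Signature(num_bits=4)
for addr in range(4):
    tiny.insert(addr)
print(tiny.may_contain(1000))  # True, although 1000 was never inserted
```

A filter of this kind never misses an inserted address, so conflicts are never lost. The cost is that a crowded filter reports conflicts that do not exist, which is the effect the abstract aims to reduce.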

parallel computing | 2000

Automatic parallelization of irregular applications

Eladio Gutiérrez; Rafael Asenjo; Oscar G. Plata; Emilio L. Zapata

Parallel computers are present in a variety of fields, having reached a high degree of architectural maturity. However, there is still a lack of convenient software support for implementing efficient parallel applications. This is especially true for the class of irregular applications, whose computational constructs hardly fit current parallel architectures. In fact, contemporary automatic parallelizers produce, in general, poor parallel code from these applications. This paper discusses techniques and methods to help improve the quality of automatically parallelized programs. We focus on two issues: parallelism detection and parallelism implementation. The first issue refers to the detection of specific irregular computation constructs or data access patterns. The second issue considers the case in which some frequent construct has been detected but has been sub-optimally parallelized. Both issues are dealt with in depth and in the context of sparse computations (for the first issue) and irregular histogram reductions (for the second issue).

Computer Applications in Engineering Education | 2013

E-assessment of Matlab assignments in Moodle: Application to an introductory programming course for engineers

Julián Ramos; María A. Trenas; Eladio Gutiérrez; Sergio Romero

This article introduces a novel extension for Moodle supporting the automatic verification of codes written in Matlab. It has been applied when teaching the basics of imperative programming in a course aimed at chemical engineering students. The extension derives from the module CTPracticals, originally developed by the authors to enable the automatic assessment of VHDL assignments in Moodle. Several major changes have been made, mainly in the automatic verification engine, in the core of the system, and in several user interfaces. The module partially frees teachers from the repetitive task of verifying assignments, allowing them to invest more time assisting students and tackling new pedagogical objectives. An anonymous student survey proved that students are satisfied with the system because they find the feedback and the constantly updated view of the status of their assignments helpful.

IEEE Transactions on Education | 2011

Use of a New Moodle Module for Improving the Teaching of a Basic Course on Computer Architecture

María A. Trenas; Julián Ramos; Eladio Gutiérrez; Sergio Romero; Francisco Corbera

This paper describes how a new Moodle module, called CTPracticals, is applied to the teaching of the practical content of a basic computer organization course. At the core of the module, an automatic verification engine enables it to process the VHDL designs automatically as they are submitted. Moreover, a straightforward modification of this engine would make it possible to extend its application to other programming languages. The module provides students with real-time knowledge of the state of their work by giving them access to the result of the automatic assessment or to feedback messages. Teachers have a constant global view of the status of their class and have multiple options available, such as sending feedback messages to students, obtaining statistics, launching additional verifications in batch, and so on. Likewise, the module substantially improves some organizational aspects, and its design may help teachers to encourage teamwork. Its use partially frees teachers from certain routine work, saving time that can be devoted to teaching objectives and tutoring activities.

high performance computing for computational science (vector and parallel processing) | 2008

Memory Locality Exploitation Strategies for FFT on the CUDA Architecture

Eladio Gutiérrez; Sergio Romero; María A. Trenas; Emilio L. Zapata

Modern graphics processing units (GPUs) are becoming more and more suitable for general-purpose computing due to their growing computational power. These commodity processors follow, in general, a parallel SIMD execution model whose efficiency depends on the proper exploitation of the explicit memory hierarchy, among other factors. In this paper we analyze the implementation of the Fast Fourier Transform using the programming model of the Compute Unified Device Architecture (CUDA) recently released by NVIDIA for its new graphics platforms. Within this model we propose an FFT implementation that takes into account memory reference locality issues that are crucial for achieving high execution performance. This proposal has been experimentally tested and compared with other well-known approaches such as the manufacturer's FFT library.

european conference on parallel processing | 1999

On Automatic Parallelization of Irregular Reductions on Scalable Shared Memory Systems

Eladio Gutiérrez; Oscar G. Plata; Emilio L. Zapata

This paper presents a new parallelization method for reductions of arrays with subscripted subscripts on scalable shared-memory multiprocessors. The mapping of computations is based on the conflict-free write distribution of the reduction vector across the processors. The proposed method is general, scalable, and easy to implement in a compiler. A performance evaluation and comparison with other existing techniques is presented. From the experimental results, the proposed method is a clear alternative to the array expansion and privatized buffer methods commonly used in state-of-the-art parallelizing compilers, such as Polaris or SUIF.

Journal of Parallel and Distributed Computing | 2005

Parallel techniques in irregular codes: cloth simulation as case of study

Eladio Gutiérrez; Sergio Romero; Luis F. Romero; Oscar G. Plata; Emilio L. Zapata

When parallelizing irregular applications on ccNUMA machines, several issues should be taken into account in order to achieve high code performance. These factors include locality exploitation and parallelism, as well as careful use of memory resources (memory overhead). A significant number of numerical simulation codes are clear examples of irregular applications. Frequently these kinds of codes include reduction operations in their core, so that a significant fraction of the computational time is spent on such operations. Specifically, cloth simulation belongs to this class of applications, being a topic of increasing interest in diverse areas, such as the multimedia industry. Moreover, when real-time simulation is the aim, its parallelization becomes an important option. This paper discusses and compares different irregular reduction parallelization techniques on ccNUMA shared-memory machines. Broadly speaking, we may classify them into two groups: privatization-based and data partitioning-based methods. In this paper we describe a framework, based on data affinity, that permits the development of various algorithms within the group of data partitioning-based techniques. All these techniques and approaches are analyzed and adapted to the computational structure of a real, physically based cloth simulator.
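As a rough illustration of the privatization-based group and its memory overhead, the following Python sketch gives every thread a private copy of the reduction array and merges the copies at the end. It is a sequential stand-in, not code from the paper: the function name `reduce_privatized` and the cyclic assignment of iterations to threads are assumptions made for this example.

```python
def reduce_privatized(idx, values, array_len, num_threads):
    """Irregular reduction with one private copy of the array per thread.

    Returns the reduced array and the number of extra entries allocated,
    which grows with the number of threads.
    """
    private = [[0] * array_len for _ in range(num_threads)]
    for i, target in enumerate(idx):
        owner = i % num_threads            # assumed cyclic iteration schedule
        private[owner][target] += values[i]
    # Merge step: combine the private copies entry by entry.
    result = [sum(copy[j] for copy in private) for j in range(array_len)]
    extra_entries = num_threads * array_len
    return result, extra_entries


idx = [3, 0, 3, 1, 2, 0]
values = [1, 2, 3, 4, 5, 6]
print(reduce_privatized(idx, values, 4, 2))  # ([8, 4, 5, 4], 8)
print(reduce_privatized(idx, values, 4, 4))  # ([8, 4, 5, 4], 16)
```

Any iteration can go to any thread because writes land in private storage, but the extra memory scales with the thread count. Data partitioning-based methods instead restrict which thread may update which part of the array.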
