Oscar G. Plata | Researchain

Publication

Featured research published by Oscar G. Plata.

International Conference on Supercomputing | 2000

A compiler method for the parallel execution of irregular reductions in scalable shared memory multiprocessors

Eladio Gutiérrez; Oscar G. Plata; Emilio L. Zapata

This paper presents a new parallelization method for reductions of arrays with subscripted subscripts on scalable shared memory multiprocessors. The mapping of computations is based on grouping reduction loop iterations into sets that are then assigned to the cooperating threads of computation. Iterations belonging to the same set are chosen in such a way that they update different entries in the reduction array. That is, the loop distribution implies a conflict-free write distribution of the reduction array. The iteration sets are set up by building a loop-index prefetching data structure that allows the loop iterations to be reordered properly. The proposed method is general, scalable, and easy to implement in a compiler. In addition, it deals in a uniform way with one and multiple subscript arrays. In the case of multiple indirection arrays, writes on the reduction array affecting different sets are resolved by defining conflict-free supersets. A performance evaluation is presented. From the experimental results and performance analysis, the proposed method appears as a clear alternative to the array expansion and privatized buffer techniques used in state-of-the-art parallelizing compilers, like Polaris or SUIF. The scalability problem that those techniques exhibit is absent from our method, as its memory overhead does not depend on the number of processors.
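To make the idea of a conflict-free write distribution concrete, here is a minimal sketch. It is not the authors' loop-index prefetching structure: it assumes the reduction array is simply split into contiguous blocks, one per thread, and it groups loop iterations by the block that their target entry falls in. The function names are hypothetical, only a single indirection array is handled, and the threads are simulated sequentially.

```python
def sequential_reduction(n_entries, idx, val):
    """Reference loop: A[idx[i]] += val[i] for every iteration i."""
    A = [0.0] * n_entries
    for i, j in enumerate(idx):
        A[j] += val[i]
    return A


def build_iteration_sets(n_entries, idx, n_threads):
    """Group iterations by the owner of the entry they update.

    Assumption: entry j is owned by thread j // block (contiguous blocks).
    """
    block = (n_entries + n_threads - 1) // n_threads  # ceiling division
    sets = [[] for _ in range(n_threads)]
    for i, j in enumerate(idx):
        sets[j // block].append(i)
    return sets


def partitioned_reduction(n_entries, idx, val, n_threads):
    """Run each iteration set in turn.

    Different sets never write the same entry, so real threads could
    process them concurrently without locks or private array copies.
    """
    A = [0.0] * n_entries
    for iteration_set in build_iteration_sets(n_entries, idx, n_threads):
        for i in iteration_set:
            A[idx[i]] += val[i]
    return A
```

In this sketch the extra storage is one index per loop iteration (plus one empty list per thread), so it does not grow with a per-thread copy of the reduction array, unlike array expansion.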

International Conference on Parallel Architectures and Compilation Techniques | 2009

Improving Signatures by Locality Exploitation for Transactional Memory

Ricardo Quislant; Eladio Gutiérrez; Oscar G. Plata; Emilio L. Zapata

Writing multithreaded programs is a fairly complex task that poses a major obstacle to exploiting multicore processors. Transactional Memory (TM) emerges as an alternative to conventional multithreaded programming to ease the writing of concurrent programs. Hardware Transactional Memory (HTM) implements most of the required mechanisms of TM at the core level, e.g. conflict detection. Signatures are designed to support the detection of conflicts amongst concurrent transactions, and are usually implemented as per-thread Bloom filters in HTM. Basically, signatures use fixed hardware to summarize an unbounded number of read and write memory addresses at the cost of false conflicts (detection of non-existent conflicts). In this paper, a novel signature design that exploits locality is proposed to reduce the number of false conflicts. We show how that reduction translates into a performance improvement in the execution of concurrent transactions. Our signatures are based on address mappings of the hash functions that reduce the number of bits inserted in the filter for addresses located near one another. This is especially favorable for large transactions, which usually exhibit some amount of spatial locality. Furthermore, the implementation does not require extra hardware. Our proposal was experimentally evaluated using the Wisconsin GEMS simulator and all codes from the STAMP benchmark suite. Results show a significant performance improvement in many cases, especially for those codes with long-running, large-data transactions.
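The following software model sketches how a Bloom-filter signature summarizes addresses and why sharing bits between nearby addresses keeps the filter sparser. It is an illustration under stated assumptions, not the paper's hardware design: the `Signature` class, the multiplicative hash constants, and the `shifts` parameter (each hash ignores a different number of low-order address bits) are all hypothetical choices made here to produce the effect the abstract describes.

```python
class Signature:
    """Bloom-filter signature over memory addresses (software model)."""

    # Odd 32-bit multipliers for multiplicative hashing (arbitrary choices).
    MULTIPLIERS = (2654435761, 2246822519, 3266489917, 668265263)

    def __init__(self, log2_bits=10, shifts=(0, 0, 0, 0)):
        self.log2_bits = log2_bits  # filter holds 2**log2_bits bits
        self.shifts = shifts        # low-order bits each hash ignores (assumption)
        self.bits = 0               # the filter, stored as an int bitmask

    def _positions(self, addr):
        """Yield one bit position per hash function."""
        for mult, shift in zip(self.MULTIPLIERS, self.shifts):
            h = ((addr >> shift) * mult) & 0xFFFFFFFF
            yield h >> (32 - self.log2_bits)  # keep the high-order bits

    def insert(self, addr):
        for pos in self._positions(addr):
            self.bits |= 1 << pos

    def may_contain(self, addr):
        """True if addr was inserted, or on a false positive."""
        return all((self.bits >> pos) & 1 for pos in self._positions(addr))

    def popcount(self):
        """Number of bits currently set in the filter."""
        return bin(self.bits).count("1")
```

With `shifts=(0, 1, 2, 3)`, eight consecutive aligned addresses set at most 8 + 4 + 2 + 1 = 15 bits, whereas four independent hashes may set up to 32. A sparser filter leaves fewer set bits for an unrelated address to hit by accident, which is the source of false conflicts.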

The Computer Journal | 2001

A Data-Parallel Formulation for Divide and Conquer Algorithms

Margarita Amor; Francisco Argüello; Juan Torres López; Oscar G. Plata; Emilio L. Zapata

This paper presents a general data-parallel formulation for a class of problems based on the divide and conquer strategy. A combination of three techniques (mapping vectors, index-digit permutations and space-filling curves) is used to reorganize the algorithmic dataflow, providing great flexibility to efficiently exploit data locality and to reduce and optimize communications. In addition, these techniques allow the easy translation of the reorganized dataflows into HPF (High Performance Fortran) constructs. Finally, experimental results on the Cray T3E validate our method.

Parallel Computing | 2000

Automatic parallelization of irregular applications

Eladio Gutiérrez; Rafael Asenjo; Oscar G. Plata; Emilio L. Zapata

Parallel computers are present in a variety of fields, having reached a high degree of architectural maturity. However, there is still a lack of convenient software support for implementing efficient parallel applications. This is especially true for the class of irregular applications, whose computational constructs hardly fit current parallel architectures. In fact, contemporary automatic parallelizers produce, in general, poor parallel code from these applications. This paper discusses techniques and methods to help improve the quality of automatic parallel programs. We focus on two issues: parallelism detection and parallelism implementation. The first issue refers to the detection of specific irregular computation constructs or data access patterns. The second issue considers the case in which some frequent construct has been detected but has been sub-optimally parallelized. Both issues are dealt with in depth and in the context of sparse computations (for the first issue) and irregular histogram reductions (for the second issue).

Journal of Parallel and Distributed Computing | 1991

Modified Gram-Schmidt QR factorization on hypercube SIMD computers

E.L. Zapata; J. A. Lamas; Francisco F. Rivera; Oscar G. Plata

QR factorization is a popular calculation method in matrix algebra due to its usefulness in the solution of problems such as least-squares estimation and eigenvalue calculation. In this paper, we describe a parallel algorithm for the calculation of the QR factorization on a hypercube architecture of the SIMD type with distributed memory. We have chosen the modified Gram-Schmidt method with pivoting to determine the QR factorization as it is characterized by good numerical stability. As an application of the QR factorization, we analyze the problem of least squares, developing a complementary parallel algorithm for solving it. Both algorithms are general; they are not limited by the size of the problem or the dimension of the hypercube. Finally, we analyze the algorithmic complexities of both parallel algorithms.
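For readers unfamiliar with the underlying sequential method, the sketch below shows textbook modified Gram-Schmidt on a small dense matrix. It is only a reference point: it omits the pivoting and the hypercube data distribution described in the paper, assumes the input has full column rank, and uses the hypothetical function name `mgs_qr`.

```python
import math


def mgs_qr(A):
    """Modified Gram-Schmidt QR of an m x n matrix given as a list of rows.

    Returns (Q, R) with Q of size m x n (orthonormal columns) and R of
    size n x n (upper triangular). Assumes full column rank; no pivoting.
    """
    m, n = len(A), len(A[0])
    # Work on columns: V[j] is the j-th column, updated in place.
    V = [[float(A[i][j]) for i in range(m)] for j in range(n)]
    Q_cols = []
    R = [[0.0] * n for _ in range(n)]
    for k in range(n):
        R[k][k] = math.sqrt(sum(x * x for x in V[k]))
        q = [x / R[k][k] for x in V[k]]
        Q_cols.append(q)
        # Immediately remove the q component from every remaining column;
        # this eager update is what distinguishes modified from classical GS.
        for j in range(k + 1, n):
            R[k][j] = sum(qi * vi for qi, vi in zip(q, V[j]))
            V[j] = [vi - R[k][j] * qi for qi, vi in zip(q, V[j])]
    Q = [[Q_cols[j][i] for j in range(n)] for i in range(m)]
    return Q, R
```

The inner loop over `j` updates the remaining columns independently of one another, which is the kind of data parallelism a SIMD machine can exploit.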

Parallel Computing | 1989

Parallel fuzzy clustering on fixed size hypercube SIMD computers

E.L. Zapata; Francisco F. Rivera; Oscar G. Plata; M. A. Ismail

This article presents PFCM, a parallel algorithm for fuzzy clustering of large data sets. Being a generalization of FCM, the algorithm enables arbitrary numbers of data points, features and clusters to be handled cost-optimally by hypercube SIMD computers of arbitrary cube dimension, the only limitation being the size of the local memories of the processors. Speedup responds optimally to enlarging the hypercube. PFCM owes its flexibility to the technique employed in its derivation from the sequential fuzzy C-means algorithm FCM: the association of each of the three dimensions of the problem (numbers of data points, features and clusters) with a distinct subset of hypercube dimensions.
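As background, one iteration of the standard sequential fuzzy C-means update can be sketched as follows. This is the textbook formulation with Euclidean distance, not PFCM itself; the function name `fcm_step`, the fuzzifier default `m=2.0`, and the small distance guard are assumptions made for this illustration.

```python
def fcm_step(points, centers, m=2.0):
    """One fuzzy C-means iteration: update memberships, then centers.

    points and centers are lists of equal-length coordinate lists.
    Returns (U, new_centers), where U[i][k] is the membership of
    point k in cluster i.
    """
    def dist(p, c):
        d = sum((a - b) ** 2 for a, b in zip(p, c)) ** 0.5
        return max(d, 1e-12)  # guard against division by zero

    expo = 2.0 / (m - 1.0)
    U = []
    for c in centers:                      # loop over clusters
        row = []
        for p in points:                   # loop over data points
            d_ik = dist(p, c)
            denom = sum((d_ik / dist(p, cj)) ** expo for cj in centers)
            row.append(1.0 / denom)
        U.append(row)

    dims = len(points[0])
    new_centers = []
    for row in U:
        w = [u ** m for u in row]          # fuzzified weights
        total = sum(w)
        new_centers.append(
            [sum(wk * p[d] for wk, p in zip(w, points)) / total
             for d in range(dims)]         # loop over features
        )
    return U, new_centers
```

The three loop dimensions visible here (clusters, data points, features) are the ones the abstract says are mapped onto distinct subsets of hypercube dimensions.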

European Conference on Parallel Processing | 1999

On Automatic Parallelization of Irregular Reductions on Scalable Shared Memory Systems

Eladio Gutiérrez; Oscar G. Plata; Emilio L. Zapata

This paper presents a new parallelization method for reductions of arrays with subscripted subscripts on scalable shared-memory multiprocessors. The mapping of computations is based on the conflict-free write distribution of the reduction vector across the processors. The proposed method is general, scalable, and easy to implement in a compiler. A performance evaluation and a comparison with other existing techniques are presented. From the experimental results, the proposed method is a clear alternative to the array expansion and privatized buffer methods commonly used in state-of-the-art parallelizing compilers, like Polaris or SUIF.

International Conference on Supercomputing | 1994

Combining static and dynamic scheduling on distributed-memory multiprocessors

Oscar G. Plata; Francisco F. Rivera

Loops are a large source of parallelism for many numerical applications. An important issue in the parallel execution of loops is how to schedule them so that the workload is well balanced among the processors. Most existing loop scheduling algorithms were designed for shared-memory multiprocessors, with uniform memory access costs. These approaches are not suitable for distributed-memory multiprocessors where data locality is a major concern and communication costs are high. This paper presents a new scheduling algorithm in which data locality is taken into account. Our approach combines both worlds, static and dynamic scheduling, in a two-level (overlapped) fashion. This way data locality is considered and communication costs are limited. The performance of the new algorithm is evaluated on a CM-5 message-passing distributed-memory multiprocessor.

Journal of Parallel and Distributed Computing | 2005

Parallel techniques in irregular codes: cloth simulation as case of study

Eladio Gutiérrez; Sergio Romero; Luis F. Romero; Oscar G. Plata; Emilio L. Zapata

When parallelizing irregular applications on ccNUMA machines, several issues should be taken into account in order to achieve high code performance. These factors include locality exploitation and parallelism, as well as careful use of memory resources (memory overhead). A significant number of numerical simulation codes are clear examples of irregular applications. Frequently these kinds of codes include reduction operations in their core, so that a significant fraction of the computational time is spent on such operations. Specifically, cloth simulation belongs to this class of applications and is a topic of increasing interest in diverse areas, such as the multimedia industry. Moreover, when real-time simulation is the aim, its parallelization becomes an important option. This paper discusses and compares different irregular reduction parallelization techniques on ccNUMA shared-memory machines. Broadly speaking, we may classify them into two groups: privatization-based and data partitioning-based methods. In this paper we describe a framework, based on data affinity, that permits the development of various algorithms within the group of data partitioning-based techniques. All these techniques and approaches are analyzed and adapted to the computational structure of a real, physically based cloth simulator.

Signal Processing | 1990

Image template matching on hypercube SIMD computers

E.L. Zapata; José Ignacio Benavides; Oscar G. Plata; Francisco F. Rivera; José María Carazo

We present in this work a parallel algorithm to perform image template matching (PITM) on SIMD hypercube computers with non-shared local memory. This parallel algorithm is general in the sense that it allows for arbitrary dimensions for the image, the template and the hypercube. The flexibility of the PITM algorithm is rooted in the partition of the dimensions of the hypercube into four subsets, each one associated with one independent loop of the sequential algorithm (template matching in the time domain), and in the way the data are distributed in the local memories of the processing elements (consecutive storage for the template and for the matrix of cross-correlation coefficients, and shifted-consecutive for the image). Both the algorithmic complexity and the data redundancy are analyzed.
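A plain sequential version of template matching makes the four independent loops easy to see. The sketch below computes unnormalized cross-correlation scores; whether the paper normalizes its coefficients is not stated in the abstract, so that choice, along with the function names `cross_correlate` and `best_match`, is an assumption of this illustration.

```python
def cross_correlate(image, template):
    """Unnormalized cross-correlation of a template over an image.

    image and template are lists of rows of numbers. The result has one
    score per position where the template fits entirely inside the image.
    """
    H, W = len(image), len(image[0])
    h, w = len(template), len(template[0])
    scores = [[0.0] * (W - w + 1) for _ in range(H - h + 1)]
    for r in range(H - h + 1):          # loop 1: image row offset
        for c in range(W - w + 1):      # loop 2: image column offset
            for i in range(h):          # loop 3: template row
                for j in range(w):      # loop 4: template column
                    scores[r][c] += image[r + i][c + j] * template[i][j]
    return scores


def best_match(scores):
    """Return the (row, column) offset with the highest score."""
    positions = ((r, c) for r in range(len(scores))
                 for c in range(len(scores[0])))
    return max(positions, key=lambda rc: scores[rc[0]][rc[1]])
```

For example, embedding the template `[[1, 2], [3, 4]]` at offset (2, 1) of an otherwise zero 5 x 5 image gives a score of 30 at that offset, which is the maximum over the score matrix.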
