Jesus A. González
University of La Laguna
Publications
Featured research published by Jesus A. González.
Parallel Processing Letters | 2003
Antonio J. Dorta; Jesus A. González; Casiano Rodríguez; Francisco de Sande
The skeletal approach to the development of parallel applications has proved to be one of the most successful and has been widely explored in recent years. The goal of this approach is to develop a methodology of parallel programming based on a restricted set of parallel constructs. This paper presents llc, a parallel skeletal language, the theoretical model that supports the language, and a prototype implementation of its compiler. The language is based on directives, uses a C-like syntax and supports the most widely used skeletal constructs. llCoMP is a source-to-source compiler for the language built on top of MPI. We evaluate the performance of our prototype compiler using four different parallel architectures and three algorithms, and we present the results obtained on both shared and distributed memory architectures. Our model guarantees the portability of the language to any platform, and its simplicity greatly eases its implementation.
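The abstract does not show llc source code; purely as an illustration of the directive-based, C-like style it describes, the sketch below annotates a loop with a hypothetical pragma (the directive name and clause are ours, not actual llc syntax). A directive-unaware C compiler simply ignores the annotation and runs the loop sequentially, which is one way such directive-based approaches preserve portability.

    /* Hedged sketch of a directive-annotated parallel loop in the style the
     * abstract describes. The pragma below is hypothetical, not real llc
     * syntax; a directive-unaware compiler ignores it and executes the loop
     * sequentially. */
    #include <stdio.h>

    #define N 1000

    int main(void) {
        double a[N], b[N], c[N];

        for (int i = 0; i < N; i++) {
            a[i] = i;
            b[i] = 2.0 * i;
        }

        /* hypothetical forall-style skeletal construct */
        #pragma llc forall
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        printf("c[%d] = %f\n", N - 1, c[N - 1]);
        return 0;
    }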
Parallel Computing | 2004
Vicente Blanco; Jesus A. González; Coromoto León; Casiano Rodríguez; Germán Rodríguez; M. Printista
This work presents a new approach to the relation between theoretical complexity models and performance analysis and tuning. The analysis of an algorithm produces a complexity function that approximates the asymptotic number of operations performed by the algorithm. The time spent on these operations depends on the software-hardware platform being used. Usually such platforms are described, from the performance point of view, through a number of parameters, which are evaluated by a benchmarking program. Although for a given available platform the algorithmic constants associated with the complexity formula can be computed using multidimensional linear regression, there is still the problem of predicting the performance when the platform is not available. We introduce the concept of Universal Instruction Class and derive from it a set of equations relating the values of the algorithmic constants to the platform parameters. Due to the hierarchical design of current memory systems, the performance behavior of most algorithms varies across a small number of large regions corresponding to small, medium and large inputs, and the constants involved in the complexity formula usually take different values in each of these regions. Assuming we have a complexity formula for the memory resources, it is possible to find a partition of the input-size space and the different values of the algorithmic constants. This way, though the complexity formula is the same, the family of constants adapts it to the different stationary regimes of memory use.
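In notation of our own (the abstract gives no formulas), the idea can be summarized as a complexity formula whose constants are fitted by least-squares regression against measured times, with a separate set of constants per input-size region:

    T(n) \;\approx\; \sum_{i=1}^{k} A_i^{(R)} f_i(n), \qquad n \in R,\; R \in \{\text{small},\ \text{medium},\ \text{large}\},

    \{A_i^{(R)}\} \;=\; \arg\min \sum_{n_j \in R} \Bigl( T_{\mathrm{measured}}(n_j) - \sum_{i=1}^{k} A_i^{(R)} f_i(n_j) \Bigr)^{2}.

Here the f_i(n) are the operation counts produced by the complexity analysis and the A_i^{(R)} are the platform- and region-dependent algorithmic constants.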
European Conference on Parallel Processing | 2003
Vicente Blanco; Jesus A. González; Coromoto León; Casiano Rodríguez; Germán Rodríguez
This work presents a new approach to the relation between theoretical complexity and performance analysis of parallel programs. The study of performance is driven by the information produced during the complexity analysis stage and is supported at the top level by a complexity-analysis-oriented language and at the bottom level by a special-purpose statistical package.
Lecture Notes in Computer Science | 2003
Luis A. García; Jesus A. González; J. C. González; Coromoto León; Casiano Rodríguez; Germán Rodríguez
This work presents a new approach to the relation between theoretical complexity and performance analysis of MPI programs. The performance analysis is driven by the information produced during the complexity analysis stage and is supported at the top level by a complexity-analysis-oriented language and at the bottom level by a special-purpose statistical package.
Euromicro Workshop on Parallel and Distributed Processing | 2000
Jesus A. González; Coromoto León; Fabiana Piccoli; Marcela Printista; José L. Roda; Casiano Rodríguez; Francisco de Sande
An extension to the Bulk Synchronous Parallel (BSP) model that allows the use of asynchronous BSP groups of processors is presented. In this model, called Nested BSP, processor groups can be divided, and processors in a group synchronize through group-dependent collective operations, generalizing the concept of barrier synchronization. A classification of problems and algorithms according to their parallel input-output distribution is provided. For one of these problem classes, the so-called common-common class, we present a general strategy to derive efficient parallel algorithms. Algorithms belonging to this class allow the arbitrary division of the processor subsets, easing the opportunities for the underlying BSP software to divide the network into independent subnetworks and minimizing the impact of traffic in the rest of the network on the predicted cost. The expressiveness of the model is exemplified through three divide-and-conquer programs. The computational results for these programs on six high performance supercomputers show both the accuracy of the model and the optimality of the speedups for the class of problems considered.
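As a hedged illustration of the group division the abstract describes (our sketch, not the paper's library): in MPI terms, a processor group can be split into independent subgroups whose collective operations synchronize only the processors inside each subgroup.

    /* Sketch: dividing a processor group into two independent subgroups, in
     * the spirit of Nested BSP group division for a divide-and-conquer step.
     * Collectives on the subgroup communicator synchronize only that
     * subgroup, generalizing barrier synchronization as the abstract notes. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Split the group into two halves, one per subproblem. */
        int color = (rank < size / 2) ? 0 : 1;
        MPI_Comm subgroup;
        MPI_Comm_split(MPI_COMM_WORLD, color, rank, &subgroup);

        /* Group-dependent collective: only the subgroup synchronizes. */
        int local = rank, sum = 0;
        MPI_Allreduce(&local, &sum, 1, MPI_INT, MPI_SUM, subgroup);
        printf("rank %d (subgroup %d): subgroup sum = %d\n", rank, color, sum);

        MPI_Comm_free(&subgroup);
        MPI_Finalize();
        return 0;
    }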
European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface | 1999
Jesus A. González; Casiano Rodríguez; José L. Roda; Daniel González-Morales; Francisco de Sande; Francisco Almeida; Coromoto León
It has been argued that message passing systems based on pairwise, rather than barrier, synchronization suffer from having no simple analytic cost model for prediction. The BSP Without Barriers Model (BSPWB) has been proposed as an alternative to the Bulk Synchronous Parallel (BSP) model for the analysis, design and prediction of asynchronous MPI programs. This work compares the prediction accuracy of the BSP and BSPWB models and the performance of their respective software libraries, Oxford BSPlib and MPI. Three test cases, representing three general problem-solving paradigms, are considered; they cover a wide range of requirements in communication, synchronisation and computation. The results obtained on the CRAY-T3E show not only a better scalability of MPI but also that the performance of MPI programs can be predicted with the same accuracy as that of Oxford BSPlib programs.
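For reference (this is the standard BSP cost model, not a formula taken from the paper), the BSP prediction being compared sums a per-superstep cost

    T \;=\; \sum_{s=1}^{S} \bigl( w_s + g\, h_s + L \bigr),

where w_s is the maximum local computation time in superstep s, h_s the maximum number of words any processor sends or receives, g the per-word communication gap and L the barrier synchronization cost. BSPWB drops the global barrier in favour of pairwise synchronization; its own cost expressions are not reproduced in the abstract.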
European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface | 2001
Jesus A. González; Coromoto León; Casiano Rodríguez; Francisco de Sande
During the last decade, and with the aim of improving performance through the exploitation of parallelism, researchers have introduced, more than once, forall loops with different flavors, syntaxes, semantics and implementations. The High Performance Fortran (HPF) and OpenMP versions are likely among the most popular. This paper presents yet another forall loop construct. The One Thread Multiple Processor Model presented here targets both homogeneous shared and distributed memory computers. It not only integrates and extends sequential programming but also includes and expands the message passing programming model. The compilation schemes allow and exploit any number of nested levels of parallelism, taking advantage of situations where there are several small nested loops. Furthermore, the model has an associated complexity model that allows the prediction of the performance of a program.
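The abstract does not reproduce the OTMP forall syntax; as a point of reference only, the sketch below shows nested parallelism over two small loops using OpenMP, one of the forall variants the abstract mentions, rather than the paper's own construct.

    /* Reference sketch (OpenMP, not OTMP syntax): exploiting two small nested
     * loops by enabling nested parallel regions, the kind of situation the
     * abstract says its compilation schemes take advantage of. */
    #include <omp.h>
    #include <stdio.h>

    #define ROWS 4
    #define COLS 4

    int main(void) {
        double a[ROWS][COLS];

        omp_set_max_active_levels(2);   /* allow two levels of parallelism */

        #pragma omp parallel for num_threads(2)
        for (int i = 0; i < ROWS; i++) {
            /* the inner loop is parallelized as well: a nested level */
            #pragma omp parallel for num_threads(2)
            for (int j = 0; j < COLS; j++)
                a[i][j] = i * COLS + j;
        }

        printf("a[%d][%d] = %f\n", ROWS - 1, COLS - 1, a[ROWS - 1][COLS - 1]);
        return 0;
    }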
International Conference on Parallel Processing | 2001
Jesus A. González; Coromoto León; M. Printista; José L. Roda; Casiano Rodríguez; J. M. Rodríguez; Francisco de Sande
The accumulated experience indicates that complexity models like LogP or BSP, which characterize the performance of distributed machines through a few parameters, incur a considerable loss of accuracy, with errors ranging up to 70%. The complexity analysis model presented here still makes use of the BSP concept of superstep, but introduces a few novelties. To cover both oblivious synchronization and group partitioning, we have to admit that different processors may finish the same superstep at different times. The other extension recognizes that, even if the numbers of individual communication or computation operations in two stages are the same, the actual times for these two stages may differ. These differences are due to the separate nature of the operations or to the particular pattern followed by the messages. A natural proposal is to associate a different proportionality constant with each basic block and, analogously, to associate different latencies and bandwidths with the different communications. Unfortunately, this approach implies that the parameter values not only depend on the given architecture but also reflect algorithm characteristics, so the parameters must be evaluated for every algorithm. This is a heavy task, involving experiment design, timing, statistics and multi-parameter fitting algorithms, and software support is required. We have developed a compiler that takes as source a C program annotated with complexity formulas and produces as output an instrumented code. The trace files obtained from the execution of the resulting code are analyzed with an interactive interpreter, giving us, among other information, the values of those parameters.
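In notation of our own (the paper's formulas are not reproduced in the abstract), associating a proportionality constant with each basic block and a latency/bandwidth pair with each communication pattern amounts to a prediction of the form

    T \;\approx\; \sum_{b} A_b\, f_b(n) \;+\; \sum_{c} \bigl( \beta_c + \tau_c\, h_c \bigr),

where f_b(n) is the operation count of basic block b, A_b its fitted constant, and \beta_c, \tau_c and h_c the latency, inverse bandwidth and message volume of communication pattern c; these are the values the instrumented runs and the interactive interpreter recover.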
IEEE International Conference on High Performance Computing Data and Analytics | 2000
Antonio Estévez; F. H. Priano; Melquíades Pérez Pérez; Jesus A. González; Daniel González-Morales; José L. Roda
The use of Java Remote Method Invocation and Java Servlets allows the development of distributed systems. The University of La Laguna is made up of sixty departments on three separate sites, and some 30,000 people are involved in its day-to-day organisation, including students, teaching staff and service and administrative personnel. The geographical distribution of the University is a problem as far as administrative matters are concerned. Our developments in Web technology are giving users access to different services from their place of work or study. While Java Remote Method Invocation, the Java counterpart of CORBA, permits highly interactive distributed applications, Java Servlets provide an easy way to create three-tier client/server applications over the Internet. In this work, we present end-user Web applications implemented using both techniques. The evaluation can be extended to any other organisation.
European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface | 2000
Jesus A. González; Coromoto León; Fabiana Piccoli; Marcela Printista; José L. Roda; Casiano Rodríguez; Francisco de Sande
Several generalizations of the flat data parallel model have been proposed. Their aim is to allow nested parallel invocations, combining the ease of programming of the data parallel model with the efficiency of the control parallel model. We examine the solutions provided to this issue by two standard parallel programming platforms, OpenMP and MPI. Both their expressive capacity and their efficiency are compared on a Sun HPC 3500 and an SGI Origin 2000. The two architectures considered are shared memory and, consequently, more suitable for exploitation under OpenMP. In spite of this, the results show that, using the methodology proposed for MPI in this paper, not only are the performances of the two platforms similar but, more remarkably, the effort invested in software development is also the same.