Dolores Rexachs
Autonomous University of Barcelona
Publication
Featured research published by Dolores Rexachs.
acs/ieee international conference on computer systems and applications | 2011
Javier Balladini; Remo Suppi; Dolores Rexachs; Emilio Luque
Energy consumption has become one of the greatest challenges in the field of high performance computing (HPC). The energy cost incurred by a supercomputer over the lifetime of the installation is similar to its acquisition cost. Thus, besides its impact on the environment, energy is a limiting factor for HPC. Our research aims to reduce the energy that computer systems consume when running parallel HPC applications. In this article we analyse the possible influence on energy consumption of two parallel programming paradigms, shared memory (OpenMP) and message passing (MPI), and the behaviour of systems at different CPU clock frequencies. The results show that the programming model has a major impact on the energy consumption of computer systems. It was found that the impact of reduced clock frequencies on execution time, energy efficiency, and maximum power consumption depends not only on the type of application but also on its implementation in a specific programming model. We believe that another criterion to consider when choosing a parallel programming model is its impact on energy consumption.
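As a rough illustration of why a lower clock frequency does not automatically save energy, the sketch below computes energy as average power multiplied by execution time. The function name and all numbers are hypothetical and are not measurements from the paper.

```python
def energy_joules(avg_power_watts: float, exec_time_s: float) -> float:
    """Energy consumed by a run: average power (W) times execution time (s)."""
    if avg_power_watts < 0 or exec_time_s < 0:
        raise ValueError("power and time must be non-negative")
    return avg_power_watts * exec_time_s


# Hypothetical runs of one application at two CPU clock frequencies.
# These values are made up for illustration only.
runs = {
    "2.4 GHz": {"avg_power_watts": 200.0, "exec_time_s": 100.0},
    "1.6 GHz": {"avg_power_watts": 150.0, "exec_time_s": 140.0},
}

# Energy per frequency setting, and the setting that uses the least energy.
energy = {freq: energy_joules(**run) for freq, run in runs.items()}
best = min(energy, key=energy.get)
```

In this made-up case the lower frequency draws less power but runs long enough to use more total energy, which is the kind of trade-off the abstract says depends on the application and its programming model.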
high performance computing and communications | 2010
Alvaro Wong; Dolores Rexachs; Emilio Luque
Predicting the performance of parallel applications is becoming increasingly complex, and the best performance predictor is the application itself, but the time required to run it in full is an onerous requirement. We seek to characterize the behavior of message-passing applications on different systems by extracting a signature that allows us to predict which system will let the application perform best. To achieve this goal, we have developed a method called Parallel Application Signatures for Performance Prediction (PAS2P) that strives to describe an application based on its behavior. Based on the application’s message-passing activity, we have been able to identify and extract representative phases, with which we created a Parallel Application Signature that has allowed us to predict the application’s performance. We have experimented with different signature-extraction algorithms and found a reduction in the prediction error using different scientific applications on different clusters. We were able to predict execution times with an average accuracy of over 98%.
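One simple way to turn extracted phases into a prediction is to time each representative phase once on the target system and scale it by how often that phase repeats in the full run. The sketch below assumes that model; the `Phase` type, its field names, and the sample values are hypothetical and are not taken from the PAS2P implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    """A representative phase (hypothetical structure for illustration)."""
    name: str
    measured_time_s: float  # time of one occurrence on the target system
    repetitions: int        # how many times the phase occurs in the full run


def predict_execution_time(phases):
    """Predicted run time: sum of per-phase time weighted by repetitions."""
    return sum(p.measured_time_s * p.repetitions for p in phases)


def prediction_accuracy(predicted_s, actual_s):
    """Accuracy in percent, defined here as 100 * (1 - relative error)."""
    return 100.0 * (1.0 - abs(predicted_s - actual_s) / actual_s)


# Made-up signature with two phases.
signature = [
    Phase("compute", measured_time_s=0.5, repetitions=100),
    Phase("exchange", measured_time_s=0.25, repetitions=40),
]
predicted = predict_execution_time(signature)
```

Only the short phases need to run on each candidate system, which is what makes executing a signature cheaper than executing the whole application.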
Lecture Notes in Computer Science | 2006
Angelo Duarte; Dolores Rexachs; Emilio Luque
Independence from special elements, transparency, and scalability are significant features required of fault tolerance schemes for modern clusters of computers. To meet these requirements we developed the RADIC architecture (Redundant Array of Distributed Independent Checkpoints). RADIC is an architecture based on a fully distributed array of processes that collaborate to create a distributed fault tolerance controller. This controller works without special, central, or stable elements. RADIC implements the fault tolerance activities, transparently to the user application, using a message-log rollback-recovery protocol. Using the RADIC concepts we implemented a prototype, RADICMPI, which contains some standard MPI directives and includes all the functionality of RADIC. We tested RADICMPI in a real environment by injecting failures in nodes of the cluster and monitoring the behavior of the application. Our tests confirmed the correct operation of RADICMPI and the effectiveness of the RADIC mechanism.
international conference on cluster computing | 2009
Alvaro Wong; Dolores Rexachs; Emilio Luque
We seek to obtain a characterization, or application signature, of a parallel application that will allow us, by executing this signature, to evaluate the application's performance on different computers. The behavior of sequential applications can be understood by means of tools such as SimPoint, which can identify and select significant phases that describe an application's behavior. Our proposal is to extend those concepts to parallel applications, with the goal of modeling and predicting the behavior of the parallel application. To achieve this, we developed a methodology that enables us to identify and extract repetitive behavior to create the application signature. We have validated our proposal using scientific applications such as the NAS Parallel Benchmarks and Sweep3D. We could predict the execution time of the entire application.
european conference on parallel processing | 2008
Guna Santos; Angelo Duarte; Dolores Rexachs; Emilio Luque
Current supercomputers are close to reaching the petaflop level. These machines suffer a high number of interruptions in a relatively short time interval. Fault tolerance and preventive maintenance are key to increasing the MTTI (Mean Time To Interrupt). In this paper we present how RADIC, an architecture for fault tolerance, provides different protection levels that avoid system interruptions and allow preventive maintenance tasks to be performed. Our experiments show the effectiveness of our solution in maintaining high availability with a large MTTI.
international conference on parallel processing | 2006
Eduardo Argollo; Adriana Gaudiani; Dolores Rexachs; Emilio Luque
Joining geographically distributed heterogeneous clusters of workstations through the Internet can be a simple and effective approach to speeding up the execution of a parallel application. This paper describes a methodology for migrating a parallel application from a single cluster to a collection of clusters while guaranteeing a minimum level of efficiency. This methodology is applied to a parallel scientific application so that it uses three geographically scattered clusters located in Argentina, Brazil and Spain. Experimental results show that the speedup and efficiency estimations provided by this methodology are more than 90% precise. Without tuning the application, 45% of the maximum speedup is obtained, whereas 94% of that maximum speedup is attained when a tuning process is applied. In both cases efficiency is over 90%.
international conference on cluster computing | 2006
Angelo Duarte; Dolores Rexachs; Emilio Luque
The redundant array of distributed independent checkpoints (RADIC) is a fault tolerant architecture based on a fully distributed array of dedicated processes. These processes collaborate to create a fault tolerance controller which transparently manages all fault tolerance activities. The architecture is designed as a software layer between the application and the cluster structure, and it was developed to meet the requirements of scalability, user transparency, and independence from dedicated or stable cluster resources. RADIC only requires the resources already available in the nodes used by the parallel application, and it uses a pessimistic message-log rollback-recovery protocol in order to operate without any global synchronization. This protocol, together with the independence from central elements, makes RADIC a scalable architecture that works transparently to the user. We tested the functionality and performance of the architecture in a real scenario using a prototype based on the MPI standard (RADICMPI).
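The core idea of a pessimistic message log can be sketched in a few lines: each message is saved before the receiver processes it, so a restarted process can replay the log on top of its last checkpoint without coordinating with any other process. The class and method names below are hypothetical, the "stable storage" is just an in-memory list, and this is not RADIC's or RADICMPI's actual code.

```python
class LoggedProcess:
    """Toy process whose state is the running sum of received messages."""

    def __init__(self):
        self.state = 0
        self._checkpoint = 0  # last saved copy of the state
        self._log = []        # stands in for a log kept on another node

    def _apply(self, msg):
        self.state += msg

    def receive(self, msg):
        # Pessimistic logging: record the message BEFORE processing it.
        self._log.append(msg)
        self._apply(msg)

    def checkpoint(self):
        # Save the state; messages already reflected in it can be dropped.
        self._checkpoint = self.state
        self._log.clear()

    def recover(self):
        # Roll back to the checkpoint, then replay the logged messages.
        self.state = self._checkpoint
        for msg in self._log:
            self._apply(msg)


proc = LoggedProcess()
proc.receive(5)
proc.checkpoint()
proc.receive(3)
proc.receive(4)
state_before_failure = proc.state

proc.state = None  # simulate losing the in-memory state in a failure
proc.recover()
```

Because recovery uses only the failed process's own checkpoint and log, no other process has to roll back, which is consistent with the abstract's point about operating without global synchronization.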
ACM Transactions on Computing Education / ACM Journal of Educational Resources in Computing | 2002
Juan C. Moure; Dolores Rexachs; Emilio Luque
Modern processors increase their performance with complex microarchitectural mechanisms, which makes them more and more difficult to understand and evaluate. KScalar is a graphical simulation tool that facilitates the study of such processors. It allows students to analyze the performance behavior of a wide range of processor microarchitectures: from a very simple in-order, scalar pipeline, to a detailed out-of-order, superscalar pipeline with non-blocking caches, speculative execution, and complex branch prediction. The simulator interprets executables for the Alpha AXP instruction set: from very short program fragments to large applications. The object program's execution may be simulated at varying levels of detail: either cycle-by-cycle, observing all the pipeline events that determine processor performance, or millions of cycles at once, collecting statistics on the main performance issues. Instructors may use KScalar in several ways. First, it may be used to provide demonstrations in lectures or online learning environments. Second, it allows students to investigate the characteristics of specific processor microarchitectures as practical short assignments associated with a lecture course. Third, students may undertake major projects involving the optimization of real programs at the software-hardware interface, or involving the optimization of a processor microarchitecture for a given application workload. A preliminary version of KScalar has been successfully used in several lecture courses during the last two years at the Autonomous University of Barcelona. It runs on an x86/Linux/KDE system. The graphical interface has been developed using the KDE and Qt libraries. The simulator engine running behind the graphical interface is a heavily modified version of SimpleScalar. KScalar code is available under the terms of the GNU and SimpleScalar General Public License.
IEEE Transactions on Parallel and Distributed Systems | 2015
Alvaro Wong; Dolores Rexachs; Emilio Luque
Predicting the performance of parallel scientific applications is becoming increasingly complex. Our goal was to characterize the behavior of message-passing applications on different target machines. To achieve this goal, we developed a method called parallel application signature for performance prediction (PAS2P), which strives to describe an application based on its behavior. Based on the application's message-passing activity, we identified and extracted representative phases, with which we created a parallel application signature that enabled us to predict the application's performance. We experimented with different scientific applications on different clusters. We were able to predict execution times with an average accuracy greater than 97 percent.
field programmable logic and applications | 1997
Ferran Lisa; Faustino Cuadrado; Dolores Rexachs; Jordi Carrabina
This paper describes a reconfigurable coprocessor based on an Altera CPLD, specifically designed for a real-time computer vision system. An overview of the system is given and the architecture of the coprocessor is described, discussing the utility of its distributed memory organization for image processing applications.