
Publication


Featured research published by Emilio Luque.


Parallel Computing | 2006

Modeling master/worker applications for automatic performance tuning

Eduardo César; Andreu Moreno; Joan Sorribes; Emilio Luque

Parallel application development is a very difficult task for non-expert programmers, and therefore support tools are needed for all phases of the application development cycle. This means that developing applications using predefined programming structures (frameworks/skeletons) should be easier than doing it from scratch. We propose to take advantage of the intrinsic knowledge that these programming structures provide about the application in order to develop a dynamic and automatic tuning tool. We show that, using this knowledge, the tool can make better tuning decisions efficiently. Specifically, this work focuses on the definition of the performance model associated with applications developed with the Master/Worker framework.
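The abstract does not spell out the model itself, so the following is only a minimal sketch of what a Master/Worker performance model of this kind might look like: predicted iteration time as a function of the number of workers, with communication and computation terms. The function names, parameters and numbers are hypothetical, not the paper's model.

```python
# Hedged sketch of a Master/Worker iteration-time model; NOT the model
# defined in the paper. Latency, bandwidth and task costs are placeholders.

def predicted_iteration_time(n_workers, task_compute_times,
                             task_bytes, result_bytes,
                             latency, bandwidth):
    """Estimate one iteration: sequential task send, parallel compute,
    sequential result gather."""
    per_worker_compute = [0.0] * n_workers
    for i, t in enumerate(task_compute_times):      # round-robin assignment
        per_worker_compute[i % n_workers] += t
    send_time = sum(latency + b / bandwidth for b in task_bytes)
    recv_time = sum(latency + b / bandwidth for b in result_bytes)
    return send_time + max(per_worker_compute) + recv_time


# Example: pick the worker count with the lowest predicted time.
tasks = [0.8, 1.2, 0.9, 1.1, 1.0, 0.7, 1.3, 1.0]    # seconds per task
sizes = [1e6] * len(tasks)                          # bytes per message
best = min(range(1, 9),
           key=lambda n: predicted_iteration_time(
               n, tasks, sizes, sizes, latency=1e-4, bandwidth=1e8))
print("predicted best number of workers:", best)
```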


Journal of Parallel and Distributed Computing | 2007

Design and implementation of a dynamic tuning environment

Anna Morajko; Tomàs Margalef; Emilio Luque

The main goal of parallel/distributed applications is to solve the problem at hand as fast as possible using the minimum amount of system resources. In this context, application performance becomes a crucial issue, and developers of parallel/distributed applications must optimize them to provide high performance computation. Typically, to improve performance, developers analyze the application behavior, search for bottlenecks, determine their causes and change the source code. In this paper, we present the dynamic, automatic tuning approach. This approach aims at automating these tasks and minimizing user intervention. An application is monitored, its performance bottlenecks are detected and it is modified automatically during execution, without recompiling or re-running it. The modifications introduced adapt the application behavior to changing conditions. This paper describes the design and implementation of the MATE environment (Monitoring, Analysis and Tuning Environment), which we have developed as a step towards dynamically tuning parallel/distributed applications.
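As a rough illustration of the monitor-analyze-tune cycle described above, here is a minimal control-loop skeleton. It is not MATE's actual API: the monitor, performance model and actuator objects, and their method names, are assumptions introduced only for this sketch.

```python
# Minimal sketch of a monitor-analyze-tune loop in the spirit of dynamic
# tuning; class and method names are hypothetical, not MATE's real interface.
import time

class DynamicTuner:
    def __init__(self, monitor, performance_model, actuator, period=5.0):
        self.monitor = monitor          # collects runtime metrics from the app
        self.model = performance_model  # suggests better parameter values
        self.actuator = actuator        # applies changes to the running app
        self.period = period            # seconds between tuning decisions

    def run(self, stop_event):
        while not stop_event.is_set():
            metrics = self.monitor.collect()           # e.g. task and idle times
            suggestion = self.model.evaluate(metrics)  # e.g. {"num_workers": 6}
            if suggestion:
                # Modify the application on the fly: no recompile, no restart.
                self.actuator.apply(suggestion)
            time.sleep(self.period)
```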


Concurrency and Computation: Practice and Experience | 2007

MATE: Monitoring, Analysis and Tuning Environment for parallel/distributed applications

Anna Morajko; Paola Caymes-Scutari; Tomàs Margalef; Emilio Luque

The main goal of parallel/distributed applications is to solve the considered problem as fast as possible using the available resources. In this context, application performance becomes a crucial issue. Developers of these applications must optimize them if they are to fulfill the promise of high-performance computation. To improve performance, developers search for bottlenecks by analyzing application behavior, try to identify performance problems, determine their causes and overcome them by changing the source code of the application. Current approaches require developers to do these tasks manually and imply a high degree of expertise. Therefore, another approach is needed to help developers during the optimization process. This paper presents the dynamic tuning approach that addresses these issues. In this approach, many tasks are automated, and user intervention and the required expertise may be significantly reduced. An application is monitored, its performance bottlenecks are detected and it is modified automatically during execution, without recompiling or re-running it. The introduced modifications adapt the application behavior to changing conditions. We present an environment called MATE (Monitoring, Analysis and Tuning Environment) that has been developed to provide dynamic tuning of parallel/distributed applications. We also show practical experiments conducted with MATE to prove its effectiveness and the benefits it provides.


International Conference on Computational Science | 2006

Improved prediction methods for wildfires using high performance computing: a comparison

Germán Bianchini; Ana Cortés; Tomàs Margalef; Emilio Luque

Recently, dry and hot seasons have seriously increased the risk of forest fire in the Mediterranean area. Wildland simulators, used to predict fire behavior, can give erroneous forecasts due to a lack of precision in certain dynamic input parameters. Developing methods to avoid such parameter problems can significantly improve fire behavior prediction. In this paper, two methods involving statistical and uncertainty schemes are evaluated. In each case, the number of simulations that must be carried out is enormous, and it is necessary to apply high-performance computing techniques to make the methodology feasible. These techniques have been implemented as parallel schemes and tested on a Linux cluster using MPI.
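The core computational pattern here is running a very large number of independent simulations over candidate values of the uncertain input parameters. The paper does this with MPI on a Linux cluster; the sketch below is only a single-machine analogue using Python multiprocessing, and the simulate_fire function and parameter ranges are placeholders.

```python
# Single-machine analogue of the many-simulation parameter search; the paper
# uses MPI on a Linux cluster. simulate_fire and the ranges are placeholders.
from multiprocessing import Pool
import random

def simulate_fire(params):
    # Stand-in for a fire-spread simulation; returns an error against the
    # observed fire front (lower is better).
    wind_speed, moisture = params
    return abs(wind_speed - 7.0) + abs(moisture - 0.12)

def main():
    # Sample many candidate values for the uncertain dynamic parameters.
    candidates = [(random.uniform(0, 20), random.uniform(0.0, 0.4))
                  for _ in range(10_000)]
    with Pool() as pool:                      # simulations are independent
        errors = pool.map(simulate_fire, candidates)
    best = candidates[errors.index(min(errors))]
    print("best-matching parameters:", best)

if __name__ == "__main__":
    main()
```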


Lecture Notes in Computer Science | 2006

An intelligent management of fault tolerance in cluster using RADICMPI

Angelo Duarte; Dolores Rexachs; Emilio Luque

Independence from special elements, transparency and scalability are very significant features required of fault tolerance schemes for modern clusters of computers. In order to meet these requirements, we developed the RADIC architecture (Redundant Array of Distributed Independent Checkpoints). RADIC is an architecture based on a fully distributed array of processes that collaborate to create a distributed fault tolerance controller. This controller works without special, central or stable elements. RADIC implements the fault tolerance activities, transparently to the user application, using a message-log rollback-recovery protocol. Using the RADIC concepts, we implemented a prototype, RADICMPI, which provides a set of standard MPI directives and includes all of RADIC's functionality. We tested RADICMPI in a real environment by injecting failures into cluster nodes and monitoring the behavior of the application. Our tests confirmed the correct operation of RADICMPI and the effectiveness of the RADIC mechanism.
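To make the message-log rollback-recovery idea concrete, here is a very small sketch of two building blocks such a protocol relies on: a protector that stores a peer's checkpoint plus the messages received since then, and a recovery step that rolls back and replays the log. The data structures and names are illustrative only; they are not RADICMPI's implementation.

```python
# Conceptual sketch of message-log rollback-recovery; not RADICMPI's code.
import copy

class Protector:
    """Holds a peer process's last checkpoint and its received-message log."""
    def __init__(self):
        self.checkpoint = None
        self.message_log = []

    def save_checkpoint(self, state):
        self.checkpoint = copy.deepcopy(state)
        self.message_log.clear()       # keep only messages after the checkpoint

    def log_message(self, msg):
        self.message_log.append(msg)

class Process:
    def __init__(self, protector):
        self.state = {"step": 0, "total": 0}
        self.protector = protector

    def receive(self, msg):
        self.protector.log_message(msg)     # log before processing
        self.state["total"] += msg
        self.state["step"] += 1

    def recover(self):
        # Roll back to the last checkpoint, then replay the logged messages
        # in order to reach the pre-failure state deterministically.
        self.state = copy.deepcopy(self.protector.checkpoint)
        for msg in self.protector.message_log:
            self.state["total"] += msg
            self.state["step"] += 1
```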


International Conference on Parallel Processing | 2006

Tuning application in a multi-cluster environment

Eduardo Argollo; Adriana Gaudiani; Dolores Rexachs; Emilio Luque

Joining geographically distributed, heterogeneous clusters of workstations through the Internet can be a simple and effective approach to speed up parallel application execution. This paper describes a methodology to migrate a parallel application from a single cluster to a collection of clusters while guaranteeing a minimum level of efficiency. The methodology is applied to a parallel scientific application running on three geographically scattered clusters located in Argentina, Brazil and Spain. Experimental results show that the speedup and efficiency estimates provided by the methodology are more than 90% accurate. Without tuning, the application obtains 45% of the maximum speedup, whereas 94% of that maximum is attained when the tuning process is applied; in both cases efficiency is over 90%.
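The kind of estimate the methodology depends on can be illustrated with a deliberately crude model: speedup over the base cluster is bounded both by the aggregate compute power of the joined clusters and by what the inter-cluster Internet link can keep supplied with data. This is only an assumption-laden sketch of the shape of such a calculation; the paper's actual model, symbols and numbers are not reproduced here.

```python
# Back-of-the-envelope sketch of a multi-cluster speedup/efficiency estimate.
# The model and every number below are assumptions, not the paper's values.

def estimated_speedup(cluster_gflops, base_gflops,
                      bytes_per_unit_work, link_bytes_per_s, base_units_per_s):
    # Bound 1: relative compute power of all clusters vs. the base cluster.
    compute_bound = sum(cluster_gflops) / base_gflops
    # Bound 2: extra work per second the Internet link can keep supplied with
    # data, expressed relative to the base cluster's throughput.
    comm_bound = (link_bytes_per_s / bytes_per_unit_work) / base_units_per_s
    return min(compute_bound, 1.0 + comm_bound)

def efficiency(speedup, cluster_gflops, base_gflops):
    return speedup / (sum(cluster_gflops) / base_gflops)

clusters = [50.0, 30.0, 20.0]            # GFLOP/s of the joined clusters
s = estimated_speedup(clusters, base_gflops=50.0,
                      bytes_per_unit_work=1e5,      # bytes moved per work unit
                      link_bytes_per_s=1e6,         # inter-cluster bandwidth
                      base_units_per_s=100.0)       # base cluster throughput
print(f"estimated speedup {s:.2f}, efficiency {efficiency(s, clusters, 50.0):.0%}")
```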


European Conference on Parallel Processing | 2005

Automatic tuning of master/worker applications

Anna Morajko; Eduardo César; Paola Caymes-Scutari; Tomàs Margalef; Joan Sorribes; Emilio Luque

The Master/Worker paradigm is one of the paradigms most commonly used by parallel/distributed application developers. It is easy to understand and is fairly close to the abstract structure of a wide range of applications. However, to obtain adequate performance indexes, such a paradigm must be managed in a very precise way. Certain features, such as data distribution or the number of workers, must be tuned properly in order to obtain such performance indexes, and in most cases they cannot be tuned statically since they depend on the particular conditions of each execution. In this context, dynamic tuning seems to be a highly promising approach since it provides the capability to change the parameters during the execution of the application to improve performance. In this paper, we demonstrate the use of a dynamic tuning environment that adapts the number of workers based on a theoretical model of Master/Worker behavior. The results show that such an approach significantly improves the execution time when the application modifies its behavior during execution.
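A minimal sketch of the adaptation step described above follows: from fresh measurements, a predictive model of iteration time (similar in spirit to the earlier Master/Worker sketch) picks the best worker count, and the pool is resized if that differs from the current one. The model, the resize_pool hook and all parameters are placeholders, not the environment's real interface.

```python
# Hedged sketch of run-time adaptation of the number of workers; the model
# and the resize_pool callback are hypothetical stand-ins.

def retune_workers(measured_task_times, task_bytes, result_bytes,
                   latency, bandwidth, max_workers, current_workers,
                   resize_pool):
    def predicted(n):
        per_worker = [0.0] * n
        for i, t in enumerate(measured_task_times):   # round-robin assignment
            per_worker[i % n] += t
        comm = sum(latency + b / bandwidth
                   for b in task_bytes + result_bytes)
        return comm + max(per_worker)

    best_n = min(range(1, max_workers + 1), key=predicted)
    if best_n != current_workers:
        resize_pool(best_n)      # add or remove workers during execution
    return best_n
```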


Software Engineering and Advanced Applications | 2005

Distributed P2P merging policy to decentralize the multicasting delivery

Xiaoyuan Yang; Porfidio Hernández; Ana Ripoll; Remo Suppi; Emilio Luque; Fernando Cores

Advances in network technology make multicast one of the most feasible video streaming delivery techniques for the near future. However, the scalability of a multicast VoD system is limited by the server bandwidth. In this paper, we propose a new multicast delivery scheme that allows every active client to collaborate with the server in order to scale the VoD system performance beyond the server's physical limitations. The solution combines multicast delivery with the peer-to-peer paradigm in order to decentralize the delivery process. The new video delivery scheme is able to merge two or more multicast channels using distributed collaboration among a group of clients. We compared the new policy with chaining and patching schemes, and the experimental results showed that our policy outperforms previous schemes in terms of resource requirements and local network load. Compared with a multicast patching policy, the new scheme reduces resource requirements by up to 77.5%, while the local network load is 66.9% lower than with a peer-to-peer chaining policy.


Journal of Parallel and Distributed Computing | 2010

Scalable dynamic Monitoring, Analysis and Tuning Environment for parallel applications

Paola Caymes-Scutari; Anna Morajko; Tomàs Margalef; Emilio Luque

Parallel/distributed systems are continuously growing. This enables applications to scale, either by tackling bigger problems in the same period of time or by solving the same problem in a shorter time. Consequently, the methodologies, approaches and tools of the parallel paradigm should be brought up to date to support the increasing requirements of applications and users. MATE (Monitoring, Analysis and Tuning Environment) provides automatic and dynamic tuning for parallel/distributed applications. The tuning decisions are made according to performance models, which provide a fast means of deciding what to improve in the execution. However, MATE presents some bottlenecks as the application grows, because the analysis process is fully centralized. In this work, we propose a new approach to make MATE scalable. In addition, we present experimental results and analysis that validate the proposed approach against the original one.
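The abstract identifies centralized analysis as the bottleneck but does not detail the new design, so the sketch below only illustrates one generic way to decentralize such a pipeline: intermediate aggregators pre-reduce metrics from subsets of processes before a single summary reaches the global analyzer. The tree layout, fan-in and metric fields are assumptions, not the approach actually adopted in the paper.

```python
# Illustrative sketch of hierarchical (tree-based) metric aggregation; the
# layout and metric fields are assumptions, not the paper's design.

def aggregate(metrics_batch):
    """Reduce a batch of per-process metrics to one summary record."""
    n = len(metrics_batch)
    return {"processes": n,
            "mean_task_time": sum(m["task_time"] for m in metrics_batch) / n,
            "total_idle": sum(m["idle"] for m in metrics_batch)}

def combine(summaries):
    """Merge intermediate summaries into a higher-level summary."""
    total = sum(s["processes"] for s in summaries)
    return {"processes": total,
            "mean_task_time": sum(s["mean_task_time"] * s["processes"]
                                  for s in summaries) / total,
            "total_idle": sum(s["total_idle"] for s in summaries)}

def hierarchical_collect(per_process_metrics, fan_in=8):
    """Fold metrics level by level so no single node sees every event."""
    level = [aggregate(per_process_metrics[i:i + fan_in])
             for i in range(0, len(per_process_metrics), fan_in)]
    while len(level) > 1:
        level = [combine(level[i:i + fan_in])
                 for i in range(0, len(level), fan_in)]
    return level[0]

# Example with 1000 simulated processes.
metrics = [{"task_time": 1.0 + (i % 7) * 0.1, "idle": 0.05} for i in range(1000)]
print(hierarchical_collect(metrics))
```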


Computing Frontiers | 2006

Evaluation of the field-programmable cache: performance and energy consumption

Domingo Benitez; Juan C. Moure; Dolores Rexachs; Emilio Luque

Many authors have proposed power management techniques for general-purpose processors that come at the cost of degraded performance, such as lower IPC or longer delays. Some proposals have focused on cache memories because they consume a significant fraction of total microprocessor power. We propose a reconfigurable and adaptive cache microarchitecture based on field-programmable technology that is intended to deliver high performance at low energy consumption. In this paper, we evaluate the performance and energy consumption of a run-time algorithm used to manage a field-programmable L1 data cache. The adaptation strategy is based on two techniques: a learning process provides the best cache configuration for each program phase, and a recognition process detects program phase changes by using data working-set signatures to activate a low-overhead reconfiguration mechanism. Our proposal achieves performance improvement and cache energy saving at the same time. Considering a design scenario driven by performance constraints, we show that processor execution time and cache energy consumption can be reduced on average by 15.2% and 9.9% compared to a non-adaptive high-performance microarchitecture. Alternatively, when energy saving is prioritized and a non-adaptive energy-efficient microarchitecture is taken as the baseline, cache energy and processor execution time are reduced on average by 46.7% and 9.4%, respectively. In addition to comparing against conventional microarchitectures, we show that the proposed microarchitecture achieves better performance and greater cache energy reduction than other configurable caches.
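The two mechanisms named in the abstract (learning the best configuration per phase, and recognizing phase changes from data working-set signatures) can be sketched in software as follows. This is a simplified illustration under stated assumptions: the signature construction, the similarity threshold and the exact-match lookup table are hypothetical simplifications of what would actually be implemented in hardware.

```python
# Simplified software sketch of phase learning and recognition; the signature
# hashing, threshold and lookup are assumptions, not the paper's hardware.

def working_set_signature(addresses, block_bits=6, sig_buckets=1024):
    """Hash the cache blocks touched in an interval into a compact signature."""
    return {(a >> block_bits) % sig_buckets for a in addresses}

def phase_changed(prev_sig, cur_sig, threshold=0.5):
    """Relative working-set difference above the threshold => new phase."""
    union = prev_sig | cur_sig
    return bool(union) and len(prev_sig ^ cur_sig) / len(union) > threshold

best_config_for_phase = {}   # learned table: signature -> (config, cost)

def on_interval_end(prev_sig, cur_sig, current_config, measured_cost):
    key = frozenset(cur_sig)
    # Learning: remember the cheapest cache configuration seen for this phase.
    seen = best_config_for_phase.get(key)
    if seen is None or measured_cost < seen[1]:
        best_config_for_phase[key] = (current_config, measured_cost)
    # Recognition: reconfigure only when the program phase actually changes,
    # so the reconfiguration overhead stays low.
    if phase_changed(prev_sig, cur_sig):
        return best_config_for_phase[key][0]
    return current_config
```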

Collaboration


Dive into Emilio Luque's collaborations.

Top Co-Authors

Tomàs Margalef, Autonomous University of Barcelona
Ana Ripoll, Autonomous University of Barcelona
Dolores Rexachs, Autonomous University of Barcelona
Ana Cortés, Autonomous University of Barcelona
Anna Morajko, Autonomous University of Barcelona
Germán Bianchini, Autonomous University of Barcelona
Remo Suppi, Autonomous University of Barcelona
Domingo Benitez, University of Las Palmas de Gran Canaria
Juan C. Moure, Autonomous University of Barcelona