Publication


Featured research published by Andreu Moreno.


Parallel Computing | 2006

Modeling master/worker applications for automatic performance tuning

Eduardo César; Andreu Moreno; Joan Sorribes; Emilio Luque

Parallel application development is a very difficult task for non-expert programmers, so support tools are needed for all phases of the development cycle of this kind of application. This means that developing applications using predefined programming structures (frameworks/skeletons) should be easier than doing it from scratch. We propose to take advantage of the intrinsic knowledge that these programming structures provide about the application in order to develop a dynamic and automatic tuning tool. We show that, using this knowledge, the tool can make better tuning decisions efficiently. Specifically, this work focuses on the definition of the performance model associated with applications developed with the Master/Worker framework.
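
As a rough illustration of the kind of performance model such a framework enables, the sketch below estimates the iteration time of a homogeneous Master/Worker round and uses it to pick a worker count. The serialized-communication assumption and all parameter names are illustrative, not the model defined in the paper.

```python
import math

def mw_iteration_time(num_workers, num_tasks, t_comp, t_comm):
    """Crude iteration time for a homogeneous Master/Worker round.

    Assumes num_tasks tasks of equal compute cost t_comp are spread over
    num_workers workers, and that the master serializes one message of
    cost t_comm per task (a deliberately simple bottleneck model).
    """
    tasks_per_worker = math.ceil(num_tasks / num_workers)
    return num_tasks * t_comm + tasks_per_worker * t_comp

def suggest_workers(max_workers, num_tasks, t_comp, t_comm):
    # Worker count that minimizes the modeled iteration time.
    return min(range(1, max_workers + 1),
               key=lambda n: mw_iteration_time(n, num_tasks, t_comp, t_comm))
```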


International Journal of Parallel Programming | 2014

Improving Performance on Data-Intensive Applications Using a Load Balancing Methodology Based on Divisible Load Theory

Claudia Rosas; Anna Sikora; Josep Jorba; Andreu Moreno; Eduardo César

Data-intensive applications are those that explore, query, analyze, and, in general, process very large data sets. Generally, these applications can be naturally implemented in parallel, but in many cases these implementations show severe performance problems, mainly due to load imbalance, inefficient use of available resources, and improper data partition policies. It is worth noting that the problem becomes more complex when the conditions causing these problems change at run time. This paper proposes a methodology for dynamically improving the performance of certain data-intensive applications by adapting the size and number of data partitions, and the number of processing nodes, to the current application conditions in homogeneous clusters. To this end, the processing of each exploration is monitored and the gathered data is used to dynamically tune the performance of the application. The tuning parameters included in the methodology are: (i) the partition factor of the data set, (ii) the distribution of the data chunks, and (iii) the number of processing nodes to be used. The methodology assumes that a single execution includes multiple related explorations on the same partitioned data set, and that data chunks are ordered according to their processing times during the application execution so that the most time-consuming partitions are assigned first. The methodology has been validated using the well-known bioinformatics tool BLAST and through extensive experimentation using simulation. Reported results are encouraging in terms of reducing the total execution time of the application (up to 40% in some cases).
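
A minimal sketch of the chunk-ordering idea described above: chunks are dispatched longest first to the least loaded worker, using processing times measured in a previous exploration. Function and variable names are hypothetical.

```python
import heapq

def schedule_chunks(chunk_times, num_workers):
    """Greedy longest-first assignment of data chunks to workers.

    chunk_times holds the processing time measured for each chunk in a
    previous exploration; returns the per-worker chunk lists and the
    resulting makespan estimate.
    """
    heap = [(0.0, w) for w in range(num_workers)]      # (load, worker id)
    assignment = [[] for _ in range(num_workers)]
    for chunk, t in sorted(enumerate(chunk_times), key=lambda x: -x[1]):
        load, w = heapq.heappop(heap)                  # least loaded worker
        assignment[w].append(chunk)
        heapq.heappush(heap, (load + t, w))
    return assignment, max(load for load, _ in heap)
```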


European Conference on Parallel Processing | 2008

Dynamic Pipeline Mapping (DPM)

Andreu Moreno; Eduardo César; Andreu Guevara; Joan Sorribes; Tomàs Margalef; Emilio Luque

Parallel/distributed application development is an extremely difficult task for non-expert programmers, and support tools are therefore needed for all phases of the development cycle of this kind of application. In particular, dynamic performance tuning tools can take advantage of the knowledge about the application's structure given by a skeleton-based programming tool. This study presents the definition of a strategy for dynamically improving the performance of pipeline applications. This strategy, called Dynamic Pipeline Mapping, improves the application's throughput by gathering the pipeline's fastest stages and replicating its slowest ones. We have evaluated the new algorithm through experimentation and simulation, and results show that DPM leads to significant performance improvements.
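
The throughput gain from gathering and replicating stages can be seen with a simple steady-state model (an illustrative simplification, not the paper's formulation): the pipeline outputs one item per period, and the period is the effective time of the slowest stage.

```python
def pipeline_throughput(stage_times, replicas):
    """Steady-state throughput of a linear pipeline.

    Replicating a stage k times is modeled as dividing its service time
    by k; the slowest effective stage bounds the whole pipeline.
    """
    period = max(t / r for t, r in zip(stage_times, replicas))
    return 1.0 / period

# e.g. stages of 2, 8 and 3 time units with the slow stage replicated twice:
# pipeline_throughput([2, 8, 3], [1, 2, 1]) -> 0.25 items per time unit
```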


Parallel Computing | 2012

Load balancing in homogeneous pipeline based applications

Andreu Moreno; Eduardo César; Andreu Guevara; Joan Sorribes; Tomàs Margalef

We propose to use knowledge about a parallel application's structure, acquired through a skeleton-based development strategy, to dynamically improve its performance. Parallel/distributed programming makes it possible to solve highly demanding computational problems. However, this type of application requires support tools in all phases of the development cycle because the implementation is extremely difficult, especially for non-expert programmers. This work presents a new strategy for dynamically improving the performance of pipeline applications. We call this approach Dynamic Pipeline Mapping (DPM), and the key idea is to free computational resources by gathering the pipeline's fastest stages and then using these resources to replicate the slowest stages. We present two versions of this strategy, both with complexity O(N log N) in the number of pipe stages, and we compare them to an optimal mapping algorithm and to the Binary Search Closest (BSC) algorithm [1]. Our results show that DPM leads to significant performance improvements, increasing the application throughput by up to 40% on average.
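
The sketch below shows a simplified gather/replicate heuristic in the spirit of DPM; it is illustrative only and does not reproduce the published algorithm or its exact O(N log N) formulation.

```python
import heapq

def dpm_like_mapping(stage_times, num_procs):
    """Gather fast adjacent stages, then replicate the bottleneck stages.

    Gathering merges consecutive stages onto one processor while their
    combined time stays below the current bottleneck; each freed
    processor is then granted to whichever group bounds the period
    (its effective time is modeled as base_time / replicas).
    """
    bottleneck = max(stage_times)
    groups, current = [], []
    for t in stage_times:                              # gather phase
        if current and sum(current) + t <= bottleneck:
            current.append(t)
        else:
            if current:
                groups.append(current)
            current = [t]
    groups.append(current)

    free = num_procs - len(groups)                     # processors released by gathering
    heap = [(-sum(g), sum(g), 1) for g in groups]      # (-effective time, base time, replicas)
    heapq.heapify(heap)
    while free > 0:                                    # replicate phase
        _, base, r = heapq.heappop(heap)
        heapq.heappush(heap, (-base / (r + 1), base, r + 1))
        free -= 1
    return groups, -heap[0][0]                         # stage grouping and resulting period
```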


Parallel, Distributed and Network-Based Processing | 2010

A Performance Tuning Strategy for Complex Parallel Application

Jose Alexander Guevara; Eduardo César; Joan Sorribes; Andreu Moreno; Tomàs Margalef; Emilio Luque

Defining performance models associated with the application structure has proven to be a useful strategy for implementing dynamic tuning tools. However, to extend this strategy to more complex applications (those composed of different structures), it must integrate a policy for distributing resources among the different application components. Consequently, we propose to take advantage of the knowledge embodied in these models and combine them with a resource management policy to obtain a global model. In this sense, this work constitutes an ongoing effort in the development of performance models for dynamic tuning.


Information Processing Letters | 2009

Task distribution using factoring load balancing in Master-Worker applications

Andreu Moreno; Eduardo César; Joan Sorribes; Tomàs Margalef; Emilio Luque

Load imbalance among workers is one of the main causes of performance shortcomings in Master-Worker applications. We have observed that this problem is very similar to that of scheduling distributed parallel loops, which has been widely studied in the literature. Thus, we have adapted one of the most successful algorithms, known as Factoring, for use in Master-Worker applications. This leads to a simple and elegant strategy that provides excellent automatic and dynamic load balancing for the workers. Finally, we have assessed the resulting strategy through extensive experimentation and simulation.
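
A minimal sketch of the classic Factoring chunk-size schedule that the strategy builds on (the alpha = 2 default is the textbook choice, not necessarily the paper's adaptation):

```python
import math

def factoring_chunks(total_tasks, num_workers, alpha=2.0):
    """Chunk sizes produced by a Factoring-style scheme.

    Each round splits roughly 1/alpha of the remaining tasks evenly into
    one chunk per worker, so chunk sizes decrease geometrically: large
    early chunks keep scheduling overhead low, small final chunks even
    out the load among workers.
    """
    remaining, chunks = total_tasks, []
    while remaining > 0:
        size = max(1, math.ceil(remaining / (alpha * num_workers)))
        for _ in range(num_workers):                   # one chunk per worker this round
            if remaining == 0:
                break
            chunks.append(min(size, remaining))
            remaining -= chunks[-1]
    return chunks

# e.g. factoring_chunks(1000, 4) -> [125, 125, 125, 125, 63, 63, 63, 63, 31, ...]
```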


Future Generation Computer Systems | 2014

Dynamic tuning of the workload partition factor and the resource utilization in data-intensive applications

Claudia Rosas; Anna Sikora; Josep Jorba; Andreu Moreno; Antonio Espinosa; Eduardo César

The recent data deluge that needs to be processed represents one of the major challenges in the computational field. This fact has led to the growth of specially designed applications known as data-intensive applications. In general, to ease the parallel execution of data-intensive applications, input data is divided into smaller data chunks that can be processed separately. However, in many cases, these applications show severe performance problems, mainly due to load imbalance, inefficient use of available resources, and improper data partition policies. In addition, the impact of these performance problems can depend on the dynamic behavior of the application. This work proposes a methodology to dynamically improve the performance of data-intensive applications based on: (i) adapting the size and the number of data partitions to reduce the overall execution time; and (ii) adapting the number of processing nodes to achieve an efficient execution. We propose to monitor the application behavior for each exploration (query) and use the gathered data to dynamically tune the performance of the application. The methodology assumes that a single execution includes multiple related queries on the same partitioned workload. The adaptation of the workload partition factor is addressed through the definition of the initial size of the data chunks; the modification of the scheduling policy to send first the data chunks with long processing times; the division of the data chunks with the largest associated computation times; and the joining of data chunks with small computation times. The criteria for dividing or joining chunks are based on the chunks' associated execution times (average and standard deviation) and the number of processing elements being used. Additionally, resource utilization is addressed through the dynamic evaluation of the application performance and the estimation and modification of the number of processing nodes that can be used efficiently. We have evaluated our strategy using a real and a synthetic data-intensive application as case studies. Analytical expressions have been analyzed through simulation. Applying our methodology, we have obtained encouraging results, reducing total execution times and using resources efficiently.
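
A sketch of the divide/join decision described above, using the average and standard deviation of measured chunk times; the thresholds are illustrative assumptions rather than the paper's exact criteria.

```python
import statistics

def retune_partitions(chunk_times):
    """Decide which chunks to split and which to merge before the next query.

    Chunks whose measured time exceeds mean + stdev are marked for
    splitting; consecutive small chunks are grouped for merging while
    their combined time stays below the mean.
    """
    mean = statistics.mean(chunk_times)
    std = statistics.pstdev(chunk_times)
    to_split = [i for i, t in enumerate(chunk_times) if t > mean + std]

    to_merge, group, acc = [], [], 0.0
    for i, t in enumerate(chunk_times):
        if t < mean and acc + t <= mean:
            group.append(i)
            acc += t
        else:
            if len(group) > 1:
                to_merge.append(group)
            group, acc = [], 0.0
    if len(group) > 1:
        to_merge.append(group)
    return to_split, to_merge
```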


High Performance Computing and Communications | 2014

Predicting Performance of Hybrid Master/Worker Applications Using Model-Based Regression Trees

Abel Castellanos; Andreu Moreno; Joan Sorribes; Tomàs Margalef

Nowadays, there are several features related to node architecture, network topology, and programming model that significantly affect the performance of applications. Therefore, adjusting the parameter values of hybrid parallel applications to achieve the best performance requires a high degree of expertise and a huge effort. Determining a performance model that considers all the system and application features is a very complex task that in most cases produces poor results. In order to simplify this task and improve the results, we introduce a model-based regression tree technique that increases the accuracy of performance prediction for parallel Master/Worker applications on homogeneous multicore systems. The technique has been used to model the iteration time of the general expression for performance prediction. This approach significantly reduces the effort needed to obtain an accurate prediction model, although it requires a relatively large training data set. The proposed model determines the configuration of the appropriate number of workers and threads of the hybrid application to achieve the best possible performance.
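
A small sketch of the prediction step, using scikit-learn's plain regression tree as a stand-in for the model-based regression trees used in the paper; the training data and candidate configurations below are invented for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical measurements: (workers, threads per worker, problem size)
# paired with observed iteration times from previous runs.
X_train = np.array([[2, 4, 1000], [4, 4, 1000], [4, 8, 1000], [8, 8, 1000],
                    [2, 4, 2000], [4, 4, 2000], [4, 8, 2000], [8, 8, 2000]])
y_train = np.array([12.1, 6.8, 4.2, 3.9, 24.5, 13.0, 8.1, 7.5])

model = DecisionTreeRegressor(min_samples_leaf=2).fit(X_train, y_train)

# Pick the (workers, threads) configuration with the lowest predicted
# iteration time for the next problem size.
candidates = np.array([[w, t, 2000] for w in (2, 4, 8) for t in (4, 8)])
best = candidates[np.argmin(model.predict(candidates))]
print("suggested (workers, threads):", tuple(best[:2]))
```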


High Performance Computing and Communications | 2013

Performance Model for Master/Worker Hybrid Applications on Multicore Clusters

Abel Castellanos; Andreu Moreno; Joan Sorribes; Tomàs Margalef

There are several parallel applications that are implemented using the Master/Worker parallel/distributed programming paradigm. Applications using this predefined programming structure can be easily implemented with message-passing programming libraries (MPI). Moreover, the multicore features present nowadays in CPU architectures can be exploited at the node level by applying thread parallelism (OpenMP). In order to exploit the benefits of both levels of parallelism, Master/Worker applications can be implemented as hybrid applications. However, reaching the expected performance indexes is not trivial because there are several parameters (number of nodes, number of threads per node, thread affinity, and data distribution among all nodes) that must be tuned for each particular application, or even during its execution, to reach a successful performance. On the other hand, the application workload may change drastically during successive executions, so those parameters need to be modified according to this behavior. Additionally, the cache memory architecture in multicore systems directly influences the performance, and this behavior must be deeply analyzed. In this paper we present a proposal to model the performance of hybrid Master/Worker applications on multicore systems considering the issues outlined above. In particular, this model determines dynamically at runtime the configuration of the appropriate number of workers and threads of the hybrid application to achieve the best possible performance.
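
As a simple illustration of the configuration space such a model has to rank, the sketch below enumerates per-node (MPI workers, OpenMP threads) combinations that fit a node's cores; the performance model that scores each candidate is not shown.

```python
def feasible_configs(cores_per_node):
    """Per-node hybrid configurations (MPI worker processes x OpenMP threads).

    Only combinations whose total thread count fits within a node's
    cores are kept; a tuning step would query the performance model for
    each candidate and select the cheapest one.
    """
    return [(workers, threads)
            for workers in range(1, cores_per_node + 1)
            for threads in range(1, cores_per_node + 1)
            if workers * threads <= cores_per_node]

# e.g. for a 16-core node: (1, 16), (2, 8), (4, 4), (8, 2), (16, 1), ...
```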


The Journal of Supercomputing | 2017

HeDPM: load balancing of linear pipeline applications on heterogeneous systems

Andreu Moreno; Anna Sikora; Eduardo César; Joan Sorribes; Tomàs Margalef

This work presents a new algorithm, called Heterogeneous Dynamic Pipeline Mapping, that dynamically improves the performance of pipeline applications running on heterogeneous systems. It is aimed at balancing the application load by determining the best combination of replication (of slow stages) and gathering (of fast stages), taking into account the processors' computation and communication capacities. In addition, the algorithm has been designed with the requirement of keeping complexity low to allow its use in a dynamic tuning tool. For this reason, it uses an analytical performance model of pipeline applications that addresses hardware heterogeneity and depends on parameters that can be known in advance or measured at run time. Extensive experimentation is presented, including a comparison with the optimal brute-force algorithm, a general comparison with the Binary Search Closest algorithm, and an application example with the Ferret pipeline included in the PARSEC benchmark suite. Results, matching those of the best existing algorithms, show significant performance improvements with lower complexity.
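
To illustrate how heterogeneity enters the balancing decision, the sketch below uses a simple linear stage-time model; HeDPM's actual analytical model is richer, and all parameters here are assumptions.

```python
def stage_time(work, comm_volume, speed, bandwidth):
    """Per-item service time of a pipeline stage on a given processor.

    Computation is scaled by the processor's relative speed and
    communication by its link bandwidth.
    """
    return work / speed + comm_volume / bandwidth

def pipeline_period(stages, processors):
    """Period of a pipeline with one stage mapped per processor, in order.

    stages: list of (work, comm_volume); processors: list of (speed, bandwidth).
    The slowest mapped stage bounds the whole pipeline.
    """
    return max(stage_time(w, c, s, b)
               for (w, c), (s, b) in zip(stages, processors))
```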

Collaboration


Andreu Moreno's top co-authors and their affiliations.

Top Co-Authors

Eduardo César (Autonomous University of Barcelona)
Joan Sorribes (Autonomous University of Barcelona)
Tomàs Margalef (Autonomous University of Barcelona)
Anna Sikora (Autonomous University of Barcelona)
Emilio Luque (Autonomous University of Barcelona)
Claudia Rosas (Autonomous University of Barcelona)
Josep Jorba (Open University of Catalonia)
Abel Castellanos (Autonomous University of Barcelona)
Andreu Guevara (Autonomous University of Barcelona)
Antonio Espinosa (Autonomous University of Barcelona)