
Publication


Featured research published by Gladys Utrera.


International Conference on Parallel Architectures and Compilation Techniques | 2004

Implementing Malleability on MPI Jobs

Gladys Utrera; Julita Corbalan; Jesús Labarta

Parallel jobs are characterized by processes that communicate and synchronize with each other frequently. A processor allocation strategy widely used in parallel supercomputers is space sharing, that is, assigning a partition of processors to each job for its exclusive use. We present a global solution that offers virtual malleability for message-passing parallel jobs by applying a processor allocation strategy, Folding by JobType (FJT). This technique is based on the folding and moldability concepts and tries to decide the optimal initial number of processes, when to fold jobs, and how many times to fold, by analyzing current and past system information. At the processor level, we apply co-scheduling. We implement and evaluate FJT under several workloads with different job sizes, classes, and machine utilizations. Results show that FJT adapts easily to load changes and can obtain better performance than the other strategies evaluated on workloads with a high coefficient of variation, especially those with bursty arrivals.
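The folding idea from the abstract can be sketched in a few lines. This is an illustrative model only, not the paper's implementation: `fold_partition` and its halving rule are assumptions made for the example.

```python
# Illustrative sketch of the folding idea (hypothetical, not the paper's code):
# a job keeps all of its processes, but each fold halves its processor
# partition, so pairs of processes end up co-scheduled on the same CPU.

def fold_partition(n_procs: int, n_folds: int) -> int:
    """Return how many CPUs a job of n_procs processes occupies after
    being folded n_folds times (each fold halves the partition,
    rounding up)."""
    cpus = n_procs
    for _ in range(n_folds):
        cpus = (cpus + 1) // 2
    return cpus
```

For instance, a 32-process job folded twice would run on 8 CPUs, with 4 processes co-scheduled per CPU.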


International Conference on Parallel Processing | 2012

A job scheduling approach for multi-core clusters based on virtual malleability

Gladys Utrera; Siham Tabik; Julita Corbalan; Jesús Labarta

Many commercial job scheduling strategies in multiprocessing systems tend to minimize the waiting times of short jobs. However, long jobs cannot be left aside, as their impact on system performance is also decisive. In this work we propose a job scheduling strategy that maximizes resource utilization and improves overall performance by allowing jobs to adapt to variations in the load. The experimental evaluation includes both simulations and executions of real workloads. The results show that our strategy provides significant improvements over the traditional EASY backfilling policy, especially at medium to high machine loads.


IEEE International Conference on High Performance Computing, Data and Analytics | 2005

Dynamic load balancing in MPI jobs

Gladys Utrera; Julita Corbalan; Jesús Labarta

There are at least three dimensions of overhead to be considered by any parallel job scheduling algorithm: load balancing, synchronization, and communication overhead. In this work we first study several heuristics for choosing the next process to run from a global process queue. We then present a mechanism to decide at runtime whether to apply a local process queue per processor or a global process queue per job, depending on the job's degree of load balance, without any prior knowledge of it.
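A minimal sketch of that runtime decision, assuming imbalance is measured as the relative spread of per-process CPU times (the metric, the function name, and the 0.2 threshold are illustrative assumptions, not taken from the paper):

```python
def choose_queue_policy(cpu_times: list, threshold: float = 0.2) -> str:
    """Pick a local queue per processor for well-balanced jobs and a
    global queue per job otherwise. Imbalance is (max - min) / max;
    both the metric and the threshold are illustrative."""
    imbalance = (max(cpu_times) - min(cpu_times)) / max(cpu_times)
    return "local" if imbalance <= threshold else "global"
```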


The Journal of Supercomputing | 2014

Scheduling parallel jobs on multicore clusters using CPU oversubscription

Gladys Utrera; Julita Corbalan; Jesús Labarta

Job scheduling strategies in multiprocessing systems aim to minimize the waiting times of jobs while satisfying user requirements in terms of the number of execution units. However, the lack of flexibility in the requests leaves the scheduler a reduced margin of action for scheduling decisions. Many such decisions consist of simply moving specific jobs ahead in the wait queue. In this work, we propose a job scheduling strategy that improves overall performance and maximizes resource utilization by allowing jobs to adapt to variations in the load through CPU oversubscription and backfilling. The experimental evaluation includes both real executions on multicore clusters and simulations of workload traces from real production systems. The results show that our strategy provides significant improvements over previous proposals such as Gang Scheduling with Backfilling, especially at medium to high loads with strong variations.
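For reference, the backfilling rule that such strategies build on can be sketched as follows. This is a standard textbook formulation of the EASY test with hypothetical parameter names; the paper's oversubscription logic is not reproduced here.

```python
def easy_backfill_ok(req_procs: int, est_runtime: float,
                     free_procs: int, now: float,
                     shadow_time: float, extra_procs: int) -> bool:
    """A waiting job may jump ahead of the queue head if it fits in the
    free processors and either finishes before the head job's reservation
    starts (shadow_time) or only uses processors the reservation does
    not need (extra_procs)."""
    if req_procs > free_procs:
        return False
    return (now + est_runtime <= shadow_time) or (req_procs <= extra_procs)
```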


European Conference on Parallel Processing | 2004

Scheduling of MPI Applications: Self-co-scheduling

Gladys Utrera; Julita Corbalan; Jesús Labarta

Scheduling parallel jobs has been an active research area. The scheduler has to deal with heterogeneous workloads and try to achieve throughput and response times that ensure good performance.


Parallel, Distributed and Network-Based Processing | 2015

Evaluating the Performance Impact of Communication Imbalance in Sparse Matrix-Vector Multiplication

Gladys Utrera; Marisa Gil; Xavier Martorell

HPC applications make intensive use of large sparse matrices, with the matrix-vector product representing a significant fraction of the total run time. These matrices are characterized by non-uniform structures and irregular memory accesses that make it difficult to achieve good scalability on modern HPC platforms with multi- or many-core processors, SIMD units, and high-speed communication networks. One cause of this poor scalability is communication imbalance, in both message synchronization and message size. In this work we analyze such load imbalance in the sparse matrix-vector product (SpMV) when running on a multi-node cluster with high-speed interconnection networks. Experimental alternatives for diminishing communication load imbalance are evaluated on two programming models (MPI + fork-join and MPI + task-based parallelism) using certain optimizations (e.g., computation-communication overlap or issuing send messages in parallel). The performance improvement for large matrix sizes can be up to 9%.


International Conference on High Performance Computing and Simulation | 2015

In search of the best MPI-OpenMP distribution for optimum Intel-MIC cluster performance

Gladys Utrera; Marisa Gil; Xavier Martorell

Applications for HPC platforms are mainly based on hybrid programming models: MPI for communication and OpenMP for task and fork-join parallelism to exploit shared-memory communication inside a node. On the basis of this scheme, much research has been carried out to improve performance, for example by overlapping communication and computation, or by increasing speedup and bandwidth on new network fabrics (e.g., InfiniBand and 10 or 40 Gigabit Ethernet). Henceforth, as far as computation and communication are concerned, HPC platforms will be heterogeneous with high-speed networks. In this context, an important issue is deciding how to distribute the workload among all the nodes in order to balance the application's execution, as well as choosing the most appropriate programming model to exploit parallelism inside the node. In this paper we propose a mechanism to dynamically balance the work distribution among the components of a heterogeneous cluster based on their performance characteristics. For our evaluation we run the miniFE mini-application of the Mantevo benchmark suite on a heterogeneous Intel MIC cluster. Experimental results show that making an effort to choose the appropriate number of threads can improve performance significantly over simply using the maximum number of cores available on the Intel MIC.
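One way to picture the kind of performance-based distribution described above is a simplified sketch, under the assumptions that work splits by rows and that per-node performance has already been measured; `distribute_rows` is a name invented for this example, not the paper's mechanism.

```python
def distribute_rows(total_rows: int, perf: list) -> list:
    """Split total_rows among nodes proportionally to their measured
    performance; leftover rows from integer rounding go to the
    fastest node."""
    total_perf = sum(perf)
    shares = [int(total_rows * p / total_perf) for p in perf]
    shares[perf.index(max(perf))] += total_rows - sum(shares)
    return shares
```

For example, distributing 100 rows between a node three times faster than another yields a 75/25 split.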


High Performance Computing Systems and Applications | 2014

Analyzing the impact of programming models for efficient communication overlap in high-speed networks

Gladys Utrera; Marisa Gil; Xavier Martorell

Exascale applications for civil engineering, simulation, and other fields related to current research make intensive use of large sparse matrices. A characteristic of these matrices is the difficulty of balancing communication and computation, so that even when these two phases are overlapped the application does not achieve good overall scalability but instead suffers a loss of performance. Some proposals have been presented to diminish this drawback, based on the hybrid use of programming models, with MPI as the communication basis and threads for computation (mainly OpenMP, but also Cilk, CUDA, or OpenCL) to adapt to new heterogeneous platforms. In this work, we evaluate the impact of providing task-based parallelism instead of fork-join parallelism. As regards communication, the appearance of faster networks with specific optimizations and internal protocol characteristics makes it appealing to analyze and evaluate their influence on execution performance. We evaluate our results on two different communication networks: 10 Gigabit Ethernet and InfiniBand. For our evaluation we run the miniFE mini-application of the Mantevo benchmark suite on a homogeneous supercomputer platform based on Intel Sandy Bridge processors. Experimental results show how network behavior can affect performance and how it can be managed via task-based models: compared to a hybrid MPI/OpenMP version that overlaps communication and computation, our task-based MPI/OmpSs proposal obtains up to a 60% improvement.


International Conference on Parallel Processing | 2015

Exploring Memory Error Vulnerability for Parallel Programming Models

Isil Oz; Marisa Gil; Gladys Utrera; Xavier Martorell

Transistor size reduction and more aggressive power modes in HPC platforms make chip components more error prone. In this context, HPC applications can have diverse levels of tolerance to memory errors, which may affect execution in different ways. As the tolerance to memory errors depends on write frequency and access patterns, different programming models may exhibit different failure rates and alleviate the performance loss caused by the overhead of fault-tolerance mechanisms. In this paper, we explore how tolerant to memory errors the two main parallel programming models, message passing and shared memory, are: we perform a memory vulnerability analysis and also conduct error propagation experiments to observe the effect of memory errors on program flow. Our results show the need for soft-error resiliency methods based on the memory behavior of programs, and for evaluating the trade-offs between performance and reliability.


Concurrency and Computation: Practice and Experience | 2018

Analyzing the impact of communication imbalance in high-speed networks

Gladys Utrera; Marisa Gil; Xavier Martorell

In this work we analyze the communication load imbalance generated by irregular-data applications running on a multi-node cluster. Experimental approaches to diminishing communication load imbalance are evaluated using a hybrid MPI + OpenMP programming model, including certain optimizations such as computation-communication overlap, issuing communications in parallel, and a new proposal based on message fragmentation to take advantage of the eager protocol. Performance results show that overlapped versions can benefit greatly from this optimization because it avoids switching to rendezvous protocols. However, non-overlapped versions showed better performance than overlapped ones. To also evaluate the impact of network latency, the work has been tested on two high-speed interconnection networks: InfiniBand and 10 Gigabit Ethernet. In this case, the optimizations in the non-overlapped miniFE benchmark achieved improvements of up to 7% on InfiniBand and 11% on 10 Gigabit Ethernet.
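The fragmentation idea can be sketched as follows: split each message into chunks below the network's eager limit so that every send stays in the eager protocol. This is a simplified illustration; the function name and byte-level splitting are assumptions for the example, not the paper's MPI code.

```python
def fragment_message(payload: bytes, eager_limit: int) -> list:
    """Split payload into chunks of at most eager_limit bytes, so each
    chunk can be sent with the eager protocol instead of triggering the
    rendezvous handshake used for large messages."""
    return [payload[i:i + eager_limit]
            for i in range(0, len(payload), eager_limit)]
```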

Collaboration


Dive into Gladys Utrera's collaborations.

Top Co-Authors

Jesús Labarta (Barcelona Supercomputing Center)
Marisa Gil (Polytechnic University of Catalonia)
Xavier Martorell (Polytechnic University of Catalonia)
Julita Corbalan (Polytechnic University of Catalonia)
Isil Oz (Boğaziçi University)