Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Julita Corbalan is active.

Publications


Featured research published by Julita Corbalan.


International Workshop on OpenMP | 2008

Evaluation of OpenMP task scheduling strategies

Alejandro Duran; Julita Corbalan; Eduard Ayguadé

OpenMP is in the process of adding a tasking model that allows the programmer to specify independent units of work, called tasks, but does not specify how the scheduling of these tasks should be done (although it imposes some restrictions). We have evaluated different scheduling strategies (schedulers and cut-offs) with several applications, and we found that work-first schedulers seem to have the best performance, but because of the restrictions that OpenMP imposes, a breadth-first scheduler is a better choice as the default for an OpenMP runtime.
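
For readers unfamiliar with the tasking model described above, a minimal C sketch is shown below (illustrative only; whether tasks are scheduled work-first or breadth-first is decided inside the OpenMP runtime, not by this code):

#include <stdio.h>

/* Minimal OpenMP tasking sketch. Each recursive call becomes an
 * independent unit of work (a task); the scheduling strategy applied to
 * these tasks is a property of the runtime's internal scheduler. */
long fib(int n)
{
    long x, y;
    if (n < 2)
        return n;

    #pragma omp task shared(x)
    x = fib(n - 1);

    #pragma omp task shared(y)
    y = fib(n - 2);

    #pragma omp taskwait   /* wait for the two child tasks */
    return x + y;
}

int main(void)
{
    long result = 0;
    #pragma omp parallel
    #pragma omp single     /* one thread creates the root of the task tree */
    result = fib(30);
    printf("fib(30) = %ld\n", result);
    return 0;
}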


IEEE International Conference on High Performance Computing Data and Analytics | 2008

An adaptive cut-off for task parallelism

Alejandro Duran; Julita Corbalan; Eduard Ayguadé

In task parallel languages, an important factor for achieving good performance is the use of a cut-off technique to reduce the number of tasks created. Using a cut-off to avoid an excessive number of tasks helps the runtime system to reduce the total overhead associated with task creation, particularly if the tasks are fine-grained. Unfortunately, the best cut-off technique is usually dependent on the application structure or even the input data of the application. We propose a new cut-off technique that, using information from the application collected at runtime, decides which tasks should be pruned to improve the performance of the application. This technique does not rely on the programmer to determine the cut-off technique that is best suited for the application. We have implemented this cut-off in the context of the new OpenMP tasking model. Our evaluation, with a variety of applications, shows that our adaptive cut-off is able to make good decisions and most of the time matches the optimal cut-off that could be set by hand by a programmer.
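
The adaptive cut-off above is chosen by the runtime from information collected during execution; the hand-tuned alternative it is compared against can be approximated with a fixed depth threshold, as in this illustrative C sketch (the value of MAX_DEPTH is an arbitrary assumption):

/* Manual, depth-based cut-off for task creation. Once the recursion is
 * deeper than MAX_DEPTH, the if() clause makes the tasks undeferred, so
 * they are executed immediately by the encountering thread and most of
 * the task-creation overhead for fine-grain work is avoided. The adaptive
 * cut-off described above replaces this fixed, hand-tuned threshold with
 * a decision taken at runtime. */
#define MAX_DEPTH 6   /* arbitrary, hand-tuned value */

long fib_cutoff(int n, int depth)
{
    long x, y;
    if (n < 2)
        return n;

    #pragma omp task shared(x) if(depth < MAX_DEPTH)
    x = fib_cutoff(n - 1, depth + 1);

    #pragma omp task shared(y) if(depth < MAX_DEPTH)
    y = fib_cutoff(n - 2, depth + 1);

    #pragma omp taskwait
    return x + y;
}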


International Conference on Supercomputing | 1999

Thread fork/join techniques for multi-level parallelism exploitation in NUMA multiprocessors

Xavier Martorell; Eduard Ayguadé; Nacho Navarro; Julita Corbalan; Marc Gonzàlez; Jesús Labarta

This paper presents some techniques for efficient thread forking and joining in parallel execution environments, taking into consideration the physical structure of NUMA machines and the support for multi-level parallelization and processor grouping. Two work generation schemes and one join mechanism are designed, implemented, evaluated and compared with the ones used in the IRIX MP library, an efficient implementation which supports a single level of parallelism. Supporting multiple levels of parallelism is a current research goal, both in shared and distributed memory machines. Our proposals include a first work generation scheme (GWD, or global work descriptor) which supports multiple levels of parallelism, but not processor grouping. The second work generation scheme (LWD, or local work descriptor) has been designed to support multiple levels of parallelism and processor grouping. Processor grouping is needed to distribute processors among different parts of the computation and maintain the working set of each processor across different parallel constructs. The mechanisms are evaluated using synthetic benchmarks, two SPEC95fp applications and one NAS application. The performance evaluation concludes that: i) the overhead of the proposed mechanisms is similar to the overhead of the existing ones when exploiting a single level of parallelism, and ii) a remarkable improvement in performance is obtained for applications that have multiple levels of parallelism. The comparison with the traditional single-level parallelism exploitation gives an improvement in the range of 30-65% for these applications.
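
As a rough illustration of the local work descriptor (LWD) idea, each processor group can carry its own descriptor holding the work function, its arguments and the processors that form the group, so nested parallel constructs touch no global structure. The C sketch below is hypothetical and does not reproduce the actual NANOS or IRIX data layout:

/* Hypothetical sketch of a local work descriptor. One descriptor per
 * processor group allows multiple levels of parallelism to be supplied
 * to disjoint groups concurrently. */
struct local_work_descriptor {
    void       (*work_fn)(void *args);  /* body of the parallel region */
    void        *args;                  /* captured arguments */
    int          nprocs;                /* processors in this group */
    int         *proc_ids;              /* which processors form the group */
    volatile int remaining;             /* join counter, decremented as
                                           each processor finishes */
};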


Grid Computing | 2005

eNANOS grid resource broker

Ivan Rodero; Julita Corbalan; Rosa M. Badia; Jesús Labarta

Grid computing has been presented as a way of sharing geographically and organizationally distributed resources and of successfully performing distributed computation. To achieve these goals, a software layer is necessary to interact with grid environments. Therefore, not only are a middleware and its services needed, but it is also necessary to offer resource management services that hide the underlying complexity of the Grid resources from Grid users. In this paper, we present the design and implementation of an OGSI-compliant Grid resource broker compatible with both GT2 and GT3. It focuses on resource discovery and management, and on dynamic policy management for job scheduling and resource selection. The presented resource broker is designed in an extensible and modular way, using standard protocols and schemas to remain compatible with new middleware versions. We also present experimental results to demonstrate the resource broker's behavior.
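
The broker's resource-selection step can be pictured as scoring the discovered resources against a pluggable policy. The C sketch below is purely illustrative: all names are hypothetical and it does not reflect the eNANOS or Globus (GT2/GT3) interfaces:

/* Illustrative resource-selection loop: return the discovered resource
 * with the best score under the currently configured policy. */
struct grid_resource {
    const char *name;       /* hypothetical descriptor fields */
    int         free_cpus;
    double      load;
};

typedef double (*selection_policy)(const struct grid_resource *r);

const struct grid_resource *
select_resource(const struct grid_resource *resources, int n,
                selection_policy score)
{
    const struct grid_resource *best = 0;
    double best_score = -1.0;
    for (int i = 0; i < n; i++) {
        double s = score(&resources[i]);
        if (best == 0 || s > best_score) {
            best_score = s;
            best = &resources[i];
        }
    }
    return best;
}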


Journal of Parallel and Distributed Computing | 2012

Understanding the future of energy-performance trade-off via DVFS in HPC environments

Maja Etinski; Julita Corbalan; Jesús Labarta; Mateo Valero

DVFS is a ubiquitous technique for CPU power management in modern computing systems. Reducing processor frequency/voltage leads to a decrease of CPU power consumption and an increase in the execution time. In this paper, we analyze which application/platform characteristics are necessary for a successful energy-performance trade-off of large scale parallel applications. We present a model that gives an upper bound on performance loss due to frequency scaling using the application parallel efficiency. The model was validated with performance measurements of large scale parallel applications. Then we track how application sensitivity to frequency scaling evolved over the last decade for different cluster generations. Finally, we study how cluster power consumption characteristics together with application sensitivity to frequency scaling determine the energy effectiveness of the DVFS technique.
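
As an illustrative reconstruction of such a bound (not necessarily the exact model of the paper): if only the CPU-bound fraction of the run stretches with the cycle time, and that fraction is bounded by the parallel efficiency E, then scaling the frequency from f_max down to f gives

\[ \frac{T(f)}{T(f_{\max})} \le 1 + E\left(\frac{f_{\max}}{f} - 1\right), \]

so, for example, an application with E = 0.5 run at half the nominal frequency loses at most 50% in execution time.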


International Conference on Green Computing | 2010

Optimizing job performance under a given power constraint in HPC centers

Maja Etinski; Julita Corbalan; Jesús Labarta; Mateo Valero

The never-ending striving for performance has resulted in a tremendous increase in the power consumption of HPC centers. Power budgeting has become very important for several reasons, such as reliability, operating costs, and the limited power draw allowed by the existing infrastructure. In this paper we propose a power-budget-guided job scheduling policy that maximizes overall job performance for a given power budget. We show that, using DVFS under a power constraint, performance can be significantly improved, as it allows more jobs to run simultaneously, leading to shorter wait times. The aggressiveness of the frequency scaling applied to a job depends on the instantaneous power consumption and on the job's predicted performance. Our policy has been evaluated on four workload traces from systems in production use with up to 4,008 processors. The results show that our policy achieves up to two times better performance compared to power budgeting without DVFS. Moreover, it leads to 23% lower CPU energy consumption on average. Furthermore, we have investigated how much job performance and energy efficiency can be further improved under our policy and the same power budget by increasing the number of DVFS-enabled processors.
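
A minimal sketch of the admission check behind such a policy is given below. All names are hypothetical and the cubic frequency-to-power model is only a rough assumption; the actual policy also takes the job's predicted performance into account:

#include <stdbool.h>

/* Can this job be started at frequency f without exceeding the centre's
 * power budget? Assumes per-CPU power scales roughly with f^3. */
bool fits_power_budget(double budget_w, double used_w,
                       int job_cpus, double cpu_power_at_fmax_w,
                       double f, double f_max)
{
    double ratio = f / f_max;
    double job_w = job_cpus * cpu_power_at_fmax_w * ratio * ratio * ratio;
    return used_w + job_w <= budget_w;
}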


International Conference on Parallel Architectures and Compilation Techniques | 2004

Implementing Malleability on MPI Jobs

Gladys Utrera; Julita Corbalan; Jesús Labarta

Parallel jobs are characterized by having processes that communicate and synchronize with each other frequently. A processor allocation strategy widely used in parallel supercomputers is space-sharing, that is, assigning a partition of processors to each job for its exclusive use. We present a global solution to offer virtual malleability for message-passing parallel jobs by applying a processor allocation strategy, Folding by JobType (FJT). This technique is based on the folding and moldability concepts and tries to decide the optimal initial number of processes, when to fold jobs, and how many times to fold them, by analyzing current and past system information. At the processor level, we apply co-scheduling. We implement and evaluate FJT under several workloads with different job sizes, classes, and machine utilizations. Results show that FJT adapts easily to load changes and can obtain better performance than the other evaluated strategies on workloads with a high coefficient of variation, especially those with burst arrivals.
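
Folding means running a job's fixed set of MPI processes on a smaller partition by multiplexing several processes per processor. The C sketch below only shows how a folding factor might be chosen; the doubling scheme is hypothetical, whereas the FJT policy described above bases this decision on current and past system information:

/* Illustrative choice of a folding factor: how many of the job's
 * processes share each processor of the (possibly smaller) partition. */
int folding_factor(int job_processes, int available_cpus)
{
    int factor = 1;
    while (available_cpus > 0 && job_processes > factor * available_cpus)
        factor *= 2;   /* fold again: 2, 4, ... processes per processor */
    return factor;
}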


IEEE Transactions on Parallel and Distributed Systems | 2005

Performance-driven processor allocation

Julita Corbalan; Xavier Martorell; Jesús Labarta

In current multiprogrammed multiprocessor systems, taking into account the performance of parallel applications is critical to deciding on an efficient processor allocation. In this paper, we present the performance-driven processor allocation policy (PDPA). PDPA is a new scheduling policy that implements a processor allocation policy and a multiprogramming-level policy in a coordinated way, based on the measured application performance. With regard to processor allocation, PDPA is a dynamic policy that allocates to applications the maximum number of processors with which they still reach a given target efficiency. With regard to the multiprogramming level, PDPA allows the execution of a new application when free processors are available and the allocation of all the running applications is stable, or when some applications show bad performance. Results demonstrate that PDPA automatically adjusts the processor allocation of parallel applications to reach the specified target efficiency, and that it adjusts the multiprogramming level to the workload characteristics. PDPA is able to adjust the processor allocation and the multiprogramming level without human intervention, which is a desirable property for self-configurable systems, and results in better individual application response times.
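
The allocation adjustment at the heart of such a policy can be sketched as a simple per-application feedback step. The C sketch below is an assumption-laden illustration; the function name and the idea of changing the allocation one processor at a time are not taken from the paper:

/* Grow an application's allocation while its measured efficiency stays
 * above the target, shrink it when efficiency drops below the target. */
int adjust_allocation(int current_cpus, double measured_efficiency,
                      double target_efficiency, int free_cpus)
{
    if (measured_efficiency >= target_efficiency && free_cpus > 0)
        return current_cpus + 1;   /* still efficient: try one more CPU */
    if (measured_efficiency < target_efficiency && current_cpus > 1)
        return current_cpus - 1;   /* inefficient: give one CPU back */
    return current_cpus;           /* allocation considered stable */
}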


Parallel Computing | 2012

Parallel job scheduling for power constrained HPC systems

Maja Etinski; Julita Corbalan; Jesús Labarta; Mateo Valero

Power has become the primary constraint in high performance computing. Traditionally, parallel job scheduling policies have been designed to improve certain job performance metrics when scheduling parallel workloads on a system with a given number of processors. The available number of processors is no longer the only limitation in parallel job scheduling: the recent increase in processor power consumption has introduced a new one, the available power. These constraints naturally lead to an optimization problem. We propose MaxJobPerf, a new parallel job scheduling policy based on integer linear programming. Dynamic Voltage Frequency Scaling (DVFS) is a widely used technique that trades increased execution time for reduced power by running applications at a lower CPU frequency/voltage. The optimization problem determines which jobs should run and at which frequency. In this paper, we compare the MaxJobPerf policy against other power budgeting policies for different power budgets. It clearly outperforms the other power-budgeting approaches at the parallel job scheduling level. Furthermore, we give a detailed analysis of the policy parameters, including a discussion on how to manage job reservations to avoid job starvation.
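
A schematic form of such an optimization, as an illustrative sketch rather than the paper's exact formulation: with binary variables x_{j,f} selecting whether queued job j is started at frequency f,

\[ \max \sum_{j}\sum_{f} v_{j,f}\, x_{j,f} \quad \text{s.t.} \quad \sum_{j}\sum_{f} p_{j,f}\, x_{j,f} \le P_{\mathrm{budget}}, \qquad \sum_{j}\sum_{f} c_{j}\, x_{j,f} \le C, \qquad \sum_{f} x_{j,f} \le 1 \;\; \forall j, \]

where v_{j,f} is the predicted performance value of job j at frequency f, p_{j,f} its power draw at that frequency, c_j its processor request, P_budget the available power and C the number of processors.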


Archive | 2008

Looking for an Evolution of Grid Scheduling: Meta-Brokering

Ivan Rodero; Francesc Guim; Julita Corbalan; Liana Fong; Yanbin Liu; Seyed Masoud Sadjadi

A Grid resource broker for a Grid domain, also known as a meta-scheduler, is a middleware component used for matching work to available Grid resources from one or more IT organizations. A Grid meta-scheduler usually has its own interfaces for the functionalities it provides and its own job scheduling objectives. This situation causes two main problems: uniform user access to the Grid is lost, and scheduling decisions are taken separately when they should be made in coordination. These problems have been observed in different efforts, such as the HPC-Europa project, but they remain open. In this paper we discuss the requirements for achieving more uniform access to Grids through a new approach to global brokering. As a result of these discussions on brokering requirements, we propose a meta-brokering design, the so-called meta-meta-scheduler design, and discuss how it can be realized as a centralized model for the HPC-Europa project and as a distributed model for the LA Grid project.

Collaboration


Dive into Julita Corbalan's collaborations.

Top Co-Authors

Jesús Labarta, Barcelona Supercomputing Center
Francesc Guim, Polytechnic University of Catalonia
Maja Etinski, Barcelona Supercomputing Center
Mateo Valero, Polytechnic University of Catalonia
Xavier Martorell, Polytechnic University of Catalonia
Gladys Utrera, Polytechnic University of Catalonia
Marta Garcia, Barcelona Supercomputing Center
Alejandro Duran, Polytechnic University of Catalonia