
Publications


Featured research published by Yolanda Becerra.


Network Operations and Management Symposium | 2010

Performance-driven task co-scheduling for MapReduce environments

Jorda Polo; David Carrera; Yolanda Becerra; Malgorzata Steinder; Ian Whalley

MapReduce is a data-driven programming model proposed by Google in 2004, which is especially well suited for distributed data analytics applications. We consider the management of MapReduce applications in an environment where multiple applications share the same physical resources. Such sharing is in line with recent trends in data center management, which aim to consolidate workloads in order to achieve cost and energy savings. In a shared environment, it is necessary to predict and manage the performance of workloads given a set of performance goals defined for them. In this paper, we address this problem by introducing a new task scheduler for a MapReduce framework that allows performance-driven management of MapReduce tasks. The proposed task scheduler dynamically predicts the performance of concurrent MapReduce jobs and adjusts the resource allocation for the jobs. It allows applications to meet their performance objectives without over-provisioning of physical resources.
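
The core idea, estimating how many concurrent task slots each job needs to meet its goal and dividing the cluster accordingly, can be sketched as follows. This is an illustrative simplification, not the scheduler from the paper; the function names, inputs, and the proportional-sharing rule are assumptions made for the example.

import math

def slots_needed(pending_tasks, avg_task_seconds, seconds_to_deadline):
    """Estimate how many concurrent slots a job needs to finish before its deadline."""
    remaining_work = pending_tasks * avg_task_seconds        # serial seconds of work left
    return remaining_work / max(seconds_to_deadline, 1.0)    # clamp jobs that are already late

def allocate_slots(jobs, cluster_slots):
    """Share a fixed pool of task slots among jobs, proportionally to their needs.

    jobs maps a job id to (pending_tasks, avg_task_seconds, seconds_to_deadline).
    """
    demand = {job: slots_needed(*spec) for job, spec in jobs.items()}
    total = sum(demand.values())
    if total <= cluster_slots:
        return {job: math.ceil(d) for job, d in demand.items()}   # every job fits
    # Over-subscribed cluster: scale each job's share down proportionally.
    return {job: int(cluster_slots * d / total) for job, d in demand.items()}

print(allocate_slots({"job-a": (400, 30, 600), "job-b": (100, 20, 1200)}, 24))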


International Middleware Conference | 2011

Resource-aware adaptive scheduling for MapReduce clusters

Jorda Polo; Claris Castillo; David Carrera; Yolanda Becerra; Ian Whalley; Malgorzata Steinder; Jordi Torres; Eduard Ayguadé

We present a resource-aware scheduling technique for MapReduce multi-job workloads that aims at improving resource utilization across machines while observing completion time goals. Existing MapReduce schedulers define a static number of slots to represent the capacity of a cluster, creating a fixed number of execution slots per machine. This abstraction works for homogeneous workloads, but fails to capture the different resource requirements of individual jobs in multi-user environments. Our technique leverages job profiling information to dynamically adjust the number of slots on each machine, as well as workload placement across them, to maximize the resource utilization of the cluster. In addition, our technique is guided by user-provided completion time goals for each job. Source code of our prototype is available at [1].
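
A minimal sketch of the resource-aware idea, deriving per-machine task capacity from job resource profiles instead of a fixed slot count; the profile fields, machine capacities, and greedy placement rule below are illustrative assumptions rather than the paper's algorithm.

MACHINE = {"cpu": 8.0, "mem_gb": 32.0}            # hypothetical machine capacity

JOB_PROFILES = {                                   # per-task demands obtained from job profiling
    "sort":  {"cpu": 1.0, "mem_gb": 2.0},
    "index": {"cpu": 0.5, "mem_gb": 6.0},
}

def tasks_that_fit(free, profile):
    """How many tasks with this resource profile still fit on the machine."""
    return int(min(free["cpu"] / profile["cpu"],
                   free["mem_gb"] / profile["mem_gb"]))

def place_greedily(machine, demands):
    """Greedily place pending tasks (job -> count) until a resource is exhausted."""
    free = dict(machine)
    placement = {}
    for job, pending in sorted(demands.items()):
        profile = JOB_PROFILES[job]
        n = min(pending, tasks_that_fit(free, profile))
        placement[job] = n
        free["cpu"] -= n * profile["cpu"]
        free["mem_gb"] -= n * profile["mem_gb"]
    return placement

print(place_greedily(MACHINE, {"sort": 10, "index": 10}))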


IEEE Network | 2013

All-optical packet/circuit switching-based data center network for enhanced scalability, latency, and throughput

Jordi Perelló; Salvatore Spadaro; Sergio Ricciardi; Davide Careglio; Shuping Peng; Reza Nejabati; Georgios Zervas; Dimitra Simeonidou; Alessandro Predieri; Matteo Biancani; Harm J. S. Dorren; Stefano Di Lucente; Jun Luo; Nicola Calabretta; Giacomo Bernini; Nicola Ciulli; Jose Carlos Sancho; Steluta Iordache; Montse Farreras; Yolanda Becerra; Chris Liou; Iftekhar Hussain; Yawei Yin; Lei Liu; Roberto Proietti

Applications running inside data centers are enabled through the cooperation of thousands of servers arranged in racks and interconnected through the data center network. Current DCN architectures based on electronic devices are neither scalable to face the massive growth of DCs, nor flexible enough to efficiently and cost-effectively support highly dynamic application traffic profiles. The FP7 European Project LIGHTNESS foresees extending the capabilities of today's electrical DCNs through the introduction of optical packet switching and optical circuit switching paradigms, realizing together an advanced and highly scalable DCN architecture for ultra-high-bandwidth and low-latency server-to-server interconnection. This article reviews the current DC and high-performance computing (HPC) outlooks, followed by an analysis of the main requirements for future DCs and HPC platforms. As the key contribution of the article, the LIGHTNESS DCN solution is presented, deeply elaborating on the envisioned DCN data plane technologies, as well as on the unified SDN-enabled control plane architectural solution that will empower OPS and OCS transmission technologies with superior flexibility, manageability, and customizability.


Grid Computing | 2010

Accurate energy accounting for shared virtualized environments using PMC-based power modeling techniques

Ramon Bertran; Yolanda Becerra; David Carrera; Vicenç Beltran; Marc Gonzàlez; Xavier Martorell; Jordi Torres; Eduard Ayguadé

Virtualized infrastructure providers demand new methods to increase the accuracy of the accounting models used to charge their customers. Future data centers will be composed of many-core systems that will each host a large number of virtual machines (VMs). While resource utilization accounting can be achieved with existing system tools, energy accounting is a complex task when per-VM granularity is the goal. In this paper, we propose a methodology that brings new opportunities to energy accounting by adding an unprecedented degree of accuracy to the per-VM measurements. We present a system that leverages CPU and memory power models based on performance monitoring counters (PMCs) to perform energy accounting in virtualized systems. The contribution of this paper is twofold. First, we show that PMC-based power modeling methods remain valid in virtualized environments. Second, we introduce a novel methodology for accounting of the energy consumed in virtualized systems. Overall, the results for an Intel® Core™ 2 Duo show errors in energy estimation below 5%. Such an approach brings flexibility to the chargeback models used by service and infrastructure providers. For instance, we show that VMs executed for the same amount of time present differences of more than 20% in energy consumption, even when only the consumption of the CPU and the memory is taken into account.
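
A minimal sketch of how a PMC-based linear power model could drive per-VM energy accounting, in the spirit of the approach described above; the counter names, model coefficients, and sampling interval are invented for the example and are not taken from the paper.

IDLE_WATTS = 20.0
WEIGHTS = {"instructions": 2.0e-9, "llc_misses": 1.5e-7}    # watts per event/second (invented)

def power_from_pmcs(rates):
    """Estimate instantaneous power (W) from per-second PMC event rates."""
    # Attributing the full static power to each VM is a deliberate simplification here.
    return IDLE_WATTS + sum(WEIGHTS[c] * rates.get(c, 0.0) for c in WEIGHTS)

def vm_energy(samples, interval_s=1.0):
    """Integrate per-VM energy (J) from a stream of per-VM PMC samples.

    samples: list of dicts mapping vm id -> {counter name: events per second}
    """
    energy = {}
    for sample in samples:
        for vm, rates in sample.items():
            energy[vm] = energy.get(vm, 0.0) + power_from_pmcs(rates) * interval_s
    return energy

trace = [{"vm1": {"instructions": 2.0e9, "llc_misses": 1.0e6},
          "vm2": {"instructions": 5.0e8, "llc_misses": 8.0e6}}] * 60
print(vm_energy(trace))   # joules attributed to each VM over a 60-second window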


IEEE Transactions on Network and Service Management | 2013

Deadline-Based MapReduce Workload Management

Jorda Polo; Yolanda Becerra; David Carrera; Malgorzata Steinder; Ian Whalley; Jordi Torres; Eduard Ayguadé

This paper presents a scheduling technique for multi-job MapReduce workloads that is able to dynamically build performance models of the executing workloads and then use these models for scheduling purposes. This ability is leveraged to adaptively manage workload performance while observing and taking advantage of the particulars of the execution environment of modern data analytics applications, such as hardware heterogeneity and distributed storage. The technique targets a highly dynamic environment in which new jobs can be submitted at any time and in which MapReduce workloads share physical resources with other workloads; thus, the actual amount of resources available for applications can vary over time. Beyond the formulation of the problem and the description of the algorithm and technique, a working prototype (called the Adaptive Scheduler) has been implemented. Using the prototype and medium-sized clusters (on the order of tens of nodes), the following aspects have been studied separately: the scheduler's ability to meet high-level performance goals guided only by user-defined completion time goals; the scheduler's ability to favor data locality in the scheduling algorithm; and the scheduler's ability to deal with hardware heterogeneity, which introduces hardware affinity and relative performance characterization for those applications that can benefit from executing on specialized processors.
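
The data-locality preference mentioned above can be illustrated with a small, hypothetical placement rule: when a node frees a slot, pick a pending task whose input block is stored on that node. The data structures and function below are assumptions made for illustration, not the Adaptive Scheduler's code.

def pick_task(node, pending_tasks, block_locations):
    """Return the id of the next task to run on `node`, preferring local input data.

    pending_tasks: iterable of (task_id, input_block) pairs
    block_locations: dict mapping input_block -> set of nodes holding a replica
    """
    fallback = None
    for task_id, block in pending_tasks:
        if node in block_locations.get(block, set()):
            return task_id          # node-local input: best choice
        if fallback is None:
            fallback = task_id      # remember the first non-local candidate
    return fallback                 # no local task pending: run a remote one

pending = [("t1", "blk-7"), ("t2", "blk-3")]
locations = {"blk-7": {"node-b"}, "blk-3": {"node-a", "node-c"}}
print(pick_task("node-a", pending, locations))   # -> "t2", whose block is local to node-a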


International Conference on Parallel Processing | 2009

Speeding Up Distributed MapReduce Applications Using Hardware Accelerators

Yolanda Becerra; Vicenç Beltran; David Carrera; Marc Gonzàlez; Jordi Torres; Eduard Ayguadé

In an attempt to increase the performance/cost ratio, large compute clusters are becoming heterogeneous at multiple levels: from asymmetric processors, to different system architectures, operating systems and networks. Exploiting the intrinsic multi-level parallelism present in such a complex execution environment has become a challenging task using traditional parallel and distributed programming models. As a result, an increasing need for novel approaches to exploiting parallelism has arisen in these environments. MapReduce is a data-driven programming model originally proposed by Google back in 2004 as a flexible alternative to the existing models, especially devoted to hiding the complexity of both developing and running massively distributed applications in large compute clusters. In some recent works, the MapReduce model has also been used to exploit parallelism in other, non-distributed environments, such as multi-cores, heterogeneous processors and GPUs. In this paper we introduce a novel approach for exploiting the heterogeneity of a Cell BE cluster by linking an existing MapReduce runtime implementation for distributed clusters with a runtime that exploits the parallelism of the Cell BE nodes. The novel contribution of this work is the design and evaluation of a MapReduce execution environment that effectively exploits the parallelism existing at both the Cell BE cluster level and the heterogeneous processor level.
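
A loose, hypothetical sketch of the two-level idea: an outer MapReduce stage distributes partitions across nodes, while each node exploits its own internal parallelism (emulated here with a process pool; on a Cell BE node this role would be played by the node-level runtime driving the SPEs). This illustrates the concept only, not the paper's runtime.

from collections import Counter
from multiprocessing import Pool

def node_map(partition):
    """Node-level 'map': process one partition in parallel on the local cores."""
    with Pool() as workers:
        per_chunk = workers.map(Counter, (line.split() for line in partition))
    result = Counter()
    for c in per_chunk:
        result.update(c)
    return result

def cluster_reduce(node_results):
    """Cluster-level 'reduce': merge the partial results coming from every node."""
    total = Counter()
    for partial in node_results:
        total.update(partial)
    return total

if __name__ == "__main__":
    partitions = [["a b a", "c a"], ["b b", "a c c"]]        # one input partition per node
    print(cluster_reduce(node_map(p) for p in partitions))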


Nucleic Acids Research | 2016

BIGNASim: a NoSQL database structure and analysis portal for nucleic acids simulation data

Pau Andrio; Cesare Cugnasco; Laia Codó; Yolanda Becerra; Pablo D. Dans; Federica Battistini; Jordi Torres; Ramon Goni; Modesto Orozco; Josep Ll. Gelpí

Molecular dynamics (MD) simulation is, just behind genomics, the bioinformatics tool that generates the largest amounts of data and that uses the largest amount of CPU time in supercomputing centres. MD trajectories are obtained after months of calculations, analysed in situ, and in practice forgotten. Several projects to generate stable trajectory databases have been developed for proteins, but no equivalent exists in the nucleic acids world. We present here a novel database system to store MD trajectories and analyses of nucleic acids. The initial data set available consists mainly of the benchmark of the new molecular dynamics force field, parmBSC1. It contains 156 simulations, with over 120 μs of total simulation time. A deposition protocol is available to accept the submission of new trajectory data. The database is based on the combination of two NoSQL engines, Cassandra for storing trajectories and MongoDB for storing analysis results and simulation metadata. The analyses available include backbone geometries, helical analysis, NMR observables and a variety of mechanical analyses. Individual trajectories and combined meta-trajectories can be downloaded from the portal. The system is accessible through http://mmb.irbbarcelona.org/BIGNASim/. Supplementary material is also available online at http://mmb.irbbarcelona.org/BIGNASim/SuppMaterial/.
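
A minimal sketch of the two-engine design described above, with trajectory frames read from Cassandra and metadata/analyses from MongoDB; the keyspace, table, collection, and field names are invented for illustration and are not BIGNASim's actual schema.

from cassandra.cluster import Cluster      # pip install cassandra-driver
from pymongo import MongoClient            # pip install pymongo

def fetch_frame_coordinates(sim_id, frame):
    """Read one trajectory frame from Cassandra (bulk, append-heavy data)."""
    session = Cluster(["127.0.0.1"]).connect("trajectories")        # hypothetical keyspace
    row = session.execute(
        "SELECT coordinates FROM frames WHERE sim_id = %s AND frame = %s",
        (sim_id, frame),
    ).one()
    return row.coordinates if row else None

def fetch_simulation_metadata(sim_id):
    """Read simulation metadata and analyses from MongoDB (rich, queryable documents)."""
    db = MongoClient("mongodb://localhost:27017")["bignasim"]        # hypothetical database name
    return db["simulations"].find_one({"sim_id": sim_id},
                                      {"force_field": 1, "analyses": 1, "_id": 0})

if __name__ == "__main__":
    print(fetch_simulation_metadata("parmBSC1-benchmark-001"))       # hypothetical simulation id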


International Conference on Big Data | 2014

ALOJA: A systematic study of Hadoop deployment variables to enable automated characterization of cost-effectiveness

Nicolas Poggi; David Carrera; Aaron Call; Sergio Mendoza; Yolanda Becerra; Jordi Torres; Eduard Ayguadé; Fabrizio Gagliardi; Jesús Labarta; Rob Reinauer; Nikola Vujic; Daron Green; José A. Blakeley

This article presents the ALOJA project, an initiative to produce mechanisms for an automated characterization of the cost-effectiveness of Hadoop deployments, and reports its initial results. ALOJA is the latest phase of a long-term collaborative engagement between BSC and Microsoft which, over the past 6 years, has explored a range of different aspects of computing systems, software technologies and performance profiling. While Hadoop has become the de facto platform for Big Data deployments over the last 5 years, little is still understood about how the different layers of software and hardware deployment options affect its performance. Early ALOJA results show that Hadoop's runtime performance, and therefore its price, are critically affected by relatively simple software and hardware configuration choices, e.g., the number of mappers, compression, or volume configuration. Project ALOJA presents a vendor-neutral repository featuring over 5000 Hadoop runs, a test bed, and tools to evaluate the cost-effectiveness of different hardware, parameter tuning, and Cloud services for Hadoop. As few organizations have the time or performance-profiling expertise, we expect our growing repository will help Hadoop customers meet their Big Data application needs. ALOJA seeks to provide both knowledge and an online service with which users can make better-informed configuration choices for their Hadoop compute infrastructure, whether on-premises or cloud-based. The initial version of ALOJA's Web application and sources are available at http://hadoop.bsc.es.
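
A hypothetical example of the kind of cost-effectiveness comparison such a run repository enables: group benchmark runs by configuration knobs and compare runtime and cost. The CSV file and its column names are assumptions made for the example, not ALOJA's export format.

import pandas as pd

runs = pd.read_csv("aloja_runs.csv")          # hypothetical export of the run repository

summary = (
    runs.groupby(["compression", "mappers", "storage"])      # assumed configuration columns
        .agg(mean_runtime_s=("runtime_s", "mean"),
             mean_cost_usd=("cost_usd", "mean"),
             runs=("runtime_s", "size"))
        .sort_values("mean_cost_usd")
)
print(summary.head(10))       # cheapest configurations first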


International Journal of High Performance Computing Applications | 2017

PyCOMPSs: Parallel computational workflows in Python

Enric Tejedor; Yolanda Becerra; Guillem Alomar; Anna Queralt; Rosa M. Badia; Jordi Torres; Toni Cortes; Jesús Labarta

The use of the Python programming language for scientific computing has been gaining momentum in recent years. The fact that it is compact and readable and its complete set of scientific libraries are two important characteristics that favour its adoption. Nevertheless, Python still lacks a solution for easily parallelizing generic scripts on distributed infrastructures, since the current alternatives mostly require the use of APIs for message passing or are restricted to embarrassingly parallel computations. In that sense, this paper presents PyCOMPSs, a framework that facilitates the development of parallel computational workflows in Python. In this approach, the user programs her script in a sequential fashion and decorates the functions to be run as asynchronous parallel tasks. A runtime system is in charge of exploiting the inherent concurrency of the script, detecting the data dependencies between tasks and spawning them to the available resources. Furthermore, we show how this programming model can be built on top of a Big Data storage architecture, where the data stored in the backend is abstracted and accessed from the application in the form of persistent objects.
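
A minimal example of the programming model described above: sequential-looking Python in which decorated functions become asynchronous tasks and a final synchronization fetches the result. Decorator options can vary across PyCOMPSs versions, so treat the exact imports and arguments as indicative.

from pycompss.api.task import task
from pycompss.api.api import compss_wait_on

@task(returns=1)
def square(x):
    return x * x

@task(returns=1)
def add(a, b):
    return a + b

def main():
    # Each call returns a future; the runtime schedules tasks as their dependencies allow.
    partials = [square(i) for i in range(8)]
    total = partials[0]
    for p in partials[1:]:
        total = add(total, p)          # chain of dependent reduction tasks
    total = compss_wait_on(total)      # synchronize and fetch the final value
    print(total)                       # 0^2 + 1^2 + ... + 7^2 = 140

if __name__ == "__main__":
    main()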


Euromicro Workshop on Parallel and Distributed Processing | 1998

Experiences on implementing PARMACS macros to run the SPLASH-2 suite on multiprocessors

Ernest Artiaga; Xavier Martorell; Yolanda Becerra; Nacho Navarro

In order to evaluate the quality of parallel systems, it is necessary to know how parallel programs behave. The SPLASH-2 applications provide a realistic workload for such systems. We have therefore produced different implementations of the PARMACS macros used by the SPLASH-2 applications, based on several execution and synchronization models, ranging from classical Unix processes to multithreaded systems. The implementations have been tested on two different multiprocessor systems (Digital and Silicon Graphics). As the parallel constructs in the SPLASH-2 applications are limited to those provided by PARMACS, we can easily study the overhead introduced by synchronization and parallelism management.

Collaboration


Dive into Yolanda Becerra's collaboration.

Top Co-Authors

Jordi Torres
Polytechnic University of Catalonia

Eduard Ayguadé
Barcelona Supercomputing Center

David Carrera
Polytechnic University of Catalonia

Cesare Cugnasco
Barcelona Supercomputing Center

Jorda Polo
Polytechnic University of Catalonia

Nacho Navarro
Polytechnic University of Catalonia

Toni Cortes
Polytechnic University of Catalonia

Xavier Martorell
Polytechnic University of Catalonia

Jesús Labarta
Barcelona Supercomputing Center