
Publications


Featured research published by Renato L. F. Cunha.


Modeling, Analysis, and Simulation of Computer and Telecommunication Systems | 2014

Evaluating Auto-scaling Strategies for Cloud Computing Environments

Marco Aurelio Stelmar Netto; Carlos Henrique Cardonha; Renato L. F. Cunha; Marcos Dias de Assunção

Auto-scaling is a key feature of clouds, responsible for adjusting the number of available resources to meet service demand. Resource pool modifications are necessary to keep performance indicators, such as utilisation level, between user-defined lower and upper bounds. Auto-scaling strategies that are not properly configured for the characteristics of the user workload may lead to unacceptable QoS and to large resource waste. There is therefore a need for a deeper understanding of auto-scaling strategies and of how they should be configured to minimise these problems. In this work, we evaluate various auto-scaling strategies using log traces from a production Google data centre cluster comprising millions of jobs. Using utilisation level as the performance indicator, our results show that proper management of auto-scaling parameters reduces the difference between the target utilisation interval and the actual values; we define this difference as the Auto-scaling Demand Index. We also present a set of lessons from this study to help cloud providers build recommender systems for auto-scaling operations.
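The threshold policy the abstract describes, and a metric in the spirit of its Auto-scaling Demand Index, can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the function names, the default bounds, and the exact distance formula are assumptions.

```python
import math

def scale(servers, load, lower=0.4, upper=0.8):
    """Return an adjusted server count so that utilisation (load / servers)
    falls back inside the user-defined [lower, upper] interval."""
    util = load / servers
    if util > upper:                        # overloaded: add capacity
        return math.ceil(load / upper)
    if util < lower and servers > 1:        # underloaded: release capacity
        return max(1, math.floor(load / lower))
    return servers                          # already within bounds

def demand_index(utilisations, lower=0.4, upper=0.8):
    """Average distance of observed utilisation samples from the target
    interval; zero when every sample lies inside the bounds."""
    dist = [max(0.0, lower - u, u - upper) for u in utilisations]
    return sum(dist) / len(dist)
```

A well-tuned strategy keeps `demand_index` close to zero; a misconfigured one lets utilisation drift far outside the bounds, which is the gap the paper measures.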


Future Generation Computer Systems | 2017

Job placement advisor based on turnaround predictions for HPC hybrid clouds

Renato L. F. Cunha; Eduardo Rocha Rodrigues; Leonardo P. Tizzei; Marco Aurelio Stelmar Netto

Several companies and research institutes are moving their CPU-intensive applications to hybrid High Performance Computing (HPC) cloud environments. Such a shift depends on software systems that help users decide where a job should be placed, considering both execution time and the queue wait time to access on-premise clusters. Relying blindly on turnaround prediction techniques negatively affects response times in HPC cloud environments. This paper introduces a tool that makes job placement decisions in HPC hybrid cloud environments while taking into account the inaccuracy of execution- and waiting-time predictions. We used job traces from real supercomputing centers to run our experiments, and compared the performance between environments using real speedup curves. We also extended a state-of-the-art machine-learning-based predictor to work with data from the cluster scheduler. Our main findings are: (i) depending on workload characteristics, there is a turning point where predictions should be disregarded in favor of a more conservative decision to minimize job turnaround times, and (ii) scheduler data plays a key role in improving predictions generated with machine learning over job trace data: our experiments showed prediction accuracy improvements of around 20%.
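The advisor's core decision, including the "turning point" at which predictions are disregarded, can be sketched like this. All names, the error threshold, and the conservative fallback are assumptions made for illustration; the paper's actual decision procedure is richer.

```python
def place_job(pred_wait, pred_run, cloud_run, pred_error, max_error=0.5):
    """Decide where to place a job: 'cloud' or 'on-premise'.

    pred_wait / pred_run: predicted queue wait and runtime on-premise.
    cloud_run: expected execution time on cloud resources.
    pred_error: the predictor's estimated relative error; beyond the
    turning point (max_error) the prediction is disregarded in favour
    of a conservative default (stay on-premise, keep the queue slot).
    """
    if pred_error > max_error:
        return "on-premise"                 # prediction untrustworthy
    on_prem_turnaround = pred_wait + pred_run
    return "cloud" if cloud_run < on_prem_turnaround else "on-premise"
```

For example, a job predicted to wait an hour and run for thirty minutes on-premise would go to the cloud if it finishes there sooner, but only while the predictor's error estimate stays below the threshold.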


IEEE Computer | 2015

Deciding When and How to Move HPC Jobs to the Cloud

Marco Aurelio Stelmar Netto; Renato L. F. Cunha; Nicole Sultanum

Now used for high-performance computing applications, the cloud presents a challenge for users who must decide, based on efficiency and cost-effectiveness, when and how to run jobs on cloud-based resources versus when to use on-premise clusters. The authors propose a decision-support system to help make these determinations.


International Conference on Cloud Computing | 2014

Exploiting User Patience for Scaling Resource Capacity in Cloud Services

Renato L. F. Cunha; Marcos Dias de Assunção; Carlos Henrique Cardonha; Marco Aurelio Stelmar Netto

An important feature of cloud computing is its elasticity, that is, the ability to have resource capacity dynamically modified according to the current system load. Auto-scaling is challenging because it must account for two conflicting objectives: minimising the resource capacity made available to users and maximising QoS, which typically translates into short response times. Current auto-scaling techniques are based solely on load forecasts and ignore the perception users have of cloud services. As a consequence, providers tend to provision a volume of resources significantly larger than necessary to keep users satisfied. In this article, we propose a scheduling algorithm and an auto-scaling triggering technique that exploit user patience in order to identify the critical times when auto-scaling is needed and the appropriate amount of capacity by which the cloud platform should either extend or shrink. The proposed technique assists service providers in reducing costs related to resource allocation while keeping the same QoS for users. Our experiments show that it is possible to reduce resource-hours by up to approximately 8% compared to auto-scaling based on system utilisation.
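A minimal sketch of a patience-aware trigger, assuming patience is modelled as the longest response time a user tolerates: instead of reacting to raw utilisation, scaling fires only when too many users are kept waiting past their patience. The function and its parameters are illustrative, not the paper's exact formulation.

```python
def needs_scaling(response_times, patience, critical_fraction=0.1):
    """Trigger scale-out only when the share of requests whose response
    time exceeds user patience passes a critical fraction.

    response_times: recent per-request response times (seconds).
    patience: tolerated response time before a user gives up (seconds).
    """
    impatient = sum(1 for t in response_times if t > patience)
    return impatient / len(response_times) > critical_fraction
```

Because short excursions above the utilisation target do not upset users who are still within their patience, this trigger fires less often than a purely utilisation-based one, which is where the resource-hour savings come from.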


arXiv: Distributed, Parallel, and Cluster Computing | 2016

Helping HPC users specify job memory requirements via machine learning

Eduardo Rocha Rodrigues; Renato L. F. Cunha; Marco Aurelio Stelmar Netto; Michael J. Spriggs

Resource allocation in High Performance Computing (HPC) settings is still not easy for end users due to the wide variety of application and environment configuration options. Users have difficulty estimating the number of processors and the amount of memory required by their jobs, selecting the queue and partition, and estimating when job output will be available in order to plan their next experiments. Apart from wasting infrastructure resources through wrong allocation decisions, overall user response time can also be negatively affected. Techniques that exploit batch scheduler systems to predict the waiting time and runtime of user jobs have already been proposed. However, we observed that such techniques are not suitable for predicting job memory usage. In this paper we introduce a tool that helps users predict their memory requirements using machine learning. We describe the integration of the tool with a batch scheduler system, discuss how batch scheduler log data can be exploited to generate memory usage predictions through machine learning, and present results from two production systems containing thousands of jobs.
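The idea of learning memory requirements from scheduler logs can be illustrated with a toy nearest-neighbour predictor: average the peak memory of the most similar past jobs. The feature set, the similarity measure, and the averaging are placeholders; the paper integrates a real machine-learning model with the batch scheduler.

```python
def predict_memory(job, history, k=3):
    """Predict a job's peak memory from scheduler-log history.

    job: dict of numeric job features (e.g. requested CPUs, wall time).
    history: list of (features, peak_memory_mb) pairs from past jobs.
    Returns the mean peak memory of the k most similar past jobs.
    """
    def distance(a, b):
        keys = set(a) | set(b)
        return sum((a.get(x, 0) - b.get(x, 0)) ** 2 for x in keys) ** 0.5

    nearest = sorted(history, key=lambda h: distance(job, h[0]))[:k]
    return sum(mem for _, mem in nearest) / len(nearest)
```

A scheduler integration would call such a predictor at submission time and suggest a memory request, sparing the user from guessing and the cluster from over-allocation.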


Future Generation Computer Systems | 2018

JobPruner: A machine learning assistant for exploring parameter spaces in HPC applications

Bruno Silva; Marco Aurelio Stelmar Netto; Renato L. F. Cunha

High Performance Computing (HPC) applications are essential for scientists and engineers to create and understand models and their properties. These professionals depend on the execution of large sets of computational jobs that explore combinations of parameter values. Avoiding the execution of unnecessary jobs brings not only speed to these experiments but also reductions in infrastructure usage, which is particularly important given the shift of these applications to HPC cloud platforms. Our hypothesis is that data generated by past experiments can help users identify such jobs. To address this hypothesis we need to understand the similarity levels among multiple experiments necessary for job-elimination decisions and the steps required to automate this process. In this paper we present a study and a machine-learning-based tool called JobPruner to support parameter exploration in HPC experiments. The tool was evaluated with three real-world use cases from different domains, including seismic analysis and agronomy. We observed that the tool eliminated 93% of the jobs in a single experiment while improving quality in most scenarios. In addition, reductions in job executions were possible even when past experiments had low correlations.
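The pruning idea can be sketched as follows: skip parameter combinations whose outcome, judged from correlated past experiments, is unlikely to beat the best result seen so far. This is a hypothetical illustration; the scoring scheme, the margin, and the function names are assumptions, not JobPruner's actual algorithm.

```python
def prune_jobs(candidates, past_scores, best_so_far, margin=0.05):
    """Filter a parameter sweep using scores from correlated past runs.

    candidates: list of parameter tuples still to be executed.
    past_scores: mapping from parameter tuple to the score a correlated
    past experiment achieved for those parameters.
    Keep a candidate if it is unseen, or if its past score is within
    `margin` of the best score observed so far.
    """
    keep = []
    for params in candidates:
        score = past_scores.get(params)
        if score is None or score >= best_so_far * (1 - margin):
            keep.append(params)
    return keep
```

The quality/saving trade-off lives in `margin`: a tighter margin prunes more jobs at the risk of discarding a combination that would have performed well in the new experiment.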


ACM Computing Surveys | 2018

HPC Cloud for Scientific and Business Applications: Taxonomy, Vision, and Research Challenges

Marco Aurelio Stelmar Netto; Rodrigo N. Calheiros; Eduardo Rocha Rodrigues; Renato L. F. Cunha; Rajkumar Buyya

High performance computing (HPC) clouds are becoming an alternative to on-premise clusters for executing scientific applications and business analytics services. Most research efforts in HPC cloud aim to understand the cost benefit of moving resource-intensive applications from on-premise environments to public cloud platforms. Industry trends show that hybrid environments are the natural path to getting the best of both worlds: steady (and sensitive) workloads can run on on-premise resources, while peak demand can leverage remote resources in a pay-as-you-go manner. Nevertheless, there are plenty of questions to be answered in HPC cloud, ranging from how to extract the best performance from an unknown underlying platform to which services are essential to make its usage easier. Moreover, the discussion of the right pricing and contractual models to fit both small and large users is relevant for the sustainability of HPC clouds. This article presents a survey and taxonomy of efforts in HPC cloud and a vision of what we believe lies ahead, including a set of research challenges that, once tackled, can help advance businesses and scientific discoveries. This becomes particularly relevant in light of the fast-growing wave of new HPC applications coming from big data and artificial intelligence.


Future Generation Computer Systems | 2016

Impact of user patience on auto-scaling resource capacity for cloud services

Marcos Dias de Assunção; Carlos Henrique Cardonha; Marco Aurelio Stelmar Netto; Renato L. F. Cunha


International Conference on Service-Oriented Computing | 2013

Patience-Aware Scheduling for Cloud Services: Freeing Users from the Chains of Boredom

Carlos Henrique Cardonha; Marcos Dias de Assunção; Marco Aurelio Stelmar Netto; Renato L. F. Cunha; Carlos Queiroz


Archive | 2014

Auto-scaling Thresholds in Elastic Computing Environments

Carlos Henrique Cardonha; Marcos Dias de Assunção; Renato L. F. Cunha; Marco Aurelio Stelmar Netto
