Francisco Hernández-Rodriguez
Umeå University
Publications
Featured research published by Francisco Hernández-Rodriguez.
International Conference on Software Engineering | 2014
Cristian Klein; Martina Maggio; Karl-Erik Årzén; Francisco Hernández-Rodriguez
Self-adaptation is a first-class concern for cloud applications, which should be able to withstand diverse runtime changes. Variations occur simultaneously at the cloud infrastructure level (for example, hardware failures) and at the user workload level (flash crowds). However, robustly withstanding extreme variability requires costly hardware over-provisioning. In this paper, we introduce a self-adaptation programming paradigm called brownout. Using this paradigm, applications can be designed to robustly withstand unpredictable runtime variations without over-provisioning. The paradigm is based on optional code that can be dynamically deactivated through decisions based on control theory. We modified two popular web application prototypes, RUBiS and RUBBoS, with less than 170 lines of code to make them brownout-compliant. Experiments show that brownout self-adaptation dramatically improves the ability to withstand flash crowds and hardware failures.
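The core mechanism lends itself to a compact illustration. Below is a minimal sketch of the brownout idea, assuming a simple proportional dimmer update; the paper derives its controller from control theory, so the rule, gains, and all names here are hypothetical. Optional code runs with probability theta, which a feedback loop lowers when response times exceed the setpoint.

```python
import random

class BrownoutController:
    """Toy dimmer controller for the brownout paradigm.

    A minimal sketch: the proportional update and all parameters are
    illustrative assumptions, not the authors' implementation.
    """

    def __init__(self, target_rt=0.5, gain=0.1):
        self.target_rt = target_rt  # response-time setpoint (seconds)
        self.gain = gain            # proportional gain (assumed)
        self.theta = 1.0            # dimmer: probability of running optional code

    def update(self, measured_rt):
        # Lower the dimmer when the service is slower than its setpoint,
        # raise it again when there is spare capacity.
        self.theta += self.gain * (self.target_rt - measured_rt)
        self.theta = min(1.0, max(0.0, self.theta))

def handle_request(controller):
    response = {"mandatory": "product details"}
    if random.random() < controller.theta:
        # Optional code: dynamically deactivated under load.
        response["optional"] = "recommendations"
    return response
```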
Proceedings of the 2013 ACM Cloud and Autonomic Computing Conference | 2013
Mina Sedaghat; Francisco Hernández-Rodriguez; Erik Elmroth
An automated solution to the horizontal vs. vertical elasticity problem is central to making cloud autoscalers truly autonomous. Today's cloud autoscalers typically vary the allocated capacity by increasing and decreasing the number of virtual machines (VMs) of a predefined size (horizontal elasticity), not taking into account that as load varies it may be advantageous to vary not only the number but also the size of VMs (vertical elasticity). We analyze the price/performance effects achieved by different strategies for selecting VM sizes to handle increasing load, and we propose a cost-benefit approach to determine when to (partly) replace a current set of VMs with a different set. We evaluate our repacking approach in combination with different auto-scaling strategies. Our results show cost savings of 7% to 60% in the total resource utilization cost of our sample applications and workloads.
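As a rough illustration of the cost-benefit idea, the sketch below compares the savings a cheaper VM configuration would accumulate over a planning horizon against the one-time cost of switching to it. The linear model and all parameters are assumptions, not the paper's exact formulation.

```python
def should_repack(current_cost_rate, candidate_cost_rate,
                  repack_cost, horizon_hours):
    """Illustrative cost-benefit test: replace the current VM set with a
    candidate set only if the savings expected over the planning horizon
    outweigh the one-time repacking cost (e.g., migration overhead)."""
    savings = (current_cost_rate - candidate_cost_rate) * horizon_hours
    return savings > repack_cost

# Example: $0.40/h today, $0.28/h after repacking, $0.50 switch cost,
# 12 h horizon: savings of $1.44 exceed $0.50, so repacking pays off.
print(should_repack(0.40, 0.28, 0.50, 12))  # True
```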
ACM Computing Surveys | 2015
Olumuyiwa Ibidunmoye; Francisco Hernández-Rodriguez; Erik Elmroth
To meet stringent performance requirements, system administrators must effectively detect undesirable performance behaviours, identify potential root causes, and take adequate corrective measures. The problem of uncovering and understanding performance anomalies and their causes (bottlenecks) in different system and application domains is well studied. To assess progress and research trends and to identify open challenges, we have reviewed major contributions in the area and present our findings in this survey. Our approach provides an overview of anomaly detection and bottleneck identification research as it relates to the performance of computing systems. By identifying fundamental elements of the problem, we are able to categorize existing solutions based on multiple factors, such as the detection goals, the nature of applications and systems, system observability, and detection methods.
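As a concrete instance of the simplest class of detection methods such surveys cover, the sketch below flags samples that deviate more than k standard deviations from a sliding-window baseline. It is illustrative only and not tied to any particular system in the survey.

```python
from collections import deque
from statistics import mean, stdev

class ThresholdDetector:
    """Sliding-window threshold detector (illustrative): a sample is
    anomalous if it deviates more than k standard deviations from the
    recent baseline."""

    def __init__(self, window=60, k=3.0):
        self.history = deque(maxlen=window)
        self.k = k

    def observe(self, sample):
        anomalous = False
        if len(self.history) >= 2:
            mu, sigma = mean(self.history), stdev(self.history)
            anomalous = sigma > 0 and abs(sample - mu) > self.k * sigma
        self.history.append(sample)
        return anomalous

detector = ThresholdDetector(window=30, k=3.0)
latencies = [0.21, 0.19, 0.20, 0.22, 0.20, 1.90]  # last sample is a spike
print([detector.observe(x) for x in latencies])   # ..., True for the spike
```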
IEEE/ACM International Conference on Utility and Cloud Computing | 2014
Ewnetu Bayuh Lakew; Cristian Klein; Francisco Hernández-Rodriguez; Erik Elmroth
Resource provisioning in cloud computing is typically coarse-grained: for example, entire CPU cores may be allocated for periods of up to an hour. The Resource-as-a-Service cloud concept has been introduced to improve the efficiency of resource utilization in clouds. In this concept, resources are allocated in terms of CPU core fractions, with granularities of seconds. Such infrastructures could be built using existing technologies, such as lightweight virtualization with LXC, or by exploiting the Xen hypervisor's capacity for vertical elasticity. However, performance models for determining how much capacity to allocate to each application are currently lacking. To address this deficit, we evaluate two performance models for predicting mean response times: the previously proposed queue-length model and the novel inverse model. The models are evaluated using three applications under both open and closed system models. The inverse model reacted rapidly and remained stable even with targets as low as 0.5 seconds.
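One way to read the inverse model is that mean response time is roughly inversely proportional to allocated capacity; under that assumption, the capacity needed to meet a target follows directly, as sketched below. The bounds and names are hypothetical, and the paper's exact model may differ.

```python
def inverse_model_allocation(capacity_cores, measured_rt, target_rt,
                             min_cap=0.1, max_cap=16.0):
    """Sketch under the assumption RT ~ k / capacity: scale the current
    fractional-core allocation by the ratio of measured to target mean
    response time, clamped to plausible bounds (assumed values)."""
    new_capacity = capacity_cores * (measured_rt / target_rt)
    return min(max_cap, max(min_cap, new_capacity))

# Example: 2.0 cores with a measured mean of 1.0 s against a 0.5 s target
# suggests allocating 4.0 cores (fractional, second-granularity allocation).
print(inverse_model_allocation(2.0, 1.0, 0.5))  # 4.0
```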
Symposium on Reliable Distributed Systems | 2014
Cristian Klein; Alessandro Vittorio Papadopoulos; Manfred Dellkrantz; Jonas Dürango; Martina Maggio; Karl-Erik Årzén; Francisco Hernández-Rodriguez; Erik Elmroth
We focus on improving the resilience of cloud services (e.g., an e-commerce website) when correlated or cascading failures lead to computing capacity shortage. We study how to extend the classical cloud service architecture, composed of a load-balancer and replicas, with a recently proposed self-adaptive paradigm called brownout. Such services are able to reduce their capacity requirements by degrading user experience (e.g., disabling recommendations). Combining resilience with the brownout paradigm is to date an open practical problem. The issue is to ensure that replica self-adaptation does not confuse the load-balancing algorithm, overloading replicas that are already struggling with capacity shortage. For example, load-balancing strategies based on response times cannot decide which replicas should be selected, since response times are already controlled by the brownout paradigm. In this paper we propose two novel brownout-aware load-balancing algorithms. To test their practical applicability, we extended the popular lighttpd web server and load balancer, obtaining a production-ready implementation. Experimental evaluation shows that the approach enables cloud services to remain responsive despite cascading failures. Moreover, when compared to Shortest Queue First (SQF), believed to be near-optimal in the non-adaptive case, our algorithms improve user experience by 5%, with high statistical significance, while preserving response time predictability.
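A simplified flavour of brownout-aware balancing: instead of response times (already held near the setpoint by brownout), route by each replica's dimmer, treating a high dimmer as a signal of spare capacity. The weighting rule and field names below are assumptions, not the paper's two algorithms.

```python
import random

def pick_replica(replicas):
    """Weighted random selection favouring replicas with a higher dimmer
    (theta), i.e., those still serving optional content and therefore
    presumed to have spare capacity. Illustrative simplification."""
    weights = [max(r["theta"], 0.01) for r in replicas]  # avoid zero weights
    return random.choices(replicas, weights=weights, k=1)[0]

replicas = [{"host": "replica-a", "theta": 0.9},
            {"host": "replica-b", "theta": 0.3}]
# replica-a receives roughly three times the traffic of replica-b.
print(pick_replica(replicas)["host"])
```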
2014 International Conference on Cloud and Autonomic Computing | 2014
Luis Tomás; Cristian Klein; Johan Tordsson; Francisco Hernández-Rodriguez
Resource overbooking is an admission control technique to increase utilization in cloud environments. However, due to uncertainty about future application workloads, overbooking may result in overload situations and deteriorated performance. We mitigate this using brownout, a feedback approach to application performance steering that ensures graceful degradation during load spikes and thus avoids overload. Additionally, brownout management information is fed into the overbooking system, enabling improved reactive methods for overload situations. Our combined brownout-overbooking approach is evaluated using real-life interactive workloads and non-interactive batch applications. The results show that our approach improves resource utilization by 11 to 37 percentage points while keeping response times below the set target of 1 second, with negligible application degradation.
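To sketch how brownout information could inform admission decisions in an overbooking setting, consider the test below; the overbooking factor, the dimmer floor, and the decision rule are all assumptions rather than the paper's exact policy.

```python
def admit(requested, allocated, physical_capacity,
          overbooking_factor=1.3, dimmer_floor=0.6, current_dimmer=1.0):
    """Illustrative admission test: accept a new application only if the
    overbooked capacity still covers the total allocation AND running
    services are not already degrading (dimmer above a floor)."""
    fits = allocated + requested <= physical_capacity * overbooking_factor
    healthy = current_dimmer >= dimmer_floor
    return fits and healthy

# 8 cores physical, 9 already allocated (overbooked), 1.5 more requested:
print(admit(1.5, 9.0, 8.0, current_dimmer=0.9))  # False: 10.5 > 10.4
print(admit(1.0, 9.0, 8.0, current_dimmer=0.9))  # True: 10.0 <= 10.4
```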
2014 International Conference on Cloud and Autonomic Computing | 2014
Mina Sedaghat; Francisco Hernández-Rodriguez; Erik Elmroth
We address the problem of resource management for large-scale cloud data centers. We propose a peer-to-peer (P2P) resource management framework comprised of a number of agents overlaid as a scale-free network. The structural properties of the overlay, along with the division of management responsibilities among the agents, enable the framework to scale in terms of both the number of physical servers and the number of incoming virtual machine (VM) requests, while remaining computationally feasible. While our framework is intended for use in different cloud management functions, e.g., admission control or fault tolerance, we focus on the problem of resource allocation in clouds. We evaluate our approach by simulating a data center with 2,500 servers allocating resources to 20,000 incoming VM placement requests. The simulation results indicate that, by maintaining efficient request propagation, we can achieve promising levels of performance and scalability when dealing with large numbers of servers and placement requests.
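The request-propagation idea can be sketched as follows: each agent manages a partition of servers and, failing a local placement, forwards the request to a neighbour (here the best-connected one, exploiting scale-free hubs) until a hop budget runs out. The data structures and forwarding rule are assumptions for illustration.

```python
def place_request(agent, vm_demand, ttl=8):
    """Try to place a VM on the requesting agent's own servers; otherwise
    propagate to the best-connected neighbour. Illustrative sketch."""
    if ttl == 0 or agent is None:
        return None  # placement failed along this path
    for server in agent["servers"]:
        if server["free"] >= vm_demand:
            server["free"] -= vm_demand
            return server["id"]
    if not agent["neighbours"]:
        return None
    hub = max(agent["neighbours"], key=lambda n: len(n["neighbours"]))
    return place_request(hub, vm_demand, ttl - 1)

a2 = {"servers": [{"id": "s3", "free": 8.0}], "neighbours": []}
a1 = {"servers": [{"id": "s1", "free": 1.0}], "neighbours": [a2]}
print(place_request(a1, 4.0))  # s3: placed one hop away
```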
International Performance Computing and Communications Conference | 2012
Ewnetu Bayuh Lakew; Francisco Hernández-Rodriguez; Lei Xu; Erik Elmroth
We present a fully distributed solution for managing resource allocation for services running across multiple clusters in a large-scale cloud computing environment. Our solution allows individual services running across clusters to compete dynamically for allocations based on their rate of consumption while maintaining the global cloud-level allocation limits. The solution monitors resource consumption by services that are spread over a number of clusters. Global polls are triggered only when the allocated balance in a cluster falls below a threshold, and allocations are reassigned in a manner that avoids further immediate global polls. Our solution achieves scalability by minimizing global message exchanges, increases performance by distributing requests, and improves availability by avoiding a single point of failure. We perform a range of simulations to verify the accuracy of our approach, to validate our theoretical results, and to evaluate against competing approaches.
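The balance-and-threshold mechanism might look like the sketch below: each cluster draws from a locally held balance and only triggers a (costly) global poll when that balance falls below its threshold. The rebalancing callback and all names are hypothetical.

```python
class ClusterQuota:
    """Illustrative per-cluster allocation: consume locally, poll globally
    only when the remaining balance drops below the threshold."""

    def __init__(self, balance, threshold):
        self.balance = balance
        self.threshold = threshold
        self.polls = 0

    def consume(self, amount, global_poll):
        self.balance -= amount
        if self.balance < self.threshold:
            # Reassignment should leave the cluster comfortably above its
            # threshold, avoiding an immediate follow-up poll.
            self.balance += global_poll(self)
            self.polls += 1

cluster = ClusterQuota(balance=100.0, threshold=10.0)
for _ in range(95):
    cluster.consume(1.0, global_poll=lambda c: 50.0)
print(cluster.balance, cluster.polls)  # 55.0 1: most consumption needed no poll
```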
Conference on Decision and Control | 2014
Jonas Dürango; Manfred Dellkrantz; Martina Maggio; Cristian Klein; Alessandro Vittorio Papadopoulos; Francisco Hernández-Rodriguez; Erik Elmroth; Karl-Erik Årzén
Cloud applications are often subject to unexpected events like flash crowds and hardware failures. Without predictable behaviour, users may abandon an unresponsive application. This problem has been partially solved on two separate fronts: first, by adding a self-adaptive feature called brownout inside cloud applications to bound response times by modulating the user experience, and, second, by introducing replicas (copies of the application having the same functionalities) for redundancy and adding a load-balancer to direct incoming traffic.
IEEE International Conference on Cloud Computing Technology and Science | 2014
Mina Sedaghat; Francisco Hernández-Rodriguez; Erik Elmroth; Sarunas Girdzijauskas
Efficient resource utilization is one of the main concerns of cloud providers, as it has a direct impact on energy costs and thus their revenue. Virtual machine (VM) consolidation is one of the common techniques used by infrastructure providers to utilize their resources efficiently. However, when it comes to large-scale infrastructures, consolidation decisions become computationally complex, since VMs are multi-dimensional entities with changing demand and unknown lifetime, and users often overestimate their actual demand. These uncertainties require the system to make consolidation decisions continuously, in real time. In this work, we investigate a decentralized approach to VM consolidation using peer-to-peer (P2P) principles. We investigate the opportunities offered by P2P systems, as scalable and robust management structures, to address VM consolidation concerns. We present a P2P consolidation protocol that considers the dimensionality of resources and the dynamicity of the environment. The protocol benefits from concurrency and decentralization of control, and it uses a dimension-aware decision function for efficient consolidation. We evaluate the protocol through simulation of 100,000 physical machines and 200,000 VM requests. Results demonstrate the potential and advantages of using a P2P structure to make resource management decisions in large-scale data centers. They show that the P2P approach is feasible and scalable and produces resource utilization of 75% when the consolidation aim is 90%.
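One plausible form of a dimension-aware decision function is to score each host by how well its residual capacity vector aligns with the VM's multi-dimensional demand, so that no single dimension becomes the bottleneck. The cosine-similarity choice below is an assumption; the paper's exact function may differ.

```python
def dimension_aware_score(vm_demand, host_free):
    """Cosine similarity between the VM's demand vector and the host's
    residual capacity vector (CPU, memory, ...). Illustrative only."""
    dot = sum(d * f for d, f in zip(vm_demand, host_free))
    norm = (sum(d * d for d in vm_demand) ** 0.5
            * sum(f * f for f in host_free) ** 0.5)
    return dot / norm if norm else 0.0

vm = (4.0, 2.0)                      # demanded (CPU cores, GiB RAM)
hosts = [(6.0, 2.5), (2.0, 14.0)]    # residual capacity per host
best = max(hosts, key=lambda h: dimension_aware_score(vm, h))
print(best)  # the CPU-rich host aligns better with this CPU-heavy VM
```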