2019 18th International Symposium on Parallel and Distributed Computing (ISPDC) | 2019
Portfolio Scheduling for Managing Operational and Disaster-Recovery Risks in Virtualized Datacenters Hosting Business-Critical Workloads
Abstract
Cloud datacenters are increasingly hosting business workloads. Such long-running, on-demand workloads raise important challenges in datacenter operation, requiring efficient online scheduling of workloads with unprecedented characteristics under strict service level agreements (SLAs). In this work, we propose an approach to manage the risk of not meeting SLAs. Our approach is based on portfolio scheduling, which is an online scheduling technique that dynamically selects a scheduling algorithm from a set (portfolio), subject to a possibly changing utility function. Ours is the first datacenter-scheduling approach to consider operational and disaster-recovery risks. Using trace-based simulation with traces collected from a commercial multi-datacenter environment, we give evidence that portfolio scheduling is able to mitigate risks significantly better than its constituent scheduling algorithms and better than datacenter engineers.
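The selection step behind portfolio scheduling, as described in the abstract, can be illustrated with a minimal sketch. It is not the authors' implementation. The two placement policies (`first_fit`, `worst_fit`), the risk proxy (`peak_load_risk`), and the selector (`select_policy`) are hypothetical names introduced here for illustration only. The sketch assumes a single resource dimension and scores each policy by directly evaluating its placement, where a real portfolio scheduler would typically use simulation and repeat the selection periodically.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# A placement assigns each VM a host index, or None if it could not be placed.
Placement = List[Optional[int]]
Policy = Callable[[Sequence[float], Sequence[float]], Placement]
Utility = Callable[[Sequence[float], Sequence[float], Placement], float]


def first_fit(demands: Sequence[float], capacities: Sequence[float]) -> Placement:
    """Illustrative policy: place each VM on the first host with enough free capacity."""
    free = list(capacities)
    placement: Placement = []
    for demand in demands:
        host = next((i for i, f in enumerate(free) if f >= demand), None)
        if host is not None:
            free[host] -= demand
        placement.append(host)
    return placement


def worst_fit(demands: Sequence[float], capacities: Sequence[float]) -> Placement:
    """Illustrative policy: place each VM on the host with the most free capacity."""
    free = list(capacities)
    placement: Placement = []
    for demand in demands:
        host = max(range(len(free)), key=lambda i: free[i])
        if free[host] >= demand:
            free[host] -= demand
            placement.append(host)
        else:
            placement.append(None)
    return placement


def peak_load_risk(demands: Sequence[float], capacities: Sequence[float],
                   placement: Placement) -> float:
    """Hypothetical risk proxy (lower is better): highest host utilization,
    plus a penalty of 1.0 for every VM left unplaced."""
    used = [0.0] * len(capacities)
    unplaced = 0
    for demand, host in zip(demands, placement):
        if host is None:
            unplaced += 1
        else:
            used[host] += demand
    return max(u / c for u, c in zip(used, capacities)) + unplaced


def select_policy(portfolio: Dict[str, Policy], demands: Sequence[float],
                  capacities: Sequence[float],
                  utility: Utility) -> Tuple[str, Dict[str, float]]:
    """Score every policy in the portfolio and return the lowest-risk one.

    The utility is a parameter, so it can change between invocations."""
    scores = {name: utility(demands, capacities, policy(demands, capacities))
              for name, policy in portfolio.items()}
    best = min(scores, key=lambda name: scores[name])
    return best, scores


portfolio = {"first_fit": first_fit, "worst_fit": worst_fit}
best, scores = select_policy(portfolio, [4, 4, 2], [10, 10], peak_load_risk)
```

With these example inputs, `first_fit` packs all three VMs onto one host, while `worst_fit` spreads them across both, so the selector picks `worst_fit` under this particular risk proxy. A different utility, for example one that rewards consolidation, could reverse the choice, which is the reason for selecting the policy at run time rather than fixing it in advance.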