IEEE Transactions on Cloud Computing | 2019

Cost-Efficient Tasks and Data Co-Scheduling with AffordHadoop

Abstract


With today's massive jobs spanning thousands of tasks each, cost-optimality has become more important than ever. Modern distributed data processing paradigms can be significantly more sensitive to cost than to makespan, especially for long jobs deployed in commercial clouds. This paper posits that minimized dollar costs cannot be achieved unless data and tasks are scheduled simultaneously. We introduce the problem of cost-efficient co-scheduling for highly data-intensive jobs in the cloud, such as MapReduce jobs. We show that while the problem is solvable in polynomial time in some cases, the general problem is NP-hard. We propose to tackle the problem by using integer programming techniques coupled with heuristic reduction and optimization to enable a near-real-time solution. AffordHadoop, a pluggable co-scheduler for Hadoop, is implemented as an example of such a co-scheduler. AffordHadoop can save up to 48 percent of the overall dollar costs when compared to existing schedulers and provides significant flexibility in fine-tuning the cost-performance tradeoff.
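
The claim that data and tasks must be scheduled together can be illustrated with a toy example. The sketch below is not the paper's formulation or the AffordHadoop implementation: the cost model (compute plus storage plus transfer), the names `vm_a`, `store_x`, and so on, and all prices are hypothetical, chosen only to show how placing data first and tasks second can miss the cheapest joint choice. Exhaustive enumeration stands in for the integer program here, which is only feasible because the instance is tiny.

```python
from itertools import product

# Hypothetical prices, invented for illustration (not taken from the paper).
VM_COST = {"vm_a": 4.0, "vm_b": 6.0}           # dollars to run the task on each VM
STORE_COST = {"store_x": 1.0, "store_y": 2.0}  # dollars to hold the data block
TRANSFER_COST = {                              # dollars to read the block from a store on a VM
    ("store_x", "vm_a"): 5.0,
    ("store_x", "vm_b"): 1.0,
    ("store_y", "vm_a"): 0.0,
    ("store_y", "vm_b"): 3.0,
}


def total_cost(vm, store):
    """Dollar cost of running one task on `vm` with its data held in `store`."""
    return VM_COST[vm] + STORE_COST[store] + TRANSFER_COST[(store, vm)]


def co_schedule():
    """Pick the VM and the store together by checking every (vm, store) pair."""
    return min(product(VM_COST, STORE_COST), key=lambda pair: total_cost(*pair))


def data_then_task():
    """Place the data on the cheapest store first, then pick the best VM for it."""
    store = min(STORE_COST, key=STORE_COST.get)
    vm = min(VM_COST, key=lambda v: total_cost(v, store))
    return vm, store


if __name__ == "__main__":
    for name, plan in (("co-scheduled", co_schedule()), ("data first", data_then_task())):
        print(name, plan, total_cost(*plan))
```

With these made-up prices, the joint search selects `vm_a` with `store_y` at a cost of 6.0, while the data-first heuristic commits to the cheaper `store_x` and ends up at 8.0. Real instances add capacity limits and many tasks, which is where the search space grows too large for enumeration.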

Volume 7
Pages 719-732
DOI 10.1109/TCC.2017.2702661
Language English
Journal IEEE Transactions on Cloud Computing
