IEEE Transactions on Cloud Computing | 2019

Faster MapReduce Computation on Clouds Through Better Performance Estimation


Abstract


Processing Big Data in the cloud is increasingly common. An important issue for the efficient execution of Big Data processing jobs on a cloud platform is selecting the best-fitting virtual machine (VM) configuration(s) from the wide variety of choices that cloud providers offer. Wise selection of VM configurations can lead to better performance and lower cost and energy consumption. It is therefore crucial to explore the available configurations and choose those best suited to each MapReduce application. Profiling a given application on every configuration is costly in both time and energy. An alternative is to run the application on a subset of configurations (sample configurations) and estimate its performance on the remaining configurations from the values obtained on the samples. We show that the choice of these sample configurations strongly affects the accuracy of the subsequent estimates. Our Smart Configuration Selection (SCS) scheme chooses better representatives from among all configurations through a one-off analysis of the available performance figures of the benchmarks. This increases the accuracy with which missing values are estimated and, consequently, allows the highest-performing configuration to be identified more accurately. The results show that the SCS choice of sample configurations is very close to the best possible choice and reduces estimation error from 19.72 percent, under random configuration selection, to 11.58 percent. More importantly, using SCS estimates in a makespan minimization algorithm improves execution time by up to 36.03 percent compared with random sample selection.
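The abstract does not spell out how SCS picks its representatives or how missing values are estimated, so the following is only a minimal sketch of the general idea under our own assumptions, not the authors' SCS algorithm. It assumes a benchmark table of positive execution times (one row per benchmark, one column per VM configuration). A new application's time on an unsampled configuration is estimated by scaling its measured time on a sample configuration by the median benchmark time ratio between the two configurations (a swapped-in, ratio-based estimator). Sample configurations are chosen by greedy forward selection that minimizes leave-one-benchmark-out relative error. The functions `estimate`, `loo_error` and `select_samples` are hypothetical names introduced for illustration.

```python
from statistics import median


def estimate(bench, samples, measured):
    """Estimate an application's time on every configuration (hypothetical helper).

    bench:    list of benchmark rows; row[j] is the (positive) time on configuration j
    samples:  indices of the sample configurations
    measured: dict {config index: measured time of the application}
    """
    est = {}
    for j in range(len(bench[0])):
        if j in measured:
            est[j] = measured[j]
            continue
        best = None
        for s in samples:
            ratios = [row[j] / row[s] for row in bench]
            # Prefer the sample whose time ratio to j is most stable across benchmarks.
            spread = max(ratios) / min(ratios)
            if best is None or spread < best[0]:
                best = (spread, s, median(ratios))
        _, s, factor = best
        est[j] = measured[s] * factor
    return est


def loo_error(bench, samples):
    """Mean relative error when each benchmark in turn plays the 'new application'."""
    errors = []
    for i, row in enumerate(bench):
        others = bench[:i] + bench[i + 1:]
        measured = {s: row[s] for s in samples}
        est = estimate(others, samples, measured)
        for j, actual in enumerate(row):
            if j not in measured:
                errors.append(abs(est[j] - actual) / actual)
    return sum(errors) / len(errors) if errors else 0.0


def select_samples(bench, k):
    """Greedily add the configuration that most reduces leave-one-out error."""
    chosen = []
    for _ in range(k):
        candidates = [c for c in range(len(bench[0])) if c not in chosen]
        chosen.append(min(candidates, key=lambda c: loo_error(bench, chosen + [c])))
    return chosen


if __name__ == "__main__":
    # Configuration 1 always takes twice as long as 0; configuration 2 behaves differently.
    bench = [[10, 20, 30], [10, 20, 10], [10, 20, 20]]
    samples = select_samples(bench, 2)
    print(samples)  # [0, 2]
    print(estimate(bench, samples, {0: 7.0, 2: 9.0}))  # {0: 7.0, 1: 14.0, 2: 9.0}
```

In this toy table the second sample chosen is configuration 2, the one whose behavior cannot be predicted from the others. This mirrors the abstract's point that which configurations are sampled, not just how many, determines how accurate the later estimates are.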

Volume 7
Pages 770-783
DOI 10.1109/TCC.2017.2677906
Language English
Journal IEEE Transactions on Cloud Computing
