Publication


Featured research published by Dazhao Cheng.


International Middleware Conference | 2014

Improving MapReduce performance in heterogeneous environments with adaptive task tuning

Dazhao Cheng; Jia Rao; Yanfei Guo; Xiaobo Zhou

The deployment of MapReduce in datacenters and clouds presents several challenges in achieving good job performance. Compared to in-house dedicated clusters, datacenters and clouds often exhibit significant hardware and performance heterogeneity due to continuous server replacement and multi-tenant interference. As most MapReduce implementations assume homogeneous clusters, heterogeneity can cause significant load imbalance in task execution, leading to poor performance and low cluster utilization. Despite existing optimizations on task scheduling and load balancing, MapReduce still performs poorly on heterogeneous clusters. In this paper, we find that the homogeneous configuration of tasks on heterogeneous nodes can be an important source of load imbalance and thus cause poor performance. Tasks should be customized with different settings to match the capabilities of heterogeneous nodes. To this end, we propose an adaptive task tuning approach, Ant, that automatically finds the optimal settings for individual tasks running on different nodes. Ant works best for large jobs with multiple rounds of map task execution. It first configures tasks with randomly selected configurations and gradually improves task settings by reproducing the settings from the best-performing tasks and discarding poorly performing configurations. To accelerate task tuning and avoid being trapped in a local optimum, Ant uses genetic functions during task configuration. Experimental results on a heterogeneous cluster and a virtual cluster with varying hardware capabilities show that Ant improves the average job completion time by 23%, 11%, and 16% compared to stock Hadoop, customized Hadoop with industry recommendations, and a profiling-based configuration approach, respectively.
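
The abstract describes Ant's search loop only at a high level. Below is a minimal Python sketch of that reproduce-and-discard idea under stated assumptions: the parameter names, ranges, population size, and mutation scheme are all illustrative, and evaluate stands in for the per-task completion-time feedback Ant collects from running map tasks; none of this is Ant's actual implementation.

import random

# Hypothetical tunable parameters and ranges; the real Ant tunes
# per-task Hadoop settings such as buffer sizes and spill thresholds.
PARAM_RANGES = {
    "io.sort.mb": (50, 400),
    "io.sort.spill.percent": (0.5, 0.9),
    "shuffle.input.buffer.percent": (0.3, 0.9),
}

def random_config():
    return {k: random.uniform(lo, hi) for k, (lo, hi) in PARAM_RANGES.items()}

def crossover(a, b):
    # Reproduce a child configuration from two well-performing parents.
    return {k: random.choice((a[k], b[k])) for k in a}

def mutate(cfg, rate=0.1):
    # Randomly perturb parameters to avoid being trapped in a local optimum.
    for k, (lo, hi) in PARAM_RANGES.items():
        if random.random() < rate:
            cfg[k] = random.uniform(lo, hi)
    return cfg

def tune(evaluate, pop_size=8, rounds=10):
    # evaluate(cfg) -> completion time of a task run with cfg (lower is
    # better); in Ant this feedback comes from completed map-task waves.
    population = [random_config() for _ in range(pop_size)]
    for _ in range(rounds):
        ranked = sorted(population, key=evaluate)
        parents = ranked[: pop_size // 2]          # keep the best half
        children = [mutate(crossover(random.choice(parents),
                                     random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children            # drop poor performers
    return min(population, key=evaluate)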


IEEE Transactions on Parallel and Distributed Systems | 2017

iShuffle: Improving Hadoop Performance with Shuffle-on-Write

Yanfei Guo; Jia Rao; Dazhao Cheng; Xiaobo Zhou

Hadoop is a popular implementation of the MapReduce framework for running data-intensive jobs on clusters of commodity servers. Shuffle, the all-to-all input data fetching phase between the map phase and the reduce phase, can significantly affect job performance. However, the shuffle phase and reduce phase are coupled together in Hadoop, and the shuffle can only be performed by running reduce tasks. This leaves the potential parallelism between multiple waves of map and reduce tasks unexploited and wastes resources in multi-tenant Hadoop clusters, which significantly delays job completion. More importantly, Hadoop lacks the ability to schedule tasks efficiently and to mitigate the data distribution skew among reduce tasks, which further degrades job performance. In this work, we propose to decouple shuffle from reduce tasks and convert it into a platform service provided by Hadoop. We present iShuffle, a user-transparent shuffle service that proactively pushes map output data to nodes via a novel shuffle-on-write operation and flexibly schedules reduce tasks considering workload balance. Experimental results with representative workloads and a Facebook workload trace show that iShuffle reduces job completion time by as much as 29.6 and 34 percent in single-user and multi-user clusters, respectively.
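
The key mechanism here is the shuffle-on-write operation. The following sketch illustrates the push-instead-of-pull idea in Python under assumptions: ShuffleOnWriteService, node_for_partition, and send_to_node are hypothetical stand-ins for iShuffle's placement and transport components, not its real interfaces.

import collections

class ShuffleOnWriteService:
    """Push map output partitions as they are written (sketch)."""

    def __init__(self, node_for_partition, send_to_node):
        self.node_for_partition = node_for_partition   # partition id -> node
        self.send_to_node = send_to_node               # (node, data) -> None
        self.pushed = collections.defaultdict(list)

    def on_map_output_write(self, partition, data):
        # Called whenever a map task writes an output partition; pushing
        # here overlaps shuffle with the remaining map waves instead of
        # waiting for a reduce task to pull the data later.
        node = self.node_for_partition(partition)
        self.send_to_node(node, data)
        self.pushed[node].append(partition)

# Toy usage with a hash-based placement and a no-op transport.
svc = ShuffleOnWriteService(lambda p: f"node{p % 3}",
                            lambda node, data: None)
svc.on_map_output_write(partition=4, data=b"...")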


International Parallel and Distributed Processing Symposium | 2015

Resource and Deadline-Aware Job Scheduling in Dynamic Hadoop Clusters

Dazhao Cheng; Jia Rao; Changjun Jiang; Xiaobo Zhou

As Hadoop is becoming increasingly popular in large-scale data analysis, there is a growing need for providing predictable services to users who have strict requirements on job completion times. While earliest-deadline-first (EDF) style algorithms are popular for guaranteeing job deadlines in real-time systems, they are not effective in a dynamic Hadoop environment, i.e., a Hadoop cluster with dynamically available resources. As a growing number of Hadoop clusters are deployed on hybrid systems, e.g., infrastructure powered by a mix of traditional and renewable energy, and on cloud platforms hosting heterogeneous workloads, variable resource availability becomes common when running Hadoop jobs. In this paper, we propose RDS, a Resource and Deadline-aware Hadoop job Scheduler that takes future resource availability into consideration when minimizing job deadline misses. We formulate the job scheduling problem as an online optimization problem and solve it using an efficient receding horizon control algorithm. To aid the control, we design a self-learning model to estimate job completion times and use a simple but effective model to predict future resource availability. We have implemented RDS in the open-source Hadoop implementation and performed evaluations with various benchmark workloads. Experimental results show that RDS substantially reduces the penalty of deadline misses, by at least 36% and 10% compared with the Fair Scheduler and an EDF scheduler, respectively.
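
To make the receding horizon control idea concrete, here is a minimal Python sketch: at each control step it scores a few candidate job orderings against a resource-availability forecast, commits only the first decision, and re-plans at the next step. The forecast format, the completion-time model, and the candidate orderings are assumptions for illustration, not RDS's actual self-learning models.

def predicted_finish(job, start, forecast):
    # Consume forecasted capacity (work units per step) until the job's
    # remaining work is done; returns the predicted finish step.
    remaining, t = job["work"], start
    while remaining > 0 and t < len(forecast):
        remaining -= forecast[t]
        t += 1
    return t

def plan(jobs, forecast, now, key):
    # Score one candidate ordering: total penalty of predicted misses.
    order = sorted(jobs, key=key)
    t, penalty = now, 0.0
    for job in order:
        t = predicted_finish(job, t, forecast)
        if t > job["deadline"]:
            penalty += job.get("penalty", 1.0)
    return order, penalty

def receding_horizon_step(jobs, forecast, now=0):
    # Evaluate candidate orderings over the horizon, commit only the
    # first decision, and re-plan at the next control step.
    candidates = [
        lambda j: j["deadline"],                 # EDF
        lambda j: j["deadline"] - j["work"],     # least slack first
        lambda j: j["work"],                     # shortest job first
    ]
    plans = [plan(jobs, forecast, now, k) for k in candidates]
    best_order, _ = min(plans, key=lambda p: p[1])
    return best_order[0] if best_order else None

jobs = [{"work": 30, "deadline": 8}, {"work": 10, "deadline": 4}]
forecast = [5, 5, 10, 10, 2, 2, 10, 10]    # predicted capacity per step
print(receding_horizon_step(jobs, forecast))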


International Parallel and Distributed Processing Symposium | 2014

Heterogeneity-Aware Workload Placement and Migration in Distributed Sustainable Datacenters

Dazhao Cheng; Changjun Jiang; Xiaobo Zhou

While major cloud service operators have taken various initiatives to operate their sustainable datacenters with green energy, it is challenging to effectively utilize the green energy since its generation depends on dynamic natural conditions. Fortunately, the geographical distribution of datacenters provides an opportunity for optimizing the system performance by distributing cloud workloads. In this paper, we propose a holistic heterogeneity-aware cloud workload placement and migration approach, sCloud, that aims to maximize the system goodput in distributed self-sustainable datacenters. sCloud adaptively places the transactional workload to distributed datacenters, allocates the available resources to heterogeneous workloads in each datacenter, and migrates batch jobs across datacenters, while taking into account the green power availability and QoS requirements. We formulate the transactional workload placement as a constrained optimization problem that can be solved by nonlinear programming. Then, we propose a batch job migration algorithm to further improve the system goodput when the green power supply varies widely at different locations. We have implemented sCloud in a university cloud testbed with real-world weather conditions and workload traces. Experimental results demonstrate that sCloud can achieve near-to-optimal system performance while being resilient to dynamic power availability. It outperforms a heterogeneity-oblivious approach by 26% in improving system goodput and 29% in reducing QoS violations.
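
As a rough illustration of the constrained-optimization formulation, the sketch below splits a transactional load across sites to maximize a concave goodput utility subject to per-site capacity bounds, using SciPy's general-purpose nonlinear solver. The utility function, capacities, and load figures are invented for the example and are not sCloud's model.

import numpy as np
from scipy.optimize import minimize

capacity = np.array([120.0, 80.0, 60.0])   # per-site capacity (req/s)
total_load = 200.0                          # incoming transactional load

def neg_goodput(x):
    # Diminishing-returns utility per site; goodput flattens as a site
    # approaches its (green-power-dependent) capacity.
    return -np.sum(capacity * (1.0 - np.exp(-x / capacity)))

result = minimize(
    neg_goodput,
    x0=capacity / capacity.sum() * total_load,     # feasible initial split
    bounds=[(0.0, c) for c in capacity],           # no site overload
    constraints=[{"type": "eq",                    # place all the load
                  "fun": lambda x: x.sum() - total_load}],
)
print("placement per site:", np.round(result.x, 1))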


International Conference on Distributed Computing Systems | 2015

Towards Energy Efficiency in Heterogeneous Hadoop Clusters by Adaptive Task Assignment

Dazhao Cheng; Palden Lama; Changjun Jiang; Xiaobo Zhou

The cost of powering servers, storage platforms, and related cooling systems has become a major component of the operational costs in big data deployments. Hence, the design of energy-efficient Hadoop clusters has attracted significant research attention in recent years. However, existing studies do not consider the impact of the complex interplay between workload and hardware heterogeneity on energy efficiency. In this paper, we find that heterogeneity-oblivious task assignment approaches are detrimental to both performance and energy efficiency of Hadoop clusters. Importantly, we make a counterintuitive observation that even heterogeneity-aware techniques that focus on reducing job completion time do not necessarily guarantee energy efficiency. We propose a heterogeneity-aware task assignment approach, E-Ant, that aims to minimize the overall energy consumption in a heterogeneous Hadoop cluster without sacrificing job performance. It adaptively schedules heterogeneous workloads on energy-efficient machines, without a priori knowledge of the workload properties. Furthermore, it provides the flexibility to trade off energy efficiency and job fairness in a Hadoop cluster. E-Ant employs an ant colony optimization approach that generates task assignment solutions in an agile way based on the feedback of each task's energy consumption reported by Hadoop TaskTrackers. Experimental results on a heterogeneous cluster with varying hardware capabilities show that E-Ant improves the overall energy savings for a synthetic workload from Microsoft by 17% and 12% compared to Fair Scheduler and Tarazu, respectively.
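
A minimal ant colony optimization sketch of the assignment loop follows: pheromone is kept per (task type, machine) pair, machines are sampled in proportion to pheromone, and low-energy completions deposit more pheromone. The class name, evaporation rate, and deposit rule are assumptions for illustration, not E-Ant's tuned parameters.

import random

class AcoAssigner:
    def __init__(self, task_types, machines, evaporation=0.1):
        # One pheromone trail per (task type, machine) pair.
        self.pheromone = {(t, m): 1.0 for t in task_types for m in machines}
        self.machines = machines
        self.evaporation = evaporation

    def assign(self, task_type):
        # Sample a machine with probability proportional to pheromone.
        weights = [self.pheromone[(task_type, m)] for m in self.machines]
        return random.choices(self.machines, weights=weights)[0]

    def feedback(self, task_type, machine, energy_joules):
        # Evaporate old trails, then deposit pheromone inversely
        # proportional to the reported per-task energy.
        for key in self.pheromone:
            self.pheromone[key] *= (1.0 - self.evaporation)
        self.pheromone[(task_type, machine)] += 1.0 / max(energy_joules, 1e-9)

assigner = AcoAssigner(["map", "reduce"], ["m1", "m2", "m3"])
m = assigner.assign("map")
assigner.feedback("map", m, energy_joules=42.0)   # reported by the node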


IEEE Transactions on Parallel and Distributed Systems | 2017

Improving Performance of Heterogeneous MapReduce Clusters with Adaptive Task Tuning

Dazhao Cheng; Jia Rao; Yanfei Guo; Changjun Jiang; Xiaobo Zhou

Datacenter-scale clusters are evolving toward heterogeneous hardware architectures due to continuous server replacement. Meanwhile, datacenters are commonly shared by many users for quite different uses, and they often exhibit significant performance heterogeneity due to multi-tenant interference. The deployment of MapReduce on such heterogeneous clusters presents significant challenges in achieving good application performance compared to in-house dedicated clusters. As most MapReduce implementations are originally designed for homogeneous environments, heterogeneity can cause significant performance deterioration in job execution despite existing optimizations on task scheduling and load balancing. In this paper, we observe that the homogeneous configuration of tasks on heterogeneous nodes can be an important source of load imbalance and thus cause poor performance. Tasks should be customized with different configurations to match the capabilities of heterogeneous nodes. To this end, we propose a self-adaptive task tuning approach, Ant, that automatically searches for the optimal configurations for individual tasks running on different nodes. In a heterogeneous cluster, Ant first divides nodes into a number of homogeneous subclusters based on their hardware configurations. It then treats each subcluster as a homogeneous cluster and independently applies the self-tuning algorithm to it. Ant finally configures tasks with randomly selected configurations and gradually improves task configurations by reproducing the configurations from the best-performing tasks and discarding poorly performing configurations. To accelerate task tuning and avoid being trapped in a local optimum, Ant uses a genetic algorithm during adaptive task configuration. Experimental results on a heterogeneous physical cluster with varying hardware capabilities show that Ant improves the average job completion time by 31, 20, and 14 percent compared to stock Hadoop (Stock), customized Hadoop with industry recommendations (Heuristic), and a profiling-based configuration approach (Starfish), respectively. Furthermore, we extend Ant to virtual MapReduce clusters in a multi-tenant private cloud. Specifically, Ant characterizes a virtual node based on two measured performance statistics: I/O rate and CPU steal time. It uses the k-means clustering algorithm to classify virtual nodes into configuration groups based on the measured dynamic interference. Experimental results on virtual clusters with varying interferences show that Ant improves the average job completion time by 20, 15, and 11 percent compared to Stock, Heuristic, and Starfish, respectively.
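
The virtual-cluster extension hinges on grouping nodes by the two interference statistics named above. The sketch below shows that step with scikit-learn's k-means; the sample measurements, normalization, and choice of three groups are illustrative assumptions.

import numpy as np
from sklearn.cluster import KMeans

# rows: [normalized I/O rate, normalized CPU steal time] per virtual node
measurements = np.array([
    [0.9, 0.05], [0.85, 0.1],   # lightly interfered nodes
    [0.5, 0.4],  [0.45, 0.35],  # moderately interfered
    [0.2, 0.7],  [0.15, 0.75],  # heavily interfered
])

# Cluster nodes so each group can share one tuned task configuration.
groups = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(measurements)
for node, group in enumerate(groups):
    print(f"virtual node {node} -> configuration group {group}")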


IEEE Transactions on Computers | 2016

Elastic Power-Aware Resource Provisioning of Heterogeneous Workloads in Self-Sustainable Datacenters

Dazhao Cheng; Jia Rao; Changjun Jiang; Xiaobo Zhou

While major cloud service operators have taken various initiatives to operate their datacenters partially or completely with renewable energy, it is challenging to effectively utilize the renewable energy since its generation depends on dynamic natural conditions. In this paper, we propose and develop an elastic power-aware resource provisioning approach (ePower) for heterogeneous workloads in self-sustainable datacenters that rely completely on renewable energy. We aim to maximize the system goodput and control the system power consumption with respect to the green power supply. ePower accounts for the challenges and opportunities of dynamic power supply, heterogeneous workload characteristics, and QoS requirements, and automatically optimizes elastic resource allocations to workloads. The core of the ePower design is a novel power-aware simulated annealing algorithm with fuzzy performance modeling for the efficient search of an optimal resource allocation. We have implemented ePower in a university cloud testbed hosting the Gridmix2 and RUBiS benchmark applications. We utilize real weather data traces to simulate the green power generation and supply in the experiments. Experimental results demonstrate that ePower can achieve near-to-optimal system performance while being resilient to dynamic power availability. It outperforms a representative resource provisioning approach for heterogeneous workloads by at least 24 percent in improving system goodput and 35 percent in reducing QoS violations.
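
For intuition, here is a generic simulated annealing sketch applied to a toy power-constrained allocation: the goodput utility, the power model, the budget, and the cooling schedule are all invented for the example, and the sketch omits ePower's fuzzy performance modeling entirely.

import math
import random

def anneal(score, neighbor, initial, temp=1.0, cooling=0.95, steps=500):
    current, best = initial, initial
    for _ in range(steps):
        candidate = neighbor(current)
        delta = score(candidate) - score(current)
        # Accept improvements always; accept regressions with a
        # probability that shrinks as the temperature cools.
        if delta > 0 or random.random() < math.exp(delta / temp):
            current = candidate
        if score(current) > score(best):
            best = current
        temp *= cooling
    return best

# Example: split 100 resource units between two workloads; goodput is
# concave in each share, and exceeding the power budget is penalized.
def score(x):
    goodput = math.sqrt(x) + 2 * math.sqrt(100 - x)
    power = 0.5 * x + 0.8 * (100 - x)           # hypothetical power model
    return goodput - (10 if power > 70 else 0)  # green-power budget of 70

best = anneal(score, lambda x: min(100, max(0, x + random.uniform(-5, 5))),
              initial=50.0)
print("best allocation for workload A:", round(best, 1))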


Modeling, Analysis, and Simulation of Computer and Telecommunication Systems | 2013

Self-Tuning Batching with DVFS for Improving Performance and Energy Efficiency in Servers

Dazhao Cheng; Yanfei Guo; Xiaobo Zhou

Performance improvement and energy efficiency are two important goals in provisioning Internet services in datacenter servers. In this paper, we propose and develop a self-tuning request batching mechanism to simultaneously achieve the two correlated goals. The batching mechanism increases the cache hit rate at the front-tier Web server, which provides the opportunity to improve application performance and the energy efficiency of the server system. The core of the batching mechanism is a novel and practical two-layer control system that adaptively adjusts the batching interval and the frequency states of CPUs according to the service level agreement and the workload characteristics. The batching control adopts a self-tuning fuzzy model predictive control approach for application performance improvement. The power control dynamically adjusts the frequency of CPUs with DVFS in response to workload fluctuations for energy efficiency. A coordinator between the two control loops achieves the desired performance and energy efficiency. We implement the mechanism in a testbed, and experimental results demonstrate that the new approach significantly improves application performance in terms of system throughput and average response time. The results also illustrate that it can reduce the energy consumption of the server system by 13% at the same time.
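
A bare-bones sketch of such a two-layer loop appears below: one layer nudges the batching interval toward a response-time target, the other steps a discrete DVFS frequency ladder with utilization, and a simple coordinator arbitrates when the two conflict. The gains, targets, and frequency steps are assumptions; the paper's actual controllers are fuzzy model predictive ones.

FREQ_STEPS_GHZ = [1.2, 1.6, 2.0, 2.4]   # hypothetical DVFS ladder

def control_step(state, measured_resp_ms, measured_util,
                 target_resp_ms=200.0, util_band=(0.4, 0.8)):
    batch_ms, freq_idx = state

    # Batching layer: lengthen the batch when there is response-time
    # slack (better cache hit rate), shorten it when the SLA is tight.
    error = target_resp_ms - measured_resp_ms
    batch_ms = max(1.0, batch_ms + 0.1 * error)

    # Power layer: scale frequency with utilization for energy savings.
    if measured_util > util_band[1] and freq_idx < len(FREQ_STEPS_GHZ) - 1:
        freq_idx += 1
    elif measured_util < util_band[0] and freq_idx > 0:
        freq_idx -= 1

    # Coordinator: if frequency is already at the floor while the SLA is
    # violated, back off the batching interval instead of the target.
    if measured_resp_ms > target_resp_ms and freq_idx == 0:
        batch_ms = max(1.0, batch_ms * 0.5)

    return batch_ms, freq_idx

state = (10.0, len(FREQ_STEPS_GHZ) - 1)   # 10 ms batches, max frequency
state = control_step(state, measured_resp_ms=150.0, measured_util=0.3)
print(state)   # longer batches, one frequency step down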


IEEE Transactions on Computers | 2017

Cross-Platform Resource Scheduling for Spark and MapReduce on YARN

Dazhao Cheng; Xiaobo Zhou; Palden Lama; Jun Wu; Changjun Jiang

While MapReduce is inherently designed for batch and high-throughput processing workloads, there is an increasing demand for non-batch processing on big data, e.g., interactive jobs, real-time queries, and stream computations. The emerging Apache Spark fills this gap; it can run on an established Hadoop cluster and take advantage of the existing HDFS. As a result, the Spark-on-YARN deployment model is widely applied by many industry leaders. However, we identify three key challenges in deploying Spark on YARN: inflexible reservation-based resource management, inter-task dependency-blind scheduling, and locality interference between Spark and MapReduce applications. These three challenges cause inefficient resource utilization and significant performance deterioration. We propose and develop a cross-platform resource scheduling middleware, iKayak, which aims to improve resource utilization and application performance in multi-tenant Spark-on-YARN clusters. iKayak relies on three key mechanisms: reservation-aware executor placement to avoid long waits for resource reservation, dependency-aware resource adjustment to exploit under-utilized resources occupied by reduce tasks, and cross-platform locality-aware task assignment to coordinate locality competition between Spark and MapReduce applications. We implement iKayak in YARN. Experimental results on a testbed show that iKayak can achieve 50 percent performance improvement for Spark applications and 19 percent performance improvement for MapReduce applications, compared to two popular Spark-on-YARN deployment models, i.e., the YARN-client model and the YARN-cluster model.
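
As a toy illustration of the reservation-aware placement mechanism, the sketch below places an executor on any node that can host it immediately rather than queueing for a full reservation on a preferred node. The function, node capacities, and most-headroom heuristic are hypothetical; this is not YARN's scheduler interface.

def place_executor(request_gb, free_gb_by_node):
    # Candidates that can host the executor right now.
    fits = {n: free for n, free in free_gb_by_node.items()
            if free >= request_gb}
    if fits:
        node = max(fits, key=fits.get)       # most headroom first
        free_gb_by_node[node] -= request_gb
        return node
    return None                              # caller may shrink the request

nodes = {"n1": 6.0, "n2": 10.0, "n3": 3.0}
print(place_executor(8.0, nodes))   # -> n2
print(place_executor(8.0, nodes))   # -> None (no node has 8 GB free now)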


ACM Transactions on Autonomous and Adaptive Systems | 2015

Self-Tuning Batching with DVFS for Performance Improvement and Energy Efficiency in Internet Servers

Dazhao Cheng; Yanfei Guo; Changjun Jiang; Xiaobo Zhou

Performance improvement and energy efficiency are two important goals in provisioning Internet services in datacenter servers. In this article, we propose and develop a self-tuning request batching mechanism to simultaneously achieve the two correlated goals. The batching mechanism increases the cache hit rate at the front-tier Web server, which provides the opportunity to improve an application's performance and the energy efficiency of the server system. The core of the batching mechanism is a novel and practical two-layer control system that adaptively adjusts the batching interval and the frequency states of CPUs according to the service level agreement and the workload characteristics. The batching control adopts a self-tuning fuzzy model predictive control approach for application performance improvement. The power control dynamically adjusts the frequency of Central Processing Units (CPUs) with Dynamic Voltage and Frequency Scaling (DVFS) in response to workload fluctuations for energy efficiency. A coordinator between the two control loops achieves the desired performance and energy efficiency. We further extend the self-tuning batching with DVFS approach from a single-server system to a multiserver system. It relies on a MIMO expert fuzzy controller to adjust the CPU frequencies of multiple servers and coordinate the frequency states of CPUs at different tiers. We implement the mechanism in a testbed. Experimental results demonstrate that the new approach significantly improves application performance in terms of system throughput and average response time. At the same time, the results also illustrate that the mechanism can reduce the energy consumption of a single-server system by 13% and of a multiserver system by 11%, respectively.

Collaboration


Dive into Dazhao Cheng's collaboration.

Top Co-Authors

Xiaobo Zhou, University of Colorado Colorado Springs
Jia Rao, University of Colorado Colorado Springs
Yanfei Guo, University of Colorado Colorado Springs
Palden Lama, University of Colorado Colorado Springs
Yu Wang, University of North Carolina at Charlotte
Donglin Yang, University of North Carolina at Charlotte
Tiago B. G. Perez, University of Colorado Colorado Springs