Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Zhuozhao Li is active.

Publication


Featured research published by Zhuozhao Li.


International Conference on Parallel Processing | 2015

Designing a Hybrid Scale-Up/Out Hadoop Architecture Based on Performance Measurements for High Application Performance

Zhuozhao Li; Haiying Shen

Since scale-up machines perform better for jobs with small and medium (KB, MB) data sizes while scale-out machines perform better for jobs with large (GB, TB) data sizes, and a workload usually consists of jobs with different data size levels, we propose building a hybrid Hadoop architecture that includes both scale-up and scale-out machines, which, however, is not trivial. The first challenge is workload data storage. Thousands of small jobs in a workload may overload the limited local disks of the scale-up machines, and jobs on the scale-up and scale-out machines may both request the same set of data, which leads to data transmission between the machines. The second challenge is to automatically schedule jobs to either the scale-up or the scale-out cluster to achieve the best performance. We conduct a thorough performance measurement of different applications on scale-up and scale-out clusters, configured with the Hadoop Distributed File System (HDFS) and a remote file system (i.e., OFS), respectively. We find that using OFS rather than HDFS solves the data storage challenge. We also identify the factors that determine the performance differences between the scale-up and scale-out clusters, and their cross points, to make the choice. Accordingly, we design and implement the hybrid scale-up/out Hadoop architecture. Our trace-driven experimental results show that our hybrid architecture outperforms the traditional Hadoop architecture with both HDFS and OFS in terms of job completion time.
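
As a rough illustration of the dispatch rule described above, the following sketch routes a job to the scale-up or scale-out cluster based on its input size relative to a measured cross point; the 8 GB threshold and the Job fields are illustrative assumptions, not values reported in the paper.

```python
# Hypothetical sketch of the core dispatch rule: route each job to the cluster
# expected to finish it faster, based on an assumed cross point.
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    input_bytes: int

CROSS_POINT_BYTES = 8 * 1024**3  # assumed cross point; the paper derives it from measurements

def dispatch(job: Job) -> str:
    """Return the cluster that is expected to finish the job faster."""
    if job.input_bytes <= CROSS_POINT_BYTES:
        return "scale-up"   # small/medium (KB-MB) inputs favor scale-up machines
    return "scale-out"      # large (GB-TB) inputs favor scale-out machines

if __name__ == "__main__":
    print(dispatch(Job("wordcount-small", 64 * 1024**2)))   # -> scale-up
    print(dispatch(Job("sort-large", 500 * 1024**3)))       # -> scale-out
```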


IEEE Transactions on Parallel and Distributed Systems | 2017

An Exploration of Designing a Hybrid Scale-Up/Out Hadoop Architecture Based on Performance Measurements

Zhuozhao Li; Haiying Shen; Walter B. Ligon; Jeffrey Denton

Scale-up machines perform better for jobs with small and medium (KB, MB) data sizes, while scale-out machines perform better for jobs with large (GB, TB) data sizes. Since a workload usually consists of jobs with different data size levels, we propose building a hybrid Hadoop architecture that includes both scale-up and scale-out machines, which, however, is not trivial. The first challenge is workload data storage. Thousands of small jobs in a workload may overload the limited local disks of the scale-up machines, and jobs on the scale-up and scale-out machines may both request the same set of data, which leads to data transmission between the machines. The second challenge is to automatically schedule jobs to either the scale-up or the scale-out cluster to achieve the best performance. We conduct a thorough performance measurement of different applications on scale-up and scale-out clusters, configured with the Hadoop Distributed File System (HDFS) and a remote file system (i.e., OFS), respectively. We find that using OFS rather than HDFS solves the data storage challenge. We also identify the factors that determine the performance differences between the scale-up and scale-out clusters, and their cross points, to make the choice. Accordingly, we design and implement the hybrid scale-up/out Hadoop architecture. Our trace-driven experimental results show that our hybrid architecture outperforms the traditional Hadoop architecture with both HDFS and OFS in terms of job completion time, throughput, and job failure rate.
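
This journal version also characterizes where the two clusters' performance curves cross. A minimal sketch of how such a cross point could be read off from measurements of the kind described above (the sample numbers are made up for illustration):

```python
# Find the smallest measured input size at which the scale-out cluster
# starts to outperform the scale-up cluster for a given application.
def cross_point(measurements):
    """measurements: list of (input_gb, t_scale_up_sec, t_scale_out_sec), sorted by size."""
    for input_gb, t_up, t_out in measurements:
        if t_out < t_up:
            return input_gb
    return None  # scale-up wins across the whole measured range

samples = [
    (0.1, 40, 95),
    (1.0, 120, 160),
    (10.0, 900, 610),     # scale-out becomes faster somewhere before 10 GB
    (100.0, 9500, 3800),
]
print(cross_point(samples))  # -> 10.0
```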


IEEE International Conference on Cloud Computing Technology and Science | 2016

Goodbye to Fixed Bandwidth Reservation: Job Scheduling with Elastic Bandwidth Reservation in Clouds

Haiying Shen; Lei Yu; Liuhua Chen; Zhuozhao Li

The shared nature of cloud network infrastructures causes unpredictable network performance, which may degrade the performance of cloud applications. Several recent works propose to explicitly reserve network bandwidth in the cloud with virtual network abstraction models, which pre-specify the network bandwidth between virtual machines (VMs) for a tenant job. However, the pre-specification fails to exploit the elastic feature of the bandwidth resource (i.e., more reserved bandwidth, up to a no-elongation threshold, leads to shorter job execution time and vice versa) in job scheduling, and it is difficult for ordinary tenants (without specialized network knowledge) to estimate the exact bandwidth needed. In this paper, we propose a new cloud job scheduler in which each tenant only needs to specify a job deadline, and each job's reserved bandwidth is determined elastically by leveraging this elastic feature to maximize the total job rewards, which represent the worth of completing jobs by their deadlines. The scheduler then tries to reduce the execution time of each job, and it jointly considers the computational capacity of VMs and the reserved VM bandwidth in job scheduling. Using trace-driven and real cluster experiments, we show the efficiency and effectiveness of our job scheduler in comparison with other scheduling strategies.
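
A simplified sketch of the elastic-reservation idea, not the paper's actual scheduler: jobs are considered in order of reward, and each is given the smallest bandwidth, capped by its no-elongation threshold, that still meets its deadline under a simple work/bandwidth execution-time model. All fields and numbers are assumed for illustration.

```python
from dataclasses import dataclass

@dataclass
class TenantJob:
    name: str
    work_gbits: float        # data to transfer/process
    deadline_s: float
    reward: float
    no_elong_gbps: float     # bandwidth beyond this no longer shortens the job

def schedule(jobs, link_capacity_gbps):
    remaining = link_capacity_gbps
    plan = []
    for job in sorted(jobs, key=lambda j: j.reward, reverse=True):
        needed = job.work_gbits / job.deadline_s       # minimum bandwidth to meet the deadline
        bw = min(max(needed, 0.0), job.no_elong_gbps)
        if bw <= remaining and needed <= job.no_elong_gbps:
            plan.append((job.name, bw))                # admit the job with its elastic reservation
            remaining -= bw
    return plan

jobs = [TenantJob("A", 800, 100, reward=10, no_elong_gbps=12),
        TenantJob("B", 600, 50, reward=7, no_elong_gbps=10)]
print(schedule(jobs, link_capacity_gbps=20))  # -> [('A', 8.0)]; B cannot meet its deadline
```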


International Conference on Distributed Computing Systems | 2017

Opportunistic Energy Sharing Between Power Grid and Electric Vehicles: A Game Theory-Based Pricing Policy

Ankur Sarker; Zhuozhao Li; William Kolodzey; Haiying Shen

Electric vehicles (EVs) have great potential to reduce dependency on fossil fuels. The recent surge in the development of online EVs (OLEVs) will help address the drawbacks of current-generation EVs, such as heavy and expensive batteries. OLEVs are integrated with the smart grid of the power infrastructure through a wireless power transfer (WPT) system to increase their driving range. However, the integration of OLEVs with the grid creates a tremendous load for the smart grid: the demand on a power grid changes over time and the price of power is not fixed throughout the day, so congestion-avoidance and load-balancing policies are needed to ensure quality of service for OLEVs. In this paper, we first conduct an analysis showing that OLEVs cause unpredictable power load and congestion. We use the Simulation of Urban MObility tool and hourly traffic counts of a road section in New York City to analyze the amount of energy OLEVs can receive at different times of the day. We then present a game theory-based distributed power scheduling framework to find the optimal schedule between OLEVs and the smart grid. In the proposed framework, each OLEV requests an amount of charging power from the smart grid based on a power payment function that is updated using a best-response strategy. We prove that the updated power requests converge to the optimal power schedule; in this way, the smart grid maximizes the social welfare of the OLEVs, defined as a combination of total satisfaction and power charging cost. Finally, we verify the performance of our proposed pricing policy under different scenarios in a simulation study.
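
A toy best-response loop in the spirit of the framework described above. The logarithmic satisfaction function, the linear load-dependent price, the damping factor, and all numbers are assumptions for illustration, not the paper's model.

```python
def best_response(w, price, x_max):
    # maximizer of w*log(1+x) - price*x over [0, x_max] is x = w/price - 1, clipped
    return min(max(w / price - 1.0, 0.0), x_max)

def schedule_power(weights, x_max=50.0, base_price=0.05, slope=0.01, rounds=200):
    x = [0.0] * len(weights)                           # current power requests of the OLEVs
    for _ in range(rounds):
        price = base_price + slope * sum(x)            # load-dependent price announced by the grid
        targets = [best_response(w, price, x_max) for w in weights]
        x = [0.5 * old + 0.5 * new for old, new in zip(x, targets)]  # damped update for smooth convergence
    return x, price

demands, price = schedule_power([4.0, 6.0, 9.0])       # per-OLEV satisfaction weights (assumed)
print([round(d, 2) for d in demands], round(price, 3))
```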


International Conference on Computer Communications and Networks | 2016

Learning Network Graph of SIR Epidemic Cascades Using Minimal Hitting Set Based Approach

Zhuozhao Li; Haiying Shen; Kang Chen

We consider learning the underlying graph structure of a network in which infection spreads, based on observations of node infection times. We give an algorithm based on minimal hitting sets to learn the exact underlying graph structure and provide a sufficient condition on the number of cascades required (i.e., the sample complexity) for reliable recovery, which is shown to be O(log n), where n is the number of nodes in the graph. We then analytically evaluate the performance of the minimal hitting set approach in learning the degree distribution and detecting the leaf nodes of a graph, and provide a sufficient condition on its sample complexity, which is shown to be lower than that of learning the whole graph. We also generalize the exact graph estimation problem to the problem of estimating the graph within a certain distortion, measured by edit distance, and show that this edit distance-based graph estimator has lower sample complexity. Our experimental results on both synthetic network topologies and a real-world network trace show that our algorithm outperforms a previously proposed algorithm based on maximum likelihood.
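
An illustrative sketch of the minimal-hitting-set idea, not the paper's exact algorithm: each cascade in which a node was infected yields a candidate-parent set (the nodes infected strictly earlier), and a greedy minimal hitting set over those sets estimates the node's neighbors. The cascade format and example data are assumed.

```python
def candidate_sets(node, cascades):
    """cascades: list of dicts mapping node -> infection time (absent = never infected)."""
    sets = []
    for times in cascades:
        if node in times and times[node] > 0:          # node was infected but was not the seed
            earlier = {u for u, t in times.items() if t < times[node]}
            if earlier:
                sets.append(earlier)
    return sets

def greedy_hitting_set(sets):
    chosen = set()
    remaining = [s for s in sets if s]
    while remaining:
        # pick the candidate parent covering the most not-yet-hit sets
        best = max({u for s in remaining for u in s},
                   key=lambda u: sum(u in s for s in remaining))
        chosen.add(best)
        remaining = [s for s in remaining if best not in s]
    return chosen

cascades = [
    {"a": 0, "b": 1, "c": 2},   # a -> b -> c
    {"b": 0, "c": 1},           # b -> c
]
print(greedy_hitting_set(candidate_sets("c", cascades)))  # -> {'b'}
```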


International Conference on Big Data | 2016

Comparing application performance on HPC-based Hadoop platforms with local storage and dedicated storage

Zhuozhao Li; Haiying Shen; Jeffrey Denton; Walter B. Ligon

Many high-performance computing (HPC) sites extend their clusters to support Hadoop MapReduce for a variety of applications. However, an HPC cluster differs from a Hadoop cluster in the configuration of its storage resources: in the Hadoop Distributed File System (HDFS), data resides on the compute nodes, while in an HPC cluster, data is stored on separate nodes dedicated to storage. Dedicated storage offloads I/O load from the compute nodes and provides more powerful storage, whereas local storage provides better locality and avoids contention for shared storage resources. To gain insight into the two platforms, in this paper we investigate the performance and resource utilization of different types of applications (i.e., I/O-intensive, data-intensive, and CPU-intensive) on HPC-based Hadoop platforms with local storage and dedicated storage. We find that I/O-intensive and data-intensive applications with large input files benefit more from dedicated storage, while the same applications with small input files benefit more from local storage. CPU-intensive applications with a large number of small input files benefit more from local storage, while those with large input files benefit approximately equally from the two platforms. We verify our findings through trace-driven experiments on different types of jobs from the Facebook synthesized trace. This work provides guidance on choosing the best platform to optimize the performance of different types of applications and reduce system overhead.
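
The findings can be read as a simple rule of thumb, sketched below; the 1 GB cut-off is an assumed illustration, not a number reported in the paper.

```python
def pick_storage(app_type: str, input_size_gb: float) -> str:
    """Pick local vs. dedicated storage for an HPC-based Hadoop job."""
    large = input_size_gb >= 1.0                       # assumed "large input" threshold
    if app_type in ("io", "data"):
        return "dedicated" if large else "local"
    if app_type == "cpu":
        return "either" if large else "local"          # large CPU-bound inputs do roughly equally well
    raise ValueError("app_type must be 'io', 'data', or 'cpu'")

print(pick_storage("data", 50))   # -> dedicated
print(pick_storage("cpu", 0.1))   # -> local
```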


International Conference on Cloud Computing | 2016

Performance Measurement on Scale-Up and Scale-Out Hadoop with Remote and Local File Systems

Zhuozhao Li; Haiying Shen

MapReduce is a popular computing model for parallel data processing on large-scale datasets, which can vary from gigabytes to terabytes and petabytes. Though Hadoop MapReduce normally uses the Hadoop Distributed File System (HDFS), a local file system, it can be configured to use a remote file system. An interesting question is then raised: for a given application, which is the best running platform among the different combinations of scale-up and scale-out Hadoop with remote and local file systems? There has been no previous research on how different types of applications (e.g., CPU-intensive, data-intensive) with different characteristics (e.g., input data size) can benefit from the different platforms. Thus, in this paper, we conduct a comprehensive performance measurement of different applications on scale-up and scale-out clusters configured with HDFS and a remote file system (i.e., OFS), respectively. We identify and study how different job characteristics (e.g., input data size, the number of file reads/writes, and the amount of computation) affect the performance of different applications on the different platforms. This study is expected to provide guidance for users in choosing the best platform to run different applications with different characteristics in environments that provide both remote and local storage, such as HPC clusters.


IEEE Transactions on Parallel and Distributed Systems | 2017

Measuring Scale-Up and Scale-Out Hadoop with Remote and Local File Systems and Selecting the Best Platform

Zhuozhao Li; Haiying Shen

MapReduce is a popular computing model for parallel data processing on large-scale datasets, which can vary from gigabytes to terabytes and petabytes. Though Hadoop MapReduce normally uses the Hadoop Distributed File System (HDFS), a local file system, it can be configured to use a remote file system. An interesting question is then raised: for a given application, which is the best running platform among the different combinations of scale-up and scale-out Hadoop with remote and local file systems? There has been no previous research on how different types of applications (e.g., CPU-intensive, data-intensive) with different characteristics (e.g., input data size) can benefit from the different platforms. Thus, in this paper, we conduct a comprehensive performance measurement of different applications on scale-up and scale-out clusters configured with HDFS and a remote file system (i.e., OFS), respectively. We identify and study how different job characteristics (e.g., input data size, the number of file reads/writes, and the amount of computation) affect the performance of different applications on the different platforms. Based on the measurement results, we also propose a performance prediction model to help users select the platform that leads to the minimum latency. Our evaluation using a Facebook workload trace demonstrates the effectiveness of our prediction model. This study is expected to provide guidance for users in choosing the best platform to run different applications with different characteristics in environments that provide both remote and local storage, such as HPC clusters and clouds.
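
A minimal sketch of the selection step described above: predict the completion time of a job on each candidate platform and pick the minimum. The linear predictors and coefficients below are stand-ins, since the paper builds its model from measured job characteristics.

```python
def predict_latency(platform, input_gb, cpu_work, io_ops):
    # made-up linear coefficients per platform; the paper fits a model from measurements
    coeffs = {
        "scale-up+HDFS":  (2.0, 1.2, 0.8),
        "scale-up+OFS":   (1.6, 1.2, 1.0),
        "scale-out+HDFS": (1.1, 1.5, 0.9),
        "scale-out+OFS":  (0.9, 1.5, 1.1),
    }[platform]
    a, b, c = coeffs
    return a * input_gb + b * cpu_work + c * io_ops

def best_platform(input_gb, cpu_work, io_ops):
    platforms = ["scale-up+HDFS", "scale-up+OFS", "scale-out+HDFS", "scale-out+OFS"]
    return min(platforms, key=lambda p: predict_latency(p, input_gb, cpu_work, io_ops))

print(best_platform(input_gb=200, cpu_work=10, io_ops=30))   # large input -> a scale-out platform
```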


International Conference on Cloud Computing | 2016

On-Demand Bandwidth Pricing for Congestion Control in Core Switches in Cloud Networks

Abouzar Ghavami; Zhuozhao Li; Haiying Shen

Cloud networks use switches to transfer inbound and outbound traffic through the data centers. Multiple tenants sharing the limited bandwidth capacity of the network switches increases data traffic congestion in the network; highly congested switches are vulnerable to overload and consequently slow down the flow of data traffic. This paper proposes a nonlinear pricing policy for on-demand bandwidth allocation that jointly maximizes the total satisfaction of tenants and minimizes congestion in the core switches. The optimal schedule is found through a best-response strategy, in which each tenant updates its bandwidth allocation at each step based on updated, load-dependent, predetermined nonlinear bandwidth pricing functions. The updated bandwidth allocations converge to the optimal bandwidth schedule, which balances the load over the core switches. The performance of the proposed pricing policy is evaluated under different scenarios.
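
A toy sketch of how nonlinear, load-dependent prices push flows toward balanced switch loads; the quadratic price, the tenant demands, and the round count are assumptions for illustration, not the paper's pricing functions.

```python
def balance(demands, num_switches, rounds=20):
    """Each tenant repeatedly moves its flow to the currently cheapest switch."""
    assignment = {t: 0 for t in demands}               # all flows start on switch 0
    for _ in range(rounds):
        for tenant, bw in demands.items():
            load = [0.0] * num_switches
            for t, s in assignment.items():
                if t != tenant:
                    load[s] += demands[t]
            # price of putting this flow on switch s grows quadratically with its load
            assignment[tenant] = min(range(num_switches),
                                     key=lambda s: (load[s] + bw) ** 2)
    return assignment

print(balance({"t1": 4, "t2": 3, "t3": 3, "t4": 2}, num_switches=2))
# -> loads roughly even across the two switches, e.g. {'t1': 1, 't2': 1, 't3': 0, 't4': 0}
```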


Symposium on Cloud Computing | 2017

Job scheduling for data-parallel frameworks with hybrid electrical/optical datacenter networks

Zhuozhao Li; Haiying Shen

Despite the many advantages of hybrid electrical/optical datacenter networks (Hybrid-DCN), current job schedulers for data-parallel frameworks are not suitable for Hybrid-DCN, since they do not aggregate data traffic to facilitate the use of the optical circuit switch (OCS). We propose SchedOCS, a job scheduler for data-parallel frameworks in Hybrid-DCN that aims to take full advantage of the OCS to improve job performance.
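
An illustrative sketch of the traffic-aggregation idea behind such a scheduler; the threshold, flow format, and function names are assumed, not taken from SchedOCS. Shuffle flows between the same rack pair are aggregated, and only rack pairs with enough aggregate volume are routed over the OCS, while the rest stay on the electrical packet switch.

```python
from collections import defaultdict

OCS_THRESHOLD_GB = 10  # assumed minimum aggregate volume worth setting up an optical circuit

def split_traffic(flows):
    """flows: list of (src_rack, dst_rack, gigabytes)."""
    per_pair = defaultdict(float)
    for src, dst, gb in flows:
        per_pair[(src, dst)] += gb                     # aggregate traffic per rack pair
    ocs = {pair: gb for pair, gb in per_pair.items() if gb >= OCS_THRESHOLD_GB}
    eps = {pair: gb for pair, gb in per_pair.items() if gb < OCS_THRESHOLD_GB}
    return ocs, eps

flows = [("r1", "r2", 6), ("r1", "r2", 7), ("r3", "r4", 2)]
print(split_traffic(flows))  # -> ({('r1', 'r2'): 13.0}, {('r3', 'r4'): 2.0})
```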

Collaboration


Dive into Zhuozhao Li's collaborations.

Top Co-Authors

Cole Miles

University of Virginia

Haoyu Wang

University of Virginia

Kang Chen

Southern Illinois University Carbondale
