Network


Latest external collaborations at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Zhongzhi Luan is active.

Publication


Featured research published by Zhongzhi Luan.


Grid Computing | 2012

MapReduce Workload Modeling with Statistical Approach

Hailong Yang; Zhongzhi Luan; Wenjun Li; Depei Qian

Large-scale data-intensive cloud computing with the MapReduce framework is becoming pervasive for the core business of many academic, government, and industrial organizations. Hadoop, a state-of-the-art open source project, is by far the most successful realization of the MapReduce framework. While MapReduce is easy to use, efficient, and reliable for data-intensive computations, the excessive configuration parameters in Hadoop impose unexpected challenges on running various workloads effectively on a Hadoop cluster. Consequently, developers with less experience of the Hadoop configuration system may devote significant effort to writing an application with poor performance, either because they have no idea how these configurations influence performance, or because they are not even aware that the configurations exist. There is a pressing need for comprehensive analysis and performance modeling to ease MapReduce application development and guide performance optimization under different Hadoop configurations. In this paper, we propose a statistical analysis approach to identify the relationships among workload characteristics, Hadoop configurations, and workload performance. We apply principal component analysis and cluster analysis to 45 different metrics to derive the relationships between workload characteristics and the corresponding performance under different Hadoop configurations. We also construct regression models that predict the performance of various workloads under different Hadoop configurations. Our analysis reveals several non-intuitive relationships between workload characteristics and performance, and the experimental results demonstrate that our regression models accurately predict the performance of MapReduce workloads under different Hadoop configurations.
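The pipeline described above maps naturally onto standard statistical tooling. Below is a minimal sketch in Python with scikit-learn, on synthetic data; the metric semantics, component counts, and the linear-regression choice are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Rows: workload runs; columns: stand-ins for the 45 workload/configuration
# metrics (e.g. map output bytes, sort buffer size, number of reducers).
X = rng.normal(size=(200, 45))
y = X[:, :5] @ rng.normal(size=5) + rng.normal(scale=0.1, size=200)  # runtime

# 1. Reduce the correlated metrics to a few principal components.
pca = PCA(n_components=5)
X_pc = pca.fit_transform(X)
print("explained variance:", pca.explained_variance_ratio_.round(2))

# 2. Cluster the runs to group workloads with similar characteristics.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_pc)

# 3. Fit a regression model predicting performance from the components.
X_tr, X_te, y_tr, y_te = train_test_split(X_pc, y, random_state=0)
model = LinearRegression().fit(X_tr, y_tr)
print("R^2 on held-out runs:", round(model.score(X_te, y_te), 3))
```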


International Conference on Cluster Computing | 2012

ERMS: An Elastic Replication Management System for HDFS

Zhendong Cheng; Zhongzhi Luan; You Meng; Yijing Xu; Depei Qian; Alain Roy; Ning Zhang; Gang Guan

The Hadoop Distributed File System (HDFS) is a distributed storage system that stores large-scale data sets reliably and streams those data sets to applications at high bandwidth. HDFS provides high performance, reliability, and availability by replicating data, typically keeping three copies of each data block. Data in HDFS changes in popularity over time. To get better performance and higher disk utilization, the replication policy of HDFS should be elastic and adapt to data popularity. In this paper, we describe ERMS, an elastic replication management system for HDFS. ERMS provides an active/standby storage model for HDFS. It utilizes a complex event processing engine to distinguish real-time data types, and then dynamically adds extra replicas for hot data, cleans up these extra replicas when the data cools down, and uses erasure codes for cold data. ERMS also introduces a replica placement strategy for the extra replicas of hot data and the erasure coding parities. The experiments show that ERMS effectively improves the reliability and performance of HDFS and reduces storage overhead.
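As a rough illustration of the elastic policy, the sketch below maps a block's recent popularity to a replication decision. The thresholds, access window, and replica counts are hypothetical; the real ERMS drives this from a complex event processing engine on top of HDFS.

```python
HOT_THRESHOLD = 100    # hypothetical: accesses per window that mark a block "hot"
COLD_THRESHOLD = 5     # hypothetical: below this, demote to erasure-coded storage
DEFAULT_REPLICAS = 3   # the HDFS default
EXTRA_REPLICAS = 2     # hypothetical extra copies for hot data

def target_replication(accesses_per_window: int) -> dict:
    """Map a block's recent popularity to a storage policy."""
    if accesses_per_window >= HOT_THRESHOLD:
        # Hot data: add extra replicas for read bandwidth.
        return {"replicas": DEFAULT_REPLICAS + EXTRA_REPLICAS, "erasure_coded": False}
    if accesses_per_window <= COLD_THRESHOLD:
        # Cold data: keep one copy plus erasure-coding parities to save space.
        return {"replicas": 1, "erasure_coded": True}
    # Warm data: keep the HDFS default of three replicas.
    return {"replicas": DEFAULT_REPLICAS, "erasure_coded": False}

for accesses in (250, 40, 2):
    print(accesses, "->", target_replication(accesses))
```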


Conference on Decision and Control | 2011

Virtual machine mapping policy based on load balancing in private cloud environment

Junjie Ni; Yuanqiang Huang; Zhongzhi Luan; Juncheng Zhang; Depei Qian

The virtual machine allocation problem is key to building a private cloud environment. This paper presents a virtual machine mapping policy based on multi-resource load balancing. It uses the resource consumption of the running virtual machines together with a self-adaptive weighting approach, which resolves the load balancing conflicts among independent resources caused by the differing resource demands of cloud applications. Meanwhile, it uses a probabilistic approach to ease load crowding when many users arrive concurrently. Experiments and comparative analysis show that this policy achieves better results than existing approaches.
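The sketch below illustrates the two ideas in combination: a self-adaptive weight per resource derived from the VM's demand, and a probabilistic host choice that eases crowding under concurrent requests. The host data, weighting rule, and probability formula are assumptions for illustration, not the paper's exact policy.

```python
import random

hosts = {  # hypothetical current utilization per resource (fractions of capacity)
    "host-a": {"cpu": 0.70, "mem": 0.40, "net": 0.20},
    "host-b": {"cpu": 0.30, "mem": 0.60, "net": 0.50},
    "host-c": {"cpu": 0.50, "mem": 0.50, "net": 0.30},
}

def adaptive_weights(demand: dict) -> dict:
    """Self-adaptive idea: weight each resource by the VM's relative demand,
    so the request's dominant resource drives the placement decision."""
    total = sum(demand.values())
    return {r: d / total for r, d in demand.items()}

def load_score(util: dict, weights: dict) -> float:
    """Weighted combination of per-resource load; lower is better."""
    return sum(weights[r] * util[r] for r in util)

vm_demand = {"cpu": 2.0, "mem": 1.0, "net": 0.5}  # hypothetical VM request
w = adaptive_weights(vm_demand)

# Probabilistic choice: lighter hosts get proportionally higher probability,
# which spreads concurrent requests instead of piling them onto one "best" host.
scores = {h: load_score(u, w) for h, u in hosts.items()}
inv = {h: 1.0 - s for h, s in scores.items()}
total = sum(inv.values())
choice = random.choices(list(inv), weights=[v / total for v in inv.values()])[0]
print("scores:", scores, "-> placed on", choice)
```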


Future Generation Computer Systems | 2014

iMeter: An integrated VM power model based on performance profiling

Hailong Yang; Qi Zhao; Zhongzhi Luan; Depei Qian

The unprecedented burst in power consumption encountered by contemporary datacenters continually drives the development of energy-efficient techniques, from both hardware and software perspectives, to alleviate the energy problem. The most widely adopted power-saving solutions in datacenters that deliver cloud computing services are power capping and VM consolidation. However, without the capability to track VM power usage precisely, the combined effect of these two techniques can cause severe performance degradation to the consolidated VMs, thus violating user service level agreements. In this paper, we propose an integrated VM power model called iMeter, which overcomes the drawbacks of over-presumption and over-approximation in the segregated power models used in previous studies. We leverage kernel-based performance counters, which provide accurate performance statistics as well as high portability across heterogeneous platforms, to build the VM power model. Principal component analysis is applied to identify performance counters that have a strong impact on VM power consumption with mathematical confidence. We also present a brief interpretation of the first four selected principal components and what they indicate about VM power consumption. We demonstrate with clustering analysis that our approach is independent of the underlying hardware and virtualization configurations. We use support vector regression to build a VM power model that predicts the power consumption of both a single VM and multiple consolidated VMs running various workloads. The experimental results show that our model predicts instantaneous VM power usage with average errors of 5% and 4.7%, respectively, against actual power measurements.
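The modeling steps lend themselves to a compact sketch: PCA to select informative counter directions, then support vector regression from components to watts. The counter data below is synthetic, and the component count and SVR parameters are illustrative assumptions rather than iMeter's actual configuration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Rows: sampling intervals; columns: stand-ins for kernel performance
# counters (e.g. instructions retired, LLC misses, disk and network activity).
counters = rng.normal(size=(500, 20))
watts = 80 + counters[:, :4] @ np.array([5.0, 3.0, 2.0, 1.0]) \
        + rng.normal(scale=1.0, size=500)

# PCA keeps the components that explain most of the counter variance,
# and support vector regression maps them to instantaneous power.
model = make_pipeline(StandardScaler(), PCA(n_components=4),
                      SVR(kernel="rbf", C=10.0))
model.fit(counters[:400], watts[:400])

pred = model.predict(counters[400:])
err = np.mean(np.abs(pred - watts[400:]) / watts[400:]) * 100
print(f"mean relative error: {err:.1f}%")
```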


International Parallel and Distributed Processing Symposium | 2012

Statistics-based Workload Modeling for MapReduce

Hailong Yang; Zhongzhi Luan; Wenjun Li; Depei Qian; Gang Guan

Large-scale data-intensive computing with the MapReduce framework in the cloud is becoming pervasive for the core business of many academic, government, and industrial organizations. Hadoop is by far the most successful realization of the MapReduce framework. While MapReduce is easy to use, efficient, and reliable for data-intensive computations, the excessive configuration parameters in Hadoop cause unexpected challenges when running various workloads effectively on a Hadoop cluster. Consequently, developers with less experience of the Hadoop configuration system may devote significant effort to writing an application with poor performance, because they have no idea how these configurations influence performance or are not even aware that the configurations exist. In this paper, we propose a statistical analysis approach to identify the relationships among workload characteristics, Hadoop configurations, and workload performance. Our analysis reveals several non-intuitive relationships between workload characteristics and relative performance, and the experimental results demonstrate that our regression models accurately predict the performance of MapReduce workloads under different Hadoop configurations.


Advances in Experimental Medicine and Biology | 2010

GPU Acceleration of Dock6’s Amber Scoring Computation

Hailong Yang; Qiongqiong Zhou; Bo Li; Yongjian Wang; Zhongzhi Luan; Depei Qian; Hanlu Li

Addressing the problem of virtual screening is a long-term goal in the drug discovery field; if properly solved, it can significantly shorten the R&D cycle of new drugs. The scoring functionality that evaluates the fitness of a docking result is one of the major challenges in virtual screening. In general, scoring in docking requires a large amount of floating-point calculation, which usually takes several weeks or even months to finish. This time-consuming procedure is unacceptable, especially when a highly fatal and infectious virus such as SARS or H1N1 arises, which forces the scoring task to be completed in a limited time. This paper presents how to leverage the computational power of the GPU to accelerate the Amber (J. Comput. Chem. 25: 1157–1174, 2004) scoring of Dock6 (http://dock.compbio.ucsf.edu/DOCK_6/) with the NVIDIA CUDA (Compute Unified Device Architecture) platform (NVIDIA Corporation Technical Staff, Compute Unified Device Architecture – Programming Guide, NVIDIA Corporation, 2008). We also discuss several factors that greatly influence performance after porting the Amber scoring to the GPU, including thread management, data transfer, and divergence hiding. Our experiments show that the GPU-accelerated Amber scoring achieves a 6.5× speedup over the original version running on an AMD dual-core CPU for the same problem size. This acceleration makes Amber scoring more competitive and efficient for large-scale virtual screening problems.
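A rough sense of why this scoring parallelizes well: the score is a sum over independent atom pairs, so each GPU thread can own a subset of pairs. The numpy sketch below shows that decomposition on the CPU with a simplified 12-6 van der Waals placeholder; it is not Dock6's actual Amber energy function, its parameters, or its CUDA kernel.

```python
import numpy as np

rng = np.random.default_rng(2)
ligand = rng.uniform(0, 10, size=(50, 3))     # hypothetical ligand atom coordinates
receptor = rng.uniform(0, 10, size=(400, 3))  # hypothetical receptor atom coordinates

# All pairwise displacements at once; on a GPU, each (i, j) pair is
# naturally one thread's unit of work.
diff = ligand[:, None, :] - receptor[None, :, :]
r = np.linalg.norm(diff, axis=2) + 1e-9       # avoid division by zero

# Simplified 12-6 van der Waals term summed over all pairs (toy coefficients).
a, b = 1e4, 1e2
energy = np.sum(a / r**12 - b / r**6)
print(f"docking score (toy vdW energy): {energy:.2f}")
```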


International Conference on Algorithms and Architectures for Parallel Processing | 2013

Interference-Aware Program Scheduling for Multicore Processors

Lin Wang; Rui Wang; Cuijiao Fu; Zhongzhi Luan; Depei Qian

Running multiple application programs on a multicore processor can maximize the utilization of processor resources. However, contention for shared resources may cause interference among co-running programs and make program performance unstable and unpredictable. In order to optimize the performance of co-running programs and ensure the QoS of latency-sensitive applications, we propose an interference-aware scheduling strategy, IA, for systems based on multicore processors. Our work begins with an analysis of the behavior of a set of benchmark programs, after which we train a simple program classifier. We use this classifier to divide the benchmark programs into three categories according to their interference with each other. The interference-aware scheduler tries to schedule programs with less mutual interference onto the same multicore processor. Experimental results show that our method improves system performance while maintaining reasonable resource utilization, and it outperforms a previously published scheduling strategy in guaranteeing the QoS of latency-sensitive applications.
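The sketch below illustrates the scheduling idea with a toy stand-in for the trained classifier: bucket programs by how hard they hit the shared last-level cache, then co-locate programs of the same class. The MPKI thresholds and per-program figures are invented for illustration and are not the paper's classifier or categories.

```python
def classify(llc_misses_per_kilo_instr: float) -> str:
    """Toy stand-in for the trained classifier: programs that thrash the
    shared last-level cache interfere most with their co-runners."""
    if llc_misses_per_kilo_instr > 10:
        return "cache-thrashing"
    if llc_misses_per_kilo_instr > 2:
        return "cache-sensitive"
    return "cache-insensitive"

# Hypothetical LLC miss rates (misses per kilo-instruction) per program.
programs = {"mcf": 14.0, "lbm": 11.5, "gcc": 4.2, "povray": 0.3, "namd": 0.5}

# Schedule programs of the same class onto the same processor, so that
# thrashing programs do not degrade cache-sensitive co-runners.
sockets: dict[str, list[str]] = {}
for name, mpki in programs.items():
    sockets.setdefault(classify(mpki), []).append(name)
print(sockets)
```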


International Conference on Cluster Computing | 2012

Predictive Data and Energy Management under Budget

Yijing Xu; Zhongzhi Luan; Zhendong Cheng; Depei Qian; Ning Zhang; Gang Guan

Reducing power consumption in clusters has become increasingly important over the past few years, and much effort has gone into it. However, managing power is more important than merely reducing it. In this paper, we add power consumption to the list of managed resources and help developers understand and control the power profile of their clusters. MapReduce is an efficient and popular programming model for data-intensive computing, so we focus on designing green power management for MapReduce workloads. We designed strategies to make every node in a cluster run under a local power budget, and the whole cluster under a global power budget. We modified the data placement policies in HDFS, designed dynamic replica placement policies, and examined different workloads to learn their power consumption models. In addition, we right-size the cluster according to the power budget. As our predictive power model focuses on the variation of power over time, we can predict when users should take measures to reduce power usage. We also present our implementation and experiments in this paper.
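One piece of this design, right-sizing under a global budget, reduces to simple arithmetic once a per-node power model is learned. The sketch below assumes a hypothetical linear power model and made-up wattages; the paper learns per-workload models and predicts power variation over time.

```python
IDLE_W, PEAK_W = 100.0, 250.0   # hypothetical per-node idle and full-load power

def node_power(utilization: float) -> float:
    """Toy linear power model: idle power plus a load-proportional share."""
    return IDLE_W + (PEAK_W - IDLE_W) * utilization

def max_active_nodes(global_budget_w: float, expected_util: float) -> int:
    """Right-size the cluster: run only as many nodes as the budget allows."""
    return int(global_budget_w // node_power(expected_util))

budget = 3000.0                 # hypothetical global power budget in watts
for util in (0.3, 0.6, 0.9):
    print(f"util={util:.1f}: {max_active_nodes(budget, util)} nodes fit the budget")
```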


Cyber-Enabled Distributed Computing and Knowledge Discovery | 2009

An improved staged event driven architecture for Master-Worker network computing

Biao Han; Zhongzhi Luan; Danfeng Zhu; Yinan Ren; Ting Chen; Yongjian Wang; Zhongxin Wu

We propose a new design for Master-Worker network computing systems, called the Master-Worker event driven architecture (MEDA). MEDA is an extension of the staged event-driven architecture, designed to meet the dynamic demands of Master-Worker network computing systems with support for high concurrency, adaptive resource management, and modular construction. In MEDA, applications consist of a network of event-driven stages connected by queues. MEDA makes use of a set of dynamic control mechanisms for automatic tuning and load conditioning, and uses queuing theory in its thread management mechanism. The introduction of a network queue extends the applicable environment from stand-alone machines to the wide area network. A delayed-event queue balances performance against resource consumption, while a priority queue broadens the available job scheduling strategies and improves system efficiency. Results from experiments on the Drug Discovery Grid show that MEDA systems exhibit higher performance and better reliability than Master-Worker network computing systems developed using the traditional design.
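The core structural idea, event-driven stages connected by queues and drained by their own threads, can be sketched compactly. The two-stage layout, thread count, and sentinel-based shutdown below are illustrative choices, not MEDA's actual stage graph or its dynamic control mechanisms.

```python
import queue
import threading

task_q: "queue.Queue[int]" = queue.Queue()    # master -> worker stage
result_q: "queue.Queue[int]" = queue.Queue()  # worker -> collection stage

def worker_stage():
    """One stage: drain the task queue and feed the next queue."""
    while True:
        job = task_q.get()
        if job is None:                       # shutdown sentinel
            break
        result_q.put(job * job)               # stand-in for real work
        task_q.task_done()

threads = [threading.Thread(target=worker_stage) for _ in range(4)]
for t in threads:
    t.start()

for job in range(10):                         # the master enqueues jobs
    task_q.put(job)
task_q.join()                                 # wait until the stage drains
for _ in threads:
    task_q.put(None)                          # stop the worker threads

print(sorted(result_q.get() for _ in range(10)))
```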


Computing Frontiers | 2016

Lock-based synchronization for GPU architectures

Yunlong Xu; Lan Gao; Rui Wang; Zhongzhi Luan; Weiguo Wu; Depei Qian

Modern GPUs have shown promising results in accelerating compute-intensive and numerical workloads with limited data sharing. However, emerging GPU applications manifest an ample amount of data sharing among concurrently executing threads, and data sharing often requires a mutual exclusion mechanism to ensure data integrity in a multithreaded environment. Although modern GPUs provide atomic primitives that can be leveraged to construct fine-grained locks, existing GPU lock implementations either incur frequent concurrency bugs or lead to extremely low hardware utilization due to the Single Instruction Multiple Threads (SIMT) execution paradigm of GPUs. To let more applications with data sharing benefit from GPU acceleration, we propose a new locking scheme for GPU architectures. The proposed locking scheme allows lock stealing within individual warps to avoid the concurrency bugs caused by the SIMT execution of GPUs. Moreover, it adopts lock virtualization to reduce the memory cost of fine-grained GPU locks. To illustrate the usage and benefit of GPU locks, we apply the proposed locking scheme to Delaunay mesh refinement (DMR), an application involving massive data sharing among threads. Our lock-based implementation achieves a 1.22x speedup over an implementation based on algorithmic optimization (which uses a synchronization mechanism tailored for DMR) with 94% less memory cost.
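The intra-warp hazard the paper targets can be simulated in a few lines: under lockstep (SIMT) execution, a thread spinning on a lock held by a warp-mate prevents the holder from ever reaching its release. The sketch below models one plausible reading of lock stealing, letting one contender per lock win each round while the losers retry; real GPU locks use atomic instructions, and this is not the paper's implementation.

```python
# Threads of one warp, which execute in lockstep, and the lock each one wants.
WARP = ["t0", "t1", "t2", "t3"]
wanted = {"t0": "lockA", "t1": "lockA", "t2": "lockB", "t3": "lockA"}

# A naive per-thread spin lock would deadlock here: lockstep execution never
# lets t0 release lockA while its warp-mates t1 and t3 still spin on it.
# Instead, pick one winner per lock inside the warp; losers retry next round.
pending = dict(wanted)
round_no = 0
while pending:
    round_no += 1
    winners = {}
    for tid, lock in pending.items():  # a later contender "steals" the slot
        winners[lock] = tid
    for lock, tid in winners.items():
        print(f"round {round_no}: {tid} holds {lock}, runs its critical section")
        del pending[tid]
print("all threads served without intra-warp deadlock")
```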

Collaboration


Dive into Zhongzhi Luan's collaborations.
