Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Xiaojun Ruan is active.

Publication


Featured research published by Xiaojun Ruan.


IEEE International Symposium on Parallel & Distributed Processing Workshops and PhD Forum | 2010

Improving MapReduce performance through data placement in heterogeneous Hadoop clusters

Jiong Xie; Shu Yin; Xiaojun Ruan; Zhiyang Ding; Yun Tian; James Majors; Adam Manzanares; Xiao Qin

MapReduce has become an important distributed processing model for large-scale data-intensive applications like data mining and web indexing. Hadoop, an open-source implementation of MapReduce, is widely used for short jobs requiring low response time. The current Hadoop implementation assumes that computing nodes in a cluster are homogeneous in nature. Data locality has not been taken into account for launching speculative map tasks, because it is assumed that most maps are data-local. Unfortunately, neither the homogeneity nor the data-locality assumption is satisfied in virtualized data centers. We show that ignoring the data-locality issue in heterogeneous environments can noticeably reduce MapReduce performance. In this paper, we address the problem of how to place data across nodes so that each node has a balanced data processing load. Given a data-intensive application running on a Hadoop MapReduce cluster, our data placement scheme adaptively balances the amount of data stored in each node to achieve improved data-processing performance. Experimental results on two real data-intensive applications show that our data placement strategy can consistently improve MapReduce performance by rebalancing data across nodes before running a data-intensive application in a heterogeneous Hadoop cluster.
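
The placement idea is that each node should hold data in proportion to its processing capacity, so faster nodes receive more blocks. Below is a minimal sketch of that capacity-proportional split in Python; the node names, speed ratios, and rounding rule are illustrative assumptions rather than details from the paper.

# Minimal sketch of capacity-proportional data placement (assumed model,
# not the paper's implementation): each node receives a share of the
# file's blocks proportional to its measured processing speed.

def place_blocks(num_blocks, node_speeds):
    """Return {node: block_count} with counts proportional to speed."""
    total_speed = sum(node_speeds.values())
    shares = {n: num_blocks * s / total_speed for n, s in node_speeds.items()}
    placement = {n: int(share) for n, share in shares.items()}
    # Hand out blocks lost to rounding, largest fractional remainder first.
    leftover = num_blocks - sum(placement.values())
    for n, _ in sorted(shares.items(),
                       key=lambda kv: kv[1] - int(kv[1]),
                       reverse=True)[:leftover]:
        placement[n] += 1
    return placement

# Hypothetical heterogeneous cluster: speeds are relative map-task rates.
print(place_blocks(120, {"node-fast": 4.0, "node-mid": 2.0, "node-slow": 1.0}))
# -> {'node-fast': 69, 'node-mid': 34, 'node-slow': 17} (roughly 4:2:1)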


IEEE Transactions on Computers | 2011

EAD and PEBD: Two Energy-Aware Duplication Scheduling Algorithms for Parallel Tasks on Homogeneous Clusters

Ziliang Zong; Adam Manzanares; Xiaojun Ruan; Xiao Qin

High-performance clusters have been widely deployed to solve challenging and rigorous scientific and engineering tasks. On one hand, high performance is certainly an important consideration in designing clusters to run parallel applications. On the other hand, ever-increasing energy costs require us to effectively conserve energy in clusters. To achieve the goal of optimizing both performance and energy efficiency in clusters, in this paper we propose two energy-efficient duplication-based scheduling algorithms: Energy-Aware Duplication (EAD) scheduling and Performance-Energy Balanced Duplication (PEBD) scheduling. Existing duplication-based scheduling algorithms replicate all possible tasks to shorten schedule length, without reducing the energy consumption caused by duplication. Our algorithms, in contrast, strive to balance schedule lengths and energy savings by judiciously replicating predecessors of a task only if the duplication aids performance without degrading energy efficiency. To illustrate the effectiveness of EAD and PEBD, we compare them with a non-duplication algorithm, a traditional duplication-based algorithm, and a dynamic voltage scaling (DVS) algorithm. Extensive experimental results using both synthetic benchmarks and real-world applications demonstrate that our algorithms can effectively save energy with marginal performance degradation.
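
At the heart of EAD and PEBD is a per-predecessor test: duplicate a task's predecessor on the same node only when doing so shortens the schedule, and (in PEBD's case) only when the extra computation energy stays within an acceptable margin of the communication energy saved. The sketch below illustrates that kind of decision rule under assumed costs; the numbers and the threshold parameter are hypothetical, not values from the paper.

# Hedged sketch of a duplication decision in the spirit of EAD/PEBD
# (the cost figures and the threshold are illustrative assumptions).

def should_duplicate(comm_time, comm_energy, dup_exec_time, dup_exec_energy,
                     balance_threshold=0.0):
    """Duplicate a predecessor onto the child's node only if doing so
    shortens the schedule and the net energy cost stays acceptable.

    comm_time / comm_energy          : cost of sending the predecessor's output
    dup_exec_time / dup_exec_energy  : cost of re-running the predecessor locally
    balance_threshold                : extra energy (J) a PEBD-like policy tolerates;
                                       an EAD-like policy corresponds to threshold 0.
    """
    saves_time = dup_exec_time < comm_time          # duplication must help performance
    energy_delta = dup_exec_energy - comm_energy    # > 0 means duplication costs energy
    return saves_time and energy_delta <= balance_threshold

# Strict (EAD-like) test vs. balanced (PEBD-like) test with hypothetical numbers.
print(should_duplicate(comm_time=5.0, comm_energy=2.0,
                       dup_exec_time=3.0, dup_exec_energy=2.5))        # False
print(should_duplicate(comm_time=5.0, comm_energy=2.0,
                       dup_exec_time=3.0, dup_exec_energy=2.5,
                       balance_threshold=1.0))                          # True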


International Conference on Parallel Processing | 2007

Energy-Efficient Scheduling for Parallel Applications Running on Heterogeneous Clusters

Ziliang Zong; Xiao Qin; Xiaojun Ruan; Kiranmai Bellam; Mais Nijim; Mohammed I. Alghamdi

High-performance clusters have been widely used to provide enormous computing capability for both commercial and scientific applications. However, their huge power consumption has hindered the wider deployment of large-scale clusters. Designing energy-efficient scheduling algorithms for parallel applications running on clusters, especially on high-performance heterogeneous clusters, is therefore highly desirable. In this regard, we propose a novel scheduling strategy called Energy-Efficient Task Duplication Scheduling (EETDS), which can significantly conserve power by judiciously shrinking the communication energy cost when allocating parallel tasks to heterogeneous computing nodes. We present preliminary simulation results for Gaussian and FFT parallel task models to demonstrate the efficiency of our algorithm.
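
The EETDS idea, reduced to a single placement decision, is to compare the energy of running a task on a remote heterogeneous node (including the energy of the data transfer it forces) against running it where its data already resides. The snippet below sketches that comparison; the power ratings, execution times, and network cost are invented for illustration, and the real algorithm reasons over an entire task graph.

# Hedged, single-decision sketch of the EETDS idea: compare computation
# energy plus communication energy across heterogeneous placement options.
# All figures below are illustrative assumptions, not values from the paper.

def placement_energy(exec_time, node_power, message_bytes=0, net_joules_per_byte=0.0):
    """Computation energy on a node plus optional communication energy."""
    return exec_time * node_power + message_bytes * net_joules_per_byte

# Option A: run on a fast remote node, paying to ship 50 MB of input.
remote = placement_energy(exec_time=10.0, node_power=80.0,
                          message_bytes=50e6, net_joules_per_byte=2e-6)
# Option B: run on the (slower) node that already holds the data.
local = placement_energy(exec_time=12.0, node_power=60.0)

print(f"remote={remote:.0f} J, local={local:.0f} J -> pick "
      f"{'local' if local < remote else 'remote'}")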


International Conference on Computer Communications and Networks | 2007

An Energy-Efficient Scheduling Algorithm Using Dynamic Voltage Scaling for Parallel Applications on Clusters

Xiaojun Ruan; Xiao Qin; Ziliang Zong; Kiranmai Bellam; Mais Nijim

In the past decade, cluster computing platforms have been widely applied to support a variety of scientific and commercial applications, many of which are parallel in nature. However, scheduling parallel applications on large-scale clusters is technically challenging due to significant communication latencies and high energy consumption. As such, shortening schedule length and conserving energy are two major concerns in designing economical and environmentally friendly clusters. In this paper, we propose an energy-efficient scheduling algorithm (TDVAS) that uses the dynamic voltage scaling (DVS) technique to provide significant energy savings for clusters. The TDVAS algorithm aims to judiciously leverage processor idle times to lower processor voltages, thereby reducing the energy consumed by parallel applications running on clusters. Reducing processor voltages, however, inevitably leads to increased execution times of parallel tasks. The salient feature of the TDVAS algorithm is that it tackles this problem by exploiting task precedence constraints: TDVAS applies the DVS technique to parallel tasks that are followed by idle processor times, thereby conserving energy without increasing the schedule lengths of parallel applications. Experimental results clearly show that the TDVAS algorithm is conducive to reducing energy dissipation in large-scale clusters without adversely affecting system performance.
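
The mechanism behind TDVAS is to stretch a task into the idle slack that follows it: the processor is slowed just enough that the task still finishes before its successors need its result, so energy drops while the schedule length is preserved. Below is a minimal sketch of picking such a frequency from a discrete DVS level set, using a common f^2-per-cycle energy approximation; the level table and timing figures are assumptions, not the paper's model.

# Minimal sketch of slack-based DVS frequency selection (illustrative only).
# Energy is approximated as f^3 * time (i.e. ~ f^2 per cycle), a common
# simplification rather than the paper's exact model.

DVS_LEVELS = [1.0, 0.8, 0.6, 0.4]    # hypothetical normalized frequencies

def pick_frequency(exec_time_at_fmax, slack):
    """Lowest frequency whose stretched runtime still fits exec_time + slack."""
    deadline = exec_time_at_fmax + slack
    feasible = [f for f in DVS_LEVELS if exec_time_at_fmax / f <= deadline]
    return min(feasible)             # slowest feasible level saves the most energy

def relative_energy(f, exec_time_at_fmax):
    return (f ** 3) * (exec_time_at_fmax / f)    # = f^2 * exec_time_at_fmax

t, slack = 4.0, 2.0                  # task runs 4 s at full speed; 2 s idle gap follows
f = pick_frequency(t, slack)
print(f"chosen frequency {f}, energy {relative_energy(f, t):.2f} "
      f"vs {relative_energy(1.0, t):.2f} at full speed")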


ACM Transactions on Storage | 2009

Dynamic load balancing for I/O-intensive applications on clusters

Xiao Qin; Hong Jiang; Adam Manzanares; Xiaojun Ruan; Shu Yin

Load balancing for clusters has been investigated extensively, mainly focusing on the effective usage of global CPU and memory resources. However, previous CPU- or memory-centric load-balancing schemes suffer a significant performance drop under I/O-intensive workloads due to the imbalance of I/O load. To solve this problem, we propose two simple yet effective I/O-aware load-balancing schemes for two types of clusters: (1) homogeneous clusters, where nodes are identical, and (2) heterogeneous clusters, which are composed of a variety of nodes with different performance characteristics in computing power, memory capacity, and disk speed. In addition to assigning I/O-intensive sequential and parallel jobs to nodes with light I/O loads, the proposed schemes judiciously take into account both CPU and memory load sharing in the system. Therefore, our schemes are able to maintain high performance for a wide spectrum of workloads. We develop analytic models to study mean slowdowns and task arrival and transfer processes at the system level. Using a set of real I/O-intensive parallel applications and synthetic parallel jobs with various I/O characteristics, we show that our proposed schemes consistently improve performance over existing non-I/O-aware load-balancing schemes, including CPU- and memory-aware schemes and a PBS-like batch scheduler for parallel and sequential jobs, under a diverse set of workload conditions. Importantly, this performance improvement becomes much more pronounced when the applications are I/O-intensive. For example, the proposed approaches deliver performance improvements of 23.6 to 88.0 percent for I/O-intensive applications such as LU decomposition, Sparse Cholesky, Titan, parallel text searching, and data mining. When the I/O load is low or well balanced, the proposed schemes maintain the same level of performance as the existing non-I/O-aware schemes.
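
The essence of the I/O-aware schemes is to send an incoming job to the node with the lightest I/O load while still accounting for CPU and memory pressure. A hedged sketch of such a selection rule follows; the load metrics, weights, and cluster figures are hypothetical stand-ins for the paper's models.

# Hedged sketch of I/O-aware node selection: prefer the node with the
# lightest disk-I/O load, with CPU and memory load as secondary factors.
# Weights and node figures are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class NodeLoad:
    name: str
    io: float       # pending I/O load (e.g. normalized queue length)
    cpu: float      # CPU utilization, 0..1
    mem: float      # memory pressure, 0..1

def pick_node(nodes, io_weight=0.6, cpu_weight=0.2, mem_weight=0.2):
    """Return the node with the smallest weighted load; I/O dominates."""
    return min(nodes, key=lambda n: io_weight * n.io
                                    + cpu_weight * n.cpu
                                    + mem_weight * n.mem)

cluster = [NodeLoad("n1", io=0.9, cpu=0.2, mem=0.3),
           NodeLoad("n2", io=0.3, cpu=0.7, mem=0.5),
           NodeLoad("n3", io=0.5, cpu=0.4, mem=0.4)]
print(pick_node(cluster).name)    # -> n2: busiest CPU but lightest I/O queue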


IEEE Transactions on Computers | 2010

Communication-Aware Load Balancing for Parallel Applications on Clusters

Xiao Qin; Hong Jiang; Adam Manzanares; Xiaojun Ruan; Shu Yin

Cluster computing has emerged as a primary and cost-effective platform for running parallel applications, including communication-intensive applications that transfer a large amount of data among the nodes of a cluster via the interconnection network. Conventional load balancers have proven effective in increasing the utilization of CPU, memory, and disk I/O resources in a cluster. However, most existing load-balancing schemes ignore network resources, leaving an opportunity to improve the effective bandwidth of networks on clusters running parallel applications. For this reason, we propose a communication-aware load-balancing technique that is capable of improving the performance of communication-intensive applications by increasing the effective utilization of networks in cluster environments. To facilitate the proposed load-balancing scheme, we introduce a behavior model for parallel applications with large requirements for network, CPU, memory, and disk I/O resources. Our load-balancing scheme can make full use of this model to quickly and accurately determine the load induced by a variety of parallel applications. Simulation results generated from a diverse set of both synthetic bulk-synchronous and real parallel applications on a cluster show that our scheme significantly improves performance, in terms of slowdown and turn-around time, over existing schemes by up to 206 percent (with an average of 74 percent) and 235 percent (with an average of 82 percent), respectively.
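
The behavior model describes each parallel job by its demand on four resources (network, CPU, memory, and disk I/O), and the balancer places work where the most constrained resource stays least loaded. The fragment below is an assumed rendering of that idea as a bottleneck score; the demand vectors and node capacities are made up for illustration.

# Assumed sketch of communication-aware placement: a job is described by a
# demand vector over (network, cpu, memory, disk); each candidate node is
# scored by its projected bottleneck utilization after accepting the job.
# Demand and capacity numbers below are illustrative, not from the paper.

RESOURCES = ("net", "cpu", "mem", "disk")

def bottleneck_after(node_used, node_capacity, job_demand):
    """Highest per-resource utilization if the job is placed on this node."""
    return max((node_used[r] + job_demand[r]) / node_capacity[r] for r in RESOURCES)

def place(job, nodes):
    """Pick the node whose bottleneck resource stays least utilized."""
    return min(nodes, key=lambda n: bottleneck_after(n["used"], n["cap"], job))

job = {"net": 40, "cpu": 1.0, "mem": 2.0, "disk": 5}       # network-hungry job
nodes = [
    {"name": "a", "cap": {"net": 100, "cpu": 8, "mem": 32, "disk": 100},
     "used": {"net": 80, "cpu": 2, "mem": 8, "disk": 20}},   # link nearly saturated
    {"name": "b", "cap": {"net": 100, "cpu": 8, "mem": 32, "disk": 100},
     "used": {"net": 20, "cpu": 6, "mem": 8, "disk": 20}},   # busy CPU, idle link
]
print(place(job, nodes)["name"])    # -> b: its network link has headroom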


IEEE Transactions on Wireless Communications | 2008

Improving Security of Real-Time Wireless Networks Through Packet Scheduling [Transactions Letters]

Xiao Qin; Mohammed I. Alghamdi; Mais Nijim; Ziliang Zong; Kiranmai Bellam; Xiaojun Ruan; Adam Manzanares

Modern real-time wireless networks require a high security level to assure the confidentiality of information carried in packets delivered over wireless links. However, most existing algorithms for scheduling independent packets in real-time wireless networks ignore the varying security requirements of the packets. In this paper, we remedy this problem by proposing a novel dynamic security-aware packet-scheduling algorithm, which is capable of achieving a high quality of security for real-time packets while making the best effort to guarantee their real-time requirements (e.g., deadlines). We conduct extensive simulation experiments to evaluate the performance of our algorithm. Experimental results show that, compared with two baseline algorithms, the proposed algorithm can substantially improve both the quality of security and the real-time packet guarantee ratio under a wide range of workload characteristics.
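
The adaptive step in a security-aware scheduler of this kind is to pick, per packet, the strongest security level whose processing overhead still lets the packet meet its deadline, degrading security only when it must. A hedged sketch follows; the security levels, per-kilobyte costs, and packet parameters are illustrative assumptions, not the paper's settings.

# Hedged sketch of security-aware packet scheduling: choose the strongest
# security level whose processing overhead still meets the packet deadline.
# The level table and per-kilobyte costs are hypothetical.

SECURITY_LEVELS = [                  # (security level, processing microseconds per KB)
    (0.9, 12.0),                     # strongest, slowest
    (0.6, 6.0),
    (0.3, 2.0),                      # weakest, fastest
]

def schedule_packet(size_kb, base_latency_us, deadline_us):
    """Return the highest feasible security level, or None to reject the packet."""
    for level, us_per_kb in SECURITY_LEVELS:      # strongest first
        finish = base_latency_us + size_kb * us_per_kb
        if finish <= deadline_us:
            return level
    return None

print(schedule_packet(size_kb=10, base_latency_us=50, deadline_us=200))   # 0.9
print(schedule_packet(size_kb=10, base_latency_us=50, deadline_us=100))   # 0.3
print(schedule_packet(size_kb=10, base_latency_us=50, deadline_us=60))    # None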


Network Computing and Applications | 2009

Energy-Aware Prefetching for Parallel Disk Systems: Algorithms, Models, and Evaluation

Adam Manzanares; Xiaojun Ruan; Shu Yin; Mais Nijim; Wei Luo; Xiao Qin

Parallel disk systems consume a significant amount of energy due to their large number of disks. To design economically attractive and environmentally friendly parallel disk systems, in this paper we design and evaluate an energy-aware prefetching strategy for parallel disk systems consisting of a small number of buffer disks and a large number of data disks. By using buffer disks to temporarily handle requests for data disks, we can keep data disks in the low-power mode as long as possible. Our prefetching algorithm aims to group many small idle periods in data disks into large idle periods, which in turn allow data disks to remain in the standby state to save energy. To achieve this goal, we utilize buffer disks to aggressively fetch popular data from regular data disks into buffer disks, thereby putting data disks into the standby state for longer time intervals. A centerpiece of the prefetching mechanism is an energy-saving prediction model, on which we base the energy-saving calculation module invoked by the prefetching algorithm. We quantitatively compare our energy-aware prefetching mechanism against existing solutions, including the dynamic power management strategy. Experimental results confirm that buffer-disk-based prefetching can significantly reduce energy consumption in parallel disk systems, by up to 50 percent. In addition, we systematically investigate the impact that varying disk power parameters has on the energy efficiency of our prefetching algorithm.
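
The prefetching decision ultimately rests on a break-even test: moving popular data to a buffer disk pays off only if the idle interval it creates on a data disk is long enough for standby savings to outweigh the spin-down/spin-up penalty. The snippet below sketches that test; the power and transition figures are invented and much simpler than the paper's prediction model.

# Hedged sketch of the break-even test behind energy-aware prefetching:
# putting a data disk into standby only saves energy if the idle interval
# created by serving requests from the buffer disk exceeds a break-even
# time. All power and transition figures below are illustrative assumptions.

ACTIVE_W, STANDBY_W = 13.0, 2.5           # hypothetical disk power (watts)
TRANSITION_J, TRANSITION_S = 150.0, 10.0  # spin-down + spin-up energy and time

def break_even_seconds():
    """Idle length at which standby (plus transitions) ties staying active."""
    return TRANSITION_J / (ACTIVE_W - STANDBY_W) + TRANSITION_S

def predicted_saving(idle_seconds):
    """Joules saved by standing by for an idle interval of this length."""
    if idle_seconds <= TRANSITION_S:
        return 0.0                        # too short to even complete transitions
    return (idle_seconds - TRANSITION_S) * (ACTIVE_W - STANDBY_W) - TRANSITION_J

def worth_prefetching(idle_seconds):
    return predicted_saving(idle_seconds) > 0

print(break_even_seconds())               # ~24.3 s of idleness needed
print(worth_prefetching(20), worth_prefetching(120))   # False True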


International Conference on Parallel Processing | 2009

Performance Evaluation of Energy-Efficient Parallel I/O Systems with Write Buffer Disks

Xiaojun Ruan; Adam Manzanares; Shu Yin; Ziliang Zong; Xiao Qin

In the past decade, parallel disk systems have been developed to address the problem of I/O performance. A critical challenge with modern parallel I/O systems is that parallel disks consume a significant amount of energy in servers and high-performance computers. To conserve energy in parallel I/O systems, one can immediately spin down disks when they are idle; however, spinning down disks might not produce energy savings because of the penalties of spin operations. Unlike changing CPU power states, spinning disks down and up requires physical movement, so the energy savings provided by spin-down operations must offset the energy penalties of the spin operations themselves. To substantially reduce the penalties incurred by disk spin operations, we developed a novel approach to conserving energy in parallel I/O systems with write buffer disks, which accumulate small writes using a log file system. Data sets buffered in the log file system can then be transferred to the target data disks in batches. Thus, the buffer disks aim to serve a majority of incoming write requests, reducing the large number of disk spin operations by keeping data disks in standby for long periods of time. Interestingly, the write buffer disks not only achieve high energy efficiency in parallel I/O systems, but also shorten the response times of write requests. To evaluate the performance and energy efficiency of our parallel I/O systems with buffer disks, we implemented a prototype using a cluster storage system as a testbed. Experimental results show that under light and moderate I/O loads, buffer disks can be employed to significantly reduce energy dissipation in parallel I/O systems without adverse impacts on I/O performance.
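
Operationally, the buffer disk absorbs small writes into an append-only log and flushes them to the target data disk in one batch, so the data disk is woken once per batch rather than once per write. The sketch below is a deliberately simplified illustration of that policy, with an in-memory list standing in for the log file system and a plain byte threshold triggering the flush.

# Minimal sketch of write buffering with batched flushes (illustrative
# simplification: the "log" is an in-memory list and a flush is triggered
# purely by a byte threshold; the real system uses a log file system on a
# dedicated buffer disk).

class BufferedDisk:
    def __init__(self, name, flush_threshold_bytes=64 * 1024 * 1024):
        self.name = name
        self.flush_threshold = flush_threshold_bytes
        self.log = []                  # buffered (offset, data) records
        self.buffered_bytes = 0
        self.spin_ups = 0              # how often the data disk had to wake

    def write(self, offset, data):
        """Append a small write to the buffer; flush in batch when full."""
        self.log.append((offset, data))
        self.buffered_bytes += len(data)
        if self.buffered_bytes >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Wake the data disk once and stream the whole log to it."""
        if not self.log:
            return
        self.spin_ups += 1             # single spin-up serves many writes
        self.log.clear()
        self.buffered_bytes = 0

disk = BufferedDisk("data-disk-0", flush_threshold_bytes=1024)
for i in range(100):                   # 100 small writes of 64 bytes each
    disk.write(i * 64, b"x" * 64)
disk.flush()                           # drain the remainder
print(disk.spin_ups)                   # -> 7 wake-ups instead of 100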


International Conference on Cluster Computing | 2009

How reliable are parallel disk systems when energy-saving schemes are involved?

Shu Yin; Xiaojun Ruan; Adam Manzanares; Xiao Qin

Many energy conservation techniques have been proposed to achieve high energy efficiency in disk systems. Unfortunately, growing evidence shows that energy-saving schemes in disk drives usually have negative impacts on storage systems. Existing reliability models are inadequate for estimating the reliability of parallel disk systems equipped with energy conservation techniques. To solve this problem, we propose a mathematical model, called MINT, to evaluate the reliability of a parallel disk system in which energy-saving mechanisms are implemented. In this paper, we focus on modeling the reliability impacts of two well-known energy-saving techniques: the Popular Disk Concentration (PDC) technique and the Massive Array of Idle Disks (MAID). We started this research by investigating how PDC and MAID affect the utilization and power-state transition frequency of each disk in a parallel disk system. We then model the annual failure rate of each disk as a function of the disk's utilization, power-state transition frequency, and operating temperature, because these parameters are key reliability-affecting factors in addition to disk age. Next, the reliability of the parallel disk system is derived from the annual failure rates of its individual disks. Finally, we used MINT to study the reliability of a parallel disk system equipped with the PDC and MAID techniques. Experimental results show that PDC is more reliable than MAID when the disk workload is low. In contrast, the reliability of MAID is higher than that of PDC under relatively high I/O loads.
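
MINT's structure is to express each disk's annual failure rate as a function of its utilization, power-state transition frequency, and temperature, and then to combine the per-disk rates into a system-level figure. The sketch below mirrors that structure with a deliberately simple linear rate model and an independent-failure assumption; the coefficients are invented, not the paper's fitted values.

# Illustrative-only sketch of MINT's structure: per-disk annual failure
# rate (AFR) as a function of utilization, power-state transitions, and
# temperature, combined into a system-level survival probability. The
# linear coefficients and the independence assumption are made up here.

import math

def disk_afr(utilization, transitions_per_day, temp_c,
             base=0.02, k_util=0.03, k_trans=0.0005, k_temp=0.002):
    """Annual failure rate for one disk under an assumed linear model."""
    return (base + k_util * utilization
            + k_trans * transitions_per_day
            + k_temp * max(0.0, temp_c - 25.0))

def system_survival_1yr(disk_params):
    """P(no disk fails within a year), treating failures as independent
    exponential events with rate = AFR (a simplifying assumption)."""
    total_rate = sum(disk_afr(*p) for p in disk_params)
    return math.exp(-total_rate)

# Hypothetical 4-disk array under two schemes: one causes frequent
# power-state transitions, the other skews utilization onto a few disks.
frequent_transitions = [(0.2, 40, 35)] * 4
skewed_utilization   = [(0.9, 5, 35), (0.6, 5, 35), (0.2, 5, 35), (0.1, 5, 35)]
print(f"{system_survival_1yr(frequent_transitions):.3f} vs "
      f"{system_survival_1yr(skewed_utilization):.3f}")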

Collaboration


Dive into Xiaojun Ruan's collaborations.

Top Co-Authors

Xiaomin Zhu

National University of Defense Technology
