Keqin Li
State University of New York System
Publication
Featured research published by Keqin Li.
Information Sciences | 2014
Yuming Xu; Kenli Li; Jingtong Hu; Keqin Li
On parallel and distributed heterogeneous computing systems, a heuristic-based task scheduling algorithm typically consists of two phases: task prioritization and processor selection. In such an algorithm, different prioritizations produce different makespans on a heterogeneous computing system. Therefore, a good scheduling algorithm should efficiently assign a priority to each subtask, depending on the resources needed, so as to minimize makespan. In this paper, a task scheduling scheme on heterogeneous computing systems using a multiple priority queues genetic algorithm (MPQGA) is proposed. The basic idea of our approach is to exploit the advantages of both evolutionary-based and heuristic-based algorithms while avoiding their drawbacks. The proposed algorithm incorporates a genetic algorithm (GA) approach to assign a priority to each subtask while using a heuristic-based earliest finish time (EFT) approach to search for a solution for the task-to-processor mapping. The MPQGA method also designs crossover, mutation, and fitness functions suitable for the scenario of directed acyclic graph (DAG) scheduling. The experimental results for large-sized problems from a large set of randomly generated graphs as well as graphs of real-world problems with various characteristics show that the proposed MPQGA algorithm outperforms two non-evolutionary heuristics and a random search method in terms of schedule quality.
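To make the two-phase structure concrete, here is a minimal Python sketch of the decoding step such a scheme could use: a chromosome assigns each task a priority, and the decoder maps tasks to processors by earliest finish time. The DAG representation, cost table, and function names are our illustrative assumptions, not the authors' code, and communication costs are omitted for brevity.

```python
def decode_schedule(priorities, dag, exec_cost, n_procs):
    """Turn a priority vector into a schedule; returns the makespan.

    dag[t] = {"preds": [...], "succs": [...]}; exec_cost[t][p] is the
    running time of task t on processor p (illustrative assumptions).
    """
    in_deg = {t: len(dag[t]["preds"]) for t in dag}
    ready = [t for t in dag if in_deg[t] == 0]
    proc_free = [0.0] * n_procs          # when each processor becomes idle
    finish = {}                          # finish time of each scheduled task
    while ready:
        # Task prioritization phase: pick the ready task with highest priority.
        task = max(ready, key=lambda t: priorities[t])
        ready.remove(task)
        data_ready = max((finish[p] for p in dag[task]["preds"]), default=0.0)
        # Processor selection phase: earliest finish time over all processors.
        best = min(range(n_procs),
                   key=lambda p: max(proc_free[p], data_ready) + exec_cost[task][p])
        start = max(proc_free[best], data_ready)
        finish[task] = start + exec_cost[task][best]
        proc_free[best] = finish[task]
        for succ in dag[task]["succs"]:
            in_deg[succ] -= 1
            if in_deg[succ] == 0:
                ready.append(succ)
    return max(finish.values())
```

A GA then evolves the priority vector with crossover and mutation, using the negative of the decoded makespan as fitness.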
Information Sciences | 1998
Yi Pan; Keqin Li
A new computational model, called a linear array with a reconfigurable pipelined bus system (LARPBS), has been proposed as a feasible and efficient parallel computational model based on current optical technologies. In this paper, we further study this model by proposing several basic data movement operations on it. These operations include broadcast, multicast, compression, split, binary prefix sum, and maximum finding. Using these basic operations, several image processing algorithms are also presented for the model. We show that all of these algorithms can be executed efficiently on the LARPBS model. It is our hope that the LARPBS model can serve as a new and practical parallel computational model for designing parallel algorithms.
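For reference, the semantics of two of these primitives can be stated as plain sequential code; the LARPBS contribution is executing them in constant time on the optical bus, which this sketch does not attempt to model. Function names are ours, not from the paper.

```python
def binary_prefix_sum(bits):
    """For each position i, the number of 1s in bits[0..i]."""
    total, out = 0, []
    for b in bits:
        total += b
        out.append(total)
    return out

def compress(values, marked):
    """Move all marked elements to the low end, preserving their order.

    Unmarked slots are left as None in this sequential reference.
    """
    ranks = binary_prefix_sum(marked)        # destination index = rank - 1
    out = [None] * len(values)
    for i, v in enumerate(values):
        if marked[i]:
            out[ranks[i] - 1] = v
    return out
```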
IEEE Transactions on Parallel and Distributed Systems | 1998
Keqin Li; Yi Pan; S. Q. Zheng
We present efficient parallel matrix multiplication algorithms for linear arrays with reconfigurable pipelined bus systems (LARPBS). Such systems are able to support a large volume of parallel communication of various patterns in constant time. An LARPBS can also be reconfigured into many independent subsystems and, thus, is able to support parallel implementations of divide-and-conquer computations like Strassen's algorithm. The main contributions of the paper are as follows. We develop five matrix multiplication algorithms with varying degrees of parallelism on the LARPBS computing model, namely, MM1, MM2, MM3, and the compound algorithms C1(ε) and C2(δ). Algorithm C1(ε) has adjustable time complexity at the sublinear level. Algorithm C2(δ) implies that it is feasible to achieve sublogarithmic time using o(N³) processors for matrix multiplication on a realistic system. Algorithms MM3, C1(ε), and C2(δ) all have o(N³) cost and, hence, are very processor efficient. Algorithms MM1, MM3, and C1(ε) are general-purpose matrix multiplication algorithms, where the array elements are in any ring. Algorithms MM2 and C2(δ) are applicable to array elements that are integers of bounded magnitude, or floating-point values of bounded precision and magnitude, or Boolean values. Extensions of algorithms MM2 and C2(δ) to unbounded integers and reals are also discussed.
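As context for the divide-and-conquer remark, the sequential Strassen recursion that such reconfigurable subsystems can parallelize looks as follows. This is a textbook sketch in Python/NumPy, not one of the paper's LARPBS algorithms, and it assumes square matrices whose size is a power of two.

```python
import numpy as np

def strassen(A, B):
    """Strassen's 7-multiplication recursion for n x n matrices (n a power of 2)."""
    n = A.shape[0]
    if n <= 64:                          # fall back to ordinary multiplication
        return A @ B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    M1 = strassen(A11 + A22, B11 + B22)  # the seven recursive products
    M2 = strassen(A21 + A22, B11)
    M3 = strassen(A11, B12 - B22)
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)
    C11 = M1 + M4 - M5 + M7
    C12 = M3 + M5
    C21 = M2 + M4
    C22 = M1 - M2 + M3 + M6
    return np.block([[C11, C12], [C21, C22]])
```

Each of the seven recursive products is independent, which is what makes the reconfiguration of an LARPBS into independent subsystems a natural fit.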
IEEE Transactions on Parallel and Distributed Systems | 2013
Junwei Cao; Kai Hwang; Keqin Li; Albert Y. Zomaya
Along with the development of cloud computing, an increasing number of enterprises have started to adopt cloud services, which has promoted the emergence of many cloud service providers. For cloud service providers, how to configure their cloud service platforms to obtain the maximum profit has become an increasingly important concern. In this paper, we take customer satisfaction into consideration to address this problem. Customer satisfaction affects the profit of cloud service providers in two ways. On one hand, the cloud configuration affects the quality of service, which is an important factor affecting customer satisfaction. On the other hand, customer satisfaction affects the request arrival rate of a cloud service provider. However, few existing works take customer satisfaction into consideration when solving the profit maximization problem, and those that do consider it do not give a proper formalized definition of it. Hence, we first refer to the definition of customer satisfaction in economics and develop a formula for measuring customer satisfaction in cloud computing. We then analyze in detail how customer satisfaction affects profit. Finally, taking into consideration customer satisfaction, service-level agreements, renting price, energy consumption, and so forth, a profit maximization problem is formulated and solved to obtain the optimal configuration that maximizes profit.
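A toy numerical illustration of the feedback loop described above (configuration → quality of service → satisfaction → arrival rate → profit); all functional forms and constants here are hypothetical placeholders, not the paper's model.

```python
import math

def profit(servers, price=1.0, rent_per_server=0.4, base_rate=100.0):
    """Toy profit model: demand rises with satisfaction, cost with servers."""
    service_rate = servers * 10.0
    # QoS proxy: waiting grows as spare capacity shrinks; satisfaction decays
    # with waiting (both functional forms are illustrative assumptions).
    wait = 1.0 / max(service_rate - base_rate, 1e-9)
    satisfaction = math.exp(-5.0 * wait)
    arrival = base_rate * satisfaction   # satisfaction feeds back into demand
    return arrival * price - servers * rent_per_server

# Brute-force search over configurations for the profit-maximizing one.
best_servers = max(range(11, 50), key=profit)
print(best_servers, profit(best_servers))
```

The paper formulates and solves this trade-off analytically; the sketch only shows why an interior optimum exists: too few servers destroys satisfaction and demand, too many inflates renting cost.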
IEEE Transactions on Parallel and Distributed Systems | 2015
Yuming Xu; Keqin Li; Ligang He; Longxin Zhang; Kenli Li
Scheduling directed acyclic graph (DAG) tasks with the objective of minimizing makespan has become an important problem in a variety of applications on heterogeneous computing platforms; it involves making decisions about the execution order of tasks and the task-to-processor mapping. Recently, the chemical reaction optimization (CRO) method has proved to be very effective in many fields. In this paper, an improved hybrid version of the CRO method called HCRO (hybrid CRO) is developed for solving the DAG-based task scheduling problem. In HCRO, the CRO method is integrated with novel heuristic approaches, and a new selection strategy is proposed. More specifically, the following contributions are made in this paper. (1) A Gaussian random walk approach is proposed to search for optimal local candidate solutions. (2) A left- or right-rotating shift method based on the theory of maximum Hamming distance is used to guarantee that our HCRO algorithm can escape from local optima. (3) A novel selection strategy based on the normal distribution and a pseudo-random shuffle approach is developed to maintain molecular diversity. Moreover, an exclusive-OR (XOR) operator between two strings is introduced to reduce the chance of cloning before new molecules are generated. Both simulation and real-life experiments have been conducted to verify the effectiveness of HCRO. The results show that the HCRO algorithm schedules DAG tasks much better than existing algorithms in terms of makespan and speed of convergence.
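As an illustration of contribution (1), a Gaussian random walk over a continuous priority encoding might look like the sketch below. This is our reading of the abstract, not the authors' code; `makespan_of` stands for any decoder, such as the EFT-based one sketched earlier.

```python
import random

def gaussian_walk(priorities, sigma=0.1):
    """Return a neighbor: each task priority nudged by Gaussian noise."""
    return {t: p + random.gauss(0.0, sigma) for t, p in priorities.items()}

def local_search(priorities, makespan_of, steps=100):
    """Greedy local search driven by Gaussian random-walk moves."""
    best, best_cost = priorities, makespan_of(priorities)
    for _ in range(steps):
        cand = gaussian_walk(best)
        cost = makespan_of(cand)
        if cost < best_cost:             # keep only improving moves
            best, best_cost = cand, cost
    return best, best_cost
```

In CRO terms this plays the role of a local perturbation operator; the full HCRO additionally uses the rotating-shift escape move, the normal-distribution selection strategy, and the XOR anti-cloning check described above.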
IEEE Transactions on Computers | 2015
Kenli Li; Xiaoyong Tang; Bharadwaj Veeravalli; Keqin Li
Generally, a parallel application consists of precedence-constrained stochastic tasks, where task processing times and intertask communication times are random variables following certain probability distributions. Scheduling such precedence-constrained stochastic tasks with communication times on a heterogeneous cluster system with processors of different computing capabilities to minimize a parallel application's expected completion time is an important but very difficult problem in parallel and distributed computing. In this paper, we present a model of scheduling stochastic parallel applications on heterogeneous cluster systems. We discuss stochastic scheduling attributes and methods to deal with various random variables in scheduling stochastic tasks. We prove that the expected makespan of scheduling stochastic tasks is greater than or equal to the makespan of scheduling deterministic tasks, where all processing times and communication times are replaced by their expected values. To solve the problem of scheduling precedence-constrained stochastic tasks efficiently and effectively, we propose a stochastic dynamic level scheduling (SDLS) algorithm, which is based on stochastic bottom levels and stochastic dynamic levels. Our rigorous performance evaluation results clearly demonstrate that the proposed stochastic task scheduling algorithm significantly outperforms existing algorithms in terms of makespan, speedup, and makespan standard deviation.
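The inequality the paper proves can be checked numerically on the smallest interesting case: two parallel tasks joining at a barrier, where the makespan is max(X, Y). The Monte Carlo sketch below uses illustrative numbers.

```python
import random

mu_x, mu_y, sd = 10.0, 10.0, 3.0     # two task times ~ N(10, 3^2), illustrative
trials = 100_000
expected_makespan = sum(
    max(random.gauss(mu_x, sd), random.gauss(mu_y, sd)) for _ in range(trials)
) / trials
deterministic_makespan = max(mu_x, mu_y)   # random times replaced by their means
print(expected_makespan, ">=", deterministic_makespan)   # roughly 11.7 >= 10.0
```

The gap (about 1.7 time units here) is exactly what a deterministic, expected-value schedule underestimates, which motivates scheduling on the distributions themselves.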
IEEE Transactions on Parallel and Distributed Systems | 2014
Kenli Li; Xiaoyong Tang; Keqin Li
In the past few years, with the rapid development of heterogeneous computing systems (HCS), the issue of energy consumption has attracted a great deal of attention. How to reduce energy consumption is currently a critical issue in designing HCS. In response to this challenge, many energy-aware scheduling algorithms have been developed, primarily using the dynamic voltage-frequency scaling (DVFS) capability that has been incorporated into recent commodity processors. However, these techniques are unsatisfactory in minimizing both schedule length and energy consumption. Furthermore, most algorithms schedule tasks according to their average-case execution times and do not consider the probability distributions of task execution times found in the real world. Realizing this, we study the problem of scheduling a bag-of-tasks (BoT) application, made up of a collection of independent stochastic tasks with normally distributed execution times, on a heterogeneous platform with deadline and energy consumption budget constraints. We build execution time and energy consumption models for stochastic tasks on a single processor. We derive the expected value and variance of the schedule length on HCS by Clark's equations. We formulate our stochastic task scheduling problem as a linear programming problem, in which we maximize the weighted probability of a combined schedule length and energy consumption metric under deadline and energy consumption budget constraints. We propose a heuristic energy-aware stochastic task scheduling algorithm called ESTS to solve this problem. Our algorithm can achieve high scheduling performance for BoT applications with low time complexity O(n(M + log n)), where n is the number of tasks and M is the total number of processor frequencies. Our extensive simulations for performance evaluation, based on randomly generated stochastic applications and real-world applications, clearly demonstrate that our proposed heuristic algorithm can improve the weighted probability that both the deadline and the energy consumption budget constraints are met, and has the capability of balancing between schedule length and energy consumption.
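Clark's equations, which the paper uses to propagate schedule-length moments, give the mean and variance of the maximum of two jointly Gaussian variables. The formulas below are the standard ones from Clark (1961), coded as a sketch rather than taken from the paper.

```python
from math import sqrt
from scipy.stats import norm

def clark_max(m1, v1, m2, v2, rho=0.0):
    """Mean and variance of max(X, Y) for X~N(m1, v1), Y~N(m2, v2), corr rho."""
    a = sqrt(max(v1 + v2 - 2.0 * rho * sqrt(v1 * v2), 1e-12))
    alpha = (m1 - m2) / a
    phi, Phi = norm.pdf(alpha), norm.cdf(alpha)
    mean = m1 * Phi + m2 * (1.0 - Phi) + a * phi
    second_moment = ((m1**2 + v1) * Phi + (m2**2 + v2) * (1.0 - Phi)
                     + (m1 + m2) * a * phi)
    return mean, second_moment - mean**2

# Example: two task finish times N(10, 9) and N(12, 16), assumed independent.
print(clark_max(10.0, 9.0, 12.0, 16.0))
```

Applying this pairwise at every join node of the DAG (and treating the result as approximately Gaussian) is the classical way to estimate the expected value and variance of the overall schedule length.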
IEEE International Conference on Cloud Computing Technology and Science | 2013
Kashif Bilal; Marc Manzano; Samee Ullah Khan; Eusebi Calle; Keqin Li; Albert Y. Zomaya
Data centers, being an architectural and functional block of cloud computing, are integral to the Information and Communication Technology (ICT) sector. Cloud computing is rigorously utilized by various domains, such as agriculture, nuclear science, smart grids, healthcare, and search engines, for research, data storage, and analysis. A Data Center Network (DCN) constitutes the communicational backbone of a data center, ascertaining the performance boundaries for cloud infrastructure. The DCN needs to be robust to failures and uncertainties to deliver the required Quality of Service (QoS) level and satisfy Service Level Agreements (SLAs). In this paper, we analyze the robustness of the state-of-the-art DCNs. Our major contributions are: (a) we present multi-layered graph modeling of various DCNs; (b) we study the classical robustness metrics considering various failure scenarios to perform a comparative analysis; (c) we show the inadequacy of the classical network robustness metrics for appropriately evaluating DCN robustness; and (d) we propose new procedures to quantify DCN robustness. Currently, there is no detailed study available centered on DCN robustness. Therefore, we believe that this study will lay a firm foundation for future DCN robustness research.
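One classical robustness measurement of the kind the paper revisits is the size of the largest connected component under node failures. The sketch below uses a random regular graph as a stand-in topology, whereas the paper models real DCN architectures with multi-layered graphs.

```python
import random
import networkx as nx

def lcc_under_failures(G, fraction=0.3, targeted=False):
    """Fraction of nodes left in the largest component after failures.

    targeted=True removes highest-degree nodes first; otherwise random.
    """
    H = G.copy()
    order = (sorted(H.nodes, key=H.degree, reverse=True) if targeted
             else random.sample(list(H.nodes), len(H.nodes)))
    H.remove_nodes_from(order[: int(fraction * len(order))])
    if H.number_of_nodes() == 0:
        return 0.0
    return max(len(c) for c in nx.connected_components(H)) / G.number_of_nodes()

G = nx.random_regular_graph(4, 200)      # placeholder topology, not a real DCN
print(lcc_under_failures(G, targeted=False), lcc_under_failures(G, targeted=True))
```

Comparing the random and targeted curves is a common way to expose how a topology's robustness depends on the failure scenario, which is precisely where the paper argues the classical metrics fall short for DCNs.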
IEEE Transactions on Parallel and Distributed Systems | 2015
Kenli Li; Wangdong Yang; Keqin Li
This paper presents a unique method of performance analysis and optimization for sparse matrix-vector multiplication (SpMV) on GPUs. The method has wide adaptability to different types of sparse matrices, unlike existing methods that only adapt to particular sparse matrices. In addition, our method does not need additional benchmarks to obtain optimized parameters, which are calculated directly through a probability mass function (PMF). We make the following contributions. (1) We present a PMF to analyze precisely the distribution pattern of non-zero elements in a sparse matrix. The PMF provides a theoretical basis for the compression of a sparse matrix. (2) The compression efficiency of COO, CSR, ELL, and HYB can be analyzed precisely through the PMF and, combined with the hardware parameters of the GPU, the performance of SpMV based on COO, CSR, ELL, and HYB can be estimated. Furthermore, the most appropriate format for SpMV can be selected according to the estimated performance. Experiments show that the theoretical estimates and the measured values are highly consistent. (3) For HYB, the optimal segmentation threshold can be found through the PMF to achieve the optimal performance for SpMV. Our performance modeling and analysis are very accurate: the order of magnitude of the estimated speedup and that of the tested speedup are the same for each of the ten tested sparse matrices based on the three formats COO, CSR, and ELL, and the relative difference between an estimated value and a tested value is less than 20 percent in over 80 percent of cases. The performance improvement of our algorithm is also effective: the average performance improvement of the optimal solution for HYB is over 15 percent compared with that of the automatic solution provided by the cuSPARSE library.
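The format-selection idea can be sketched as follows: build the empirical PMF of nonzeros per row and use it to estimate ELL padding waste. The thresholds here are illustrative guesses, not the paper's calibrated model, which also folds in GPU hardware parameters.

```python
import numpy as np
from scipy.sparse import random as sprandom

A = sprandom(10_000, 10_000, density=1e-3, format="csr")  # placeholder matrix
row_nnz = np.diff(A.indptr)                      # nonzeros per row (CSR)
values, counts = np.unique(row_nnz, return_counts=True)
pmf = counts / counts.sum()                      # empirical PMF of row lengths

ell_width = values.max()                         # ELL pads every row to this
mean_nnz = (values * pmf).sum()
waste = 1.0 - mean_nnz / ell_width               # fraction of padded ELL slots

if waste < 0.3:                                  # rows nearly uniform
    choice = "ELL"
elif np.percentile(row_nnz, 95) < ell_width / 2: # a few long rows dominate
    choice = "HYB"                               # split the long tails off
else:
    choice = "CSR"
print(choice, waste)
```

The paper's model replaces these ad hoc cutoffs with performance estimates derived from the PMF and the GPU's hardware parameters, including the optimal HYB segmentation threshold.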
Future Generation Computer Systems | 2014
F. Zhang; Junwei Cao; Keqin Li; Samee Ullah Khan; Kai Hwang
The scheduling of a many-task workflow in a distributed computing platform is a well-known NP-hard problem. The problem is even more complex and challenging when virtualized clusters are used to execute a large number of tasks in a cloud computing platform. The difficulty lies in satisfying multiple objectives that may be of a conflicting nature. For instance, it is difficult to minimize the makespan of many tasks while reducing the resource cost and preserving the fault tolerance and/or the quality of service (QoS) at the same time. These conflicting requirements and goals are difficult to optimize due to unknown runtime conditions, such as the availability of resources and random workload distributions. Instead of taking a very long time to generate an optimal schedule, we propose a new method to generate suboptimal or sufficiently good schedules for smooth multitask workflows on cloud platforms. Our new multi-objective scheduling (MOS) scheme is specially tailored for clouds and based on the ordinal optimization (OO) method that was originally developed by the automation community for the design optimization of very complex dynamic systems. We extend the OO scheme to meet the special demands of cloud platforms that apply to virtual clusters of servers from multiple data centers. We prove the suboptimality through mathematical analysis. The major advantage of our MOS method lies in its significantly reduced scheduling overhead time and yet close-to-optimal performance. Extensive experiments were carried out on virtual clusters with 16 to 128 virtual machines. The multitasking workflow is obtained from a real scientific LIGO workload for earth gravitational wave analysis. The experimental results show that our proposed algorithm rapidly and effectively generates a small set of semi-optimal scheduling solutions. On a 128-node virtual cluster, the method achieves a thousand-fold reduction in the search time for semi-optimal workflow schedules compared with the use of the Monte Carlo and Blind Pick methods for the same purpose.
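The ordinal optimization pattern underlying MOS can be sketched generically: rank a large candidate pool with a cheap, noisy model, keep a small selected set, and spend the expensive evaluation budget only there. Both evaluator functions below are hypothetical placeholders, not the paper's simulation models.

```python
import random

def ordinal_optimize(candidates, rough_eval, exact_eval, keep=10):
    """Best candidate via cheap ranking followed by exact refinement.

    OO's premise: a noisy ranking still lands good candidates in the top
    'keep' with high probability, so exact evaluation of that small set
    is far cheaper than exact evaluation of the whole pool.
    """
    ranked = sorted(candidates, key=rough_eval)   # cheap, noisy model
    selected = ranked[:keep]                      # 'good enough' subset
    return min(selected, key=exact_eval)          # expensive model, few calls

# Toy usage: candidates stand in for encoded schedules; lower cost is better.
pool = [random.random() for _ in range(10_000)]
best = ordinal_optimize(pool,
                        rough_eval=lambda x: x + random.gauss(0, 0.1),
                        exact_eval=lambda x: x)
print(best)
```

The reported thousand-fold search-time reduction comes from exactly this asymmetry: the exact (simulation-based) evaluator is invoked only on the small selected set instead of the full candidate pool.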