Kenli Li | Researchain

Publication

Featured research published by Kenli Li.

IEEE Transactions on Computers | 2012

vCUDA: GPU-Accelerated High-Performance Computing in Virtual Machines

Lin Shi; Hao Chen; Jianhua Sun; Kenli Li

This paper describes vCUDA, a general-purpose graphics processing unit (GPGPU) computing solution for virtual machines (VMs). vCUDA allows applications executing within VMs to leverage hardware acceleration, which can benefit the performance of a class of high-performance computing (HPC) applications. The key insights in our design are API call interception and redirection, and a dedicated RPC system for VMs. With API interception and redirection, Compute Unified Device Architecture (CUDA) applications in VMs can access a graphics hardware device and achieve high computing performance transparently. In the current study, vCUDA achieved near-native performance with the dedicated RPC system. We carried out a detailed analysis of the performance of our framework. Using a number of unmodified official examples from the CUDA SDK and third-party applications in the evaluation, we observed that CUDA applications running with vCUDA exhibited a very low performance penalty compared with the native environment, demonstrating the viability of the vCUDA architecture.
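
The interception-and-redirection idea can be illustrated with a minimal Python sketch. It is not vCUDA's implementation: `HostRuntime`, `HostServer`, and `GuestStub` are hypothetical names, the "GPU" is a toy in-memory runtime rather than the CUDA API, and an in-process function call stands in for the VM-to-host RPC channel. The guest stub intercepts every call, serializes its name and arguments, and the host side decodes and executes it.

```python
import pickle
from typing import Any, Callable, Dict


class HostRuntime:
    """Hypothetical stand-in for the host-side GPU runtime (not the real CUDA API)."""

    def __init__(self) -> None:
        self.buffers: Dict[int, list] = {}
        self.next_handle = 1

    def malloc(self, n: int) -> int:
        handle = self.next_handle
        self.next_handle += 1
        self.buffers[handle] = [0.0] * n
        return handle

    def memcpy_to_device(self, handle: int, data: list) -> None:
        self.buffers[handle][: len(data)] = data

    def scale(self, handle: int, factor: float) -> None:
        # Toy "kernel": multiply every element of the device buffer.
        self.buffers[handle] = [x * factor for x in self.buffers[handle]]

    def memcpy_to_host(self, handle: int) -> list:
        return list(self.buffers[handle])


class HostServer:
    """Host side: decodes a serialized call and dispatches it to the runtime."""

    def __init__(self, runtime: HostRuntime) -> None:
        self.runtime = runtime

    def handle(self, request: bytes) -> bytes:
        name, args = pickle.loads(request)
        result = getattr(self.runtime, name)(*args)
        return pickle.dumps(result)


class GuestStub:
    """Guest side: intercepts any API call and forwards it as an RPC message."""

    def __init__(self, transport: Callable[[bytes], bytes]) -> None:
        self._transport = transport
        self.calls_forwarded = 0

    def __getattr__(self, name: str) -> Callable[..., Any]:
        # Only reached for names not defined on the stub, i.e. the "API" calls.
        def forward(*args: Any) -> Any:
            self.calls_forwarded += 1
            reply = self._transport(pickle.dumps((name, args)))
            return pickle.loads(reply)
        return forward


server = HostServer(HostRuntime())
gpu = GuestStub(server.handle)  # in-process call stands in for the VM channel
buf = gpu.malloc(3)
gpu.memcpy_to_device(buf, [1.0, 2.0, 3.0])
gpu.scale(buf, 2.0)
result = gpu.memcpy_to_host(buf)
print(result)  # [2.0, 4.0, 6.0]
```

Because the stub forwards any attribute name, guest code is written as if the runtime were local, which mirrors the transparency goal stated in the abstract.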

Information Sciences | 2014

A genetic algorithm for task scheduling on heterogeneous computing systems using multiple priority queues

Yuming Xu; Kenli Li; Jingtong Hu; Keqin Li

On parallel and distributed heterogeneous computing systems, a heuristic-based task scheduling algorithm typically consists of two phases: task prioritization and processor selection. In a heuristic-based task scheduling algorithm, different prioritizations produce different makespans on a heterogeneous computing system. Therefore, a good scheduling algorithm should be able to efficiently assign a priority to each subtask, depending on the resources needed, so as to minimize makespan. In this paper, a task scheduling scheme on heterogeneous computing systems using a multiple priority queues genetic algorithm (MPQGA) is proposed. The basic idea of our approach is to exploit the advantages of both evolutionary-based and heuristic-based algorithms while avoiding their drawbacks. The proposed algorithm incorporates a genetic algorithm (GA) approach to assign a priority to each subtask while using a heuristic-based earliest finish time (EFT) approach to search for a solution for the task-to-processor mapping. The MPQGA method also defines crossover, mutation, and fitness functions suitable for the scenario of directed acyclic graph (DAG) scheduling. The experimental results for large-sized problems from a large set of randomly generated graphs, as well as graphs of real-world problems with various characteristics, show that the proposed MPQGA algorithm outperforms two non-evolutionary heuristics and a random search method in terms of schedule quality.
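
In MPQGA the GA supplies a task priority order and an EFT heuristic maps tasks to processors. The Python sketch below shows one plausible EFT decoding step that could serve as a chromosome's fitness evaluation (a lower makespan is fitter). It assumes the non-insertion variant of EFT and a hypothetical input layout (`comp`, `comm`); the paper's exact EFT rules may differ.

```python
from typing import Dict, List, Tuple


def eft_decode(order: List[int], comp: List[List[float]],
               comm: Dict[Tuple[int, int], float]) -> Tuple[List[int], float]:
    """Map tasks to processors in the given priority order by earliest finish time.

    comp[t][p]: execution time of task t on processor p.
    comm[(u, v)]: transfer time of edge u -> v, paid only across processors.
    Returns (task-to-processor mapping, makespan). Non-insertion variant:
    each task is appended after the last task already on its processor.
    """
    n_tasks, n_proc = len(comp), len(comp[0])
    preds: Dict[int, List[int]] = {t: [] for t in range(n_tasks)}
    for u, v in comm:
        preds[v].append(u)
    proc_free = [0.0] * n_proc
    finish: Dict[int, float] = {}
    mapping = [-1] * n_tasks
    for t in order:
        if any(u not in finish for u in preds[t]):
            raise ValueError(f"task {t} appears before one of its predecessors")
        best_eft, best_p = float("inf"), -1
        for p in range(n_proc):
            # Data is ready once every predecessor has finished and sent its output.
            data_ready = max((finish[u] + (0.0 if mapping[u] == p else comm[(u, t)])
                              for u in preds[t]), default=0.0)
            eft = max(proc_free[p], data_ready) + comp[t][p]
            if eft < best_eft:
                best_eft, best_p = eft, p
        finish[t], mapping[t] = best_eft, best_p
        proc_free[best_p] = best_eft
    return mapping, max(finish.values())


# Diamond DAG 0 -> {1, 2} -> 3 on two processors.
comp = [[2, 3], [3, 2], [4, 4], [2, 2]]
comm = {(0, 1): 1, (0, 2): 1, (1, 3): 1, (2, 3): 1}
mapping, makespan = eft_decode([0, 1, 2, 3], comp, comm)
print(mapping, makespan)  # [0, 0, 1, 1] 9.0
```

In a GA loop, each individual would be decoded this way and its makespan used as the fitness value.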

Applied Soft Computing | 2013

Chemical reaction optimization with greedy strategy for the 0-1 knapsack problem

Tung Khac Truong; Kenli Li; Yuming Xu

The 0-1 knapsack problem (KP01) is a well-known combinatorial optimization problem. It is an NP-hard problem that plays an important role in computing theory and in many real-life applications. Chemical reaction optimization (CRO) is a new optimization framework inspired by the nature of chemical reactions. CRO has demonstrated excellent performance in solving many engineering problems, such as the quadratic assignment problem, neural network training, and multimodal continuous problems. This paper proposes a new chemical reaction optimization with greedy strategy algorithm (CROG) to solve KP01. The paper also explains the operator design and parameter tuning methods for CROG. A new repair function integrating a greedy strategy and random selection is used to repair infeasible solutions. The experimental results demonstrate the superior performance of CROG compared with the genetic algorithm (GA), ant colony optimization (ACO), and the quantum-inspired evolutionary algorithm (QEA).
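
A repair function turns an overweight 0-1 vector into a feasible one. The Python sketch below uses a common ratio-based greedy repair: it drops the least efficient selected items until the weight fits, then adds the most efficient unselected items that still fit. The function name is hypothetical, and CROG's actual repair also mixes in random selection, which is omitted here for determinism.

```python
from typing import List


def greedy_repair(x: List[int], weights: List[float], values: List[float],
                  capacity: float) -> List[int]:
    """Make a 0-1 knapsack solution feasible, then greedily fill leftover capacity."""
    x = list(x)
    # Item indices from lowest to highest value/weight ratio.
    by_ratio = sorted(range(len(x)), key=lambda i: values[i] / weights[i])
    load = sum(w for w, xi in zip(weights, x) if xi)
    # Drop phase: remove the least efficient selected items until feasible.
    for i in by_ratio:
        if load <= capacity:
            break
        if x[i]:
            x[i] = 0
            load -= weights[i]
    # Add phase: insert the most efficient unselected items that still fit.
    for i in reversed(by_ratio):
        if not x[i] and load + weights[i] <= capacity:
            x[i] = 1
            load += weights[i]
    return x


weights = [5, 4, 3, 2]
values = [10, 40, 30, 10]
print(greedy_repair([1, 1, 1, 1], weights, values, capacity=7))  # [0, 1, 1, 0]
```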

Journal of Parallel and Distributed Computing | 2010

List scheduling with duplication for heterogeneous computing systems

Xiaoyong Tang; Kenli Li; Guiping Liao; Renfa Li

Effective task scheduling is essential for obtaining high performance in heterogeneous computing systems (HCS). However, finding an effective task schedule in HCS requires considering the heterogeneity of both computation and communication. To solve this problem, we present a list scheduling algorithm called Heterogeneous Earliest Finish with Duplicator (HEFD). As task priority is a key attribute of list scheduling algorithms, this paper presents a new approach for computing task priority that uses variance to account for performance differences in the target HCS. Another novel idea proposed in this paper is to attempt to duplicate all parent tasks to obtain an optimal scheduling solution. The comparison study, based on both randomly generated graphs and the graphs of some real applications, shows that our scheduling algorithm HEFD significantly surpasses three other well-known algorithms.
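
The abstract does not give HEFD's priority formula, so the following Python sketch only illustrates how execution-time variability could enter a HEFT-style upward rank. The node weight (mean plus population standard deviation across processors) and the name `variance_upward_rank` are assumptions, not the paper's definitions.

```python
import statistics
from typing import Dict, List, Tuple


def variance_upward_rank(comp: List[List[float]], succs: Dict[int, List[int]],
                         comm: Dict[Tuple[int, int], float]) -> Dict[int, float]:
    """HEFT-style upward rank with a hypothetical variance-aware node weight.

    Node weight = mean + population std. dev. of the task's execution times
    across processors, so tasks whose cost differs widely between processors
    rank higher. This weighting is an illustrative assumption.
    """
    rank: Dict[int, float] = {}

    def visit(t: int) -> float:
        if t not in rank:
            w = statistics.mean(comp[t]) + statistics.pstdev(comp[t])
            # Longest (weighted) path from t to an exit task.
            rank[t] = w + max((comm[(t, s)] + visit(s) for s in succs.get(t, [])),
                              default=0.0)
        return rank[t]

    for t in range(len(comp)):
        visit(t)
    return rank


comp = [[2, 4], [3, 3], [1, 5]]
succs = {0: [1, 2]}
comm = {(0, 1): 1, (0, 2): 1}
rank = variance_upward_rank(comp, succs, comm)
order = sorted(rank, key=rank.get, reverse=True)  # list-scheduling priority order
print(order)  # [0, 2, 1]
```

Task 2 outranks task 1 here only because its execution time varies more across processors, even though both have the same mean cost.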

IEEE Transactions on Computers | 2015

Scheduling Precedence Constrained Stochastic Tasks on Heterogeneous Cluster Systems

Kenli Li; Xiaoyong Tang; Bharadwaj Veeravalli; Keqin Li

Generally, a parallel application consists of precedence-constrained stochastic tasks, where task processing times and intertask communication times are random variables following certain probability distributions. Scheduling such precedence-constrained stochastic tasks with communication times on a heterogeneous cluster system, whose processors have different computing capabilities, to minimize a parallel application's expected completion time is an important but very difficult problem in parallel and distributed computing. In this paper, we present a model of scheduling stochastic parallel applications on heterogeneous cluster systems. We discuss stochastic scheduling attributes and methods to deal with various random variables in scheduling stochastic tasks. We prove that the expected makespan of scheduling stochastic tasks is greater than or equal to the makespan of scheduling deterministic tasks in which all processing times and communication times are replaced by their expected values. To solve the problem of scheduling precedence-constrained stochastic tasks efficiently and effectively, we propose a stochastic dynamic level scheduling (SDLS) algorithm, which is based on stochastic bottom levels and stochastic dynamic levels. Our rigorous performance evaluation results clearly demonstrate that the proposed stochastic task scheduling algorithm significantly outperforms existing algorithms in terms of makespan, speedup, and makespan standard deviation.
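
The inequality between expected and deterministic makespan is consistent with the fact that makespan is built from sums and maxima, and max is convex, so by Jensen's inequality E[max(X, Y)] >= max(E[X], E[Y]). The Monte Carlo sketch below illustrates this on a toy fork-join DAG with a fixed schedule and normally distributed task times; the DAG, parameters, and function names are hypothetical.

```python
import random
import statistics
from typing import Sequence, Tuple


def fork_join_makespan(a: float, b: float, c: float, d: float) -> float:
    """Fixed schedule of a fork-join DAG: A, then B and C in parallel, then D."""
    return a + max(b, c) + d


def expected_vs_deterministic(mu: Sequence[float], sigma: Sequence[float],
                              trials: int = 20000, seed: int = 0) -> Tuple[float, float]:
    rng = random.Random(seed)
    # Deterministic makespan: every task time replaced by its expected value.
    deterministic = fork_join_makespan(*mu)
    # Monte Carlo estimate of the expected makespan with normal task times.
    samples = [fork_join_makespan(*(rng.gauss(m, s) for m, s in zip(mu, sigma)))
               for _ in range(trials)]
    return deterministic, statistics.mean(samples)


det, stoch = expected_vs_deterministic(mu=(2.0, 5.0, 5.0, 1.0),
                                       sigma=(0.5, 1.5, 1.5, 0.2))
print(f"deterministic = {det:.2f}, expected (Monte Carlo) = {stoch:.2f}")
```

For these parameters the deterministic makespan is 8, while the analytical expectation is 8 + 1.5/sqrt(pi), about 8.85, because the maximum of two independent N(5, 1.5^2) variables has mean 5 + 1.5/sqrt(pi).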

IEEE Transactions on Parallel and Distributed Systems | 2014

Energy-Efficient Stochastic Task Scheduling on Heterogeneous Computing Systems

Kenli Li; Xiaoyong Tang; Keqin Li

In the past few years, with the rapid development of heterogeneous computing systems (HCS), the issue of energy consumption has attracted a great deal of attention. How to reduce energy consumption is currently a critical issue in designing HCS. In response to this challenge, many energy-aware scheduling algorithms have been developed, primarily using the dynamic voltage-frequency scaling (DVFS) capability incorporated into recent commodity processors. However, these techniques are unsatisfactory at minimizing both schedule length and energy consumption. Furthermore, most algorithms schedule tasks according to their average-case execution times and do not consider that real-world task execution times follow probability distributions. Recognizing this, we study the problem of scheduling a bag-of-tasks (BoT) application, made of a collection of independent stochastic tasks with normally distributed execution times, on a heterogeneous platform with deadline and energy consumption budget constraints. We build execution time and energy consumption models for stochastic tasks on a single processor. We derive the expected value and variance of schedule length on HCS by Clark's equations. We formulate our stochastic task scheduling problem as a linear programming problem, in which we maximize the weighted probability of a combined schedule length and energy consumption metric under deadline and energy consumption budget constraints. We propose a heuristic energy-aware stochastic task scheduling algorithm called ESTS to solve this problem. Our algorithm can achieve high scheduling performance for BoT applications with low time complexity O(n(M + log n)), where n is the number of tasks and M is the total number of processor frequencies.
Our extensive simulations for performance evaluation, based on randomly generated stochastic applications and real-world applications, clearly demonstrate that our proposed heuristic algorithm can improve the weighted probability that both the deadline and the energy consumption budget constraints are met, and can balance schedule length against energy consumption.
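
Clark's equations give the mean and variance of the maximum of two normal random variables, which is what a schedule length (the latest processor finish time) requires. A minimal Python sketch, assuming independent normal task times so that each processor's load is normal with summed means and variances; folding more than two processors pairwise treats each intermediate maximum as normal, which is an approximation. The function names are hypothetical.

```python
import math
from typing import List, Tuple


def _phi(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _Phi(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def clark_max(mu1: float, var1: float, mu2: float, var2: float,
              rho: float = 0.0) -> Tuple[float, float]:
    """Mean and variance of max(X, Y) for jointly normal X, Y with correlation rho."""
    a2 = var1 + var2 - 2.0 * rho * math.sqrt(var1 * var2)
    if a2 <= 0.0:
        # X - Y is constant, so the max is always the variable with the larger mean.
        return (mu1, var1) if mu1 >= mu2 else (mu2, var2)
    a = math.sqrt(a2)
    alpha = (mu1 - mu2) / a
    m1 = mu1 * _Phi(alpha) + mu2 * _Phi(-alpha) + a * _phi(alpha)
    m2 = ((mu1 ** 2 + var1) * _Phi(alpha) + (mu2 ** 2 + var2) * _Phi(-alpha)
          + (mu1 + mu2) * a * _phi(alpha))
    return m1, m2 - m1 * m1


def processor_load(tasks: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Sum of independent normal task times: means add and variances add."""
    return sum(m for m, _ in tasks), sum(v for _, v in tasks)


def schedule_length(loads: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Approximate (mean, variance) of the max over processor loads."""
    mu, var = loads[0]
    for m, v in loads[1:]:
        mu, var = clark_max(mu, var, m, v)
    return mu, var


# Two processors, each running tasks given as (mean, variance) of execution time.
loads = [processor_load([(3.0, 0.5), (2.0, 0.5)]), processor_load([(4.0, 1.0)])]
print(schedule_length(loads))
```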

IEEE Transactions on Parallel and Distributed Systems | 2015

Performance Analysis and Optimization for SpMV on GPU Using Probabilistic Modeling

Kenli Li; Wangdong Yang; Keqin Li

This paper presents a new method of performance analysis and optimization for sparse matrix-vector multiplication (SpMV) on GPU. Unlike existing methods, which suit only particular kinds of sparse matrices, this method adapts to many different types of sparse matrices. In addition, our method does not need additional benchmarks to obtain optimized parameters, which are calculated directly through the probability mass function (PMF). We make the following contributions. (1) We present a PMF to precisely analyze the distribution pattern of non-zero elements in a sparse matrix. The PMF provides a theoretical basis for the compression of a sparse matrix. (2) The compression efficiency of COO, CSR, ELL, and HYB can be analyzed precisely through the PMF and, combined with the hardware parameters of the GPU, the performance of SpMV based on COO, CSR, ELL, and HYB can be estimated. Furthermore, the most appropriate format for SpMV can be selected according to the estimated performance. Experiments show that the theoretically estimated values and the tested values are highly consistent. (3) For HYB, the optimal segmentation threshold can be found through the PMF to achieve the optimal performance for SpMV. Our performance modeling and analysis are very accurate: for each of the ten tested sparse matrices in the three formats COO, CSR, and ELL, the estimated speedup and the tested speedup have the same order of magnitude, and the relative difference between an estimated value and a tested value is less than 20 percent in over 80 percent of cases. The optimization is also effective in practice: the optimal HYB solution improves performance by over 15 percent on average compared with the automatic solution provided by the CUSPARSE library.
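
As a rough illustration of using a row-length PMF to choose a HYB threshold, the Python sketch below computes the empirical PMF of non-zeros per row and picks the ELL width K that minimizes a simple storage cost. The cost model and its weights (`w_ell`, `w_coo`) are hypothetical stand-ins; the paper's model also incorporates GPU hardware parameters, which this sketch does not.

```python
from collections import Counter
from typing import Dict, List, Tuple


def row_length_pmf(row_lengths: List[int]) -> Dict[int, float]:
    """Empirical PMF of the number of non-zeros per row."""
    counts = Counter(row_lengths)
    n = len(row_lengths)
    return {k: c / n for k, c in sorted(counts.items())}


def best_hyb_threshold(pmf: Dict[int, float], n_rows: int,
                       w_ell: float = 1.0, w_coo: float = 3.0) -> Tuple[int, float]:
    """Pick the ELL width K for HYB that minimizes a hypothetical cost model.

    Every row pays K ELL slots (padding included); each non-zero beyond K
    spills into the COO part at a higher hypothetical per-entry weight.
    """
    best_k, best_cost = 0, float("inf")
    for k in range(max(pmf) + 1):
        ell = n_rows * k * w_ell
        # Expected spill per row, computed from the PMF, scaled to all rows.
        coo = n_rows * sum(p * max(0, length - k) for length, p in pmf.items()) * w_coo
        if ell + coo < best_cost:
            best_k, best_cost = k, ell + coo
    return best_k, best_cost


row_lengths = [1, 1, 1, 1, 10]  # one long row would force heavy ELL padding
pmf = row_length_pmf(row_lengths)
print(pmf, best_hyb_threshold(pmf, len(row_lengths)))
```

With these weights, a pure ELL layout (K = 10) would pad four rows by nine slots each, so the cost model prefers K = 1 and sends the long row's overflow to COO.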

Journal of Parallel and Distributed Computing | 2013

A DAG scheduling scheme on heterogeneous computing systems using double molecular structure-based chemical reaction optimization

Yuming Xu; Kenli Li; Ligang He; Tung Khac Truong

A new meta-heuristic method, called Chemical Reaction Optimization (CRO), was proposed recently. The method encodes solutions as molecules and mimics the interactions of molecules in chemical reactions to search for optimal solutions. The CRO method has demonstrated its capability in solving NP-hard optimization problems. In this paper, the CRO scheme is used to formulate the scheduling of Directed Acyclic Graph (DAG) jobs in heterogeneous computing systems, and a Double Molecular Structure-based Chemical Reaction Optimization (DMSCRO) method is developed. There are two molecular structures in DMSCRO: one encodes the execution order of the tasks in a DAG job, and the other encodes the task-to-computing-node mapping. The DMSCRO method also designs four elementary chemical reaction operations and a fitness function suitable for the scenario of DAG scheduling. We also conducted simulation experiments to verify the effectiveness and efficiency of DMSCRO over a large set of randomly generated graphs and graphs of real-world problems.
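
The double molecular structure can be represented as a pair of lists: an execution order and a task-to-node mapping. The Python sketch below shows a hypothetical single-molecule (on-wall style) move that perturbs both structures while keeping the order precedence-valid. It does not reproduce DMSCRO's four operators; the names and move rules are assumptions.

```python
import random
from typing import Dict, List, Tuple

Molecule = Tuple[List[int], List[int]]  # (execution order, task -> node mapping)


def is_topological(order: List[int], preds: Dict[int, List[int]]) -> bool:
    pos = {t: i for i, t in enumerate(order)}
    return all(pos[u] < pos[t] for t in order for u in preds.get(t, []))


def on_wall_collision(mol: Molecule, preds: Dict[int, List[int]], n_nodes: int,
                      rng: random.Random) -> Molecule:
    """Hypothetical one-molecule neighborhood move on both structures."""
    order, mapping = list(mol[0]), list(mol[1])
    # Order structure: swap a random adjacent pair unless the second task directly
    # depends on the first. Adjacent tasks cannot be linked only through an
    # intermediate task, so this check keeps the order topologically valid.
    i = rng.randrange(len(order) - 1)
    a, b = order[i], order[i + 1]
    if a not in preds.get(b, []):
        order[i], order[i + 1] = b, a
    # Mapping structure: move one random task to a random computing node.
    t = rng.randrange(len(mapping))
    mapping[t] = rng.randrange(n_nodes)
    return order, mapping


preds = {0: [], 1: [0], 2: [0], 3: [1, 2]}
rng = random.Random(42)
mol: Molecule = ([0, 1, 2, 3], [0, 0, 1, 1])
for _ in range(1000):
    mol = on_wall_collision(mol, preds, n_nodes=3, rng=rng)
    assert is_topological(mol[0], preds)
print(mol)
```

Each new molecule would then be scored by decoding it into a schedule and computing its makespan, which plays the role of potential energy in CRO.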

Journal of Parallel and Distributed Computing | 2010

Reliability-aware scheduling strategy for heterogeneous distributed computing systems

Xiaoyong Tang; Kenli Li; Renfa Li; Bharadwaj Veeravalli

Heterogeneous computing systems are promising computing platforms, since systems based on a single parallel architecture may not be sufficient to exploit the parallelism available in running applications. In some cases, heterogeneous distributed computing (HDC) systems can achieve higher performance at lower cost than single-machine supersystems. However, in HDC systems, processors and networks are not failure-free, and any kind of failure may be critical to the running applications. One way of dealing with such failures is to employ a reliable scheduling algorithm. Unfortunately, most existing scheduling algorithms for precedence-constrained tasks in HDC systems do not adequately consider the reliability requirements of inter-dependent tasks. In this paper, we design a reliability-driven scheduling architecture that can effectively measure system reliability based on an optimal reliability communication path search algorithm, and then we introduce the reliability priority rank (RRank) to estimate task priority by considering reliability overheads. Furthermore, based on the directed acyclic graph (DAG) model, we propose a reliability-aware scheduling algorithm for precedence-constrained tasks that can achieve high reliability for applications. The comparison studies, based on both randomly generated graphs and the graphs of some real applications, show that our scheduling algorithm outperforms existing scheduling algorithms in terms of makespan, scheduling length ratio, and reliability. At the same time, the improvement gained by our algorithm increases as the data communication among tasks increases.
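
One standard way to find the most reliable communication path: if each link succeeds independently with reliability r, a path's reliability is the product of its link reliabilities, and maximizing that product is the same as minimizing the sum of -log r, so Dijkstra's algorithm applies. The Python sketch below assumes an exponential failure model, r = exp(-failure_rate * data / bandwidth), for each link; this model, the example graph, and the function names are illustrative assumptions rather than the paper's exact formulation.

```python
import heapq
import math
from typing import Dict, List, Tuple


def link_reliability(failure_rate: float, data: float, bandwidth: float) -> float:
    """Probability a link survives a transfer of `data` (exponential failures)."""
    return math.exp(-failure_rate * data / bandwidth)


def most_reliable_path(links: Dict[str, List[Tuple[str, float]]],
                       src: str, dst: str) -> Tuple[float, List[str]]:
    """links: node -> [(neighbor, reliability in (0, 1])]. Returns (reliability, path)."""
    # Weights -log(r) are non-negative, so ordinary Dijkstra is valid.
    dist = {src: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, src)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == dst:
            break
        for v, r in links.get(u, []):
            nd = d - math.log(r)
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return 0.0, []
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return math.exp(-dist[dst]), path[::-1]


links = {
    "A": [("B", 0.9), ("C", 0.95), ("D", 0.85)],
    "B": [("D", 0.9)],
    "C": [("D", 0.95)],
}
print(most_reliable_path(links, "A", "D"))  # reliability about 0.9025 via A -> C -> D
```

The two-hop route through C wins because 0.95 * 0.95 = 0.9025 exceeds both the direct link (0.85) and the route through B (0.81).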

Information Sciences | 2015

Maximizing reliability with energy conservation for parallel task scheduling in a heterogeneous cluster

Longxin Zhang; Kenli Li; Yuming Xu; Jing Mei; F. Zhang; Keqin Li

A heterogeneous computing system in a cluster is a promising computing platform that attracts a large number of researchers due to its high performance potential. High system reliability and low power consumption are two primary objectives for a data center. Dynamic voltage scaling (DVS) has been proven to be the most efficient technique and is widely exploited to realize low-power systems. Unfortunately, transient faults are inevitable during the execution of an application when the DVS technique is applied. Most existing scheduling algorithms for precedence-constrained tasks in a multiprocessor computer system do not adequately consider task reliability. In this paper, we devise a novel Reliability Maximization with Energy Constraint (RMEC) algorithm, which consists of three important phases: task priority establishment, frequency selection, and processor assignment. The RMEC algorithm can effectively balance the tradeoff between high reliability and energy consumption. Our rigorous performance evaluation study, based on both randomly generated task graphs and the graphs of some real-world applications, shows that our scheduling algorithm surpasses the existing algorithms in terms of system reliability enhancement and energy consumption saving.
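
The frequency-selection phase can be illustrated with a model widely used in DVS reliability studies: the transient fault rate grows exponentially as frequency is lowered, lambda(f) = lambda0 * 10^(d(1 - f)/(1 - f_min)), and a task of c cycles succeeds with probability exp(-lambda(f) * c / f). The Python sketch below assumes this model, normalized frequencies with f_max = 1, an energy model (P_ind + C_ef * f^3) * c / f, and hypothetical parameter values; it is not RMEC's published procedure. It picks the most reliable frequency whose energy fits a per-task budget.

```python
import math
from typing import List, Optional


def fault_rate(f: float, lam0: float = 1e-6, d: float = 3.0, f_min: float = 0.4) -> float:
    """Assumed exponential fault-rate model: faults rise as frequency drops."""
    return lam0 * 10 ** (d * (1.0 - f) / (1.0 - f_min))


def task_energy(cycles: float, f: float, p_ind: float = 0.1, c_ef: float = 1.0) -> float:
    """Assumed energy: (frequency-independent power + C_ef * f^3) * execution time."""
    return (p_ind + c_ef * f ** 3) * cycles / f


def task_reliability(cycles: float, f: float) -> float:
    """Probability that no transient fault hits the task during cycles / f time units."""
    return math.exp(-fault_rate(f) * cycles / f)


def pick_frequency(cycles: float, freqs: List[float],
                   energy_budget: float) -> Optional[float]:
    """Most reliable frequency whose energy fits the budget (None if none fits)."""
    feasible = [f for f in freqs if task_energy(cycles, f) <= energy_budget]
    if not feasible:
        return None
    return max(feasible, key=lambda f: task_reliability(cycles, f))


freqs = [0.4, 0.6, 0.8, 1.0]
for f in freqs:
    print(f, round(task_energy(10, f), 3), task_reliability(10, f))
print(pick_frequency(10, freqs, energy_budget=8.0))  # 0.8
```

Under this model both reliability and energy increase with frequency, so the budget caps the choice: f = 1.0 costs 11 energy units here and is excluded, leaving f = 0.8 as the most reliable feasible setting.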
