Publication


Featured research published by Kai-Cheung Leung.


Grid Computing | 2010

Metrics and task scheduling policies for energy saving in multicore computers

Jason Mair; Kai-Cheung Leung; Zhiyi Huang

In this paper, we propose three new metrics, Speedup per Watt (SPW), Power per Speedup (PPS) and Energy per Target (EPT), to guide task schedulers in selecting the best task schedules for energy saving in multicore computers. Based on these metrics, we propose the novel Sharing Policies, the Hare and the Tortoise Policies, which take parallelism and Dynamic Voltage and Frequency Scaling (DVFS) into account in their schedules. Our experiments show that, on a modern multicore computer, the Hare Policy can save up to 72% of energy in a system with low utilization. On a busier system, the Sharing Policy can save up to 20% of energy over standard scheduling policies.
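The abstract names the metrics but does not spell out their formulas. The sketch below is a minimal illustration that assumes the literal readings SPW = speedup / power and PPS = power / speedup and uses them to rank a few hypothetical schedules; the schedule data and names are made up, and this is not the paper's scheduler.

```c
/*
 * Minimal sketch of how metrics like those named in the abstract might be
 * used to rank candidate task schedules. It assumes SPW = speedup / power
 * and PPS = power / speedup; the schedules below are invented for
 * illustration only.
 */
#include <stdio.h>

struct schedule {
    const char *name;
    double speedup;   /* speedup over a single-core, full-frequency baseline */
    double power_w;   /* average power draw of the schedule, in watts */
};

static double spw(const struct schedule *s) { return s->speedup / s->power_w; }
static double pps(const struct schedule *s) { return s->power_w / s->speedup; }

int main(void) {
    /* Hypothetical schedules: e.g. one uses all cores at full frequency,
     * another uses fewer cores at a lower DVFS setting. */
    struct schedule candidates[] = {
        { "all-cores-2.6GHz",  3.6, 95.0 },
        { "two-cores-1.6GHz",  1.8, 55.0 },
        { "one-core-2.6GHz",   1.0, 70.0 },
    };
    size_t n = sizeof candidates / sizeof candidates[0];
    size_t best = 0;

    for (size_t i = 0; i < n; i++) {
        printf("%-20s SPW=%.4f  PPS=%.2f\n",
               candidates[i].name, spw(&candidates[i]), pps(&candidates[i]));
        if (spw(&candidates[i]) > spw(&candidates[best]))
            best = i;  /* higher speedup per watt = more energy-efficient */
    }
    printf("selected schedule: %s\n", candidates[best].name);
    return 0;
}
```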


Advances in Focused Retrieval | 2009

Wikisearching and Wikilinking

Dylan Jenkinson; Kai-Cheung Leung; Andrew Trotman

The University of Otago submitted three element runs and three passage runs to the Relevance-in-Context task of the ad hoc track. The best Otago run was a whole-document run placing 7th. The best Otago passage run placed 13th while the best Otago element run placed 31st. There were a total of 40 runs submitted to the task. The ad hoc result reinforced our prior belief that passages are better answers than elements and that the most important aspect of focused retrieval is the identification of relevant documents. Six runs were submitted to the Link-the-Wiki track. The best Otago run placed 1st (of 21) in file-to-file automatic assessment and 6th (of 28) with manual assessment. The Itakura & Clarke algorithm was used for outgoing links, with special attention paid to parsing and case sensitivity. For incoming links, representative terms were selected from the document and used to find similar documents.


Parallel and Distributed Computing: Applications and Technologies | 2009

Maotai 2.0: Data Race Prevention in View-Oriented Parallel Programming

Kai-Cheung Leung; Zhiyi Huang; Qihang Huang; Paul Werstein

This paper proposes a data race prevention scheme, which can prevent data races in the View-Oriented Parallel Programming (VOPP) model. VOPP is a novel shared-memory data-centric parallel programming model, which uses views to bundle mutual exclusion with data access. We have implemented the data race prevention scheme with a memory protection mechanism. Experimental results show that the extra overhead of memory protection is trivial in our applications. We also present a new VOPP implementation, Maotai 2.0, which has advanced features such as deadlock avoidance, producer/consumer view and system queues, in addition to the data race prevention scheme. The performance of Maotai 2.0 is evaluated and compared with modern programming models such as OpenMP and Cilk.
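As a rough illustration of the programming style the abstract describes, the following pthreads sketch bundles mutual exclusion with data access behind acquire/release calls, so the shared data has no unsynchronized access path. The function and type names (acquire_view, release_view, struct view) are hypothetical and not the Maotai 2.0 API.

```c
/*
 * Illustrative sketch of the view-oriented programming style: shared data
 * lives inside a "view" and is only touched between acquire_view() and
 * release_view(), so mutual exclusion is bundled with data access. The names
 * and structure here are hypothetical, not the real Maotai API.
 */
#include <pthread.h>
#include <stdio.h>

struct view {
    pthread_mutex_t lock;   /* mutual exclusion bundled with the data */
    long counter;           /* the shared data guarded by this view */
};

static struct view v = { PTHREAD_MUTEX_INITIALIZER, 0 };

static struct view *acquire_view(struct view *vw) {
    pthread_mutex_lock(&vw->lock);
    return vw;               /* data is only reached through the acquired view */
}

static void release_view(struct view *vw) {
    pthread_mutex_unlock(&vw->lock);
}

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        struct view *w = acquire_view(&v);
        w->counter++;        /* no unsynchronized access path exists, so no data race */
        release_view(w);
    }
    return NULL;
}

int main(void) {
    pthread_t t[4];
    for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
    printf("counter = %ld (expected 400000)\n", v.counter);
    return 0;
}
```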


High Performance Distributed Computing | 2014

Data filtering for scalable high-dimensional k-NN search on multicore systems

Xiaoxin Tang; Steven Mills; David M. Eyers; Kai-Cheung Leung; Zhiyi Huang; Minyi Guo

k-Nearest Neighbors (k-NN) search is a widely used category of algorithms with applications in domains such as computer vision and machine learning. With the rapidly increasing volume and dimensionality of available data, k-NN algorithms scale poorly on multicore systems because they hit the memory wall. In this paper, we propose a novel data filtering strategy, named Subspace Clustering for Filtering (SCF), for k-NN search algorithms on multicore platforms. By excluding unlikely features in k-NN search, this strategy can reduce memory footprint as well as computation. Experimental results on four k-NN algorithms show that SCF can improve their performance on two modern multicore platforms with insignificant loss of search precision.
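The abstract does not detail the SCF procedure, so the sketch below only illustrates the general idea of data filtering in high-dimensional k-NN search: a cheap partial distance over a few dimensions is used to discard candidates before the full distance is computed (shown here for the nearest neighbour, k = 1, on random data). The paper's subspace-clustering strategy is more sophisticated than this prefix-based pruning.

```c
/*
 * Hedged sketch of filtering for high-dimensional nearest-neighbour search:
 * compute a partial distance over a small prefix of the dimensions first and
 * skip the full computation when that partial distance already exceeds the
 * current best. This is the general spirit of data filtering, not the SCF
 * algorithm itself.
 */
#include <stdio.h>
#include <stdlib.h>
#include <float.h>

#define DIM 128          /* e.g. SIFT descriptors are 128-dimensional */
#define NREF 10000
#define FILTER_DIM 16    /* dimensions used for the cheap partial distance */

static float sqdist(const float *a, const float *b, int from, int to) {
    float d = 0.0f;
    for (int i = from; i < to; i++) {
        float t = a[i] - b[i];
        d += t * t;
    }
    return d;
}

int main(void) {
    static float ref[NREF][DIM];
    static float query[DIM];
    /* Random data just to make the sketch runnable. */
    for (int i = 0; i < NREF; i++)
        for (int j = 0; j < DIM; j++)
            ref[i][j] = (float)rand() / RAND_MAX;
    for (int j = 0; j < DIM; j++)
        query[j] = (float)rand() / RAND_MAX;

    float best = FLT_MAX;
    int best_idx = -1, filtered = 0;

    for (int i = 0; i < NREF; i++) {
        /* Cheap filter: if the partial distance alone is already worse than
         * the best full distance seen so far, the candidate cannot win. */
        float partial = sqdist(query, ref[i], 0, FILTER_DIM);
        if (partial >= best) { filtered++; continue; }

        float full = partial + sqdist(query, ref[i], FILTER_DIM, DIM);
        if (full < best) { best = full; best_idx = i; }
    }
    printf("nearest neighbour: %d (dist^2 = %f), %d candidates filtered\n",
           best_idx, best, filtered);
    return 0;
}
```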


The Journal of Supercomputing | 2010

Data race: tame the beast

Kai-Cheung Leung; Zhiyi Huang; Qihang Huang; Paul Werstein

Data races hamper parallel programming and threaten the reliability of future software. This paper proposes the data race prevention scheme View-Oriented Data race Prevention (VODAP), which can prevent data races in the View-Oriented Parallel Programming (VOPP) model. VOPP is a novel shared-memory data-centric parallel programming model, which uses views to bundle mutual exclusion with data access. We have implemented the data race prevention scheme with a memory protection mechanism. Experimental results show that the extra overhead of memory protection is trivial in our applications. The performance is evaluated and compared with modern programming models such as OpenMP and Cilk.
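Complementing the view sketch above, this is a minimal POSIX illustration of the kind of memory protection mechanism the abstract mentions: the pages backing a view are kept inaccessible while the view is not acquired, so a stray access faults immediately instead of racing silently. The helper names are hypothetical and this is not the VODAP implementation.

```c
/*
 * Minimal POSIX sketch of a memory protection mechanism for data race
 * prevention: the pages backing a view stay PROT_NONE while the view is not
 * acquired, so any access outside acquire/release triggers a fault rather
 * than a silent data race. Illustration only.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

struct view {
    void  *pages;   /* page-aligned backing memory for the shared data */
    size_t size;
};

static void view_init(struct view *v, size_t size) {
    long pagesz = sysconf(_SC_PAGESIZE);
    v->size  = ((size + pagesz - 1) / pagesz) * pagesz;
    v->pages = mmap(NULL, v->size, PROT_NONE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (v->pages == MAP_FAILED) { perror("mmap"); exit(1); }
}

/* Acquiring a view makes its data accessible; releasing hides it again.
 * (A real implementation would also take the view's lock here.) */
static void *view_acquire(struct view *v) {
    mprotect(v->pages, v->size, PROT_READ | PROT_WRITE);
    return v->pages;
}

static void view_release(struct view *v) {
    mprotect(v->pages, v->size, PROT_NONE);
}

int main(void) {
    struct view v;
    view_init(&v, 4096);

    char *data = view_acquire(&v);
    strcpy(data, "written inside acquire/release: OK");
    printf("%s\n", data);
    view_release(&v);

    /* Any access to v.pages here, outside acquire/release, would now fault
     * immediately instead of silently racing with other threads. */
    return 0;
}
```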


International Conference on Parallel Processing | 2012

When and How VOTM Can Improve Performance in Contention Situations

Kai-Cheung Leung; Yawen Chen; Zhiyi Huang

This paper extends the Restricted Admission Control (RAC) theoretical model to cover the multiple-view case in View-Oriented Transactional Memory (VOTM), in order to analyze the potential performance gain in VOTM when shared data is partitioned into multiple views. Experimental results show that partitioning shared data into separate views, each of which is independently controlled by RAC, can improve performance when one of the views has high contention while the others have low contention. In memory-intensive transactions, even when contention is not high enough to justify admission control by RAC, partitioning shared data into different views can improve the performance of TM systems such as NOrec by reducing contention in accessing the TM metadata.
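A hedged sketch of the admission-control idea: each view carries its own cap on how many transactions may be admitted concurrently, so a high-contention view can be throttled while a low-contention view runs unrestricted. RAC itself adapts the admission level dynamically; the fixed caps, structure and names below are purely illustrative.

```c
/*
 * Illustrative per-view admission control in the spirit of RAC: each view has
 * its own limit on concurrently admitted transactions, so a "hot" view can be
 * throttled while a "cold" view admits everyone. RAC adapts this limit at
 * runtime; the fixed caps here are a simplification.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

struct view {
    pthread_mutex_t m;
    pthread_cond_t  cv;
    int admitted;          /* transactions currently admitted to this view */
    int limit;             /* admission cap; lower for high-contention views */
    atomic_long commits;   /* stand-in for work done on this view's data */
};

static void view_enter(struct view *v) {
    pthread_mutex_lock(&v->m);
    while (v->admitted >= v->limit)
        pthread_cond_wait(&v->cv, &v->m);   /* admission control: wait for a slot */
    v->admitted++;
    pthread_mutex_unlock(&v->m);
}

static void view_exit(struct view *v) {
    pthread_mutex_lock(&v->m);
    v->admitted--;
    pthread_cond_signal(&v->cv);
    pthread_mutex_unlock(&v->m);
}

/* Two views: a "hot" one throttled to 2 concurrent transactions and a "cold"
 * one that admits all 8 workers. A real TM would run speculative transactions
 * between enter/exit; here we just count committed updates. */
static struct view hot  = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 2, 0 };
static struct view cold = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 8, 0 };

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 10000; i++) {
        view_enter(&hot);
        atomic_fetch_add(&hot.commits, 1);
        view_exit(&hot);

        view_enter(&cold);
        atomic_fetch_add(&cold.commits, 1);
        view_exit(&cold);
    }
    return NULL;
}

int main(void) {
    pthread_t t[8];
    for (int i = 0; i < 8; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 8; i++) pthread_join(t[i], NULL);
    printf("hot=%ld cold=%ld (each expected 80000)\n",
           atomic_load(&hot.commits), atomic_load(&cold.commits));
    return 0;
}
```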


Parallel Computing | 2013

Performance evaluation of View-Oriented Transactional Memory

Zhiyi Huang; Kai-Cheung Leung

This paper extensively evaluates the performance of View-Oriented Transactional Memory (VOTM) based on two implementations that adopt different Transactional Memory (TM) algorithms. The Restricted Admission Control (RAC) mechanism in VOTM plays a key role in the performance gains of VOTM. In this paper, we use six applications to evaluate the performance advantage of VOTM. Experimental results show that partitioning shared data into separate views can improve performance when one of the views has high contention while the others have low contention, because the contention of each view is independently controlled by RAC. For memory-intensive applications, even when the contention on application data is not high enough to justify admission control by RAC, partitioning shared data into different views can improve the performance of TM systems due to the reduced contention on the metadata of the TM systems.


International Conference on Parallel Processing | 2013

Performance Tuning on Multicore Systems for Feature Matching within Image Collections

Xiaoxin Tang; Steven Mills; David M. Eyers; Zhiyi Huang; Kai-Cheung Leung; Minyi Guo

Parallel programming is the mainstream for today's HPC applications. Programmers need to parallelize their programs to achieve better performance on multicore systems. However, due to a lack of good understanding of parallelism in algorithms, scheduling policy in runtime systems, and multicore architectures, programmers usually find it very hard to write high-performance, scalable programs on these parallel platforms. Although using a parallelized library written by experts can reduce the amount of work for coding, it does not automatically guarantee good performance according to our study. A better understanding of parallelism in algorithms, the OS/runtime systems, and hardware architectures is necessary if programmers wish to further improve performance. In this paper, we use SIFT-based feature matching within large-scale image collections to show the importance of three factors (the level of parallelism, scheduling policy, and memory architecture) that affect the performance of large-scale feature matching on multicore systems. We demonstrate experimental results using programs based on OpenCV and OpenMP, which are executed on both 16-core and 64-core machines. From our experimental results, we find that images with a large number of features achieve poor scalability on the 64-core machine due to poor cache utilization. To address this issue of cache performance, we propose a Divide-and-Merge algorithm that divides the feature space into several small sub-spaces so that they fit within the cache. Our experiments show that performance tuning addressing all three factors improves the speedup of feature matching from 10.6× to 21.5× on the 64-core machine. While the speedup is improved by 103%, the scalability of the feature matching algorithm is improved by up to 6.45 times on the 64-core machine with our performance tuning. Our study indicates that performance tuning on multicore systems is very challenging even for a simple image processing algorithm.
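The sketch below illustrates the divide-and-merge pattern the abstract describes: the reference features are processed in cache-sized blocks, and per-block best matches are merged into the global result. The block size and data are invented for illustration, and the OpenMP scheduling and NUMA aspects of the paper are omitted.

```c
/*
 * Hedged sketch of a divide-and-merge matching pattern: split the reference
 * features into blocks sized to fit in cache, match all queries against one
 * block at a time, and merge per-block results into the global nearest
 * neighbour. Sizes and data are illustrative only.
 */
#include <stdio.h>
#include <stdlib.h>
#include <float.h>

#define DIM 128          /* SIFT descriptor length */
#define NREF 8192        /* reference features */
#define NQUERY 256
#define BLOCK 1024       /* features per block: BLOCK*DIM*4 bytes ~ 512 KB */

static float ref[NREF][DIM], query[NQUERY][DIM];

static float sqdist(const float *a, const float *b) {
    float d = 0.0f;
    for (int i = 0; i < DIM; i++) { float t = a[i] - b[i]; d += t * t; }
    return d;
}

int main(void) {
    for (int i = 0; i < NREF; i++)
        for (int j = 0; j < DIM; j++) ref[i][j] = (float)rand() / RAND_MAX;
    for (int i = 0; i < NQUERY; i++)
        for (int j = 0; j < DIM; j++) query[i][j] = (float)rand() / RAND_MAX;

    static float best[NQUERY];
    static int   best_idx[NQUERY];
    for (int q = 0; q < NQUERY; q++) { best[q] = FLT_MAX; best_idx[q] = -1; }

    /* Divide: iterate over cache-sized blocks of the reference set in the
     * outer loop, so each block is reused across all queries while hot. */
    for (int b = 0; b < NREF; b += BLOCK) {
        int end = b + BLOCK < NREF ? b + BLOCK : NREF;
        for (int q = 0; q < NQUERY; q++) {
            /* Merge: fold this block's best match into the global best. */
            for (int r = b; r < end; r++) {
                float d = sqdist(query[q], ref[r]);
                if (d < best[q]) { best[q] = d; best_idx[q] = r; }
            }
        }
    }
    printf("query 0 matched reference %d (dist^2 = %f)\n", best_idx[0], best[0]);
    return 0;
}
```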


Image and Vision Computing New Zealand | 2013

Large-scale feature matching with distributed and heterogeneous computing

Steven Mills; David M. Eyers; Kai-Cheung Leung; Xiaoxin Tang; Zhiyi Huang

Feature matching is a fundamental problem in many computer vision tasks. As datasets become larger, and individual image resolution increases, this is becoming more and more computationally demanding work. While prior knowledge about the scene geometry can, in some cases, reduce the number of image pairs that need to be considered, the sheer volume of data means that parallel and distributed computing solutions must be considered. In this paper we examine the costs incurred in such solutions, and assess the way in which the problem scales with the number of cores within a single node, and the number of nodes in a distributed system. We also consider the role of heterogeneous systems, where nodes with different numbers and types of cores (including GPUs) are included in a distributed system. We show that distribution of this task across a cluster of machines has good (sometimes super-linear) scalability. However, scalability on many-core machines and GPU architectures is more limited, and is thus an important area for future research.
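As a small illustration of the single-node parallel layer of such a system, the sketch below places all image pairs in a shared queue and lets worker threads claim pairs dynamically, which tolerates the uneven cost of matching different pairs. The matching itself is a placeholder, and the multi-node and GPU layers discussed in the paper are not shown.

```c
/*
 * Illustrative single-node parallel layer for pairwise image matching: all
 * image pairs go into a shared work list and worker threads claim them
 * dynamically through an atomic cursor. The matching is a placeholder.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define NIMAGES  64
#define NWORKERS 8

struct pair { int a, b; };

static struct pair  pairs[NIMAGES * (NIMAGES - 1) / 2];
static int          npairs;
static atomic_int   next_pair;     /* shared cursor: dynamic load balancing */
static atomic_long  matches_done;

static void match_pair(struct pair p) {
    (void)p;  /* placeholder for real feature matching of images a and b */
    atomic_fetch_add(&matches_done, 1);
}

static void *worker(void *arg) {
    (void)arg;
    for (;;) {
        int i = atomic_fetch_add(&next_pair, 1);  /* claim the next unmatched pair */
        if (i >= npairs) break;
        match_pair(pairs[i]);
    }
    return NULL;
}

int main(void) {
    for (int a = 0; a < NIMAGES; a++)
        for (int b = a + 1; b < NIMAGES; b++)
            pairs[npairs++] = (struct pair){ a, b };

    pthread_t t[NWORKERS];
    for (int i = 0; i < NWORKERS; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < NWORKERS; i++) pthread_join(t[i], NULL);

    printf("matched %ld of %d image pairs\n", atomic_load(&matches_done), npairs);
    return 0;
}
```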


Parallel and Distributed Computing: Applications and Technologies | 2010

Maotai 3.0: Automatic Detection of View Access in VOPP

Kai-Cheung Leung; Zhiyi Huang

This paper proposes a scheme for automatic detection of view access in the View-Oriented Parallel Programming (VOPP) model. VOPP is a shared-memory-based, data-centric model that uses “views” to bundle mutual exclusion with data access. Based on the automatic detection scheme, a view is automatically acquired when first accessed and automatically released at the proper time. This scheme simplifies the VOPP model and prevents programming errors. With this scheme, the programmability of VOPP is similar to that of transactional memory models. In addition, VOPP can eliminate data races without compromising performance. A new VOPP implementation, Maotai 3.0, has been developed that incorporates the above features. Experimental results demonstrate that the performance of Maotai 3.0 surpasses that of transactional memory models such as TL-2.
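A minimal, Linux-flavoured sketch of the general mechanism such schemes are typically built on: a view's pages start out inaccessible, the first access raises SIGSEGV, and the handler acquires the view and unprotects the pages so the faulting access is retried transparently. This is not the Maotai 3.0 code, and mprotect() is not formally async-signal-safe, although the pattern is widely used in user-level DSM systems.

```c
/*
 * Illustrative detection of the first access to a view via page protection:
 * the view's pages start as PROT_NONE, the resulting SIGSEGV is caught, the
 * handler "acquires" the view and unprotects its pages, and the faulting
 * access is restarted and succeeds. Linux-flavoured sketch only.
 */
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static void  *view_pages;
static size_t view_size;
static volatile sig_atomic_t view_acquired;

static void on_fault(int sig, siginfo_t *info, void *ctx) {
    (void)sig; (void)ctx;
    char *addr = (char *)info->si_addr;
    if (addr >= (char *)view_pages && addr < (char *)view_pages + view_size) {
        /* First access detected: acquire the view (a real system would take
         * the view's lock here) and make its pages accessible. */
        view_acquired = 1;
        mprotect(view_pages, view_size, PROT_READ | PROT_WRITE);
        return;    /* the faulting instruction is restarted and now succeeds */
    }
    _exit(1);      /* a genuine segfault unrelated to any view */
}

int main(void) {
    view_size  = (size_t)sysconf(_SC_PAGESIZE);
    view_pages = mmap(NULL, view_size, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (view_pages == MAP_FAILED) { perror("mmap"); return 1; }

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = on_fault;
    sigaction(SIGSEGV, &sa, NULL);

    /* No explicit acquire: the first write below faults, the handler acquires
     * the view and unprotects the pages, and the write then completes. */
    strcpy((char *)view_pages, "view acquired automatically on first access");
    printf("%s (acquired=%d)\n", (char *)view_pages, (int)view_acquired);

    munmap(view_pages, view_size);
    return 0;
}
```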

Collaboration


Dive into Kai-Cheung Leung's collaborations.

Top Co-Authors

Xiaoxin Tang

Shanghai Jiao Tong University

Minyi Guo

Shanghai Jiao Tong University
