Wai-Shing Ho | Researchain

Publication

Featured research published by Wai-Shing Ho.

IEEE Transactions on Knowledge and Data Engineering | 2010

Clustering Uncertain Data Using Voronoi Diagrams and R-Tree Index

Ben Kao; Sau Dan Lee; Foris K. F. Lee; David W. Cheung; Wai-Shing Ho

We study the problem of clustering uncertain objects whose locations are described by probability density functions (pdfs). We show that the UK-means algorithm, which generalizes the k-means algorithm to handle uncertain objects, is very inefficient. The inefficiency comes from the fact that UK-means computes expected distances (EDs) between objects and cluster representatives. For arbitrary pdfs, expected distances are computed by numerical integrations, which are costly operations. We propose pruning techniques that are based on Voronoi diagrams to reduce the number of expected distance calculations. These techniques are analytically proven to be more effective than the basic bounding-box-based technique previously known in the literature. We then introduce an R-tree index to organize the uncertain objects so as to reduce pruning overheads. We conduct experiments to evaluate the effectiveness of our novel techniques. We show that our techniques are additive and, when used in combination, significantly outperform previously known methods.
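To make the cost of an expected distance concrete, here is a minimal sketch. It assumes each uncertain object's pdf has been discretized into weighted sample points, which is a simplification: the abstract only states that EDs are computed by numerical integration. The names `expected_distance` and `assign` are hypothetical and are not taken from the paper.

```python
import math

def expected_distance(samples, rep):
    """Approximate ED between an uncertain object and a representative.

    `samples` is a list of (point, probability) pairs standing in for the
    object's pdf (assumption: probabilities sum to 1).
    """
    return sum(p * math.dist(x, rep) for x, p in samples)

def assign(samples, reps):
    """Index of the representative with the smallest expected distance.

    Without pruning, this evaluates one ED per representative, which is
    the cost the pruning techniques aim to avoid.
    """
    return min(range(len(reps)), key=lambda j: expected_distance(samples, reps[j]))

# An object that is at (0, 0) or (2, 0) with equal probability.
obj = [((0.0, 0.0), 0.5), ((2.0, 0.0), 0.5)]
print(expected_distance(obj, (1.0, 0.0)))    # 1.0
print(assign(obj, [(1.0, 0.0), (5.0, 0.0)])) # 0
```

Note that the expected distance to (1, 0) is 1.0 even though the object's mean location coincides with that representative, so an ED cannot be replaced by a single distance from the mean. With many samples per object and many representatives per iteration, these sums dominate the running time.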

International Conference on Data Mining | 2008

Clustering Uncertain Data Using Voronoi Diagrams

Ben Kao; Sau Dan Lee; David W. Cheung; Wai-Shing Ho; K. F. Chan

We study the problem of clustering uncertain objects whose locations are described by probability density functions (pdfs). We show that the UK-means algorithm, which generalises the k-means algorithm to handle uncertain objects, is very inefficient. The inefficiency comes from the fact that UK-means computes expected distances (EDs) between objects and cluster representatives. For arbitrary pdfs, expected distances are computed by numerical integrations, which are costly operations. We propose pruning techniques that are based on Voronoi diagrams to reduce the number of expected distance calculations. These techniques are analytically proven to be more effective than the basic bounding-box-based technique previously known in the literature. We conduct experiments to evaluate the effectiveness of our pruning techniques and to show that our techniques significantly outperform previous methods.
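One way to picture Voronoi-based pruning is a bisector test: if an object's whole bounding box lies on one representative's side of the perpendicular bisector between two representatives, the other representative cannot have the smaller expected distance and needs no ED computation. The sketch below illustrates that idea only. It assumes each pdf is confined to an axis-aligned box, and the function names are hypothetical; it is not necessarily the exact procedure used in the paper.

```python
def box_closer_to(lo, hi, ci, cj):
    """True if every point of the box [lo, hi] is strictly closer to ci than to cj."""
    # x is closer to ci than to cj  <=>  w . x < b, where w = cj - ci and
    # b = (|cj|^2 - |ci|^2) / 2 describe the perpendicular bisector.
    w = [q - p for p, q in zip(ci, cj)]
    b = (sum(v * v for v in cj) - sum(v * v for v in ci)) / 2.0
    # The maximum of w . x over an axis-aligned box is reached at a corner,
    # which can be chosen independently in each dimension.
    worst = sum(max(wd * l, wd * h) for wd, l, h in zip(w, lo, hi))
    return worst < b

def surviving_candidates(lo, hi, reps):
    """Indices of representatives that the bisector test cannot rule out."""
    alive = set(range(len(reps)))
    for i, ci in enumerate(reps):
        for j, cj in enumerate(reps):
            if i != j and box_closer_to(lo, hi, ci, cj):
                alive.discard(j)  # cj can never beat ci for this object
    return sorted(alive)

reps = [(0.5, 0.5), (10.0, 10.0), (1.2, 0.5)]
print(surviving_candidates((0.0, 0.0), (1.0, 1.0), reps))  # [0, 2]
```

In this example the far representative at (10, 10) is eliminated without any integration, so expected distances are needed only for the two remaining candidates. If a single candidate survives, the object can be assigned with no ED computation at all.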

Conference on Information and Knowledge Management | 2009

Supporting ranking pattern-based aggregate queries in sequence data cubes

Chun Kit Chui; Eric Lo; Ben Kao; Wai-Shing Ho

Sequence data processing has been studied extensively in the literature. In recent years, the warehousing and online analytical processing (OLAP) of archived sequence data have received growing attention. In particular, the concept of sequence OLAP (S-OLAP) was recently proposed with the objective of evaluating so-called Pattern-Based Aggregate (PBA) queries so that various kinds of data analytical tasks on sequence data can be carried out efficiently. This paper studies the evaluation of ranking PBA queries, which rank the results of PBA queries and return only the top-ranked ones to users. We discuss how ranking PBA queries drastically improve the usability of S-OLAP systems and present techniques that can evaluate various kinds of ranking PBA queries efficiently.

Systems, Man, and Cybernetics | 2002

Automatic construction of online catalog topologies

Wing-Kin Sung; David Yang; Siu-Ming Yiu; David W. Cheung; Wai-Shing Ho; Tak Wah Lam

A good online catalog is crucial to the success of an e-commerce web site. Traditionally, an online catalog is mainly built by hand. To what extent this can be automated is a challenging problem. Recently, there have been investigations on how to reorganize an existing online catalog based on some criteria, but none of them has addressed the problem of organizing an online catalog automatically from scratch. This paper attempts to tackle this problem. We model an online catalog organization as a decision tree structure and propose a metric, based on the popularity of products and the relative importance of product attribute values, to evaluate the quality of a catalog organization. The problem is then formulated as a decision tree construction problem. Although traditional decision tree algorithms, such as C4.5, can be used to generate online catalog organization, the catalog constructed is generally not good based on our metric. An efficient greedy algorithm (GENCAT) is thus developed, and the experimental results show that GENCAT produces better catalog organizations based on our metric.

Database and Expert Systems Applications | 2004

Processing Ad-Hoc Joins on Mobile Devices

Eric Lo; Nikos Mamoulis; David W. Cheung; Wai-Shing Ho; Panos Kalnis

Mobile devices are capable of retrieving and processing data from remote databases. In a wireless data transmission environment, users are typically charged by the size of transferred data, rather than the amount of time they stay connected. We propose algorithms that join information from non-collaborative remote databases on mobile devices. Our methods minimize the data transferred during the join process while also taking the limitations of mobile devices into account. Experimental results show that our approach can perform join processing on mobile devices effectively.

Computational Intelligence and Data Mining | 2007

Adaptive Frequency Counting over Bursty Data Streams

Bill Lin; Wai-Shing Ho; Ben Kao; Chun Kit Chui

We investigate the problem of frequent itemset mining over a data stream with bursty traffic. In many modern applications, data arrives at a system as a continuous stream of transactions. In many cases, the arrival rate of transactions fluctuates wildly. Traditional stream mining algorithms, such as Lossy Counting (LC), were generally designed to handle data streams with steady data arrival rates. We show that LC suffers significant loss of accuracy when the data stream is bursty. We propose the Adaptive Frequency Counting algorithm (AFC) to handle bursty data. AFC has a feedback mechanism that dynamically adjusts the mining speed to cope with the changing data arrival rate. Through extensive experiments, we show that AFC outperforms LC under bursty traffic in terms of the accuracy of the set of frequent itemsets.
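For readers unfamiliar with the baseline, the following is a minimal sketch of Lossy Counting, simplified to single items rather than itemsets. It illustrates LC only and does not attempt to reproduce AFC or its feedback mechanism, which the abstract does not specify. The class name `LossyCounter` and its methods are hypothetical.

```python
import math

class LossyCounter:
    """Lossy Counting over single items with error bound `epsilon`."""

    def __init__(self, epsilon):
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)  # bucket width
        self.n = 0                           # items seen so far
        self.entries = {}                    # item -> [count, max undercount]

    def add(self, item):
        self.n += 1
        bucket = math.ceil(self.n / self.width)  # id of the current bucket
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            # The item may have been dropped earlier, at most bucket - 1 times.
            self.entries[item] = [1, bucket - 1]
        if self.n % self.width == 0:
            # Bucket boundary: discard entries that cannot be frequent.
            self.entries = {k: v for k, v in self.entries.items()
                            if v[0] + v[1] > bucket}

    def frequent(self, support):
        """Items whose stored count is at least (support - epsilon) * n."""
        threshold = (support - self.epsilon) * self.n
        return {k for k, (count, _) in self.entries.items() if count >= threshold}

counter = LossyCounter(epsilon=0.25)
for item in ["a", "b", "a", "c", "a", "d", "a", "e"]:
    counter.add(item)
print(counter.frequent(0.5))  # {'a'}
```

The sketch does a fixed amount of work per arriving item and prunes only at bucket boundaries. It has no mechanism for changing its processing rate when the arrival rate changes, which is the aspect AFC's feedback mechanism is described as adjusting.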

International Workshop on Advanced Issues of E-Commerce and Web-Based Information Systems (WECWIS) | 2000

Construction of online catalog topologies using decision trees

David Yang; Wing-Kin Sung; Siu-Ming Yiu; David W. Cheung; Wai-Shing Ho; Tak Wah Lam; Sau Dan Lee

Organization of a Web site is important to help users get the most out of the site. A good Web site should help visitors find the information they want easily. Visitors typically find information by searching for selected terms of interest or by following links from one Web page to another. The first approach is more useful if the visitor knows exactly what he is seeking, while the second approach is useful when the visitor has less of a preconceived notion about what he wants. The organization of a Web site is especially important in the latter case. Traditionally, Web site organization is done by hand. In this paper, we introduce the problem of automatic Web site construction and propose a solution for solving a major step of the problem based on decision tree algorithms. The solution is found to be useful in automatic construction of product catalogs.

Database Systems for Advanced Applications | 2004

SF-Tree: An efficient and flexible structure for estimating selectivity of simple path expressions with statistical accuracy guarantee

Wai-Shing Ho; Ben Kao; David W. Cheung; Yip Chi Lap; Eric Lo

Estimating the selectivity of a simple path expression (SPE) is essential for selecting the most efficient evaluation plans for XML queries. To estimate selectivity, we need an efficient and flexible structure to store a summary of the path expressions that are present in an XML document collection. In this paper, we propose a new structure called SF-Tree to address the selectivity estimation problem. SF-Tree provides a flexible way for users to trade off accuracy, space requirement, and selectivity retrieval speed. It makes use of signature files to store the SPEs in a tree form to increase the selectivity retrieval speed and the accuracy of the retrieved selectivity. Our analysis shows that the probability that a selectivity estimation error occurs decreases exponentially with respect to the error size.

International Conference on ICT in Teaching and Learning | 2013

The Design and Implementation of an Information System for Placement Programmes

Steven Kwan Keung Ng; Wai-Shing Ho; Fu Lee Wang; Kenneth Wong; Michael Cheung

The placement programme is now an integral part of higher education curriculums, as it offers invaluable chances to integrate practice and theory. However, traditional course management systems cannot handle all the information in a placement programme, such as the input from stakeholders like employers and placement counselors. We have recently designed and implemented an information system to support the information needs of our placement programme. The system provides a centralized platform for employers to post placement information, for students to view possible placement opportunities, and for placement counselors to provide placement guidance to the students. The system also provides a matching service to identify the best placement opportunities for each student and the best students for each placement job. This paper shares our experience in the design and implementation of our placement information system.

Electronic Commerce and Web Technologies | 2001

Automatic Construction of Online Catalog Topologies

Wing-Kin Sung; David Yang; Siu-Ming Yiu; Wai-Shing Ho; David W. Cheung; Tak Wah Lam

The organization of a web site is important to help users get the most out of the site. Designing such an organization, however, is a complicated problem. Traditionally, this design is mainly done by hand. To what extent this can be automated is a challenging problem. Recently, there have been investigations on how to reorganize an existing web site based on some criteria. But none of them has addressed the problem of organizing a web site automatically from scratch. In this paper, we attempt to tackle this problem by restricting the domain to online catalog organization. We model an online catalog organization as a decision tree structure and propose a metric, based on the popularity of products and the relative importance of product attribute values, to evaluate the quality of a catalog organization. The problem is then formulated as a decision tree construction problem. Although traditional decision tree algorithms, such as C4.5, can be used to generate online catalog organization, the catalog constructed is generally not good based on our metric. An efficient greedy algorithm (GENCAT) is thus developed and the experimental results show that GENCAT produces better catalog organizations based on our metric.
