Ben Kao | Researchain

Publication

Featured research published by Ben Kao.

database systems for advanced applications | 1997

A General Incremental Technique for Maintaining Discovered Association Rules

David W. Cheung; Sau Dan Lee; Ben Kao

A more general incremental updating technique is developed for maintaining the association rules discovered in a database in cases including insertion, deletion, and modification of transactions in the database. A previously proposed algorithm, FUP, can only handle the maintenance problem in the case of insertion. The proposed algorithm, FUP2, makes use of the previous mining result to cut down the cost of finding the new rules in an updated database. In the insertion-only case, FUP2 is equivalent to FUP. In the deletion-only case, FUP2 is a complementary algorithm of FUP which is very efficient when the deleted transactions are a small part of the database, which is the most applicable case. In the general case, FUP2 can efficiently update the discovered rules when new transactions are added to a transaction database and obsolete transactions are removed from it. The proposed algorithm has been implemented, and its performance is studied and compared with the best algorithms for mining association rules studied so far. The study shows that the new incremental algorithm is significantly faster than the traditional approach of mining the whole updated database.
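
The bookkeeping that makes this kind of incremental maintenance cheap can be illustrated with a minimal sketch. It is not the FUP2 algorithm itself, only the support-count identity that such an approach can exploit: when the old count of an itemset is known, its new count is the old count plus its count in the inserted transactions minus its count in the deleted ones, so only the changed portions need to be scanned. The function names and the toy data are hypothetical.

```python
def count_support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= set(t))


def updated_support(old_count, itemset, inserted, deleted):
    """New support count, scanning only the inserted and deleted transactions."""
    return (old_count
            + count_support(itemset, inserted)
            - count_support(itemset, deleted))


# Toy data (hypothetical, for illustration only).
original = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
deleted = [{"a", "c"}]           # transactions removed from `original`
inserted = [{"a", "b"}, {"c"}]   # transactions added afterwards

old = count_support({"a", "b"}, original)
new = updated_support(old, {"a", "b"}, inserted, deleted)

# Cross-check against a full rescan of the updated database.
updated_db = [t for t in original if t not in deleted] + inserted
rescanned = count_support({"a", "b"}, updated_db)
```

The identity only applies to itemsets whose old counts were kept; itemsets that were infrequent before the update still require a scan of the unchanged data, and limiting that work is the harder part of the maintenance problem.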

knowledge discovery and data mining | 2007

Mining frequent itemsets from uncertain data

Chun Kit Chui; Ben Kao; Edward Hung

We study the problem of mining frequent itemsets from uncertain data under a probabilistic framework. We consider transactions whose items are associated with existential probabilities and give a formal definition of frequent patterns under such an uncertain data model. We show that traditional algorithms for mining frequent itemsets are either inapplicable or computationally inefficient under such a model. A data trimming framework is proposed to improve mining efficiency. Through extensive experiments, we show that the data trimming technique can achieve significant savings in both CPU cost and I/O cost.
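
As an illustration of mining under existential probabilities, the sketch below computes an expected support count. The abstract does not spell out the definition, so this assumes one common formalization: items occur independently, the probability that a transaction contains an itemset is the product of its item probabilities, and the expected support is the sum of those products over all transactions. The function name and data are hypothetical.

```python
from math import prod


def expected_support(itemset, transactions):
    """Expected number of transactions containing `itemset`.

    Each transaction maps an item to its existential probability;
    a missing item has probability 0. Item independence is assumed.
    """
    return sum(prod(t.get(item, 0.0) for item in itemset)
               for t in transactions)


# Toy uncertain database (hypothetical).
transactions = [
    {"a": 0.9, "b": 0.5},
    {"a": 0.4, "c": 1.0},
    {"a": 1.0, "b": 0.8, "c": 0.2},
]

es_ab = expected_support({"a", "b"}, transactions)  # 0.45 + 0.0 + 0.8
```

Under this assumption, an itemset would be reported as frequent when its expected support reaches a user-chosen threshold.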

IEEE Transactions on Parallel and Distributed Systems | 1997

Deadline assignment in a distributed soft real-time system

Ben Kao; Hector Garcia-Molina

In a distributed environment, tasks often have processing demands at multiple different sites. A distributed task is usually divided into several subtasks, each to be executed in order at some site. In a real-time system, an overall deadline is usually specified by an application designer indicating when a distributed task is to be finished. In this paper, we present and analyze techniques for automatically translating the overall deadline into deadlines for the individual subtasks.
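
To make the idea of translating an overall deadline into subtask deadlines concrete, here is a minimal sketch of one plausible strategy. It is an assumption for illustration and is not claimed to be one of the techniques analyzed in the paper: the slack (the time left after the estimated execution times are subtracted) is divided among the serial subtasks in proportion to their estimated execution times. All names and numbers are hypothetical.

```python
def assign_subtask_deadlines(release_time, overall_deadline, exec_estimates):
    """Split an end-to-end deadline over serial subtasks.

    Each subtask receives its estimated execution time plus a share of
    the total slack proportional to that estimate.
    """
    total_exec = sum(exec_estimates)
    if total_exec <= 0:
        raise ValueError("execution estimates must sum to a positive value")
    slack = overall_deadline - release_time - total_exec
    deadlines = []
    t = release_time
    for estimate in exec_estimates:
        t += estimate + slack * (estimate / total_exec)
        deadlines.append(t)
    return deadlines


# A task released at time 0 with overall deadline 20 and three subtasks.
deadlines = assign_subtask_deadlines(0.0, 20.0, [2.0, 3.0, 5.0])
```

With this rule the last subtask's deadline coincides with the overall deadline, and longer subtasks receive proportionally more slack.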

knowledge discovery and data mining | 2006

Uncertain data mining: an example in clustering location data

Michael Chau; Reynold Cheng; Ben Kao; Jackey Ng

Data uncertainty is an inherent property in various applications due to reasons such as outdated sources or imprecise measurement. When data mining techniques are applied to these data, their uncertainty has to be considered to obtain high quality results. We present UK-means clustering, an algorithm that enhances the K-means algorithm to handle data uncertainty. We apply UK-means to the particular pattern of moving-object uncertainty. Experimental results show that by considering uncertainty, a clustering algorithm can produce more accurate results.

international conference on data mining | 2009

Naive Bayes Classification of Uncertain Data

Jiangtao Ren; Sau Dan Lee; Xianlu Chen; Ben Kao; Reynold Cheng; David W. Cheung

Traditional machine learning algorithms assume that data are exact or precise. However, this assumption may not hold in some situations because of data uncertainty arising from sources such as measurement errors, data staleness, and repeated measurements. With uncertainty, the value of each data item is represented by a probability distribution function (pdf). In this paper, we propose a novel naive Bayes classification algorithm for uncertain data with a pdf. Our key solution is to extend the class conditional probability estimation in the Bayes model to handle pdfs. Extensive experiments on UCI datasets show that the accuracy of the naive Bayes model can be improved by taking into account the uncertainty information.

knowledge discovery and data mining | 2005

Online algorithms for mining inter-stream associations from large sensor networks

K. K. Loo; Ivy Tong; Ben Kao

We study the problem of mining frequent value sets from a large sensor network. We discuss how sensor stream data could be represented to facilitate efficient online mining and propose the interval-list representation. Based on Lossy Counting, we propose ILB, an interval-list-based online mining algorithm for discovering frequent sensor value sets. Through extensive experiments, we compare the performance of ILB against an application of Lossy Counting (LC) using a weighted transformation method. Results show that ILB outperforms LC significantly for large sensor networks.
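
For readers unfamiliar with the baseline, the following is a minimal sketch of plain Lossy Counting over a stream of single items. It does not implement ILB or the interval-list representation; it only shows the bucket-based counting and pruning that the approach builds on. The function names and the toy stream are hypothetical.

```python
import math


def lossy_counting(stream, epsilon):
    """Approximate item counts with undercount at most epsilon * n."""
    width = math.ceil(1 / epsilon)   # bucket width
    counts = {}                      # item -> [count, max possible undercount]
    n = 0
    for item in stream:
        n += 1
        bucket = math.ceil(n / width)
        if item in counts:
            counts[item][0] += 1
        else:
            counts[item] = [1, bucket - 1]
        if n % width == 0:           # bucket boundary: prune rare items
            stale = [k for k, (f, d) in counts.items() if f + d <= bucket]
            for key in stale:
                del counts[key]
    return counts, n


def frequent_items(counts, n, support, epsilon):
    """Items whose recorded count is at least (support - epsilon) * n."""
    return {k for k, (f, _) in counts.items() if f >= (support - epsilon) * n}


# Toy stream (hypothetical): two heavy items followed by 20 singletons.
stream = ["x"] * 50 + ["y"] * 30 + [f"z{i}" for i in range(20)]
counts, n = lossy_counting(stream, epsilon=0.1)
result = frequent_items(counts, n, support=0.2, epsilon=0.1)
```

The singletons are pruned at bucket boundaries, so memory stays bounded while the two heavy items survive with their counts intact.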

Data Mining and Knowledge Discovery | 1998

Is Sampling Useful in Data Mining? A Case in the Maintenance of Discovered Association Rules

Sau Dan Lee; David W. Cheung; Ben Kao

By nature, sampling is an appealing technique for data mining, because approximate solutions may in most cases already satisfy the needs of the users. We attempt to use sampling techniques to address the problem of maintaining discovered association rules. Some studies have been done on the problem of maintaining the discovered association rules when updates are made to the database. All proposed methods must examine not only the changed part but also the unchanged part in the original database, which is very large, and hence take much time. Worse yet, if updates are performed frequently on the database but the underlying rule set has not changed much, then the effort could be mostly wasted. In this paper, we devise an algorithm which employs sampling techniques to estimate the difference between the association rules in a database before and after the database is updated. The estimated difference can be used to determine whether we should update the mined association rules or not. If the estimated difference is small, then the rules in the original database are still a good approximation to those in the updated database. Hence, we do not have to spend the resources to update the rules. We can accumulate more updates before actually updating the rules, thereby avoiding the overheads of updating the rules too frequently. Experimental results show that our algorithm is very efficient and highly accurate.

international conference on data mining | 2007

Reducing UK-Means to K-Means

Sau Dan Lee; Ben Kao; Reynold Cheng

This paper proposes an optimisation to the UK-means algorithm, which generalises the k-means algorithm to handle objects whose locations are uncertain. The location of each object is described by a probability density function (pdf). The UK-means algorithm needs to compute expected distances (EDs) between each object and the cluster representatives. The evaluation of ED from first principles is a very costly operation, because the pdfs are different and arbitrary. But UK-means needs to evaluate a lot of EDs. This is a major performance burden of the algorithm. In this paper, we derive a formula for evaluating EDs efficiently. This tremendously reduces the execution time of UK-means, as demonstrated by our preliminary experiments. We also illustrate that this optimised formula effectively reduces the UK-means problem to the traditional clustering algorithm addressed by the k-means algorithm.
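
The abstract does not state the formula, but a standard identity shows how such a reduction can work when the distance is the squared Euclidean distance (an assumption made here): the expected squared distance from an uncertain object to a point c equals the squared distance from the object's centroid to c plus the object's total variance, which does not depend on c. The sketch below checks the identity on a discrete pdf; the function names and data are hypothetical.

```python
def expected_sq_distance_direct(points, weights, c):
    """E[||X - c||^2] computed point by point over a discrete pdf."""
    return sum(w * sum((x - ci) ** 2 for x, ci in zip(p, c))
               for p, w in zip(points, weights))


def centroid_and_variance(points, weights):
    """Centroid and total variance of the pdf; computed once per object."""
    dim = len(points[0])
    mean = [sum(w * p[i] for p, w in zip(points, weights)) for i in range(dim)]
    var = sum(w * sum((p[i] - mean[i]) ** 2 for i in range(dim))
              for p, w in zip(points, weights))
    return mean, var


def expected_sq_distance_fast(mean, var, c):
    """Same quantity from the precomputed centroid and variance."""
    return sum((m - ci) ** 2 for m, ci in zip(mean, c)) + var


# Toy uncertain object (hypothetical): three possible locations.
points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
weights = [0.5, 0.25, 0.25]
c = (1.0, 1.0)

direct = expected_sq_distance_direct(points, weights, c)
mean, var = centroid_and_variance(points, weights)
fast = expected_sq_distance_fast(mean, var, c)
```

Because the variance term is the same for every candidate cluster representative, comparing expected distances under this assumption amounts to comparing ordinary distances from the centroid, which is what k-means does.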

Knowledge Based Systems | 1998

Discovering user access patterns on the World Wide Web

David W. Cheung; Ben Kao; Joseph Hun Wei Lee

The World Wide Web provides its users with almost unlimited access to documents on the Internet. The use of intelligent agents is suggested to help users locate documents related to their interests instead of browsing the Web via primitive search engines. A number of key components in such intelligent systems are identified and a system architecture is proposed. In particular, a learning agent is designed along with the underlying algorithms for the discovery of areas of interest from user access logs. The discovered topics can be used to improve the efficiency of information retrieval by prefetching documents for the users and storing them in a document database in the system. A prototype system has also been implemented to illustrate the various concepts. Experiments are performed which show that the areas of interest discovered can in fact be used to improve the efficiency of information retrieval on a distributed information system such as the Internet.

extending database technology | 1996

Database Support for Efficiently Maintaining Derived Data

Brad Adelberg; Ben Kao; Hector Garcia-Molina

Derived data is maintained in a database system to correlate and summarize base data which record real world facts. As base data changes, derived data needs to be recomputed. A high performance system should execute all these updates and recomputations in a timely fashion so that the data remains fresh and useful, while at the same time executing user transactions quickly. This paper studies the intricate balance between recomputing derived data and transaction execution. Our focus is on efficient recomputation strategies — how and when recomputations should be done to reduce their cost without jeopardizing data timeliness. We propose the Forced Delay recomputation algorithm and show how it can exploit update locality to improve both data freshness and transaction response time.
