Wai-Ho Au

Hong Kong Polytechnic University

Publication

Featured research published by Wai-Ho Au.

IEEE Transactions on Evolutionary Computation | 2003

A novel evolutionary data mining algorithm with applications to churn prediction

Wai-Ho Au; Keith C. C. Chan; Xin Yao

Classification is an important topic in data mining research. Given a set of data records, each of which belongs to one of a number of predefined classes, the classification problem is concerned with the discovery of classification rules that can allow records with unknown class membership to be correctly classified. Many algorithms have been developed to mine large data sets for classification models and they have been shown to be very effective. However, many of them are not designed to determine the likelihood of each classification made. For this reason, they are not readily applicable to such problems as churn prediction. For such an application, the goal is not only to predict whether or not a subscriber would switch from one carrier to another; it is also important to predict the likelihood of the subscriber doing so. The reason for this is that a carrier can then choose to provide special personalized offers and services to those subscribers who are predicted to be more likely to churn. Given its importance, we propose a new data mining algorithm, called data mining by evolutionary learning (DMEL), to handle classification problems in which the accuracy of each prediction made has to be estimated.
In performing its tasks, DMEL searches through the possible rule space using an evolutionary approach that has the following characteristics: 1) the evolutionary process begins with the generation of an initial set of first-order rules (i.e., rules with one conjunct/condition) using a probabilistic induction technique and, based on these rules, rules of higher order (two or more conjuncts) are obtained iteratively; 2) when identifying interesting rules, an objective interestingness measure is used; 3) the fitness of a chromosome is defined in terms of the probability that the attribute values of a record can be correctly determined using the rules it encodes; and 4) the likelihood of each prediction (or classification) made is estimated so that subscribers can be ranked according to their likelihood to churn. Experiments with different data sets showed that DMEL is able to effectively discover interesting classification rules. In particular, it is able to predict churn accurately under different churn rates when applied to real telecom subscriber data.

Conference on Information and Knowledge Management | 1997

Mining fuzzy association rules

Keith C. C. Chan; Wai-Ho Au

In this paper, we introduce a novel technique, called F-APACS, for mining fuzzy association rules. Existing algorithms involve discretizing the domains of quantitative attributes into intervals so as to discover quantitative association rules. These intervals may not be concise and meaningful enough for human experts to easily obtain nontrivial knowledge from those rules discovered. Instead of using intervals, F-APACS employs linguistic terms to represent the revealed regularities and exceptions. The linguistic representation is especially useful when those rules discovered are presented to human experts for examination. The definition of linguistic terms is based on fuzzy set theory and hence we call the rules having these terms fuzzy association rules. The use of fuzzy techniques makes F-APACS resilient to noises such as inaccuracies in physical measurements of real-life entities and missing values in the databases. Furthermore, F-APACS employs adjusted difference analysis which has the advantage that it does not require any user-supplied thresholds which are often hard to determine. The fact that F-APACS is able to mine fuzzy association rules which utilize linguistic representation and that it uses an objective yet meaningful confidence measure to determine the interestingness of a rule makes it very effective at the discovery of rules from a real-life transactional database of a PBX system provided by a telecommunication corporation.

IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2005

Attribute Clustering for Grouping, Selection, and Classification of Gene Expression Data

Wai-Ho Au; Keith C. C. Chan; Andrew K. C. Wong; Yang Wang

This paper presents an attribute clustering method which is able to group genes based on their interdependence so as to mine meaningful patterns from the gene expression data. It can be used for gene grouping, selection, and classification. The partitioning of a relational table into attribute subgroups allows a small number of attributes within or across the groups to be selected for analysis. By clustering attributes, the search dimension of a data mining algorithm is reduced. The reduction of search dimension is especially important to data mining in gene expression data because such data typically consist of a huge number of genes (attributes) and a small number of gene expression profiles (tuples). Most data mining algorithms are typically developed and optimized to scale to the number of tuples instead of the number of attributes. The situation becomes even worse when the number of attributes overwhelms the number of tuples, in which case the likelihood of reporting patterns that are actually irrelevant due to chance becomes rather high. It is for the aforementioned reasons that gene grouping and selection are important preprocessing steps for many data mining algorithms to be effective when applied to gene expression data. This paper defines the problem of attribute clustering and introduces a methodology for solving it. Our proposed method groups interdependent attributes into clusters by optimizing a criterion function derived from an information measure that reflects the interdependence between attributes. By applying our algorithm to gene expression data, meaningful clusters of genes are discovered. The grouping of genes based on attribute interdependence within each group helps to capture different aspects of gene association patterns in each group. Significant genes selected from each group then contain useful information for gene expression classification and identification.
To evaluate the performance of the proposed approach, we applied it to two well-known gene expression data sets and compared our results with those obtained by other methods. Our experiments show that the proposed method is able to find meaningful clusters of genes. By selecting a subset of genes which have high multiple-interdependence with others within clusters, significant classification information can be obtained. Thus, a small pool of selected genes can be used to build classifiers with a very high classification rate. From the pool, gene expressions of different categories can be identified.
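
The abstract above does not spell out its information measure, so the following is only a minimal sketch of one plausible choice: mutual information between two discretized attributes, normalized by their joint entropy. The function names and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def interdependence(xs, ys):
    """Mutual information of two attributes divided by their joint entropy.

    This normalization is an assumption made for illustration. The score is
    0 for independent attributes and 1 when each attribute determines the other.
    """
    joint = entropy(list(zip(xs, ys)))
    if joint == 0:
        return 0.0  # both attributes are constant, so nothing to measure
    mutual_information = entropy(xs) + entropy(ys) - joint
    return mutual_information / joint


# Hypothetical discretized expression levels of three genes over four samples.
gene_a = [0, 0, 1, 1]
gene_b = [0, 0, 1, 1]  # moves together with gene_a
gene_c = [0, 1, 0, 1]  # unrelated to gene_a

score_ab = interdependence(gene_a, gene_b)
score_ac = interdependence(gene_a, gene_c)
```

A clustering step could then place attributes with high pairwise scores in the same group; how the paper actually optimizes its criterion function is not shown here.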

IEEE International Conference on Fuzzy Systems | 1999

FARM: a data mining system for discovering fuzzy association rules

Wai-Ho Au; Keith C. C. Chan

In this paper, we introduce a novel technique, called FARM, for mining fuzzy association rules. FARM employs linguistic terms to represent the revealed regularities and exceptions. The linguistic representation is especially useful when those rules discovered are presented to human experts for examination because of its affinity with human knowledge representations. The definition of linguistic terms is based on fuzzy set theory and hence we call the rules having these terms fuzzy association rules. The use of fuzzy techniques makes FARM resilient to noise such as inaccuracies in physical measurements of real-life entities and missing values in the databases. Furthermore, FARM utilizes adjusted difference analysis which has the advantage that it does not require any user-supplied thresholds which are often hard to determine. In addition to this interestingness measure, FARM has another unique feature: the conclusions of a fuzzy association rule can contain linguistic terms. Our technique also provides a mechanism to allow quantitative values to be inferred from fuzzy association rules. Unlike other data mining techniques that can only discover association rules between different discretized values, FARM is able to reveal interesting relationships between different quantitative values. Our experimental results showed that FARM is capable of discovering meaningful and useful fuzzy association rules in an effective manner from a real-life database.

IEEE Transactions on Fuzzy Systems | 2003

Mining fuzzy association rules in a bank-account database

Wai-Ho Au; Keith C. C. Chan

This paper describes how we applied a fuzzy technique to a data-mining task involving a large database that was provided by an international bank with offices in Hong Kong. The database contains the demographic data of over 320,000 customers and their banking transactions, which were collected over a six-month period. By mining the database, the bank would like to be able to discover interesting patterns in the data. The bank expected that the hidden patterns would reveal different characteristics about different customers so that they could better serve and retain them. To help the bank achieve its goal, we developed a fuzzy technique, called fuzzy association rule mining II (FARM II). FARM II is able to handle both relational and transactional data. It can also handle fuzzy data. The former type of data allows FARM II to discover multidimensional association rules, whereas the latter data allows some of the patterns to be more easily revealed and expressed. To effectively uncover the hidden associations in the bank-account database, FARM II performs several steps which are described in detail in this paper. With FARM II, the bank identified some interesting characteristics about the customers who had once used the bank's loan services but later decided to cease using them. The bank translated what they discovered into actionable items by offering some incentives to retain their existing customers.

Fuzzy Sets and Systems | 2005

Mining changes in association rules: a fuzzy approach

Wai-Ho Au; Keith C. C. Chan

Association rule mining is concerned with the discovery of interesting association relationships hidden in databases. Existing algorithms typically assume that data characteristics are stable over time. Their main focus is therefore to mine association rules in an efficient manner. However, the world constantly changes. This makes the characteristics of real-life entities represented by the data and hence the associations hidden in the data change over time. Detecting and adapting to the changes are usually critical to the success of many business organizations. This paper presents the problem of mining changes in association rules. Given a set of database partitions, each of which contains a set of transactions collected in a specific time period, a set of association rules is discovered in each database partition. We propose to perform data mining in the discovered association rules so as to reveal the regularities governing how the rules change in different time periods. Since the nature of many real-life entities is rather fuzzy, we propose to use linguistic variables and linguistic terms to represent the changes in the discovered association rules. In particular, fuzzy decision trees are built to discover the changes in the discovered association rules. The fuzzy decision trees are then converted to a set of fuzzy rules, called fuzzy meta-rules because they are rules about rules. By doing so, the changes hidden in the data can be revealed and presented to human users in a comprehensible form. Furthermore, the discovered changes can also be used to predict any change in the future. To evaluate the performance of our approach, we make use of a set of synthetic datasets, which are database partitions collected in different time periods. A set of association rules is discovered in each dataset. Fuzzy decision trees are constructed in the discovered association rules in order to reveal the changes in these rules. 
The experimental results show that our approach is very promising.

International Conference on Data Mining | 2001

Classification with degree of membership: a fuzzy approach

Wai-Ho Au; Keith C. C. Chan

Classification is an important topic in data mining research. It is concerned with the prediction of the values of some attribute in a database based on other attributes. To tackle this problem, most of the existing data mining algorithms adopt either a decision tree based approach or an approach that requires users to provide some user-specified thresholds to guide the search for interesting rules. The authors propose a new approach based on the use of an objective interestingness measure to distinguish interesting rules from uninteresting ones. Using linguistic terms to represent the revealed regularities and exceptions, this approach is especially useful when the discovered rules are presented to human experts for examination because of the affinity with the human knowledge representation. The use of a fuzzy technique allows the prediction of attribute values to be associated with degree of membership. Our approach is therefore able to deal with the cases where an object can belong to more than one class. Furthermore, our approach is more resilient to noise and missing data values because of the use of a fuzzy technique. To evaluate the performance of our approach, we tested it using several real-life databases. The experimental results show that it can be very effective at data mining tasks. When compared to popular data mining algorithms, the approach is better able to uncover useful rules hidden in databases.

ACM Symposium on Applied Computing | 1997

An effective algorithm for mining interesting quantitative association rules

Keith C. C. Chan; Wai-Ho Au

In this paper, we describe a novel technique, called APACS2, for mining interesting quantitative association rules from very large databases. To effectively mine such rules, APACS2 employs adjusted difference analysis. The use of this technique has the advantage that it does not require any user-supplied thresholds which are often hard to determine. Furthermore, APACS2 also has the advantage that it allows users to discover both positive and negative association rules. A positive association rule tells us that a record having a certain attribute value will also have another attribute value whereas a negative association rule tells us that a record having a certain attribute value will not have another attribute value. The fact that APACS2 is able to mine both positive and negative association rules and that it uses an objective yet meaningful measure to determine the interestingness of a rule makes it very effective at different data mining tasks.
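
The abstract above gives no formula for adjusted difference analysis. As a rough sketch, assuming it corresponds to the standard adjusted residual of a contingency table and that the 1.96 cutoff (the 95% point of the standard normal distribution) separates interesting from uninteresting associations, the test could look like this. The function names and the counts are hypothetical.

```python
import math


def adjusted_residuals(table):
    """Adjusted residual for every cell of a contingency table of counts.

    Assumes every row and column total is positive and that no single row
    or column holds all the records (otherwise a division by zero occurs).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    result = []
    for i, row in enumerate(table):
        out_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            # Standardized residual of the cell.
            z = (observed - expected) / math.sqrt(expected)
            # Estimated variance of the standardized residual.
            v = (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            out_row.append(z / math.sqrt(v))
        result.append(out_row)
    return result


def label(d, critical=1.96):
    """Classify an association by the sign and size of its adjusted residual."""
    if d > critical:
        return "positive"
    if d < -critical:
        return "negative"
    return "not interesting"


# Hypothetical counts: rows are values of one attribute, columns of another.
counts = [[30, 10],
          [10, 30]]
residuals = adjusted_residuals(counts)
labels = [[label(d) for d in row] for row in residuals]
```

Because the cutoff comes from a statistical distribution rather than from the user, a test of this kind needs no user-supplied support or confidence thresholds, which matches the advantage the abstract claims.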

IEEE Transactions on Knowledge and Data Engineering | 2006

A fuzzy approach to partitioning continuous attributes for classification

Wai-Ho Au; Keith C. C. Chan; Andrew K. C. Wong

Classification is an important topic in data mining research. To better handle continuous data, fuzzy sets are used to represent interval events in the domains of continuous attributes, allowing continuous data lying on the interval boundaries to partially belong to multiple intervals. Since the membership functions of fuzzy sets can profoundly affect the performance of the models or rules discovered, the determination of membership functions or fuzzy partitioning is crucial. In this paper, we present a new method to determine the membership functions of fuzzy sets directly from data to maximize the class-attribute interdependence and, hence, improve the classification results. In other words, it forms a fuzzy partition of the input space automatically, using an information-theoretic measure to evaluate the interdependence between the class membership and an attribute as the objective function for fuzzy partitioning. To find the optimum of the measure, it employs fractional programming. To evaluate the effectiveness of the proposed method, several real-world data sets are used in our experiments. The experimental results show that this method outperforms other well-known discretization and fuzzy partitioning approaches.
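
To illustrate how a value near an interval boundary can partially belong to more than one interval, here is a minimal sketch using fixed triangular membership functions. The shapes, set names, and breakpoints are assumptions chosen for illustration; the paper derives its membership functions from data, which this sketch does not attempt.

```python
def triangular(x, left, peak, right):
    """Degree of membership of x in a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical fuzzy partition of an attribute ranging over [0, 100].
FUZZY_SETS = {
    "low": (-50, 0, 50),
    "medium": (0, 50, 100),
    "high": (50, 100, 150),
}


def memberships(x):
    """Membership degree of x in each fuzzy set of the partition."""
    return {name: triangular(x, *params) for name, params in FUZZY_SETS.items()}


boundary = memberships(25)  # halfway between the "low" and "medium" peaks
center = memberships(50)    # exactly at the "medium" peak
```

A crisp discretization with a cut point at 25 would assign that value to a single interval, whereas here it belongs to "low" and "medium" with equal degree.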

IEEE International Conference on Fuzzy Systems | 2004

Mining fuzzy rules for time series classification

Wai-Ho Au; Keith C. C. Chan

Time series classification is concerned with discovering classification models in a database of pre-classified time series and using them to classify unseen time series. To better handle the noise and fuzziness in time series data, we propose a new data mining technique to mine fuzzy rules in the data. The fuzzy rules discovered employ fuzzy sets to represent the revealed regularities and exceptions. The resilience of fuzzy sets to noise allows the proposed approach to better handle the noise embedded in the data. Furthermore, it uses the adjusted residual as an objective measure to evaluate the interestingness of association relationships hidden in the data. The adjusted residual analysis allows the differentiation of interesting relationships from uninteresting ones without any user-specified thresholds. To evaluate the performance of the proposed approach, we applied it to several well-known time series datasets. The experimental results showed that our approach is very promising.
