Keith C. C. Chan
Hong Kong Polytechnic University
Publication
Featured research published by Keith C. C. Chan.
Annals of Internal Medicine | 2011
Edward J Mills; Celestin Bakanda; Josephine Birungi; Keith C. C. Chan; Nathan Ford; Curtis Cooper; Jean B. Nachega; Mark Dybul; Robert S. Hogg
BACKGROUND Little is known about the effect of combination antiretroviral therapy (cART) on life expectancy in sub-Saharan Africa. OBJECTIVE To estimate life expectancy of patients once they initiate cART in Uganda. DESIGN Prospective cohort study. SETTING Public sector HIV and AIDS disease-management program in Uganda. PATIENTS 22 315 eligible patients initiated cART during the study period, of whom 1943 were considered to have died. MEASUREMENTS All-cause mortality rates were calculated and abridged life tables were constructed and stratified by sex and baseline CD4 cell count status to estimate life expectancies for patients receiving cART. The average number of years remaining to be lived by patients who received cART at varying age categories was estimated. RESULTS After adjustment for loss to follow-up, crude mortality rates (deaths per 1000 person-years) ranged from 26.9 (95% CI, 25.4 to 28.5) in women to 43.9 (CI, 40.7 to 47.0) in men. For patients with a baseline CD4 cell count less than 0.050 × 10^9 cells/L, the mortality rate was 67.3 (CI, 62.1 to 72.9) deaths per 1000 person-years, whereas among persons with a baseline CD4 cell count of 0.250 × 10^9 cells/L or more, the mortality rate was 19.1 (CI, 16.0 to 22.7) deaths per 1000 person-years. Life expectancy at age 20 years for the overall cohort was 26.7 (CI, 25.0 to 28.4) additional years and at age 35 years was 27.9 (CI, 26.7 to 29.1) additional years. Life expectancy increased substantially with increasing baseline CD4 cell count. Similar trends are observed for older age groups. LIMITATIONS A small (6.4%) proportion of patients were lost to follow-up, and it was imputed that 30% of these patients had died. Few patients with a CD4 cell count greater than 0.250 × 10^9 cells/L initiated cART. CONCLUSION Ugandan patients receiving cART can expect an almost normal life expectancy, although there is considerable variability among subgroups of patients.
PRIMARY FUNDING SOURCE Canadian Institutes of Health Research.
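As a rough illustration of the abridged life table calculation mentioned in the abstract above, the following Python sketch turns age-specific mortality rates into remaining life expectancy. It is not the authors' procedure: the function name, the assumption that deaths fall on average at the midpoint of each age interval, and the example rates are all hypothetical, and rates are expressed per person-year rather than per 1000 person-years.

```python
def abridged_life_table(intervals, radix=100_000.0):
    """Build a simple abridged life table.

    intervals: list of (start_age, width_in_years, mortality_rate) tuples,
    ordered by age. The last entry uses width None for the open-ended
    interval. Rates are deaths per person-year and must be positive.
    Assumption: deaths occur on average halfway through each closed interval.
    """
    rows = []
    alive = radix
    for start_age, width, rate in intervals:
        if width is None:
            # Open-ended final interval: everyone dies, person-years = l / m.
            prob_death = 1.0
            person_years = alive / rate
        else:
            # Convert the interval mortality rate into a probability of dying.
            prob_death = width * rate / (1.0 + 0.5 * width * rate)
            deaths = alive * prob_death
            person_years = width * (alive - deaths) + 0.5 * width * deaths
        rows.append({"age": start_age, "l": alive, "q": prob_death, "L": person_years})
        alive *= 1.0 - prob_death

    # Accumulate person-years from the oldest interval backwards to get
    # life expectancy e = T / l at the start of each interval.
    remaining = 0.0
    for row in reversed(rows):
        remaining += row["L"]
        row["e"] = remaining / row["l"]
    return rows


# Hypothetical rates chosen only to exercise the function (not study data).
example = [(20, 5, 0.030), (25, 5, 0.035), (30, 5, 0.040), (35, None, 0.045)]
for row in abridged_life_table(example):
    print(row["age"], round(row["e"], 1))
```

Stratified estimates, such as those by sex or baseline CD4 count, would come from running the same calculation on each subgroup's own rates.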
IEEE Transactions on Evolutionary Computation | 2003
Wai-Ho Au; Keith C. C. Chan; Xin Yao
Classification is an important topic in data mining research. Given a set of data records, each of which belongs to one of a number of predefined classes, the classification problem is concerned with the discovery of classification rules that can allow records with unknown class membership to be correctly classified. Many algorithms have been developed to mine large data sets for classification models and they have been shown to be very effective. However, when it comes to determining the likelihood of each classification made, many of them are not designed with such a purpose in mind. For this reason, they are not readily applicable to such problems as churn prediction. For such an application, the goal is not only to predict whether or not a subscriber would switch from one carrier to another; it is also important that the likelihood of the subscriber doing so be predicted. The reason for this is that a carrier can then choose to provide a special personalized offer and services to those subscribers who are predicted with higher likelihood to churn. Given its importance, we propose a new data mining algorithm, called data mining by evolutionary learning (DMEL), to handle classification problems in which the accuracy of each prediction made has to be estimated.
In performing its tasks, DMEL searches through the possible rule space using an evolutionary approach that has the following characteristics: 1) the evolutionary process begins with the generation of an initial set of first-order rules (i.e., rules with one conjunct/condition) using a probabilistic induction technique and based on these rules, rules of higher order (two or more conjuncts) are obtained iteratively; 2) when identifying interesting rules, an objective interestingness measure is used; 3) the fitness of a chromosome is defined in terms of the probability that the attribute values of a record can be correctly determined using the rules it encodes; and 4) the likelihood of each prediction (or classification) made is estimated so that subscribers can be ranked according to their likelihood to churn. Experiments with different data sets showed that DMEL is able to effectively discover interesting classification rules. In particular, it is able to predict churn accurately under different churn rates when applied to real telecom subscriber data.
IEEE Transactions on Pattern Analysis and Machine Intelligence | 1995
John Y. Ching; Andrew K. C. Wong; Keith C. C. Chan
Inductive learning systems can be effectively used to acquire classification knowledge from examples. Many existing symbolic learning algorithms can be applied in domains with continuous attributes when integrated with a discretization algorithm to transform the continuous attributes into ordered discrete ones. In this paper, a new information theoretic discretization method optimized for supervised learning is proposed and described. This approach seeks to maximize the mutual dependence as measured by the interdependence redundancy between the discrete intervals and the class labels, and can automatically determine the most preferred number of intervals for an inductive learning application. The method has been tested in a number of inductive learning examples to show that the class-dependent discretizer can significantly improve the classification performance of many existing learning algorithms in domains containing numeric attributes.
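To make the criterion in the abstract above concrete, here is a minimal Python sketch that scores a candidate discretization by an interdependence redundancy measure, taken here to be the mutual information between interval and class label divided by their joint entropy. The function names and the exhaustive single-cut search are illustrative assumptions; the published method searches over multi-interval partitions and is considerably more elaborate.

```python
from bisect import bisect_right
from collections import Counter
from math import log2


def interdependence_redundancy(values, labels, cuts):
    """Score a discretization: I(interval; class) / H(interval, class).

    values: numeric attribute values; labels: class labels (same length);
    cuts: sorted list of cut points defining the intervals.
    """
    n = len(values)
    bins = [bisect_right(cuts, v) for v in values]  # interval index per value
    joint = Counter(zip(bins, labels))
    bin_counts = Counter(bins)
    class_counts = Counter(labels)

    mutual_information = 0.0
    joint_entropy = 0.0
    for (b, c), count in joint.items():
        p = count / n
        # p(b, c) / (p(b) * p(c)) written in terms of raw counts
        mutual_information += p * log2(count * n / (bin_counts[b] * class_counts[c]))
        joint_entropy -= p * log2(p)
    return mutual_information / joint_entropy if joint_entropy > 0 else 0.0


def best_single_cut(values, labels):
    """Try every midpoint between adjacent distinct values; keep the best."""
    points = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    return max(candidates,
               key=lambda cut: interdependence_redundancy(values, labels, [cut]))
```

Dividing by the joint entropy keeps the score between 0 and 1, which is one plausible reason such a measure can be compared across partitions with different numbers of intervals.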
conference on information and knowledge management | 1997
Keith C. C. Chan; Wai-Ho Au
In this paper, we introduce a novel technique, called F-APACS, for mining fuzzy association rules. Existing algorithms involve discretizing the domains of quantitative attributes into intervals so as to discover quantitative association rules. These intervals may not be concise and meaningful enough for human experts to easily obtain nontrivial knowledge from those rules discovered. Instead of using intervals, F-APACS employs linguistic terms to represent the revealed regularities and exceptions. The linguistic representation is especially useful when those rules discovered are presented to human experts for examination. The definition of linguistic terms is based on fuzzy set theory and hence we call the rules having these terms fuzzy association rules. The use of fuzzy techniques makes F-APACS resilient to noise such as inaccuracies in physical measurements of real-life entities and missing values in the databases. Furthermore, F-APACS employs adjusted difference analysis which has the advantage that it does not require any user-supplied thresholds which are often hard to determine. The fact that F-APACS is able to mine fuzzy association rules which utilize linguistic representation and that it uses an objective yet meaningful confidence measure to determine the interestingness of a rule makes it very effective at the discovery of rules from a real-life transactional database of a PBX system provided by a telecommunication corporation.
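The idea of replacing crisp intervals with linguistic terms can be sketched in Python as follows. Everything specific here is an assumption made for illustration: the triangular membership shape, the term names and breakpoints for a hypothetical call-duration attribute, and the use of the minimum to combine two membership degrees. None of it is taken from the F-APACS paper.

```python
def triangular(left, peak, right):
    """Return a membership function for a triangular fuzzy set.

    Setting left == peak or peak == right gives a one-sided (shoulder) term.
    """
    def membership(x):
        if x == peak:
            return 1.0
        if x <= left or x >= right:
            return 0.0
        if x < peak:
            return (x - left) / (peak - left)
        return (right - x) / (right - peak)
    return membership


# Hypothetical linguistic terms for a call duration measured in minutes.
DURATION_TERMS = {
    "short": triangular(0, 0, 10),
    "medium": triangular(0, 10, 20),
    "long": triangular(10, 20, 20),
}


def fuzzify(x, terms):
    """Degree to which x belongs to each linguistic term."""
    return {name: fn(x) for name, fn in terms.items()}


def fuzzy_pair_count(records, term_a, term_b):
    """Fuzzy count of records matching term_a on the first attribute and
    term_b on the second, combining degrees with min (an assumption)."""
    return sum(min(term_a(a), term_b(b)) for a, b in records)
```

Because a value such as 5 minutes belongs partly to "short" and partly to "medium", a small measurement error shifts its degrees slightly instead of moving it across a hard interval boundary.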
IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2005
Wai-Ho Au; Keith C. C. Chan; Andrew K. C. Wong; Yang Wang
This paper presents an attribute clustering method which is able to group genes based on their interdependence so as to mine meaningful patterns from the gene expression data. It can be used for gene grouping, selection, and classification. The partitioning of a relational table into attribute subgroups allows a small number of attributes within or across the groups to be selected for analysis. By clustering attributes, the search dimension of a data mining algorithm is reduced. The reduction of search dimension is especially important to data mining in gene expression data because such data typically consist of a huge number of genes (attributes) and a small number of gene expression profiles (tuples). Most data mining algorithms are typically developed and optimized to scale to the number of tuples instead of the number of attributes. The situation becomes even worse when the number of attributes overwhelms the number of tuples, in which case the likelihood of reporting patterns that are actually irrelevant due to chance becomes rather high. It is for the aforementioned reasons that gene grouping and selection are important preprocessing steps for many data mining algorithms to be effective when applied to gene expression data. This paper defines the problem of attribute clustering and introduces a methodology for solving it. Our proposed method groups interdependent attributes into clusters by optimizing a criterion function derived from an information measure that reflects the interdependence between attributes. By applying our algorithm to gene expression data, meaningful clusters of genes are discovered. The grouping of genes based on attribute interdependence within each group helps to capture different aspects of gene association patterns in each group. Significant genes selected from each group then contain useful information for gene expression classification and identification.
To evaluate the performance of the proposed approach, we applied it to two well-known gene expression data sets and compared our results with those obtained by other methods. Our experiments show that the proposed method is able to find meaningful clusters of genes. By selecting a subset of genes which have high multiple-interdependence with others within clusters, significant classification information can be obtained. Thus, a small pool of selected genes can be used to build classifiers with a very high classification rate. From the pool, gene expressions of different categories can be identified.
Journal of Materials Processing Technology | 2002
Joanne Yip; Keith C. C. Chan; Kwan Moon Sin; Kai Shui Lau
Nylon 6 fabrics were treated with low temperature plasma (LTP) with three non-polymerizing gases: (i) oxygen, (ii) argon and (iii) tetrafluoromethane. After plasma treatment, the properties of the fabric, including surface morphology, low-stress mechanical properties, air permeability and thermal properties, were investigated. The nylon fabrics treated with different plasma gases exhibited different morphological changes. Low-stress mechanical properties obtained by means of the Kawabata evaluation system for fabrics (KES-F) revealed that the surface friction, tensile, shearing, bending and compression properties altered after the treatments. The changes in these properties are believed to be related closely to the inter-fiber/inter-yarn frictional force induced by the LTP treatment. A slight decrease in the air permeability of the treated fabrics was found, which is probably due to the plasma action increasing the fabric thickness and changing the fabric surface morphology. The change in the thermal properties of the treated fabrics was in good agreement with the above findings and can be attributed to the amount of air trapped between the yarns. This experimental work suggests that the changed properties induced by LTP can effect an improvement in certain textile products.
International Journal of Quality & Reliability Management | 1999
Hareton K. N. Leung; Keith C. C. Chan; Tat Y. Lee
This paper presents the result of a study to identify the costs and benefits of obtaining ISO 9000 certification. Toward this goal, a survey of some 500 ISO 9000 certified companies has been carried out. Among them, more than 65 per cent believe that ISO 9001 certification is worthwhile, and more than 76 per cent believe that the cost of certification is inexpensive. The results indicate that companies which seek certification because of their customers’ request seem to gain less benefit from ISO 9000 certification. We also found that concern for high costs is much less after initial certification. In addition, we discovered that contrary to many people’s expectation, some factors do not have any bearing on whether benefits outweigh costs. These factors include time taken to get certified, number of years since certification, and reason for certification. Besides presenting the results of the survey, we also introduce a new classification scheme based on the company’s view on the “expensiveness” of the certification and the received benefits. There are some differences in responses from companies of different classes.
ieee international conference on fuzzy systems | 1999
Wai-Ho Au; Keith C. C. Chan
In this paper, we introduce a novel technique, called FARM, for mining fuzzy association rules. FARM employs linguistic terms to represent the revealed regularities and exceptions. The linguistic representation is especially useful when those rules discovered are presented to human experts for examination because of its affinity with human knowledge representations. The definition of linguistic terms is based on fuzzy set theory and hence we call the rules having these terms fuzzy association rules. The use of fuzzy techniques makes FARM resilient to noise such as inaccuracies in physical measurements of real-life entities and missing values in the databases. Furthermore, FARM utilizes adjusted difference analysis which has the advantage that it does not require any user-supplied thresholds which are often hard to determine. In addition to this interestingness measure, FARM has another unique feature: the conclusions of a fuzzy association rule can contain linguistic terms. Our technique also provides a mechanism to allow quantitative values to be inferred from fuzzy association rules. Unlike other data mining techniques that can only discover association rules between different discretized values, FARM is able to reveal interesting relationships between different quantitative values. Our experimental results showed that FARM is capable of discovering meaningful and useful fuzzy association rules in an effective manner from a real-life database.
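One way to read "adjusted difference analysis" is as a test on the adjusted residuals of a contingency table of (possibly fractional, fuzzy) co-occurrence counts. The Python sketch below follows that reading using the standard adjusted residual from categorical data analysis. The function name, the table layout, and the 1.96 cutoff (the 95% point of the standard normal distribution) are assumptions and not details confirmed by the abstract.

```python
from math import sqrt


def adjusted_residuals(table):
    """Adjusted residual for every cell of a two-way table.

    table: dict mapping (row_term, col_term) -> observed count, which may be
    fractional when the counts are sums of fuzzy membership degrees.
    """
    total = sum(table.values())
    row_sum, col_sum = {}, {}
    for (row, col), count in table.items():
        row_sum[row] = row_sum.get(row, 0.0) + count
        col_sum[col] = col_sum.get(col, 0.0) + count

    residuals = {}
    for row in row_sum:
        for col in col_sum:
            observed = table.get((row, col), 0.0)
            expected = row_sum[row] * col_sum[col] / total
            variance = (1 - row_sum[row] / total) * (1 - col_sum[col] / total)
            if expected == 0 or variance == 0:
                continue  # residual undefined for degenerate margins
            # Standardized residual divided by its estimated standard deviation.
            residuals[(row, col)] = (observed - expected) / sqrt(expected * variance)
    return residuals


def interesting_pairs(table, cutoff=1.96):
    """Cells whose count deviates significantly from independence."""
    return {cell: d for cell, d in adjusted_residuals(table).items()
            if abs(d) > cutoff}
```

Under this reading, a large positive residual marks an association that occurs more often than independence would predict, and a large negative one marks an exception, so neither a minimum support nor a minimum confidence needs to be supplied by the user.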
IEEE Transactions on Evolutionary Computation | 2006
Patrick C. H. Ma; Keith C. C. Chan; Xin Yao; David K. Y. Chiu
Clustering is concerned with the discovery of interesting groupings of records in a database. Many algorithms have been developed to tackle clustering problems in a variety of application domains. In particular, some of them have been used in bioinformatics research to uncover inherent clusters in gene expression microarray data. In this paper, we show how some popular clustering algorithms have been used for this purpose. Based on experiments using simulated and real data, we also show that the performance of these algorithms can be further improved. For more effective clustering of gene expression microarray data, which is typically characterized by a lot of noise, we propose a novel evolutionary algorithm called evolutionary clustering (EvoCluster). EvoCluster encodes an entire cluster grouping in a chromosome so that each gene in the chromosome encodes one cluster. Based on this encoding scheme, it makes use of a set of reproduction operators to facilitate the exchange of grouping information between chromosomes. The fitness function that EvoCluster adopts is able to differentiate how relevant a feature value is in determining a particular cluster grouping. As such, instead of just local pairwise distances, it also takes into consideration how clusters are arranged globally. Unlike many popular clustering algorithms, EvoCluster does not require the number of clusters to be decided in advance. Also, patterns hidden in each cluster can be explicitly revealed and presented for easy interpretation even by casual users. For performance evaluation, we have tested EvoCluster using both simulated and real data. Experimental results show that it can be very effective and robust even in the presence of noise and missing values. Also, when correlating the gene expression microarray data with DNA sequences, we were able to uncover significant biological binding sites (both previously known and unknown) in each cluster discovered by EvoCluster.
IEEE Transactions on Fuzzy Systems | 2003
Wai-Ho Au; Keith C. C. Chan
This paper describes how we applied a fuzzy technique to a data-mining task involving a large database that was provided by an international bank with offices in Hong Kong. The database contains the demographic data of over 320,000 customers and their banking transactions, which were collected over a six-month period. By mining the database, the bank would like to be able to discover interesting patterns in the data. The bank expected that the hidden patterns would reveal different characteristics about different customers so that it could better serve and retain them. To help the bank achieve its goal, we developed a fuzzy technique, called fuzzy association rule mining II (FARM II). FARM II is able to handle both relational and transactional data. It can also handle fuzzy data. The former type of data allows FARM II to discover multidimensional association rules, whereas the latter allows some of the patterns to be more easily revealed and expressed. To effectively uncover the hidden associations in the bank-account database, FARM II performs several steps which are described in detail in this paper. With FARM II, the bank identified some interesting characteristics of the customers who had once used the bank's loan services but later decided to cease using them. The bank translated what it discovered into actionable items by offering some incentives to retain its existing customers.