Is this you? Create Your Porfile

Ada Wai-Chee Fu

The Chinese University of Hong Kong

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Ada Wai-Chee Fu is active.

Explore More

Publication

Featured researches published by Ada Wai-Chee Fu.

international conference on data engineering | 1999

Efficient time series matching by wavelets

Kin-Pong Chan; Ada Wai-Chee Fu

Time series stored as feature vectors can be indexed by multidimensional index trees like R-Trees for fast retrieval. Due to the dimensionality curse problem, transformations are applied to time series to reduce the number of dimensions of the feature vectors. Different transformations like Discrete Fourier Transform (DFT) Discrete Wavelet Transform (DWT), Karhunen-Loeve (KL) transform or Singular Value Decomposition (SVD) can be applied. While the use of DFT and K-L transform or SVD have been studied on the literature, to our knowledge, there is no in-depth study on the application of DWT. In this paper we propose to use Haar Wavelet Transform for time series indexing. The major contributions are: (1) we show that Euclidean distance is preserved in the Haar transformed domain and no false dismissal will occur, (2) we show that Haar transform can outperform DFT through experiments, (3) a new similarity model is suggested to accommodate vertical shift of time series, and (4) a two-phase method is proposed for efficient n-nearest neighbor query in time series databases.

knowledge discovery and data mining | 2006

(α, k)-anonymity: an enhanced k-anonymity model for privacy preserving data publishing

Raymond Chi-Wing Wong; Jiuyong Li; Ada Wai-Chee Fu; Ke Wang

Privacy preservation is an important issue in the release of data for mining purposes. The k-anonymity model has been introduced for protecting individual identification. Recent studies show that a more sophisticated model is necessary to protect the association of individuals to sensitive information. In this paper, we propose an (α, k)-anonymity model to protect both identifications and relationships to sensitive information in data. We discuss the properties of (α, k)-anonymity model. We prove that the optimal (α, k)-anonymity problem is NP-hard. We first presentan optimal global-recoding method for the (α, k)-anonymity problem. Next we propose a local-recoding algorithm which is more scalable and result in less data distortion. The effectiveness and efficiency are shown by experiments. We also describe how the model can be extended to more general case.

knowledge discovery and data mining | 1999

Entropy-based subspace clustering for mining numerical data

Chun Hung Cheng; Ada Wai-Chee Fu; Yi Zhang

Mining numerical data is a relatively difficult problem in data mining. Clustering is one of the techniques. We consider a database with numerical attributes, in which each transaction is viewed as a multi-dimensional vector. By studying the clusters formed by these vectors, we can discover certain behaviors hidden in the data. Traditional clustering algorithms find clusters in the full space of the data sets. This results in high dimensional clusters, which are poorly comprehensible to human. One important task in this setting is the ability to discover clusters embedded in the subspaces of a high-dimensional data set. This problem is known as subspace clustering. We follow the basic assumptions of previous work CLIQUE. It is found that the number of subspaces with clustering is very large, and a criterion called the coverage is proposed in CLIQUE for the pruning. In addition to coverage, we identify new useful criteria for this problem and propose an entropybased algorithm called ENCLUS to handle the criteria. Our major contributions are: (1) identify new meaningful criteria of high density and correlation of dimensions for goodness of clustering in subspaces, (2) introduce the use of entropy and provide evidence to support its use, (3) make use of two closure properties based on entropy to prune away uninteresting subspaces efficiently, (4) propose a mechanism to mine non-minimally correlated subspaces which are of interest because of strong clustering, (5) experiments are carried out to show the effectiveness of the proposed method.

international conference on parallel and distributed information systems | 1996

A fast distributed algorithm for mining association rules

David W. Cheung; Jiawei Han; Vincent T. Y. Ng; Ada Wai-Chee Fu; Yongjian Fu

With the existence of many large transaction databases, the huge amounts of data, the high scalability of distributed systems, and the easy partitioning and distribution of a centralized database, it is important to investigate efficient methods for distributed mining of association rules. The study discloses some interesting relationships between locally large and globally large item sets and proposes an interesting distributed association rule mining algorithm, FDM (fast distributed mining of association rules), which generates a small number of candidate sets and substantially reduces the number of messages to be passed at mining association rules. A performance study shows that FDM has a superior performance over the direct application of a typical sequential algorithm. Further performance enhancement leads to a few variations of the algorithm.

international conference on data mining | 2005

HOT SAX: efficiently finding the most unusual time series subsequence

Eamonn J. Keogh; Jessica Lin; Ada Wai-Chee Fu

In this work, we introduce the new problem of finding time series discords. Time series discords are subsequences of a longer time series that are maximally different to all the rest of the time series subsequences. They thus capture the sense of the most unusual subsequence within a time series. Time series discords have many uses for data mining, including improving the quality of clustering, data cleaning, summarization, and anomaly detection. Discords are particularly attractive as anomaly detectors because they only require one intuitive parameter (the length of the subsequence) unlike most anomaly detection algorithms that typically require many parameters. We evaluate our work with a comprehensive set of experiments. In particular, we demonstrate the utility of discords with objective experiments on domains as diverse as Space Shuttle telemetry monitoring, medicine, surveillance, and industry, and we demonstrate the effectiveness of our discord discovery algorithm with more than one million experiments, on 82 different datasets from diverse domains.

international database engineering and applications symposium | 1998

Mining association rules with weighted items

C. H. Cai; Ada Wai-Chee Fu; C. H. Cheng; W. W. Kwong

Discovery of association rules has been found useful in many applications. In previous work, all items in a basket database are treated uniformly. We generalize this to the case where items are given weights to reflect their importance to the user. The weights may correspond to special promotions on some products, or the profitability of different items. We can mine the weighted association rules with weights. The downward closure property of the support measure in the unweighted case no longer exists and previous algorithms cannot be applied. In this paper, two new algorithms are introduced to handle this problem. In these algorithms we make use of a metric called the k-support bound in the mining process. Experimental results show the efficiency of the algorithms for large databases.

IEEE Transactions on Knowledge and Data Engineering | 1996

Efficient mining of association rules in distributed databases

David W. Cheung; Vincent T. Y. Ng; Ada Wai-Chee Fu; Yongjian Fu

Many sequential algorithms have been proposed for the mining of association rules. However, very little work has been done in mining association rules in distributed databases. A direct application of sequential algorithms to distributed databases is not effective, because it requires a large amount of communication overhead. In this study, an efficient algorithm called DMA (Distributed Mining of Association rules), is proposed. It generates a small number of candidate sets and requires only O(n) messages for support-count exchange for each candidate set, where n is the number of sites in a distributed database. The algorithm has been implemented on an experimental testbed, and its performance is studied. The results show that DMA has superior performance, when compared with the direct application of a popular sequential algorithm, in distributed databases.

knowledge discovery and data mining | 2006

Utility-based anonymization using local recoding

Jian Xu; Wei Wang; Jian Pei; Xiaoyuan Wang; Baile Shi; Ada Wai-Chee Fu

Privacy becomes a more and more serious concern in applications involving microdata. Recently, efficient anonymization has attracted much research work. Most of the previous methods use global recoding, which maps the domains of the quasi-identifier attributes to generalized or changed values. However, global recoding may not always achieve effective anonymization in terms of discernability and query answering accuracy using the anonymized data. Moreover, anonymized data is often for analysis. As well accepted in many analytical applications, different attributes in a data set may have different utility in the analysis. The utility of attributes has not been considered in the previous methods.In this paper, we study the problem of utility-based anonymization. First, we propose a simple framework to specify utility of attributes. The framework covers both numeric and categorical data. Second, we develop two simple yet efficient heuristic local recoding methods for utility-based anonymization. Our extensive performance study using both real data sets and synthetic data sets shows that our methods outperform the state-of-the-art multidimensional global recoding methods in both discernability and query answering accuracy. Furthermore, our utility-based method can boost the quality of analysis using the anonymized data.

international conference on management of data | 2010

K-isomorphism: privacy preserving network publication against structural attacks

James Cheng; Ada Wai-Chee Fu; Jia Liu

Serious concerns on privacy protection in social networks have been raised in recent years; however, research in this area is still in its infancy. The problem is challenging due to the diversity and complexity of graph data, on which an adversary can use many types of background knowledge to conduct an attack. One popular type of attacks as studied by pioneer work [2] is the use of embedding subgraphs. We follow this line of work and identify two realistic targets of attacks, namely, NodeInfo and LinkInfo. Our investigations show that k-isomorphism, or anonymization by forming k pairwise isomorphic subgraphs, is both sufficient and necessary for the protection. The problem is shown to be NP-hard. We devise a number of techniques to enhance the anonymization efficiency while retaining the data utility. A compound vertex ID mechanism is also introduced for privacy preservation over multiple data releases. The satisfactory performance on a number of real datasets, including HEP-Th, EUemail and LiveJournal, illustrates that the high symmetry of social networks is very helpful in mitigating the difficulty of the problem.

IEEE Transactions on Knowledge and Data Engineering | 2003

Haar wavelets for efficient similarity search of time-series: with and without time warping

Franky Kin-Pong Chan; Ada Wai-Chee Fu; Clement T. Yu

We address the handling of time series search based on two important distance definitions: Euclidean distance and time warping distance. The conventional method reduces the dimensionality by means of a discrete Fourier transform. We apply the Haar wavelet transform technique and propose the use of a proper normalization so that the method can guarantee no false dismissal for Euclidean distance. We found that this method has competitive performance from our experiments. Euclidean distance measurement cannot handle the time shifts of patterns. It fails to match the same rise and fall patterns of sequences with different scales. A distance measure that handles this problem is the time warping distance. However, the complexity of computing the time warping distance function is high. Also, as time warping distance is not a metric, most indexing techniques would not guarantee any false dismissal. We propose efficient strategies to mitigate the problems of time warping. We suggest a Haar wavelet-based approximation function for time warping distance, called Low Resolution Time Warping, which results in less computation by trading off a small amount of accuracy. We apply our approximation function to similarity search in time series databases, and show by experiment that it is highly effective in suppressing the number of false alarms in similarity search.

Explore More