Raymond Chi-Wing Wong

Hong Kong University of Science and Technology

Publication

Featured research published by Raymond Chi-Wing Wong.

knowledge discovery and data mining | 2006

(α, k)-anonymity: an enhanced k-anonymity model for privacy preserving data publishing

Raymond Chi-Wing Wong; Jiuyong Li; Ada Wai-Chee Fu; Ke Wang

Privacy preservation is an important issue in the release of data for mining purposes. The k-anonymity model has been introduced to protect against individual identification. Recent studies show that a more sophisticated model is necessary to protect the association of individuals with sensitive information. In this paper, we propose an (α, k)-anonymity model to protect both identifications and relationships to sensitive information in data. We discuss the properties of the (α, k)-anonymity model. We prove that the optimal (α, k)-anonymity problem is NP-hard. We first present an optimal global-recoding method for the (α, k)-anonymity problem. Next, we propose a local-recoding algorithm which is more scalable and results in less data distortion. The effectiveness and efficiency are shown by experiments. We also describe how the model can be extended to more general cases.
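
To make the model concrete, here is a minimal Python sketch of a checker for the (α, k) condition. It assumes the usual reading of the model: every group of rows sharing the same quasi-identifier values has at least k rows, and no sensitive value accounts for more than a fraction α of any group. The function name, the row format (a list of dicts), and the choice to apply the α bound to every sensitive value are assumptions made for illustration, not details taken from the paper.

```python
from collections import Counter, defaultdict


def is_alpha_k_anonymous(rows, qi_columns, sensitive_column, alpha, k):
    """Return True if `rows` satisfies the (alpha, k) condition (illustrative).

    rows: list of dicts, already generalised (hypothetical input format).
    qi_columns: names of the quasi-identifier columns.
    sensitive_column: name of the sensitive attribute.
    """
    # Group the sensitive values by their quasi-identifier combination.
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in qi_columns)
        groups[key].append(row[sensitive_column])

    for values in groups.values():
        # k condition: each group must contain at least k rows.
        if len(values) < k:
            return False
        # alpha condition: the most frequent sensitive value must not
        # exceed a fraction alpha of the group.
        most_common_count = Counter(values).most_common(1)[0][1]
        if most_common_count / len(values) > alpha:
            return False
    return True
```

A group in which every row carries the same sensitive value passes plain k-anonymity but fails this check for any α below 1, which is the gap the abstract describes.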

very large data bases | 2005

Scaling and time warping in time series querying

Ada Wai-Chee Fu; Eamonn J. Keogh; Leo Yung Hang Lau; Chotirat Ann Ratanamahatana; Raymond Chi-Wing Wong

The last few years have seen an increasing understanding that dynamic time warping (DTW), a technique that allows local flexibility in aligning time series, is superior to the ubiquitous Euclidean distance for time series classification, clustering, and indexing. More recently, it has been shown that uniform scaling (US), a technique that allows global scaling of time series, may be just as important for some problems. In this work, we note that for many real-world problems, it is necessary to combine both DTW and US to achieve meaningful results. This is particularly true in domains where we must account for the natural variability of human actions, including biometrics, query by humming, motion-capture/animation, and handwriting recognition. We introduce the first technique which can handle both DTW and US simultaneously. Our technique involves search pruning by means of a lower bounding technique and multi-dimensional indexing to speed up the search. We demonstrate the utility and effectiveness of our method on a wide range of problems in industry, medicine, and entertainment.
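
As a rough illustration of what combining the two distances means, the Python sketch below computes an unconstrained DTW distance and then tries every uniformly rescaled version of the query within a scaling range. This is a brute-force baseline only: it has none of the lower bounding or indexing the abstract refers to. The nearest-neighbour resampling, the comparison against a prefix of the candidate, and the default scaling range are all assumptions chosen for simplicity.

```python
import math


def dtw_distance(a, b):
    """Unconstrained DTW distance with squared point-wise cost."""
    m = len(b)
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0  # empty prefixes align at zero cost
    for i in range(1, len(a) + 1):
        cur = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three allowed warping steps.
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return math.sqrt(prev[m])


def rescale(series, new_length):
    """Uniformly stretch or shrink a series by nearest-neighbour resampling."""
    n = len(series)
    return [series[j * n // new_length] for j in range(new_length)]


def scaled_dtw(query, candidate, min_scale=0.5, max_scale=2.0):
    """Try every scaled length in range; return (best distance, best length)."""
    lo = max(1, math.ceil(len(query) * min_scale))
    hi = min(len(candidate), math.floor(len(query) * max_scale))
    best = (math.inf, None)
    for length in range(lo, hi + 1):
        # Compare the rescaled query with the matching prefix of the candidate.
        d = dtw_distance(rescale(query, length), candidate[:length])
        if d < best[0]:
            best = (d, length)
    return best
```

Each call runs one quadratic-time DTW per candidate length, which is the cost that pruning with a lower bound is meant to avoid.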

very large data bases | 2009

Efficient method for maximizing bichromatic reverse nearest neighbor

Raymond Chi-Wing Wong; M. Tamer Özsu; Philip S. Yu; Ada Wai-Chee Fu; Lian Liu

Bichromatic reverse nearest neighbor (BRNN) has been extensively studied in the spatial database literature. In this paper, we study a related problem called MaxBRNN: find an optimal region that maximizes the size of BRNNs. Such a problem has many real-life applications, including the problem of finding a new server point that attracts as many customers as possible by proximity. A straightforward approach is to determine the BRNNs for all possible points, which is not feasible since there is a large (or infinite) number of possible points. To the best of our knowledge, the fastest known method has exponential time complexity in the data size. Based on some interesting properties of the problem, we come up with an efficient algorithm called MaxOverlap. Extensive experiments are conducted to show that our algorithm is many times faster than the best-known technique.
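
The quantity being maximized can be sketched in a few lines of Python. The code below counts the BRNN set of one candidate server location, that is, the customers who would be strictly closer to the candidate than to any existing server, and then picks the best of a finite list of candidates. It is the straightforward approach restricted to a hand-picked candidate list, not the MaxOverlap algorithm; the function names and the use of 2-D Euclidean points are assumptions.

```python
import math


def brnn_size(candidate, customers, servers):
    """Count customers whose nearest server would become `candidate`."""
    count = 0
    for c in customers:
        nearest_existing = min(math.dist(c, s) for s in servers)
        # The customer switches only if the candidate is strictly closer.
        if math.dist(c, candidate) < nearest_existing:
            count += 1
    return count


def best_candidate(candidates, customers, servers):
    """Pick the candidate location with the largest BRNN set."""
    return max(candidates, key=lambda p: brnn_size(p, customers, servers))
```

Because the true search space is the whole plane, enumerating candidates like this cannot find the optimal region in general, which is the infeasibility the abstract points out.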

international conference on computer communications | 2008

On the Capacity of Multi-Channel Wireless Networks Using Directional Antennas

Hong-Ning Dai; Kam-Wing Ng; Raymond Chi-Wing Wong; Min-You Wu

The capacity of wireless ad hoc networks is affected by two key factors: the interference among concurrent transmissions and the number of simultaneous transmissions on a single interface. Recent studies found that using multiple channels can separate concurrent transmissions and greatly improve network throughput. However, those studies assume that wireless nodes are equipped only with omnidirectional antennas, which cause many collisions. On the other hand, some researchers found that directional antennas bring more benefits, such as reduced interference and increased spatial reuse, compared with omnidirectional antennas. However, they focused only on single-channel networks, which allow only a finite number of concurrent transmissions. Thus, combining the two technologies of multiple channels and directional antennas potentially brings more benefits. In this paper, we propose a multi-channel network architecture (called MC-MDA) that equips each wireless node with multiple directional antennas. We derive the capacity bounds of MC-MDA networks under arbitrary and random placements. We show that deploying directional antennas in multi-channel networks can greatly improve the network capacity due to increased network connectivity and reduced interference. We have also found that even a multi-channel network with only a single directional antenna at each node can give a significant improvement in throughput capacity. Besides, using multiple channels mitigates interference caused by directional antennas. MC-MDA networks integrate the benefits of multiple channels and directional antennas and thus achieve a significant performance improvement.

data warehousing and knowledge discovery | 2006

Achieving k-anonymity by clustering in attribute hierarchical structures

Jiuyong Li; Raymond Chi-Wing Wong; Ada Wai-Chee Fu; Jian Pei

Individual privacy will be at risk if a published data set is not properly de-identified. k-anonymity is a major technique to de-identify a data set. A more general view of k-anonymity is clustering with a constraint on the minimum number of objects in every cluster. Most existing approaches to achieving k-anonymity by clustering are for numerical (or ordinal) attributes. In this paper, we study achieving k-anonymity by clustering in attribute hierarchical structures. We define generalisation distances between tuples to characterise distortions by generalisations and discuss the properties of the distances. We conclude that the generalisation distance is a metric distance. We propose an efficient clustering-based algorithm for k-anonymisation. We experimentally show that the proposed method is more scalable and causes significantly less distortion than an optimal global recoding k-anonymity method.

Data Mining and Knowledge Discovery | 2006

Mining top-K frequent itemsets from data streams

Raymond Chi-Wing Wong; Ada Wai-Chee Fu

Frequent pattern mining on data streams has recently attracted interest. However, it is not easy for users to determine a proper frequency threshold. It is more reasonable to ask users to set a bound on the result size. We study the problem of mining top-K frequent itemsets in data streams. We introduce a method based on the Chernoff bound with a guarantee of the output quality and also a bound on the memory usage. We also propose an algorithm based on the Lossy Counting Algorithm. In most of the experiments on the two proposed algorithms, we obtain perfect solutions, and the memory space occupied by our algorithms is very small. Besides, we also propose adapted versions of these two algorithms to handle the case where we are interested in mining the data in a sliding window. The experiments show that the results are accurate.
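
For readers unfamiliar with the Lossy Counting Algorithm mentioned above, the following Python sketch shows the standard single-item version and a naive way to read a top-K answer from its summary. It is not the algorithm proposed in the paper: it counts individual items rather than itemsets, and it still needs an error parameter `epsilon`, which is exactly the kind of threshold the paper tries to spare the user. The function names and the default `epsilon` are assumptions.

```python
import math


def lossy_counting(stream, epsilon):
    """Standard Lossy Counting over single items.

    Returns {item: [count, max_undercount]} for the surviving entries.
    """
    width = math.ceil(1 / epsilon)  # bucket width
    counts = {}
    n = 0
    for item in stream:
        n += 1
        bucket = math.ceil(n / width)  # id of the current bucket
        if item in counts:
            counts[item][0] += 1
        else:
            # A new entry may have been missed in up to bucket - 1 buckets.
            counts[item] = [1, bucket - 1]
        if n % width == 0:
            # At each bucket boundary, prune entries that cannot be frequent.
            stale = [key for key, (f, d) in counts.items() if f + d <= bucket]
            for key in stale:
                del counts[key]
    return counts


def top_k(stream, k, epsilon=0.01):
    """Return the k items with the largest estimated counts (illustrative)."""
    counts = lossy_counting(stream, epsilon)
    ranked = sorted(counts.items(), key=lambda kv: kv[1][0], reverse=True)
    return [item for item, _ in ranked[:k]]
```

The summary holds only the entries that survive pruning, so memory stays bounded regardless of stream length, at the price of undercounting each item by at most `epsilon` times the number of items seen.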

very large data bases | 2008

Privacy preserving serial data publishing by role composition

Yingyi Bu; Ada Wai-Chee Fu; Raymond Chi-Wing Wong; Lei Chen; Jiuyong Li

Previous work on privacy-preserving serial data publishing on dynamic databases has relied on unrealistic assumptions about the nature of dynamic databases. In many applications, some sensitive values change freely while others never change. For example, in medical applications, the disease attribute changes with time when patients recover from one disease and develop another disease. However, patients do not recover from some diseases such as HIV. We call such diseases permanent sensitive values. To the best of our knowledge, none of the existing solutions handle these realistic issues. We propose a novel anonymization approach called HD-composition to solve the above problems. Extensive experiments with real data confirm our theoretical results.

Data Mining and Knowledge Discovery | 2005

Data Mining for Inventory Item Selection with Cross-Selling Considerations

Raymond Chi-Wing Wong; Ada Wai-Chee Fu; Ke Wang

Association rule mining, studied for over ten years in the data mining literature, aims to help enterprises with sophisticated decision making, but the resulting rules typically cannot be directly applied and require further processing. In this paper, we propose a method for actionable recommendations from itemset analysis and investigate an application of the concepts of association rules: maximal-profit item selection with cross-selling effect (MPIS). This problem is about choosing a subset of items which can give the maximal profit with the consideration of the cross-selling effect. A simple version of this problem is shown to be NP-hard. A new approach is proposed with consideration of the loss rule, a rule similar to the association rule, to model the cross-selling effect. We show that MPIS can be approximated by a quadratic programming problem. We also propose a greedy approach and a genetic algorithm to deal with this problem. Experiments are conducted, which show that our proposed approaches are highly effective and efficient.

international conference on management of data | 2013

Collective spatial keyword queries: a distance owner-driven approach

Cheng Long; Raymond Chi-Wing Wong; Ke Wang; Ada Wai-Chee Fu

Recently, spatial keyword queries have become a hot topic in the literature. One example of these queries is the collective spatial keyword query (CoSKQ), which finds a set of objects in the database that collectively covers a set of given keywords and has the smallest cost. Unfortunately, existing exact algorithms have severe scalability problems, and existing approximate algorithms, though scalable, cannot guarantee near-to-optimal solutions. In this paper, we study the CoSKQ problem and address the above issues. Firstly, we consider the CoSKQ problem using an existing cost measurement called the maximum sum cost. This problem is called MaxSum-CoSKQ and is known to be NP-hard. We observe that the maximum sum cost of a set of objects is dominated by at most three objects, which we call the distance owners of the set. Motivated by this, we propose a distance owner-driven approach which involves two algorithms: one is an exact algorithm which runs faster than the best-known existing algorithm by several orders of magnitude, and the other is an approximate algorithm which improves the best-known constant approximation factor from 2 to 1.375. Secondly, we propose a new cost measurement called diameter cost; CoSKQ with this measurement is called Dia-CoSKQ. We prove that Dia-CoSKQ is NP-hard. With the same distance owner-driven approach, we design two algorithms for Dia-CoSKQ: one is an exact algorithm which is efficient and scalable, and the other is an approximate algorithm which gives a √3-factor approximation. We conducted extensive experiments on real datasets which verified that the proposed exact algorithms are scalable and the proposed approximate algorithms return near-to-optimal solutions.
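
The two cost measurements can be illustrated with a small Python sketch and an exhaustive search. The abstract does not spell out the formulas, so the definitions below are assumptions: the maximum sum cost is taken as the largest query-to-object distance plus the largest object-to-object distance, and the diameter cost as the larger of those two values. The exhaustive search enumerates all subsets, so it only illustrates the problem statement and is not one of the paper's distance owner-driven algorithms.

```python
import math
from itertools import combinations


def _farthest_and_widest(query_loc, locs):
    """Largest query-object distance and largest object-object distance."""
    farthest = max(math.dist(query_loc, p) for p in locs)
    widest = max((math.dist(a, b) for a, b in combinations(locs, 2)), default=0.0)
    return farthest, widest


def max_sum_cost(query_loc, locs):
    """Assumed MaxSum cost: farthest object plus widest pair."""
    farthest, widest = _farthest_and_widest(query_loc, locs)
    return farthest + widest


def diameter_cost(query_loc, locs):
    """Assumed diameter cost: the larger of the two distances."""
    return max(_farthest_and_widest(query_loc, locs))


def brute_force_coskq(query_loc, keywords, objects, cost):
    """Exhaustive CoSKQ search (exponential; illustration only).

    keywords: set of required keywords.
    objects: list of (location, keyword_set) pairs (hypothetical format).
    """
    best_set, best_cost = None, math.inf
    for size in range(1, len(objects) + 1):
        for subset in combinations(objects, size):
            covered = set().union(*(kw for _, kw in subset))
            if not keywords <= covered:
                continue  # this subset misses a required keyword
            c = cost(query_loc, [loc for loc, _ in subset])
            if c < best_cost:
                best_set, best_cost = subset, c
    return best_set, best_cost
```

Under these assumed definitions, each cost depends on at most three objects (the one farthest from the query and the pair farthest from each other), which matches the distance-owner observation in the abstract.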

ACM Transactions on Knowledge Discovery From Data | 2011

Can the Utility of Anonymized Data be Used for Privacy Breaches?

Raymond Chi-Wing Wong; Ada Wai-Chee Fu; Ke Wang; Philip S. Yu; Jian Pei

Group-based anonymization is the most widely studied approach for privacy-preserving data publishing. Privacy models/definitions using group-based anonymization include k-anonymity, l-diversity, and t-closeness, to name a few. The goal of this article is to raise a fundamental issue regarding the privacy exposure of approaches using group-based anonymization, an issue that has been overlooked in the past. The group-based anonymization approach by bucketization basically hides each individual record behind a group to preserve data privacy. If the data is not properly anonymized, patterns can be derived from the published data and used by an adversary to breach individual privacy. For example, if a pattern such as "people from certain countries rarely suffer from some disease" can be derived from the released medical records, then this information can be used to link other people in an anonymized group to this disease with higher likelihood. We call the patterns derived from the published data the foreground knowledge. This is in contrast to the background knowledge that the adversary may obtain from other channels, as studied in some previous work. Finally, our experimental results show that such an attack is realistic on the privacy benchmark dataset under the traditional group-based anonymization approach.
