Is this you? Create Your Porfile

Kian-Lee Tan

National University of Singapore

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Kian-Lee Tan is active.

Explore More

Publication

Featured researches published by Kian-Lee Tan.

international conference on management of data | 2008

Private queries in location based services: anonymizers are not necessary

Gabriel Ghinita; Panos Kalnis; Ali Khoshgozaran; Cyrus Shahabi; Kian-Lee Tan

Mobile devices equipped with positioning capabilities (e.g., GPS) can ask location-dependent queries to Location Based Services (LBS). To protect privacy, the user location must not be disclosed. Existing solutions utilize a trusted anonymizer between the users and the LBS. This approach has several drawbacks: (i) All users must trust the third party anonymizer, which is a single point of attack. (ii) A large number of cooperating, trustworthy users is needed. (iii) Privacy is guaranteed only for a single snapshot of user locations; users are not protected against correlation attacks (e.g., history of user movement). We propose a novel framework to support private location-dependent queries, based on the theoretical work on Private Information Retrieval (PIR). Our framework does not require a trusted third party, since privacy is achieved via cryptographic techniques. Compared to existing work, our approach achieves stronger privacy for snapshots of user locations; moreover, it is the first to provide provable privacy guarantees against correlation attacks. We use our framework to implement approximate and exact algorithms for nearest-neighbor search. We optimize query execution by employing data mining techniques, which identify redundant computations. Contrary to common belief, the experimental results suggest that PIR approaches incur reasonable overhead and are applicable in practice.

ACM Transactions on Database Systems | 2005

iDistance: An adaptive B + -tree based indexing method for nearest neighbor search

H. V. Jagadish; Beng Chin Ooi; Kian-Lee Tan; Cui Yu; Rui Zhang

In this article, we present an efficient B+-tree based indexing method, called iDistance, for K-nearest neighbor (KNN) search in a high-dimensional metric space. iDistance partitions the data based on a space- or data-partitioning strategy, and selects a reference point for each partition. The data points in each partition are transformed into a single dimensional value based on their similarity with respect to the reference point. This allows the points to be indexed using a B+-tree structure and KNN search to be performed using one-dimensional range search. The choice of partition and reference points adapts the index structure to the data distribution.We conducted extensive experiments to evaluate the iDistance technique, and report results demonstrating its effectiveness. We also present a cost model for iDistance KNN search, which can be exploited in query optimization.

international conference on management of data | 2006

Finding k-dominant skylines in high dimensional space

Chee Yong Chan; H. V. Jagadish; Kian-Lee Tan; Anthony K. H. Tung; Zhenjie Zhang

Given a d-dimensional data set, a point p dominates another point q if it is better than or equal to q in all dimensions and better than q in at least one dimension. A point is a skyline point if there does not exists any point that can dominate it. Skyline queries, which return skyline points, are useful in many decision making applications.Unfortunately, as the number of dimensions increases, the chance of one point dominating another point is very low. As such, the number of skyline points become too numerous to offer any interesting insights. To find more important and meaningful skyline points in high dimensional space, we propose a new concept, called k-dominant skyline which relaxes the idea of dominance to k-dominance. A point p is said to k-dominate another point q if there are k ≤ d dimensions in which p is better than or equal to q and is better in at least one of these k dimensions. A point that is not k-dominated by any other points is in the k-dominant skyline.We prove various properties of k-dominant skyline. In particular, because k-dominant skyline points are not transitive, existing skyline algorithms cannot be adapted for k-dominant skyline. We then present several new algorithms for finding k-dominant skyline and its variants. Extensive experiments show that our methods can answer different queries on both synthetic and real data sets efficiently.

extending database technology | 2006

On high dimensional skylines

Chee Yong Chan; H. V. Jagadish; Kian-Lee Tan; Anthony K. H. Tung; Zhenjie Zhang

In many decision-making applications, the skyline query is frequently used to find a set of dominating data points (called skyline points) in a multi-dimensional dataset. In a high-dimensional space skyline points no longer offer any interesting insights as there are too many of them. In this paper, we introduce a novel metric, called skyline frequency that compares and ranks the interestingness of data points based on how often they are returned in the skyline when different number of dimensions (i.e., subspaces) are considered. Intuitively, a point with a high skyline frequency is more interesting as it can be dominated on fewer combinations of the dimensions. Thus, the problem becomes one of finding top-k frequent skyline points. But the algorithms thus far proposed for skyline computation typically do not scale well with dimensionality. Moreover, frequent skyline computation requires that skylines be computed for each of an exponential number of subsets of the dimensions. We present efficient approximate algorithms to address these twin difficulties. Our extensive performance study shows that our approximate algorithm can run fast and compute the correct result on large data sets in high-dimensional spaces.

international conference on management of data | 2005

Verifying completeness of relational query results in data publishing

Hwee Hwa Pang; Arpit Jain; Krithi Ramamritham; Kian-Lee Tan

In data publishing, the owner delegates the role of satisfying user queries to a third-party publisher. As the publisher may be untrusted or susceptible to attacks, it could produce incorrect query results. In this paper, we introduce a scheme for users to verify that their query results are complete (i.e., no qualifying tuples are omitted) and authentic (i.e., all the result values originated from the owner). The scheme supports range selection on key and non-key attributes, project as well as join queries on relational databases. Moreover, the proposed scheme complies with access control policies, is computationally secure, and can be implemented efficiently.

international conference on data engineering | 2004

Authenticating query results in edge computing

Hwee Hwa Pang; Kian-Lee Tan

Edge computing pushes application logic and the underlying data to the edge of the network, with the aim of improving availability and scalability. As the edge servers are not necessarily secure, there must be provisions for validating their outputs. This paper proposes a mechanism that creates a verification object (VO) for checking the integrity of each query result produced by an edge server - that values in the result tuples are not tampered with, and that no spurious tuples are introduced. The primary advantages of our proposed mechanism are that the VO is independent of the database size, and that relational operations can still be fulfilled by the edge servers. These advantages reduce transmission load and processing at the clients. We also show how insert and delete transactions can be supported.

IEEE Transactions on Knowledge and Data Engineering | 2015

In-Memory Big Data Management and Processing: A Survey

Hao Zhang; Gang Chen; Beng Chin Ooi; Kian-Lee Tan; Meihui Zhang

Growing main memory capacity has fueled the development of in-memory big data management and processing. By eliminating disk I/O bottleneck, it is now possible to support interactive data analytics. However, in-memory systems are much more sensitive to other sources of overhead that do not matter in traditional I/O-bounded disk-based systems. Some issues such as fault-tolerance and consistency are also more challenging to handle in in-memory environment. We are witnessing a revolution in the design of database systems that exploits main memory as its data storage layer. Many of these researches have focused along several dimensions: modern CPU and memory hierarchy utilization, time/space efficiency, parallelism, and concurrency control. In this survey, we aim to provide a thorough review of a wide range of in-memory data management and processing proposals and systems, including both data storage systems and data processing frameworks. We also give a comprehensive presentation of important technology in memory management, and some key factors that need to be considered in order to achieve efficient in-memory data management and processing.

international conference on management of data | 2005

Mining top-K covering rule groups for gene expression data

Gao Cong; Kian-Lee Tan; Anthony K. H. Tung; Xin Xu

In this paper, we propose a novel algorithm to discover the top-k covering rule groups for each row of gene expression profiles. Several experiments on real bioinformatics datasets show that the new top-k covering rule mining algorithm is orders of magnitude faster than previous association rule mining algorithms.Furthermore, we propose a new classification method RCBT. RCBT classifier is constructed from the top-k covering rule groups. The rule groups generated for building RCBT are bounded in number. This is in contrast to existing rule-based classification methods like CBA [19] which despite generating excessive number of redundant rules, is still unable to cover some training data with the discovered rules. Experiments show that the RCBT classifier can match or outperform other state-of-the-art classifiers on several benchmark gene expression datasets. In addition, the top-k covering rule groups themselves provide insights into the mechanisms responsible for diseases directly.

international conference on management of data | 2005

Stratified computation of skylines with partially-ordered domains

Chee Yong Chan; Pin-Kwang Eng; Kian-Lee Tan

In this paper, we study the evaluation of skyline queries with partially-ordered attributes. Because such attributes lack a total ordering, traditional index-based evaluation algorithms (e.g., NN and BBS) that are designed for totally-ordered attributes can no longer prune the space as effectively. Our solution is to transform each partially-ordered attribute into a two-integer domain that allows us to exploit index-based algorithms to compute skyline queries on the transformed space. Based on this framework, we propose three novel algorithms: BBS+ is a straightforward adaptation of BBS using the framework, and SDC (Stratification by Dominance Classification) and SDC+ are optimized to handle false positives and support progressive evaluation. Both SDC and SDC+ exploit a dominance relationship to organize the data into strata. While SDC generates its strata at run time, SDC+ partitions the data into strata offline. We also design two dominance classification strategies (MinPC and MaxPC) to further optimize the performance of SDC and SDC+. We implemented the proposed schemes and evaluated their efficiency. Our results show that our proposed techniques outperform existing approaches by a wide margin, with SDC+-MinPC giving the best performance in terms of both response time as well as progressiveness. To the best of our knowledge, this is the first paper to address the problem of skyline query evaluation involving partially-ordered attribute domains.

Indexing Techniques for Advanced Database Systems | 1997

Indexing Techniques for Advanced Database Systems

Elisa Bertino; Beng Chin Ooi; Ron Sacks-Davis; Kian-Lee Tan; Justin Zobel; Boris Shidlovsky; Daniele Andronico

Recent years have seen an explosive growth in the use of new database applications such as CAD/CAM systems, spatial information systems, and multimedia information systems. The needs of these applications are far more complex than traditional business applications. They call for support of objects with complex data types, such as images and spatial objects, and for support of objects with wildly varying numbers of index terms, such as documents. Traditional indexing techniques such as the B-tree and its variants do not efficiently support these applications, and so new indexing mechanisms have been developed. As a result of the demand for database support for new applications, there has been a proliferation of new indexing techniques. The need for a book addressing indexing problems in advanced applications is evident. For practitioners and database and application developers, this book explains best practice, guiding the selection of appropriate indexes for each application. For researchers, this book provides a foundation for the development of new and more robust indexes. For newcomers, this book is an overview of the wide range of advanced indexing techniques. Indexing Techniques for Advanced Database Systems is suitable as a secondary text for a graduate level course on indexing techniques, and as a reference for researchers and practitioners in industry.

Explore More