Chee Yong Chan

National University of Singapore

Publication

Featured research published by Chee Yong Chan.

international conference on data engineering | 2002

Efficient filtering of XML documents with XPath expressions

Chee Yong Chan; Pascal Felber; Minos N. Garofalakis; Rajeev Rastogi

The publish/subscribe paradigm is a popular model for allowing publishers (i.e., data generators) to selectively disseminate data to a large number of widely dispersed subscribers (i.e., data consumers) who have registered their interest in specific information items. Early publish/subscribe systems have typically relied on simple subscription mechanisms, such as keyword or "bag of words" matching, or simple comparison predicates on attribute values. The emergence of XML as a standard for information exchange on the Internet has led to an increased interest in using more expressive subscription mechanisms (e.g., based on XPath expressions) that exploit both the structure and the content of published XML documents. Given the increased complexity of these new data-filtering mechanisms, the problem of effectively identifying the subscription profiles that match an incoming XML document poses a difficult and important research challenge. In this paper, we propose a novel index structure, termed XTrie, that supports the efficient filtering of XML documents based on XPath expressions. Our XTrie index structure offers several novel features that, we believe, make it especially attractive for large-scale publish/subscribe systems. First, XTrie is designed to support effective filtering based on complex XPath expressions (as opposed to simple, single-path specifications). Second, our XTrie structure and algorithms are designed to support both ordered and unordered matching of XML data. Third, by indexing on sequences of elements organized in a trie structure and using a sophisticated matching algorithm, XTrie is able to both reduce the number of unnecessary index probes and avoid redundant matchings, thereby providing extremely efficient filtering. Our experimental results over a wide range of XML document and XPath expression workloads demonstrate that our XTrie index structure outperforms earlier approaches by wide margins.

international conference on management of data | 2006

Finding k-dominant skylines in high dimensional space

Chee Yong Chan; H. V. Jagadish; Kian-Lee Tan; Anthony K. H. Tung; Zhenjie Zhang

Given a d-dimensional data set, a point p dominates another point q if it is better than or equal to q in all dimensions and better than q in at least one dimension. A point is a skyline point if there does not exist any point that can dominate it. Skyline queries, which return skyline points, are useful in many decision-making applications. Unfortunately, as the number of dimensions increases, the chance of one point dominating another point becomes very low. As such, the number of skyline points becomes too large to offer any interesting insights. To find more important and meaningful skyline points in high-dimensional space, we propose a new concept, called k-dominant skyline, which relaxes the idea of dominance to k-dominance. A point p is said to k-dominate another point q if there are k ≤ d dimensions in which p is better than or equal to q and is better in at least one of these k dimensions. A point that is not k-dominated by any other point is in the k-dominant skyline. We prove various properties of k-dominant skyline. In particular, because k-dominant skyline points are not transitive, existing skyline algorithms cannot be adapted for k-dominant skyline. We then present several new algorithms for finding k-dominant skyline and its variants. Extensive experiments show that our methods can answer different queries on both synthetic and real data sets efficiently.
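The dominance and k-dominance definitions in this abstract can be illustrated with a brute-force sketch. This is not one of the paper's algorithms; it simply applies the definitions pairwise. It assumes that smaller values are better in every dimension, and the function names are hypothetical.

```python
def k_dominates(p, q, k):
    """Return True if p k-dominates q (assumption: smaller values are better).

    p k-dominates q when there are k dimensions in which p is better than or
    equal to q, and p is strictly better in at least one of them. Any strictly
    better dimension can be included among the chosen k, so counting suffices.
    """
    better_or_equal = sum(a <= b for a, b in zip(p, q))
    strictly_better = sum(a < b for a, b in zip(p, q))
    return better_or_equal >= k and strictly_better >= 1


def dominates(p, q):
    """Conventional dominance is k-dominance with k equal to the dimensionality."""
    return k_dominates(p, q, len(p))


def k_dominant_skyline(points, k):
    """Quadratic-time scan: keep the points that no other point k-dominates."""
    return [p for p in points if not any(k_dominates(q, p, k) for q in points)]


points = [(1, 2, 3), (2, 1, 3), (3, 3, 1), (2, 2, 2)]
full_skyline = k_dominant_skyline(points, 3)  # conventional skyline (k = d)
relaxed = k_dominant_skyline(points, 2)       # 2-dominant skyline
```

In this small data set every point is in the conventional skyline, yet (1, 2, 3) and (2, 1, 3) 2-dominate each other, so the relaxed relation can contain cycles and the 2-dominant skyline here is empty.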

international conference on management of data | 2004

Secure XML querying with security views

Wenfei Fan; Chee Yong Chan; Minos N. Garofalakis

The prevalent use of XML highlights the need for a generic, flexible access-control mechanism for XML documents that supports efficient and secure query access, without revealing sensitive information to unauthorized users. This paper introduces a novel paradigm for specifying XML security constraints and investigates the enforcement of such constraints during XML query evaluation. Our approach is based on the novel concept of security views, which provide for each user group (a) an XML view consisting of all and only the information that the users are authorized to access, and (b) a view DTD that the XML view conforms to. Security views effectively protect sensitive data from access and potential inferences by unauthorized users, and provide authorized users with necessary schema information to facilitate effective query formulation and optimization. We propose an efficient algorithm for deriving security view definitions from security policies (defined on the original document DTD) for different user groups. We also develop novel algorithms for XPath query rewriting and optimization such that queries over security views can be efficiently answered without materializing the views. Our algorithms transform a query over a security view to an equivalent query over the original document, and effectively prune query nodes by exploiting the structural properties of the document DTD in conjunction with approximate XPath containment tests. Our work is the first to study a flexible, DTD-based access-control model for XML and its implications on the XML query-execution engine. Furthermore, it is among the first efforts for query rewriting and optimization in the presence of general DTDs for a rich class of XPath queries. An empirical study based on real-life DTDs verifies the effectiveness of our approach.

extending database technology | 2006

On high dimensional skylines

Chee Yong Chan; H. V. Jagadish; Kian-Lee Tan; Anthony K. H. Tung; Zhenjie Zhang

In many decision-making applications, the skyline query is frequently used to find a set of dominating data points (called skyline points) in a multi-dimensional dataset. In a high-dimensional space, skyline points no longer offer any interesting insights as there are too many of them. In this paper, we introduce a novel metric, called skyline frequency, that compares and ranks the interestingness of data points based on how often they are returned in the skyline when different subsets of the dimensions (i.e., subspaces) are considered. Intuitively, a point with a high skyline frequency is more interesting as it can be dominated on fewer combinations of the dimensions. Thus, the problem becomes one of finding top-k frequent skyline points. But the algorithms thus far proposed for skyline computation typically do not scale well with dimensionality. Moreover, frequent skyline computation requires that skylines be computed for each of an exponential number of subsets of the dimensions. We present efficient approximate algorithms to address these twin difficulties. Our extensive performance study shows that our approximate algorithm can run fast and compute the correct result on large data sets in high-dimensional spaces.
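The skyline frequency metric described in this abstract can be made concrete with an exact, exponential-time sketch that enumerates every non-empty subspace. This is the naive computation the abstract says does not scale, not the paper's approximate algorithm. It assumes smaller values are better, and the function names are hypothetical.

```python
from itertools import combinations


def dominates_in(p, q, dims):
    """True if p dominates q when only the dimensions in dims are considered."""
    return all(p[i] <= q[i] for i in dims) and any(p[i] < q[i] for i in dims)


def skyline_frequency(points):
    """For each point, count the non-empty subspaces whose skyline contains it."""
    d = len(points[0])
    freq = [0] * len(points)
    for size in range(1, d + 1):
        for dims in combinations(range(d), size):  # 2^d - 1 subspaces in total
            for idx, p in enumerate(points):
                if not any(dominates_in(q, p, dims) for q in points):
                    freq[idx] += 1
    return freq


points = [(1, 3), (2, 2), (3, 1), (3, 3)]
freq = skyline_frequency(points)
# Rank point indices by descending frequency to obtain top-k frequent skyline points.
ranking = sorted(range(len(points)), key=lambda i: -freq[i])
```

For these four 2-dimensional points, (1, 3) and (3, 1) each appear in two of the three subspace skylines, (2, 2) appears only in the full-space skyline, and (3, 3) appears in none.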

international conference on management of data | 2005

Stratified computation of skylines with partially-ordered domains

Chee Yong Chan; Pin-Kwang Eng; Kian-Lee Tan

In this paper, we study the evaluation of skyline queries with partially-ordered attributes. Because such attributes lack a total ordering, traditional index-based evaluation algorithms (e.g., NN and BBS) that are designed for totally-ordered attributes can no longer prune the space as effectively. Our solution is to transform each partially-ordered attribute into a two-integer domain that allows us to exploit index-based algorithms to compute skyline queries on the transformed space. Based on this framework, we propose three novel algorithms: BBS+ is a straightforward adaptation of BBS using the framework, and SDC (Stratification by Dominance Classification) and SDC+ are optimized to handle false positives and support progressive evaluation. Both SDC and SDC+ exploit a dominance relationship to organize the data into strata. While SDC generates its strata at run time, SDC+ partitions the data into strata offline. We also design two dominance classification strategies (MinPC and MaxPC) to further optimize the performance of SDC and SDC+. We implemented the proposed schemes and evaluated their efficiency. Our results show that our proposed techniques outperform existing approaches by a wide margin, with SDC+-MinPC giving the best performance in terms of both response time and progressiveness. To the best of our knowledge, this is the first paper to address the problem of skyline query evaluation involving partially-ordered attribute domains.

international conference on computer communications | 2001

Efficiently monitoring bandwidth and latency in IP networks

Yuri Breitbart; Chee Yong Chan; Minos N. Garofalakis; Rajeev Rastogi; Abraham Silberschatz

Effective monitoring of network utilization and performance indicators is a key enabling technology for proactive and reactive resource management, flexible accounting, and intelligent planning in next-generation IP networks. In this paper, we address the challenging problem of efficiently monitoring bandwidth utilization and path latencies in an IP data network. Unlike earlier approaches, our measurement architecture assumes a single point-of-control in the network (corresponding to the network operations center) that is responsible for gathering bandwidth and latency information using widely-deployed management tools, like SNMP, RMON/NetFlow, and explicitly-routed IP probe packets. Our goal is to identify effective techniques for monitoring (a) bandwidth usage for a given set of links or packet flows, and (b) path latencies for a given set of paths, while minimizing the overhead imposed by the management tools on the underlying production network. We demonstrate that minimizing overheads under our measurement model gives rise to new combinatorial optimization problems, most of which prove to be NP-hard. We also propose novel approximation algorithms for these optimization problems and prove guaranteed upper bounds on their worst-case performance. Our simulation results validate our approach, demonstrating the effectiveness of our novel monitoring algorithms over a wide range of network topologies.

very large data bases | 2008

FINCH: evaluating reverse k-Nearest-Neighbor queries on location data

Wei Wu; Fei Yang; Chee Yong Chan; Kian-Lee Tan

A Reverse k-Nearest-Neighbor (RkNN) query finds the objects that take the query object as one of their k nearest neighbors. In this paper we propose new solutions for evaluating RkNN queries and their variant, bichromatic RkNN queries, on 2-dimensional location data. We present an algorithm named INCH that can compute an RkNN query's search region (from which the query result candidates are drawn). In our RkNN evaluation algorithm called FINCH, the search region restricts the search space, and the search region is tightened each time a new result candidate is found. We also propose a method that enables us to apply any RkNN algorithm on bichromatic RkNN queries. With that, our FINCH algorithm is also used to evaluate bichromatic RkNN queries. Experiments show that our solutions are more efficient than existing algorithms.
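The RkNN definition in this abstract can be checked directly with a brute-force baseline. This sketch is not INCH or FINCH and computes no search region; it only tests, for every object, whether the query is among its k nearest neighbors. It assumes Euclidean distance on 2-dimensional points and resolves distance ties in favor of the query, and the function name is hypothetical.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def rknn_brute_force(objects, query, k):
    """Return the objects that have the query among their k nearest neighbors.

    An object qualifies when fewer than k other objects lie strictly closer
    to it than the query does (assumption: ties are resolved for the query).
    """
    result = []
    for i, obj in enumerate(objects):
        d_query = dist(obj, query)
        closer = sum(
            1
            for j, other in enumerate(objects)
            if j != i and dist(obj, other) < d_query
        )
        if closer < k:
            result.append(obj)
    return result


objects = [(0, 0), (1, 0), (5, 0), (6, 0)]
query = (2, 0)
r1 = rknn_brute_force(objects, query, 1)
r2 = rknn_brute_force(objects, query, 2)
```

With k = 1 only (1, 0) has the query as its nearest neighbor; with k = 2 every object qualifies. The scan costs quadratic time in the number of objects, which is the kind of cost a restricted search region is meant to avoid.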

international conference on management of data | 2003

Capturing both types and constraints in data integration

Michael Benedikt; Chee Yong Chan; Wenfei Fan; Juliana Freire; Rajeev Rastogi

We propose a framework for integrating data from multiple relational sources into an XML document that both conforms to a given DTD and satisfies predefined XML constraints. The framework is based on a specification language, AIG, that extends a DTD by (1) associating element types with semantic attributes (inherited and synthesized, inspired by the corresponding notions from Attribute Grammars), (2) computing these attributes via parameterized SQL queries over multiple data sources, and (3) incorporating XML keys and inclusion constraints. The novelty of AIG consists in semantic attributes and their dependency relations for controlling context-dependent, DTD-directed construction of XML documents, as well as for checking XML constraints in parallel with document-generation. We also present cost-based optimization techniques for efficiently evaluating AIGs, including algorithms for merging queries and for scheduling queries on multiple data sources. This provides a new grammar-based approach for data integration under both syntactic and semantic constraints.

very large data bases | 2002

DTD-directed publishing with attribute translation grammars

Michael Benedikt; Chee Yong Chan; Wenfei Fan; Rajeev Rastogi; Shihui Zheng; Aoying Zhou

We present a framework for publishing relational data in XML with respect to a fixed DTD. In data exchange on the Web, XML views of relational data are typically required to conform to a predefined DTD. The presence of recursion in a DTD as well as non-determinism makes it challenging to generate DTD-directed, efficient transformations. Our framework provides a language for defining views that are guaranteed to be DTD-conformant, as well as middleware for evaluating these views. It is based on a novel notion of attribute translation grammars (ATGs). An ATG extends a DTD by associating semantic rules via SQL queries. Directed by the DTD, it extracts data from a relational database, and constructs an XML document. We provide algorithms for efficiently evaluating ATGs, along with methods for statically analyzing them. This yields a systematic and effective approach to publishing data with respect to a predefined DTD.

mobile data management | 2008

Continuous Reverse k-Nearest-Neighbor Monitoring

Wei Wu; Fei Yang; Chee Yong Chan; Kian-Lee Tan

The processing of a Continuous Reverse k-Nearest-Neighbor (CRkNN) query on moving objects can be divided into two subtasks: continuous filter and continuous refinement. The algorithms for the two tasks can be completely independent. Existing CRkNN solutions employ Continuous k-Nearest-Neighbor (CkNN) queries for both continuous filter and continuous refinement. We analyze the CkNN-based solution and point out that when k > 1 the refinement cost becomes the system bottleneck. We propose a new continuous refinement method called CRange-k. In CRange-k, we transform the continuous verification problem into a Continuous Range-k query, which is also defined in this paper, and process it efficiently. An experimental study shows that the CRkNN solution based on our CRange-k refinement method is more efficient and scalable than the state-of-the-art CRkNN solution.
