Cheng Hian Goh

National University of Singapore

Publication

Featured researches published by Cheng Hian Goh.

international conference on management of data | 1992

On global multidatabase query optimization

Hongjun Lu; Beng Chin Ooi; Cheng Hian Goh

Multidatabase management systems (MDBMS) [6] enable data sharing among heterogeneous local databases (called component databases) and thus provide the interoperability required by diverse applications. Among the research topics in multidatabase systems, little work has been reported on global query optimization [3, 4] compared to other topics such as schema integration. We suggest that there are several reasons why this is so. First, global query optimization in multidatabase systems is a complex problem. Second, the potential returns, in terms of both the academic value of this research and the possible performance gain from optimization, are not clear. In this extended abstract, we claim that global query optimization is necessary for high-performance MDBMS, as in the case of conventional distributed database systems. However, contrary to what is commonly assumed, query optimization in multidatabase systems encompasses a number of additional issues which arise from the autonomy and heterogeneity of component databases. To circumvent these difficulties, we extend the existing model for distributed query optimization to operate in the multidatabase context. Finally, we also outline the design of a multidatabase query optimizer.

international workshop on research issues in data engineering | 1993

Multidatabase query optimization: issues and solutions

Hongjun Lu; Beng Chin Ooi; Cheng Hian Goh

Despite the interest in multidatabase systems, research in multidatabase query optimization (MQO) has been scarce. Many researchers perceive the difficulty to be the lack of reliable cost estimates for autonomous component databases. Consequently, some have suggested that this problem degenerates to a distributed query optimization (DQO) problem once cost model coefficients for component databases are found. The authors argue that autonomy and heterogeneity of component databases give rise to a number of issues which make MQO a distinct problem from DQO. Consequently, existing solutions for DQO need to be re-evaluated in the light of these issues. The design of a multidatabase query optimizer, which accounts for the issues highlighted, is discussed. This provides a framework within which further research in MQO can be carried out.

data and knowledge engineering | 1996

Indexing temporal data using existing B+-trees

Cheng Hian Goh; Hongjun Lu; Beng Chin Ooi; Kian-Lee Tan

Research in temporal databases has largely focused on extensions of existing data models for the proper handling of temporal information. One approach is to store temporal data on an existing DBMS and build new indexes to support the efficient retrieval of temporal data. This paper describes mapping strategies to linearize the data such that existing B+-trees can be used directly. With such an implementation, a temporal relation is mapped to points in a multi-dimensional space, with each time interval being translated to a two-dimensional coordinate, and a temporal selection operation is constructed as a spatial search operation. The proposed approach has two advantages. First, mapping a temporal relation to a multi-dimensional space provides a uniform framework for dealing with temporal queries involving transaction and valid time, as well as other non-temporal attributes. Second, linearization of the multi-dimensional search space allows classical indexing methods (such as the B+-tree) to be used; this means that index support for temporal selection can be accomplished without modification to the underlying storage components of the DBMS. Both analytical and simulation studies show that the proposed indexing scheme is more efficient than the time index in both its disk utilization and access time.
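
The idea of treating an interval as a point and a temporal selection as a region search can be sketched as follows. This is a minimal illustration, not the mapping strategies of the paper: the names `to_point` and `SortedStartIndex` are hypothetical, and a sorted Python list searched with `bisect` stands in for a one-dimensional ordered index such as a B+-tree.

```python
from bisect import bisect_right


def to_point(start, end):
    """Map a closed time interval [start, end] to a 2-D point (start, end)."""
    return (start, end)


class SortedStartIndex:
    """Hypothetical stand-in for an ordered 1-D index keyed on start time."""

    def __init__(self, intervals):
        # Points are kept sorted by (start, end), as an ordered index would keep them.
        self.points = sorted(to_point(s, e) for s, e in intervals)
        self.starts = [p[0] for p in self.points]

    def overlapping(self, q_start, q_end):
        """Return points whose interval overlaps [q_start, q_end].

        Overlap is the region start <= q_end and end >= q_start in the 2-D space.
        The first condition is answered by a range scan on the ordered key,
        the second by filtering the scanned candidates.
        """
        hi = bisect_right(self.starts, q_end)
        return [p for p in self.points[:hi] if p[1] >= q_start]


index = SortedStartIndex([(1, 5), (3, 4), (6, 9), (10, 12)])
result = index.overlapping(4, 6)
print(result)
```

The range scan only prunes on the start coordinate; a real linearization would aim to prune on both coordinates at once.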

international conference on data engineering | 1992

Logical database design with inclusion dependencies

Tok Wang Ling; Cheng Hian Goh

Classical data dependencies are oblivious to important constraints which may exist between sets of attributes occurring in different relation schemes. The authors study how inclusion dependencies can be used to model these constraints, leading to the design of better database schemes. A normal form called the inclusion normal form (IN-NF) is proposed. Unlike classical normal forms, the IN-NF characterizes a database scheme as a whole rather than the individual relation schemes. It is shown that a database scheme in IN-NF is always in improved third normal form, while the converse is not true. It is demonstrated that the classical relational design framework may be extended to facilitate the design of database schemes in IN-NF.
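
For readers unfamiliar with inclusion dependencies, the constraint R[X] ⊆ S[Y] says that every combination of X-values in relation R also appears as a combination of Y-values in relation S. The sketch below only checks such a dependency on sample data; it does not implement IN-NF, and the function name and the example relations are hypothetical.

```python
def satisfies_inclusion(r_rows, r_attrs, s_rows, s_attrs):
    """Check the inclusion dependency R[r_attrs] ⊆ S[s_attrs].

    Rows are dictionaries mapping attribute names to values.
    """
    # Project S onto s_attrs once, so each lookup from R is a set membership test.
    s_projection = {tuple(row[a] for a in s_attrs) for row in s_rows}
    return all(tuple(row[a] for a in r_attrs) in s_projection for row in r_rows)


# Hypothetical example: every employee's department must exist in departments.
departments = [{"dept_id": 1}, {"dept_id": 2}]
employees = [{"name": "A", "dept": 1}, {"name": "B", "dept": 2}]
orphaned = employees + [{"name": "C", "dept": 3}]

print(satisfies_inclusion(employees, ["dept"], departments, ["dept_id"]))
print(satisfies_inclusion(orphaned, ["dept"], departments, ["dept_id"]))
```

A functional dependency constrains attributes within one relation, whereas this check spans two relations, which is the kind of constraint the abstract says classical dependencies miss.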

data and knowledge engineering | 2000

Efficient indexing of high-dimensional data through dimensionality reduction

Cheng Hian Goh; Agnes Lim; Beng Chin Ooi; Kian-Lee Tan

The performance of the R-tree indexing method is known to deteriorate rapidly when the dimensionality of data increases. In this paper, we present a technique for dimensionality reduction by grouping d distinct attributes into k disjoint clusters and mapping each cluster to a linear space. The resulting k-dimensional space (where k may be much smaller than d) can then be indexed efficiently using an R-tree. We present algorithms for decomposing a query region on the native d-dimensional space to corresponding query regions in the k-dimensional space, as well as search and update operations for the “dimensionally-reduced” R-tree. Experiments using real data sets for point, region, and OLAP queries were conducted. The results indicate that there is potential for significant performance gains over a naive strategy in which an R-tree index is created on the native d-dimensional space.
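
The grouping step can be illustrated with a small sketch. The abstract does not say how a cluster is mapped to a linear space, so the row-major (mixed-radix) encoding below is an assumption chosen for simplicity, as are the names `linearize` and `reduce_point` and the requirement that every attribute takes integer values in a known finite domain.

```python
def linearize(values, domain_sizes):
    """Encode one cluster of attribute values as a single integer (row-major order).

    Assumes each value v satisfies 0 <= v < size for its attribute's domain size.
    """
    key = 0
    for v, size in zip(values, domain_sizes):
        if not 0 <= v < size:
            raise ValueError(f"value {v} outside domain of size {size}")
        key = key * size + v
    return key


def reduce_point(point, clusters, domain_sizes):
    """Map a d-dimensional point to k dimensions, one coordinate per cluster.

    `clusters` is a list of disjoint tuples of attribute positions.
    """
    return tuple(
        linearize([point[i] for i in cluster], [domain_sizes[i] for i in cluster])
        for cluster in clusters
    )


# Four attributes with domain size 4 each, grouped into two clusters.
reduced = reduce_point((1, 2, 3, 0), [(0, 1), (2, 3)], (4, 4, 4, 4))
print(reduced)
```

Under this encoding a range on a non-leading attribute of a cluster maps to several disjoint intervals in the linear space, which is why query regions need to be decomposed before searching the reduced index.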

IEEE Transactions on Knowledge and Data Engineering | 2002

Exploring into programs for the recovery of data dependencies designed

Hee Beng Kuan Tan; Tok Wang Ling; Cheng Hian Goh

Data dependencies play an important role in the design of a database. Many legacy database applications have been developed on old generation database management systems and conventional file systems. As a result, most of the data dependencies in legacy databases are not enforced in the database management systems. As such, they are not explicitly defined in database schema and are enforced in the transactions, which update the databases. It is very difficult and time consuming to find out the designed data dependencies manually during the maintenance and reengineering of database applications. In software engineering, program analysis has long been developed and proven as a useful aid in many areas. With the use of program analysis, this paper proposes a novel approach for the recovery of common data dependencies, i.e., functional dependencies, key constraints, inclusion dependencies, referential constraints, and sum dependencies, designed in a database from the behavior of transactions, which update the database. The approach is based on detecting program path patterns for implementing most commonly used methods to enforce these data dependencies.

international conference on data engineering | 1997

Indexing OODB instances based on access proximity

Chee Yong Chan; Cheng Hian Goh; Beng Chin Ooi

Queries in object-oriented databases (OODBs) may be asked with respect to different class scopes: a query may request either object instances which belong exclusively to a given class c, or those which belong to any class in the hierarchy rooted at c. To facilitate retrieval of objects both from a single class and from multiple classes in a class hierarchy, we propose a multi-dimensional class-hierarchy index called the χ-tree. The χ-tree dynamically partitions the data space using both the class and indexed attribute dimensions by taking into account the semantics of the class dimension as well as the access patterns of queries. Experimental results show that it is an efficient index.

Information & Software Technology | 1998

Indexing bitemporal databases as points

Beng Chin Ooi; Cheng Hian Goh; Kian-Lee Tan

A bitemporal database supports both the valid and transaction time dimensions. Each record in such a database can be viewed as a rectangle in a 2-dimensional space (corresponding to the valid and transaction time dimensions). Hence, a spatial access method can be employed to facilitate speedy retrieval from the database. In this paper, we re-examine the issue of designing efficient access methods for bitemporal databases. In particular, we transform a record into a point in a multi-dimensional space, where the valid time and transaction time are each mapped to a 2-dimensional coordinate. A temporal selection operation can then be implemented as a region search operation. This allows us to tap into the many point access methods that are commercially available without modification. We implemented and evaluated three R-tree based methods on key-range time-slice queries: the naive Point R-tree, the Dual Point R-tree and the Dual Spatial R-tree. Our experimental results show that while the simple Point R-tree is inferior to the Dual Spatial R-tree, the Dual Point R-tree has the best performance of the three.

very large data bases | 2000

Progressive evaluation of nested aggregate queries

Kian-Lee Tan; Cheng Hian Goh; Beng Chin Ooi

In many decision-making scenarios, decision makers require rapid feedback to their queries, which typically involve aggregates. The traditional blocking execution model can no longer meet the demands of these users. One promising approach in the literature, called online aggregation, evaluates an aggregation query progressively as follows: as soon as certain data have been evaluated, approximate answers are produced with their respective running confidence intervals; as more data are examined, the answers and their corresponding running confidence intervals are refined. In this paper, we extend this approach to handle nested queries with aggregates (i.e., at least one inner query block is an aggregate query) by providing users with (approximate) answers progressively as the inner aggregation query blocks are evaluated. We address the new issues posed by nested queries. In particular, the answer space begins with a superset of the final answers and is refined as the aggregates from the inner query blocks are refined. For the intermediary answers to be meaningful, they have to be interpreted with the aggregates from the inner queries. We also propose a multi-threaded model for evaluating such queries: each query block is assigned to a thread, and the threads can be evaluated concurrently and independently. The time slice across the threads is nondeterministic in the sense that the user controls the relative rate at which these subqueries are being evaluated. For enumerative nested queries, we propose a priority-based evaluation strategy to present answers that are certainly in the final answer space first, before presenting those whose validity may be affected as the inner query aggregates are refined. We implemented a prototype system using Java and evaluated our system. Results for nested queries with one level and with multiple levels of nesting are reported. Our results show the effectiveness of the proposed mechanisms in providing progressive feedback that reduces the initial waiting time of users significantly without sacrificing the quality of the answers.
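
The online aggregation idea that this work builds on, a running estimate with a confidence interval that tightens as more data are examined, can be sketched for a single non-nested AVG query. This is an illustration only: the abstract does not specify an estimator, so the normal-approximation interval (mean ± z·s/√n), the assumption that rows arrive in random order, and the name `running_mean_with_ci` are all assumptions, and the nested-query and multi-threaded parts of the paper are not covered.

```python
import math


def running_mean_with_ci(values, z=1.96):
    """Yield (rows_seen, running_mean, half_width) after each value.

    Uses Welford's single-pass update for the mean and the sum of squared
    deviations. The half-width is z * sample_std / sqrt(n), which is only a
    meaningful interval if the values arrive in random order.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n > 1:
            half_width = z * math.sqrt(m2 / (n - 1)) / math.sqrt(n)
        else:
            half_width = math.inf  # no variance estimate from a single row
        yield n, mean, half_width


estimates = list(running_mean_with_ci([2, 4, 4, 4, 5, 5, 7, 9]))
for n, mean, half_width in estimates:
    print(f"after {n} rows: {mean:.3f} +/- {half_width:.3f}")
```

A user watching these estimates can stop the query as soon as the interval is narrow enough, instead of waiting for a blocking execution to finish.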

Knowledge and Information Systems | 1999

Efficient Join Processing Using Partial Precomputation

Kian-Lee Tan; Cheng Hian Goh; Mong Li Lee; Beng Chin Ooi

In this paper, we generalize conventional join indexes to a cluster-based join index, in which objects are grouped into clusters based on proximity. Each record of our join index represents a pair of clusters in which the join condition is satisfied by some members of the clusters. This strategy is especially useful for spatial and high-dimensional databases because of their typically large data volume and complex operations. Our approach leverages the structure of R-trees by exploiting the internal nodes of an R-tree to determine the precomputed clusters which can be used in our join index. By varying the size of the cluster, we are able to fine-tune the join index to achieve a balance between update cost and retrieval cost to suit individual applications. Different implementations of the join index are examined to determine how the join index can be efficiently maintained. To this end, we also conduct a number of experiments on intersection join and window queries, and the results confirm that semi-precomputation of join results is a robust and cost-effective approach to join processing.
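
A cluster-based join index of this kind can be sketched for an intersection join over rectangles. The sketch assumes the clusters are already given as lists of rectangles (the paper derives them from R-tree internal nodes, which is not reproduced here), and the function names are hypothetical.

```python
def intersects(a, b):
    """Return True if axis-aligned rectangles (x1, y1, x2, y2) share any point."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    return ax1 <= bx2 and bx1 <= ax2 and ay1 <= by2 and by1 <= ay2


def build_cluster_join_index(r_clusters, s_clusters):
    """Precompute the pairs of clusters in which some members satisfy the join."""
    return {
        (i, j)
        for i, rc in enumerate(r_clusters)
        for j, sc in enumerate(s_clusters)
        if any(intersects(a, b) for a in rc for b in sc)
    }


def join_with_index(r_clusters, s_clusters, index):
    """Compute the exact join, visiting only the cluster pairs in the index."""
    return [
        (a, b)
        for i, j in sorted(index)
        for a in r_clusters[i]
        for b in s_clusters[j]
        if intersects(a, b)
    ]


r_clusters = [[(0, 0, 1, 1), (2, 2, 3, 3)], [(10, 10, 11, 11)]]
s_clusters = [[(1, 1, 2, 2)], [(20, 20, 21, 21)]]
index = build_cluster_join_index(r_clusters, s_clusters)
pairs = join_with_index(r_clusters, s_clusters, index)
print(index, pairs)
```

Larger clusters make the index smaller and cheaper to maintain on updates, but leave more member-level checks for query time, which is the trade-off the abstract describes.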
