Beng Chin Ooi

National University of Singapore

Publication

Featured research published by Beng Chin Ooi.

ACM Transactions on Database Systems | 2005

iDistance: An adaptive B+-tree based indexing method for nearest neighbor search

H. V. Jagadish; Beng Chin Ooi; Kian-Lee Tan; Cui Yu; Rui Zhang

In this article, we present an efficient B+-tree-based indexing method, called iDistance, for K-nearest neighbor (KNN) search in a high-dimensional metric space. iDistance partitions the data using a space- or data-partitioning strategy and selects a reference point for each partition. The data points in each partition are transformed into single-dimensional values based on their similarity to the reference point. This allows the points to be indexed with a B+-tree and KNN search to be performed as a one-dimensional range search. The choice of partitions and reference points adapts the index structure to the data distribution. We conducted extensive experiments to evaluate the iDistance technique and report results demonstrating its effectiveness. We also present a cost model for iDistance KNN search, which can be exploited in query optimization.
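The key mapping and range search described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes Euclidean distance, assigns each point to its closest reference point, uses a constant `c` larger than any point-to-reference distance so that partition i occupies its own key interval starting at i·c, and replaces the B+-tree with a sorted key list searched via `bisect`. The class name and the fixed radius increment in `knn` are hypothetical simplifications.

```python
import bisect
import math


class IDistanceSketch:
    """Toy iDistance-style index: point p in partition i gets key i * c + dist(p, O_i)."""

    def __init__(self, points, refs, c):
        self.points = points
        self.refs = refs
        self.c = c  # assumed larger than every point-to-reference distance
        self.max_radius = [0.0] * len(refs)
        entries = []
        for pid, p in enumerate(points):
            # space-based partitioning: assign each point to its closest reference point
            i = min(range(len(refs)), key=lambda j: math.dist(p, refs[j]))
            d = math.dist(p, refs[i])
            self.max_radius[i] = max(self.max_radius[i], d)
            entries.append((i * c + d, pid))
        entries.sort()  # a sorted key list stands in for the B+-tree leaf level
        self.keys = [key for key, _ in entries]
        self.ids = [pid for _, pid in entries]

    def range_query(self, q, r):
        """Return the ids of all points within distance r of q."""
        result = []
        for i, ref in enumerate(self.refs):
            dq = math.dist(q, ref)
            if dq - r > self.max_radius[i]:
                continue  # the query sphere cannot intersect partition i
            # triangle inequality: every match satisfies |dist(p, O_i) - dq| <= r
            lo = i * self.c + max(0.0, dq - r)
            hi = i * self.c + min(self.max_radius[i], dq + r)
            start = bisect.bisect_left(self.keys, lo)
            end = bisect.bisect_right(self.keys, hi)
            for pid in self.ids[start:end]:
                if math.dist(q, self.points[pid]) <= r:  # filter false positives
                    result.append(pid)
        return result

    def knn(self, q, k, step=0.05):
        """Enlarge the search radius until at least k points fall inside it."""
        r = step
        while True:
            hits = self.range_query(q, r)
            if len(hits) >= k or len(hits) == len(self.points):
                hits.sort(key=lambda pid: math.dist(q, self.points[pid]))
                return hits[:k]
            r += step
```

Once a range search at radius r returns at least k points, those are guaranteed to include the true k nearest neighbours, because every point outside the radius is farther than every point inside it.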

International Conference on Management of Data | 2008

EASE: an effective 3-in-1 keyword search method for unstructured, semi-structured and structured data

Guoliang Li; Beng Chin Ooi; Jianhua Feng; Jianyong Wang; Lizhu Zhou

Conventional keyword search engines are restricted to a given data model and cannot easily adapt to unstructured, semi-structured, or structured data. In this paper, we propose an efficient and adaptive keyword search method, called EASE, for indexing and querying large collections of heterogeneous data. To achieve high efficiency in processing keyword queries, we first model unstructured, semi-structured, and structured data as graphs, and then summarize the graphs and construct graph indices instead of using traditional inverted indices. We propose an extended inverted index to facilitate keyword-based search and present a novel ranking mechanism for enhancing search effectiveness. We have conducted an extensive experimental study using real datasets, and the results show that EASE achieves both high search efficiency and high accuracy and significantly outperforms existing approaches.

Very Large Data Bases | 2004

Query and update efficient B+-tree based indexing of moving objects

Christian S. Jensen; Dan Lin; Beng Chin Ooi

A number of emerging applications of data management technology involve the monitoring and querying of large quantities of continuous variables, e.g., the positions of mobile service users, termed moving objects. In such applications, large quantities of state samples obtained via sensors are streamed to a database. Indexes for moving objects must support queries efficiently, but must also support frequent updates. Indexes based on minimum bounding regions (MBRs), such as the R-tree, exhibit high concurrency overheads during node splitting, and each individual update is known to be quite costly. This motivates the design of a solution that enables the B+-tree to manage moving objects. We represent moving-object locations as vectors that are timestamped based on their update time. By applying a novel linearization technique to these values, it is possible to index the resulting values using a single B+-tree that partitions values according to their timestamp and otherwise preserves spatial proximity. We develop algorithms for range and k-nearest-neighbor queries, as well as continuous queries. The proposal can be grafted into existing database systems cost-effectively. An extensive experimental study explores the performance characteristics of the proposal and also shows that it is capable of substantially outperforming the R-tree-based TPR-tree for both single and concurrent access scenarios.
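The abstract does not spell out the linearization, so the following is a hypothetical sketch of one way to turn a timestamped moving-object update into a single B+-tree key. It assumes fixed-length time phases, positions projected with their velocity to the end of the update's phase, a Z-order (Morton) curve over a 2^bits grid to preserve spatial proximity, and `n_partitions` rotating time partitions stored in the high-order part of the key. All function names and parameters are illustrative assumptions, not the paper's definitions.

```python
def interleave_bits(x, y, bits):
    """Z-order (Morton) value of grid cell (x, y), with x and y in [0, 2**bits)."""
    z = 0
    for b in range(bits):
        z |= ((x >> b) & 1) << (2 * b)      # x bits go to even positions
        z |= ((y >> b) & 1) << (2 * b + 1)  # y bits go to odd positions
    return z


def linearize(pos, vel, t_update, phase_len, n_partitions, bits, extent):
    """Map a timestamped moving-object update to a single B+-tree key."""
    # the end of the phase containing the update is the shared reference time
    phase = int(t_update // phase_len)
    t_ref = (phase + 1) * phase_len
    # project the position to the reference time using the velocity vector
    x = pos[0] + vel[0] * (t_ref - t_update)
    y = pos[1] + vel[1] * (t_ref - t_update)
    # clamp to the [0, extent) data space and discretize onto a 2**bits grid
    cells = 2 ** bits
    gx = min(cells - 1, max(0, int(x / extent * cells)))
    gy = min(cells - 1, max(0, int(y / extent * cells)))
    # time partition in the high-order part, space-filling-curve value below it
    partition = phase % n_partitions
    return partition * cells * cells + interleave_bits(gx, gy, bits)
```

With this layout, all keys of one time partition form a contiguous range, and within a partition nearby grid cells tend to receive nearby keys.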

International Conference on Data Engineering | 2003

XR-tree: indexing XML data for efficient structural joins

Haifeng Jiang; Hongjun Lu; Wei Wang; Beng Chin Ooi

XML documents are typically queried with a combination of value search and structure search. While querying by values can leverage traditional database technologies, evaluating structural relationships, specifically parent-child or ancestor-descendant relationships, between XML element sets poses a great challenge for efficient XML query processing. We propose the XR-tree (XML Region Tree), a dynamic external-memory index structure specially designed for strictly nested XML data. The unique feature of the XR-tree is that, for a given element, all its ancestors (or descendants) in an element set indexed by an XR-tree can be identified with optimal worst-case I/O cost. We then propose a new structural join algorithm that evaluates the structural relationship between two XR-tree-indexed element sets by effectively skipping ancestors and descendants that do not participate in the join. Our extensive performance study shows that the XR-tree-based join algorithm significantly outperforms previous algorithms.
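The ancestor-descendant test that the XR-tree accelerates rests on region encoding: if each element is labelled with the (start, end) positions of its tags in a strictly nested document, a is an ancestor of d exactly when a's region encloses d's. The sketch below is the well-known stack-based merge join over two start-sorted element lists, not the XR-tree algorithm itself; it scans every element, whereas the XR-tree join additionally uses the index to skip elements that do not participate. Function names are illustrative.

```python
def is_ancestor(a, d):
    """Region containment: a = (start, end) strictly encloses d."""
    return a[0] < d[0] and d[1] < a[1]


def structural_join(ancestors, descendants):
    """Return all (a, d) pairs where a is an ancestor of d.

    Both inputs are lists of (start, end) regions from one strictly nested
    document, sorted by start position.
    """
    result = []
    stack = []  # chain of nested candidate ancestors, outermost at the bottom
    i = 0
    for d in descendants:
        # push every ancestor candidate that starts before d
        while i < len(ancestors) and ancestors[i][0] < d[0]:
            a = ancestors[i]
            # regions ending before a starts cannot enclose a or any later element
            while stack and stack[-1][1] < a[0]:
                stack.pop()
            stack.append(a)
            i += 1
        # drop regions that end before d starts
        while stack and stack[-1][1] < d[0]:
            stack.pop()
        # every region still on the stack encloses d
        for a in stack:
            result.append((a, d))
    return result
```

For example, with ancestors `[(1, 10), (2, 5), (8, 9)]` and descendants `[(3, 4), (6, 7)]`, the join pairs `(3, 4)` with both `(1, 10)` and `(2, 5)`, and pairs `(6, 7)` only with `(1, 10)`.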

International Conference on Management of Data | 2011

Collective spatial keyword querying

Xin Cao; Gao Cong; Christian S. Jensen; Beng Chin Ooi

With the proliferation of geo-positioning and geo-tagging, spatial web objects that possess both a geographical location and a textual description are gaining in prevalence, and spatial keyword queries that exploit both location and textual description are gaining in prominence. However, the queries studied so far generally focus on finding individual objects that each satisfy a query rather than finding groups of objects where the objects in a group collectively satisfy a query. We define the problem of retrieving a group of spatial web objects such that the group's keywords cover the query's keywords and such that the objects are nearest to the query location and have the lowest inter-object distances. Specifically, we study two variants of this problem, both of which are NP-complete. We devise exact solutions as well as approximate solutions with provable approximation bounds. We present empirical studies that offer insight into the efficiency and accuracy of the solutions.
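If one assumes the group cost is simply the sum of the objects' distances to the query location (one natural cost; the abstract does not define its two variants), the problem becomes a weighted set cover in which each object is a set of keywords weighted by its distance. The classic greedy heuristic below repeatedly picks the object with the lowest distance per newly covered keyword; for m query keywords it is guaranteed to stay within a factor H(m), the m-th harmonic number, of the optimal cost. It is a standard technique shown for illustration, not necessarily one of the paper's algorithms, and the function name is hypothetical.

```python
import math


def greedy_collective_group(query_loc, query_keywords, objects):
    """Greedy weighted set cover over spatial keyword objects.

    objects: list of (location, set_of_keywords).
    Returns (indices_of_chosen_objects, total_cost), or None if some
    query keyword appears in no object.
    """
    uncovered = set(query_keywords)
    chosen, total = [], 0.0
    while uncovered:
        best, best_ratio = None, math.inf
        for idx, (loc, kws) in enumerate(objects):
            gain = len(uncovered & kws)
            if gain == 0:
                continue  # covers nothing new (includes already chosen objects)
            ratio = math.dist(query_loc, loc) / gain  # cost per newly covered keyword
            if ratio < best_ratio:
                best, best_ratio = idx, ratio
        if best is None:
            return None
        chosen.append(best)
        total += math.dist(query_loc, objects[best][0])
        uncovered -= objects[best][1]
    return chosen, total
```

For a query at (0, 0) with keywords {"cafe", "wifi", "parking"}, a cafe with wifi at distance 1 plus a parking lot at distance 2 costs 3.0 under this assumed cost, which the greedy prefers over a single object covering all three keywords at distance 5.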

International Conference on Data Engineering | 2006

Skyline Queries Against Mobile Lightweight Devices in MANETs

Zhiyong Huang; Christian S. Jensen; Hua Lu; Beng Chin Ooi

Skyline queries are well suited to retrieving data according to multiple criteria. While most previous work has assumed a centralized setting, this paper considers skyline querying in a mobile and distributed setting, where each mobile device is capable of holding only a portion of the whole dataset, where devices communicate through mobile ad hoc networks, and where a query issued by a mobile user is interested only in the user's local area, although a query generally involves data stored on many mobile devices due to the storage limitations. We present techniques that aim to reduce the costs of communication among mobile devices and the execution time on each single mobile device. For the former, skyline query requests are forwarded among mobile devices in a deliberate way, such that the amount of data to be transferred is reduced. For the latter, specific optimization measures are proposed for resource-constrained mobile devices. We conduct extensive experiments to show that our proposal performs efficiently on real mobile devices and in simulated wireless ad hoc networks.
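The dominance test at the heart of skyline processing, and a block-nested-loops evaluation of one device's local skyline, can be sketched as follows. The sketch assumes that every criterion is to be minimized (for example price and distance); it does not model the paper's request-forwarding strategy or its device-specific optimizations, and the function names are illustrative.

```python
def dominates(p, q):
    """p dominates q if p is no worse in every criterion and strictly better in one.

    Smaller values are assumed to be better (e.g. price, distance).
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def local_skyline(points):
    """Block-nested-loops skyline using an in-memory window."""
    window = []
    for p in points:
        if any(dominates(w, p) for w in window):
            continue  # p is dominated by a kept point, discard it
        # p survives: drop window entries that p dominates, then keep p
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window
```

Because the skyline of a union equals the skyline of the union of the local skylines, one simple way to limit transfer volume is for each device to send only its local skyline, which the receiver merges with `local_skyline(local_skyline(a) + local_skyline(b))`.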

Very Large Data Bases | 2012

Efficient processing of k nearest neighbor joins using MapReduce

Wei Lu; Yanyan Shen; Su Chen; Beng Chin Ooi

The k nearest neighbor join (kNN join), designed to find the k nearest neighbors in a dataset S for every object in another dataset R, is a primitive operation widely adopted by many data mining applications. As a combination of the k nearest neighbor query and the join operation, kNN join is an expensive operation. Given the increasing volume of data, it is difficult to perform a kNN join efficiently on a centralized machine. In this paper, we investigate how to perform kNN join using MapReduce, a well-accepted framework for data-intensive applications over clusters of computers. In brief, the mappers cluster objects into groups, and the reducers perform the kNN join on each group of objects separately. We design an effective mapping mechanism that exploits pruning rules for distance filtering, and hence reduces both the shuffling and computational costs. To reduce the shuffling cost, we propose two approximate algorithms to minimize the number of replicas. Extensive experiments on our in-house cluster demonstrate that our proposed methods are efficient, robust, and scalable.
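The map/reduce split described above can be simulated in a single process as follows. This is a hedged sketch, not the paper's mapping mechanism: it assumes Euclidean distance, groups R by nearest pivot, and replicates an S object to a group only if a simple triangle-inequality bound says it might be a neighbour of some group member. Computing the bound from all of S inside the map step is a simplification that a real job would replace with a precomputed summary, the paper's replica-minimizing approximate algorithms are not shown, and all function names are hypothetical. It also assumes `len(S) >= k`.

```python
import math


def knn(r, candidates, k):
    """Brute-force k nearest neighbours of r among the candidate points."""
    return sorted(candidates, key=lambda s: math.dist(r, s))[:k]


def map_phase(R, S, pivots, k):
    """Group R by nearest pivot and give each group only the S objects that may matter."""
    groups = {j: [] for j in range(len(pivots))}
    for r in R:
        j = min(range(len(pivots)), key=lambda i: math.dist(r, pivots[i]))
        groups[j].append(r)
    partitions = {}
    for j, rs in groups.items():
        if not rs:
            continue
        p = pivots[j]
        u = max(math.dist(r, p) for r in rs)            # radius of group j
        dk = sorted(math.dist(s, p) for s in S)[k - 1]  # k-th smallest pivot-to-S distance
        theta = u + dk  # upper bound on the k-th NN distance of every r in group j
        # dist(r, s) >= dist(s, p) - u, so s is useless to the group
        # whenever dist(s, p) - u > theta
        candidates = [s for s in S if math.dist(s, p) - u <= theta]
        partitions[j] = (rs, candidates)
    return partitions


def reduce_phase(partitions, k):
    """Each reducer joins one group independently; results are keyed by the R object."""
    return {r: knn(r, ss, k) for rs, ss in partitions.values() for r in rs}
```

The bound holds because the k objects of S closest to the pivot are each within u + dk of every group member, so no member's k-th nearest neighbour can lie farther than theta.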

International Conference on Data Engineering | 2006

VBI-Tree: A Peer-to-Peer Framework for Supporting Multi-Dimensional Indexing Schemes

H. V. Jagadish; Beng Chin Ooi; Quang Hieu Vu; Rong Zhang; Aoying Zhou

Multi-dimensional data indexing has received much attention in centralized databases. However, not much work has been done on this topic in the context of Peer-to-Peer systems. In this paper, we propose a new Peer-to-Peer framework based on a balanced tree structure overlay, which can support extensible centralized mapping methods and query processing based on a variety of multi-dimensional tree structures, including the R-Tree, X-Tree, SS-Tree, and M-Tree. Specifically, in a network with N nodes, our framework guarantees that point queries and range queries can be answered within O(log N) hops. We also provide an effective load-balancing strategy that allows nodes to balance their workload efficiently. An experimental assessment validates the practicality of our proposal.

IEEE Transactions on Knowledge and Data Engineering | 2015

In-Memory Big Data Management and Processing: A Survey

Hao Zhang; Gang Chen; Beng Chin Ooi; Kian-Lee Tan; Meihui Zhang

Growing main memory capacity has fueled the development of in-memory big data management and processing. By eliminating the disk I/O bottleneck, it is now possible to support interactive data analytics. However, in-memory systems are much more sensitive to other sources of overhead that do not matter in traditional I/O-bound disk-based systems. Some issues, such as fault tolerance and consistency, are also more challenging to handle in in-memory environments. We are witnessing a revolution in the design of database systems that exploit main memory as their data storage layer. Much of this research has focused on several dimensions: modern CPU and memory hierarchy utilization, time/space efficiency, parallelism, and concurrency control. In this survey, we aim to provide a thorough review of a wide range of in-memory data management and processing proposals and systems, including both data storage systems and data processing frameworks. We also give a comprehensive presentation of important technologies in memory management and some key factors that need to be considered to achieve efficient in-memory data management and processing.

International Conference on Management of Data | 2010

Indexing multi-dimensional data in a cloud system

Jinbao Wang; Sai Wu; Hong Gao; Beng Chin Ooi

Providing scalable database services is an essential requirement for extending many existing applications of the Cloud platform. Due to the diversity of applications, database services on the Cloud must support large-scale data analytical jobs and highly concurrent OLTP queries. Most existing work focuses on a specific type of application. To provide an integrated framework, we are designing a new system, epiC, as our solution for next-generation database systems. In epiC, indexes play an important role in improving overall performance. Different types of indexes are built to provide efficient query processing for different applications. In this paper, we propose RT-CAN, a multi-dimensional indexing scheme in epiC. RT-CAN integrates a CAN [23]-based routing protocol and an R-tree-based indexing scheme to support efficient multi-dimensional query processing in a Cloud system. RT-CAN organizes storage and compute nodes into an overlay structure based on an extended CAN protocol. In our proposal, we make the simple assumption that each compute node uses an R-tree-like indexing structure to index its locally stored data. We propose a query-conscious cost model that selects beneficial local R-tree nodes for publishing. By keeping the number of persistently connected nodes small and maintaining a global multi-dimensional search index, we can locate the compute nodes that may contain the answer within a few hops, making the scheme scalable in terms of data volume and number of compute nodes. Experiments on Amazon's EC2 show that our proposed routing protocol and indexing scheme are robust, efficient, and scalable.
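The CAN-style greedy routing that RT-CAN builds on can be illustrated with a deliberately simplified sketch. It assumes a unit square split into n × n equal zones with no wrap-around and measures progress by the distance between zone centres; real CAN uses a d-dimensional torus whose zones have unequal sizes after node joins, and the RT-CAN extensions and R-tree publishing are not modelled. All function names are hypothetical.

```python
import math


def zone_of(point, n):
    """Grid cell (zone) that owns a point of the unit square split into n x n zones."""
    return (min(n - 1, int(point[0] * n)), min(n - 1, int(point[1] * n)))


def zone_center(zone, n):
    return ((zone[0] + 0.5) / n, (zone[1] + 0.5) / n)


def route(start_zone, target_point, n):
    """Greedy routing: forward to the neighbouring zone closest to the target zone."""
    target_zone = zone_of(target_point, n)
    goal = zone_center(target_zone, n)
    path = [start_zone]
    current = start_zone
    while current != target_zone:
        x, y = current
        neighbours = [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                      if 0 <= x + dx < n and 0 <= y + dy < n]
        # some neighbour is always strictly closer to the goal, so the loop terminates
        current = min(neighbours, key=lambda z: math.dist(zone_center(z, n), goal))
        path.append(current)
    return path
```

In this idealized grid every hop moves one zone closer to the target, so routing from zone (0, 0) to the zone containing (0.9, 0.9) with n = 4 takes six hops.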
