
Quang Hieu Vu

National University of Singapore


Publication

Featured research published by Quang Hieu Vu.

International Conference on Data Engineering | 2006

VBI-Tree: A Peer-to-Peer Framework for Supporting Multi-Dimensional Indexing Schemes

H. V. Jagadish; Beng Chin Ooi; Quang Hieu Vu; Rong Zhang; Aoying Zhou

Multi-dimensional data indexing has received much attention in a centralized database. However, not so much work has been done on this topic in the context of Peer-to-Peer systems. In this paper, we propose a new Peer-to-Peer framework based on a balanced tree structure overlay, which can support extensible centralized mapping methods and query processing based on a variety of multidimensional tree structures, including R-Tree, X-Tree, SSTree, and M-Tree. Specifically, in a network with N nodes, our framework guarantees that point queries and range queries can be answered within O(logN) hops. We also provide an effective load balancing strategy to allow nodes to balance their work load efficiently. An experimental assessment validates the practicality of our proposal.

Very Large Data Bases | 2009

Skyframe: a framework for skyline query processing in peer-to-peer systems

Shiyuan Wang; Quang Hieu Vu; Beng Chin Ooi; Anthony K. H. Tung; Lizhen Xu

This paper looks at the processing of skyline queries on peer-to-peer (P2P) networks. We propose Skyframe, a framework for efficient skyline query processing in P2P systems, which addresses the challenges of quick response time, low network communication cost and query load balancing among peers. Skyframe consists of two querying methods: one is optimized for network communication while the other focuses on query response time. These methods are different in the way in which the query search space is defined. In particular, the first method uses a high dominating point that has a large dominating region to prune the search space to achieve a low cost in network communication. On the other hand, the second method relaxes the search space in order to allow parallel query processing to speed up query response. Skyframe achieves query load balancing by both query load conscious data space splitting/merging during the join/departure of nodes and dynamic load migration. We further show how to apply Skyframe to both the P2P systems supporting multi-dimensional indexing and the P2P systems supporting single-dimensional indexing. Finally, we have conducted extensive experiments on both real and synthetic data sets over two existing P2P systems: CAN (Ratnasamy in A scalable content-addressable network. In: Proceedings of SIGCOMM Conference, pp. 161–172, 2001) and BATON (Jagadish et al. in A balanced tree structure for peer-to-peer networks. In: Proceedings of VLDB Conference, pp. 661–672, 2005) to evaluate the effectiveness and scalability of Skyframe.
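The pruning idea in the abstract above rests on the standard notion of dominance between points. The following is a minimal, centralized sketch of that notion only, not Skyframe's distributed algorithm. It assumes that smaller values are better in every dimension, and the function names (`dominates`, `skyline`, `prune_with_dominating_point`) are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a dominates b.

    Assumption: smaller is better. a dominates b when a is no worse in
    every dimension and strictly better in at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def prune_with_dominating_point(points: List[Point], pivot: Point) -> List[Point]:
    """Drop every point dominated by a chosen pivot (hypothetical helper).

    A pivot with a large dominating region removes many candidates
    before any further comparison is needed.
    """
    return [p for p in points if not dominates(pivot, p)]


if __name__ == "__main__":
    data = [(1, 9), (3, 3), (4, 4), (9, 1), (5, 5)]
    print(skyline(data))
    print(prune_with_dominating_point(data, (3, 3)))
```

In a P2P setting, a peer holding only points that fall inside the pivot's dominating region would not need to be contacted at all, which is the intuition behind the communication-optimized method.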

Archive | 2010

Systems and Applications

Quang Hieu Vu; Mihai Lupu; Beng Chin Ooi

Chapter 9 describes some representative P2P systems and applications that have been deployed. We look at how different application environments and requirements drive the design and architecture of the systems. We discuss popular techniques employed in each type of application. In particular, we first present representatives of P2P file sharing systems. After that, we introduce a variety of P2P systems that are used for data backup. We then analyze in detail the two main architectures used in P2P database management systems. Finally, we discuss cases where P2P systems are used to support web caching, communication and collaboration. Additionally, as handheld devices become more and more powerful and connected, we begin to see a trend of moving applications from PCs to PDAs and cell phones. As a result, we also introduce two types of applications for mobile devices: file sharing, and text messaging and voice communication.

International Conference on Management of Data | 2008

A graph method for keyword-based selection of the top-K databases

Quang Hieu Vu; Beng Chin Ooi; Dimitris Papadias; Anthony K. H. Tung

While database management systems offer a comprehensive solution to data storage, they require deep knowledge of the schema, as well as the data manipulation language, in order to perform effective retrieval. Since these requirements pose a problem to lay or occasional users, several methods incorporate keyword search (KS) into relational databases. However, most of the existing techniques focus on querying a single DBMS. On the other hand, the proliferation of distributed databases in several conventional and emerging applications necessitates the support for keyword-based data sharing and querying over multiple DBMSs. In order to avoid the high cost of searching in numerous, potentially irrelevant, databases in such systems, we propose G-KS, a novel method for selecting the top-K candidates based on their potential to contain results for a given query. G-KS summarizes each database by a keyword relationship graph, where nodes represent terms and edges describe relationships between them. Keyword relationship graphs are utilized for computing the similarity between each database and a KS query, so that, during query processing, only the most promising databases are searched. An extensive experimental evaluation demonstrates that G-KS outperforms the current state-of-the-art technique on all aspects, including precision, recall, efficiency, space overhead and flexibility of accommodating different semantics.

IEEE Transactions on Knowledge and Data Engineering | 2009

Histogram-Based Global Load Balancing in Structured Peer-to-Peer Systems

Quang Hieu Vu; Beng Chin Ooi; Martin C. Rinard; Kian-Lee Tan

Over the past few years, peer-to-peer (P2P) systems have rapidly grown in popularity and have become a dominant means for sharing resources. In these systems, load balancing is a key challenge because nodes are often heterogeneous. While several load-balancing schemes have been proposed in the literature, these solutions are typically ad hoc, heuristic based, and localized. In this paper, we present a general framework, HiGLOB, for global load balancing in structured P2P systems. Each node in HiGLOB has two key components: 1) a histogram manager that maintains a histogram reflecting a global view of the distribution of the load in the system, and 2) a load-balancing manager that redistributes the load whenever the node becomes overloaded or underloaded. We exploit the routing metadata to partition the P2P network into nonoverlapping regions corresponding to the histogram buckets. We propose mechanisms to keep the cost of constructing and maintaining the histograms low. We further show that our scheme can control and bound the amount of load imbalance across the system. Finally, we demonstrate the effectiveness of HiGLOB by instantiating it over three existing structured P2P systems: Skip Graph, BATON, and Chord. Our experimental results indicate that our approach works well in practice.
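To make the two-component idea above concrete, here is a minimal sketch of how a node might use a load histogram to decide whether it is overloaded or underloaded. It is an illustration under stated assumptions, not HiGLOB's actual rule: it assumes each bucket stores the average load of one nonoverlapping region, and the function name `classify_load` and the `threshold` parameter are hypothetical.

```python
from typing import List


def classify_load(own_load: float, bucket_loads: List[float],
                  threshold: float = 2.0) -> str:
    """Classify a node's load against a histogram-based global estimate.

    Assumptions (not taken from the paper):
      - bucket_loads[i] is the average load of region i;
      - the global average is estimated as the plain mean of the buckets;
      - a node is overloaded if its load exceeds threshold * average and
        underloaded if its load is below average / threshold.
    """
    if not bucket_loads:
        return "balanced"  # no global view yet, so take no action
    average = sum(bucket_loads) / len(bucket_loads)
    if own_load > threshold * average:
        return "overloaded"
    if own_load * threshold < average:
        return "underloaded"
    return "balanced"


if __name__ == "__main__":
    histogram = [10.0, 10.0, 10.0, 10.0]
    for load in (50.0, 12.0, 3.0):
        print(load, classify_load(load, histogram))
```

A load-balancing manager would act only on the "overloaded" and "underloaded" outcomes, for example by shifting part of its data range to a lightly loaded region identified from the same histogram.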

Archive | 2010

Security in Peer-to-Peer Networks

Quang Hieu Vu; Mihai Lupu; Beng Chin Ooi

Chapter 6 addresses security, privacy and anonymity issues. We begin this chapter with a discussion about techniques designed to secure data as well as the overall P2P environment from different types of attacks, including routing attacks, storage and retrieval attacks, and denial-of-service attacks. We then introduce solutions to guarantee integrity of data as well as computation over P2P systems. After that, we present methods that prevent users from taking advantage of the system by freeloading off the resources contributed by others. These methods are important because guaranteeing the fairness among participants and encouraging them to contribute sharing resources represent the central strength of P2P systems. Finally, we look at techniques that are designed to support anonymity and privacy, to protect both the users that disseminate the data, as well as the nodes that store the data. We also examine techniques that authenticate third-party data publication.

International Conference on Data Engineering | 2009

Adaptive Multi-join Query Processing in PDBMS

Sai Wu; Quang Hieu Vu; Kian-Lee Tan

Traditionally, distributed databases assume that the (small) set of nodes participating in a query is known a priori, the data is well placed, and the statistics are readily available. However, these assumptions are no longer valid in a Peer-based DataBase Management System (PDBMS). As such, it is a challenge to process and optimize queries in a PDBMS. In this paper, we present our distributed solution to this problem for multi-way join queries. Our approach first processes a multi-way join query based on an initial query evaluation plan (generated using statistical data that may be obsolete or inaccurate); as the query is being processed, statistics obtained on-the-fly are used to (continuously) refine the current plan dynamically into a more effective one. We have conducted an extensive performance study which shows that our adaptive query processing strategy can reduce the network traffic significantly.

Data and Knowledge Engineering | 2008

Adaptive indexing for content-based search in P2P systems

Aoying Zhou; Rong Zhang; Weining Qian; Quang Hieu Vu; Tianming Hu

One of the major challenges in Peer-to-Peer (P2P) file sharing systems is to support content-based search. Although there have been some proposals to address this challenge, they share the same weakness of using either servers or super-peers to keep global knowledge, which is required to identify importance of terms to avoid popular terms in query processing. As a result, they are not scalable and are prone to the bottleneck problem, which is caused by the high visiting load at the global knowledge maintainers. To that end, in this paper, we propose a novel adaptive indexing approach for content-based search in P2P systems, which can identify importance of terms without keeping global knowledge. Our method is based on an adaptive indexing structure that combines a Chord ring and a balanced tree. The tree is used to aggregate and classify terms adaptively, while the Chord ring is used to index terms of nodes in the tree. Specifically, at each node of the tree, the system classifies terms as either important or unimportant. Important terms, which can distinguish the node from its neighbor nodes, are indexed in the Chord ring. On the other hand, unimportant terms, which are either popular or rare terms, are aggregated to higher level nodes. Such classification enables the system to process queries on the fly without the need for global knowledge. Besides, compared to the methods that index terms separately, term aggregation reduces the indexing cost significantly. Taking advantage of the tree structure, we also develop an efficient search algorithm to tackle the bottleneck problem near the root. Finally, our extensive experiments on both benchmark and Wikipedia datasets validated the effectiveness and efficiency of the proposed method.

European Conference on Parallel Processing | 2009

SiMPSON: Efficient Similarity Search in Metric Spaces over P2P Structured Overlay Networks

Quang Hieu Vu; Mihai Lupu; Sai Wu

Similarity search in metric spaces over centralized systems has been extensively studied in the database research community. However, not so much work has been done in the context of P2P networks. This paper introduces SiMPSON: a P2P system supporting similarity search in metric spaces. The aim is to answer queries faster and using fewer resources than existing systems. For this, each peer first clusters its own data using any off-the-shelf clustering algorithm. Then, the resulting clusters are mapped to one-dimensional values. Finally, these one-dimensional values are indexed into a structured P2P overlay. Our method slightly increases the indexing overhead, but allows us to greatly reduce the number of peers and messages involved in query processing: we trade a small amount of overhead in the data publishing process for a substantial reduction of costs in the querying phase. Based on this architecture, we propose algorithms for processing range and kNN queries. Extensive experimental results validate the claims of efficiency and effectiveness of SiMPSON.

Archive | 2010

Architecture of Peer-to-Peer Systems

Quang Hieu Vu; Mihai Lupu; Beng Chin Ooi

Chapter 2 presents various architectures of P2P systems. We first introduce a taxonomy of P2P architectures, where we classify P2P systems into three main categories: centralized P2P systems, decentralized P2P systems, and hybrid P2P systems. While centralized P2P systems are systems that are supported by centralized servers, decentralized P2P systems are pure P2P systems, which are completely decentralized. On the other hand, hybrid systems are systems where nodes are organized into two layers: the upper tier super nodes act as servers for lower tier nodes. After the introduction, for each of these P2P categories, we discuss in detail its properties as well as outstanding P2P systems belonging to its category. In particular, for centralized P2P systems, we introduce Napster and SETI@home. For decentralized P2P systems, we present Gnutella, PAST, Canon, and Skip Graph. Finally, for hybrid P2P systems, we show BestPeer, a self-configurable P2P system.
