Ken Q. Pu
University of Ontario Institute of Technology
Publications
Featured research published by Ken Q. Pu.
International Conference on Data Engineering | 2005
Xiaohui Yu; Ken Q. Pu; Nick Koudas
Many location-based applications require constant monitoring of k-nearest neighbor (k-NN) queries over moving objects within a geographic area. Existing approaches to this problem have focused on predictive queries, and relied on the assumption that the trajectories of the objects are fully predictable at query processing time. We relax this assumption, and propose two efficient and scalable algorithms using grid indices. One is based on indexing objects, and the other on queries. For each approach, a cost model is developed, and a detailed analysis along with the respective applicability is presented. The object-indexing approach is further extended to multi-levels to handle skewed data. We show by experiments that our grid-based algorithms significantly outperform R-tree-based solutions. Extensive experiments are also carried out to study the properties and evaluate the performance of the proposed approaches under a variety of settings.
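The object-indexing idea can be illustrated with a minimal Python sketch. It assumes a uniform grid over 2-D points and a static snapshot of object positions; the names `build_grid` and `knn` and the ring-by-ring search are illustrative choices, not the algorithms or cost models from the paper.

```python
import math
from collections import defaultdict

def build_grid(points, cell):
    """Object index: map each grid cell (i, j) to the ids of the points in it."""
    grid = defaultdict(list)
    for pid, (x, y) in points.items():
        grid[(int(x // cell), int(y // cell))].append(pid)
    return grid

def knn(points, grid, cell, q, k):
    """Return the ids of the k points nearest to q, scanning cell rings outward."""
    if k > len(points):
        raise ValueError("k exceeds the number of objects")
    qi, qj = int(q[0] // cell), int(q[1] // cell)
    found = []  # (distance, id) pairs for every point scanned so far
    r = 0
    while True:
        # Visit only the cells whose Chebyshev index distance from q's cell is r.
        for i in range(qi - r, qi + r + 1):
            for j in range(qj - r, qj + r + 1):
                if max(abs(i - qi), abs(j - qj)) != r:
                    continue
                for pid in grid.get((i, j), ()):
                    found.append((math.dist(points[pid], q), pid))
        found.sort()
        # Every unscanned point lies farther than r * cell from q,
        # so the search can stop once the k-th candidate is within that bound.
        if len(found) >= k and found[k - 1][0] <= r * cell:
            return [pid for _, pid in found[:k]]
        r += 1

points = {"a": (1.0, 1.0), "b": (5.0, 5.0), "c": (9.0, 9.0), "d": (2.0, 1.0)}
grid = build_grid(points, 2.0)
print(knn(points, grid, 2.0, (1.2, 1.0), 2))  # ['a', 'd']
```

In a monitoring setting the index would be refreshed as objects move and the query re-evaluated periodically; that maintenance step is omitted here.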
Very Large Data Bases | 2008
Ken Q. Pu; Xiaohui Yu
Unlike traditional database queries, keyword queries do not adhere to predefined syntax and are often dirty with irrelevant words from natural languages. This makes accurate and efficient keyword query processing over databases a very challenging task. In this paper, we introduce the problem of query cleaning for keyword search queries in a database context and propose a set of effective and efficient solutions. Query cleaning involves semantic linkage and spelling corrections of database relevant query words, followed by segmentation of nearby query words such that each segment corresponds to a high quality data term. We define a quality metric of a keyword query, and propose a number of algorithms for cleaning keyword queries optimally. It is demonstrated that the basic optimal query cleaning problem can be solved using a dynamic programming algorithm. We further extend the basic algorithm to address incremental query cleaning and top-k optimal query cleaning. The incremental query cleaning is efficient and memory-bounded, hence is ideal for scenarios in which the keywords are streamed. The top-k query cleaning algorithm is guaranteed to return the best k cleaned keyword queries in ranked order. Extensive experiments are conducted on three real-life data sets, and the results confirm the effectiveness and efficiency of the proposed solutions.
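The segmentation step lends itself to a small dynamic-programming sketch. The version below assumes a hypothetical additive `score` function over candidate segments and a maximum segment length `max_len`; the paper's actual quality metric, semantic linkage, and spelling correction are not reproduced.

```python
def best_segmentation(tokens, score, max_len=3):
    """Split tokens into consecutive segments maximizing the sum of segment scores."""
    n = len(tokens)
    best = [float("-inf")] * (n + 1)  # best[i]: top score for the first i tokens
    back = [0] * (n + 1)              # back[i]: start index of the last segment
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            cand = best[start] + score(tuple(tokens[start:end]))
            if cand > best[end]:
                best[end], back[end] = cand, start
    # Walk the back-pointers to recover the chosen segments.
    segments, end = [], n
    while end > 0:
        segments.append(tuple(tokens[back[end]:end]))
        end = back[end]
    return best[n], segments[::-1]

# Hypothetical scores: segments that match a known data term score higher.
TERMS = {("new", "york"): 3.0, ("new",): 1.0, ("york",): 1.0, ("hotel",): 1.0}

def toy_score(segment):
    return TERMS.get(segment, 0.0)

print(best_segmentation(["new", "york", "hotel"], toy_score))
# (4.0, [('new', 'york'), ('hotel',)])
```

Because `best[end]` depends only on the previous `max_len` entries, the table can be filled one keyword at a time, which is the property a streamed variant would rely on.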
IEEE Transactions on Parallel and Distributed Systems | 2009
Ying Zhu; Baochun Li; Ken Q. Pu
In a peer-to-peer overlay network, the phenomenon of multiple overlay links sharing bottleneck physical links leads to correlation of overlay link capacities. We are able to more accurately model the overlay by incorporating these linear capacity constraints (LCCs). We formulate the problem of maximizing bandwidth in overlay multicast using our LCC model. We show that finding a maximum bandwidth multicast tree in an overlay network with LCC is NP-complete. Therefore, an efficient heuristic algorithm is designed to solve the problem. Extensive simulations show that our algorithm is able to construct multicast trees that are optimal or extremely close to optimal, with significantly higher bandwidth than trees formed in overlays with no LCC. Furthermore, we develop a fully distributed algorithm for obtaining near-optimal multicast trees, by means of gossip-based algorithms and a restricted but inherently distributed class of LCC (node-based LCC). We demonstrate that the distributed algorithm converges quickly to the centralized optimal and is highly scalable.
Very Large Data Bases | 2013
Oktie Hassanzadeh; Ken Q. Pu; Soheil Hassas Yeganeh; Renée J. Miller; Lucian Popa; Mauricio A. Hernández; Howard Ho
A basic step in integration is the identification of linkage points, i.e., finding attributes that are shared (or related) between data sources, and that can be used to match records or entities across sources. This is usually performed using a match operator that associates attributes of one database to another. However, the massive growth in the amount and variety of unstructured and semi-structured data on the Web has created new challenges for this task. Such data sources often do not have a fixed pre-defined schema and contain large numbers of diverse attributes. Furthermore, the end goal is not schema alignment as these schemas may be too heterogeneous (and dynamic) to meaningfully align. Rather, the goal is to align any overlapping data shared by these sources. We will show that even attributes with different meanings (that would not qualify as schema matches) can sometimes be useful in aligning data. The solution we propose in this paper replaces the basic schema-matching step with a more complex instance-based schema analysis and linkage discovery. We present a framework consisting of a library of efficient lexical analyzers and similarity functions, and a set of search algorithms for effective and efficient identification of linkage points over Web data. We experimentally evaluate the effectiveness of our proposed algorithms in real-world integration scenarios in several domains.
Symposium on Principles of Database Systems | 2003
Alberto O. Mendelzon; Ken Q. Pu
We study the problem of economical representation of subsets of structured sets, which are sets equipped with a set cover or a family of preorders. Given a structured set U, and a language L whose expressions define subsets of U, the problem of minimum description length in L (L-MDL) is: “given a subset V of U, find a shortest string in L that defines V.” Depending on the structure and the language, the MDL-problem is in general intractable. We study the complexity of the MDL-problem for various structures and show that certain specializations are tractable. The families of focus are hierarchy, linear order, and their multidimensional extensions; these are found in the context of statistical and OLAP databases. In the case of general OLAP databases, data organization is a mixture of multidimensionality, hierarchy, and ordering, which can also be viewed naturally as a cover-structured ordered set. Efficient algorithms are provided for the MDL-problem for hierarchical and linearly ordered structures, and we prove that the multidimensional extensions are NP-complete. Finally, we illustrate the application of the theory to summarization of large result sets and (multi) query optimization for ROLAP queries.
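For the hierarchical case, a simplified version of the problem can be sketched in Python. The sketch assumes a description language with union only, where an expression is a list of hierarchy nodes and each node denotes the set of leaves beneath it; the richer languages studied in the paper are not covered, and the name `shortest_cover` is hypothetical.

```python
def shortest_cover(children, root, selected):
    """Return a minimum list of hierarchy nodes whose leaves are exactly `selected`."""
    def visit(node):
        kids = children.get(node, [])
        if not kids:  # leaf: it is covered only if it was selected
            chosen = node in selected
            return chosen, [node] if chosen else []
        full, picked = True, []
        for kid in kids:
            kid_full, kid_nodes = visit(kid)
            full = full and kid_full
            picked.extend(kid_nodes)
        # If every leaf below is selected, one node replaces all picks beneath it.
        return (True, [node]) if full else (False, picked)
    return visit(root)[1]

hierarchy = {
    "All": ["Canada", "USA"],
    "Canada": ["Toronto", "Ottawa"],
    "USA": ["Boston", "Austin"],
}
print(shortest_cover(hierarchy, "All", {"Toronto", "Ottawa", "Boston"}))
# ['Canada', 'Boston']
```

A single bottom-up pass suffices here because the leaf sets of tree nodes are either nested or disjoint, so taking the highest fully selected nodes cannot be improved upon in this union-only setting.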
IEEE Transactions on Knowledge and Data Engineering | 2015
Ziqiang Yu; Yang Liu; Xiaohui Yu; Ken Q. Pu
Central to many applications involving moving objects is the task of processing k-nearest neighbor (k-NN) queries. Most of the existing approaches to this problem are designed for the centralized setting where query processing takes place on a single server; it is difficult, if not impossible, for them to scale to a distributed setting to handle the vast volume of data and concurrent queries that are increasingly common in those applications. To address this problem, we propose a suite of solutions that can support scalable distributed processing of k-NN queries. We first present a new index structure called Dynamic Strip Index (DSI), which can better adapt to different data distributions than existing grid indexes. Moreover, it can be naturally distributed across the cluster, therefore lending itself well to distributed processing. We further propose a distributed k-NN search (DKNN) algorithm based on DSI. DKNN avoids having an uncertain number of potentially expensive iterations, and is thus more efficient and more predictable than existing approaches. DSI and DKNN are implemented on Apache S4, an open-source platform for distributed stream processing. We perform extensive experiments to study the characteristics of DSI and DKNN, and compare them with three baseline methods. Experimental results show that our proposal scales well and significantly outperforms the alternative methods.
International Conference on Data Engineering | 2006
Ken Q. Pu; Vagelis Hristidis; Nick Koudas
This paper studies a problem of web service composition from a syntactic approach. In contrast with other approaches on enriched semantic description such as state-transition description of web services, our focus is on the case when only the input-output type information from the WSDL specifications is available. The web service composition problem is formally formulated as deriving a given desired type from a collection of available types and web services using a prescribed set of rules with costs. We show that solving the minimal cost composition is NP-complete in general, and present a practical solution based on dynamic programming. Experiments using a mixture of synthetic and real data sets show that our approach is viable and produces good results.
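The type-derivation view can be illustrated with a deliberately simplified Python sketch. It assumes each service is a hypothetical `(input_types, output_type, price)` triple with a non-negative price, and it uses a tree-shaped cost in which a shared intermediate type is paid for each time it is used. That relaxation keeps the computation polynomial and is not the paper's cost model or rule set.

```python
def cheapest_derivation(available, services, target):
    """Return the lowest tree-shaped cost of deriving `target`, or None if underivable."""
    cost = {t: 0.0 for t in available}  # types already at hand cost nothing
    changed = True
    while changed:  # relax until no service lowers any type's cost
        changed = False
        for inputs, output, price in services:
            if all(t in cost for t in inputs):
                total = price + sum(cost[t] for t in inputs)
                if total < cost.get(output, float("inf")):
                    cost[output] = total
                    changed = True
    return cost.get(target)

# Hypothetical services over made-up type names.
services = [
    (("Address",), "ZipCode", 1.0),
    (("ZipCode",), "Weather", 2.0),
    (("Address",), "Weather", 5.0),
]
print(cheapest_derivation({"Address"}, services, "Weather"))  # 3.0
```

Here the two-step route through `ZipCode` beats the direct service because 1.0 + 2.0 is less than 5.0.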
Conference on Information and Knowledge Management | 2010
Ken Q. Pu; Oktie Hassanzadeh; Richard Drake; Renée J. Miller
We propose a framework and algorithm for annotating unbounded text streams with entities of a structured database. The algorithm allows one to correlate unstructured and dirty text streams from sources such as emails, chats and blogs, to entities stored in structured databases. In contrast to previous work on entity extraction, our emphasis is on performing entity annotation in a completely online fashion. The algorithm continuously extracts important phrases and assigns to them top-k relevant entities. Our algorithm does so with a guarantee of constant time and space complexity for each additional word in the text stream, thus infinite text streams can be annotated. Our framework allows the online annotation algorithm to adapt to changing stream rate by self-adjusting multiple run-time parameters to reduce or improve the quality of annotation for fast or slow streams, respectively. The framework also allows the online annotation algorithm to incorporate query feedback to learn the user preference and personalize the annotation for individual users.
International Conference on Data Engineering | 2009
Ken Q. Pu; Xiaohui Yu
In this demo, we will showcase the prototype system FRISK (for Finding Relational Views Using Structured Keyword Queries) for supporting keyword queries in relational databases. Two salient features that set our prototype apart from existing systems and will be demonstrated are: (1) It supports keyword query cleaning; and (2) It offers an efficient way to search for the proper data subspace related to the keyword query.
Data Warehousing and OLAP | 2005
Ken Q. Pu
We propose a new functional framework for modeling, querying and reasoning about OLAP databases. The framework represents data (data cubes and dimensional hierarchies) and querying constructs as first-order and second-order functional symbols respectively. A polymorphic attribute-based type system is used to annotate the functional symbols with proper type information. Furthermore, semantic knowledge about the functional symbols, such as the properties of dimensional hierarchical structures and algebraic identities among query constructs, can be specified by equations, which permit equational reasoning on equivalence of OLAP queries and generalized summarizability of aggregate views.