M. Tamer Özsu
University of Waterloo
Publications
Featured research published by M. Tamer Özsu.
International Conference on Management of Data | 2003
Lukasz Golab; M. Tamer Özsu
Traditional databases store sets of relatively static records with no pre-defined notion of time, unless timestamp attributes are explicitly added. While this model adequately represents commercial catalogues or repositories of personal information, many current and emerging applications require support for on-line analysis of rapidly changing data streams. Limitations of traditional DBMSs in supporting streaming applications have been recognized, prompting research to augment existing technologies and build new systems to manage streaming data. The purpose of this paper is to review recent work in data stream management systems, with an emphasis on application requirements, data models, continuous query languages, and query evaluation.
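As a toy illustration of the continuous-query model the survey reviews, the Python sketch below re-emits a windowed aggregate as each new stream element arrives, instead of evaluating once over a stored table. The function name and count-based window policy are illustrative assumptions, not from the paper.

```python
from collections import deque

def windowed_average(stream, window_size):
    """Continuously emit the average of the last `window_size` items.

    A minimal sketch of a continuous query over a data stream: a result
    is produced on every arrival rather than once over static data.
    """
    window = deque(maxlen=window_size)  # expired tuples fall off automatically
    for value in stream:
        window.append(value)
        yield sum(window) / len(window)

# Usage: the input could be any unbounded iterator (e.g., a sensor feed).
readings = [3.0, 5.0, 4.0, 8.0, 6.0]
for avg in windowed_average(iter(readings), window_size=3):
    print(avg)
```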
International Conference on Management of Data | 2005
Lei Chen; M. Tamer Özsu; Vincent Oria
An important consideration in similarity-based retrieval of moving object trajectories is the definition of a distance function. The existing distance functions are usually sensitive to noise, shifts, and scaling of data that commonly occur due to sensor failures, errors in detection techniques, disturbance signals, and different sampling rates. Cleaning data to eliminate these imperfections is not always possible. In this paper, we introduce a novel distance function, Edit Distance on Real sequence (EDR), which is robust against these data imperfections. Analysis and comparison of EDR with other popular distance functions, such as Euclidean distance, Dynamic Time Warping (DTW), Edit distance with Real Penalty (ERP), and Longest Common Subsequences (LCSS), indicate that EDR is more robust than Euclidean distance, DTW, and ERP, and it is on average 50% more accurate than LCSS. We also develop three pruning techniques to improve the retrieval efficiency of EDR and show that these techniques can be combined effectively in a search, increasing the pruning power significantly. The experimental results confirm the superior efficiency of the combined methods.
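EDR admits a standard dynamic-programming formulation. The sketch below is a minimal 1-D version (the paper matches 2-D trajectory points component-wise within a tolerance ε); names and the example values are illustrative.

```python
def edr(r, s, eps):
    """Edit Distance on Real sequence (EDR) via dynamic programming.

    Two points match when they differ by at most `eps`; a match costs 0,
    a mismatch (substitution) costs 1, and each insertion or deletion
    costs 1. The tolerance is what makes EDR robust to noise.
    """
    m, n = len(r), len(s)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i          # delete all remaining elements of r
    for j in range(n + 1):
        dp[0][j] = j          # insert all remaining elements of s
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            subcost = 0 if abs(r[i - 1] - s[j - 1]) <= eps else 1
            dp[i][j] = min(dp[i - 1][j - 1] + subcost,  # match/replace
                           dp[i - 1][j] + 1,            # delete from r
                           dp[i][j - 1] + 1)            # insert from s
    return dp[m][n]

# Slightly perturbed sequences still match within the tolerance:
print(edr([1.0, 2.0, 3.1], [1.05, 2.0, 3.0], eps=0.2))  # -> 0
```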
ACM Computing Surveys | 1996
M. Tamer Özsu; Patrick Valduriez
The maturation of database management system (DBMS) technology has coincided with significant developments in distributed computing and parallel processing technologies. The end result is the development of distributed database management systems and parallel database management systems that are now the dominant data management tools for highly data-intensive applications. With the emergence of cloud computing, distributed and parallel database systems have started to converge. In this paper, we present an overview of the distributed DBMS and parallel DBMS technologies, highlight the unique characteristics of each, and indicate the similarities between them. We also discuss the new challenges and emerging solutions.
Very Large Data Bases | 2003
Lukasz Golab; M. Tamer Özsu
We study sliding window multi-join processing in continuous queries over data streams. We present several algorithms for performing continuous, incremental joins, under the assumption that all the sliding windows fit in main memory. The algorithms include multi-way incremental nested loop joins (NLJs) and multi-way incremental hash joins. We also propose join ordering heuristics to minimize the processing cost per unit time. We test a possible implementation of these algorithms and show that, as expected, hash joins are faster than NLJs for performing equi-joins, and that the overall processing cost is influenced by the strategies used to remove expired tuples from the sliding windows.
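One of the building blocks discussed, the incremental hash join over sliding windows, can be sketched as follows. This toy version joins only two streams over count-based windows, whereas the paper treats multi-way joins and time-based expiration; class and method names are illustrative.

```python
from collections import defaultdict, deque

class WindowedHashJoin:
    """Incremental equi-join of two streams over count-based sliding windows.

    Each arriving tuple probes the opposite window's hash table, is then
    inserted into its own, and the oldest tuple is evicted once a window
    exceeds its size.
    """
    def __init__(self, window_size):
        self.window_size = window_size
        self.tables = [defaultdict(list), defaultdict(list)]  # key -> tuples
        self.queues = [deque(), deque()]                       # arrival order

    def insert(self, side, key, tup):
        other = 1 - side
        matches = [(tup, m) for m in self.tables[other][key]]  # probe
        self.tables[side][key].append(tup)                     # build
        self.queues[side].append((key, tup))
        if len(self.queues[side]) > self.window_size:          # expire oldest
            old_key, old_tup = self.queues[side].popleft()
            self.tables[side][old_key].remove(old_tup)
        return matches

join = WindowedHashJoin(window_size=2)
join.insert(0, key="a", tup=("a", 1))
print(join.insert(1, key="a", tup=("a", 9)))   # -> [(('a', 9), ('a', 1))]
```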
Book | 2011
M. Tamer Özsu; Patrick Valduriez
This third edition of a classic textbook can be used to teach at the senior undergraduate and graduate levels. The material concentrates on fundamental theories as well as techniques and algorithms. The advent of the Internet and the World Wide Web, and, more recently, the emergence of cloud computing and streaming data applications, has forced a renewal of interest in distributed and parallel data management, while, at the same time, requiring a rethinking of some of the traditional techniques. This book covers the breadth and depth of this re-emerging field. The coverage consists of two parts. The first part discusses the fundamental principles of distributed data management and includes distribution design, data integration, distributed query processing and optimization, distributed transaction management, and replication. The second part focuses on more advanced topics and includes discussion of parallel database systems, distributed object management, peer-to-peer data management, web data management, data stream systems, and cloud computing.
New in this edition:
* New chapters covering database replication, database integration, multidatabase query processing, peer-to-peer data management, and web data management.
* Coverage of emerging topics such as data streams and cloud computing.
* Extensive revisions and updates based on years of class testing and feedback.
Ancillary teaching materials are available.
International Conference on Management of Data | 2003
David DeHaan; David Toman; Mariano P. Consens; M. Tamer Özsu
The W3C XQuery language recommendation, based on a hierarchical and ordered document model, supports a wide variety of constructs and use cases. There is a diversity of approaches and strategies for evaluating XQuery expressions, in many cases only dealing with limited subsets of the language. In this paper we describe an implementation approach that handles XQuery with arbitrarily-nested FLWR expressions, element constructors and built-in functions (including structural comparisons). Our proposal maps an XQuery expression to a single equivalent SQL query using a novel dynamic interval encoding of a collection of XML documents as relations, augmented with information tied to the query evaluation environment. The dynamic interval technique enables (suitably enhanced) relational engines to produce predictably good query plans that do not preclude the use of sort-merge join query operators. The benefits are realized despite the challenges presented by intermediate results that create arbitrary documents and the need to preserve document order as prescribed by the semantics of XQuery. Finally, our experimental results demonstrate that (native or relational) XML systems can benefit from the above technique to avoid a quadratic scale-up penalty that effectively prevents the evaluation of nested FLWR expressions for large documents.
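The interval idea underlying such encodings can be illustrated with a static sketch: each node receives (start, end) numbers from a document traversal, and ancestor/descendant tests become interval containment, a join-friendly predicate. The paper's dynamic variant, which also encodes query-time intermediate results, is omitted here; the code below is an assumed, simplified illustration.

```python
def interval_encode(tree, rows=None, counter=None):
    """Assign (start, end) traversal numbers to every node.

    `tree` is (tag, [children]); `rows` collects (tag, start, end) tuples,
    i.e., the relational image of the document.
    """
    if rows is None:
        rows, counter = [], [0]
    tag, children = tree
    counter[0] += 1
    start = counter[0]
    for child in children:
        interval_encode(child, rows, counter)
    counter[0] += 1
    rows.append((tag, start, counter[0]))
    return rows

doc = ("book", [("title", []), ("chapter", [("section", [])])])
rows = interval_encode(doc)
# ancestor/descendant reduces to interval containment:
is_desc = lambda a, d: a[1] < d[1] and d[2] < a[2]
book = next(r for r in rows if r[0] == "book")
print([r[0] for r in rows if is_desc(book, r)])  # -> descendants of book
```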
Very Large Data Bases | 2009
Lei Zou; Lei Chen; M. Tamer Özsu
The growing popularity of graph databases has generated interesting data management problems, such as subgraph search, shortest-path queries, reachability verification, and pattern matching. Among these, a pattern match query is more flexible than a subgraph search and more informative than a shortest-path or reachability query. In this paper, we address pattern match problems over a large data graph G. Specifically, given a pattern graph (i.e., a query Q), we want to find all matches (in G) that have connections similar to those in Q. In order to reduce the search space significantly, we first transform the vertices into points in a vector space via graph embedding techniques, converting a pattern match query into a distance-based multi-way join problem over the converted vector space. We also propose several pruning strategies and a join order selection method to perform join processing efficiently. Extensive experiments on both real and synthetic datasets show that our method outperforms existing ones by orders of magnitude.
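The pruning idea, embedding vertices so that vector distances never overestimate graph distances, can be illustrated with a generic landmark embedding; the paper's actual embedding technique differs, so this sketch is only indicative and all names are hypothetical.

```python
import itertools

def bfs_distances(adj, src):
    """Unweighted shortest-path distances from src via BFS."""
    dist, frontier = {src: 0}, [src]
    while frontier:
        nxt = []
        for u in frontier:
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    nxt.append(v)
        frontier = nxt
    return dist

def landmark_embedding(adj, landmarks):
    """Embed each vertex as its vector of distances to a few landmarks.

    By the triangle inequality, |d(u,l) - d(v,l)| lower-bounds d(u,v),
    so the L-infinity distance between embeddings never overestimates
    the true graph distance -- safe for pruning.
    """
    per_landmark = [bfs_distances(adj, l) for l in landmarks]
    return {v: tuple(d[v] for d in per_landmark) for v in adj}

def lower_bound(eu, ev):
    return max(abs(a - b) for a, b in zip(eu, ev))

adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
emb = landmark_embedding(adj, landmarks=[1, 4])
# keep only vertex pairs whose lower bound fits a distance budget of 1:
print([(u, v) for u, v in itertools.combinations(adj, 2)
       if lower_bound(emb[u], emb[v]) <= 1])   # -> [(1, 2), (2, 3), (3, 4)]
```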
ACM Transactions on Database Systems | 1989
Abdel Aziz Farrag; M. Tamer Özsu
When the only information available about transactions is syntactic information, serializability is the main correctness criterion for concurrency control. Serializability requires that the execution of each transaction must appear to every other transaction as a single atomic step (i.e., the execution of the transaction cannot be interrupted by other transactions). Many researchers, however, have realized that this requirement is unnecessarily strong for many applications and can significantly increase transaction response time. To overcome this problem, a new approach for controlling concurrency that exploits the semantic information available about transactions to allow controlled nonserializable interleavings has recently been proposed. This approach is useful when the cost of producing only serializable interleavings is unacceptably high. The main drawback of the approach is the extra overhead incurred by utilizing the semantic information. We examine this new approach in this paper and discuss its strengths and weaknesses. We introduce a new formalization for the concurrency control problem when semantic information is available about the transactions. This semantic information takes the form of transaction types, transaction steps, and transaction break-points. We define a new class of “safe” schedules called relatively consistent (RC) schedules. This class contains serializable as well as nonserializable schedules. We prove that the execution of an RC schedule cannot violate consistency and propose a new concurrency control mechanism that produces only RC schedules. Our mechanism assumes fewer restrictions on the interleavings among transactions than previously introduced semantic-based mechanisms.
Very Large Data Bases | 2011
Lei Zou; Jinghui Mo; Lei Chen; M. Tamer Özsu; Dongyan Zhao
Due to the increasing use of RDF data, efficient processing of SPARQL queries over RDF datasets has become an important issue. However, existing solutions suffer from two limitations: 1) they cannot answer SPARQL queries with wildcards in a scalable manner; and 2) they cannot handle frequent updates in RDF repositories efficiently, so most of them have to reprocess the dataset from scratch. In this paper, we propose a graph-based approach to store and query RDF data. Rather than mapping RDF triples into a relational database as most existing methods do, we store RDF data as a large graph. A SPARQL query is then converted into a corresponding subgraph matching query. In order to speed up query processing, we develop a novel index, together with some effective pruning rules and efficient search algorithms. Our method can answer exact SPARQL queries and queries with wildcards in a uniform manner. We also propose an effective maintenance algorithm to handle online updates over RDF repositories. Extensive experiments confirm the efficiency and effectiveness of our solution.
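A toy illustration of the "SPARQL as subgraph matching" view: the naive backtracking matcher below embeds a basic graph pattern into a set of labeled edges, with none of the paper's indexing or pruning; the data and all names are hypothetical.

```python
# RDF stored directly as labeled edges (s, p, o) -- the graph view,
# rather than triples shredded into relational tables.
triples = {("alice", "knows", "bob"), ("bob", "knows", "carol"),
           ("alice", "worksAt", "uw")}

def match_bgp(pattern, triples):
    """Match a basic graph pattern (list of triple patterns) by backtracking.

    Terms starting with '?' are variables; the pattern is treated as a
    small query graph to embed into the data graph.
    """
    def extend(binding, remaining):
        if not remaining:
            yield dict(binding)
            return
        (s, p, o), rest = remaining[0], remaining[1:]
        for ts, tp, to in triples:
            trial = dict(binding)
            ok = True
            for term, val in ((s, ts), (p, tp), (o, to)):
                if term.startswith("?"):
                    if trial.setdefault(term, val) != val:
                        ok = False   # variable already bound differently
                        break
                elif term != val:
                    ok = False       # constant does not match
                    break
            if ok:
                yield from extend(trial, rest)
    yield from extend({}, pattern)

# "Whom does someone alice knows know?"
print(list(match_bgp([("alice", "knows", "?x"), ("?x", "knows", "?y")],
                     triples)))     # -> [{'?x': 'bob', '?y': 'carol'}]
```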
Knowledge Discovery and Data Mining | 2003
Şule Gündüz; M. Tamer Özsu
Predicting the next request of a user as she visits Web pages has gained importance as Web-based activity increases. Markov models and their variations, as well as models based on sequence mining, have been found well suited for this problem. However, higher-order Markov models are extremely complicated due to their large number of states, whereas lower-order Markov models do not capture the entire behavior of a user in a session. The models based on sequential pattern mining only consider the frequent sequences in the data set, making it difficult to predict the next request following a page that is not in a sequential pattern. Furthermore, it is hard to find models that mine two different kinds of information about a user session. We propose a new model that considers both the order of pages in a session and the time spent on them. We cluster user sessions based on their pair-wise similarity and represent the resulting clusters by a click-stream tree. A new user session is then assigned to a cluster based on a similarity measure, and the click-stream tree of that cluster is used to generate the recommendation set. The model can be used as part of a cache prefetching system as well as a recommendation model.
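A click-stream tree can be pictured as a trie of page sequences with visit counts, one per session cluster. The sketch below omits the paper's time-spent weighting and similarity-based clustering, and its names and sample sessions are illustrative.

```python
from collections import defaultdict

class ClickStreamTree:
    """A toy click-stream tree: a trie of page sequences with visit counts.

    Recommendations for a partial session are the most frequent
    continuations of its path in the tree.
    """
    def __init__(self):
        self.children = defaultdict(ClickStreamTree)
        self.count = 0

    def add_session(self, pages):
        node = self
        for page in pages:
            node = node.children[page]
            node.count += 1

    def recommend(self, prefix, k=2):
        node = self
        for page in prefix:               # walk down the observed prefix
            if page not in node.children:
                return []
            node = node.children[page]
        nexts = sorted(node.children.items(), key=lambda kv: -kv[1].count)
        return [page for page, _ in nexts[:k]]

tree = ClickStreamTree()
for session in [["home", "news", "sports"], ["home", "news", "tech"],
                ["home", "news", "sports"]]:
    tree.add_session(session)
print(tree.recommend(["home", "news"]))   # -> ['sports', 'tech']
```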