Is this you? Create Your Porfile

Karl Schnaitter

University of California, Santa Cruz

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Karl Schnaitter is active.

Explore More

Publication

Featured researches published by Karl Schnaitter.

international conference on management of data | 2006

COLT: continuous on-line tuning

Karl Schnaitter; Serge Abiteboul; Tova Milo; Neoklis Polyzotis

The physical schema of a database plays a critical role in performance. Self-tuning is a cost-effective and elegant solution to optimize the physical configuration for the characteristics of the query load. Existing techniques operate in an off-line fashion, by choosing a fixed configuration that is tailored to a subset of the query load. The generated configurations therefore ignore any temporal patterns that may exist in the actual load submitted to the system.This demonstration introduces COLT (Continuous On-Line Tuning), a novel self-tuning framework that continuously monitors the incoming queries and adjusts the system configuration in order to maximize query performance. The key idea behind COLT is to gather performance statistics at different levels of detail and to carefully allocate profiling resources to the most promising candidate configurations. Moreover, COLT uses effective heuristics to regulate its own performance, lowering its overhead when the system is well-tuned, and being more aggressive when the workload shifts and it becomes necessary to re-tune the system. We present a specialization of COLT to the important problem of selecting an effective set of relational indices for the current query load. Our demonstration will use an implementation of our proposed framework in the PostgreSQL database system, showing the internal operation of COLT and the adaptive selection of indices as we vary the query load of the server.

symposium on principles of database systems | 2008

Evaluating rank joins with optimal cost

Karl Schnaitter; Neoklis Polyzotis

In the rank join problem, we are given a set of relations and a scoring function, and the goal is to return the join results with the top K scores. It is often the case in practice that the inputs may be accessed in ranked order and the scoring function is monotonic. These conditions allow for efficient algorithms that solve the rank join problem without reading all of the input. In this paper, we present a thorough analysis of such rank join algorithms. A strong point of our analysis is that it is based on a more general problem statement than previous work, making it more relevant to the execution model that is employed by database systems. One of our results indicates that the well known HRJN algorithm has shortcomings, because it does not stop reading its input as soon as possible. We find that it is NP-hard to overcome this weakness in the general case, but cases of limited query complexity are tractable. We prove the latter with an algorithm that infers provably tight bounds on the potential benefit of reading more input in order to stop as soon as possible. As a result, the algorithm achieves a cost that is within a constant factor of optimal.

international conference on data engineering | 2007

On-Line Index Selection for Shifting Workloads

Karl Schnaitter; Serge Abiteboul; Tova Milo; Neoklis Polyzotis

This paper introduces COLT (continuous on-line tuning), a novel framework that continuously monitors the workload of a database system and enriches the existing physical design with a set of effective indices. The key idea behind COLT is to gather performance statistics at different levels of detail and to carefully allocate profiling resources to the most promising candidate configurations. Moreover, COLT uses effective heuristics to self-regulate its own performance, lowering its overhead when the system is well tutted and being more aggressive when the workload shifts and it becomes necessary to re-tune the system. We describe an implementation of the proposed framework in the PostgreSQL database system and evaluate its performance experimentally. Our results validate the effectiveness of COLT and demonstrate its ability to modify the system configuration in response to changes in the query load.

very large data bases | 2007

Depth estimation for ranking query optimization

Karl Schnaitter; Joshua Spiegel; Neoklis Polyzotis

A relational ranking query uses a scoring function to limit the results of a conventional query to a small number of the most relevant answers. The increasing popularity of this query paradigm has led to the introduction of specialized rank join operators that integrate the selection of top tuples with join processing. These operators access just “enough” of the input in order to generate just “enough” output and can offer significant speed-ups for query evaluation. The number of input tuples that an operator accesses is called the input depth of the operator, and this is the driving cost factor in rank join processing. This introduces the important problem of depth estimation, which is crucial for the costing of rank join operators during query compilation and thus for their integration in optimized physical plans. We introduce an estimation methodology, termed deep, for approximating the input depths of rank join operators in a physical execution plan. At the core of deep lies a general, principled framework that formalizes depth computation in terms of the joint distribution of scores in the base tables. This framework results in a systematic estimation methodology that takes the characteristics of the data directly into account and thus enables more accurate estimates. We develop novel estimation algorithms that provide an efficient realization of the formal deep framework, and describe their integration on top of the statistics module of an existing query optimizer. We validate the performance of deep with an extensive experimental study on data sets of varying characteristics. The results verify the effectiveness of deep as an estimation method and demonstrate its advantages over previously proposed techniques.

very large data bases | 2009

Index interactions in physical design tuning: modeling, analysis, and applications

Karl Schnaitter; Neoklis Polyzotis; Lise Getoor

One of the key tasks of a database administrator is to optimize the set of materialized indices with respect to the current workload. To aid administrators in this challenging task, commercial DBMSs provide advisors that recommend a set of indices based on a sample workload. It is left for the administrator to decide which of the recommended indices to materialize and when. This decision requires some knowledge of how the indices benefit the workload, which may be difficult to understand if there are any dependencies or interactions among indices. Unfortunately, advisors do not provide this crucial information as part of the recommendation. Motivated by this shortcoming, we propose a framework and associated tools that can help an administrator understand the interactions within the recommended set of indices. We formalize the notion of index interactions and develop a novel algorithm to identify the interaction relationships that exist within a set of indices. We present experimental results with a prototype implementation over IBM DB2 that demonstrate the efficiency of our approach. We also describe two new database tuning tools that utilize information about index interactions. The first tool visualizes interactions based on a partitioning of the index-set into non-interacting subsets, and the second tool computes a schedule that materializes the indices over several maintenance windows with maximal overall benefit. In both cases, we provide strong analytical results showing that index interactions can enable enhanced functionality.

ACM Transactions on Database Systems | 2010

Optimal algorithms for evaluating rank joins in database systems

Karl Schnaitter; Neoklis Polyzotis

In the rank join problem, we are given a set of relations and a scoring function, and the goal is to return the join results with the top k scores. It is often the case in practice that the inputs may be accessed in ranked order and the scoring function is monotonic. These conditions allow for efficient algorithms that solve the rank join problem without reading all of the input. In this article, we present a thorough analysis of such rank join algorithms. A strong point of our analysis is that it is based on a more general problem statement than previous work, making it more relevant to the execution model that is employed by database systems. One of our results indicates that the well-known HRJN algorithm has shortcomings, because it does not stop reading its input as soon as possible. We find that it is NP-hard to overcome this weakness in the general case, but cases of limited query complexity are tractable. We prove the latter with an algorithm that infers provably tight bounds on the potential benefit of reading more input in order to stop as soon as possible. As a result, the algorithm achieves a cost that is within a constant factor of optimal.

international conference on data engineering | 2009

A Benchmark for Online Index Selection

Karl Schnaitter; Neoklis Polyzotis

Online approaches to physical design tuning have received considerable attention in the recent literature, with a focus on the problem of online index selection. However, it is difficult to draw conclusions on the relative merits of the proposed techniques, as they have been evaluated in isolation using different methodologies. In this paper, we make two concrete contributions to address this issue. First, we propose a benchmark for evaluating the performance of an online tuning algorithm in a principled fashion. Second, using the benchmark, we present a comparison of two representative online tuning algorithms that are implemented in the same database system. The results provide interesting insights on the behavior of these algorithms and validate the usefulness of the proposed benchmark.

international conference on management of data | 2010

An automated, yet interactive and portable DB designer

Debabrata Dash; Karl Schnaitter; Anastasia Ailamaki; Neoklis Polyzotis

Tuning tools attempt to configure a database to achieve optimal performance for a given workload. Selecting an optimal set of physical structures is computationally hard since it involves searching a vast space of possible configurations. Commercial DBMSs offer tools that can address this problem. The usefulness of such tools, however, is limited by their dependence on greedy heuristics, the need for a-priori (offline) knowledge of the workload, and lack of an optimal materialization schedule to get the best out of suggested design features. Moreover, the open source DBMSs do not provide any automated tuning tools. This demonstration introduces a comprehensive physical designer for the PostgreSQL open source DBMS. The tool suggests design features for both offline and online workloads. It provides close to optimal suggestions for indexes for a given workload by modeling the problem as a combinatorial optimization problem and solving it by sophisticated and mature solvers. It also determines the interaction between indexes to suggest an effective materialization strategy for the selected indexes. The tool is interactive as it allows the database administrator (DBA) to suggest a set of candidate features and shows their benefits and interactions visually. For the demonstration we use large real-world scientific datasets and query workloads.

symposium on principles of database systems | 2010

Computing query probability with incidence algebras

Nilesh N. Dalvi; Karl Schnaitter; Dan Suciu

international conference on management of data | 2008

The repeatability experiment of SIGMOD 2008

Ioana Manolescu; Loredana Afanasiev; Andrei Arion; Jens Dittrich; Stefan Manegold; Neoklis Polyzotis; Karl Schnaitter; Pierre Senellart; Spyros Zoupanos; Dennis E. Shasha

Explore More