Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Florian Waas is active.

Publication


Featured research published by Florian Waas.


International Workshop on Testing Database Systems | 2011

The mixed workload CH-benCHmark

Richard L. Cole; Florian Funke; Leo Giakoumakis; Wey Guy; Alfons Kemper; Stefan Krompass; Harumi A. Kuno; Raghunath Nambiar; Thomas Neumann; Meikel Poess; Kai-Uwe Sattler; Michael Seibold; Eric Simon; Florian Waas

While standardized and widely used benchmarks address either operational or real-time Business Intelligence (BI) workloads, the lack of a hybrid benchmark led us to define a new, complex, mixed workload benchmark, called the mixed workload CH-benCHmark. This benchmark bridges the gap between the established single-workload suites of TPC-C for OLTP and TPC-H for OLAP, and executes a complex mixed workload: a transactional workload based on the order entry processing of TPC-C and a corresponding TPC-H-equivalent OLAP query suite run in parallel on the same tables in a single database system. As it is derived from the two most widely used TPC benchmarks, the CH-benCHmark produces results highly relevant to both hybrid and classic single-workload systems.
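
The gist of the benchmark is running both streams concurrently against the same tables. Below is a minimal sketch of such a driver, assuming a toy schema and placeholder queries rather than the actual CH-benCHmark definitions:

```python
# Sketch of a mixed OLTP/OLAP driver in the spirit of CH-benCHmark.
# Schema and queries are illustrative placeholders, not the benchmark spec.
import sqlite3, threading, time, random

DB = "chbench_sketch.db"

def setup():
    con = sqlite3.connect(DB)
    con.execute("CREATE TABLE IF NOT EXISTS orders(id INTEGER PRIMARY KEY, amount REAL)")
    con.commit()
    con.close()

def oltp_worker(stop):
    # Transactional stream: short order-entry writes (stand-in for TPC-C New-Order).
    con = sqlite3.connect(DB, timeout=30)
    while not stop.is_set():
        con.execute("INSERT INTO orders(amount) VALUES (?)", (random.uniform(1, 100),))
        con.commit()
    con.close()

def olap_worker(stop):
    # Analytical stream: aggregate scans over the *same* table (stand-in for TPC-H-style queries).
    con = sqlite3.connect(DB, timeout=30)
    while not stop.is_set():
        con.execute("SELECT COUNT(*), AVG(amount) FROM orders").fetchone()
        time.sleep(0.1)
    con.close()

if __name__ == "__main__":
    setup()
    stop = threading.Event()
    threads = [threading.Thread(target=f, args=(stop,)) for f in (oltp_worker, olap_worker)]
    for t in threads:
        t.start()
    time.sleep(2)   # run the mixed workload briefly
    stop.set()
    for t in threads:
        t.join()
```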


International Journal of Data Warehousing and Mining | 2013

On-Demand ELT Architecture for Right-Time BI: Extending the Vision

Florian Waas; Robert Wrembel; Tobias Freudenreich; Maik Thiele; Christian Koncilia; Pedro Furtado

In a typical BI infrastructure, data extracted from operational data sources is transformed, cleansed, and loaded into a data warehouse by a periodic ETL process, typically executed on a nightly basis, i.e., a full day's worth of data is processed and loaded during off-hours. However, it is desirable to have fresher data for business insights in near real-time. To this end, the authors propose to leverage a data warehouse's capability to directly import raw, unprocessed records and defer the transformation and data cleansing until needed by pending reports. At that time, the database's own processing mechanisms can be deployed to process the data on demand. Event-processing capabilities are seamlessly woven into the proposed architecture. Besides outlining an overall architecture, the authors also develop a roadmap for implementing a complete prototype using conventional database technology in the form of hierarchical materialized views.
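
A minimal sketch of the deferred-transformation idea, using SQLite and a plain view as a stand-in for the hierarchical materialized views the paper envisions (table names and cleansing rules are made up for illustration):

```python
# Load raw records verbatim; cleansing and typing happen only when a report
# reads the view, i.e., transformation is deferred to query time.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE raw_sales(ts TEXT, amount TEXT)")   # raw, untyped landing table
con.executemany("INSERT INTO raw_sales VALUES (?, ?)",
                [("2013-01-01", " 10.5 "), ("2013-01-02", "n/a"), ("2013-01-02", "7")])

# The view applies the cleansing rules on demand.
con.execute("""
CREATE VIEW sales AS
SELECT date(ts) AS day, CAST(TRIM(amount) AS REAL) AS amount
FROM raw_sales
WHERE TRIM(amount) GLOB '[0-9]*'   -- drop records that fail a numeric sanity check
""")

print(con.execute("SELECT day, SUM(amount) FROM sales GROUP BY day").fetchall())
# [('2013-01-01', 10.5), ('2013-01-02', 7.0)] -- the bad record never reaches reports
```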


International Conference on Data Engineering | 2011

Dynamic prioritization of database queries

Sivaramakrishnan Narayanan; Florian Waas

Enterprise database systems handle a variety of diverse query workloads that are of different importance to the business. For example, periodic reporting queries are usually mission-critical, whereas ad-hoc queries by analysts tend to be less crucial. It is desirable to enable database administrators to express (and modify) the importance of queries at a simple and intuitive level. The mechanism used to enforce these priorities must be robust, adaptive, and efficient.
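
One common way to enforce such priorities is proportional resource sharing. The following sketch uses lottery scheduling as an illustrative stand-in, not the mechanism the paper actually proposes; the query classes and weights are hypothetical:

```python
# Each query class receives CPU slices in proportion to its administrator-assigned
# weight; weights can be modified while queries are running.
import random

weights = {"reporting": 8, "adhoc": 1}   # administrator-assigned importance

def pick_next(runnable):
    # Lottery scheduling: probability of being scheduled is proportional to weight.
    total = sum(weights[q] for q in runnable)
    r = random.uniform(0, total)
    for q in runnable:
        r -= weights[q]
        if r <= 0:
            return q
    return runnable[-1]

slices = {"reporting": 0, "adhoc": 0}
for _ in range(10_000):
    slices[pick_next(["reporting", "adhoc"])] += 1
print(slices)   # roughly 8:1 in favor of the mission-critical reporting class
```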


International Workshop on Testing Database Systems | 2012

Testing the accuracy of query optimizers

Zhongxian Gu; Mohamed Soliman; Florian Waas

The accuracy of a query optimizer is intricately connected with a database system's performance and its operational cost: the more accurate the optimizer's cost model, the better the resulting execution plans. Database application programmers and other practitioners have long provided anecdotal evidence that database systems differ widely with respect to the quality of their optimizers, yet, to date, no formal method is available to database users to assess or refute such claims.

In this paper, we develop a framework to quantify an optimizer's accuracy for a given workload. We make use of the fact that optimizers expose switches or hints that let users influence the plan choice and generate plans other than the default plan. Using these mechanisms, we force the generation of multiple alternative plans for each test case, time the execution of all alternatives, and rank the plans by their effective costs. We compare this ranking with the ranking of the estimated costs and compute a score for the accuracy of the optimizer.

We present initial results of an anonymized comparison of several major commercial database systems, demonstrating that there are in fact substantial differences between systems. We also suggest ways to incorporate this knowledge into the commercial development process.
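
The scoring step could look like the following sketch, which compares hypothetical estimated costs against measured runtimes for a set of forced plans and computes Kendall's tau as the accuracy score (the paper's exact metric may differ):

```python
# Rank-correlate estimated plan costs with measured runtimes for one test query.
from itertools import combinations

def kendall_tau(est, actual):
    # Count concordant vs. discordant pairs: does the cost model order plans
    # the same way their real runtimes do?
    pairs = list(combinations(range(len(est)), 2))
    concordant = sum(1 for i, j in pairs
                     if (est[i] - est[j]) * (actual[i] - actual[j]) > 0)
    discordant = sum(1 for i, j in pairs
                     if (est[i] - est[j]) * (actual[i] - actual[j]) < 0)
    return (concordant - discordant) / len(pairs)

# Hypothetical numbers: estimated cost vs. measured seconds for five forced plans.
estimated = [100, 220, 180, 400, 90]
measured  = [1.2, 2.5, 3.1, 4.0, 1.0]
print(kendall_tau(estimated, measured))  # prints 0.8; 1.0 would be a perfect ranking
```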


International Workshop on Testing Database Systems | 2011

Plan space analysis: an early warning system to detect plan regressions in cost-based optimizers

Florian Waas; Leo Giakoumakis; Shin Zhang

Plan regressions pose a significant problem in commercial database systems: seemingly innocuous changes to a query optimizer component such as the cost model or the search strategy, intended to enhance optimization results, may lead to unexpected and detrimental changes to previously satisfactory query plans.

Database vendors spend substantial resources on quality assurance to guard against this very issue, yet testing for plan regressions in optimizers has proven hard and inconclusive. This is due to the nature of the problem: the optimizer chooses a single plan, the Best Plan Found (BPF), from a search space of literally up to hundreds of millions of different plan alternatives. It is standard practice to use a known good BPF and test for changes to this plan, i.e., ensure that no changes have occurred. However, in the vast majority of cases the BPF is not affected by a code-level change, even though the change is known to affect many plans in the search space.

In this paper, we propose a holistic approach to address this issue. Instead of focusing on test suites consisting of BPFs, we take the entire search space into account. We introduce a metric to assess the optimizer's accuracy across the entire search space.

We present preliminary results using a commercial database system, demonstrate the usefulness of our methodology with a standard benchmark, and illustrate how to build such an early warning system.
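
The following toy sketch illustrates the core observation on a hypothetical four-plan search space: the classic BPF-only test passes even though most of the plan space has shifted, which is exactly the signal the early warning system is after:

```python
# Compare the cost of *every* plan before and after a code change, not just the BPF.
# The plan names and costs below are made up for illustration.
before = {"p1": 10.0, "p2": 12.0, "p3": 30.0, "p4": 31.0}
after  = {"p1": 10.0, "p2": 19.0, "p3": 22.0, "p4": 45.0}

bpf_before = min(before, key=before.get)
bpf_after  = min(after,  key=after.get)
changed = sum(1 for p in before if abs(before[p] - after[p]) > 1e-9)

print("BPF unchanged:", bpf_before == bpf_after)          # True: the classic test passes
print("plans with changed cost:", changed / len(before))  # 0.75: the warning fires anyway
```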


International Conference on Data Engineering | 2012

Automatic Data Placement in MPP Databases

Carlos Garcia-Alvarado; Venkatesh Raghavan; Sivaramakrishnan Narayanan; Florian Waas

Physical design for shared-nothing databases includes decisions regarding the placement of data across a cluster of database servers. In particular, a distribution policy must be specified for each table in the database. In general, the choice of distribution policy affects the performance of query workloads significantly, as individual queries may have to redistribute data on the fly as part of their execution. As is the case with a number of other physical design decisions, the problem is hard and poses substantial difficulties for database administrators. In this paper, we present FINDER, a design tool that optimizes data placement decisions for a database schema with respect to any given query workload. We designed FINDER with portability in mind: the tool is fully external to the target database system, i.e., it does not require any code-level integration with the system and avoids reverse-engineering of query optimization techniques. Our experiments show that FINDER converges quickly and delivers superior results compared to state-of-the-art solutions.
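
As an illustration of the workload-driven placement problem (not FINDER's actual algorithm), the sketch below greedily picks, per table, the distribution column that co-locates the most frequent equi-joins in a hypothetical workload:

```python
# Greedy distribution-key selection: distributing both sides of an equi-join on the
# join columns lets the join run without on-the-fly redistribution.
from collections import Counter

# Hypothetical workload: (table_a, col_a, table_b, col_b, frequency) equi-join edges.
joins = [
    ("orders", "cust_id", "customers", "id", 50),
    ("orders", "item_id", "items", "id", 10),
    ("lineitem", "order_id", "orders", "id", 40),
]

score = Counter()
for ta, ca, tb, cb, freq in joins:
    score[(ta, ca)] += freq   # distributing ta on ca co-locates this join...
    score[(tb, cb)] += freq   # ...provided tb is distributed on cb as well

policy = {}
for (table, col), _ in score.most_common():
    policy.setdefault(table, col)   # highest-scoring column per table wins

print(policy)
# {'orders': 'cust_id', 'customers': 'id', 'lineitem': 'order_id', 'items': 'id'}
```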


International Workshop on Testing Database Systems | 2012

Automatic capture of minimal, portable, and executable bug repros using AMPERe

Lyublena Antova; Konstantinos Krikellas; Florian Waas

Query optimizers are among the most complex software components in any database system and are naturally prone to contain software defects. Despite significant efforts in quality assurance, customers occasionally encounter unexpected errors in production systems. A self-contained repro of the problem is often the best approach toward a speedy resolution of the issue. However, repros are notoriously difficult to obtain, as they require the schema definition, the offending query, and potentially many other pieces of data that are difficult to capture in a consistent and accurate way. As a result, query optimizer issues have a reputation of being hard to tackle, requiring dedicated resources and exceedingly long turnaround times to provide a solution to the customer.

In this paper we present AMPERe, a mechanism to automatically secure fully self-contained bug repros, as implemented in the optimizer of Greenplum Database. Raising an internal error or run-time assertion automatically triggers the generation of an AMPERe dump. Similar in nature to the error reports of operating systems, AMPERe goes beyond such tools as it delivers a complete minimal repro that allows replaying the problem instantly, in isolation, on any lab machine. We present the overall architecture of this framework and report on initial experiences with AMPERe as part of Greenplum's regular software development practices.
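
A minimal sketch of the capture idea, with a placeholder optimize() and made-up dump fields, assuming the error handler can reach the query text, schema DDL, and configuration; the real AMPERe format and trigger path are internal to Greenplum:

```python
# On an internal error, serialize everything needed to replay the failure in
# isolation on any machine: query, schema, and settings.
import json

def capture_repro(query, schema_ddl, config, exc):
    dump = {
        "query": query,          # the offending statement
        "schema": schema_ddl,    # DDL to recreate the objects
        "config": config,        # optimizer settings in effect
        "error": repr(exc),      # what went wrong
    }
    with open("repro_dump.json", "w") as f:
        json.dump(dump, f, indent=2)

def optimize(query):
    # Placeholder optimizer that hits an internal assertion.
    raise AssertionError("cardinality estimate underflow")

query = "SELECT * FROM t WHERE a = 1"
try:
    optimize(query)
except Exception as exc:
    capture_repro(query, ["CREATE TABLE t(a INT)"], {"enable_nestloop": "on"}, exc)
```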


Business Intelligence for the Real-Time Enterprise | 2012

An On-Demand ELT Architecture for Real-Time BI

Tobias Freudenreich; Pedro Furtado; Christian Koncilia; Maik Thiele; Florian Waas; Robert Wrembel

Online or real-time BI has remained elusive despite significant efforts by academic and industrial research. Some of the most prominent problems in accomplishing faster turnaround are related to the data ingest. The process of extracting data from source systems, transforming, and loading (ETL) it is often bottlenecked by architectural choices and fragmentation of the processing chain.


International Conference on Data Engineering | 2013

Total operator state recall — Cost-effective reuse of results in Greenplum Database

George Constantin Caragea; Carlos Garcia-Alvarado; Michalis Petropoulos; Florian Waas

Recurring queries or partial queries occur very frequently in production workloads. Reusing results or intermediates presents a highly intuitive opportunity for performance enhancements and has been explored to various degrees. Strategies suggested so far in the literature depend largely on speculative materialization of results in the hope that they can be reused later on. However, materialization is costly, and conventional strategies run the risk that the initial investment cannot be amortized reliably unless the exact composition of the workload is known to the strategy a priori.
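
A back-of-the-envelope cost model (not the paper's mechanism) shows why speculation is risky: materialization pays off only if the expected reuse count, which speculative strategies must guess a priori, is high enough. All cost figures below are hypothetical:

```python
# Break-even analysis for materializing a result vs. recomputing it every time.
def worth_materializing(compute_cost, write_cost, read_cost, expected_reuses):
    baseline = compute_cost * (1 + expected_reuses)                 # recompute on every use
    with_cache = compute_cost + write_cost + read_cost * expected_reuses
    return with_cache < baseline

# Hypothetical costs in seconds; only the reuse count changes the verdict.
for reuses in (0, 1, 5):
    print(reuses, worth_materializing(compute_cost=10, write_cost=8,
                                      read_cost=1, expected_reuses=reuses))
# 0 False  -- the materialization investment is pure loss
# 1 True   -- a single reuse already amortizes it here
# 5 True
```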


International Conference on Data Engineering | 2010

Database architecture (R)evolution: New hardware vs. new software

Stavros Harizopoulos; Tassos Argyros; Peter A. Boncz; Dan Dietterich; Samuel Madden; Florian Waas

The last few years have been exciting for data management system designers. The explosion in user and enterprise data, coupled with the availability of newer, cheaper, and more capable hardware, has led system designers and researchers to rethink and, in some cases, reinvent the traditional DBMS architecture. In the space of data warehousing and analytics alone, more than a dozen new database product offerings have recently appeared, and dozens of research system papers are routinely published each year. Among these efforts, one school of thought promotes research on exploiting and anticipating new hardware (many-core CPUs [4, 7, 8], GPUs [3], FPGAs [5, 11], flash SSDs [6], other non-volatile storage technologies). Another school of thought focuses on software and algorithmic issues (column and hybrid stores [1, 10, 13], scale-out architectures using commodity hardware [2, 9, 10, 13], optimizations in the network and OS software stack [9]). And, at the same time, there are approaches that combine hardware-specific optimizations with from-scratch database software design [12].

Collaboration


Dive into Florian Waas's collaborations.

Top Co-Authors

Robert Wrembel

Poznań University of Technology
