Norman May | Researchain

Publication

Featured research published by Norman May.

international conference on data engineering | 2004

Nested queries and quantifiers in an ordered context

Norman May; Sven Helmer; Guido Moerkotte

We present algebraic equivalences that allow nested algebraic expressions to be unnested for order-preserving algebraic operators. We illustrate how these equivalences can be applied successfully to unnest nested queries given in the XQuery language. Measurements illustrate the performance gains possible by unnesting.

international conference on management of data | 2013

Timeline index: a unified data structure for processing queries on temporal data in SAP HANA

Martin Kaufmann; Amin Amiri Manjili; Panagiotis Vagenas; Peter Fischer; Donald Kossmann; Franz Färber; Norman May

Managing temporal data is becoming increasingly important for many applications. Several database systems already support the time dimension, but provide only few temporal operators, which also often exhibit poor performance characteristics. On the academic side, a large number of algorithms and data structures have been proposed, but they often address a subset of these temporal operators only. In this paper, we develop the Timeline Index as a novel, unified data structure that efficiently supports temporal operators such as temporal aggregation, time travel, and temporal joins. As the Timeline Index is independent of the physical order of the data, it provides flexibility in physical design; e.g., it supports any kind of compression scheme, which is crucial for main memory column stores. Our experiments show that the Timeline Index has predictable performance and beats state-of-the-art approaches significantly, sometimes by orders of magnitude.
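To make the operators named in the abstract concrete, here is a minimal sketch of temporal aggregation and time travel over a time-ordered event list. The abstract does not spell out the layout of the Timeline Index, so the event-list representation, the row format `(row_id, valid_from, valid_to)`, and the function names `build_event_list`, `temporal_count`, and `time_travel` are all assumptions made for illustration, not the authors' implementation.

```python
from itertools import groupby
from operator import itemgetter


def build_event_list(rows):
    """Turn (row_id, valid_from, valid_to) rows into time-ordered events.

    valid_to is exclusive; None means the row version is still valid.
    Each event is (time, row_id, delta) with delta +1 (activation)
    or -1 (invalidation). Hypothetical layout, for illustration only.
    """
    events = []
    for row_id, start, end in rows:
        events.append((start, row_id, +1))
        if end is not None:
            events.append((end, row_id, -1))
    # Stable sort by time only, so the events are independent of the
    # physical order of the underlying rows.
    events.sort(key=itemgetter(0))
    return events


def temporal_count(events):
    """Temporal aggregation: number of valid rows after each point in time."""
    result = []
    active = 0
    for time, group in groupby(events, key=itemgetter(0)):
        active += sum(delta for _, _, delta in group)
        result.append((time, active))
    return result


def time_travel(events, as_of):
    """Return the set of row ids valid at time `as_of`."""
    active = set()
    for time, row_id, delta in events:
        if time > as_of:
            break
        if delta > 0:
            active.add(row_id)
        else:
            active.discard(row_id)
    return active


rows = [(1, 1, 5), (2, 3, None), (3, 3, 4)]
events = build_event_list(rows)
counts = temporal_count(events)
visible = time_travel(events, 4)
```

Because both functions only scan the sorted events, the underlying rows can be stored in any physical order, which is the property the abstract highlights.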

ACM Transactions on Database Systems | 2006

Strategies for query unnesting in XML databases

Norman May; Sven Helmer; Guido Moerkotte

Queries formulated in a nested way are very common in XQuery. Unfortunately, their evaluation is usually very inefficient when done in a straightforward fashion. We present a framework for handling nested queries that is based on unnesting the queries after having translated them into an algebra. We not only present a collection of algebraic equivalences, but also supply a strategy on how to use them effectively. The full potential of the approach is demonstrated by applying our rewrites to actual queries and showing that performance gains of several orders of magnitude are possible.

international xml database symposium | 2006

Index vs. navigation in XPath evaluation

Norman May; Matthias Brantner; Alexander Böhm; Carl-Christian Kanne; Guido Moerkotte

A well-known rule of thumb claims that it is better to scan than to use an index when more than 10% of the data are accessed. This rule was formulated for relational databases. But is it still valid for XML queries? In this paper we develop similar rules of thumb for XML queries by experimentally comparing different execution strategies, e.g. using navigation or indices. These rules can be used immediately for heuristic optimization of XML queries, and in the long run, they may serve as a foundation for cost-based query optimization in XQuery.

very large data bases | 2015

Scaling up concurrent main-memory column-store scans: towards adaptive NUMA-aware data and task placement

Iraklis Psaroudakis; Tobias Scheuer; Norman May; Abdelkader Sellami; Anastasia Ailamaki

Main-memory column-stores are called to efficiently use modern non-uniform memory access (NUMA) architectures to service concurrent clients on big data. The efficient usage of NUMA architectures depends on the data placement and scheduling strategy of the column-store. Most column-stores choose a static strategy that involves partitioning all data across the NUMA architecture, and employing a stealing-based task scheduler. In this paper, we implement different strategies for data placement and task scheduling for the case of concurrent scans. We compare these strategies with an extensive sensitivity analysis. Our most significant findings include that unnecessary partitioning can hurt throughput by up to 70%, and that stealing memory-intensive tasks can hurt throughput by up to 58%. Based on our analysis, we envision a design that adapts the data placement and task scheduling strategy to the workload.

international conference on data engineering | 2007

Unnesting Scalar SQL Queries in the Presence of Disjunction

Matthias Brantner; Norman May; Guido Moerkotte

Optimizing nested queries is an intricate problem. It becomes even harder if in a nested query the linking predicate or the correlation predicate occurs disjunctively. We present the first unnesting strategy that can effectively deal with such queries. The starting point of our approach is to translate SQL into the relational algebra extended by bypass operators. Then we present for the first time unnesting equivalences which are valid for algebraic expressions containing bypass operators. Applying these to the translated queries results in our effective unnesting strategy for nested SQL queries with disjunction. With an extensive experimental study (including three commercial DBMSs), we demonstrate the possible performance gains of our approach.

international xml database symposium | 2003

Three Cases for Query Decorrelation in XQuery

Norman May; Sven Helmer; Guido Moerkotte

tpc technology conference | 2014

Scaling Up Mixed Workloads: A Battle of Data Freshness, Flexibility, and Scheduling

Iraklis Psaroudakis; Florian Wolf; Norman May; Thomas Neumann; Alexander Böhm; Anastasia Ailamaki; Kai-Uwe Sattler

The common “one size does not fit all” paradigm isolates transactional and analytical workloads into separate, specialized database systems. Operational data is periodically replicated to a data warehouse for analytics. Competitiveness of enterprises today, however, depends on real-time reporting on operational data, necessitating an integration of transactional and analytical processing in a single database system. The mixed workload should be able to query and modify common data in a shared schema. The database needs to provide performance guarantees for transactional workloads, and, at the same time, efficiently evaluate complex analytical queries. In this paper, we share our analysis of the performance of two main-memory databases that support mixed workloads, SAP HANA and HyPer, while evaluating the mixed workload CH-benCHmark. By examining their similarities and differences, we identify the factors that affect performance while scaling the number of concurrent transactional and analytical clients. The three main factors are (a) data freshness, i.e., how recent is the data processed by analytical queries, (b) flexibility, i.e., restricting transactional features in order to increase optimization choices and enhance performance, and (c) scheduling, i.e., how the mixed workload utilizes resources. Specifically for scheduling, we show that the absence of workload management under cases of high concurrency leads to analytical workloads overwhelming the system and severely hurting the performance of transactional workloads.

international conference on management of data | 2014

Exploiting ordered dictionaries to efficiently construct histograms with q-error guarantees in SAP HANA

Guido Moerkotte; David DeHaan; Norman May; Anisoara Nica; Alexander Boehm

Histograms that guarantee a maximum multiplicative error (q-error) for estimates may significantly improve the plan quality of query optimizers. However, the construction time for histograms with maximum q-error was too high for practical use cases. In this paper we extend this concept with a threshold, i.e., an estimate or true cardinality θ, below which we do not care about the q-error because we still expect optimal plans. This allows us to develop far more efficient construction algorithms for histograms with bounded error. The test for θ,q-acceptability that we developed also exploits the order-preserving dictionary encoding of SAP HANA. We have integrated this family of histograms into SAP HANA, and we report on the construction time, histogram size, and estimation errors on real-world data sets. In virtually all cases the histograms can be constructed in far less than one second, requiring less than 5% of space compared to the original compressed data.
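The error measure in this abstract can be illustrated with a short sketch. It assumes the usual reading of q-error as the larger of estimate/truth and truth/estimate, and it assumes that an estimate counts as acceptable when both the estimate and the true cardinality are at most θ, or when the q-error is at most q. The function names `q_error` and `is_theta_q_acceptable` are hypothetical, and this is not the construction algorithm from the paper.

```python
def q_error(estimate: float, truth: float) -> float:
    """Multiplicative error between an estimate and the true cardinality."""
    if estimate <= 0 or truth <= 0:
        raise ValueError("this sketch defines q-error for positive values only")
    return max(estimate / truth, truth / estimate)


def is_theta_q_acceptable(estimate: float, truth: float,
                          theta: float, q: float) -> bool:
    """Check an estimate against a threshold theta and an error bound q.

    Assumption: when estimate and truth are both at most theta, the
    q-error is ignored; otherwise the q-error must not exceed q.
    """
    if estimate <= theta and truth <= theta:
        return True
    return q_error(estimate, truth) <= q


small_ok = is_theta_q_acceptable(3, 40, theta=50, q=2)       # both below theta
large_bad = is_theta_q_acceptable(100, 1000, theta=50, q=2)  # off by a factor of 10
large_ok = is_theta_q_acceptable(900, 1000, theta=50, q=2)   # off by about 1.11
```

The threshold relaxes the guarantee only for small cardinalities, where, as the abstract argues, a large relative error is still expected to yield optimal plans.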

international conference on data engineering | 2015

Bi-temporal Timeline Index: A data structure for Processing Queries on bi-temporal data

Martin Kaufmann; Peter Fischer; Norman May; Chang Ge; Anil K. Goel; Donald Kossmann

Following the adoption of basic temporal features in the SQL:2011 standard, there has been tremendous interest within the database industry in supporting bi-temporal features, as a significant number of real-life workloads would greatly benefit from efficient temporal operations. However, current implementations of bi-temporal storage systems and operators are far from optimal. In this paper, we present the Bi-temporal Timeline Index, which supports a broad range of temporal operators and exploits the special properties of an in-memory column store database system. Comprehensive performance experiments with the TPC-BiH benchmark show that algorithms based on the Bi-temporal Timeline Index significantly outperform both existing commercial database systems and state-of-the-art data structures from research.
