David Broneske

Otto-von-Guericke University Magdeburg

Publication

Featured research published by David Broneske.

Very Large Data Bases | 2015

Database Scan Variants on Modern CPUs: A Performance Study

David Broneske; Sebastian Breß; Gunter Saake

Main-memory databases rely on highly tuned database operations to achieve peak performance. Recently, it has been shown that different code optimizations for database operations favor different processors. However, it is still not clear how the combination of code optimizations (e.g., loop unrolling and vectorization) will affect the performance of database algorithms on different processors.
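The code optimizations named above are easiest to compare side by side. The following Python sketch (all function names are hypothetical, not taken from the paper) shows three variants of a range-selection scan that return the same row ids: a branching scan, a branch-free (predicated) scan that always writes the row id and advances an output cursor by the predicate result, and a scan unrolled by a factor of four. Python does not expose branch-misprediction or SIMD effects, so this illustrates only the shape of each variant; vectorization is not shown.

```python
from typing import List


def scan_branching(col: List[int], lo: int, hi: int) -> List[int]:
    """Branching variant: append a row id only when lo <= value < hi."""
    out = []
    for i, v in enumerate(col):
        if lo <= v < hi:
            out.append(i)
    return out


def scan_branch_free(col: List[int], lo: int, hi: int) -> List[int]:
    """Predicated variant: always write the row id, then advance the
    output cursor by the predicate result (0 or 1) instead of branching."""
    out = [0] * len(col)
    n = 0
    for i, v in enumerate(col):
        out[n] = i
        n += (lo <= v) & (v < hi)
    return out[:n]


def scan_unrolled4(col: List[int], lo: int, hi: int) -> List[int]:
    """Branching scan with the loop body manually unrolled four times."""
    out = []
    n = len(col)
    i = 0
    while i + 4 <= n:
        if lo <= col[i] < hi:
            out.append(i)
        if lo <= col[i + 1] < hi:
            out.append(i + 1)
        if lo <= col[i + 2] < hi:
            out.append(i + 2)
        if lo <= col[i + 3] < hi:
            out.append(i + 3)
        i += 4
    while i < n:  # remainder loop for the last len(col) % 4 values
        if lo <= col[i] < hi:
            out.append(i)
        i += 1
    return out
```

Which variant is fastest depends on the processor and on the selectivity, which is the question the study investigates; the sketch only guarantees that all three compute the same result.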

International Conference on Data Engineering | 2017

Are Databases Fit for Hybrid Workloads on GPUs? A Storage Engine's Perspective

Marcus Pinnecke; David Broneske; Gabriel Campero Durand; Gunter Saake

Employing special-purpose processors (e.g., GPUs) in database systems has been studied throughout the last decade. Research on heterogeneous database systems that use both general- and special-purpose processors has addressed either transaction- or analytic processing, but not the combination of them. Support for hybrid transaction- and analytic processing (HTAP) has been studied exclusively for CPU-only systems. In this paper, we ask whether current systems are ready for HTAP workload management with cooperating general- and special-purpose processors. To answer this, we take the perspective of the backbone of database systems: the storage engine. We propose a unified terminology and a comprehensive taxonomy to compare state-of-the-art engines from both domains. We show similarities and differences, and determine a necessary set of features for engines supporting HTAP workloads on CPUs and GPUs. Answering our research question, our findings yield a resolute answer: not yet.

Database and Expert Systems Applications | 2017

Interactive Chord Visualization for Metaproteomics

Roman Zoun; Kay Schallert; David Broneske; Robert Heyer; Dirk Benndorf; Gunter Saake

Metaproteomics is an analytic approach to studying microorganisms that live in complex microbial communities. A key aspect of understanding microbial communities is to link the functions of proteins identified by metaproteomics to their taxonomy. In this paper, we demonstrate interactive chord visualization as a powerful tool to explore such data. To evaluate the tool's efficacy, we use the relation data between functions and taxonomies from a large metaproteomics experiment. We evaluated the workflow against previous methods of data analysis and showed that interactive exploration of the data using the chord diagram is significantly faster in four of five tasks. Therefore, the chord visualization improves the users' ability to discover complex biological relationships.
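Chord layouts (for example, D3's d3-chord module) take a square matrix of flows between nodes. A minimal Python sketch, assuming the function-taxonomy relation data arrive as hypothetical (function, taxon, count) triples, of how such a matrix could be assembled with both functions and taxa as nodes; `chord_matrix` and the sample labels and counts are made up for illustration.

```python
from typing import Dict, List, Tuple

Node = Tuple[str, str]  # ("function", label) or ("taxon", label)


def chord_matrix(links: List[Tuple[str, str, int]]) -> Tuple[List[Node], List[List[int]]]:
    """Build a symmetric node-by-node matrix from (function, taxon, count) triples."""
    functions = sorted({f for f, _, _ in links})
    taxa = sorted({t for _, t, _ in links})
    # Tag node kinds so a function and a taxon with the same label stay distinct.
    nodes: List[Node] = [("function", f) for f in functions] + [("taxon", t) for t in taxa]
    index: Dict[Node, int] = {node: k for k, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for f, t, count in links:
        i, j = index[("function", f)], index[("taxon", t)]
        # A chord joins a function arc and a taxon arc, so fill both directions.
        matrix[i][j] += count
        matrix[j][i] += count
    return nodes, matrix


links = [("F1", "T1", 12), ("F1", "T2", 5), ("F2", "T1", 7)]  # made-up counts
nodes, m = chord_matrix(links)
```

Keeping functions and taxa in one node list means each function arc and each taxon arc appears on the same circle, with chord widths given by the shared counts.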

Information Technology | 2017

Exploiting capabilities of modern processors in data intensive applications

David Broneske; Gunter Saake

In main-memory database systems, the time to process the data has become a limiting factor because the disk access gap is no longer present. With changing processing capabilities (e.g., branch prediction, pipelining) in every new CPU architecture, code that was once optimal will probably not remain optimal forever. In this article, we analyze the processing capabilities of the classical CPU and describe code optimizations that exploit them. Furthermore, we present state-of-the-art compiler techniques that already implement code optimizations, while also showing gaps for further integration of code optimizations.

International Conference on Data Engineering | 2017

Accelerating Multi-Column Selection Predicates in Main-Memory - The Elf Approach

David Broneske; Veit Köppen; Gunter Saake; Martin Schäler

Evaluating selection predicates is a data-intensive task that reduces intermediate results, which are the input for further operations. With analytical queries getting more and more complex, the number of evaluated selection predicates per query and table rises, too. This leads to numerous multi-column selection predicates. Recent approaches to increase the performance of main-memory databases for selection-predicate evaluation aim at optimally exploiting the speed of the CPU by using accelerated scans. However, scanning each column one by one leaves open tuning opportunities that arise if all predicates are considered together. To this end, we introduce Elf, an index structure that is able to exploit the relation between several selection predicates. Elf features cache sensitivity, an optimized storage layout, fixed search paths, and slight data compression. In our evaluation, we compare its query performance to state-of-the-art approaches and a sequential scan using SIMD capabilities. Our results indicate a clear superiority of our approach for multi-column selection predicate queries with a low combined selectivity. For TPC-H queries with multi-column selection predicates, we achieve a speed-up between a factor of five and two orders of magnitude, mainly depending on the selectivity of the predicates.
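The idea of evaluating all predicates together can be approximated by a prefix tree in which level i holds the distinct values of column i. The Python sketch below is a deliberate simplification of Elf: it uses nested dictionaries instead of Elf's optimized storage layout and omits its compression, but it shows how a predicate that fails at one level prunes the entire subtree beneath it. The functions `build_prefix_tree` and `search` and the inclusive-range predicate format are assumptions for illustration.

```python
from typing import List, Optional, Tuple

# Inclusive [lo, hi] range per column; None means "no predicate on this column".
Range = Optional[Tuple[int, int]]


def build_prefix_tree(rows: List[Tuple[int, ...]]) -> dict:
    """Insert each row column by column; the last level maps values to row ids."""
    root: dict = {}
    for tid, row in enumerate(rows):
        node = root
        for value in row[:-1]:
            node = node.setdefault(value, {})
        node.setdefault(row[-1], []).append(tid)
    return root


def search(node: dict, preds: List[Range], level: int = 0) -> List[int]:
    """Return ids of rows satisfying every per-column range predicate."""
    pred = preds[level]
    result: List[int] = []
    for value in sorted(node):  # sorted so results come out in a fixed order
        if pred is not None and not (pred[0] <= value <= pred[1]):
            continue  # prune: the whole subtree below this value is skipped
        child = node[value]
        if level == len(preds) - 1:
            result.extend(child)  # leaf level holds row ids
        else:
            result.extend(search(child, preds, level + 1))
    return result


rows = [(1, 10, 100), (1, 20, 100), (2, 10, 200), (3, 30, 300)]
tree = build_prefix_tree(rows)
```

For example, `search(tree, [(1, 2), (10, 10), None])` asks for column 0 in [1, 2] and column 1 equal to 10, and the subtree under value 3 in the first column is never visited.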

International Conference on Management of Data | 2015

Adaptive Reprogramming for Databases on Heterogeneous Processors

David Broneske

It is clear by now that modern processing hardware is becoming increasingly heterogeneous, which forces data processing algorithms to take the underlying hardware into account. However, current approaches for implementing data-intensive operators (e.g., in database systems) either require enormous programming effort to tune one algorithm for several processors (the hardware-sensitive way), or do not fully exploit the available performance because of an abstract operator description (the hardware-oblivious way). In this thesis, we propose an algorithm optimizer that automatically tunes a hardware-oblivious operator description to the underlying hardware. This way, the DBMS can rewrite its operator code until it runs optimally on the given hardware.
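A drastically simplified form of hardware-adaptive tuning is to measure several functionally equivalent operator variants on the current machine and keep the fastest. The Python sketch below performs only that selection step; it does not rewrite operator code as the proposed optimizer does, and `pick_fastest`, `count_loop`, and `count_builtin` are hypothetical names.

```python
import timeit
from typing import Callable, Dict, List, Optional

Variant = Callable[[List[int]], int]


def pick_fastest(variants: Dict[str, Variant], sample: List[int], repeat: int = 3) -> str:
    """Return the name of the variant with the lowest measured runtime on `sample`."""
    reference: Optional[int] = None
    best_name, best_time = "", float("inf")
    for name, fn in variants.items():
        result = fn(sample)
        # All variants must compute the same answer before speed is compared.
        if reference is None:
            reference = result
        elif result != reference:
            raise ValueError(f"variant {name!r} disagrees with the others")
        elapsed = min(timeit.repeat(lambda: fn(sample), number=10, repeat=repeat))
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name


def count_loop(col: List[int]) -> int:
    """Count values below 500 with an explicit loop and branch."""
    n = 0
    for v in col:
        if v < 500:
            n += 1
    return n


def count_builtin(col: List[int]) -> int:
    """Count values below 500 by summing booleans."""
    return sum(v < 500 for v in col)


variants = {"loop": count_loop, "builtin": count_builtin}
winner = pick_fastest(variants, list(range(1000)))
```

The winner may differ between machines and interpreter versions, which is exactly why such a choice would be made on the target hardware rather than fixed in the source.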

Very Large Data Bases | 2018

An eight-dimensional systematic evaluation of optimized search algorithms on modern processors

Lars-Christian Schulz; David Broneske; Gunter Saake

Searching in sorted arrays of keys is a common task with a broad range of applications. Searching is often part of the performance-critical sections of a database query or index access, raising the question of which search algorithm to choose and how to optimize it to obtain the best possible performance on real-world hardware. This paper strives to answer this question by evaluating a large set of optimized sequential, binary, and k-ary search algorithms on a modern processor. In this context, we consider hardware-sensitive optimization strategies as well as algorithmic variations, resulting in an eight-dimensional evaluation space. As a result, we give insights into expected interactions between search algorithms and optimizations on modern hardware. In fact, there is no single best optimized algorithm, which leads to a set of recommendations on which variants should be considered first for a particular array size.

PVLDB Reference Format: Lars-Christian Schulz, David Broneske, Gunter Saake. An Eight-Dimensional Systematic Evaluation of Optimized Search Algorithms on Modern Processors. PVLDB, 11 (11): 1550-1562, 2018. DOI: https://doi.org/10.14778/3236187.3236205
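In k-ary search, each step compares the key against k-1 evenly spaced separators and keeps one of the k resulting segments, so k = 2 reduces to binary search. The Python sketch below shows only this algorithmic skeleton; the hardware-sensitive optimizations the study evaluates (for instance, comparing all separators at once with SIMD instructions) are not modeled, and `kary_search` is a hypothetical name.

```python
from typing import List


def kary_search(arr: List[int], key: int, k: int = 3) -> int:
    """Return an index of `key` in the sorted list `arr`, or -1 if it is absent."""
    if k < 2:
        raise ValueError("k must be at least 2")
    lo, hi = 0, len(arr)  # current search window [lo, hi)
    while hi - lo > k:
        step = (hi - lo) // k
        new_lo, new_hi = lo, hi
        # k-1 separators split [lo, hi) into k segments of width `step`
        # (the last segment absorbs the remainder).
        for s in range(1, k):
            sep = lo + s * step
            if key < arr[sep]:
                new_hi = sep  # key can only lie before this separator
                break
            new_lo = sep  # key, if present, lies at or after this separator
        lo, hi = new_lo, new_hi
    for i in range(lo, hi):  # window is small now: finish with a sequential scan
        if arr[i] == key:
            return i
    return -1
```

Finishing with a short sequential scan mirrors the kind of hybrid the abstract hints at: which combination pays off depends on the array size and the hardware.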

International Conference on Management of Data | 2018

GridFormation: Towards Self-Driven Online Data Partitioning using Reinforcement Learning

Gabriel Campero Durand; Marcus Pinnecke; Rufat Piriyev; Mahmoud Mohsen; David Broneske; Gunter Saake; Maya S. Sekeran; Fabián Rodriguez; Laxmi Balami

In this paper, we define a research agenda to develop a general framework supporting online autonomous tuning of data partitioning and layouts with a reinforcement learning formulation. We establish the core elements of our approach: agent, environment, action space, and supporting components. Externally predicted workloads and the current physical design serve as input to our agent. The environment guides the search process by generating immediate rewards based on fresh cost estimates, for either the entirety or a sample of queries from the workload, and by deciding the possible actions given a state. This set of actions is configurable, enabling the representation of different partitioning problems. For use in an online setting, the agent learns a fixed-length sequence of n actions that maximizes the temporal reward for the predicted workload. Through an initial implementation, we assert the feasibility of our approach. To conclude, we list open challenges for this work.
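The agent-environment loop described above can be illustrated with a toy environment. In the Python sketch below, which is an assumption-laden simplification rather than GridFormation itself, the state is a set of split points over a one-dimensional key domain, an action adds one split point, the cost estimate counts the keys each workload query would scan, and the immediate reward is the drop in that estimate. A greedy one-step policy stands in for the learned agent, so no learning takes place; `PartitionEnv` and `greedy_rollout` are hypothetical names.

```python
from typing import FrozenSet, List, Tuple

Query = Tuple[int, int]  # inclusive key range [lo, hi]


class PartitionEnv:
    """Toy environment over the key domain [0, domain); state = set of split points."""

    def __init__(self, domain: int, workload: List[Query]):
        self.domain = domain
        self.workload = workload
        self.splits: FrozenSet[int] = frozenset()

    def partitions(self, splits: FrozenSet[int]) -> List[Tuple[int, int]]:
        """Half-open partitions [start, end) induced by the split points."""
        bounds = [0, *sorted(splits), self.domain]
        return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]

    def cost(self, splits: FrozenSet[int]) -> int:
        """Cost estimate: total width of all partitions each query overlaps."""
        total = 0
        for lo, hi in self.workload:
            for p_lo, p_hi in self.partitions(splits):
                if p_lo <= hi and lo < p_hi:
                    total += p_hi - p_lo
        return total

    def actions(self) -> List[int]:
        """Possible actions in the current state: add an unused interior split point."""
        return [s for s in range(1, self.domain) if s not in self.splits]

    def step(self, action: int) -> int:
        """Apply an action and return the immediate reward (estimated cost reduction)."""
        before = self.cost(self.splits)
        self.splits = self.splits | {action}
        return before - self.cost(self.splits)


def greedy_rollout(env: PartitionEnv, n: int) -> List[int]:
    """Take a fixed-length sequence of n actions, each chosen by one-step lookahead."""
    rewards = []
    for _ in range(n):
        current = env.cost(env.splits)
        best = max(env.actions(), key=lambda a: current - env.cost(env.splits | {a}))
        rewards.append(env.step(best))
    return rewards


env = PartitionEnv(10, [(0, 1), (0, 1), (8, 9)])
```

Swapping the action list or the cost function changes which partitioning problem is being solved, which loosely mirrors the configurable action set described in the abstract.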

Database and Expert Systems Applications | 2018

Protein Identification as a Suitable Application for Fast Data Architecture

Roman Zoun; Gabriel Campero Durand; Kay Schallert; Apoorva Patrikar; David Broneske; Wolfram Fenske; Robert Heyer; Dirk Benndorf; Gunter Saake

Metaproteomics is a field of biology research that relies on mass spectrometry to characterize the protein complement of microbiological communities. Since only identified data can be analyzed, identification algorithms such as X!Tandem, OMSSA, and Mascot are essential in the domain for gaining insights into the biological experimental data. However, protein identification software was developed for proteomics. Metaproteomics, in contrast, involves large biological communities, gigabytes of experimental data per sample, and a greater number of comparisons, given the mixed culture of species in the protein database. Furthermore, the file-based nature of current protein identification tools makes them ill-suited for future metaproteomics research. In addition, possible medical use cases of metaproteomics require near real-time identification. From the technology perspective, Fast Data seems promising for increasing the throughput and performance of protein identification in a metaproteomics workflow. In this paper, we analyze the core functions of the established protein identification engine X!Tandem and show that streaming Fast Data architectures are suitable for protein identification. Furthermore, we point out the bottlenecks of the current algorithms and show how our approach removes them.

Advances in Databases and Information Systems | 2018

Streaming FDR Calculation for Protein Identification

Roman Zoun; Kay Schallert; Atin Janki; Rohith Ravindran; Gabriel Campero Durand; Wolfram Fenske; David Broneske; Robert Heyer; Dirk Benndorf; Gunter Saake

Identification of proteins is a key step of metaproteomics research. This protein identification task should be migrated to a fast data streaming architecture to increase horizontal scalability and performance. A protein database search involves two steps: the pairwise matching of experimental spectra against protein sequences, creating peptide-spectrum matches (PSMs), and the statistical validation of PSMs. Peptide-spectrum matching is inherently parallelizable, since each match is independent. However, false positive matches are inherent to this method due to measurement errors and artifacts, thus requiring statistical validation. State-of-the-art validation is achieved using the target-decoy method, which estimates the false discovery rate (FDR) by searching against a shuffled version of the original protein database. In contrast to the protein database search, validation by target-decoy is not parallelizable, because the FDR approximation requires all experimental data at once. In short, when using a fast data architecture for the workflow, the target-decoy approach is no longer feasible. Hence, a novel approach is required to avoid false discovery of PSMs on streaming, single-pass experimental data. To this end, the recently proposed nokoi classifier seems promising for solving the aforementioned problems. In this paper, we present a general nokoi pipeline to create such a decoy-free classifier that reaches over 95% accuracy for general metaproteomics data.
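The target-decoy estimate fits in a few lines, which also shows why it resists streaming: the score cutoff depends on the full distribution of target and decoy scores. A minimal Python sketch, assuming PSMs are given as hypothetical (score, is_decoy) pairs and using the common decoys/targets estimator (other estimators exist); `fdr_at`, `score_cutoff`, and the sample scores are made up for illustration.

```python
from typing import List, Optional, Tuple

PSM = Tuple[float, bool]  # (match score, True if the match hit the decoy database)


def fdr_at(psms: List[PSM], threshold: float) -> float:
    """Estimated FDR among PSMs scoring at or above `threshold`: decoys / targets."""
    targets = sum(1 for s, is_decoy in psms if s >= threshold and not is_decoy)
    decoys = sum(1 for s, is_decoy in psms if s >= threshold and is_decoy)
    if targets == 0:
        return float("inf") if decoys else 0.0
    return decoys / targets


def score_cutoff(psms: List[PSM], max_fdr: float = 0.01) -> Optional[float]:
    """Lowest observed score whose estimated FDR is <= max_fdr (None if none is)."""
    best = None
    for t in sorted({s for s, _ in psms}, reverse=True):
        if fdr_at(psms, t) <= max_fdr:
            best = t
    return best


psms = [(9.0, False), (8.0, False), (7.5, True), (7.0, False),
        (6.0, False), (5.0, True), (4.0, False)]  # made-up scores
```

Because `score_cutoff` ranges over every score in the result set, a PSM arriving later (for example, a high-scoring decoy) can change the cutoff for PSMs already seen, so the computation cannot be completed on single-pass streaming data.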