Philipp Rösch
Dresden University of Technology
Publications
Featured research published by Philipp Rösch.
Very Large Data Bases | 2012
Philipp Rösch; Lars Dannecker; Franz Färber; Gregor Hackenbroich
With the SAP HANA database, SAP offers a high-performance in-memory hybrid-store database. Hybrid-store databases---that is, databases supporting row- and column-oriented data management---are becoming increasingly prominent. While the columnar store offers high-performance capabilities for analyzing large quantities of data, the row-oriented store can handle transactional point queries as well as inserts and updates more efficiently. To effectively take advantage of both stores at the same time, the novel question arises of whether to store the given data row- or column-oriented. We tackle this problem with a storage advisor tool that supports database administrators in this decision. Our proposed storage advisor recommends the optimal store based on data and query characteristics; its core is a cost model to estimate and compare query execution times for the different stores. Besides a per-table decision, our tool also considers horizontally and vertically partitioning the data and managing the partitions on different stores. We evaluated the storage advisor for use in the SAP HANA database; we show the recommendation quality as well as the benefit of having the data in the optimal store with respect to increased query performance.
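To make the kind of decision such an advisor faces concrete, the following minimal sketch compares toy per-table cost estimates for a row store and a column store under a simple workload mix. The cost constants, formulas, and parameter names are invented for illustration; they are not the cost model used by the SAP HANA storage advisor.

```python
# Illustrative only: a toy cost model for a row-vs-column store decision.
# All constants and formulas below are assumptions for the sketch.

def estimate_costs(num_rows, num_cols, cols_touched, point_queries, scan_queries):
    """Return (row_store_cost, column_store_cost) for a simple workload mix."""
    # Row store: point queries are cheap (index lookup), scans read full rows.
    row_cost = point_queries * 1.0 + scan_queries * num_rows * num_cols * 0.01
    # Column store: scans read only the touched columns, point queries
    # must reconstruct a row from several columns.
    col_cost = (point_queries * cols_touched * 0.5
                + scan_queries * num_rows * cols_touched * 0.001)
    return row_cost, col_cost

def recommend_store(table):
    row_cost, col_cost = estimate_costs(**table)
    return "row store" if row_cost < col_cost else "column store"

orders = dict(num_rows=10_000_000, num_cols=40, cols_touched=4,
              point_queries=50, scan_queries=200)
print(recommend_store(orders))  # analytical workload -> likely "column store"
```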
Statistical and Scientific Database Management | 2008
Rainer Gemulla; Philipp Rösch; Wolfgang Lehner
Random sampling is a popular technique for providing fast approximate query answers, especially in data warehouse environments. Compared to other types of synopses, random sampling bears the advantage of retaining the dataset's dimensionality; it also associates probabilistic error bounds with the query results. Most of the available sampling techniques focus on table-level sampling, that is, they produce a sample of only a single database table. Queries that contain joins over multiple tables cannot be answered with such samples because join results on random samples are often small and skewed. In contrast, schema-level sampling techniques support queries containing joins by design. In this paper, we introduce Linked Bernoulli Synopses, a schema-level sampling scheme based upon the well-known Join Synopses. Both schemes rely on the idea of maintaining foreign-key integrity in the synopses; they are therefore suited to process queries containing arbitrary foreign-key joins. In contrast to Join Synopses, however, Linked Bernoulli Synopses correlate the sampling processes of the different tables in the database so as to minimize the space overhead, without destroying the uniformity of the individual samples. We also discuss how to compute Linked Bernoulli Synopses that maximize the effective sampling fraction for a given memory budget. Computing the optimum solution is often prohibitively expensive, so approximate solutions are needed. We propose a simple heuristic approach which is fast and seems to produce close-to-optimum results in practice. We conclude the paper with an evaluation of our methods on both synthetic and real-world datasets.
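A minimal sketch of the shared underlying idea of maintaining foreign-key integrity: Bernoulli-sample a fact table, then pull in the dimension rows referenced by the sampled tuples so that foreign-key joins on the synopsis never dangle. This illustrates the basic Join Synopses principle only; the correlation of sampling decisions across tables that distinguishes Linked Bernoulli Synopses is omitted, and the table and column names are made up.

```python
import random

def join_synopsis(fact_rows, dim_rows, key, fk, q=0.01, seed=42):
    """Bernoulli-sample the fact table with rate q and add every dimension
    row referenced by a sampled fact row, preserving foreign-key integrity."""
    rng = random.Random(seed)
    fact_sample = [r for r in fact_rows if rng.random() < q]
    referenced = {r[fk] for r in fact_sample}
    dim_sample = [r for r in dim_rows if r[key] in referenced]
    return fact_sample, dim_sample

# Toy data: orders reference customers via "cust_id".
customers = [{"cust_id": i, "region": "EU" if i % 2 else "US"} for i in range(1000)]
orders = [{"order_id": i, "cust_id": i % 1000, "amount": i % 97} for i in range(100_000)]
o_sample, c_sample = join_synopsis(orders, customers, key="cust_id", fk="cust_id", q=0.01)
print(len(o_sample), len(c_sample))  # every sampled order finds its customer
```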
Extending Database Technology | 2009
Philipp Rösch; Wolfgang Lehner
With the amount of data in current data warehouse databases growing steadily, random sampling is continuously gaining in importance. In particular, interactive analyses of large datasets can greatly benefit from the significantly shorter response times of approximate query processing. Typically, those analytical queries partition the data into groups and aggregate the values within the groups. Further, with the commonly used roll-up and drill-down operations, a broad range of group-by queries is posed to the system, which makes the construction of highly specialized synopses difficult. In this paper, we propose a general-purpose sampling scheme that is biased in order to answer group-by queries with high accuracy. While existing techniques focus on the size of a group when computing its sample size, our technique is based on its standard deviation. The basic idea is that the more homogeneous a group is, the fewer representatives are required to give a good estimate. With an extensive set of experiments, we show that our approach reduces both the estimation error and the construction cost compared to existing techniques.
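The allocation idea can be sketched as distributing a sample budget over groups proportionally to each group's standard deviation (a Neyman-style allocation). This is a simplified illustration under assumed data and a minimum-per-group floor, not the paper's exact allocation scheme.

```python
import statistics
from collections import defaultdict

def allocate_by_stddev(rows, group_key, value_key, budget, min_per_group=2):
    """Distribute a sample budget over groups proportionally to each group's
    standard deviation: homogeneous groups get few representatives.
    (The min_per_group floor may slightly exceed the nominal budget.)"""
    groups = defaultdict(list)
    for r in rows:
        groups[r[group_key]].append(r[value_key])
    stddevs = {g: statistics.pstdev(v) for g, v in groups.items()}
    total = sum(stddevs.values()) or 1.0
    return {g: max(min_per_group, round(budget * s / total))
            for g, s in stddevs.items()}

rows = ([{"g": "A", "v": 100} for _ in range(500)] +   # homogeneous group
        [{"g": "B", "v": v} for v in range(500)])       # heterogeneous group
print(allocate_by_stddev(rows, "g", "v", budget=100))
# group A needs only a couple of rows, group B gets almost the whole budget
```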
International Conference on Management of Data | 2006
Anja Klein; Rainer Gemulla; Philipp Rösch; Wolfgang Lehner
Although approximate query processing is a prominent way to cope with the requirements of data analysis applications, current database systems do not provide integrated and comprehensive support for these techniques. To improve this situation, we propose an SQL extension---called SQL/S---for approximate query answering using random samples, and present a prototypical implementation within the engine of the open-source database system Derby---called Derby/S. Our approach significantly reduces the required expert knowledge by enabling the definition of samples in a declarative way; the choice of the specific sampling scheme and its parametrization is left to the system. SQL/S introduces new DDL commands to easily define and administer random samples subject to a given set of optimization criteria. Derby/S automatically takes care of sample maintenance if the underlying dataset changes. Finally, samples are transparently used during query processing, and error bounds are provided. Our extensions do not affect traditional queries and provide the means to integrate sampling as a first-class citizen into a DBMS.
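The sample-maintenance aspect mentioned here can be illustrated with a classical reservoir sample that stays uniform as the underlying table grows. This is a generic sketch of maintenance under inserts, not Derby/S code, and the class and its interface are invented for the example.

```python
import random

class MaintainedSample:
    """Uniform reservoir sample of a growing table (classical Algorithm R);
    illustrates automatic sample maintenance under inserts."""
    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.rows_seen = 0
        self.sample = []
        self.rng = random.Random(seed)

    def insert(self, row):
        self.rows_seen += 1
        if len(self.sample) < self.capacity:
            self.sample.append(row)
        else:
            j = self.rng.randrange(self.rows_seen)
            if j < self.capacity:
                self.sample[j] = row  # replace a random victim

s = MaintainedSample(capacity=100)
for i in range(10_000):
    s.insert({"id": i, "value": i % 7})
print(len(s.sample), s.rows_seen)  # sample stays uniform as the table grows
```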
Business Intelligence for the Real-Time Enterprise | 2010
Katrin Eisenreich; Gregor Hackenbroich; Volker Markl; Philipp Rösch; Robert Schulze
Enabling experts not only to analyze current and historic data but also to evaluate the impact of decisions on the future state of the business greatly increases the value of decision support. However, the highly relevant aspect of representing and processing uncertain and temporally indeterminate data is often ignored in this context. Although the management of uncertainty has been researched intensely in the last decade, its role in decision support has not attracted much attention. We hold that not considering such information restricts the analyses users can run and the insights they can gain into their data. In this paper, we complement large-scale data analyses with support for what-if analyses over uncertain and temporally indeterminate data. We use a histogram-based model to represent arbitrary uncertainty and temporal indeterminacy and allow its flexible processing using operators for analyzing, deriving, and modifying uncertainty in decision support tasks. We describe a prototypical implementation and approaches for parallelization on a commercial column store and present an initial evaluation of our solution.
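As a flavor of histogram-based processing, the sketch below combines two independent uncertain quantities represented as discrete histograms by convolving them. The binning (point masses rather than intervals), the operator, and the toy numbers are assumptions for illustration and do not reproduce the paper's operator set.

```python
from collections import defaultdict

def add_histograms(h1, h2):
    """Add two independent uncertain quantities given as {value: probability}
    histograms by convolving them (illustrative discrete representation)."""
    out = defaultdict(float)
    for v1, p1 in h1.items():
        for v2, p2 in h2.items():
            out[v1 + v2] += p1 * p2
    return dict(out)

# Uncertain sales forecasts for two regions (toy numbers).
region_a = {90: 0.2, 100: 0.6, 110: 0.2}
region_b = {40: 0.5, 60: 0.5}
total = add_histograms(region_a, region_b)
expected = sum(v * p for v, p in total.items())
print(total, expected)
```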
Statistical and Scientific Database Management | 2013
Robert Lorenz; Lars Dannecker; Philipp Rösch; Wolfgang Lehner; Gregor Hackenbroich; Benjamin Schlegel
Forecasting is an important data analysis technique and serves as the basis for business planning in many application areas such as energy, sales, and traffic management. The currently employed statistical models already provide very accurate predictions, but the forecasting calculation process is very time-consuming. This is especially true since many application domains deal with hierarchically organized data. Forecasting in these environments is especially challenging because forecast consistency between hierarchy levels must be ensured, which leads to increased data processing and communication effort. For this purpose, we introduce a novel hierarchical forecasting approach in which forecast models are pushed to the entities on the lowest hierarchy level and then reused to efficiently create forecast models on higher hierarchy levels. With that, we avoid the time-consuming parameter estimation process and allow an almost instant calculation of forecasts.
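A minimal sketch of the reuse idea: fit simple models once at the lowest level and obtain a higher-level forecast by aggregating the base-level results rather than estimating a new model, which keeps the levels consistent by construction. The exponential-smoothing model, the bottom-up aggregation, and the toy data are assumptions for the sketch; the paper's model-reuse scheme is more involved.

```python
def ses_forecast(series, alpha=0.3):
    """Simple exponential smoothing: return a one-step-ahead forecast."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

# Lowest hierarchy level: per-household consumption series (toy data).
households = {
    "h1": [2.0, 2.1, 1.9, 2.2, 2.0],
    "h2": [1.0, 1.1, 0.9, 1.2, 1.0],
    "h3": [3.0, 2.9, 3.1, 3.0, 3.2],
}
base_forecasts = {h: ses_forecast(s) for h, s in households.items()}

# Higher level (e.g. a substation): reuse the base-level results instead of
# estimating a new model -- the aggregate forecast is consistent by construction.
substation_forecast = sum(base_forecasts.values())
print(base_forecasts, substation_forecast)
```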
Conference on Information and Knowledge Management | 2013
Lars Dannecker; Philipp Rösch; Ulrike Fischer; Gordon Gaumnitz; Wolfgang Lehner; Gregor Hackenbroich
Continuous balancing of energy demand and supply is a fundamental prerequisite for the stability of energy grids and requires accurate forecasts of electricity consumption and production at any point in time. Today's Energy Data Management (EDM) systems already provide accurate predictions, but typically employ a very time-consuming and inflexible forecasting process. However, emerging trends such as intra-day trading and an increasing share of renewable energy sources demand higher forecasting efficiency. Additionally, the wide variety of applications in the energy domain poses different requirements with respect to runtime and accuracy and thus requires flexible control of the forecasting process. To address this issue, we introduce our novel online forecasting process as part of our EDM system called pEDM. The online forecasting process rapidly provides forecasting results and iteratively refines them over time. Thus, we avoid long calculation times and allow applications to adapt the process to their needs. Our evaluation shows that the online forecasting process offers a very efficient and flexible way of providing forecasts to the requesting applications.
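To convey the general shape of such an online process, the sketch below yields a quick initial forecast and then progressively refined ones as a parameter search becomes finer. The model, the grid-search refinement, and the data are assumptions for illustration only and are not the pEDM forecasting process.

```python
def forecast(series, alpha):
    """One-step simple-exponential-smoothing forecast for a given alpha."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

def sse(series, alpha):
    """In-sample one-step-ahead squared error for a given alpha."""
    level, err = series[0], 0.0
    for x in series[1:]:
        err += (x - level) ** 2
        level = alpha * x + (1 - alpha) * level
    return err

def online_forecast(series):
    """Yield a quick initial forecast, then iteratively refined ones as the
    parameter search becomes finer (illustrative only)."""
    yield forecast(series, alpha=0.5)            # instant first answer
    for steps in (5, 20, 100):                   # finer and finer grids
        best = min((i / steps for i in range(1, steps)),
                   key=lambda a: sse(series, a))
        yield forecast(series, best)

series = [10, 12, 11, 13, 12, 14, 13, 15]
for i, f in enumerate(online_forecast(series)):
    print(f"refinement {i}: forecast = {f:.2f}")
```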
International Journal of Knowledge-Based Organizations | 2013
Philipp Rösch; Wolfgang Lehner
The rapid increase of data volumes makes sampling a crucial component of modern data management systems. Although there is a large body of work on database sampling, the problem of automatically determining the optimal sample for a given query has remained almost unaddressed. To tackle this problem, the authors propose a sample advisor based on a novel cost model. Primarily designed for advising samples for a few queries specified by an expert, the sample advisor is additionally extended in two ways. The first extension enhances its applicability by utilizing recorded workload information and taking memory bounds into account. The second extension increases its effectiveness by merging samples in the case of overlapping pieces of sample advice. For both extensions, the authors present exact and heuristic solutions. Within their evaluation, the authors analyze the properties of the cost model and demonstrate the effectiveness and the efficiency of the heuristic solutions with a variety of experiments.
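One possible reading of the merging extension is sketched below: when two pieces of advice target the same table and their column sets overlap, a single sample over the union of columns can serve both queries. The merge criterion, the "keep the larger sampling rate" rule, and the data structures are assumptions for illustration, not the paper's algorithm.

```python
def merge_advice(advice):
    """Greedily merge pieces of sample advice on the same table whose column
    sets overlap; a merged sample over the union of columns can serve both
    queries (illustrative heuristic only)."""
    merged = []
    for table, cols, frac in advice:
        for item in merged:
            if item[0] == table and item[1] & cols:   # overlapping columns
                item[1] |= cols
                item[2] = max(item[2], frac)          # keep the larger rate
                break
        else:
            merged.append([table, set(cols), frac])
    return merged

advice = [("orders", {"region", "amount"}, 0.01),
          ("orders", {"region", "date"}, 0.02),
          ("customers", {"country"}, 0.05)]
print(merge_advice(advice))
# the two overlapping 'orders' advices collapse into one merged sample
```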
International Conference on Data Engineering | 2012
Katrin Eisenreich; Jochen Adamek; Philipp Rösch; Volker Markl; Gregor Hackenbroich
Investigating potential dependencies in data and their effect on future business developments can help experts prevent misestimations of risks and chances. This makes correlation a highly important factor in risk analysis tasks. Previous research on correlation in uncertain data management has foremost addressed the handling of dependencies between discrete rather than continuous distributions. Also, none of the existing approaches provides a clear method for extracting correlation structures from data and introducing assumptions about correlation into independently represented data. To enable risk analysis under correlation assumptions, we use an approximation technique based on copula functions. This technique enables analysts to introduce arbitrary correlation structures between arbitrary distributions and to calculate relevant measures over the correlated data. The correlation information can either be extracted at runtime from historic data or be accessed from a parametrically precomputed structure. We discuss the construction, application, and querying of approximate correlation representations for different analysis tasks. Our experiments demonstrate the efficiency and accuracy of the proposed approach and point out several possibilities for optimization.
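The copula idea itself can be sketched with a Gaussian copula: draw correlated normals, map them to uniforms, and push the uniforms through the inverse CDFs of arbitrary marginals. The choice of a Gaussian copula, the marginals, and the correlation value are assumptions for illustration; the paper's histogram-based approximation differs.

```python
import numpy as np
from scipy import stats

def correlated_samples(marginal_x, marginal_y, rho, n=10_000, seed=1):
    """Draw samples from two arbitrary marginals coupled through a Gaussian
    copula with correlation rho (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    z = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    u = stats.norm.cdf(z)                      # correlated uniforms in [0, 1]
    return marginal_x.ppf(u[:, 0]), marginal_y.ppf(u[:, 1])

# Example: a lognormal revenue risk coupled with an exponential delay.
x, y = correlated_samples(stats.lognorm(s=0.5), stats.expon(scale=3.0), rho=0.7)
print(np.corrcoef(x, y)[0, 1])                 # clearly positive dependence
```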
Advances in Databases and Information Systems | 2010
Philipp Rösch; Wolfgang Lehner
The rapid growth of current data warehouse systems makes random sampling a crucial component of modern data management systems. Although there is a large body of work on database sampling, the problem of automatic sample selection has remained (almost) unaddressed. In this paper, we tackle the problem with a sample advisor. We propose a cost model to evaluate a sample for a given query. Based on this, our sample advisor determines the optimal set of samples for a given set of queries specified by an expert. We further propose an extension to utilize recorded workload information. In this case, the sample advisor takes the set of queries and a given memory bound into account for the computation of the sample advice. Additionally, we consider the merging of samples in the case of overlapping sample advice and present both an exact and a heuristic solution. Within our evaluation, we analyze the properties of the cost model and compare the proposed algorithms. We further demonstrate the effectiveness and the efficiency of the heuristic solutions with a variety of experiments.
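Selecting samples under a memory bound has the flavor of a knapsack problem; a minimal greedy sketch is shown below, picking the advice with the highest estimated benefit per byte until the bound is exhausted. The cost/benefit numbers and the greedy rule are assumptions for illustration and do not reproduce the paper's cost model or heuristic.

```python
def select_samples(candidates, memory_bound):
    """Greedy knapsack-style heuristic: repeatedly pick the sample advice with
    the highest estimated benefit per unit of memory until the memory bound is
    reached (illustrative only)."""
    chosen, used = [], 0
    for name, size, benefit in sorted(candidates, key=lambda c: c[2] / c[1],
                                      reverse=True):
        if used + size <= memory_bound:
            chosen.append(name)
            used += size
    return chosen, used

candidates = [("orders_by_region", 40, 90),     # (name, size in MB, est. benefit)
              ("lineitem_uniform", 120, 150),
              ("customers_by_country", 10, 30)]
print(select_samples(candidates, memory_bound=100))
```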