Tilmann Rabl
Technical University of Berlin
Publications
Featured research published by Tilmann Rabl.
Very Large Data Bases | 2012
Tilmann Rabl; Sergio Gómez-Villamor; Mohammad Sadoghi; Victor Muntés-Mulero; Hans-Arno Jacobsen; Serge Mankovskii
As the complexity of enterprise systems increases, the need for monitoring and analyzing such systems also grows. A number of companies have built sophisticated monitoring tools that go far beyond simple resource utilization reports. For example, based on instrumentation and specialized APIs, it is now possible to monitor single method invocations and trace individual transactions across geographically distributed systems. This high level of detail enables more precise forms of analysis and prediction but comes at the price of high data rates (i.e., big data). To maximize the benefit of data monitoring, the data has to be stored for an extended period of time for later analysis. This new wave of big data analytics imposes new challenges, especially for application performance monitoring systems. The monitoring data has to be stored in a system that can sustain the high data rates and at the same time enable an up-to-date view of the underlying infrastructure. With the advent of modern key-value stores, a variety of data storage systems have emerged that are built with a focus on the scalability and high data rates that predominate in this monitoring use case. In this work, we present our experience and a comprehensive performance evaluation of six modern (open-source) data stores in the context of application performance monitoring, as part of a CA Technologies initiative. We evaluated these systems with data and workloads that can be found in application performance monitoring as well as in on-line advertisement, power monitoring, and many other use cases. We present our insights not only as performance results but also as lessons learned and our experience relating to the setup and configuration complexity of these data stores in an industry setting.
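At its core, the evaluation described above means measuring sustained write throughput under a monitoring-style workload: a bounded set of metric series, each receiving a continuous stream of observations. A minimal sketch of such a micro-benchmark, with a plain dict standing in for a real key-value store client and all names hypothetical:

```python
import random
import time

def bench_writes(store_put, n_ops=100_000, value_size=100):
    """Measure sustained write throughput (ops/s) against a put() callable.

    Keys mimic a monitoring workload: 1,000 metric series, each
    receiving a stream of new, uniquely numbered observations.
    """
    rng = random.Random(0)
    payload = "x" * value_size
    start = time.perf_counter()
    for i in range(n_ops):
        store_put(f"metric:{rng.randrange(1_000)}:{i}", payload)
    elapsed = time.perf_counter() - start
    return n_ops / elapsed

# Stand-in for a real store client; a real run would pass the
# put() method of a Cassandra/HBase-style client instead.
store = {}
throughput = bench_writes(store.__setitem__)
```

A real harness would additionally interleave the up-to-date-view reads the abstract mentions, since the interesting trade-off is read latency under sustained write load.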
TPC Technology Conference | 2010
Tilmann Rabl; Michael Frank; Hatem Mousselly Sergieh; Harald Kosch
In many fields of research and business, data sizes are breaking the petabyte barrier. This imposes new problems and research possibilities for the database community. Usually, data of this size is stored in large clusters or clouds. Although clouds have become very popular in recent years, there is little work on benchmarking cloud applications. In this paper, we present a data generator for cloud-sized applications. Its architecture makes the data generator easy to extend and configure. A key feature is the high degree of parallelism that allows linear scaling to arbitrary numbers of nodes. We show how distributions, relationships, and dependencies in data can be computed in parallel with linear speed-up.
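The linear scaling described here typically comes from making every value a deterministic function of a global seed and the row's coordinates, so any node can generate, or re-derive, any row without communication. A minimal sketch of that idea (the schema and function names are hypothetical, not the generator's actual implementation):

```python
import hashlib
import random

def row_rng(table: str, row_id: int, seed: int = 42) -> random.Random:
    """Derive a deterministic per-row RNG from a global seed.

    Because every node can recompute any row's RNG independently,
    partitions are generated in parallel with no communication,
    which is what gives linear scale-out.
    """
    key = f"{seed}:{table}:{row_id}".encode()
    return random.Random(int.from_bytes(hashlib.sha256(key).digest()[:8], "big"))

def gen_customer(row_id: int) -> dict:
    rng = row_rng("customer", row_id)
    return {"id": row_id, "balance": round(rng.uniform(0, 10_000), 2)}

def gen_order(row_id: int, n_customers: int) -> dict:
    rng = row_rng("order", row_id)
    # Dependencies without lookups: the referenced customer row can be
    # re-derived deterministically on any node at any time.
    cust_id = rng.randrange(n_customers)
    return {"id": row_id, "customer": gen_customer(cust_id)["id"]}

# Node k of N generates only the rows where row_id % N == k:
part = [gen_order(i, 1000) for i in range(10_000) if i % 4 == 0]
```

Repeatable per-row computation is what makes references and dependencies cheap: instead of shipping the customer table to the node generating orders, the node simply recomputes the referenced values.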
Technology Conference on Performance Evaluation and Benchmarking | 2012
Chaitanya K. Baru; Milind Bhandarkar; Raghunath Nambiar; Meikel Poess; Tilmann Rabl
The Workshop on Big Data Benchmarking (WBDB2012), held on May 8-9, 2012 in San Jose, CA, served as an incubator for several promising approaches to defining a big data benchmark standard for industry. Through an open forum for discussions on a number of issues related to big data benchmarking, including definitions of big data terms, benchmark processes, and auditing, the attendees were able to extend their own view of big data benchmarking as well as communicate their own ideas, which ultimately led to the formation of small working groups to continue collaborative work in this area. In this paper, we summarize the discussions and outcomes from this first workshop, which was attended by about 60 invitees representing 45 different organizations from industry and academia. Workshop attendees were selected based on their experience and expertise in the areas of big data management, database systems, performance benchmarking, and big data applications. There was consensus among participants about both the need and the opportunity to define benchmarks that capture the end-to-end aspects of big data applications. Following the model of TPC benchmarks, it was felt that big data benchmarks should include not only performance metrics but also price/performance, along with a sound foundation for fair comparison through audit mechanisms. Additionally, the benchmarks should consider several costs relevant to big data systems, including the total cost of acquisition, setup cost, and total cost of ownership, including energy cost. The second Workshop on Big Data Benchmarking will be held in December 2012 in Pune, India, and the third meeting is being planned for July 2013 in Xi'an, China.
Very Large Data Bases | 2014
Meikel Poess; Tilmann Rabl; Hans-Arno Jacobsen; Brian K. Caufield
Historically, the process of synchronizing a decision support system with data from operational systems has been referred to as Extract, Transform, Load (ETL), and the tools supporting this process have been referred to as ETL tools. Recently, ETL was replaced by the more comprehensive term data integration (DI). DI describes the process of extracting and combining data from a variety of data source formats, transforming that data into a unified data model representation, and loading it into a data store. This is done in the context of a variety of scenarios, such as data acquisition for business intelligence, analytics, and data warehousing, but also synchronization of data between operational applications, data migrations and conversions, master data management, enterprise data sharing, and the delivery of data services in a service-oriented architecture context, among others. With these scenarios relying on up-to-date information, it is critical to implement a high-performing, scalable, and easy-to-maintain data integration system. This is especially important as the complexity, variety, and volume of data are constantly increasing, making the performance of data integration systems ever more critical. Despite the significance of having a high-performing DI system, there has been no industry standard for measuring and comparing their performance. The TPC, acknowledging this void, has released TPC-DI, an innovative benchmark for data integration. This paper motivates the reasons behind its development, describes its main characteristics, including its workload, run rules, and metric, and explains key design decisions.
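As a toy illustration of the extract-transform-load pattern that DI generalizes (the sample data and the unified model below are invented for this sketch and have nothing to do with the TPC-DI schema):

```python
import csv
import io
import json
import sqlite3

# Extract: two sources in different formats (hypothetical sample data).
csv_src = "id,name\n1,Alice\n2,Bob\n"
json_src = '[{"id": 3, "name": "Carol"}]'

def extract():
    """Yield raw records from heterogeneous source formats."""
    yield from csv.DictReader(io.StringIO(csv_src))
    yield from json.loads(json_src)

def transform(row):
    """Unify into one model: integer key, normalized (upper-cased) name."""
    return int(row["id"]), str(row["name"]).upper()

def load(rows):
    """Load the unified rows into the target data store."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
    db.executemany("INSERT INTO person VALUES (?, ?)", rows)
    db.commit()
    return db

db = load(transform(r) for r in extract())
```

A DI benchmark stresses exactly these three stages at scale: heterogeneous source formats on extract, model unification and cleansing on transform, and sustained load rates into the target store.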
Technology Conference on Performance Evaluation and Benchmarking | 2014
Chaitanya K. Baru; Milind Bhandarkar; Carlo Curino; Manuel Danisch; Michael Frank; Bhaskar Gowda; Hans-Arno Jacobsen; Huang Jie; Dileep Kumar; Raghunath Nambiar; Meikel Poess; Francois Raab; Tilmann Rabl; Nishkam Ravi; Kai Sachs; Saptak Sen; Lan Yi; Choonhan Youn
Enterprises perceive a huge opportunity in mining the information that can be found in big data. New storage systems and processing paradigms are allowing ever larger data sets to be collected and analyzed. The high demand for data analytics and rapid development in technologies have led to a sizable ecosystem of big data processing systems. However, the lack of established, standardized benchmarks makes it difficult for users to choose appropriate systems that suit their requirements. To address this problem, we have developed the BigBench benchmark specification. BigBench is the first end-to-end big data analytics benchmark suite. In this paper, we present the BigBench benchmark and analyze the workload from a technical as well as a business point of view. We characterize the queries in the workload along different dimensions, according to their functional characteristics, and also analyze their runtime behavior. Finally, we evaluate the suitability and relevance of the workload from the point of view of enterprise applications, and discuss potential extensions to the proposed specification in order to cover typical big data processing use cases.
International Conference on Performance Engineering | 2013
Tilmann Rabl; Meikel Poess; Hans-Arno Jacobsen; Patrick E. O'Neil; Elizabeth J. O'Neil
The Star Schema Benchmark (SSB), now in its third revision, has been widely used to evaluate the performance of database management systems when executing star schema queries. SSB, based on the well-known industry standard benchmark TPC-H, shares some of its drawbacks, most notably its uniform data distributions. Today's systems rely heavily on sophisticated cost-based query optimizers to generate the most efficient query execution plans. A benchmark that evaluates an optimizer's capability to generate optimal execution plans under all circumstances must provide the rich data set details on which optimizers rely (uniform and non-uniform distributions, data sparsity, etc.). This is also true for other database system parts, such as indices and operators, and ultimately holds for an end-to-end benchmark as well. SSB's data generator, based on TPC-H's dbgen, is not easy to adapt to different data distributions, as its metadata and actual data generation implementations are not separated. In this paper, we motivate the need for a new revision of SSB that includes non-uniform data distributions. We list the specific modifications required to implement non-uniform data sets in SSB, and we demonstrate how to implement these modifications in the Parallel Data Generator Framework to generate both the data and query sets.
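Non-uniform distributions of the kind argued for here are commonly modeled with a Zipf-like skew, where a few values dominate and the rest form a long tail. A small self-contained sketch (the exponent and domain size are arbitrary illustrative choices, not SSB parameters):

```python
import random

def zipf_sample(rng, n, s=1.2):
    """Draw one value from {1..n} with Zipf(s) skew via inverse-CDF sampling."""
    weights = [1.0 / (k ** s) for k in range(1, n + 1)]
    u = rng.random() * sum(weights)
    acc = 0.0
    for k, w in enumerate(weights, start=1):
        acc += w
        if u <= acc:
            return k
    return n  # guard against floating-point round-off

rng = random.Random(7)
vals = [zipf_sample(rng, 100) for _ in range(10_000)]
# unlike a uniform draw, a few small values dominate the sample
```

A production generator would precompute the cumulative weights once instead of per draw, and, as the paper's critique of dbgen suggests, keep such distribution parameters in metadata separate from the generation code so they can be swapped without reimplementation.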
International Conference on Performance Engineering | 2012
Michael Frank; Meikel Poess; Tilmann Rabl
Industry standard benchmarks have proven crucial to the innovation and productivity of the computing industry. They are important for the fair and standardized assessment of performance across different vendors, different system versions from the same vendor, and different architectures. Good benchmarks are even meant to drive industry and technology forward. However, at some point, after all reasonable advances have been made using a particular benchmark, even good benchmarks become obsolete. This is why standards consortia periodically overhaul their existing benchmarks or develop new ones. An extremely time- and resource-consuming task in the creation of new benchmarks is the development of benchmark generators, especially because benchmarks tend to become more and more complex. The first version of the Parallel Data Generation Framework (PDGF), a generic data generator, was capable of generating data for the initial load of arbitrary relational schemas. It was, however, not able to generate data for the actual workload, i.e., input data for transactions (insert, delete, and update), incremental loads, etc., mainly because it did not understand the notion of updates. Updates are data changes that occur over time, e.g., a customer changes address, switches jobs, gets married, or has children. Many benchmarks need to reflect these changes during their workloads. In this paper, we present PDGF Version 2, which contains extensions enabling the generation of update data.
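Deterministic update generation can reuse the same seeding idea that makes parallel initial loads possible: derive a fresh RNG per (row, version), so the full change history of any row is reproducible on any node without being stored. A hypothetical sketch, not PDGF's actual mechanism (the entity, field, and domain are invented):

```python
import hashlib
import random

CITIES = ["Berlin", "Passau", "Toronto", "San Jose"]  # hypothetical domain

def rng_for(entity, row_id, version, seed=42):
    """Deterministic RNG for one version of one row."""
    key = f"{seed}:{entity}:{row_id}:{version}".encode()
    return random.Random(int.from_bytes(hashlib.sha256(key).digest()[:8], "big"))

def customer_at(row_id, version):
    """Recompute a customer's state after `version` updates; no history is stored."""
    rng = rng_for("customer", row_id, version)
    return {"id": row_id, "city": rng.choice(CITIES)}

def update_delta(row_id, version):
    """The update statement a workload would replay at step `version`."""
    before = customer_at(row_id, version - 1)
    after = customer_at(row_id, version)
    return {"id": row_id, "set": {k: after[k] for k in after if after[k] != before[k]}}
```

Because both the pre- and post-update states are pure functions of (row, version), incremental-load files and transaction inputs can be generated in any order, on any node, and verified against the target system afterwards.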
International Workshop on Testing Database Systems | 2011
Tilmann Rabl; Meikel Poess
The exponential growth in the amount of data retained by today's systems is fostered by a recent paradigm shift towards cloud computing and the vast deployment of data-hungry applications, such as social media sites. At the same time, systems are capturing more sophisticated data. Running realistic benchmarks to test the performance and robustness of these applications is becoming increasingly difficult because of the amount of data that needs to be generated, the number of systems that need to generate the data, and the complex structure of the data. These three reasons are intrinsically connected: whenever large amounts of data are needed, the generation process needs to be highly parallel, in many cases across systems, and since the structure of the data is becoming more and more complex, its parallel generation is extremely challenging. Over the years there have been many papers about data generators, but there has not been a comprehensive overview of the requirements of today's data generators covering the most complex problems to be solved. In this paper, we present such an overview by analyzing the requirements of today's data generators and either explaining how the problems have been solved in existing data generators or showing why the problems have not been solved yet.
International Conference on Data Engineering | 2014
Prashanth Menon; Tilmann Rabl; Mohammad Sadoghi; Hans-Arno Jacobsen
With the ever-growing size and complexity of enterprise systems, there is a pressing need for more detailed application performance management. Due to the high data rates, traditional database technology cannot sustain the required performance. Alternatives are the more lightweight and, thus, more performant key-value stores. However, these systems tend to sacrifice read performance in order to obtain the desired write throughput by avoiding random disk access in favor of fast sequential accesses. With the advent of SSDs, built upon the philosophy of no moving parts, the boundary between sequential and random access is becoming blurred. This provides a unique opportunity to extend the storage memory hierarchy using SSDs in key-value stores. In this paper, we extensively evaluate the benefits of using SSDs in commercial key-value stores. In particular, we investigate the performance of hybrid SSD-HDD systems and demonstrate the benefits of our SSD caching and our novel dynamic schema model.
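The SSD tier in such hybrid systems essentially acts as a large read cache in front of the HDD, recovering the read performance that write-optimized stores sacrifice. A deliberately simplified stand-in (an in-memory LRU read-through cache; the evaluated stores implement this inside their storage engines, and all names here are invented):

```python
from collections import OrderedDict

class ReadCache:
    """Read-through LRU cache, standing in for an SSD tier in front of HDD."""

    def __init__(self, backing_get, capacity=1024):
        self.backing_get = backing_get  # slow-path lookup (the "HDD" tier)
        self.capacity = capacity        # how many entries fit on the "SSD"
        self.cache = OrderedDict()
        self.hits = self.misses = 0

    def get(self, key):
        if key in self.cache:
            self.hits += 1
            self.cache.move_to_end(key)  # mark as most recently used
            return self.cache[key]
        self.misses += 1
        value = self.backing_get(key)    # fall through to the slow tier
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used
        return value
```

The interesting evaluation questions in the hybrid setting are exactly the ones this sketch glosses over: cache sizing relative to the working set, and whether the SSD's fast random reads justify serving point lookups from it while sequential writes continue to stream to HDD.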
Multimedia Tools and Applications | 2012
Marco Anisetti; Claudio Agostino Ardagna; Valerio Bellandi; Ernesto Damiani; Mario Döller; Florian Stegmaier; Tilmann Rabl; Harald Kosch; Lionel Brunie
Modern mobile devices integrating sensors, such as accelerometers and cameras, are paving the way for high-quality and accurate geolocation solutions based on the information acquired by these sensors and on data collected and managed by GSM/3G networks. In this paper, we present a technique that provides geolocation and mobility prediction for mobile devices, mixing the location information acquired through the GSM/3G infrastructure with the results of landmark matching achieved using the camera integrated in the mobile device. Our geolocation approach is based on an advanced Time-Forwarding algorithm and on a database correlation technique over Received Signal Strength Indication (RSSI) data, and it integrates information produced by a landmark recognition infrastructure to enhance performance in areas with poor signal and low geolocation accuracy. The performance of the algorithm is evaluated on real data from a complex urban environment.
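Database correlation over RSSI data generally means matching a measured signal-strength vector against a database of pre-recorded fingerprints, one per known position. A minimal sketch of that matching step (the distance metric and data layout are illustrative assumptions, not the paper's algorithm):

```python
def locate(measured, fingerprints):
    """Return the position whose stored RSSI fingerprint is closest
    (Euclidean distance over the cells both vectors have observed).

    measured:     {cell_id: rssi_dbm}
    fingerprints: {position: {cell_id: rssi_dbm}}
    """
    def dist(db_vec):
        common = measured.keys() & db_vec.keys()
        if not common:
            return float("inf")  # no shared cells: this position cannot match
        return sum((measured[c] - db_vec[c]) ** 2 for c in common) ** 0.5

    return min(fingerprints, key=lambda pos: dist(fingerprints[pos]))

# Tiny hypothetical fingerprint database covering two grid positions:
db = {(0, 0): {"cellA": -60, "cellB": -80},
      (1, 0): {"cellA": -90, "cellB": -50}}
position = locate({"cellA": -62, "cellB": -78}, db)
```

In poor-signal areas several fingerprints look equally plausible, which is precisely where the paper's landmark recognition and mobility prediction are brought in to disambiguate.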