
Publication


Featured research published by Tim Kiefer.


Data Management on New Hardware | 2013

Scalable frequent itemset mining on many-core processors

Benjamin Schlegel; Tomas Karnagel; Tim Kiefer; Wolfgang Lehner

Frequent-itemset mining is an essential part of the association rule mining process, which has many application areas. It is a computation- and memory-intensive task with many opportunities for optimization. Many efficient sequential and parallel algorithms have been proposed in recent years. Most of the parallel algorithms, however, cannot cope with the huge number of threads provided by large multiprocessor or many-core systems. In this paper, we provide a highly parallel version of the well-known Eclat algorithm. It runs on both multiprocessor systems and many-core coprocessors and scales well up to a very large number of threads (244 in our experiments). To evaluate mcEclat's performance, we conducted many experiments on realistic datasets. mcEclat achieves high speedups of up to 11.5x and 100x on a 12-core multiprocessor system and a 61-core Xeon Phi many-core coprocessor, respectively. Furthermore, mcEclat is competitive with highly optimized existing frequent-itemset mining implementations taken from the FIMI repository.
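The vertical tidset representation at the heart of Eclat can be sketched in a few lines. The code below is an illustrative single-threaded sketch of the Eclat idea (support via tidset intersection, depth-first extension), not the parallel mcEclat implementation from the paper; all names and the example data are hypothetical.

```python
def eclat(transactions, min_support):
    """Frequent-itemset mining via tidset intersection (Eclat-style sketch).

    Each item maps to the set of transaction ids (its tidset); the support
    of an itemset is the size of the intersection of its items' tidsets.
    """
    # Build the vertical layout: item -> tidset
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, prefix_tids, candidates):
        for i, (item, tids) in enumerate(candidates):
            new_tids = prefix_tids & tids if prefix else tids
            if len(new_tids) >= min_support:
                itemset = prefix + (item,)
                frequent[itemset] = len(new_tids)
                # Depth-first: combine only with later items to avoid duplicates
                extend(itemset, new_tids, candidates[i + 1:])

    extend((), set(), sorted(tidsets.items()))
    return frequent
```

The independent depth-first branches of this search are exactly what a many-core version can distribute across threads.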


Utility and Cloud Computing | 2011

Private Table Database Virtualization for DBaaS

Tim Kiefer; Wolfgang Lehner

A growing number of applications store their data in relational databases. Moving database applications to the cloud faces challenges related to flexible and scalable data management. The obvious strategy of hosting legacy database management systems (DBMSs) on virtualized cloud resources leads to suboptimal utilization and performance. However, the layered architecture inside the DBMS allows for virtualization and consolidation above the OS level, which can lead to significantly better system utilization and application performance. Finding an optimal database cloud solution requires finding an assignment from virtual to physical resources as well as configurations for all components. Our goal is to provide a virtualization advisor that aids in setting up and operating a database cloud. By formulating analytic cost, workload, and resource models, the performance of cloud-hosted relational database services can be significantly improved.
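The assignment from virtual to physical resources mentioned above is, at its core, a placement problem. As a generic illustration (not the advisor's actual model), a first-fit-decreasing heuristic packs tenant databases onto hosts; the function and parameter names are hypothetical.

```python
def assign_tenants(demands, capacity):
    """First-fit-decreasing placement of tenant databases onto hosts (sketch).

    demands:  dict tenant -> resource demand (a single abstract unit)
    capacity: identical capacity of each physical host
    Returns the tenant->host placement and the number of hosts opened.
    """
    hosts = []       # remaining free capacity per opened host
    placement = {}
    # Place the largest tenants first; they are the hardest to fit
    for tenant, demand in sorted(demands.items(), key=lambda kv: -kv[1]):
        for i, free in enumerate(hosts):
            if free >= demand:
                hosts[i] -= demand
                placement[tenant] = i
                break
        else:
            hosts.append(capacity - demand)   # open a new host
            placement[tenant] = len(hosts) - 1
    return placement, len(hosts)
```

A real advisor would replace the single abstract demand unit with the cost, workload, and resource models named in the abstract.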


High Performance Distributed Computing | 2010

Pairwise Element Computation with MapReduce

Tim Kiefer; Peter Benjamin Volk; Wolfgang Lehner

In this paper, we present a parallel method to evaluate functions on pairs of elements. It is a challenge to partition the Cartesian product of a set with itself in order to parallelize the function evaluation on all pairs. Our solution uses (a) replication of set elements to allow for partitioning and (b) aggregation of the results gathered for different copies of an element. Based on an execution model with nodes that execute tasks on local data without online communication, we present a generic algorithm and show how it can be implemented with MapReduce. Three different distribution schemes that define the partitioning of the Cartesian product are introduced, compared, and evaluated. Any one of the distribution schemes can be used to derive and implement a specific algorithm for parallel pairwise element computation.
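The replication idea can be sketched with a simple block distribution scheme: split the set into blocks and let each independent task handle one block pair, so every element is replicated to several tasks. This is an illustrative scheme with hypothetical names, not the paper's exact algorithms.

```python
from itertools import combinations, combinations_with_replacement

def pairwise(elements, func, num_blocks=3):
    """Evaluate func on all unordered pairs of distinct elements by blocks.

    Each block pair is an independent task that needs only its two blocks,
    so it maps directly onto a MapReduce job: mappers emit elements keyed
    by their block pairs, reducers evaluate func within each block pair.
    """
    blocks = [elements[i::num_blocks] for i in range(num_blocks)]
    results = {}
    for bi, bj in combinations_with_replacement(range(num_blocks), 2):
        if bi == bj:
            # Within one block: all unordered pairs of distinct elements
            pairs = combinations(blocks[bi], 2)
        else:
            # Across two blocks: the full cross product
            pairs = ((x, y) for x in blocks[bi] for y in blocks[bj])
        for x, y in pairs:
            results[frozenset((x, y))] = func(x, y)
    return results
```

In a real MapReduce setting, the per-copy partial results for an element would be merged in the final aggregation step the paper describes.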


Technology Conference on Performance Evaluation and Benchmarking | 2012

MulTe: A Multi-Tenancy Database Benchmark Framework

Tim Kiefer; Benjamin Schlegel; Wolfgang Lehner

Multi-tenancy in relational databases has been a topic of interest for a couple of years. On the one hand, ever increasing capabilities and capacities of modern hardware easily allow for multiple database applications to share one system. On the other hand, cloud computing leads to outsourcing of many applications to service architectures, which in turn leads to offerings for relational databases in the cloud, as well.


Conference on Information and Knowledge Management | 2009

Generating SQL/XML query and update statements

Matthias Nicola; Tim Kiefer

The XML support in relational databases and the SQL/XML language are still relatively new as compared to purely relational databases and traditional SQL. Today, most database users have a strong relational and SQL background. SQL/XML enables users to perform queries and updates across XML and relational data, but many struggle with writing SQL/XML statements or XQuery update expressions. One reason is the novelty of SQL/XML and of the XQuery expressions that must be included. Another problem is that the tree structure of the XML data may be unknown or difficult to understand for the user. Evolving XML Schemas as well as hybrid XML/relational schemas make it even harder to write SQL/XML statements. Also, legacy applications use SQL but may require access to XML data without costly code changes. Motivated by these challenges, we developed a method to generate SQL/XML query and update statements automatically. The input is either a GUI or a regular SQL statement that uses logical data item names irrespective of their actual location in relational or XML columns in the database. The output is a SQL/XML statement that queries or updates relational and XML data as needed to carry out the original user statement. This relieves the user and simplifies schema evolution and integration. We have prototyped and tested the proposed method on top of DB2 9.5.
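The core of the approach is a mapping from logical data item names to their physical locations, from which SQL/XML text is generated. The sketch below illustrates that idea only; the mapping, names, and generated statement shape are hypothetical, not the paper's actual generator.

```python
# Hypothetical mapping of logical item names to physical locations
MAPPING = {
    "name": {"kind": "relational", "column": "CUSTNAME"},
    "city": {"kind": "xml", "column": "INFO",
             "path": "/customer/addr/city"},
}

def rewrite_select(items, table):
    """Rewrite a logical select list into an SQL/XML select list (sketch)."""
    parts = []
    for item in items:
        m = MAPPING[item]
        if m["kind"] == "relational":
            parts.append(m["column"])
        else:
            # Pull the value out of the XML column with XMLQUERY/XMLCAST
            parts.append(
                f"XMLCAST(XMLQUERY('$d{m['path']}/text()' "
                f"PASSING {m['column']} AS \"d\") "
                f"AS VARCHAR(64)) AS {item.upper()}"
            )
    return f"SELECT {', '.join(parts)} FROM {table}"
```

The user keeps writing against logical names; whether an item lives in a relational or an XML column is resolved entirely by the mapping.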


Technology Conference on Performance Evaluation and Benchmarking | 2014

A Query, a Minute: Evaluating Performance Isolation in Cloud Databases

Tim Kiefer; Hendrik Schön; Dirk Habich; Wolfgang Lehner

Several cloud providers offer relational databases as part of their portfolio. It is, however, not obvious how resource virtualization and sharing, which are inherent to cloud computing, influence the performance and predictability of these cloud databases.


International Conference on Management of Data | 2014

ERIS live: a NUMA-aware in-memory storage engine for tera-scale multiprocessor systems

Tim Kiefer; Thomas Kissinger; Benjamin Schlegel; Dirk Habich; Daniel Molka; Wolfgang Lehner

The ever-growing demand for more computing power forces hardware vendors to put an increasing number of multiprocessors into a single server system, which usually exhibits non-uniform memory access (NUMA). In-memory database systems running on NUMA platforms face several issues, such as increased latency and decreased bandwidth when accessing remote main memory. To cope with these NUMA-related issues, a DBMS has to allow flexible data partitioning and data placement at runtime. In this demonstration, we present ERIS, our NUMA-aware in-memory storage engine. ERIS uses an adaptive partitioning approach that exploits the topology of the underlying NUMA platform and significantly reduces NUMA-related issues. We demonstrate throughput numbers and hardware performance counter evaluations of ERIS and a NUMA-unaware index for different workloads and configurations. All experiments are conducted on a standard server system as well as on a system consisting of 64 multiprocessors, 512 cores, and 8 TB of main memory.
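Adaptive, load-driven data placement can be illustrated with one tiny rebalancing step: migrate the hottest partition of the busiest NUMA node to the least loaded node. This is a generic sketch with hypothetical names, far simpler than ERIS's actual partitioning approach.

```python
from collections import defaultdict

def rebalance(partition_load, node_of, num_nodes):
    """One step of load-driven partition placement (illustrative sketch).

    partition_load: dict partition -> observed access count
    node_of:        dict partition -> current NUMA node
    Returns a new partition->node map after one migration.
    """
    node_load = defaultdict(int)
    for part, load in partition_load.items():
        node_load[node_of[part]] += load
    for n in range(num_nodes):
        node_load.setdefault(n, 0)      # nodes with no partitions count too
    busiest = max(node_load, key=node_load.get)
    coolest = min(node_load, key=node_load.get)
    if busiest == coolest:
        return node_of
    # Migrate the hottest partition on the busiest node
    candidates = [p for p in partition_load if node_of[p] == busiest]
    hottest = max(candidates, key=partition_load.get)
    new_map = dict(node_of)
    new_map[hottest] = coolest
    return new_map
```

An engine would feed this loop with hardware performance counters and also weigh the cost of moving data across the NUMA interconnect.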


Statistical and Scientific Database Management | 2013

pcApriori: scalable apriori for multiprocessor systems

Benjamin Schlegel; Tim Kiefer; Thomas Kissinger; Wolfgang Lehner

Frequent-itemset mining is an important part of data mining. It is a computation- and memory-intensive task and has a large number of scientific and statistical application areas. In many of them, datasets can easily grow to tens or even several hundred gigabytes of data. Hence, efficient algorithms are required to process such amounts of data. In recent years, many efficient sequential mining algorithms have been proposed, which, however, cannot exploit current and future systems that provide large degrees of parallelism. In contrast, the number of parallel frequent-itemset mining algorithms is rather small, and most of them do not scale well as the number of threads is largely increased. In this paper, we present a highly scalable mining algorithm that is based on the well-known Apriori algorithm; it is optimized for processing very large datasets on multiprocessor systems. The key idea of pcApriori is to employ a modified producer-consumer processing scheme, which partitions the data during processing and distributes it to the available threads. We conduct many experiments on large datasets. pcApriori scales almost linearly on our test system comprising 32 cores.
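A producer-consumer counting pass can be sketched as follows: a producer partitions the transactions into chunks and feeds a queue, and worker threads count candidate itemsets in their chunks before the partial counts are merged. The structure and names are illustrative, not pcApriori's actual (modified) scheme.

```python
import queue
import threading
from collections import Counter
from itertools import combinations

def count_candidates(transactions, k, num_workers=4, chunk_size=2):
    """One Apriori-style counting pass in a producer-consumer scheme."""
    work = queue.Queue()
    partials = []

    def worker():
        local = Counter()                 # thread-local partial counts
        while True:
            chunk = work.get()
            if chunk is None:             # poison pill: no more chunks
                break
            for transaction in chunk:
                local.update(combinations(sorted(transaction), k))
        partials.append(local)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    # Producer: partition the data and distribute it to the workers
    for i in range(0, len(transactions), chunk_size):
        work.put(transactions[i:i + chunk_size])
    for _ in threads:
        work.put(None)
    for t in threads:
        t.join()
    # Merge the partial counts into the global candidate counts
    total = Counter()
    for p in partials:
        total += p
    return total
```

Counting into thread-local counters and merging once at the end avoids contention on a shared counter, which is essential for scaling to many threads.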


Data Warehousing and OLAP | 2009

Cardinality estimation in ETL processes

Maik Thiele; Tim Kiefer; Wolfgang Lehner

Cardinality estimation in ETL processes is particularly difficult. Aside from the well-known SQL operators, which are also used in ETL processes, there is a variety of operators without exact counterparts in the relational world. In addition, we find operators that support very specific data integration aspects. For such operators, there are no well-examined statistical approaches for cardinality estimation. Therefore, we propose a black-box approach and estimate the cardinality using a set of statistical models for each operator. We discuss different model granularities and develop an adaptive cardinality estimation framework for ETL processes. We map the abstract model operators to specific statistical learning approaches (regression, decision trees, support vector machines, etc.) and evaluate our cardinality estimations in an extensive experimental study.
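The black-box idea is to learn, per operator, a model from observed input and output cardinalities. As a minimal stand-in for the richer learners named above (regression trees, SVMs), the sketch below fits a simple least-squares line; the class and its names are illustrative.

```python
class OperatorCardinalityModel:
    """Black-box per-operator cardinality model (illustrative sketch).

    Learns output cardinality as a linear function of input cardinality
    from observed operator executions.
    """

    def __init__(self):
        self.slope = 1.0
        self.intercept = 0.0

    def fit(self, in_cards, out_cards):
        # Ordinary least squares for a single feature
        n = len(in_cards)
        mean_x = sum(in_cards) / n
        mean_y = sum(out_cards) / n
        sxx = sum((x - mean_x) ** 2 for x in in_cards)
        sxy = sum((x - mean_x) * (y - mean_y)
                  for x, y in zip(in_cards, out_cards))
        self.slope = sxy / sxx if sxx else 0.0
        self.intercept = mean_y - self.slope * mean_x

    def predict(self, in_card):
        return self.intercept + self.slope * in_card
```

Because each operator is treated as a black box, the same interface works for proprietary integration operators with no relational counterpart; only the training observations differ.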


Privacy in Statistical Databases | 2016

A Rule-Based Approach to Local Anonymization for Exclusivity Handling in Statistical Databases

Jens Albrecht; Marc Fiedler; Tim Kiefer

Statistical databases in general and data warehouses in particular are used to analyze large amounts of business data in pre-defined as well as ad-hoc reports. Operators of statistical databases must ensure that individual sensitive data, e.g., personal data, medical data, or business-critical data, are not revealed to unprivileged users while making use of these data in aggregates. Business rules must be defined and enforced to prevent disclosure. The unsupervised nature of ad-hoc reports, defined by the user and unknown to the database operator upfront, adds to the complexity of keeping data secure. Storing sensitive data in statistical databases demands automated methods to prevent direct or indirect disclosure of such sensitive data.
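Disclosure-prevention rules of this kind are often expressed as checks on each aggregate cell. The sketch below shows two classic rules from the statistical-disclosure literature, a minimum-frequency rule and a dominance rule; the thresholds and names are generic examples, not the paper's specific rule set.

```python
def check_aggregate(values, min_contributors=3, dominance_share=0.8):
    """Return the aggregate, or None if a disclosure rule fires (sketch).

    values: the individual contributions behind one aggregate cell.
    """
    total = sum(values)
    if len(values) < min_contributors:
        return None                     # too few contributors in the cell
    if total and max(values) / total > dominance_share:
        return None                     # one contributor dominates the total
    return total
```

Applying such checks locally, per cell, is what makes the approach workable for ad-hoc reports whose shape the operator cannot anticipate.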

Collaboration


Dive into Tim Kiefer's collaborations.

Top Co-Authors

Wolfgang Lehner, Dresden University of Technology
Benjamin Schlegel, Dresden University of Technology
Dirk Habich, Dresden University of Technology
Thomas Kissinger, Dresden University of Technology
Daniel Molka, Dresden University of Technology
Hannes Voigt, Dresden University of Technology
Hendrik Schön, Dresden University of Technology
Kai Herrmann, Dresden University of Technology
Maik Thiele, Dresden University of Technology