Dirk Habich
Dresden University of Technology
Publications
Featured research published by Dirk Habich.
data management on new hardware | 2012
Thomas Kissinger; Benjamin Schlegel; Dirk Habich; Wolfgang Lehner
Growing main memory capacities and an increasing number of hardware threads in modern server systems have led to fundamental changes in database architectures. Most importantly, query processing is nowadays performed on data that is often completely stored in main memory. Despite high main-memory scan performance, index structures are still important components, but they have to be designed from scratch to cope with the specific characteristics of main memory and to exploit the high degree of parallelism. Current research has mainly focused on adapting block-optimized B+-Trees, but these data structures were designed for secondary memory and involve comprehensive structural maintenance for updates. In this paper, we present the KISS-Tree, a latch-free in-memory index that is optimized for a minimum number of memory accesses and a high number of concurrent updates. More specifically, we aim for the same performance as modern hash-based algorithms while keeping the order-preserving nature of trees. We achieve this by using a prefix tree that incorporates virtual memory management functionality and compression schemes. In our experiments, we evaluate the KISS-Tree on different workloads and hardware platforms and compare the results to existing in-memory indexes. The KISS-Tree offers the highest reported read performance on current architectures, a balanced read/write performance, and a low memory footprint.
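As a rough illustration of the order-preserving prefix-tree idea behind the KISS-Tree, the sketch below descends byte-wise over fixed-width 32-bit keys. The class name and structure are invented for illustration; none of the paper's latch-free updates, virtual-memory functionality, or compression schemes are modeled.

```python
class PrefixTree:
    """Toy byte-wise prefix tree over 32-bit integer keys."""

    def __init__(self):
        self.root = {}

    def put(self, key, value):
        path = key.to_bytes(4, "big")
        node = self.root
        for byte in path[:-1]:               # descend, creating inner nodes
            node = node.setdefault(byte, {})
        node[path[-1]] = ("leaf", value)     # leaves always live at depth 4

    def get(self, key):
        node = self.root
        for byte in key.to_bytes(4, "big"):
            if not isinstance(node, dict) or byte not in node:
                return None
            node = node[byte]
        return node[1]

    def scan(self):
        # In-order traversal yields keys in ascending order, because the
        # byte-wise descent preserves the order of the integer keys.
        def walk(node, prefix):
            if isinstance(node, tuple):
                yield int.from_bytes(bytes(prefix), "big"), node[1]
                return
            for b in sorted(node):
                yield from walk(node[b], prefix + [b])
        yield from walk(self.root, [])

t = PrefixTree()
for k in [42, 7, 1000]:
    t.put(k, str(k))
print([k for k, _ in t.scan()])   # keys come back in order: [7, 42, 1000]
```

The order-preserving scan is what hash-based indexes give up; the paper's contribution is getting hash-like point performance while keeping it.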
international conference on data engineering | 2009
Peter Benjamin Volk; Frank Rosenthal; Martin Hahmann; Dirk Habich; Wolfgang Lehner
The topic of managing uncertain data has been explored in many ways, and different methodologies for data storage and query processing have been proposed. As the availability of management systems grows, research on analytics over uncertain data is gaining in importance. Similar to the challenges faced in data management, data mining algorithms for uncertain data also suffer substantial performance degradation compared to their counterparts on certain data. To overcome this degradation, the MCDB approach was developed for uncertain data management based on the possible-worlds scenario. Since this methodology shows significant performance and scalability gains, we adopt it for mining uncertain data. In this paper, we introduce a clustering methodology for uncertain data and illustrate current issues with this approach within the field of clustering uncertain data.
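The possible-worlds idea adapted from MCDB can be sketched as follows: materialize sample worlds from each uncertain value's distribution, cluster every world independently, and aggregate the per-world results. The tiny 1-D k-means and the averaging rule below are illustrative stand-ins, not the paper's algorithm.

```python
import random

def kmeans_1d(points, k=2, iters=10):
    # seed centers with evenly spaced sorted points
    step = max(1, len(points) // k)
    centers = sorted(points)[::step][:k]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda j: abs(p - centers[j]))
            groups[i].append(p)
        centers = [sum(g) / len(g) if g else c
                   for g, c in zip(groups, centers)]
    return sorted(centers)

def possible_world_centers(uncertain, worlds=50, seed=0):
    rng = random.Random(seed)
    per_world = []
    for _ in range(worlds):
        # materialize one possible world by sampling every distribution
        world = [rng.gauss(mu, sigma) for mu, sigma in uncertain]
        per_world.append(kmeans_1d(world))
    # aggregate: average the sorted centers across all worlds
    return [sum(c[i] for c in per_world) / worlds for i in range(2)]

# each uncertain value is (mean, std); two groups near 0 and near 5
data = [(0.0, 0.1), (0.2, 0.1), (5.0, 0.1), (5.3, 0.1)]
lo, hi = possible_world_centers(data)
```

Each world is a certain data set, so any off-the-shelf clustering algorithm can be reused unchanged inside the loop; the cost is running it once per sampled world.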
very large data bases | 2003
Alexander Hinneburg; Dirk Habich; Wolfgang Lehner
Database support for data mining has become an important research topic. Especially for large high-dimensional data volumes, comprehensive support from the database side is necessary. In this paper, we first identify the data-intensive subproblem of aggregating high-dimensional data in all possible low-dimensional projections (for instance, estimating low-dimensional histograms), which occurs in several established data mining techniques. Second, we show that existing OLAP SQL extensions are insufficient for high-dimensional data and propose a new SQL operator that seamlessly fits into the set of existing OLAP Group By operators. Third, we propose efficient implementations of the operator that take the limited resources of main memory into account. We demonstrate on a number of real and synthetic data sets that, for the identified subproblem, our new implementations yield a large speedup (up to a factor of 10) over existing methods built into commercially available database systems.
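The aggregation the proposed operator expresses can be mimicked outside the database: count tuples in every low-dimensional projection of a relation. The columns and data below are made up, and this plain-Python version ignores the memory-aware implementations that are the paper's actual contribution.

```python
from collections import Counter
from itertools import combinations

# A tiny relation with three dimensions; values are illustrative.
rows = [
    {"a": 1, "b": 0, "c": 1},
    {"a": 1, "b": 1, "c": 1},
    {"a": 0, "b": 0, "c": 1},
]

def project_counts(rows, max_dims=2):
    """Histogram every projection of up to max_dims columns."""
    dims = sorted(rows[0])
    result = {}
    for d in range(1, max_dims + 1):
        for cols in combinations(dims, d):
            result[cols] = Counter(tuple(r[c] for c in cols) for r in rows)
    return result

counts = project_counts(rows)
print(counts[("a",)])   # Counter({(1,): 2, (0,): 1})
```

For n dimensions this enumerates sum of C(n, d) group-bys, which is exactly why a single shared-scan operator with careful main-memory budgeting beats issuing one GROUP BY query per projection.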
international conference on data engineering | 2008
Matthias Böhm; Dirk Habich; Wolfgang Lehner; Uwe Wloka
The optimization of integration processes between heterogeneous data sources remains an open challenge. A first step towards adequate techniques was the specification of a universal benchmark for integration systems, DIPBench, which allows solutions to be compared under controlled conditions and helps generate interest in this research area. However, we see the need for a sophisticated toolsuite that minimizes the effort of benchmark execution. This demo illustrates the use of the DIPBench toolsuite. We show the macro-architecture as well as the micro-architecture of each tool. Furthermore, we present the first reference benchmark implementation, using a federated DBMS, and discuss the impact of the defined benchmark scale factors. Finally, we give guidance on how to benchmark other integration systems and how to extend the toolsuite with new distribution functions or other functionality.
data management on new hardware | 2013
Tomas Karnagel; Dirk Habich; Benjamin Schlegel; Wolfgang Lehner
Upcoming processors combine different computing units in a tightly coupled design with a unified shared memory hierarchy. This tight coupling leads to novel properties with regard to cooperation and interaction. This paper demonstrates the advantages of such processors for a stream-join operator as an important data-intensive example. In detail, we propose our HELLS-Join approach, which employs all heterogeneous devices by offloading each part of the algorithm to the most appropriate device. Our HELLS-Join performs better than CPU stream joins, allowing wider time windows, higher stream frequencies, and more streams to be joined than before.
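A windowed symmetric hash join is the classic pattern behind such stream-join operators. The sketch below shows only that join logic, with hypothetical names; it does not model HELLS-Join's actual contribution of placing algorithm phases on heterogeneous devices.

```python
from collections import deque

class StreamJoin:
    """Symmetric hash join over two streams with count-based windows."""

    def __init__(self, window):
        self.window = window                  # max tuples kept per side
        self.state = {0: deque(), 1: deque()}
        self.index = {0: {}, 1: {}}

    def insert(self, side, key, value):
        out = []
        # Probe the other side's hash index for join partners.
        for other in self.index[1 - side].get(key, []):
            out.append((value, other) if side == 0 else (other, value))
        # Add the tuple to this side's window and index.
        self.state[side].append((key, value))
        self.index[side].setdefault(key, []).append(value)
        # Evict the oldest tuple once the window overflows.
        if len(self.state[side]) > self.window:
            old_key, old_val = self.state[side].popleft()
            self.index[side][old_key].remove(old_val)
        return out

j = StreamJoin(window=2)
j.insert(0, "x", "left1")
print(j.insert(1, "x", "right1"))   # [('left1', 'right1')]
```

Probing, window maintenance, and result materialization are separable steps, which is what makes it plausible to run each on whichever device suits it best.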
international conference on cloud computing | 2010
Dirk Habich; Wolfgang Lehner; Sebastian Richly; Uwe Assmann
The role of data analytics is growing in several application domains as a way to cope with large amounts of captured data. Generally, data analytics involves data-intensive processes whose efficient execution is a challenging task. Each process consists of a collection of related structured activities in which huge data sets have to be exchanged between several loosely coupled services. Implementing such processes in a service-oriented environment offers some advantages, but the efficient realization of data flows is difficult. We therefore propose a novel SOA-aware approach with a special focus on the data flow. The tight interaction of new cloud technologies with SOA technologies enables us to optimize the execution of data-intensive service applications by reducing data exchange tasks to a minimum. Fundamentally, our core concept for optimizing the data flows is the use of data clouds. Moreover, we can exploit our approach to derive efficient process execution strategies with respect to different optimization objectives for the data flows.
acm symposium on applied computing | 2006
Dirk Habich; Thomas Wächter; Wolfgang Lehner; Christian Pilarsky
In the context of genome research, the method of gene expression analysis has been used for several years. Related microarray experiments are conducted all over the world, and consequently, a vast amount of microarray data sets are produced. Having access to this variety of repositories, researchers would like to incorporate this data in their analyses to increase the statistical significance of their results. In this paper, we present a new two-phase clustering strategy which is based on the combination of local clustering results to obtain a global clustering. The advantage of such a technique is that each microarray data set can be normalized and clustered separately. The set of relevant local clustering results is then used to calculate the global clustering result. Furthermore, we present an approach based on technical as well as biological quality measures to determine weighting factors that quantify the proportion of each local result within the global result. The better the attested quality of the local results, the stronger their impact on the global result.
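The two-phase idea can be sketched as follows: cluster each data set locally, then combine the local results into a global one, with quality-based weights controlling each local result's influence. The merge rule and weights below are illustrative assumptions, not the paper's technical or biological quality measures.

```python
def combine_local(local_results, merge_dist=1.0):
    """Combine (centroids, quality_weight) pairs into global centroids.

    Local centroids closer than merge_dist are merged; each contributes
    proportionally to its quality weight.
    """
    merged = []  # entries are [weighted_sum, total_weight]
    for centroids, w in local_results:
        for c in centroids:
            for m in merged:
                if abs(m[0] / m[1] - c) <= merge_dist:
                    m[0] += w * c
                    m[1] += w
                    break
            else:
                merged.append([w * c, w])
    return sorted(m[0] / m[1] for m in merged)

# Two labs found clusters near 0 and 5; lab 1 has higher attested quality,
# so the global centers stay closer to its local results.
print(combine_local([([0.0, 5.0], 0.9), ([0.4, 5.2], 0.3)]))
```

Because each data set is normalized and clustered separately, only the compact local results (centroids plus weights) need to be exchanged, never the raw microarray data.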
ieee congress on services | 2008
Dirk Habich; Sebastian Richly; Andreas Ruempel; Wolfgang Buecke; Steffen Preissler
Service-oriented architectures (SOA) have revolutionized software engineering in recent years, and many domains have begun to adopt this architectural style. However, to efficiently support different domains from an SOA perspective, several extensions have to be included, e.g., concepts for efficient data transfer or runtime adaptation of processes. On the one hand, our Open Service Process Platform 2.0 is a central SOA infrastructure that includes several such extensions; on the other hand, it can serve as the basis for developing further extensions. The unique features of our platform are: (i) easy orchestration and execution of processes, (ii) arbitrary extensibility with regard to simple specialization for various domains, (iii) a central infrastructure within an organization, and (iv) full accessibility through Web 2.0 technologies.
international conference on data mining | 2007
Dirk Habich; Peter Benjamin Volk; Wolfgang Lehner; Ralf Dittmann; Clemens Utzny
Manufacturing process development is under constant pressure to achieve a good yield for stable processes. The development of new technologies, especially in the field of photomask and semiconductor development, is at its physical limits. In this area, data, e.g. sensor data, has to be collected and analyzed for each process in order to ensure process quality. With increasing complexity of manufacturing processes, the volume of data that has to be evaluated rises accordingly. The complexity and data volume exceed the possibility of a manual data analysis. At this point, data mining techniques become interesting. The application of current techniques is complex because most of the data is captured with sensor measurement tools. Therefore, every measured value contains a specific error. In this paper we propose an error-aware extension of the density-based algorithm DBSCAN. Furthermore, we present some quality measures which could be utilized for further interpretation of the determined clustering results. With this new clustering algorithm, we can ensure that masks are classified into the correct cluster with respect to the measurement errors, thus ensuring a more likely correlation between the masks.
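A minimal sketch of the error-aware idea, under a simplifying assumption: each one-dimensional value carries a measurement error, and two points count as neighbors if their error bounds bring them within eps of each other. This conservative rule is an illustration, not the paper's actual extension of DBSCAN.

```python
def error_aware_dbscan(points, eps, min_pts):
    """DBSCAN over (value, error) pairs; errors widen the eps-neighborhood."""

    def neighbors(i):
        v, e = points[i]
        return [j for j, (w, f) in enumerate(points)
                if abs(v - w) <= eps + e + f]

    labels = [None] * len(points)      # None = unvisited, -1 = noise
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1             # not dense enough: noise (for now)
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:                   # expand the cluster from core points
            j = queue.pop()
            if labels[j] in (None, -1):
                if labels[j] is None and len(neighbors(j)) >= min_pts:
                    queue.extend(neighbors(j))
                labels[j] = cluster
    return labels

# Three measurements near zero (one only reachable via the error bounds)
# and one clear outlier.
pts = [(0.0, 0.05), (0.3, 0.05), (0.35, 0.0), (5.0, 0.1)]
print(error_aware_dbscan(pts, eps=0.2, min_pts=2))   # [1, 1, 1, -1]
```

With plain DBSCAN (all errors zero) the third point would be split off at this eps; accounting for the measurement error keeps the mask measurements of one group in a single cluster.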
international conference on data mining | 2007
Wael Emara; Mehmed Kantardzic; Marcel Karnstedt; Kai-Uwe Sattler; Dirk Habich; Wolfgang Lehner
In this paper we propose an approach for incremental learning of semi-supervised SVMs. The proposed approach makes use of the locality of radial basis function kernels to do local and incremental training of semi-supervised support vector machines. The algorithm introduces a sequential minimal optimization based implementation of the branch and bound technique for training semi-supervised SVM problems. The novelty of our approach lies in the