Ralph Krieger | Researchain

Publication

Featured research published by Ralph Krieger.

international conference on data mining | 2007

DUSC: Dimensionality Unbiased Subspace Clustering

Ira Assent; Ralph Krieger; Emmanuel Müller; Thomas Seidl

To gain insight into today's large data resources, data mining provides automatic aggregation techniques. Clustering aims at grouping data such that objects within groups are similar while objects in different groups are dissimilar. In scenarios with many attributes or with noise, clusters are often hidden in subspaces of the data and do not show up in the full dimensional space. For these applications, subspace clustering methods aim at detecting clusters in any subspace. Existing subspace clustering approaches fall prey to an effect we call dimensionality bias. As the dimensionality of subspaces varies, approaches which do not take this effect into account fail to separate clusters from noise. We give a formal definition of dimensionality bias and analyze its consequences for subspace clustering. A dimensionality unbiased subspace clustering (DUSC) definition based on statistical foundations is proposed. In thorough experiments on synthetic and real world data, we show that our approach outperforms existing subspace clustering algorithms.
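One way to see why dimensionality matters for density thresholds is sketched below. It assumes noise that is uniformly distributed in the unit hypercube, which is an illustrative assumption and not the paper's formal definition of dimensionality bias. Under that assumption, the expected number of objects in a hypercube of fixed side length shrinks exponentially with the number of dimensions, so a fixed count threshold means something different in every subspace. The function names are hypothetical.

```python
def expected_count(n_objects, side, dims):
    """Expected number of uniformly distributed points that fall into a
    hypercube of the given side length inside the unit cube [0, 1]^dims.
    Assumes 0 < side <= 1."""
    return n_objects * side ** dims


def density_ratio(observed, n_objects, side, dims):
    """Observed count relative to the uniform expectation. Values well
    above 1 suggest a region denser than uniform noise, regardless of
    the dimensionality of the subspace."""
    return observed / expected_count(n_objects, side, dims)


if __name__ == "__main__":
    # The same region size holds fewer and fewer noise objects as the
    # subspace dimensionality grows.
    for d in (1, 2, 5, 10):
        print(d, expected_count(10_000, 0.5, d))
```

Comparing observed counts against such an expectation, rather than against a fixed threshold, is one possible way to make densities comparable across subspaces of different dimensionality.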

international conference on data mining | 2008

INSCY: Indexing Subspace Clusters with In-Process-Removal of Redundancy

Ira Assent; Ralph Krieger; Emmanuel Müller; Thomas Seidl

Subspace clustering aims at detecting clusters in any subspace projection of a high dimensional space. As the number of projections is exponential in the number of dimensions, efficiency is crucial. Moreover, the resulting subspace clusters are often highly redundant, i.e. many clusters are detected multiple times in several projections. We propose a novel index for efficient subspace clustering in a novel depth-first processing with in-process-removal of redundant clusters for better pruning. Thorough experiments on real and synthetic data show that INSCY yields substantial efficiency and quality improvements.

international conference on data mining | 2009

Relevant Subspace Clustering: Mining the Most Interesting Non-redundant Concepts in High Dimensional Data

Emmanuel Müller; Ira Assent; Stephan Günnemann; Ralph Krieger; Thomas Seidl

Subspace clustering aims at detecting clusters in any subspace projection of a high dimensional space. As the number of possible subspace projections is exponential in the number of dimensions, the result is often tremendously large. Recent approaches fail to reduce results to relevant subspace clusters. Their results are typically highly redundant, i.e. many clusters are detected multiple times in several projections. In this work, we propose a novel model for relevant subspace clustering (RESCU). We present a global optimization which detects the most interesting non-redundant subspace clusters. We prove that computation of this model is NP-hard. For RESCU, we propose an approximative solution that shows high accuracy with respect to our relevance model. Thorough experiments on synthetic and real world data show that RESCU successfully reduces the result to manageable sizes. It reliably achieves top clustering quality while competing approaches show greatly varying performance.

extending database technology | 2008

The TS-tree: efficient time series search and retrieval

Ira Assent; Ralph Krieger; Farzad Afschari; Thomas Seidl

Continuous growth in sensor data and other temporal data increases the importance of retrieval and similarity search in time series data. Efficient time series query processing is crucial for interactive applications. Existing multidimensional indexes like the R-tree provide efficient querying for only relatively few dimensions. Time series are typically long, which corresponds to extremely high dimensional data in multidimensional indexes. Due to massive overlap of index descriptors, multidimensional indexes degenerate for high dimensions and access the entire data by random I/O. Consequently, the efficiency benefits of indexing are lost. In this paper, we propose the TS-tree (time series tree), an index structure for efficient time series retrieval and similarity search. Exploiting inherent properties of time series quantization and dimensionality reduction, the TS-tree indexes high-dimensional data in an overlap-free manner. During query processing, powerful pruning via quantized separator and metadata information greatly reduces the number of pages which have to be accessed, resulting in substantial speed-up. In thorough experiments on synthetic and real world time series data, we demonstrate that our TS-tree outperforms existing approaches like the R*-tree or the quantized A-tree.

SIGKDD Explorations | 2007

VISA: visual subspace clustering analysis

Ira Assent; Ralph Krieger; Emmanuel Müller; Thomas Seidl

To gain insight into today's large data resources, data mining extracts interesting patterns. To generate knowledge from patterns and benefit from human cognitive abilities, meaningful visualization of patterns is crucial. Clustering is a data mining technique that aims at grouping data into patterns based on mutual (dis)similarity. For high dimensional data, subspace clustering searches for patterns in any subspace of the attributes, as patterns are typically obscured by many irrelevant attributes in the full space. For visual analysis of subspace clusters, their comparability has to be ensured. Existing subspace clustering approaches, however, lack interactive visualization and show bias with respect to the dimensionality of subspaces. In this work, dimensionality unbiased subspace clustering and a novel distance function for subspace clusters are proposed. We suggest two visualization techniques that allow users to browse the entire subspace clustering, to zoom into individual objects, and to analyze subspace cluster characteristics in-depth. Bracketing of different parameter settings enables users to immediately see the effect of parameters on their data and hence to choose the best clustering result for further analysis. Feeding the user's analysis back to the subspace clustering algorithm directly improves the subspace clustering. We demonstrate our visualization techniques on real world data and confirm results through additional accuracy measurements and comparison with existing subspace clustering algorithms.

very large data bases | 2009

Anticipatory DTW for efficient similarity search in time series databases

Ira Assent; Marc Wichterich; Ralph Krieger; Hardy Kremer; Thomas Seidl

Time series arise in many different applications in the form of sensor data, stock data, videos, and other time-related information. Analysis of this data typically requires searching for similar time series in a database. Dynamic Time Warping (DTW) is a widely used high-quality distance measure for time series. As DTW is computationally expensive, efficient algorithms for fast computation are crucial. In this paper, we propose a novel filter-and-refine DTW algorithm called Anticipatory DTW. Existing algorithms aim at efficiently finding similar time series by filtering the database and computing the DTW in the refinement step. Unlike these algorithms, our approach exploits previously unused information from the filter step during the refinement, allowing for faster rejection of false candidates. We characterize a class of applicable filters for our approach, which comprises state-of-the-art lower bounds of the DTW. Our novel anticipatory pruning incurs hardly any overhead and no false dismissals. We demonstrate substantial efficiency improvements in thorough experiments on synthetic and real world time series databases and show that our technique is highly scalable to multivariate, long time series and wide DTW bands.
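The conventional filter-and-refine scheme that the abstract contrasts itself with can be sketched as follows. This is a minimal illustration and not the Anticipatory DTW algorithm: it uses the well-known LB_Keogh lower bound as the filter and a band-constrained DTW as the refinement, and it does not reuse filter information during refinement. It assumes univariate series of equal length and squared point-wise costs; all function names are hypothetical.

```python
import math


def dtw(a, b, band):
    """Band-constrained DTW (Sakoe-Chiba band of half-width `band`)
    with squared point-wise costs and no final square root."""
    n, m = len(a), len(b)
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0  # empty prefixes align at zero cost
    for i in range(1, n + 1):
        cur = [math.inf] * (m + 1)
        for j in range(max(1, i - band), min(m, i + band) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            cur[j] = cost + min(prev[j], prev[j - 1], cur[j - 1])
        prev = cur
    return prev[m]


def lb_keogh(query, cand, band):
    """LB_Keogh lower bound of the banded DTW above: distance of the
    candidate to the upper/lower envelope of the query."""
    n = len(query)
    total = 0.0
    for i, c in enumerate(cand):
        window = query[max(0, i - band):min(n, i + band + 1)]
        upper, lower = max(window), min(window)
        if c > upper:
            total += (c - upper) ** 2
        elif c < lower:
            total += (lower - c) ** 2
    return total


def range_query(query, database, eps, band):
    """Return indices of series within DTW distance eps of the query,
    plus the number of candidates rejected by the filter alone."""
    hits, pruned = [], 0
    for idx, cand in enumerate(database):
        if lb_keogh(query, cand, band) > eps:  # filter step
            pruned += 1
            continue
        if dtw(query, cand, band) <= eps:      # refinement step
            hits.append(idx)
    return hits, pruned
```

Because the filter never exceeds the true DTW distance, pruning on it cannot cause false dismissals; the refinement step only removes false candidates that the filter let through.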

extending database technology | 2009

Indexing density models for incremental learning and anytime classification on data streams

Thomas Seidl; Ira Assent; Philipp Kranen; Ralph Krieger; Jennifer Herrmann

Classification of streaming data faces three basic challenges: it has to deal with huge amounts of data, the varying time between two stream data items must be used as well as possible (anytime classification), and additional training data must be incrementally learned (anytime learning) for applying the classifier consistently to fast data streams. In this work, we propose a novel index-based technique that can handle all three of the above challenges using the established Bayes classifier on effective kernel density estimators. Our novel Bayes tree automatically generates a hierarchy of mixture densities, adapted efficiently to the individual object to be classified, that represent kernel density estimators at successively coarser levels. Our probability density queries together with novel classification improvement strategies provide the necessary information for very effective classification at any point of interruption. Moreover, we propose a novel evaluation method for anytime classification using Poisson streams and demonstrate the anytime learning performance of the Bayes tree.
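The building block named in the abstract, a Bayes classifier on kernel density estimators, can be sketched as follows. This is a plain, non-indexed version under stated assumptions: isotropic Gaussian kernels with a single fixed bandwidth, and class priors estimated from class frequencies. It does not implement the Bayes tree, its mixture hierarchy, or any anytime behavior; the function names and the default bandwidth are hypothetical.

```python
import math


def kde_density(x, samples, bandwidth):
    """Gaussian kernel density estimate at point x, using one isotropic
    kernel of the given bandwidth per training sample."""
    dims = len(x)
    norm = (2 * math.pi * bandwidth ** 2) ** (dims / 2)
    total = 0.0
    for s in samples:
        sq_dist = sum((xi - si) ** 2 for xi, si in zip(x, s))
        total += math.exp(-sq_dist / (2 * bandwidth ** 2))
    return total / (len(samples) * norm)


def classify(x, training, bandwidth=0.5):
    """Return the label maximizing prior * class-conditional density.
    `training` maps each label to a list of equally sized tuples."""
    n_total = sum(len(samples) for samples in training.values())
    best_label, best_score = None, -1.0
    for label, samples in training.items():
        prior = len(samples) / n_total
        score = prior * kde_density(x, samples, bandwidth)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Every classification in this sketch touches all training samples, which is the cost that an index over coarser mixture densities is meant to reduce when time per stream item is limited.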

knowledge discovery and data mining | 2008

Morpheus: interactive exploration of subspace clustering

Emmanuel Müller; Ira Assent; Ralph Krieger; Timm Jansen; Thomas Seidl

Data mining techniques extract interesting patterns out of large data resources. Meaningful visualization and interactive exploration of patterns are crucial for knowledge discovery. Visualization techniques exist for traditional clustering in low dimensional spaces. In high dimensional data, clusters typically only exist in subspace projections. This subspace clustering, however, lacks interactive visualization tools. Challenges arise from typically large result sets in different subspace projections that hinder comparability, visualization and understandability. In this work, we describe Morpheus, a tool that supports the knowledge discovery process through visualization and interactive exploration of subspace clusterings. Users may browse an overview of the entire subspace clustering, analyze subspace cluster characteristics in-depth and zoom into object groupings. Bracketing of different parameter settings enables users to immediately see the effects of parameters and to provide feedback to further improve the subspace clustering. Furthermore, Morpheus may serve as a teaching and exploration tool for the data mining community to visually assess different subspace clustering paradigms.

conference on information and knowledge management | 2008

EDSC: efficient density-based subspace clustering

Ira Assent; Ralph Krieger; Emmanuel Müller; Thomas Seidl

Subspace clustering mines clusters hidden in subspaces of high-dimensional data sets. Density-based approaches have been shown to successfully mine clusters of arbitrary shape even in the presence of noise in full space clustering. Exhaustive search of all density-based subspace clusters, however, results in infeasible runtimes for large high-dimensional data sets. This is due to the exponential number of possible subspace projections in addition to the high computational cost of density-based clustering. In this paper, we propose lossless efficient detection of density-based subspace clusters. In our EDSC (efficient density-based subspace clustering) algorithm we reduce the high computational cost of density-based subspace clustering by a complete multistep filter-and-refine algorithm. Our first hypercube filter step avoids exhaustive search of all regions in all subspaces by enclosing potentially density-based clusters in hypercubes. Our second filter step provides additional pruning based on a density monotonicity property. In the final refinement step, the exact unbiased density-based subspace clustering result is detected. As we prove that pruning is lossless in both filter steps, we guarantee completeness of the result. In thorough experiments on synthetic and real world data sets, we demonstrate substantial efficiency gains. Our lossless EDSC approach outperforms existing density-based subspace clustering algorithms by orders of magnitude.

very large data bases | 2003

Efficient Structure Oriented Storage of XML Documents Using ORDBMS

Alexander Kuckelberg; Ralph Krieger

In this paper we present different storage approaches for XML documents: the document centered, the data centered and the structure centered approach. We then focus on the structure centered approach and introduce an abstract view on XML documents using tree graphs. To make these tree graphs persistent, different storage techniques are described and evaluated with respect to the creation and retrieval of complete documents. Measurement results are shown and briefly discussed for the creation and retrieval of different (complete) XML documents. Moreover, we briefly introduce the partial mapping extension, which helps to optimize the generic structure based storage approach for specific documents whose structure is known in advance. The results presented come from the ongoing implementation of a high performance generic document server with an analytic decision support agent.
