Hardy Kremer
RWTH Aachen University
Publication
Featured research published by Hardy Kremer.
knowledge discovery and data mining | 2011
Hardy Kremer; Philipp Kranen; Timm Jansen; Thomas Seidl; Albert Bifet; Geoff Holmes; Bernhard Pfahringer
Due to the ever-growing presence of data streams, there has been a considerable amount of research on stream mining algorithms. While many algorithms have been introduced that tackle the problem of clustering on evolving data streams, hardly any attention has been paid to appropriate evaluation measures. Measures developed for static scenarios, namely structural measures and ground-truth-based measures, cannot correctly reflect errors attributable to emerging, splitting, or moving clusters. These situations are inherent to the streaming context due to the dynamic changes in the data distribution. In this paper we develop a novel evaluation measure for stream clustering called Cluster Mapping Measure (CMM). CMM effectively indicates different types of errors by taking the important properties of evolving data streams into account. We show in extensive experiments on real and synthetic data that CMM is a robust measure for stream clustering evaluation.
very large data bases | 2009
Ira Assent; Marc Wichterich; Ralph Krieger; Hardy Kremer; Thomas Seidl
Time series arise in many different applications in the form of sensor data, stock data, videos, and other time-related information. Analysis of this data typically requires searching for similar time series in a database. Dynamic Time Warping (DTW) is a widely used high-quality distance measure for time series. As DTW is computationally expensive, efficient algorithms for fast computation are crucial. In this paper, we propose a novel filter-and-refine DTW algorithm called Anticipatory DTW. Existing algorithms aim at efficiently finding similar time series by filtering the database and computing the DTW in the refinement step. Unlike these algorithms, our approach exploits previously unused information from the filter step during the refinement, allowing for faster rejection of false candidates. We characterize a class of applicable filters for our approach, which comprises state-of-the-art lower bounds of the DTW. Our novel anticipatory pruning incurs hardly any overhead and no false dismissals. We demonstrate substantial efficiency improvements in thorough experiments on synthetic and real-world time series databases and show that our technique is highly scalable to multivariate, long time series and wide DTW bands.
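The generic filter-and-refine scheme that the abstract builds on can be sketched as follows. This is a minimal illustration, not the Anticipatory DTW algorithm itself: the function names (`dtw_band`, `lower_bound_first_last`, `range_query`), the squared-difference point cost, and the very simple first/last-point lower bound are all assumptions chosen for brevity. The sketch only shows why a lower-bounding filter causes no false dismissals: a candidate is discarded only when a value that never exceeds its true DTW distance is already above the query threshold.

```python
import math

def dtw_band(a, b, band):
    """Banded DTW (squared point cost) between non-empty sequences a and b."""
    n, m = len(a), len(b)
    band = max(band, abs(n - m))  # the band must be wide enough to reach (n, m)
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        for j in range(max(1, i - band), min(m, i + band) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            curr[j] = cost + min(prev[j], prev[j - 1], curr[j - 1])
        prev = curr
    return prev[m]

def lower_bound_first_last(a, b):
    """Hypothetical simple filter: every warping path matches the first
    points and the last points, so their costs lower-bound the DTW."""
    first = (a[0] - b[0]) ** 2
    if len(a) == 1 and len(b) == 1:
        return first  # first and last cell coincide; count it once
    return first + (a[-1] - b[-1]) ** 2

def range_query(query, database, eps, band):
    """Return indices of all series whose banded DTW to query is <= eps."""
    result = []
    for idx, series in enumerate(database):
        # Filter step: cheap lower bound; safe to skip if it exceeds eps.
        if lower_bound_first_last(query, series) > eps:
            continue
        # Refinement step: exact (banded) DTW on the surviving candidates.
        if dtw_band(query, series, band) <= eps:
            result.append(idx)
    return result

query = [0, 1, 2]
database = [[0, 1, 2], [0, 1, 1, 2], [5, 6, 7], [0, 3, 2]]
matches = range_query(query, database, eps=1.0, band=1)
```

In this toy database, the third series is rejected by the filter alone, while the fourth passes the filter and is rejected only after the full DTW computation in the refinement step.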
european conference on machine learning | 2011
Albert Bifet; Geoff Holmes; Bernhard Pfahringer; Jesse Read; Philipp Kranen; Hardy Kremer; Timm Jansen; Thomas Seidl
Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams. MOA is designed to deal with the challenging problems of scaling up the implementation of state-of-the-art algorithms to real-world dataset sizes and of making algorithms comparable in benchmark streaming settings. It contains a collection of offline and online algorithms for classification, clustering and graph mining as well as tools for evaluation. For researchers, the framework yields insights into advantages and disadvantages of different approaches and allows for the creation of benchmark streaming data sets through stored, shared and repeatable settings for the data feeds. Practitioners can use the framework to easily compare algorithms and apply them to real-world data sets and settings. MOA supports bi-directional interaction with WEKA, the Waikato Environment for Knowledge Analysis. Besides providing algorithms and measures for evaluation and comparison, MOA is easily extensible with new contributions and allows for the creation of benchmark scenarios.
international conference on data mining | 2010
Philipp Kranen; Hardy Kremer; Timm Jansen; Thomas Seidl; Albert Bifet; Geoff Holmes; Bernhard Pfahringer
In today's applications, evolving data streams are ubiquitous. Stream clustering algorithms were introduced to gain useful knowledge from these streams in real time. The quality of the obtained clusterings, i.e. how well they reflect the data, can be assessed by evaluation measures. A multitude of stream clustering algorithms and evaluation measures for clusterings have been introduced in the literature; however, until now there has been no general tool for a direct comparison of the different algorithms or the evaluation measures. In our demo, we present a novel experimental framework for both tasks. It offers the means for extensive evaluation and visualization and is an extension of the Massive Online Analysis (MOA) software environment released under the GNU GPL.
extending database technology | 2011
Stephan Günnemann; Hardy Kremer; Dominik Lenhard; Thomas Seidl
Fast similarity search in high-dimensional feature spaces is crucial in today's applications. Since the performance of traditional index structures degrades with increasing dimensionality, concepts were developed to cope with this curse of dimensionality. Most of the existing concepts exploit global correlations between dimensions to reduce the dimensionality of the feature space. In high-dimensional data, however, correlations are often locally constrained to a subset of the data and every object can participate in several of these correlations. Accordingly, discarding the same set of dimensions for each object based on global correlations and ignoring the different correlations of single objects leads to a significant loss of information. These aspects are relevant due to the direct correspondence between the degree of information preserved and the achievable query performance. We introduce a novel main-memory index structure with increased information content for each single object compared to a global approach. This is achieved by using individual dimensions for each data object by applying the method of subspace clustering. The structure of our index is based on a multi-representation of objects reflecting their multiple correlations; that is, besides the general increase of information per object, we provide several individual representations for each single data object. These multiple views correspond to different local reductions per object and enable more effective pruning. In thorough experiments on real and synthetic data, we demonstrate that our novel solution achieves low query times and outperforms existing approaches designed for high-dimensional data.
database systems for advanced applications | 2012
Philipp Kranen; Hardy Kremer; Timm Jansen; Thomas Seidl; Albert Bifet; Geoff Holmes; Bernhard Pfahringer; Jesse Read
Massive Online Analysis (MOA) is a software framework that provides algorithms and evaluation methods for mining tasks on evolving data streams. In addition to supervised and unsupervised learning, MOA has recently been extended to support multi-label classification and graph mining. In this demonstrator we describe the main features of MOA and present the newly added methods for outlier detection on streaming data. Algorithms can be compared to established baseline methods such as LOF and ABOD using standard ranking measures including the Spearman rank coefficient and the AUC measure. MOA is an open-source project, and videos as well as tutorials are publicly available on the MOA homepage.
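As a rough illustration of how a ranking measure such as AUC scores an outlier detector, the sketch below computes AUC directly from its pairwise definition: the fraction of (outlier, inlier) pairs in which the outlier receives the higher outlier score, with ties counted as one half. The function name `ranking_auc` and the example scores are assumptions made for this illustration; this is not MOA code and says nothing about how MOA implements the measure.

```python
def ranking_auc(scores, labels):
    """AUC of an outlier ranking from its pairwise definition.

    scores: outlier scores, higher means more outlying.
    labels: 1 for true outliers, 0 for inliers.
    """
    outliers = [s for s, y in zip(scores, labels) if y == 1]
    inliers = [s for s, y in zip(scores, labels) if y == 0]
    if not outliers or not inliers:
        raise ValueError("need at least one outlier and one inlier")
    wins = 0.0
    for o in outliers:
        for i in inliers:
            if o > i:
                wins += 1.0   # outlier correctly ranked above the inlier
            elif o == i:
                wins += 0.5   # tie counts as half
    return wins / (len(outliers) * len(inliers))

# Hypothetical scores for four stream objects; objects 0 and 2 are outliers.
auc = ranking_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

A perfect ranking yields 1.0, while a detector that ranks at random is expected to score about 0.5.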
Data Mining and Knowledge Discovery | 2012
Stephan Günnemann; Hardy Kremer; Charlotte Laufkötter; Thomas Seidl
Analysis of temporal climate data is an active research area. Advanced data mining methods designed especially for these temporal data support domain experts in understanding phenomena such as climate change, which is crucial for a sustainable world. Important solutions for mining temporal data are cluster tracing approaches, which are used to mine temporal evolutions of clusters. Generally, clusters represent groups of objects with similar values. In a temporal context like tracing, similar values correspond to similar behavior in one snapshot in time. Each cluster can be interpreted as a behavior type and cluster tracing corresponds to tracking similar behaviors over time. Existing tracing approaches are designed for datasets satisfying two specific conditions: The clusters appear in all attributes, i.e., fullspace clusters, and the data objects have unique identifiers. These identifiers are used for tracking clusters by measuring the number of objects two clusters have in common, i.e. clusters are traced based on similar object sets. These conditions, however, are strict: First, in complex data, clusters are often hidden in individual subsets of the dimensions. Second, mapping clusters based on similar object sets does not reflect the idea of tracing similar behavior types over time, because similar behavior can even be represented by clusters having no objects in common. A tracing method based on similar object values is needed. In this paper, we introduce a novel approach that traces subspace clusters based on object value similarity. Neither subspace tracing nor tracing by object value similarity has been done before.
knowledge discovery and data mining | 2011
Stephan Günnemann; Hardy Kremer; Charlotte Laufkötter; Thomas Seidl
Cluster tracing algorithms are used to mine temporal evolutions of clusters. Generally, clusters represent groups of objects with similar values. In a temporal context like tracing, similar values correspond to similar behavior in one snapshot in time. Each cluster can be interpreted as a behavior type and cluster tracing corresponds to tracking similar behaviors over time. Existing tracing approaches are designed for datasets satisfying two specific conditions: The clusters appear in all attributes, i.e. fullspace clusters, and the data objects have unique identifiers. These identifiers are used for tracking clusters by measuring the number of objects two clusters have in common, i.e. clusters are traced based on similar object sets. These conditions, however, are strict: First, in complex data, clusters are often hidden in individual subsets of the dimensions. Second, mapping clusters based on similar object sets does not reflect the idea of tracing similar behavior types over time, because similar behavior can even be represented by clusters having no objects in common. A tracing method based on similar object values is needed. In this paper, we introduce a novel approach that traces subspace clusters based on object value similarity. Neither subspace tracing nor tracing by object value similarity has been done before.
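The contrast between tracing by object sets and tracing by object values can be made concrete with a small sketch. Everything here is an assumption for illustration: the helper names (`shared_objects`, `mean_values`, `value_distance`), the use of per-dimension means restricted to a subspace, and the toy data. It is not the tracing approach proposed in the paper; it only demonstrates that two clusters with no objects in common can still show near-identical values in a relevant subspace.

```python
def shared_objects(cluster_a, cluster_b):
    """Object-set similarity: number of object identifiers in common."""
    return len(set(cluster_a) & set(cluster_b))

def mean_values(cluster, values, dims):
    """Mean of the cluster's objects, restricted to the subspace dims."""
    ids = list(cluster)
    return [sum(values[o][d] for o in ids) / len(ids) for d in dims]

def value_distance(cluster_a, values_a, cluster_b, values_b, dims):
    """Value-based dissimilarity: largest gap between subspace means."""
    mean_a = mean_values(cluster_a, values_a, dims)
    mean_b = mean_values(cluster_b, values_b, dims)
    return max(abs(x - y) for x, y in zip(mean_a, mean_b))

# Snapshot t: cluster of objects 1 and 2; snapshot t+1: objects 3 and 4.
values_t = {1: [10.0, 5.0], 2: [12.0, 50.0]}
values_t1 = {3: [10.5, 99.0], 4: [11.5, 0.0]}
cluster_t, cluster_t1 = {1, 2}, {3, 4}

overlap = shared_objects(cluster_t, cluster_t1)
# Both clusters are only compact in dimension 0, so compare in that subspace.
gap = value_distance(cluster_t, values_t, cluster_t1, values_t1, dims=[0])
```

An object-set criterion finds no link between the two clusters because `overlap` is zero, whereas the value-based comparison in the subspace consisting of dimension 0 indicates the same behavior type.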
symposium on large spatial databases | 2009
Ira Assent; Hardy Kremer
Video copy detection should be capable of identifying video copies subject to alterations, e.g. in video contrast or frame rate. We propose a video copy detection scheme that allows for adaptable detection of videos that are altered temporally (e.g. frame rate change) and/or visually (e.g. change in contrast). Our query processing combines filtering and indexing structures for efficient multistep computation of video copies under this model. We show that our model successfully identifies altered video copies and does so more reliably than existing models.
international conference on data mining | 2010
Hardy Kremer; Stephan Günnemann; Thomas Seidl
Climate change can be detected in several scientific domains including hydrology, meteorology, and oceanography. In this paper we describe our ongoing work on detecting change in multivariate time series data from these domains. For the detection, we extract climate patterns from the data, represented by clusters of time series, and trace the clusters over time. A climate pattern is categorized as a changing pattern if it shows a similar tendency over a significant amount of time, e.g. several years. Since existing clustering and cluster tracing approaches are not suitable for time series data, we are working on novel clustering and tracing approaches specifically for this purpose.