Philipp Kranen | Researchain

Publication

Featured researches published by Philipp Kranen.

Knowledge and Information Systems | 2011

The ClusTree: indexing micro-clusters for anytime stream mining

Philipp Kranen; Ira Assent; Corinna Baldauf; Thomas Seidl

Clustering streaming data requires algorithms that are capable of updating clustering results for the incoming data. As data is constantly arriving, time for processing is limited. Clustering has to be performed in a single pass over the incoming data and within the possibly varying inter-arrival times of the stream. Likewise, memory is limited, making it impossible to store all data. For clustering, we are faced with the challenge of maintaining a current result that can be presented to the user at any given time. In this work, we propose a parameter-free algorithm that automatically adapts to the speed of the data stream. It makes best use of the time available under the current constraints to provide a clustering of the objects seen up to that point. Our approach incorporates the age of the objects to reflect the greater importance of more recent data. For efficient and effective handling, we introduce the ClusTree, a compact and self-adaptive index structure for maintaining stream summaries. Additionally we present solutions to handle very fast streams through aggregation mechanisms and propose novel descent strategies that improve the clustering result on slower streams as long as time permits. Our experiments show that our approach is capable of handling a multitude of different stream characteristics for accurate and scalable anytime stream clustering.
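The stream summaries and the age weighting described in this abstract can be illustrated with a decayed cluster feature. This is only a minimal sketch, not the ClusTree itself: it assumes each micro-cluster keeps a weighted count, linear sum, and squared sum, and that older contributions are faded by a factor of 2^(-lambda * elapsed time). The class name `MicroCluster` and the parameter `decay_rate` are hypothetical names chosen for this example.

```python
class MicroCluster:
    """Decayed cluster feature (count, linear sum, squared sum) of one micro-cluster."""

    def __init__(self, dim, decay_rate=0.5):
        self.n = 0.0                  # weighted number of objects
        self.ls = [0.0] * dim         # weighted linear sum per dimension
        self.ss = [0.0] * dim         # weighted squared sum per dimension
        self.decay_rate = decay_rate  # hypothetical decay parameter (lambda)
        self.last_update = 0.0

    def _fade(self, now):
        # Assumption: all statistics are scaled by 2^(-lambda * elapsed time).
        w = 2.0 ** (-self.decay_rate * (now - self.last_update))
        self.n *= w
        self.ls = [v * w for v in self.ls]
        self.ss = [v * w for v in self.ss]
        self.last_update = now

    def insert(self, point, now):
        # Fade the old content first, then add the new object with weight 1.
        self._fade(now)
        self.n += 1.0
        for i, x in enumerate(point):
            self.ls[i] += x
            self.ss[i] += x * x

    def mean(self):
        # Only meaningful after at least one insert.
        return [v / self.n for v in self.ls]

    def variance(self):
        return [s / self.n - (l / self.n) ** 2 for s, l in zip(self.ss, self.ls)]
```

Because the three statistics are additive, such a summary can be updated in constant time per object and merged with other summaries, which is what makes a single pass over the stream with bounded memory feasible.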

international conference on data mining | 2009

Self-Adaptive Anytime Stream Clustering

Philipp Kranen; Ira Assent; Corinna Baldauf; Thomas Seidl

Clustering streaming data requires algorithms that are capable of updating clustering results for the incoming data. As data is constantly arriving, time for processing is limited. Clustering has to be performed in a single pass over the incoming data and within the possibly varying inter-arrival times of the stream. Likewise, memory is limited, making it impossible to store all data. For clustering, we are faced with the challenge of maintaining a current result that can be presented to the user at any given time. In this work, we propose a parameter-free algorithm that automatically adapts to the speed of the data stream. It makes best use of the time available under the current constraints to provide a clustering of the objects seen up to that point. Our approach incorporates the age of the objects to reflect the greater importance of more recent data. Moreover, we are capable of detecting concept drift, novelty and outliers in the stream. For efficient and effective handling, we introduce the ClusTree, a compact and self-adaptive index structure for maintaining stream summaries. Our experiments show that our approach is capable of handling a multitude of different stream characteristics for accurate and scalable anytime stream clustering.

international conference on management of data | 2008

Efficient EMD-based similarity search in multimedia databases via flexible dimensionality reduction

Marc Wichterich; Ira Assent; Philipp Kranen; Thomas Seidl

The Earth Mover's Distance (EMD) was developed in computer vision as a flexible similarity model that utilizes similarities in feature space to define a high-quality similarity measure in feature representation space. It has been successfully adopted in a multitude of applications with low to medium dimensionality. However, multimedia applications commonly exhibit high-dimensional feature representations for which the computational complexity of the EMD hinders its adoption. An efficient query processing approach that mitigates and overcomes this effect is crucial. We propose novel dimensionality reduction techniques for the EMD in a filter-and-refine architecture for efficient lossless retrieval. Thorough experimental evaluation on real-world data sets demonstrates a substantial reduction of the number of expensive high-dimensional EMD computations and thus remarkably faster response times. Our techniques are fully flexible in the number of reduced dimensions, which is a novel feature in approximation techniques for the EMD.
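The filter-and-refine idea can be sketched on a deliberately simplified case. The example below assumes 1-D histograms of equal total mass with ground distance |i - j|, where the EMD equals the sum of absolute cumulative differences. Merging every `k` adjacent bins then yields a reduced EMD that never exceeds the exact one, so pruning on it loses no results. This is not the paper's reduction technique, which addresses general ground distances; the function names and the parameter `k` are hypothetical.

```python
from itertools import accumulate

def emd_1d(x, y):
    """Exact EMD of two equal-mass 1-D histograms, ground distance |i - j|."""
    diffs = [a - b for a, b in zip(x, y)]
    return sum(abs(c) for c in accumulate(diffs))

def reduce_bins(h, k):
    """Merge every k adjacent bins into one (the dimensionality reduction)."""
    return [sum(h[i:i + k]) for i in range(0, len(h), k)]

def range_query(query, database, eps, k=2):
    """Return (indices with EMD <= eps, number of exact EMD computations)."""
    q_red = reduce_bins(query, k)
    hits, refinements = [], 0
    for idx, h in enumerate(database):
        # Filter step: the reduced EMD is a lower bound of the exact EMD in
        # this setting, so an object pruned here cannot be a true result.
        # (A real system would precompute the reduced database histograms.)
        if emd_1d(q_red, reduce_bins(h, k)) > eps:
            continue
        # Refine step: verify the surviving candidate with the exact distance.
        refinements += 1
        if emd_1d(query, h) <= eps:
            hits.append(idx)
    return hits, refinements

if __name__ == "__main__":
    db = [[4, 0, 0, 0], [0, 4, 0, 0], [0, 0, 0, 4], [2, 2, 0, 0]]
    print(range_query([4, 0, 0, 0], db, eps=3))
```

The lower bound holds here because the reduced cumulative differences are a subset of the original ones. The result set is therefore identical to a sequential scan, while fewer exact computations are needed whenever the filter prunes a candidate.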

database systems for advanced applications | 2012

AnyOut: anytime outlier detection on streaming data

Ira Assent; Philipp Kranen; Corinna Baldauf; Thomas Seidl

With the increase of sensor and monitoring applications, data mining on streaming data is receiving increasing research attention. As data is continuously generated, mining algorithms need to be able to analyze the data in a one-pass fashion. In many applications the rate at which the data objects arrive varies greatly. This has led to anytime mining algorithms for classification or clustering. They successfully mine data until the a priori unknown point of interruption by the next data in the stream. In this work we investigate anytime outlier detection. Anytime outlier detection denotes the problem of determining within any period of time whether an object in a data stream is anomalous. The more time is available, the more reliable the decision should be. We introduce AnyOut, an algorithm capable of solving anytime outlier detection, and investigate different approaches to build up the underlying data structure. We propose a confidence measure for AnyOut that makes it possible to improve the performance on constant data streams. We evaluate our method in thorough experiments and demonstrate its performance in comparison with established algorithms for outlier detection.
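The principle that more time should give a more reliable decision can be sketched as a score that is refined level by level in a cluster hierarchy. The following is a minimal illustration, not AnyOut's actual data structure or scoring. It assumes a tree of cluster means, uses the distance to the closest mean at the current level as the score, and models the available time as a step budget. `Node`, `anytime_outlier_score`, and `max_steps` are hypothetical names.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    mean: tuple                                   # cluster centre at this level
    children: list = field(default_factory=list)  # finer clusters below it

def anytime_outlier_score(root_entries, point, max_steps):
    """Refine an outlier score one tree level per step until the budget ends."""
    level, score = root_entries, None
    for _ in range(max_steps):
        if not level:
            break  # reached a leaf: no finer information available
        closest = min(level, key=lambda n: math.dist(n.mean, point))
        score = math.dist(closest.mean, point)  # current estimate
        level = closest.children                # descend if time permits
    return score  # None if no step was possible

if __name__ == "__main__":
    tree = [Node((5.0, 5.0), [Node((0.0, 0.0)), Node((10.0, 10.0))])]
    for steps in (1, 2):
        print(steps, anytime_outlier_score(tree, (0.0, 1.0), steps))
```

With one step the point looks far from the only coarse cluster. With a second step the finer level shows that it lies close to a sub-cluster, so the later score is the more reliable one.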

knowledge discovery and data mining | 2011

An effective evaluation measure for clustering on evolving data streams

Hardy Kremer; Philipp Kranen; Timm Jansen; Thomas Seidl; Albert Bifet; Geoff Holmes; Bernhard Pfahringer

Due to the ever growing presence of data streams, there has been a considerable amount of research on stream mining algorithms. While many algorithms have been introduced that tackle the problem of clustering on evolving data streams, hardly any attention has been paid to appropriate evaluation measures. Measures developed for static scenarios, namely structural measures and ground-truth-based measures, cannot correctly reflect errors attributable to emerging, splitting, or moving clusters. These situations are inherent to the streaming context due to the dynamic changes in the data distribution. In this paper we develop a novel evaluation measure for stream clustering called Cluster Mapping Measure (CMM). CMM effectively indicates different types of errors by taking the important properties of evolving data streams into account. We show in extensive experiments on real and synthetic data that CMM is a robust measure for stream clustering evaluation.

european conference on machine learning | 2011

MOA: a real-time analytics open source framework

Albert Bifet; Geoff Holmes; Bernhard Pfahringer; Jesse Read; Philipp Kranen; Hardy Kremer; Timm Jansen; Thomas Seidl

Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams. MOA is designed to deal with the challenging problems of scaling up the implementation of state-of-the-art algorithms to real-world data set sizes and of making algorithms comparable in benchmark streaming settings. It contains a collection of offline and online algorithms for classification, clustering and graph mining as well as tools for evaluation. For researchers the framework yields insights into advantages and disadvantages of different approaches and allows for the creation of benchmark streaming data sets through stored, shared and repeatable settings for the data feeds. Practitioners can use the framework to easily compare algorithms and apply them to real-world data sets and settings. MOA supports bi-directional interaction with WEKA, the Waikato Environment for Knowledge Analysis. Besides providing algorithms and measures for evaluation and comparison, MOA is easily extensible with new contributions and allows for the creation of benchmark scenarios.

extending database technology | 2009

Indexing density models for incremental learning and anytime classification on data streams

Thomas Seidl; Ira Assent; Philipp Kranen; Ralph Krieger; Jennifer Herrmann

Classification of streaming data faces three basic challenges: it has to deal with huge amounts of data, the varying time between two stream data items must be used as well as possible (anytime classification), and additional training data must be learned incrementally (anytime learning) so that the classifier can be applied consistently to fast data streams. In this work, we propose a novel index-based technique that can handle all three of the above challenges using the established Bayes classifier on effective kernel density estimators. Our novel Bayes tree automatically generates a hierarchy of mixture densities, adapted efficiently to the individual object to be classified, that represent kernel density estimators at successively coarser levels. Our probability density queries together with novel classification improvement strategies provide the necessary information for very effective classification at any point of interruption. Moreover, we propose a novel evaluation method for anytime classification using Poisson streams and demonstrate the anytime learning performance of the Bayes tree.
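The underlying decision rule, a Bayes classifier on kernel density estimators, can be sketched without the index. The example below is a plain, non-anytime illustration and not the Bayes tree itself. It assumes 1-D training samples per class, a Gaussian kernel with a fixed `bandwidth`, and class priors taken from the class frequencies. The function names are hypothetical.

```python
import math

def gaussian_kde(x, samples, bandwidth):
    """Kernel density estimate at x from 1-D samples with a Gaussian kernel."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

def classify(x, training, bandwidth=1.0):
    """Return the class label c that maximizes P(c) * p(x | c)."""
    total = sum(len(samples) for samples in training.values())

    def posterior(label):
        samples = training[label]
        prior = len(samples) / total  # P(c) from class frequency
        return prior * gaussian_kde(x, samples, bandwidth)

    return max(training, key=posterior)

if __name__ == "__main__":
    data = {"a": [0.0, 0.5, 1.0], "b": [5.0, 5.5, 6.0]}
    print(classify(0.2, data), classify(5.8, data))
```

Evaluating the full kernel density touches every training object. A hierarchy of coarser mixture densities, as described in the abstract, allows a first answer from a few components that is then refined while time remains.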

european conference on machine learning | 2009

Harnessing the Strengths of Anytime Algorithms for Constant Data Streams

Philipp Kranen; Thomas Seidl

Anytime algorithms have been proposed for many different applications, e.g. in data mining. Their strengths are the ability first to provide a result after a very short initialization and second to improve their result with additional time. Therefore, anytime algorithms have so far been used when the available processing time varies, e.g. on varying data streams. In this paper we propose to employ anytime algorithms on constant data streams, i.e. for tasks with constant time allowance. We introduce two approaches that harness the strengths of anytime algorithms on constant data streams and thereby improve the overall quality of the result with respect to the corresponding budget algorithm. We derive formulas for the expected performance gain and demonstrate the effectiveness of our novel approaches using existing anytime algorithms on benchmark data sets. The goal that was set and reached in this paper is to improve the quality of the result over that of traditional budget approaches, which are used in an abundance of stream mining applications. Using anytime classification as an example application, we show for SVM, Bayes and nearest neighbor classifiers that both our novel approaches improve the classification accuracy for slow and fast data streams. The results confirm our general theoretic models and show the effectiveness of our approaches. The simple yet effective idea can be employed for any anytime algorithm along with a quality measure and motivates further research in, e.g., classification confidence measures or anytime algorithms.

international conference on data mining | 2010

Clustering Performance on Evolving Data Streams: Assessing Algorithms and Evaluation Measures within MOA

Philipp Kranen; Hardy Kremer; Timm Jansen; Thomas Seidl; Albert Bifet; Geoff Holmes; Bernhard Pfahringer

In today's applications, evolving data streams are ubiquitous. Stream clustering algorithms were introduced to gain useful knowledge from these streams in real time. The quality of the obtained clusterings, i.e. how well they reflect the data, can be assessed by evaluation measures. A multitude of stream clustering algorithms and evaluation measures for clusterings were introduced in the literature; however, until now there has been no general tool for a direct comparison of the different algorithms or the evaluation measures. In our demo, we present a novel experimental framework for both tasks. It offers the means for extensive evaluation and visualization and is an extension of the Massive Online Analysis (MOA) software environment released under the GNU GPL.

database systems for advanced applications | 2012

Stream data mining using the MOA framework

Philipp Kranen; Hardy Kremer; Timm Jansen; Thomas Seidl; Albert Bifet; Geoff Holmes; Bernhard Pfahringer; Jesse Read

Massive Online Analysis (MOA) is a software framework that provides algorithms and evaluation methods for mining tasks on evolving data streams. In addition to supervised and unsupervised learning, MOA has recently been extended to support multi-label classification and graph mining. In this demonstrator we describe the main features of MOA and present the newly added methods for outlier detection on streaming data. Algorithms can be compared to established baseline methods such as LOF and ABOD using standard ranking measures including Spearman rank coefficient and the AUC measure. MOA is an open source project and videos as well as tutorials are publicly available on the MOA homepage.
