Emmanuel Müller
Hasso Plattner Institute
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Emmanuel Müller.
international conference on data engineering | 2012
Fabian Keller; Emmanuel Müller; Klemens Böhm
Outlier mining is a major task in data analysis. Outliers are objects that highly deviate from regular objects in their local neighborhood. Density-based outlier ranking methods score each object based on its degree of deviation. In many applications, these ranking methods degenerate to random listings due to low contrast between outliers and regular objects. Outliers do not show up in the scattered full space, they are hidden in multiple high contrast subspace projections of the data. Measuring the contrast of such subspaces for outlier rankings is an open research challenge. In this work, we propose a novel subspace search method that selects high contrast subspaces for density-based outlier ranking. It is designed as pre-processing step to outlier ranking algorithms. It searches for high contrast subspaces with a significant amount of conditional dependence among the subspace dimensions. With our approach, we propose a first measure for the contrast of subspaces. Thus, we enhance the quality of traditional outlier rankings by computing outlier scores in high contrast projections only. The evaluation on real and synthetic data shows that our approach outperforms traditional dimensionality reduction techniques, naive random projections as well as state-of-the-art subspace search techniques and provides enhanced quality for outlier ranking.
international conference on data mining | 2007
Ira Assent; Ralph Krieger; Emmanuel Müller; Thomas Seidl
To gain insight into todays large data resources, data mining provides automatic aggregation techniques. Clustering aims at grouping data such that objects within groups are similar while objects in different groups are dissimilar. In scenarios with many attributes or with noise, clusters are often hidden in subspaces of the data and do not show up in the full dimensional space. For these applications, subspace clustering methods aim at detecting clusters in any sub- space. Existing subspace clustering approaches fall prey to an effect we call dimensionality bias. As dimensionality of subspaces varies, approaches which do not take this effect into account fail to separate clusters from noise. We give a formal definition of dimensionality bias and analyze consequences for subspace clustering. A dimensionality unbiased subspace clustering (DUSC) definition based on statistical foundations is proposed. In thorough experiments on synthetic and real world data, we show that our approach outperforms existing subspace clustering algorithms.
international conference on data mining | 2008
Ira Assent; Ralph Krieger; Emmanuel Müller; Thomas Seidl
Subspace clustering aims at detecting clusters in any subspace projection of a high dimensional space. As the number of projections is exponential in the number of dimensions, efficiency is crucial. Moreover, the resulting subspace clusters are often highly redundant, i.e. many clusters are detected multiply in several projections. We propose a novel index for efficient subspace clustering in a novel depth-first processing with in-process-removal of redundant clusters for better pruning. Thorough experiments on real and synthetic data show that INSCY yields substantial efficiency and quality improvements.
international conference on data mining | 2009
Emmanuel Müller; Ira Assent; Stephan Günnemann; Ralph Krieger; Thomas Seidl
Subspace clustering aims at detecting clusters in any subspace projection of a high dimensional space. As the number of possible subspace projections is exponential in the number of dimensions, the result is often tremendously large. Recent approaches fail to reduce results to relevant subspace clusters. Their results are typically highly redundant, i.e. many clusters are detected multiple times in several projections. In this work, we propose a novel model for relevant subspace clustering (RESCU). We present a global optimization which detects the most interesting non-redundant subspace clusters. We prove that computation of this model is NP-hard. For RESCU, we propose an approximative solution that shows high accuracy with respect to our relevance model. Thorough experiments on synthetic and real world data show that RESCU successfully reduces the result to manageable sizes. It reliably achieves top clustering quality while competing approaches show greatly varying performance.
knowledge discovery and data mining | 2014
Bryan Perozzi; Leman Akoglu; Patricia Iglesias Sánchez; Emmanuel Müller
Graph clustering and graph outlier detection have been studied extensively on plain graphs, with various applications. Recently, algorithms have been extended to graphs with attributes as often observed in the real-world. However, all of these techniques fail to incorporate the user preference into graph mining, and thus, lack the ability to steer algorithms to more interesting parts of the attributed graph. In this work, we overcome this limitation and introduce a novel user-oriented approach for mining attributed graphs. The key aspect of our approach is to infer user preference by the so-called focus attributes through a set of user-provided exemplar nodes. In this new problem setting, clusters and outliers are then simultaneously mined according to this user preference. Specifically, our FocusCO algorithm identifies the focus, extracts focused clusters and detects outliers. Moreover, FocusCO scales well with graph size, since we perform a local clustering of interest to the user rather than global partitioning of the entire graph. We show the effectiveness and scalability of our method on synthetic and real-world graphs, as compared to both existing graph clustering and outlier detection approaches.
international conference on data engineering | 2011
Emmanuel Müller; Matthias Schiffer; Thomas Seidl
Outlier mining is an important data analysis task to distinguish exceptional outliers from regular objects. For outlier mining in the full data space, there are well established methods which are successful in measuring the degree of deviation for outlier ranking. However, in recent applications traditional outlier mining approaches miss outliers as they are hidden in subspace projections. Especially, outlier ranking approaches measuring deviation on all available attributes miss outliers deviating from their local neighborhood only in subsets of the attributes. In this work, we propose a novel outlier ranking based on the objects deviation in a statistically selected set of relevant subspace projections. This ensures to find objects deviating in multiple relevant subspaces, while it excludes irrelevant projections showing no clear contrast between outliers and the residual objects. Thus, we tackle the general challenges of detecting outliers hidden in subspaces of the data. We provide a selection of subspaces with high contrast and propose a novel ranking based on an adaptive degree of deviation in arbitrary subspaces. In thorough experiments on real and synthetic data we show that our approach outperforms competing outlier ranking approaches by detecting outliers in arbitrary subspace projections.
international conference on data engineering | 2008
Emmanuel Müller; Ira Assent; Uwe Steinhausen; Thomas Seidl
Outlier detection is an important data mining task for consistency checks, fraud detection, etc. Binary decision making on whether or not an object is an outlier is not appropriate in many applications and moreover hard to parametrize. Thus, recently, methods for outlier ranking have been proposed. Determining the degree of deviation, they do not require setting a decision boundary between outliers and the remaining data. High dimensional and heterogeneous (continuous and categorical attributes) data, however, pose a problem for most outlier ranking algorithms. In this work, we propose our OutRank approach for ranking outliers in heterogeneous high dimensional data. We introduce a consistent model for different attribute types. Our novel scoring functions transform the analyzed structure of the data to a meaningful ranking. Promising results in preliminary experiments show the potential for successful outlier ranking in high dimensional data.
Sigkdd Explorations | 2007
Ira Assent; Ralph Krieger; Emmanuel Müller; Thomas Seidl
To gain insight into todays large data resources, data mining extracts interesting patterns. To generate knowledge from patterns and benefit from human cognitive abilities, meaningful visualization of patterns are crucial. Clustering is a data mining technique that aims at grouping data to patterns based on mutual (dis)similarity. For high dimensional data, subspace clustering searches patterns in any subspace of the attributes as patterns are typically obscured by many irrelevant attributes in the full space. For visual analysis of subspace clusters, their comparability has to be ensured. Existing subspace clustering approaches, however, lack interactive visualization and show bias with respect to the dimensionality of subspaces. In this work, dimensionality unbiased subspace clustering and a novel distance function for subspace clusters are proposed. We suggest two visualization techniques that allow users to browse the entire subspace clustering, to zoom into individual objects, and to analyze subspace cluster characteristics in-depth. Bracketing of different parameter settings enable users to immediately see the effect of parameters on their data and hence to choose the best clustering result for further analysis. Usage of user analysis for feedback to the subspace clustering algorithm directly improves the subspace clustering. We demonstrate our visualization techniques on real world data and confirm results through additional accuracy measurements and comparison with existing subspace clustering algorithms.
international conference on data mining | 2012
Emmanuel Müller; Ira Assent; Patricia Iglesias; Yvonne Mülle; Klemens Böhm
Outlier mining is an important task for finding anomalous objects. In practice, however, there is not always a clear distinction between outliers and regular objects as objects have different roles w.r.t. different attribute sets. An object may deviate in one subspace, i.e. a subset of attributes. And the same object might appear perfectly regular in other subspaces. One can think of subspaces as multiple views on one database. Traditional methods consider only one view (the full attribute space). Thus, they miss complex outliers that are hidden in multiple subspaces. In this work, we propose Outrank, a novel outlier ranking concept. Outrank exploits subspace analysis to determine the degree of outlierness. It considers different subsets of the attributes as individual outlier properties. It compares clustered regions in arbitrary subspaces and derives an outlierness score for each object. Its principled integration of multiple views into an outlierness measure uncovers outliers that are not detectable in the full attribute space. Our experimental evaluation demonstrates that Outrank successfully determines a high quality outlier ranking, and outperforms state-of-the-art outlierness measures.
conference on information and knowledge management | 2011
Stephan Günnemann; Ines Färber; Emmanuel Müller; Ira Assent; Thomas Seidl
Knowledge discovery in databases requires not only development of novel mining techniques but also fair and comparable quality assessment based on objective evaluation measures. Especially in young research areas where no common measures are available, researchers are unable to provide a fair evaluation. Typically, publications glorify the high quality of one approach only justified by an arbitrary evaluation measure. However, such conclusions can only be drawn if the evaluation measures themselves are fully understood. In this paper, we provide the basis for systematic evaluation in the emerging research area of subspace clustering. We formalize general quality criteria for subspace clustering measures not yet addressed in the literature. We compare the existing external evaluation methods based on these criteria and pinpoint limitations. We propose a novel external evaluation measure which meets the requirements in form of quality properties. In thorough experiments we empirically show characteristic properties of evaluation measures. Overall, we provide a set of evaluation measures that fulfill the general quality criteria as recommendation for future evaluations. All measures and datasets are provided on our website and are integrated in our evaluation framework.