Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Claudia Plant is active.

Publication


Featured research published by Claudia Plant.


Knowledge Discovery and Data Mining | 2016

Skinny-dip: Clustering in a Sea of Noise

Samuel Maurus; Claudia Plant

Can we find heterogeneous clusters hidden in data sets with 80% noise? Although such settings occur in the real world, we struggle to find methods from the abundance of clustering techniques that perform well with noise at this level. Indeed, perhaps this is enough of a departure from classical clustering to warrant its study as a separate problem. In this paper we present SkinnyDip which, based on Hartigan's elegant dip test of unimodality, represents an intriguing approach to clustering with an attractive set of properties. Specifically, SkinnyDip is highly noise-robust, practically parameter-free and completely deterministic. SkinnyDip never performs multivariate distance calculations, but rather employs insightful recursion based on dips into univariate projections of the data. It is able to detect a range of cluster shapes and densities, assuming only that each cluster admits a unimodal shape. Practically, its run-time grows linearly with the data. Finally, for high-dimensional data, continuity properties of the dip enable SkinnyDip to exploit multimodal projection pursuit in order to find an appropriate basis for clustering. Although not without its limitations, SkinnyDip compares favorably to a variety of clustering approaches on synthetic and real data, particularly in high-noise settings.
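The central ingredient is Hartigan's dip test applied to one-dimensional projections. The sketch below illustrates only that ingredient, not the SkinnyDip algorithm itself; it assumes the third-party diptest package and uses synthetic data to show that multimodal projections are still detected under roughly 80% uniform noise.

```python
# Minimal illustration of dip-testing univariate projections (assumes `pip install diptest`).
import numpy as np
import diptest

rng = np.random.default_rng(0)
# Roughly 80% uniform noise plus two tight clusters hidden in both coordinates.
noise = rng.uniform(0, 10, size=(8000, 2))
clusters = np.vstack([
    rng.normal(loc=[2.0, 2.0], scale=0.1, size=(1000, 2)),
    rng.normal(loc=[7.0, 8.0], scale=0.1, size=(1000, 2)),
])
X = np.vstack([noise, clusters])

for axis in range(X.shape[1]):
    dip, pval = diptest.diptest(X[:, axis])
    # A small p-value rejects unimodality: this projection carries cluster structure
    # that SkinnyDip would recursively "dip into"; unimodal projections are skipped.
    print(f"axis {axis}: dip statistic = {dip:.4f}, p-value = {pval:.4g}")
```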


Knowledge and Information Systems | 2017

Synchronization-based scalable subspace clustering of high-dimensional data

Junming Shao; Xinzuo Wang; Qinli Yang; Claudia Plant; Christian Böhm

How can we address the challenges of the “curse of dimensionality” and “scalability” in clustering simultaneously? In this paper, we propose arbitrarily oriented synchronized clusters (ORSC), a novel, effective and efficient method for subspace clustering inspired by synchronization. Synchronization is a basic phenomenon prevalent in nature, capable of controlling even highly complex processes such as opinion formation in a group. Control of such complex processes is achieved by simple operations based on interactions between objects. Relying on a weighted interaction model and iterative dynamic clustering, our approach ORSC (a) naturally detects correlation clusters in arbitrarily oriented subspaces, including arbitrarily shaped nonlinear correlation clusters. Our approach is (b) robust against noise and outliers. In contrast to previous methods, ORSC is (c) easy to parameterize, since there is no need to specify the subspace dimensionality or other difficult parameters. Instead, all interesting subspaces are detected in a fully automatic way. Finally, (d) ORSC outperforms most comparison methods in terms of runtime efficiency and is highly scalable to large and high-dimensional data sets. Extensive experiments have demonstrated the effectiveness and efficiency of our approach.
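The synchronization principle itself can be illustrated with a toy dynamic (this is not ORSC, which works in arbitrarily oriented subspaces): every object repeatedly moves toward the mean of its neighbors within an interaction radius, so that members of the same cluster collapse onto a common location. The radius and iteration count below are illustrative.

```python
import numpy as np

def synchronize(X, eps=0.5, iterations=30):
    """Toy synchronization dynamic: each point interacts with its eps-neighborhood
    and moves toward its mean; synchronized (collapsed) points form clusters."""
    Y = X.copy()
    for _ in range(iterations):
        dist = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)  # pairwise distances
        for i in range(len(Y)):
            Y[i] = Y[dist[i] <= eps].mean(axis=0)
    return Y

rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0, 0], 0.2, (50, 2)), rng.normal([3, 3], 0.2, (50, 2))])
Y = synchronize(X)
# Points that synchronized to (almost) the same location share a cluster label.
labels = np.unique(Y.round(3), axis=0, return_inverse=True)[1].ravel()
print(np.bincount(labels))   # two groups of 50 for this toy data
```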


European Conference on Machine Learning | 2017

Attributed Graph Clustering with Unimodal Normalized Cut

Wei Ye; Linfei Zhou; Xin Sun; Claudia Plant; Christian Böhm

Graph vertices are often associated with attributes. For example, in addition to their connection relations, people in friendship networks have personal attributes, such as interests, age, and residence. Such graphs (networks) are called attributed graphs. The detection of clusters in attributed graphs is of great practical relevance, e.g., for targeted advertising. Attributes and edges often provide complementary information. The effective use of both types of information promises meaningful results. In this work, we propose a method called UNCut (for Unimodal Normalized Cut) to detect cohesive clusters in attributed graphs. A cohesive cluster is a subgraph that has densely connected edges and has as many homogeneous (unimodal) attributes as possible. We adopt the normalized cut to assess the density of edges in a graph cluster. To evaluate the unimodality of attributes, we propose a measure called unimodality compactness, which exploits Hartigan's dip test. Our method UNCut integrates the normalized cut and unimodality compactness in one framework such that the detected clusters have low normalized cut and unimodality compactness values. Extensive experiments on various synthetic and real-world data verify the effectiveness and efficiency of our method UNCut compared with state-of-the-art approaches. Code and data related to this chapter are available at: https://www.dropbox.com/sh/xz2ndx65jai6num/AAC9RJ5PqQoYoxreItW83PrLa?dl=0.
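Both criteria are easy to state in isolation; the sketch below computes a cluster's normalized cut from the adjacency matrix and a simplified stand-in for unimodality compactness from its attribute columns (again using the third-party diptest package). The combination weight is hypothetical, and the UNCut search procedure itself is not shown.

```python
import numpy as np
import diptest

def normalized_cut(A, mask):
    """Normalized cut of the vertex subset `mask` (boolean) in adjacency matrix A."""
    cut = A[mask][:, ~mask].sum()
    return cut / A[mask].sum() + cut / A[~mask].sum()

def unimodality_compactness(attrs):
    """Simplified stand-in: fraction of attribute columns whose dip test rejects
    unimodality at the 5% level (lower means the cluster is more homogeneous)."""
    return float(np.mean([diptest.diptest(col)[1] < 0.05 for col in attrs.T]))

def cluster_score(A, attrs, mask, alpha=0.5):
    # UNCut seeks clusters where both values are low; alpha is an illustrative weight.
    return alpha * normalized_cut(A, mask) + (1 - alpha) * unimodality_compactness(attrs[mask])
```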


Knowledge Discovery and Data Mining | 2016

FUSE: Full Spectral Clustering

Wei Ye; Sebastian Goebl; Claudia Plant; Christian Böhm

Multi-scale data, which contains structures at different scales of size and density, is a big challenge for spectral clustering. Even given a suitable locally scaled affinity matrix, the first k eigenvectors of such a matrix still cannot separate clusters well. Thus, in this paper, we exploit the fusion of the cluster-separation information from all eigenvectors to achieve a better clustering result. Our method FUll Spectral ClustEring (FUSE) is based on Power Iteration (PI) and Independent Component Analysis (ICA). PI is used to fuse all eigenvectors into one pseudo-eigenvector which inherits all the cluster-separation information. To overcome the cluster-collision problem, we utilize PI to generate p (p > k) pseudo-eigenvectors. Since these pseudo-eigenvectors are redundant and the cluster-separation information is contaminated with noise, ICA is adopted to rotate the pseudo-eigenvectors to make them pairwise statistically independent. To let ICA escape local optima and speed up the search process, we develop a self-adaptive and self-learning greedy search method. Finally, we select the k rotated pseudo-eigenvectors (independent components) with the most cluster-separation information, measured by kurtosis, for clustering. Experiments on various synthetic and real-world data verify the effectiveness and efficiency of our FUSE method.
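The pipeline can be sketched with standard tools: truncated power iteration on the row-normalized affinity matrix produces pseudo-eigenvectors, FastICA rotates them toward pairwise independence, and the most non-Gaussian components (ranked here by kurtosis) are clustered with k-means. This is a plain sketch of the idea, assuming an affinity matrix W is given; FUSE's self-adaptive greedy search for ICA is not reproduced.

```python
import numpy as np
from scipy.stats import kurtosis
from sklearn.decomposition import FastICA
from sklearn.cluster import KMeans

def pseudo_eigenvectors(W, p, n_iter=10, seed=0):
    """p pseudo-eigenvectors via truncated power iteration on the row-normalized
    affinity matrix, each run starting from a different random vector."""
    rng = np.random.default_rng(seed)
    P = W / W.sum(axis=1, keepdims=True)
    vectors = []
    for _ in range(p):
        v = rng.normal(size=W.shape[0])
        for _ in range(n_iter):
            v = P @ v
            v /= np.linalg.norm(v)
        vectors.append(v)
    return np.column_stack(vectors)

def fuse_like(W, k, seed=0):
    V = pseudo_eigenvectors(W, p=2 * k, seed=seed)                         # p > k redundant vectors
    S = FastICA(n_components=2 * k, random_state=seed).fit_transform(V)    # rotate to independence
    top = np.argsort(-np.abs(kurtosis(S, axis=0)))[:k]                     # keep the most non-Gaussian ones
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(S[:, top])
```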


International Conference on Data Mining | 2016

Gaussian Component Based Index for GMMs

Linfei Zhou; Bianca Wackersreuther; Frank Fiedler; Claudia Plant; Christian Böhm

Efficient similarity search for uncertain data is a challenging task in many modern data mining applications like image retrieval, speaker recognition and stock market analysis. A common way to model the uncertainty of data objects is using probability density functions in the form of Gaussian Mixture Models (GMMs), which can approximate arbitrary distributions. However, due to the possibly unequal lengths of mixture models, existing index techniques have serious problems with objects modeled by GMMs: either they cannot handle GMMs at all or they have too many limitations. Hence, we propose a dynamic index structure, the Gaussian Component based Index (GCI), for GMMs. GCI decomposes GMMs into singles, pairs, or n-lets of Gaussian components, stores these components in well-studied index trees such as the U-tree and the Gauss-Tree, and refines the corresponding GMMs in a conservative but tight way. GCI supports both k-most-likely queries and probability threshold queries by means of the Matching Probability. Extensive experimental evaluations of GCI demonstrate a considerable speed-up of similarity search on both synthetic and real-world data sets.
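The matching probability between two GMMs has a closed form, since the integral of the product of two Gaussian densities N(x; m1, S1) and N(x; m2, S2) equals the density of m1 - m2 under N(0, S1 + S2). The sketch below computes that quantity for full mixtures; the index structure and its filter/refine steps are not shown.

```python
import numpy as np
from scipy.stats import multivariate_normal

def matching_probability(w1, mu1, cov1, w2, mu2, cov2):
    """Matching probability of two GMMs: the integral of the product of their
    densities, using  int N(x; m1, S1) N(x; m2, S2) dx = N(m1 - m2; 0, S1 + S2)."""
    mp = 0.0
    for a, m1, s1 in zip(w1, mu1, cov1):
        for b, m2, s2 in zip(w2, mu2, cov2):
            mp += a * b * multivariate_normal.pdf(m1 - m2, mean=np.zeros(len(m1)), cov=s1 + s2)
    return mp

# Two toy 2-D GMMs of unequal length; this value is what GCI bounds conservatively
# when filtering candidates for k-most-likely and probability threshold queries.
gmm_a = ([0.5, 0.5], [np.zeros(2), np.ones(2)], [np.eye(2), np.eye(2)])
gmm_b = ([1.0], [np.array([0.5, 0.5])], [2.0 * np.eye(2)])
print(matching_probability(*gmm_a, *gmm_b))
```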


International Conference on Data Mining | 2016

Generalized Independent Subspace Clustering

Wei Ye; Samuel Maurus; Nina Hubig; Claudia Plant

Data can encapsulate different object groupings in subspaces of arbitrary dimension and orientation. Finding such subspaces and the groupings within them is the goal of generalized subspace clustering. In this work we present a generalized subspace clustering technique capable of finding multiple non-redundant clusterings in arbitrarily-oriented subspaces. We use Independent Subspace Analysis (ISA) to find the subspace collection that minimizes the statistical dependency (redundancy) between clusterings. We then cluster in the arbitrarily-oriented subspaces identified by ISA. Our algorithm ISAAC (Independent Subspace Analysis and Clustering) uses the Minimum Description Length principle to automatically choose parameters that are otherwise difficult to set. We comprehensively demonstrate the effectiveness of our approach on synthetic and real-world data.
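Off-the-shelf Independent Subspace Analysis implementations are rare, so the sketch below approximates only the pipeline's shape: an ICA rotation stands in for ISA, the rotated axes are split into fixed subspaces, and each subspace is clustered separately to give non-redundant clusterings. The subspace sizes and cluster counts are supplied by hand here, whereas ISAAC selects them automatically with the Minimum Description Length principle.

```python
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.cluster import KMeans

def isaac_like(X, subspace_dims, cluster_counts, seed=0):
    """Rough sketch: rotate the data (FastICA as a stand-in for ISA), split the
    rotated axes into subspaces, and produce one clustering per subspace."""
    Z = FastICA(n_components=sum(subspace_dims), random_state=seed).fit_transform(X)
    clusterings, start = [], 0
    for dim, k in zip(subspace_dims, cluster_counts):
        subspace = Z[:, start:start + dim]
        clusterings.append(KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(subspace))
        start += dim
    return clusterings   # one label vector per (approximately) independent subspace
```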


Neurocomputing | 2018

Transferring deep knowledge for object recognition in Low-quality underwater videos

Xin Sun; Junyu Shi; Lipeng Liu; Junyu Dong; Claudia Plant; Xinhua Wang; Huiyu Zhou

In recent years, underwater video technologies have allowed us to explore the ocean in scientific and noninvasive ways, supporting tasks such as environmental monitoring, marine ecology studies, and fisheries management. However, the low-light and high-noise conditions pose great challenges for underwater image and video analysis. We propose a CNN knowledge transfer framework for underwater object recognition and tackle the problem of extracting discriminative features from relatively low-contrast images. Even with an insufficient training set, the transfer framework, helped by data augmentation, can learn a good recognition model for this specialized underwater object recognition task. To better identify objects in an underwater video, a weighted-probability decision mechanism is introduced that determines the object from a series of frames. The proposed framework can be deployed for real-time underwater object recognition on autonomous underwater vehicles and video monitoring systems. To verify the effectiveness of our method, experiments are carried out on a public dataset. The results show that the proposed method achieves promising performance for underwater object recognition on both test image datasets and underwater videos.
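A hedged sketch of the two ingredients, with a generic torchvision backbone standing in for the network used in the paper (an assumption, as are the augmentation choices and the exact confidence weighting): replace the classifier head of a pretrained CNN for fine-tuning on augmented underwater images, then fuse per-frame softmax outputs with confidence weights to label a video clip.

```python
import torch
import torch.nn as nn
from torchvision import models, transforms

NUM_CLASSES = 10   # illustrative number of underwater object categories

# Transfer: reuse ImageNet features, replace only the classifier head for fine-tuning.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

# Data augmentation to compensate for the small, low-contrast training set.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.4, contrast=0.4),
    transforms.RandomResizedCrop(224, scale=(0.7, 1.0)),
    transforms.ToTensor(),
])

@torch.no_grad()
def classify_clip(frames):
    """Weighted-probability decision over a clip: each frame's softmax vector is
    weighted by its own confidence (its maximum probability) before summing.
    `frames` is a list of preprocessed 3x224x224 tensors."""
    model.eval()
    probs = torch.softmax(model(torch.stack(frames)), dim=1)   # (T, NUM_CLASSES)
    confidence = probs.max(dim=1).values                       # per-frame weights
    fused = (confidence[:, None] * probs).sum(dim=0)
    return int(fused.argmax())
```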


Knowledge Discovery and Data Mining | 2017

Learning from Labeled and Unlabeled Vertices in Networks

Wei Ye; Linfei Zhou; Dominik Mautz; Claudia Plant; Christian Böhm

Networks such as social networks, citation networks, and protein-protein interaction networks are prevalent in the real world. However, only very few vertices have labels compared to the large number of unlabeled vertices. For example, in social networks, not every user provides profile information such as the personal interests that are relevant for targeted advertising. Can we leverage the limited user information and the friendship network wisely to infer the labels of unlabeled users? In this paper, we propose a semi-supervised learning framework called weighted-vote Geometric Neighbor classifier (wvGN) to infer the likely labels of unlabeled vertices in sparsely labeled networks. wvGN exploits random walks to explore not only the local but also the global neighborhood information of a vertex. The label of the vertex is then determined by the accumulated local and global neighborhood information. Specifically, wvGN optimizes a proposed objective function by a search strategy based on gradient and coordinate descent. The search strategy iteratively conducts a coarse search and a fine search to escape from local optima. Extensive experiments on various synthetic and real-world data verify the effectiveness of wvGN compared to state-of-the-art approaches.
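The neighborhood-accumulation idea can be sketched with truncated random walks: powers of the transition matrix mix increasingly global neighborhoods, and each unlabeled vertex takes a weighted vote from the labeled vertices it reaches. This is a plain label-propagation-style sketch, not the wvGN objective or its coarse/fine search strategy; the walk length and decay are illustrative.

```python
import numpy as np

def weighted_vote_labels(A, labels, steps=4, decay=0.5):
    """A: adjacency matrix; labels: class ids for labeled vertices, -1 for unlabeled.
    Accumulate random-walk visiting probabilities over several walk lengths
    (geometrically decayed, so local neighbors weigh more than distant ones),
    then let the labeled vertices vote on every unlabeled vertex."""
    P = A / A.sum(axis=1, keepdims=True)          # random-walk transition matrix
    walk, accumulated = np.eye(len(A)), np.zeros_like(P, dtype=float)
    for t in range(1, steps + 1):
        walk = walk @ P                           # t-step transition probabilities
        accumulated += (decay ** t) * walk
    classes = np.unique(labels[labels >= 0])
    votes = np.stack([accumulated[:, labels == c].sum(axis=1) for c in classes], axis=1)
    result = labels.copy()
    result[labels < 0] = classes[votes[labels < 0].argmax(axis=1)]
    return result
```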


Knowledge Discovery and Data Mining | 2017

Towards an Optimal Subspace for K-Means

Dominik Mautz; Wei Ye; Claudia Plant; Christian Böhm

Is there an optimal dimensionality reduction for k-means, revealing the prominent cluster structure hidden in the data? We propose SUBKMEANS, which extends the classic k-means algorithm. The goal of this algorithm is twofold: find a sufficient k-means-style clustering partition and transform the clusters onto a common subspace which is optimal for the cluster structure. Our solution is able to pursue these two goals simultaneously. The dimensionality of this subspace is found automatically, and therefore the algorithm comes without the burden of additional parameters. At the same time, this subspace helps to mitigate the curse of dimensionality. The SUBKMEANS optimization algorithm is intriguingly simple and efficient. It is easy to implement and can readily be adopted to the current situation. Furthermore, it is compatible with many existing extensions and improvements of k-means.
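One reading of the construction can be sketched as an alternation: assign points with k-means distances inside the current subspace, then update the rotation from the eigendecomposition of the difference between the summed within-cluster scatter and the total scatter, taking the eigenvectors with negative eigenvalues as the clustered subspace (which fixes its dimensionality automatically). This simplified sketch omits the stopping rule and empty-cluster handling and should not be read as the reference implementation.

```python
import numpy as np

def subkmeans_like(X, k, n_iter=20, seed=0):
    """Simplified SubKmeans-style alternation (no empty-cluster handling)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    centered = X - X.mean(axis=0)
    S_total = centered.T @ centered                      # total scatter matrix
    V, m = np.eye(d), d                                  # start with the full space
    centers = X[rng.choice(n, size=k, replace=False)]
    for _ in range(n_iter):
        proj = V[:, :m]                                  # current clustered subspace
        dists = np.linalg.norm((X @ proj)[:, None, :] - (centers @ proj)[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        centers = np.stack([X[labels == j].mean(axis=0) for j in range(k)])
        S_within = sum((X[labels == j] - centers[j]).T @ (X[labels == j] - centers[j])
                       for j in range(k))
        eigenvalues, V = np.linalg.eigh(S_within - S_total)   # ascending order
        m = max(1, int((eigenvalues < 0).sum()))              # negative part spans the clusters
    return labels, V[:, :m]
```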


Knowledge Discovery and Data Mining | 2017

Let's See Your Digits: Anomalous-State Detection using Benford's Law

Samuel Maurus; Claudia Plant

Benford's Law describes a curious phenomenon in which the leading digits of naturally occurring numerical data are distributed in a precise fashion. In this paper we begin by showing that system metrics generated by many modern information systems like Twitter, Wikipedia, YouTube and GitHub obey this law. We then propose a novel unsupervised approach called BenFound that exploits this property to detect anomalous system events. BenFound tracks the Benfordness of key system metrics, like the follower counts of tweeting Twitter users or the change deltas in Wikipedia page edits. It then applies a novel Benford-conformity test in real time to identify non-Benford events. We investigate a variety of such events, showing that they correspond to unnatural and often undesirable system interactions like spamming, hashtag-hijacking and denial-of-service attacks. The result is a technically uncomplicated and effective red-flagging technique that can be used to complement existing anomaly-detection approaches. Although not without its limitations, it is highly efficient and requires neither obscure parameters, nor text streams, nor natural-language processing.
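The conformity check at the heart of the approach is simple to state: the expected share of leading digit d is log10(1 + 1/d), and a window of metric values is flagged when its observed leading-digit distribution drifts too far from that expectation. The distance measure and threshold below are illustrative choices, not necessarily the paper's own test.

```python
import numpy as np

BENFORD = np.log10(1 + 1 / np.arange(1, 10))     # expected share of leading digit 1..9

def leading_digit_distribution(values):
    v = np.abs(np.asarray(values, dtype=float))
    v = v[v > 0]
    digits = (v / 10 ** np.floor(np.log10(v))).astype(int)   # first significant digit
    return np.bincount(digits, minlength=10)[1:] / len(digits)

def is_benford_anomalous(values, threshold=0.05):
    """Flag a window of metric values whose leading digits deviate from Benford's
    Law; total variation distance with a fixed threshold is used for illustration."""
    return 0.5 * np.abs(leading_digit_distribution(values) - BENFORD).sum() > threshold

rng = np.random.default_rng(2)
# Heavy-tailed metrics spanning several orders of magnitude typically conform ...
print(is_benford_anomalous(rng.lognormal(mean=6, sigma=3, size=5000)))   # usually False
# ... while a burst of near-identical values (e.g., automated spam) does not.
print(is_benford_anomalous(rng.normal(loc=500, scale=5, size=5000)))     # True
```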

Collaboration


Dive into Claudia Plant's collaborations.

Top Co-Authors

Xin Sun, Ocean University of China
Junming Shao, University of Electronic Science and Technology of China
Junyu Dong, Shanghai Ocean University
Junyu Shi, Shanghai Ocean University
Lipeng Liu, Ocean University of China
Qinli Yang, University of Electronic Science and Technology of China
Xinhua Wang, Chinese Academy of Sciences
Xinzuo Wang, University of Electronic Science and Technology of China