Brigitte Boden | Researchain

Publication

Featured research published by Brigitte Boden.

international conference on data mining | 2010

Subspace Clustering Meets Dense Subgraph Mining: A Synthesis of Two Paradigms

Stephan Günnemann; Ines Färber; Brigitte Boden; Thomas Seidl

Today's applications deal with multiple types of information: graph data to represent the relations between objects and attribute data to characterize single objects. Analyzing both data sources simultaneously can increase the quality of mining methods. Recently, combined clustering approaches were introduced, which detect densely connected node sets within one large graph that also show high similarity according to all of their attribute values. However, for attribute data it is known that this full-space clustering often leads to poor clustering results. Thus, subspace clustering was introduced to identify locally relevant subsets of attributes for each cluster. In this work, we propose a method for finding homogeneous groups by joining the paradigms of subspace clustering and dense subgraph mining, i.e., we determine sets of nodes that show high similarity in subsets of their dimensions and that are also densely connected within the given graph. Our twofold clusters are optimized according to their density, size, and number of relevant dimensions. Our redundancy model confines the clustering to a manageable size of only the most interesting clusters. We introduce the algorithm GAMer for the efficient calculation of our clustering. In thorough experiments on synthetic and real-world data we show that GAMer achieves low runtimes and high clustering quality.
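The idea of a "twofold" cluster, dense in the graph and similar in a subset of attributes, can be illustrated with a minimal sketch. The density measure (fraction of connected node pairs), the width threshold `max_width`, and both function names are assumptions chosen for illustration; they are not taken from the paper and the abstract does not state the exact definitions used by the authors.

```python
from itertools import combinations


def edge_density(nodes, edges):
    """Fraction of node pairs inside `nodes` that are joined by an edge.

    `edges` is a set of frozensets {u, v} (undirected graph).
    This density measure is an illustrative assumption.
    """
    pairs = list(combinations(sorted(nodes), 2))
    if not pairs:
        return 0.0
    present = sum(1 for u, v in pairs if frozenset((u, v)) in edges)
    return present / len(pairs)


def relevant_dimensions(nodes, attributes, max_width):
    """Indices of attributes in which all `nodes` lie within `max_width`.

    `attributes` maps each node to a tuple of numeric values.
    The width criterion is an illustrative assumption.
    """
    num_dims = len(next(iter(attributes.values())))
    relevant = []
    for d in range(num_dims):
        values = [attributes[n][d] for n in nodes]
        if max(values) - min(values) <= max_width:
            relevant.append(d)
    return relevant


# Hypothetical toy data: a triangle a-b-c plus a pendant node d.
edges = {frozenset(p) for p in [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]}
attributes = {"a": (1.0, 5.0), "b": (1.1, 9.0), "c": (0.9, 2.0), "d": (4.0, 2.1)}

candidate = {"a", "b", "c"}
print(edge_density(candidate, edges))
print(relevant_dimensions(candidate, attributes, max_width=0.5))
```

In this toy example the candidate set is fully connected but similar in only one of its two attributes, which is the situation where full-space similarity would reject a group that a subspace view accepts.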

european conference on machine learning | 2011

DB-CSC: a density-based approach for subspace clustering in graphs with feature vectors

Stephan Günnemann; Brigitte Boden; Thomas Seidl

Data sources representing attribute information in combination with network information are widely available in today's applications. To realize the full potential for knowledge extraction, mining techniques like clustering should consider both information types simultaneously. Recent clustering approaches combine subspace clustering with dense subgraph mining to identify groups of objects that are similar in subsets of their attributes as well as densely connected within the network. While those approaches successfully circumvent the problem of full-space clustering, their limited cluster definitions are restricted to clusters of certain shapes. In this work, we introduce a density-based cluster definition taking the attribute similarity in subspaces and the graph density into account. This novel cluster model enables us to detect clusters of arbitrary shape and size. We avoid redundancy in the result by selecting only the most interesting non-redundant clusters. Based on this model, we introduce the clustering algorithm DB-CSC. In thorough experiments we demonstrate the strength of DB-CSC in comparison to related approaches.

pacific-asia conference on knowledge discovery and data mining | 2013

Efficient Mining of Combined Subspace and Subgraph Clusters in Graphs with Feature Vectors

Stephan Günnemann; Brigitte Boden; Ines Färber; Thomas Seidl

Large graphs are ubiquitous in today’s applications. Besides the mere graph structure, data sources usually provide information about single objects by feature vectors. To realize the full potential for knowledge extraction, recent approaches consider both information types simultaneously. Thus, for the task of clustering, combined clustering models determine object groups within one network that are densely connected and show similar characteristics. However, due to the inherent complexity of such a combination, the existing methods are not efficiently executable and are hardly applicable to large graphs.

Knowledge and Information Systems | 2014

GAMer: a synthesis of subspace clustering and dense subgraph mining

Stephan Günnemann; Ines Färber; Brigitte Boden; Thomas Seidl

In this work, we propose a new method to find homogeneous object groups in a single vertex-labeled graph. The basic premise is that many prevalent datasets consist of multiple types of information: graph data to represent the relations between objects and attribute data to characterize the single objects. Analyzing both information types simultaneously can increase the expressiveness of the resulting patterns. Our patterns of interest are sets of objects that are densely connected within the associated graph and also show high similarity regarding their attributes. Because full-space clustering is known to be often futile for attribute data, we have to analyze the similarity of objects regarding subsets of their attributes. In order to take full advantage of all present information, we combine the paradigms of dense subgraph mining and subspace clustering. For our approach, we face several challenges to achieve a sound combination of the two paradigms. We maximize our twofold clusters according to their density, size, and number of relevant dimensions. These three objectives usually conflict; thus, we realize a trade-off between these characteristics to obtain meaningful patterns. We develop a redundancy model to confine the clustering to a manageable size by selecting only the most interesting clusters for the result set. We prove the complexity of our clustering model and we particularly focus on the exploration of various pruning strategies to design the efficient algorithm GAMer (Graph & Attribute Miner). In thorough experiments on synthetic and real-world data we show that GAMer achieves low runtimes and high clustering quality. We provide all datasets, measures, executables, and parameter settings on our website http://dme.rwth-aachen.de/gamer.

european conference on machine learning | 2012

CC-MR: finding connected components in huge graphs with MapReduce

Thomas Seidl; Brigitte Boden; Sergej Fries

The detection of connected components in graphs is a well-known problem arising in a large number of applications, including data mining, analysis of social networks, image analysis, and many related problems. Although very efficient serial algorithms exist, this problem remains a subject of research because modern information systems produce growing volumes of data that cannot be handled by single workstations. Only highly parallelized approaches on multi-core servers or computer clusters are able to deal with these large-scale data sets. In this work we present a solution to this problem for distributed memory architectures, and provide an implementation for the well-known MapReduce framework developed by Google. Our algorithm CC-MR significantly outperforms the existing approaches for the MapReduce framework in terms of the number of necessary iterations, communication costs, and execution runtime, as we show in our experimental evaluation on synthetic and real-world data. Furthermore, we present a technique for accelerating our implementation for datasets with very heterogeneous component sizes, as they often appear in real data sets.
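To make the iteration count mentioned in the abstract concrete, here is a minimal sketch of a simple baseline: connected components by iterative minimum-label propagation, written as map and reduce phases in plain Python. This is not the CC-MR algorithm, whose details the abstract does not give; the function name and the in-memory simulation of the phases are assumptions for illustration only.

```python
from collections import defaultdict


def connected_components(edges):
    """Label every vertex with the smallest vertex id in its component.

    Baseline min-label propagation (NOT CC-MR). `edges` is a list of
    (u, v) pairs with mutually comparable vertex ids. Vertices that
    appear in no edge are not covered by this sketch.
    Returns (labels, rounds), where rounds counts map/reduce iterations.
    """
    label = {}
    for u, v in edges:
        label.setdefault(u, u)
        label.setdefault(v, v)

    rounds = 0
    changed = True
    while changed:
        changed = False
        rounds += 1
        # "Map" phase: each edge sends each endpoint's label to the other.
        proposals = defaultdict(list)
        for u, v in edges:
            proposals[u].append(label[v])
            proposals[v].append(label[u])
        # "Reduce" phase: each vertex keeps the smallest label it has seen.
        for node, candidates in proposals.items():
            best = min(candidates)
            if best < label[node]:
                label[node] = best
                changed = True
    return label, rounds


labels, rounds = connected_components([(1, 2), (2, 3), (4, 5)])
print(labels, rounds)
```

In this baseline a label travels one hop per round, so the number of rounds grows with the diameter of the largest component. That cost is the kind of iteration overhead the abstract says CC-MR reduces.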

Data Mining and Knowledge Discovery | 2012

Finding density-based subspace clusters in graphs with feature vectors

Stephan Günnemann; Brigitte Boden; Thomas Seidl

Data sources representing attribute information in combination with network information are widely available in today’s applications. To realize the full potential for knowledge extraction, mining techniques like clustering should consider both information types simultaneously. Recent clustering approaches combine subspace clustering with dense subgraph mining to identify groups of objects that are similar in subsets of their attributes as well as densely connected within the network. While those approaches successfully circumvent the problem of full-space clustering, their limited cluster definitions are restricted to clusters of certain shapes. In this work we introduce a density-based cluster definition, which takes into account the attribute similarity in subspaces as well as a local graph density and enables us to detect clusters of arbitrary shape and size. Furthermore, we avoid redundancy in the result by selecting only the most interesting non-redundant clusters. Based on this model, we introduce the clustering algorithm DB-CSC, which uses a fixed point iteration method to efficiently determine the clustering solution. We prove the correctness and complexity of this fixed point iteration analytically. In thorough experiments we demonstrate the strength of DB-CSC in comparison to related approaches.
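The notion of a fixed point iteration over a candidate cluster can be sketched generically: repeatedly drop nodes that lack enough neighbors which are both adjacent in the graph and close in the chosen subspace, and stop when the set no longer changes. This is only an illustration under stated assumptions. The neighborhood rule, the parameters `eps` and `min_pts`, and the function name are hypothetical and are not the DB-CSC definitions, which the abstract does not spell out.

```python
def prune_to_fixed_point(nodes, adjacency, attributes, dims, eps, min_pts):
    """Shrink `nodes` until every remaining node has at least `min_pts`
    graph neighbors inside the set that are within `eps` in each
    dimension of `dims`. Illustrative rule, not the DB-CSC model.

    adjacency:  dict node -> iterable of neighboring nodes
    attributes: dict node -> tuple of numeric attribute values
    """

    def close(u, v):
        # Similar in the subspace: within eps in every selected dimension.
        return all(abs(attributes[u][d] - attributes[v][d]) <= eps for d in dims)

    current = set(nodes)
    while True:
        survivors = {
            u
            for u in current
            if sum(
                1 for v in adjacency.get(u, ()) if v in current and close(u, v)
            )
            >= min_pts
        }
        if survivors == current:  # fixed point reached
            return current
        current = survivors


# Hypothetical toy data: triangle a-b-c with an attribute outlier d.
adjacency = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
attributes = {"a": (1.0,), "b": (1.1,), "c": (0.9,), "d": (5.0,)}
print(prune_to_fixed_point(adjacency, adjacency, attributes, [0], 0.5, 2))
```

Because each pass can only remove nodes, the loop terminates after at most as many passes as there are nodes, ending either in a stable set or in the empty set.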

international conference on data engineering | 2014

PHiDJ: Parallel similarity self-join for high-dimensional vector data with MapReduce

Sergej Fries; Brigitte Boden; Grzegorz Stepien; Thomas Seidl

Join processing on large-scale vector data is an important problem in many applications, as vectors are a common representation for various data types. In particular, several data analysis tasks like near-duplicate detection, density-based clustering, or data cleaning are based on similarity self-joins, which are a special type of join. For huge data sets, MapReduce proved to be a suitable, error-tolerant framework for parallel join algorithms. Recent approaches exploit the vector-space properties of low-dimensional vector data for an efficient join computation. However, so far no parallel similarity self-join approaches aiming at high-dimensional vector data have been proposed. In this work we propose the novel similarity self-join algorithm PHiDJ (Parallel High-Dimensional Join) for the MapReduce framework. PHiDJ is well suited for medium- to high-dimensional data and exploits multiple filter techniques for reducing communication and computational costs. We provide a solution for efficient join computation on data with skewed distributions. Our experimental evaluation on medium- to high-dimensional data shows that our approach outperforms existing techniques.
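For readers unfamiliar with the term, a similarity self-join returns all pairs of vectors from one data set whose distance is at most a threshold. The following is a minimal serial sketch, assuming Euclidean distance and a threshold named `eps`. It is a quadratic baseline with one simple per-dimension filter, not PHiDJ and not its filter techniques, which the abstract does not describe.

```python
import math
from itertools import combinations


def similarity_self_join(vectors, eps):
    """Return index pairs (i, j), i < j, with Euclidean distance <= eps.

    Serial O(n^2) baseline for illustration, not PHiDJ.
    """
    result = []
    for (i, p), (j, q) in combinations(enumerate(vectors), 2):
        # Cheap filter: if any single coordinate differs by more than eps,
        # the Euclidean distance must also exceed eps, so skip the pair.
        if any(abs(a - b) > eps for a, b in zip(p, q)):
            continue
        if math.dist(p, q) <= eps:
            result.append((i, j))
    return result


points = [(0.0, 0.0), (0.5, 0.0), (3.0, 3.0), (3.0, 3.4)]
print(similarity_self_join(points, eps=1.0))
```

The filter is safe because the Euclidean distance is never smaller than the largest per-coordinate difference, so it never discards a qualifying pair; it only avoids some full distance computations.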

european conference on machine learning | 2014

Density-based subspace clustering in heterogeneous networks

Brigitte Boden; Martin Ester; Thomas Seidl

Many real-world data sets, like data from social media or bibliographic data, can be represented as heterogeneous networks with several vertex types. Often additional attributes are available for the vertices, such as keywords for a paper. Clustering vertices in such networks, and analyzing the complex interactions between clusters of different types, can provide useful insights into the structure of the data. To exploit the full information content of the data, clustering approaches should consider the connections in the network as well as the vertex attributes. We propose the density-based clustering model TCSC for the detection of clusters in heterogeneous networks that are densely connected in the network as well as in the attribute space. Unlike previous approaches for clustering heterogeneous networks, TCSC enables the detection of clusters that show similarity only in a subset of the attributes, which is more effective in the presence of a large number of attributes.

statistical and scientific database management | 2013

RMiCS: a robust approach for mining coherent subgraphs in edge-labeled multi-layer graphs

Brigitte Boden; Stephan Günnemann; Holger Hoffmann; Thomas Seidl

Detecting dense subgraphs in a large graph is an important graph mining problem and various approaches have been proposed for its solution. While most existing methods only consider unlabeled and one-dimensional graph data, many real-world applications provide far richer information. Thus, in our work, we consider graphs that contain different types of edges (represented as different layers/dimensions of a graph) as well as edge labels that further characterize the relations between two vertices. We argue that exploiting this additional information supports the detection of more interesting clusters. In general, we aim at detecting clusters of vertices that are densely connected by edges with similar labels in subsets of the graph layers. So far, there exists only a single method that tries to detect clusters in such graphs. This method, however, is highly sensitive to noise: already a single edge with a deviating label can completely hinder the detection of interesting clusters. In this paper, we present the RCS (Robust Coherent Subgraph) model which enables us to detect clusters even in noisy data. This robustness greatly enhances the applicability on real-world data. In order to obtain interpretable results, RCS avoids redundant clusters in the result set. We present the algorithm RMiCS for an efficient detection of RCS clusters and we analyze its behavior in various experiments on synthetic and real-world data.

conference on information and knowledge management | 2012

Tracing clusters in evolving graphs with node attributes

Brigitte Boden; Stephan Günnemann; Thomas Seidl

Data sources representing social networks with additional attribute information about the nodes are widely available in today's applications. Recently, combined clustering methods were introduced that consider graph information and attribute information simultaneously to detect meaningful clusters in such networks. In many cases, such attributed graphs also evolve over time. Therefore, there is a need for clustering methods that are able to trace clusters over different time steps and analyze their evolution over time. In this paper, we extend our combined clustering method DB-CSC to the analysis of evolving combined clusters.