Is this you? Create Your Porfile

Michael Hamann

Karlsruhe Institute of Technology

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Michael Hamann is active.

Explore More

Publication

Featured researches published by Michael Hamann.

advances in social networks analysis and mining | 2015

Structure-Preserving Sparsification of Social Networks

Gerd Lindner; Christian L. Staudt; Michael Hamann; Henning Meyerhenke; Dorothea Wagner

Sparsification reduces the size of networks while preserving structural and statistical properties of interest. Various sparsifying algorithms have been proposed in different contexts. We contribute the first systematic conceptual and experimental comparison of edge sparsification methods on a diverse set of network properties. It is shown that they can be understood as methods for rating edges by importance and then filtering globally by these scores. In addition, we propose a new sparsification method (Local Degree) which preserves edges leading to local hub nodes. All methods are evaluated on a set of 100 Facebook social networks with respect to network properties including diameter, connected components, community structure, and multiple node centrality measures. Experiments with our implementations of the sparsification methods (using the open-source network analysis tool suite NetworKit) show that many network properties can be preserved down to about 20% of the original set of edges. Furthermore, the experimental results allow us to differentiate the behavior of different methods and show which method is suitable with respect to which property. Our Local Degree method is fast enough for large-scale networks and performs well across a wider range of properties than previously proposed methods.

arXiv: Data Structures and Algorithms | 2018

Graph Bisection with Pareto Optimization

Michael Hamann; Ben Strasser

We introduce FlowCutter, a novel algorithm to compute a set of edge cuts or node separators that optimize cut size and balance in the Pareto sense. Our core algorithm heuristically solves the balanced connected st-edge-cut problem, where two given nodes s and t must be separated by removing edges to obtain two connected parts. Using the core algorithm as a subroutine, we build variants that compute node separators that are independent of s and t. From the computed Pareto set, we can identify cuts with a particularly good tradeoff between cut size and balance that can be used to compute contraction and minimum fill-in orders, which can be used in Customizable Contraction Hierarchies (CCHs), a speed-up technique for shortest-path computations. Our core algorithm runs in O(c∣E∣) time, where E is the set of edges and c is the size of the largest outputted cut. This makes it well suited for separating large graphs with small cuts, such as road graphs, which is the primary application motivating our research. For road graphs, we present an extensive experimental study demonstrating that FlowCutter outperforms the current state of the art in terms of both cut sizes and CCH performance. By evaluating FlowCutter on a standard graph partitioning benchmark, we further show that FlowCutter also finds small, balanced cuts on nonroad graphs. Another application is the computation of small tree decompositions. To evaluate the quality of our algorithm in this context, we entered the PACE 2016 challenge [13] and won first place in the corresponding sequential competition track. We can therefore conclude that our FlowCutter algorithm finds small, balanced cuts on a wide variety of graphs.

Studies in computational intelligence | 2016

Generating Scaled Replicas of Real-World Complex Networks

Christian L. Staudt; Michael Hamann; Ilya Safro; Alexander Gutfraind; Henning Meyerhenke

Research on generative models plays a central role in the emerging field of network science, studying how statistical patterns found in real networks can be generated by formal rules. During the last two decades, a variety of models has been proposed with an ultimate goal of achieving comprehensive realism for the generated networks. In this study, we (a) introduce a new generator, termed ReCoN; (b) explore how models can be fitted to an original network to produce a structurally similar replica, and (c) aim for producing much larger networks than the original exemplar. In a comparative experimental study, we find ReCoN often superior to many other stateof- the-art network generation methods. Our design yields a scalable and effective tool for replicating a given network while preserving important properties at both microand macroscopic scales and (optionally) scaling the replica by orders of magnitude in size. We recommend ReCoN as a general practical method for creating realistic test data for the engineering of computational methods on networks, verification, and simulation studies. We provide scalable open-source implementations of most studied methods, including ReCoN.

algorithm engineering and experimentation | 2017

I/O-efficient Generation of Massive Graphs Following the LFR Benchmark

Michael Hamann; Ulrich Meyer; Manuel Penschuck; Dorothea Wagner

LFR is a popular benchmark graph generator used to evaluate community detection algorithms. We present EM-LFR, the first external memory algorithm able to generate massive complex networks following the LFR benchmark. Its most expensive component is the generation of random graphs with prescribed degree sequences which can be divided into two steps: the graphs are first materialized deterministically using the Havel-Hakimi algorithm, and then randomized. Our main contributions are EM-HH and EM-ES, two I/Oefficient external memory algorithms for these two steps. We also propose EM-CM/ES, an alternative sampling scheme using the Configuration Model and rewiring steps to obtain a random simple graph. In an experimental evaluation we demonstrate their performance; our implementation is able to handle graphs with more than 37 billion edges on a single machine, is competitive with a massive parallel distributed algorithm, and is faster than a state-of-theart internal memory implementation even on instances fitting in main memory. EM-LFR’s implementation is capable of generating large graph instances orders of magnitude faster than the original implementation. We give evidence that both implementations yield graphs with matching properties by applying clustering algorithms to generated instances. Similarly, we analyse the evolution of graph properties as EM-ES is executed on networks obtained with EM-CM/ES and find that the alternative approach can accelerate the sampling process. ∗This work was partially supported by the DFG under grants ME 2088/3-2, WA 654/22-2. Parts of this paper were published as [21]. 1 ar X iv :1 60 4. 08 73 8v 3 [ cs .D S] 1 4 Ju n 20 17

european symposium on algorithms | 2015

Fast Quasi-Threshold Editing

Ulrik Brandes; Michael Hamann; Ben Strasser; Dorothea Wagner

We introduce Quasi-Threshold Mover (QTM), an algorithm to solve the quasi-threshold (also called trivially perfect) graph editing problem with a minimum number of edge insertions and deletions. Given a graph it computes a quasi-threshold graph which is close in terms of edit count, but not necessarily closest as this edit problem is NP-hard. We present an extensive experimental study, in which we show that QTM performs well in practice and is the first heuristic that is able to scale to large real-world graphs in practice. As a side result we further present a simple linear-time algorithm for the quasi-threshold recognition problem.

Bioinformatics | 2018

Two C++ libraries for counting trees on a phylogenetic terrace

Rudolf Biczok; Peter Bozsoky; Peter Eisenmann; Johannes Ernst; Tobias Ribizel; Fedor Scholz; Axel Trefzer; Florian Weber; Michael Hamann; Alexandros Stamatakis

Abstract Motivation The presence of terraces in phylogenetic tree space, i.e. a potentially large number of distinct tree topologies that have exactly the same analytical likelihood score, was first described by Sanderson et al. However, popular software tools for maximum likelihood and Bayesian phylogenetic inference do not yet routinely report, if inferred phylogenies reside on a terrace, or not. We believe, this is due to the lack of an efficient library to (i) determine if a tree resides on a terrace, (ii) calculate how many trees reside on a terrace and (iii) enumerate all trees on a terrace. Results In our bioinformatics practical that is set up as a programming contest we developed two efficient and independent C++ implementations of the SUPERB algorithm by Constantinescu and Sankoff (1995) for counting and enumerating trees on a terrace. Both implementations yield exactly the same results, are more than one order of magnitude faster, and require one order of magnitude less memory than a previous thirrd party python implementation. Availability and implementation The source codes are available under GNU GPL at https://github.com/terraphast. Supplementary information Supplementary data are available at Bioinformatics online.

arXiv: Social and Information Networks | 2017

Generating realistic scaled complex networks

Christian L. Staudt; Michael Hamann; Alexander Gutfraind; Ilya Safro; Henning Meyerhenke

Research on generative models plays a central role in the emerging field of network science, studying how statistical patterns found in real networks could be generated by formal rules. Output from these generative models is then the basis for designing and evaluating computational methods on networks including verification and simulation studies. During the last two decades, a variety of models has been proposed with an ultimate goal of achieving comprehensive realism for the generated networks. In this study, we (a) introduce a new generator, termed ReCoN; (b) explore how ReCoN and some existing models can be fitted to an original network to produce a structurally similar replica, (c) use ReCoN to produce networks much larger than the original exemplar, and finally (d) discuss open problems and promising research directions. In a comparative experimental study, we find that ReCoN is often superior to many other state-of-the-art network generation methods. We argue that ReCoN is a scalable and effective tool for modeling a given network while preserving important properties at both micro- and macroscopic scales, and for scaling the exemplar data by orders of magnitude in size.

european conference on parallel processing | 2018

Distributed Graph Clustering Using Modularity and Map Equation

Michael Hamann; Ben Strasser; Dorothea Wagner; Tim Zeitz

We study large-scale, distributed graph clustering. Given an undirected, weighted graph, our objective is to partition the nodes into disjoint sets called clusters. Each cluster should contain many internal edges. Further, there should only be few edges between clusters. We study two established formalizations of this internally-dense-externally-sparse principle: modularity and map equation. We present two versions of a simple distributed algorithm to optimize both measures. They are based on Thrill, a distributed big data processing framework that implements an extended MapReduce model. The algorithms for the two measures, DSLM-Mod and DSLM-Map, differ only slightly. Adapting them for similar quality measures is easy. In an extensive experimental study, we demonstrate the excellent performance of our algorithms on real-world and synthetic graph clustering benchmark graphs.

Journal of Experimental Algorithmics | 2018

I/O-Efficient Generation of Massive Graphs Following the LFR Benchmark

Michael Hamann; Ulrich Meyer; Manuel Penschuck; Hung Tran; Dorothea Wagner

LFR is a popular benchmark graph generator used to evaluate community detection algorithms. We present EM-LFR, the first external memory algorithm able to generate massive complex networks following the LFR benchmark. Its most expensive component is the generation of random graphs with prescribed degree sequences which can be divided into two steps: the graphs are first materialized deterministically using the Havel-Hakimi algorithm, and then randomized. Our main contributions are EM-HH and EM-ES, two I/O-efficient external memory algorithms for these two steps. We also propose EM-CM/ES, an alternative sampling scheme using the Configuration Model and rewiring steps to obtain a random simple graph. In an experimental evaluation, we demonstrate their performance; our implementation is able to handle graphs with more than 37 billion edges on a single machine, is competitive with a massively parallel distributed algorithm, and is faster than a state-of-the-art internal memory implementation even on instances fitting in main memory. EM-LFR’s implementation is capable of generating large graph instances orders of magnitude faster than the original implementation. We give evidence that both implementations yield graphs with matching properties by applying clustering algorithms to generated instances. Similarly, we analyze the evolution of graph properties as EM-ES is executed on networks obtained with EM-CM/ES and find that the alternative approach can accelerate the sampling process.

Algorithms | 2017

Local Community Detection Based on Small Cliques

Michael Hamann; Eike Röhrs; Dorothea Wagner

Community detection aims to find dense subgraphs in a network. We consider the problem of finding a community locally around a seed node both in unweighted and weighted networks. This is a faster alternative to algorithms that detect communities that cover the whole network when actually only a single community is required. Further, many overlapping community detection algorithms use local community detection algorithms as basic building block. We provide a broad comparison of different existing strategies of expanding a seed node greedily into a community. For this, we conduct an extensive experimental evaluation both on synthetic benchmark graphs as well as real world networks. We show that results both on synthetic as well as real-world networks can be significantly improved by starting from the largest clique in the neighborhood of the seed node. Further, our experiments indicate that algorithms using scores based on triangles outperform other algorithms in most cases. We provide theoretical descriptions as well as open source implementations of all algorithms used.

Explore More