Gereon Frahling
University of Paderborn
Publications
Featured research published by Gereon Frahling.
symposium on principles of database systems | 2006
Luciana S. Buriol; Gereon Frahling; Stefano Leonardi; Alberto Marchetti-Spaccamela; Christian Sohler
We present two space-bounded random sampling algorithms that compute an approximation of the number of triangles in an undirected graph given as a stream of edges. Our first algorithm makes no assumptions on the order of edges in the stream. It uses space that is inversely related to the ratio between the number of triangles and the number of triples with at least one edge in the induced subgraph, and constant expected update time per edge. Our second algorithm is designed for incidence streams (all edges incident to the same vertex appear consecutively). It uses space that is inversely related to the ratio between the number of triangles and the number of length-2 paths in the graph, and expected update time O(log|V|⋅(1+s⋅|V|/|E|)), where s is the space requirement of the algorithm. These results significantly improve over previous work [20, 8]. Since the space complexity depends only on the structure of the input graph and not on the number of nodes, our algorithms scale very well with increasing graph size, and so they provide a basic tool to analyze the structure of large graphs. They have many applications, for example in the discovery of Web communities, the computation of clustering and transitivity coefficients, and the discovery of frequent patterns in large graphs.
We have implemented both algorithms and evaluated their performance on networks from different application domains. The sizes of the considered graphs varied from about 8,000 nodes and 40,000 edges to 135 million nodes and more than 1 billion edges. For both algorithms we ran experiments with parameter s = 1,000, 10,000, 100,000, 1,000,000 to evaluate running time and approximation guarantee. Both algorithms appear to be time-efficient for these sample sizes. The approximation quality of the first algorithm varied significantly, and even for s = 1,000,000 we had more than 10% deviation on more than half of the instances. The second algorithm performed much better: even for s = 10,000 the average deviation was less than 6% (taken over all but the largest instance, for which we could not compute the number of triangles exactly).
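The core idea behind the second (incidence-stream) algorithm can be illustrated offline by wedge sampling: sample length-2 paths uniformly and check how often the closing edge exists. The sketch below is a non-streaming illustration of that estimator, not the streaming algorithm itself; the function name and parameters are our own.

```python
import random
from collections import defaultdict

def estimate_triangles(edges, s=10000, seed=0):
    """Wedge-sampling estimate of the triangle count.

    Sample s length-2 paths (wedges) uniformly at random, count the
    fraction that are closed by a third edge, and scale by the total
    number of wedges divided by 3 (each triangle contains 3 wedges).
    """
    rng = random.Random(seed)
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    nodes = list(adj)
    # number of wedges centred at v is deg(v) choose 2
    wedge_counts = [len(adj[v]) * (len(adj[v]) - 1) // 2 for v in nodes]
    total_wedges = sum(wedge_counts)
    if total_wedges == 0:
        return 0.0
    closed = 0
    for _ in range(s):
        # pick a wedge uniformly: centre v with prob. proportional to its wedge count
        v = rng.choices(nodes, weights=wedge_counts, k=1)[0]
        a, b = rng.sample(sorted(adj[v]), 2)
        if b in adj[a]:
            closed += 1
    return closed / s * total_wedges / 3
```

On a complete graph every wedge is closed, so the estimate is exact; on sparser graphs the deviation shrinks as the sample size s grows, mirroring the experiments above.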
symposium on the theory of computing | 2005
Gereon Frahling; Christian Sohler
A dynamic geometric data stream consists of a sequence of m insert/delete operations of points from the discrete space {1,…,Δ}^d [26]. We develop streaming (1+ε)-approximation algorithms for k-median, k-means, MaxCut, maximum weighted matching (MaxWM), maximum travelling salesperson (MaxTSP), maximum spanning tree (MaxST), and average distance over dynamic geometric data streams. Our algorithms maintain a small weighted set of points (a coreset) that, with probability 2/3, approximates the current point set with respect to the considered problem during the m insert/delete operations of the data stream. They use poly(ε^-1, log m, log Δ) space and update time per insert/delete operation for constant k and dimension d. Having a coreset, one only needs a fast approximation algorithm for the weighted problem to compute a solution quickly. In fact, even an exponential algorithm is sometimes feasible, as its running time may still be polynomial in n. For example, one can compute in poly(log n, exp(O((1+log(1/ε)/ε)^(d-1)))) time a solution to k-median and k-means [21], where n is the size of the current point set and k and d are constants. Finding an implicit solution to MaxCut can be done in poly(log n, exp((1/ε)^O(1))) time. For MaxST and average distance we require poly(log n, ε^-1) time, and for MaxWM we require O(n^3) time to do this.
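Once a coreset is available, the "fast approximation algorithm for the weighted problem" can be as simple as a weighted variant of Lloyd's algorithm. The following sketch runs Lloyd iterations on a weighted point set such as a coreset; the function name and the random initialisation strategy are our own, as the paper does not prescribe a particular weighted solver.

```python
import random

def weighted_kmeans(points, weights, k, iters=50, seed=0):
    """Lloyd's algorithm on a weighted point set (e.g. a coreset).

    points : list of coordinate tuples, weights : matching positive
    weights, k : number of centres. Returns the final centres.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # assignment step: each weighted point goes to its nearest centre
        clusters = [[] for _ in range(k)]
        for p, w in zip(points, weights):
            j = min(range(k),
                    key=lambda c: sum((pi - ci) ** 2
                                      for pi, ci in zip(p, centers[c])))
            clusters[j].append((p, w))
        # update step: each centre moves to the weighted centroid
        new_centers = []
        for j, cl in enumerate(clusters):
            if not cl:
                new_centers.append(centers[j])
                continue
            tot = sum(w for _, w in cl)
            new_centers.append(tuple(sum(w * p[d] for p, w in cl) / tot
                                     for d in range(len(points[0]))))
        if new_centers == centers:
            break
        centers = new_centers
    return centers
```

Because the coreset has only poly(ε^-1, log m, log Δ) points, each Lloyd iteration is cheap regardless of how many points the stream has inserted.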
symposium on computational geometry | 2005
Gereon Frahling; Piotr Indyk; Christian Sohler
A dynamic geometric data stream is a sequence of m Add/Remove operations of points from a discrete geometric space {1,…,Δ}^d [21]. Add(p) inserts a point p from {1,…,Δ}^d into the current point set P, Remove(p) deletes p from P. We develop low-storage data structures to (i) maintain ε-approximations of range spaces of P with constant VC-dimension and (ii) maintain an ε-approximation of the weight of the Euclidean minimum spanning tree of P. Our data structures use O(log³Δ · log³(1/δ) · log(1/ε)/ε²) and O(log(1/δ) · (log Δ/ε)^O(d)) bits of memory, respectively (we assume that the dimension d is a constant), and they are correct with probability 1 − δ. These results are based on a new data structure that maintains a set of elements chosen (almost) uniformly at random from P.
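A minimal way to see how a fixed random hash makes sampling robust to deletions: assign each possible point a pseudorandom priority and keep exactly the current points whose priority falls below a shrinking threshold. Because the priority depends only on the point, a Remove(p) can cleanly undo an earlier Add(p). The sketch below is a simplified stand-in for the paper's data structure (it does not recover points after many deletions, which the real structure handles); all names are illustrative.

```python
import random

class DynamicSampler:
    """Hash-threshold sampler over an insert/delete point stream."""

    def __init__(self, capacity=64, seed=0):
        self.capacity = capacity
        self.threshold = 1.0
        self.seed = seed
        self.stored = set()

    def _h(self, p):
        # fixed pseudorandom priority in [0, 1), a function of p only
        return random.Random(hash((self.seed, p))).random()

    def add(self, p):
        if self._h(p) < self.threshold:
            self.stored.add(p)
            # too many survivors: halve the threshold and subsample
            while len(self.stored) > self.capacity:
                self.threshold /= 2
                self.stored = {q for q in self.stored
                               if self._h(q) < self.threshold}

    def remove(self, p):
        self.stored.discard(p)

    def sample(self):
        # the minimum-priority survivor is an (almost) uniform element
        return min(self.stored, key=self._h) if self.stored else None
```

The stored set never exceeds the capacity, yet which points survive depends only on their hashes, not on the order of operations in the stream.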
symposium on computational geometry | 2006
Gereon Frahling; Christian Sohler
In this paper we develop an efficient implementation of a k-means clustering algorithm. Our algorithm is a variant of KMHybrid [28, 20], i.e. it uses a combination of Lloyd-steps and random swaps, but as a novel feature it uses coresets to speed up the algorithm. A coreset is a small weighted set of points that approximates the original point set with respect to the considered problem. The main strength of the algorithm is that it can quickly determine clusterings of the same point set for many values of k. This is necessary in many applications, since, typically, one does not know a good value for k in advance. Once we have clusterings for many different values of k, we can determine a good choice of k using a quality measure of clusterings that is independent of k, for example the average silhouette coefficient. The average silhouette coefficient can be approximated using coresets.
To evaluate the performance of our algorithm we compare it with the algorithm KMHybrid [28] on typical 3D data sets for an image compression application and on artificially created instances. Our data sets consist of 300,000 to 4.9 million points. We show that our algorithm significantly outperforms KMHybrid on most of these input instances. Additionally, the quality of the solutions computed by our algorithm varies less than that of KMHybrid. We also computed clusterings and approximate average silhouette coefficients for k = 1,…,100 for our input instances and discuss the performance of our algorithm in detail.
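A crude illustration of the coreset idea: snap points to a grid and keep one weighted representative per occupied cell. The actual construction uses a hierarchy of grids whose resolution adapts to the local point density; this fixed-grid version, with hypothetical names, is only a rough sketch of the principle.

```python
from collections import Counter

def grid_coreset(points, cell):
    """Crude grid-based coreset.

    Snap each point to the centre of its grid cell and weight that
    centre by the number of points in the cell. Clustering costs on
    the weighted centres approximate costs on the original points
    up to an error governed by the cell size.
    """
    counts = Counter(tuple(int(x // cell) for x in p) for p in points)
    coreset, weights = [], []
    for c, w in counts.items():
        coreset.append(tuple((i + 0.5) * cell for i in c))
        weights.append(w)
    return coreset, weights
```

The key effect described in the abstract follows directly: the number of weighted representatives depends on the spread of the data, not on the raw point count, so repeated clustering runs for many values of k become cheap.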
european symposium on algorithms | 2007
Luciana S. Buriol; Gereon Frahling; Stefano Leonardi; Christian Sohler
We present random sampling algorithms that with probability at least 1 − δ compute a (1 ± ε)-approximation of the clustering coefficient and of the number of bipartite clique subgraphs of a graph given as an incidence stream of edges. The space used by our algorithm to estimate the clustering coefficient is inversely related to the clustering coefficient of the network itself. The space used by our algorithm to compute the number of K3,3 bipartite cliques is proportional to the ratio between the number of K1,3 and K3,3 subgraphs in the graph.
Since the space complexity depends only on the structure of the input graph and not on the number of nodes, our algorithms scale very well with increasing graph size. Therefore they provide a basic tool to analyze the structure of dense clusters in large graphs and have many applications in the discovery of web communities, the analysis of the structure of large social networks, and the probing of frequent patterns in large graphs.
We implemented both algorithms and evaluated their performance on networks from different application domains and of different sizes. The largest instance is a webgraph consisting of more than 135 million nodes and 1 billion edges. Both algorithms compute accurate results in reasonable time on the tested instances.
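The clustering-coefficient estimator can be illustrated offline: pick a random vertex of degree at least 2, pick a random pair of its neighbours, and record whether they are adjacent. The sketch below shows this estimator for the average local clustering coefficient; it is a non-streaming simplification, and the names are our own.

```python
import random
from collections import defaultdict

def estimate_clustering_coefficient(edges, s=10000, seed=0):
    """Sampling estimate of the average local clustering coefficient.

    Repeatedly pick a uniform vertex of degree >= 2 and a uniform
    pair of its neighbours; the fraction of sampled pairs that are
    adjacent estimates the expected local clustering coefficient
    over such vertices.
    """
    rng = random.Random(seed)
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    eligible = [v for v in adj if len(adj[v]) >= 2]
    hits = 0
    for _ in range(s):
        v = rng.choice(eligible)
        a, b = rng.sample(sorted(adj[v]), 2)
        hits += b in adj[a]
    return hits / s
```

The inverse relation between space and clustering coefficient in the abstract has an intuitive analogue here: the smaller the true coefficient, the more samples are needed before any closed pair is observed.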
International Journal of Computational Geometry and Applications | 2008
Gereon Frahling; Piotr Indyk; Christian Sohler
A dynamic geometric data stream is a sequence of m ADD/REMOVE operations of points from a discrete geometric space {1,…,Δ}^d. ADD(p) inserts a point p from {1,…,Δ}^d into the current point set P, REMOVE(p) deletes p from P. We develop low-storage data structures to (i) maintain ε-nets and ε-approximations of range spaces of P with small VC-dimension and (ii) maintain a (1 + ε)-approximation of the weight of the Euclidean minimum spanning tree of P. Our data structure for ε-nets returns with probability 1 − δ a set of points that is an ε-net for an arbitrary fixed finite range space with small VC-dimension. Our data structure for ε-approximations returns with probability 1 − δ a set of points that is an ε-approximation for an arbitrary fixed finite range space with small VC-dimension. The data structure for the approximation of the weight of a Euclidean minimum spanning tree uses O(log(1/δ)(log Δ/ε)^O(d)) space and is correct with probability at least 1 − δ. Our results are based on a new data structure that maintains a set of elements chosen (almost) uniformly at random from P.
International Journal of Computational Geometry and Applications | 2008
Gereon Frahling; Christian Sohler
In this paper we develop an efficient implementation of a k-means clustering algorithm. The algorithm is based on a combination of Lloyd's algorithm with random swapping of centers to avoid local minima. This approach was proposed by Mount [30]. The novel feature of our algorithm is the use of coresets to speed up the algorithm. A coreset is a small weighted set of points that approximates the original point set with respect to the considered problem. We use a coreset construction described in [12]. Our algorithm first computes a solution on a very small coreset. Then in each iteration the previous solution is used as a starting solution on a refined, i.e. larger, coreset. To evaluate the performance of our algorithm we compare it with the algorithm KMHybrid [30] on typical 3D data sets for an image compression application and on artificially created instances. Our data sets consist of 300,000 to 4.9 million points. Our algorithm outperforms KMHybrid on most of these input instances. Additionally, the quality of the solutions computed by our algorithm varies significantly less than that of KMHybrid. We conclude that the use of coresets has two effects. First, it can speed up algorithms significantly. Second, in variants of Lloyd's algorithm it reduces the dependency on the starting solution and thus makes the algorithm more stable. Finally, we propose the use of coresets as a heuristic to approximate the average silhouette coefficient of clusterings. The average silhouette coefficient is a measure of the quality of a clustering that is independent of the number of clusters k. Hence, it can be used to compare the quality of clusterings for different values of k. To show the applicability of our approach we computed clusterings and approximate average silhouette coefficients for k = 1,…,100 for our input instances and discuss the performance of our algorithm in detail.
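For reference, the average silhouette coefficient itself can be computed exactly as follows. This is the plain unweighted version of the standard definition; on a coreset one would use the weighted analogue, which the paper approximates.

```python
import math
from collections import defaultdict

def average_silhouette(points, labels):
    """Exact average silhouette coefficient of a labelled point set.

    For each point, a = mean distance to the other points in its own
    cluster, b = smallest mean distance to any other cluster, and the
    silhouette is (b - a) / max(a, b); singletons contribute 0.
    """
    clusters = defaultdict(list)
    for p, l in zip(points, labels):
        clusters[l].append(p)
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # silhouette of a singleton is defined as 0
        a = sum(math.dist(p, q) for q in own if q is not p) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in c) / len(c)
                for l2, c in clusters.items() if l2 != l)
        total += (b - a) / max(a, b)
    return total / len(points)
```

Values near 1 indicate well-separated clusters, values near 0 indicate overlapping ones, which is what makes the measure usable for comparing clusterings across different values of k.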
cologne twente workshop on graphs and combinatorial optimization | 2006
Ulrich Faigle; Gereon Frahling
Computing a maximum weighted stable set in a bipartite graph is considered well-solved and is usually approached with preflow-push, Ford-Fulkerson, or network simplex algorithms. We present a combinatorial algorithm for the problem that is not based on flows. Numerical tests suggest that this algorithm performs quite well in practice and is competitive with flow-based algorithms, especially in the case of dense graphs.
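For comparison, the flow-based baseline mentioned above can be sketched as follows: by König-Egerváry duality, the maximum weight of a stable set equals the total vertex weight minus a minimum-weight vertex cover, which is a minimum s-t cut in a suitable network. The sketch uses a plain Edmonds-Karp max-flow; the paper's flow-free combinatorial algorithm is not reproduced here, and all names are our own.

```python
from collections import defaultdict, deque

def max_weight_stable_set_value(left_w, right_w, edges):
    """Weight of a maximum weighted stable set in a bipartite graph.

    left_w/right_w map vertices to nonnegative weights; edges is a
    list of (left, right) pairs. Build source -> left (cap = weight),
    right -> sink (cap = weight), left -> right (cap = infinity);
    the min cut is a minimum-weight vertex cover.
    """
    S, T = 'SOURCE', 'SINK'
    cap = defaultdict(lambda: defaultdict(int))
    for u, w in left_w.items():
        cap[S][('L', u)] = w
    for v, w in right_w.items():
        cap[('R', v)][T] = w
    INF = sum(left_w.values()) + sum(right_w.values()) + 1
    for u, v in edges:
        cap[('L', u)][('R', v)] = INF
    flow = 0
    while True:
        # BFS for a shortest augmenting path (Edmonds-Karp)
        parent = {S: None}
        q = deque([S])
        while q and T not in parent:
            x = q.popleft()
            for y, c in cap[x].items():
                if c > 0 and y not in parent:
                    parent[y] = x
                    q.append(y)
        if T not in parent:
            break
        # collect the path and push the bottleneck along it
        path, y = [], T
        while parent[y] is not None:
            path.append((parent[y], y))
            y = parent[y]
        aug = min(cap[x][y] for x, y in path)
        for x, y in path:
            cap[x][y] -= aug
            cap[y][x] += aug
        flow += aug
    return sum(left_w.values()) + sum(right_w.values()) - flow
```

This is the kind of flow computation the paper's combinatorial algorithm avoids, which is where its advantage on dense graphs would come from.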
european symposium on algorithms | 2005
Gereon Frahling; Jens Krokowski
Modern computer graphics systems are able to render sophisticated 3D scenes consisting of millions of polygons. For most camera positions only a small fraction of these polygons is visible. We address the problem of occlusion culling, i.e., determining hidden primitives. Aila, Miettinen, and Nordlund suggested implementing a FIFO buffer on graphics cards that delays polygons before drawing them [2]. When a polygon within the buffer is occluded or masked by another polygon arriving later from the application, the rendering engine can drop the occluded one without rendering it, saving valuable rendering time.
We introduce a theoretical online model to analyse these problems using competitive analysis. For different cost measures we give the first competitive algorithms for online occlusion culling. Our implementation shows that these algorithms outperform the FIFO strategy for real 3D scenes as well.
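A toy simulation of the delay-buffer idea (our own simplified model, not the paper's online model or cost measures): polygons wait in a bounded FIFO, and a newly arriving polygon evicts any buffered polygon it occludes, so the evicted polygon is never sent to the renderer.

```python
from collections import deque

def fifo_occlusion_culling(stream, buffer_size, occludes):
    """Toy simulation of a FIFO delay buffer for occlusion culling.

    stream: iterable of polygons; occludes(new, old) is an
    application-supplied predicate saying that `new` masks `old`.
    Returns the list of polygons actually rendered, in order.
    """
    buf = deque()
    rendered = []
    for poly in stream:
        # drop buffered polygons masked by the newcomer (never rendered)
        buf = deque(p for p in buf if not occludes(poly, p))
        buf.append(poly)
        if len(buf) > buffer_size:
            rendered.append(buf.popleft())  # oldest polygon is rendered
    rendered.extend(buf)  # flush remaining polygons at end of stream
    return rendered
```

The buffer size is the crucial parameter: a polygon can only be culled while it is still waiting, which is exactly the tension the competitive analysis in the paper addresses.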
international conference on machine learning | 2017
Vladimir Braverman; Gereon Frahling; Harry Lang; Christian Sohler; Lin F. Yang