Ralf Diekmann | Researchain

Publication

Featured researches published by Ralf Diekmann.

Parallel Computing | 1999

Efficient schemes for nearest neighbor load balancing

Ralf Diekmann; Andreas Frommer; Burkhard Monien

We design a general mathematical framework to analyze the properties of nearest neighbor balancing algorithms of the diffusion type. Within this framework we develop a new Optimal Polynomial Scheme (OPS) which we show to terminate within a finite number m of steps, where m only depends on the graph and not on the initial load distribution. We show that all existing diffusion load balancing algorithms, including OPS, determine a flow of load on the edges of the graph which is uniquely defined, independent of the method and minimal in the l2-norm. This result can also be extended to edge weighted graphs. The l2-minimality is achieved only if a diffusion algorithm is used as preprocessing and the real movement of load is performed in a second step. Thus, it is advisable to split the balancing process into the two steps of first determining a balancing flow and afterwards moving the load. We introduce the problem of scheduling a flow and present some first results on its complexity and the approximation quality of local greedy heuristics.
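
The two-step idea in this abstract (first compute a balancing flow, then move load) can be illustrated with a minimal sketch. It uses the plain first-order diffusion iteration, not the OPS scheme from the paper; the function name `diffuse`, the parameter `alpha`, and the 4-node path graph are assumptions chosen for illustration.

```python
def diffuse(edges, load, alpha, steps):
    """Run first-order diffusion on load *values* only and record the flow.

    Each step, edge (i, j) carries alpha * (w_i - w_j) from i to j.
    The accumulated per-edge flow is the migration plan that a second
    phase would execute; no real work items are moved here.
    """
    load = list(load)
    flow = {edge: 0.0 for edge in edges}
    for _ in range(steps):
        # Compute all transfers from the loads at the start of the step.
        moved = {(i, j): alpha * (load[i] - load[j]) for (i, j) in edges}
        for (i, j), amount in moved.items():
            load[i] -= amount
            load[j] += amount
            flow[(i, j)] += amount
    return load, flow


# Hypothetical example: a path 0-1-2-3 with all load on node 0.
path = [(0, 1), (1, 2), (2, 3)]
balanced, flow = diffuse(path, [8.0, 0.0, 0.0, 0.0], alpha=0.25, steps=200)
```

On a path the balancing flow is forced by the topology, so the accumulated values approach 6, 4 and 2 units on the three edges while every node approaches a load of 2. A positive entry in `flow` means load should move from the first endpoint of the edge to the second.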

Parallel Computing | 2000

Shape-optimized mesh partitioning and load balancing for parallel adaptive FEM

Ralf Diekmann; Robert Preis; Frank Schlimbach; Chris Walshaw

We present a dynamic distributed load balancing algorithm for parallel, adaptive Finite Element simulations in which we use preconditioned Conjugate Gradient solvers based on domain decomposition. The load balancing is designed to maintain good partition aspect ratio and we show that cut size is not always the appropriate measure in load balancing. Furthermore, we attempt to answer the question of why the aspect ratio of partitions plays an important role for certain solvers. We define and rate different kinds of aspect ratio and present a new center-based partitioning method of calculating the initial distribution which implicitly optimizes this measure. During the adaptive simulation, the load balancer calculates a balancing flow using different versions of the diffusion algorithm and a variant of breadth first search. Elements to be migrated are chosen according to a cost function aiming at the optimization of subdomain shapes. Experimental results for Bramble's preconditioner and comparisons to state-of-the-art load balancers show the benefits of the construction.

Archive | 1993

Problem Independent Distributed Simulated Annealing and its Applications

Ralf Diekmann; Reinhard Lüling; Jens Simon

Simulated annealing has proven to be a good technique for solving hard combinatorial optimization problems. Some attempts at speeding up annealing algorithms have been based on shared memory multiprocessor systems. Parallelizations for certain problems on distributed memory multiprocessor systems are also known.

Lecture Notes in Computer Science | 1997

Engineering Diffusive Load Balancing Algorithms Using Experiments

Ralf Diekmann; S. Muthukrishnan; Madhu V. Nayakkankuppam

We study a distributed load balancing problem on arbitrary graphs. First Order (FO) and Second Order (SO) schemes are popular local diffusive schedules for this problem. To use them, several parameters have to be chosen carefully. Determining the “optimal” parameters analytically is difficult, and on a practical level, despite the widespread use of these schemes, little is known about how the relevant parameters must be set. We employ systematic experiments to engineer the choice of relevant parameters in first and second order schemes. We present a centralized polynomial time algorithm for choosing the “optimal” FO scheme based on semidefinite programming. Based on the empirical evidence from our implementation of this algorithm, we pose conjectures on the closed-form solution of optimal FO schemes for various graphs. We also present a heuristic algorithm to locally estimate relevant parameters in the FO and SO schemes; our estimates are fairly accurate compared to those based on expensive global communication. Finally, we show that the FO and SO schemes that use approximate values rather than the optimal parameters can be improved using a new iterative scheme that we introduce here; this scheme is of independent interest. The software we have developed for our implementations is available freely, and can serve as a platform for experimental research in this area. Our methods are being included in PadFEM, the Paderborn Finite Element Library [1].
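
To make the role of the parameters concrete, the sketch below compares a first-order and a second-order iteration on a small path graph. It assumes the commonly cited forms w(k+1) = M w(k) for FO and w(k+1) = beta * M w(k) + (1 - beta) * w(k-1) for SO, with M = I - alpha * L and beta = 2 / (1 + sqrt(1 - mu^2)), where mu is the second-largest eigenvalue magnitude of M. The function names, the uniform `alpha`, the test graph and the tolerance are illustrative assumptions, not the authors' software.

```python
import math


def diffusion_step(edges, load, alpha):
    """Apply M = I - alpha * L once: each edge moves alpha * (w_i - w_j)."""
    new = list(load)
    for i, j in edges:
        amount = alpha * (load[i] - load[j])
        new[i] -= amount
        new[j] += amount
    return new


def imbalance(load):
    """Largest deviation of any node from the average load."""
    mean = sum(load) / len(load)
    return max(abs(w - mean) for w in load)


def steps_to_balance(edges, load, alpha, beta, tol=1e-6, max_steps=10_000):
    """Count iterations until the imbalance drops below tol.

    beta = 1 gives the first-order scheme; 1 < beta < 2 gives the
    second-order scheme, which also uses the previous iterate.
    """
    prev, cur = list(load), diffusion_step(edges, load, alpha)
    steps = 1
    while imbalance(cur) > tol and steps < max_steps:
        m_cur = diffusion_step(edges, cur, alpha)
        nxt = [beta * m + (1.0 - beta) * p for m, p in zip(m_cur, prev)]
        prev, cur = cur, nxt
        steps += 1
    return steps


# Hypothetical test case: a path with 8 nodes, all load on node 0.
n = 8
edges = [(i, i + 1) for i in range(n - 1)]
alpha = 1.0 / 3.0
# For a path, the Laplacian eigenvalues are 2 - 2*cos(k*pi/n), so mu has a
# closed form here; on a general graph it must be computed or estimated.
mu = 1.0 - alpha * (2.0 - 2.0 * math.cos(math.pi / n))
beta_opt = 2.0 / (1.0 + math.sqrt(1.0 - mu * mu))

start = [64.0] + [0.0] * (n - 1)
fo_steps = steps_to_balance(edges, start, alpha, beta=1.0)
so_steps = steps_to_balance(edges, start, alpha, beta=beta_opt)
print(f"FO: {fo_steps} steps, SO: {so_steps} steps")
```

The second-order run needs noticeably fewer iterations on this graph, but only because `beta_opt` was derived from the exact spectrum. That dependence on global information is the practical difficulty the abstract refers to when it discusses estimating parameters locally.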

IEEE International Conference on High Performance Computing Data and Analytics | 1999

Multilevel Mesh Partitioning for Optimizing Domain Shape

Chris Walshaw; M. Cross; Ralf Diekmann; Frank Schlimbach

Multilevel algorithms are a successful class of optimization techniques that address the mesh partitioning problem for mapping meshes onto parallel computers. They usually combine a graph contraction algorithm together with a local optimization method that refines the partition at each graph level. To date, these algorithms have been used almost exclusively to minimize the cut-edge weight in the graph with the aim of minimizing the parallel communication overhead. However, it has been shown that for certain classes of problems, the convergence of the underlying solution algorithm is strongly influenced by the shape or aspect ratio of the subdomains. Therefore, in this paper, the authors modify the multilevel algorithms to optimize a cost function based on the aspect ratio. Several variants of the algorithms are tested and shown to provide excellent results.

International Parallel and Distributed Processing Symposium | 1994

Sorting large data sets on a massively parallel system

Ralf Diekmann; Jörn Gehring; Reinhard Lüling; Burkhard Monien; Markus Nubel; Rolf Wanka

This paper presents a performance study for many of today's popular parallel sorting algorithms. It is the first to present a comparative study on a large scale MIMD system. The machine, a Parsytec GCel, contains 1024 processors connected as a two-dimensional grid. To justify the experimental results, we develop a theoretical model to predict the performance in terms of communication and computation times. We get a very close relation between the experiments and the theoretical model as long as the edge congestion caused by the algorithms is predicted precisely. We compare: Bitonicsort, Shearsort, Gridsort, Samplesort, and Radixsort. Experiments were performed using random instances according to a well-known benchmark problem. Results show that for the machine we used, Bitonicsort performs best for smaller numbers of keys per processor (<2048) and Samplesort outperforms all other methods for larger instances.
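
For readers unfamiliar with Bitonicsort, the following is a minimal sequential simulation of the bitonic sorting network with one key per position. It is a textbook formulation, not the implementation measured in the paper; the function name and the power-of-two restriction are assumptions of this sketch. On a parallel machine, each inner pass over `i` corresponds to one round of pairwise compare-exchange between processors `i` and `i ^ j`.

```python
def bitonic_sort(keys):
    """Sort a list whose length is a power of two with a bitonic network."""
    n = len(keys)
    assert n & (n - 1) == 0, "length must be a power of two"
    a = list(keys)
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # distance between compare-exchange partners
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    # Blocks alternate between ascending and descending order.
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


print(bitonic_sort([7, 3, 6, 1, 8, 2, 5, 4]))
```

The communication pattern is fixed and independent of the data, which is one reason such networks are easy to model on a grid-connected machine.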

Parallel Algorithms and Applications | 1996

Combining Helpful Sets and Parallel Simulated Annealing for the Graph-Partitioning Problem

Ralf Diekmann; Reinhard Lüling; Burkhard Monien; Carsten Spräner

In this paper we present a new algorithm for the k-partitioning problem which achieves an improved solution quality compared to known heuristics. We apply the principle of so-called “helpful sets”, which has been shown to be very efficient for graph bisection, to the direct k-partitioning problem. The principle is extended in several ways. We introduce a new abstraction technique which shrinks the graph dynamically during runtime, leading to shorter computation times and improved solution quality. The use of stochastic methods provides further improvements in terms of solution quality. Additionally we present a parallel implementation of the new heuristic. The parallel algorithm delivers the same solution quality as the sequential one while providing reasonable parallel efficiency on MIMD systems of moderate size. All results are verified by experiments for various graphs and processor numbers.

International Workshop on Parallel Algorithms for Irregularly Structured Problems | 1995

Parallel Decomposition of Unstructured FEM-Meshes

Ralf Diekmann; Derk Meyer; Burkhard Monien

We present a massively parallel algorithm for static and dynamic partitioning of unstructured FEM-meshes. The method consists of two parts. First, a fast but inaccurate sequential clustering is determined which is used, together with a simple mapping heuristic, to map the mesh initially onto the processors of a massively parallel system. The second part of the method uses a massively parallel algorithm to remap and optimize the mesh decomposition taking several cost functions into account. It first calculates the number of nodes that have to be migrated between pairs of clusters in order to obtain an optimal load balancing. In a second step, nodes to be migrated are chosen according to cost functions optimizing the amount of necessary communication and other measures which are important for the numerical solution method (such as the aspect ratio of the resulting domains).

European Conference on Parallel Processing | 1998

Aspect Ratio for Mesh Partitioning

Ralf Diekmann; Robert Preis; Frank Schlimbach; Chris Walshaw

This paper deals with the measure of Aspect Ratio for mesh partitioning and gives hints as to why, for certain solvers, the Aspect Ratio of partitions plays an important role. We define and rate different kinds of Aspect Ratio, present a new center-based partitioning method which optimizes this measure implicitly, and rate several existing partitioning methods and tools under the criterion of Aspect Ratio.

Concurrency and Computation: Practice and Experience | 1997

Decentralized remapping of data parallel applications in distributed memory multiprocessors

Cheng Zhong Xu; Francis C. M. Lau; Ralf Diekmann

In this paper we present a decentralized remapping method for data parallel applications on distributed memory multiprocessors. The method uses a generalized dimension exchange (GDE) algorithm periodically during the execution of an application to balance (remap) the system’s workload. We implemented this remapping method in parallel WaTor simulations and parallel image thinning applications, and found it to be effective in reducing the computation time. The average performance gain is about 20% in the WaTor simulation of a 256 × 256 ocean grid on 16 processors, and up to 8% in the thinning of a typical image of size 128 × 128 on eight processors. The performance gains due to remapping in the image thinning case are reasonably substantial given the fact that the application by its very nature does not necessarily favor remapping. We also implemented this remapping method, using up to 32 processors, for partitioning and re-partitioning of grids in computational fluid dynamics. It was found that the GDE-based parallel refinement policy, coupled with simple geometric strategies, produces partitions that are comparable in quality to those from the best serial algorithms. © 1997 John Wiley & Sons, Ltd.
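
The core exchange step of a dimension exchange method can be sketched as follows. The sketch assumes a hypercube of processors and the usual update in which a processor keeps a fraction (1 - lam) of its own load and takes a fraction lam of its partner's; the function name, the parameter name `lam` and the example loads are illustrative, and the application-level remapping described in the abstract is not modeled.

```python
def dimension_exchange_sweep(load, dims, lam=0.5):
    """One sweep over all hypercube dimensions.

    In dimension d, processor i is paired with i ^ (1 << d), and both
    partners move towards each other's load by the fraction lam.
    """
    load = list(load)
    assert len(load) == 1 << dims, "need exactly 2**dims processors"
    for d in range(dims):
        for i in range(len(load)):
            j = i ^ (1 << d)
            if i < j:  # handle each pair once
                wi, wj = load[i], load[j]
                load[i] = (1.0 - lam) * wi + lam * wj
                load[j] = lam * wi + (1.0 - lam) * wj
    return load


# Hypothetical example: 8 processors (a 3-cube), all work on processor 0.
print(dimension_exchange_sweep([16.0, 0, 0, 0, 0, 0, 0, 0], dims=3))
```

With lam = 0.5 on a hypercube, a single sweep equalizes the load, since each dimension averages the two halves of the cube. On other topologies the pairs come from an edge coloring of the network and a different exchange parameter may converge faster, which is the motivation for the generalized form.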
