
Manfred Kunde

Technische Universität München


Publication

Featured research published by Manfred Kunde.

European Symposium on Algorithms | 1993

Block Gossiping on Grids and Tori: Deterministic Sorting and Routing Match the Bisection Bound

Manfred Kunde

Deterministic sorting and routing on r-dimensional n × ... × n grids of processors are studied. For h−h problems, h≥4r, where each processor initially and finally contains at most h elements, we show that the general h−h sorting problem as well as the h−h routing problem can be solved within hn/2+o(hr2n) steps. That is, the bisection bound is asymptotically tight for deterministic h−h sorting and h−h routing. On an r-dimensional torus, a grid with wrap-arounds, the number of transfer steps is hn/4+o(hrn), again matching the corresponding bisection bound. This shows that, in spite of the fact that routing problems contain more information at the beginning than sorting problems, there is no substantial difference between them on grids and tori. The results are made possible by a new method in which subsets of packets and information are uniformly distributed over the whole grid.
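The leading terms hn/2 and hn/4 quoted above come from a standard counting argument: in the worst case all packets held in one half of the grid must cross to the other half, and only the links cut by the bisector can carry them. The following is a minimal sketch of that arithmetic, not code from the paper. The function name `bisection_bound`, the assumption that n is even, and the model of one packet per directed link per step are illustrative assumptions.

```python
from fractions import Fraction


def bisection_bound(h: int, n: int, r: int, torus: bool = False) -> Fraction:
    """Lower bound on transfer steps for an h-h problem on an r-dimensional
    grid of side length n (assumed model: one packet per directed link per step)."""
    # Worst case: every packet stored in one half has its destination in the other half.
    packets_in_one_half = Fraction(h * n**r, 2)
    # A hyperplane through the middle of one dimension cuts n^(r-1) links.
    links_across_cut = n ** (r - 1)
    if torus:
        # Wrap-around connections provide a second set of links across the cut.
        links_across_cut *= 2
    return packets_in_one_half / links_across_cut


print(bisection_bound(h=8, n=64, r=2))              # equals h*n/2
print(bisection_bound(h=8, n=64, r=2, torus=True))  # equals h*n/4
```

The bound depends only on h and n, not on r, which is why the lower-order terms are the only place where the dimension appears in the step counts above.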

Symposium on Theoretical Aspects of Computer Science | 1987

Optimal Sorting on Multi-Dimensionally Mesh-Connected Computers

Manfred Kunde

An algorithm is presented that sorts $N = n_1 n_2 \cdots n_r$, $r \ge 2$, elements on an $n_1 \times n_2 \times \dots \times n_r$ mesh-connected array of processors within $2(n_1 + \dots + n_{r-1}) + n_r + O(n_1^{1-\varepsilon} + \dots + n_r^{1-\varepsilon})$, $\varepsilon > 0$, data interchange steps. Hence this algorithm asymptotically matches the recently given lower bound for r-dimensional meshes. The asymptotically optimal lower bound of $(2r/2^{1/r})\,N^{1/r}$ interchange steps can only be obtained on r-dimensional meshes with aspect ratio $n_i : n_r = 1 : 2$ for all $i = 1, \dots, r-1$. Moreover, for meshes with wrap-around connections the slightly altered algorithm needs only $1.5(n_1 + \dots + n_{r-1}) + n_r + O(n_1^{1-\varepsilon} + \dots + n_r^{1-\varepsilon})$ data interchange steps, which asymptotically is significantly smaller than the lower bound for sorting on meshes without wrap-arounds.
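As a quick consistency check on the two expressions in this abstract (a sketch of the leading terms only, ignoring the lower-order term), set $n_i = n$ for $i < r$ and $n_r = 2n$, which is the stated aspect ratio 1 : 2. The symbol $n$ is introduced here for illustration.

```latex
\begin{align*}
N &= n^{r-1}\cdot 2n = 2n^{r}
  \quad\Longrightarrow\quad n = (N/2)^{1/r},\\
2(n_1+\dots+n_{r-1}) + n_r &= 2(r-1)n + 2n = 2rn
  = 2r\left(\frac{N}{2}\right)^{1/r}
  = \frac{2r}{2^{1/r}}\,N^{1/r}.
\end{align*}
```

So on meshes of this shape the leading term of the algorithm's step count coincides with the quoted bound.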

Parallel Computing | 1988

The instruction systolic array and its relation to other models of parallel computers

Manfred Kunde; Hans-Werner Lang; Manfred Schimmler; Hartmut Schmeck; Heiko Schröder

In this paper we investigate the relationships between three different models of parallel computers based on mesh-connected arrays: the processor array (PA), which is an MIMD-array of independent processors; the instruction broadcasting array (IBA), where the instructions are broadcast to all the processors of a column and executed according to selector information which is broadcast to all the processors of a row; and the instruction systolic array (ISA), where the instructions are pumped through the array row by row and combined with selector information which is pumped through the array column by column. For every two of these models we determine tight bounds on the worst-case delay introduced by a transformation of a program on one model into an equivalent program on the other. The results show that the ISA concept combines the advantages of standard systolic arrays with those of the MIMD concept. Since, in addition, the ISA architecture has smaller area requirements than a corresponding systolic array or MIMD machine, it is of strong practical relevance.

Journal of Parallel and Distributed Computing | 1991

(k-k) Routing on multidimensional mesh-connected arrays

Manfred Kunde; Thomas Tensi

In this paper the authors study the problem of routing packets on an r-dimensional mesh-connected array of processors. The focus of this paper is on routing with each processor containing exactly k packets, k ≥ 2, initially and finally (so-called k-k routing). For two-dimensional n × n grids the number of transport steps is at most $\tfrac{5}{4}kn + O(kn/f(n))$ with a buffer size of $O(k f(n))$. In the special case of a sequence of k permutation routing problems this step count can be reduced to $kn + O(kn/f(n))$. For an r-dimensional grid, r ≥ 3, with side length n the same technique yields an algorithm with step count $(r-1)(1 + 1/r^2)kn + O(n/f(n)^{1/(r-1)})$ and a buffer size of $rk \cdot f(n)$. For sequences of permutation routing problems this drops to $(k/r)(2r-2)n + O(kn/f(n)^{1/(r-1)})$ and a buffer size of $O(k f(n))$. Furthermore it is shown that splitting large packets into smaller ones has benefits for permutation routing problems. For grids with wrap-around connections these step counts and times generally can be reduced by one-half.

Symposium on Theoretical Aspects of Computer Science | 1994

Faster Sorting and Routing on Grids with Diagonals

Manfred Kunde; Rolf Niedermeier; Peter Rossmanith

We study routing and sorting on grids with diagonal connections. We show that for so-called h-h problems faster solutions can be obtained than on comparable grids without diagonals. In most cases the number of transport steps for the new algorithms is less than half the smallest number possible in principle on grids without diagonals, as given by the bisection bound.

Lecture Notes in Computer Science | 1989

Packet Routing on Grids of Processors

Manfred Kunde

The problem of routing packets on $n_1 \times \dots \times n_r$ mesh-connected arrays or grids of processors is studied. The focus of this paper is on permutation routing where each processor contains exactly one packet initially and finally. A slight modification of permutation routing called balanced routing is also discussed. For two-dimensional grids a deterministic routing algorithm is given for n × n meshes where each processor has a buffer of size f(n) < n. It needs 2n + O(n/f(n)) steps on grids without wrap-arounds. Hence, it is asymptotically nearly optimal, and as good as randomized algorithms, which route data only with high probability. Furthermore, it is demonstrated that on r-dimensional cubes of processors permutation routing can be performed asymptotically in (2r−2)n steps, which is faster than the running times of the randomized and deterministic algorithms known so far.

ACM Symposium on Parallel Algorithms and Architectures | 1991

Balanced routing: towards the distance bound on grids

Manfred Kunde

Manfred Kunde, Institut für Informatik, TU Munich, Arcisstr. 21, D-8000 Munich 2, Germany

The problem of packet routing on an r-dimensional mesh-connected array or grid of processors with side length n is studied. Each processor is able to store $r f(n)$ packets, $f(n) < n^{1-1/r}$. The new class of balanced routing problems is introduced, which includes such fundamental problems as partial h-h routing. On 3-dimensional n × n × n grids (without wrap-arounds) partial permutation problems and so-called $(1, f(n))$-balanced problems can be solved by a deterministic routing algorithm needing only $3.333n + O(n/f(n)^{1/2})$ steps, which is asymptotically only 11.11 percent larger than the distance bound of 3n − 3 steps. The result is better than those of the randomized algorithms known so far, which route data only with high probability. The new algorithm also beats previous deterministic routing algorithms. For arbitrary r, r ≥ 3, permutation routing on r-dimensional cubes can be performed in asymptotically $(2r - 3 + 1/r)n$ steps, which again is faster than the running times of the algorithms known so far, both randomized and deterministic. A further improvement leads to an algorithm with approximately $(r + (r-2)(1/r)^{1/(r-2)})n + O(n/f(n)^{1/(r-1)})$ transport steps. For a constant buffer size of $O((r^2/\varepsilon)^{r-1})$ packets, $\varepsilon > 0$, a number of $(r + (r-2)(1/r)^{1/(r-2)} + \varepsilon)n$ steps can be achieved. The number of steps is halved if the algorithms are adapted to tori of processors, i.e. grids with wrap-around connections. Furthermore, algorithms for the general h-h routing problem are presented which also need a smaller number of steps than previous algorithms. It is shown that these problems can be solved in O((h + r)n) instead of O(hrn)
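The step counts in this abstract can be cross-checked numerically. The sketch below evaluates the leading coefficients as they are read from the abstract, (2r − 3 + 1/r) for permutation routing and r + (r − 2)(1/r)^(1/(r−2)) for the improved algorithm, against the distance bound r(n − 1), whose leading term is r·n. The function names are hypothetical, and the second formula is a reading of a poorly scanned expression, so treat it as an assumption.

```python
def permutation_routing_coefficient(r: int) -> float:
    """Leading coefficient (2r - 3 + 1/r) of n in the quoted step count, r >= 3."""
    return 2 * r - 3 + 1 / r


def improved_coefficient(r: int) -> float:
    """Leading coefficient r + (r - 2) * (1/r)^(1/(r - 2)), r >= 3.

    Assumption: this is the formula as reconstructed from the abstract.
    """
    return r + (r - 2) * (1 / r) ** (1 / (r - 2))


def overhead_percent(coefficient: float, r: int) -> float:
    """Asymptotic excess over the distance bound r(n - 1), in percent."""
    return 100 * (coefficient / r - 1)


for r in (3, 4, 5):
    basic = permutation_routing_coefficient(r)
    improved = improved_coefficient(r)
    print(f"r={r}: basic {basic:.3f}n ({overhead_percent(basic, r):.2f}% over), "
          f"improved {improved:.3f}n ({overhead_percent(improved, r):.2f}% over)")
```

For r = 3 both coefficients evaluate to 10/3, which agrees with the 3.333n figure and the 11.11 percent overhead stated for 3-dimensional grids.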

Symposium on Theoretical Aspects of Computer Science | 1995

Optimal average case sorting on arrays

Manfred Kunde; Rolf Niedermeier; Klaus Reinhardt; Peter Rossmanith

We present algorithms for sorting and routing on two-dimensional mesh-connected parallel architectures that are optimal on average. If each processor holds many packets, we asymptotically halve the best running times known so far. For a load of one, optimal algorithms are known for the mesh. We improve this to a load of eight without increasing the running time. For tori, no optimal algorithms were known even for a load of one. Our algorithm is optimal for every load. Other architectures we consider include meshes with diagonals and reconfigurable meshes. Furthermore, the method applies to meshes of arbitrary higher dimensions and also enables optimal solutions for the routing problem.

International Conference on Parallel Processing | 1986

A general approach to sorting on 3-dimensionally mesh-connected arrays

Manfred Kunde

A general method for generating 3-dimensional sorting algorithms by using 2-dimensional algorithms is presented. The main advantage is that from a large class of sorting algorithms suitable for mesh-connected rectangles of processors we efficiently obtain sorting algorithms suitable for 3-dimensional meshes. It is shown that by using the $s^2$-way merge sort of Thompson and Kung, sorting $n^3$ elements can be performed on an n × n × n cube with $12n + O(n^{2/3} \log n)$ data interchange steps. Further improvements lead to an algorithm for an n/2 × n × 2n mesh sorting $n^3$ items within $10.5n + O(n^{2/3} \log n)$ interchange steps. By a generalization of the method to r-dimensional cubes one can obtain algorithms sorting $n^r$ elements with $O(r^3 n)$ interchange steps.

Journal of Algorithms | 1988

On a special case of uniform processor scheduling

Manfred Kunde; Michael A. Langston; Jin-Ming Liu

We investigate the problem of nonpreemptively assigning a set of independent tasks to a system of uniform processors in an effort to minimize the overall finish time. Our attention is restricted to the special case in which all processors have the same speed, except for one which may be faster. This models the system configuration in which there is one fast central processor and a collection of slower peripheral processors. We analyze a variation of the MULTIFIT algorithm derived from bin-packing and prove that it possesses a tight worst-case performance bound of $(\sqrt{17} + 1)/4$, which is approximately 1.28.
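The abstract does not say how the analyzed variation of MULTIFIT differs from the generic scheme, so the following is only a sketch of generic MULTIFIT adapted to this setting: a binary search over a candidate finish time, with first-fit decreasing (FFD) packing into "bins" whose capacity is the processor speed times that finish time. The names `ffd_fits` and `multifit`, the slow-processors-first order, the iteration count, and the search bounds are all illustrative assumptions rather than details from the paper.

```python
from typing import List, Optional, Tuple


def ffd_fits(tasks_desc: List[float], speeds: List[float],
             deadline: float) -> Optional[List[List[float]]]:
    """First-fit decreasing: put each task on the first processor that can
    still finish by `deadline`; return None if some task fits nowhere."""
    loads = [0.0] * len(speeds)
    assignment: List[List[float]] = [[] for _ in speeds]
    for task in tasks_desc:
        for i, speed in enumerate(speeds):
            if loads[i] + task <= speed * deadline + 1e-12:
                loads[i] += task
                assignment[i].append(task)
                break
        else:
            return None
    return assignment


def multifit(tasks: List[float], m: int, fast_speed: float = 1.0,
             iterations: int = 40) -> Tuple[float, List[List[float]]]:
    """Schedule on m - 1 unit-speed processors plus one of speed fast_speed >= 1.
    Returns (finish time, per-processor task lists)."""
    speeds = [1.0] * (m - 1) + [fast_speed]  # assumption: slow processors tried first
    tasks_desc = sorted(tasks, reverse=True)
    # No schedule can beat the average load or the largest task on the fast processor.
    lo = max(sum(tasks) / sum(speeds), max(tasks) / fast_speed)
    # At this deadline FFD trivially succeeds: the first processor takes everything.
    hi = float(sum(tasks))
    best = ffd_fits(tasks_desc, speeds, hi)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        candidate = ffd_fits(tasks_desc, speeds, mid)
        if candidate is None:
            lo = mid
        else:
            hi, best = mid, candidate
    finish = max(sum(a) / s for a, s in zip(best, speeds))
    return finish, best


finish_time, schedule = multifit([7, 5, 4, 3, 3, 2], m=3, fast_speed=2.0)
print(finish_time, schedule)
```

The bound in the abstract concerns the ratio between the finish time of such a heuristic schedule and that of an optimal schedule in the worst case. This sketch makes no claim to reproduce the exact variant for which that bound was proved.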
