Gudula Rünger
Chemnitz University of Technology
Publications
Featured research published by Gudula Rünger.
Journal of Systems Architecture | 1999
Thomas Rauber; Gudula Rünger
Algorithms from scientific computing often exhibit two-level parallelism based on potential method parallelism and potential system parallelism. We consider the parallel implementation of such algorithms on distributed-memory machines. The two-level potential parallelism of an algorithm is expressed in a specification consisting of an upper-level hierarchy of multiprocessor tasks, each of which has an internal structure of uniprocessor tasks. To achieve an optimal parallel execution time, the parallel execution of such a program requires an optimal scheduling of the multiprocessor tasks and an appropriate treatment of the uniprocessor tasks. For an important subclass of structured method parallelism, we present a scheduling methodology that takes data redistributions between multiprocessor tasks into account and uses realistic parallel runtimes as costs. The scheduling methodology is designed for integration into a parallel compiler tool. We illustrate the multitask scheduling with several examples from numerical analysis.
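As an illustration of the kind of cost-based decision such a scheduling methodology has to make, the following C sketch compares consecutive execution of two independent multiprocessor tasks on all processors with concurrent execution on two halves of the machine, charging a data-redistribution cost. The runtime model and all constants are hypothetical and not taken from the paper.

    /* Minimal sketch, not the authors' scheduler: choose between running two
       independent multiprocessor tasks consecutively on all p processors or
       concurrently on two groups of p/2 each, charging a redistribution cost.
       The runtime model w/p + c*log2(p) and all constants are illustrative. */
    #include <math.h>
    #include <stdio.h>

    static double task_time(double work, int p) {
        return work / p + 0.001 * log2((double)p);   /* computation + communication model */
    }

    static double redistribution(int p) {
        return 0.005 * log2((double)p);              /* cost of moving data between groups */
    }

    int main(void) {
        int p = 64;                                  /* number of processors (assumed) */
        double w1 = 4.0, w2 = 3.0;                   /* hypothetical task workloads */
        double consecutive = task_time(w1, p) + task_time(w2, p);
        double t_a = task_time(w1, p / 2), t_b = task_time(w2, p / 2);
        double concurrent = (t_a > t_b ? t_a : t_b) + redistribution(p);
        printf("consecutive: %f s, concurrent: %f s -> %s\n",
               consecutive, concurrent,
               concurrent < consecutive ? "split the processors" : "use all processors");
        return 0;
    }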
IEEE Transactions on Software Engineering | 2000
Thomas Rauber; Gudula Rünger
The construction of efficient parallel programs usually requires expert knowledge in the application area and a deep insight into the architecture of a specific parallel machine. Often, the resulting performance is not portable, i.e., a program that is efficient on one machine is not necessarily efficient on another machine with a different architecture. Transformation systems provide a more flexible solution. They start with a specification of the application problem and allow the generation of efficient programs for different parallel machines. The programmer has to give an exact specification of the algorithm expressing the inherent degree of parallelism and is relieved of the low-level details of the architecture. We propose such a transformation system with an emphasis on the exploitation of data parallelism combined with a hierarchically organized structure of task parallelism. Starting with a specification of the maximum degree of task and data parallelism, the transformations generate a specification of a parallel program for a specific parallel machine. The transformations are based on a cost model and are applied in a predefined order, fixing the most important design decisions such as the scheduling of independent multitask activations, data distributions, pipelining of tasks, and the assignment of processors to task activations. We demonstrate the usefulness of the approach with examples from scientific computing.
Journal of Parallel and Distributed Computing | 2005
Thomas Rauber; Gudula Rünger
The paper considers modular programming with hierarchically structured multi-processor tasks on top of SPMD tasks for distributed-memory machines. The parallel execution requires a corresponding decomposition of the set of processors into a hierarchical group structure onto which the tasks are mapped. The result is a multi-level group SPMD computation model with varying processor group structures. The advantage of this kind of mixed task and data parallelism is the potential to reduce the communication overhead and to increase scalability. We present a runtime library to support the coordination of hierarchically structured multi-processor tasks. The library exploits an extended parallel group SPMD programming model and manages the entire task execution, including the dynamic hierarchy of processor groups. The library is built on top of MPI, has an easy-to-use interface, and incurs only a marginal overhead while allowing static planning and dynamic restructuring.
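To illustrate the group structures such a library has to manage, the following sketch shows how a two-level hierarchy of processor groups can be formed directly with MPI communicators. It is a minimal stand-in built on the standard MPI interface, not the interface of the library described in the paper.

    /* Minimal sketch, not the library described in the paper: forming a two-level
       hierarchy of processor groups with MPI communicators, as a multi-level group
       SPMD computation requires. The group sizes chosen here are arbitrary. */
    #include <mpi.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Upper level: split all processors into two groups for two tasks. */
        int upper_color = rank < size / 2 ? 0 : 1;
        MPI_Comm upper;
        MPI_Comm_split(MPI_COMM_WORLD, upper_color, rank, &upper);

        /* Lower level: split each group again for nested tasks. */
        int urank, usize;
        MPI_Comm_rank(upper, &urank);
        MPI_Comm_size(upper, &usize);
        int lower_color = urank < usize / 2 ? 0 : 1;
        MPI_Comm lower;
        MPI_Comm_split(upper, lower_color, urank, &lower);

        /* Each nested group now runs its own SPMD task on communicator 'lower'. */
        MPI_Comm_free(&lower);
        MPI_Comm_free(&upper);
        MPI_Finalize();
        return 0;
    }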
Conference on High Performance Computing (Supercomputing) | 2002
Thomas Rauber; Gudula Rünger
The paper considers modular programming with hierarchically structured multi-processor tasks on top of SPMD tasks for distributed-memory machines. The parallel execution requires a corresponding decomposition of the set of processors into a hierarchical group structure onto which the tasks are mapped. This results in a multi-level group SPMD computation model with varying processor group structures. The advantage of this kind of mixed task and data parallelism is the potential to reduce the communication overhead and to increase scalability. We present a runtime library to support the coordination of hierarchically structured multi-processor tasks. The library exploits an extended parallel group SPMD programming model and manages the entire task execution, including the dynamic hierarchy of processor groups. The library is built on top of MPI, has an easy-to-use interface, and incurs only a marginal overhead while allowing static planning and dynamic restructuring.
International Conference on Supercomputing | 2004
Sascha Hunold; Thomas Rauber; Gudula Rünger
Matrix-matrix multiplication is one of the core computations in many algorithms from scientific computing and numerical analysis, and many efficient realizations have been invented over the years, including many parallel ones. The current trend towards clusters of PCs or SMPs for scientific computing suggests revisiting matrix-matrix multiplication and investigating the efficiency and scalability of different versions on clusters. In this paper we present parallel algorithms for matrix-matrix multiplication that are built up from several algorithms in a multilevel structure. Each level is associated with a hierarchical partition of the set of available processors into disjoint subsets, so that deeper levels of the algorithm employ smaller groups of processors in parallel. We perform runtime experiments on several parallel platforms and show that multilevel algorithms can lead to significant performance gains compared with state-of-the-art methods.
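The multilevel idea can be illustrated by a recursive block decomposition: the following sequential C sketch splits the matrices into 2x2 blocks at each level, where a parallel version would assign the block products of deeper levels to smaller, disjoint processor groups. It is an illustration of the recursive structure only, not one of the parallel algorithms from the paper.

    /* Minimal sequential sketch of the multilevel structure: each level splits the
       n x n row-major matrices (n a power of two, leading dimension lda) into 2x2
       blocks; below the cutoff a standard triple loop is used. C must be
       zero-initialized by the caller, since the base case accumulates. */
    static void matmul(const double *A, const double *B, double *C,
                       int n, int lda, int cutoff) {
        if (n <= cutoff) {                      /* base case: standard triple loop */
            for (int i = 0; i < n; i++)
                for (int k = 0; k < n; k++)
                    for (int j = 0; j < n; j++)
                        C[i * lda + j] += A[i * lda + k] * B[k * lda + j];
            return;
        }
        int h = n / 2;                          /* split into 2x2 blocks */
        const double *A11 = A, *A12 = A + h, *A21 = A + h * lda, *A22 = A + h * lda + h;
        const double *B11 = B, *B12 = B + h, *B21 = B + h * lda, *B22 = B + h * lda + h;
        double *C11 = C, *C12 = C + h, *C21 = C + h * lda, *C22 = C + h * lda + h;
        /* The eight block products of one level; a parallel version would map
           independent products to disjoint subgroups of processors. */
        matmul(A11, B11, C11, h, lda, cutoff); matmul(A12, B21, C11, h, lda, cutoff);
        matmul(A11, B12, C12, h, lda, cutoff); matmul(A12, B22, C12, h, lda, cutoff);
        matmul(A21, B11, C21, h, lda, cutoff); matmul(A22, B21, C21, h, lda, cutoff);
        matmul(A21, B12, C22, h, lda, cutoff); matmul(A22, B22, C22, h, lda, cutoff);
    }

With C zero-initialized, a call such as matmul(A, B, C, n, n, 64) computes the full product; the cutoff plays the role of the level at which processor groups would stop being subdivided.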
International Parallel and Distributed Processing Symposium | 2004
Matthias Kühnemann; Thomas Rauber; Gudula Rünger
Performance prediction is necessary and crucial in order to deal with multidimensional performance effects on parallel systems. The increasing use of parallel supercomputers and cluster systems to solve large-scale scientific problems has generated a need for tools that can predict scalability trends of applications written for these machines. In this paper, we describe a compiler tool that automates the performance prediction of parallel program execution times by means of closed-form runtime formulas. For an arbitrary parallel MPI source program, the tool generates a corresponding runtime function modeling the CPU execution time and the message-passing overhead. The environment is proposed to support the development process and the performance-engineering activities that accompany the whole software life cycle. The performance prediction tool is shown to be effective in analyzing a representative application for varying problem sizes on several platforms using different numbers of processors.
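A closed-form runtime function of the kind such a tool produces might look like the following C sketch. The model (local work plus tree-structured broadcasts) and all machine parameters are hypothetical, not output of the described compiler tool.

    /* Minimal sketch of a closed-form runtime formula: CPU time for the local
       work plus message-passing overhead of a broadcast-style pattern.
       All constants are assumed machine parameters, not measured values. */
    #include <math.h>

    /* n: problem size, p: number of processors */
    double predicted_runtime(double n, int p) {
        const double t_op    = 2.0e-9;   /* time per arithmetic operation (assumed) */
        const double t_start = 5.0e-6;   /* message startup latency (assumed)       */
        const double t_byte  = 1.0e-9;   /* transfer time per byte (assumed)        */
        double cpu  = t_op * (n * n) / p;                              /* local computation */
        double comm = log2((double)p) * (t_start + t_byte * 8.0 * n);  /* broadcasts        */
        return cpu + comm;
    }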
Microprocessing and Microprogramming | 1996
Thomas Rauber; Gudula Rünger
The numerical solution of differential equations is an important problem in the natural sciences and engineering, but the computational effort to find a solution with the desired accuracy is usually quite large. This suggests the use of powerful parallel machines, which often have a distributed-memory organization. In this article, we present a parallel programming methodology to derive structured parallel implementations of numerical methods that exhibit two levels of potential parallelism: a coarse-grain method parallelism and a medium-grain parallelism on data or systems. The derivation process is subdivided into three stages: the first stage identifies the potential for parallelism in the numerical method, the second stage fixes the implementation decisions for a parallel program, and the third stage derives the parallel implementation for a specific parallel machine. The derivation process is supported by a group-SPMD computational model that allows the prediction of runtimes for a specific parallel machine. This enables the programmer to test different alternatives and to implement only the most promising one. We give several examples of the derivation of parallel implementations and of the performance prediction. Experiments on an Intel iPSC/860 confirm the accuracy of the runtime predictions. The parallel programming methodology separates the software issues from the architectural details, enables the design of well-structured, reusable and portable software, and supplies a formal basis for automatic support.
ACM Symposium on Applied Computing | 1999
Thomas Rauber; Gudula Rünger
We present a coordination model to derive efficient implementations using mixed task and data parallelism. The model provides a specification language in which the programmer defines the available degree of parallelism and a coordination language in which the programmer determines how the potential parallelism is exploited for a specific implementation. Specification programs depend only on the algorithm, whereas coordination programs may differ for different target machines in order to obtain the best performance. The transformation of a specification program into a coordination program is performed in well-defined steps, where each step selects a specific implementation detail. Therefore, the transformation can be automated, thus guaranteeing a correct target program. We demonstrate the usefulness of the model by applying it to solution methods for differential equations.
The Journal of Supercomputing | 2014
Thomas Rauber; Gudula Rünger; Michael Schwind; Haibin Xu; Simon Melzner
Energy consumption is an important aspect of today's processors, and a large variety of research approaches deal with reducing the energy consumption of specific application codes on different platforms under certain constraints. These research approaches are based on energy information acquired by very different means, such as hardware measurements with power meters, software methods using the hardware counters available on more recent CPUs, or simulations based on theoretical models. In this article, all of these energy acquisition methods are investigated and compared. As application programs, we consider the SPEC CPU2006 integer and floating-point benchmark collections, which represent a large variety of applications from different areas. The investigations are done for single multicore CPUs with the goal of gaining more insight into their energy consumption behavior. An experimental evaluation is performed on three recent processor types with dynamic voltage-frequency scaling. The article compares the measured energy and the energy provided by hardware counters with the energy predicted by simulation models. The comparison shows that the simulation models are able to capture the energy consumption quite accurately.
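One counter-based acquisition method of the kind compared in the article can be illustrated, on Linux with an Intel CPU, by reading the RAPL package energy counter through the powercap interface. The sketch below is an assumption about one such mechanism, not the authors' measurement setup.

    /* Minimal sketch of counter-based energy measurement (assumed setup: Linux,
       Intel CPU, intel_rapl driver): the package energy counter in microjoules
       is read before and after a code region via the powercap interface.
       The counter wraps around; a robust tool would handle that. */
    #include <stdio.h>

    static long long read_energy_uj(void) {
        FILE *f = fopen("/sys/class/powercap/intel-rapl:0/energy_uj", "r");
        if (!f) return -1;
        long long uj = -1;
        if (fscanf(f, "%lld", &uj) != 1) uj = -1;
        fclose(f);
        return uj;
    }

    int main(void) {
        long long before = read_energy_uj();
        /* ... run the benchmark or code region of interest here ... */
        long long after = read_energy_uj();
        if (before >= 0 && after >= before)
            printf("consumed about %.3f J\n", (after - before) / 1.0e6);
        return 0;
    }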
International Conference on Parallel Processing | 2008
Jörg Dümmler; Thomas Rauber; Gudula Rünger
In this paper, we explore the use of hierarchically structured multiprocessor tasks (M-tasks) for programming multi-core cluster systems. These systems often have hierarchically structured interconnection networks combining different computing resources, ranging from the interconnect within multi-core processors up to the interconnection network combining the nodes of the cluster or supercomputer. M-task programs can support the effective use of the computing resources by adapting the task structure of the program to the hierarchical organization of the cluster system and by exploiting the available data parallelism within the M-tasks. In particular, we consider different mapping algorithms for M-tasks and investigate the resulting efficiency and scalability. We present experimental results for different application programs and different multi-core systems.
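One natural way to adapt an M-task program to the node structure of a multi-core cluster is to derive node-level process groups. The following MPI-3 sketch does this with MPI_Comm_split_type; it is an illustration only, not one of the mapping algorithms evaluated in the paper.

    /* Minimal sketch, not the paper's mapping algorithms: derive node-level
       process groups so that an M-task can be mapped onto the cores of one
       multi-core node, matching the hierarchy of the cluster. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Group together all processes that share a node (shared memory). */
        MPI_Comm node_comm;
        MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, rank,
                            MPI_INFO_NULL, &node_comm);

        int node_rank, node_size;
        MPI_Comm_rank(node_comm, &node_rank);
        MPI_Comm_size(node_comm, &node_size);
        printf("global rank %d is rank %d of %d on its node\n",
               rank, node_rank, node_size);
        /* Each node_comm could now execute one data-parallel M-task internally. */

        MPI_Comm_free(&node_comm);
        MPI_Finalize();
        return 0;
    }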