Publications

Featured research published by Hubert Ritzdorf.


European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface | 1999

Flattening on the Fly: Efficient Handling of MPI Derived Datatypes

Jesper Larsson Träff; Rolf Hempel; Hubert Ritzdorf; Falk Zimmermann

The Message Passing Interface (MPI) incorporates a mechanism for describing structured, non-contiguous memory layouts for use as communication buffers in MPI communication functions. The rationale behind the derived datatype mechanism is to alleviate the user from tedious packing and unpacking of non-consecutive data into contiguous communication buffers. Furthermore, the mechanism makes it possible to improve performance by saving on internal buffering. Apparently, current MPI implementations entail considerable performance penalties when working with derived datatypes. We describe a new method called flattening on the fly for the efficient handling of derived datatypes in MPI. The method aims at exploiting regularities in the memory layout described by the datatype as far as possible. In addition it considerably reduces the overhead for parsing the datatype. Flattening on the fly has been implemented and evaluated on an NEC SX-4 vector supercomputer. On the SX-4 flattening on the fly performs significantly better than previous methods, resulting in performance comparable to what the user can in the best case achieve by packing and unpacking data manually. Also on a PC cluster the method gives worthwhile improvements in cases that are not handled well by the conventional implementation.
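The idea behind flattening can be sketched in plain C without any MPI calls: a vector-style datatype (count, block length, stride) is turned into a list of (offset, length) blocks, and when the stride makes the layout dense the whole description collapses to a single contiguous block. This regularity detection is the kind of optimization the paper describes; all names here are illustrative, not the authors' implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flat representation: (byte offset, byte length) pairs. */
typedef struct { size_t offset, length; } flat_block;

/* Flatten a vector-style layout: 'count' blocks of 'blocklen' elements,
 * strided by 'stride' elements, each element 'elemsize' bytes. When
 * stride == blocklen the layout is actually contiguous, so one merged
 * block suffices. Returns the number of flat blocks written to 'out'. */
static size_t flatten_vector(size_t count, size_t blocklen, size_t stride,
                             size_t elemsize, flat_block *out)
{
    if (count == 0) return 0;
    if (stride == blocklen) {           /* dense: collapse to one block */
        out[0].offset = 0;
        out[0].length = count * blocklen * elemsize;
        return 1;
    }
    for (size_t i = 0; i < count; i++) {
        out[i].offset = i * stride * elemsize;
        out[i].length = blocklen * elemsize;
    }
    return count;
}
```

A dense vector thus costs one block instead of `count` blocks, which is exactly where manual packing stops paying off.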


Conference on High Performance Computing (Supercomputing) | 2000

The Implementation of MPI-2 One-Sided Communication for the NEC SX-5

Jesper Larsson Träff; Hubert Ritzdorf; Rolf Hempel

We describe the MPI/SX implementation of the MPI-2 standard for one-sided communication (Remote Memory Access) for the NEC SX-5 vector supercomputer. MPI/SX is a non-threaded implementation of the full MPI-2 standard. Essential features of the implementation are presented, including the synchronization mechanisms, the handling of communication windows in global shared and in process local memory, as well as the handling of MPI derived datatypes. In comparative benchmarks the data transfer operations for one-sided communication and point-to-point message passing show very similar performance, both when data reside in global shared and when in process local memory. Derived datatypes, which are of particular importance for applications using one-sided communications, impose only a modest overhead and can be used without any significant loss of performance. Thus, the MPI/SX programmer can freely choose either the message passing or the one-sided communication model, whichever is most convenient for the given application.
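The addressing rule at the heart of one-sided communication can be modeled in a few lines of stand-alone C (this is a simulation of the semantics, not the real MPI API): a put lands at window base plus target displacement times the window's displacement unit, and accesses outside the window are errors.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Minimal model of an RMA window: a base address, a size, and a
 * displacement unit, mirroring the MPI-2 window concept. Names are
 * illustrative, not the MPI/SX internals. */
typedef struct { unsigned char *base; size_t size; int disp_unit; } rma_window;

/* Simulate MPI_Put semantics locally: copy 'count' elements of
 * 'elemsize' bytes to base + target_disp * disp_unit. Returns 0 on
 * success, -1 if the access would fall outside the window. */
static int window_put(rma_window *win, const void *origin,
                      size_t count, size_t elemsize, size_t target_disp)
{
    size_t off = target_disp * (size_t)win->disp_unit;
    size_t len = count * elemsize;
    if (off + len > win->size) return -1;   /* out-of-window access */
    memcpy(win->base + off, origin, len);
    return 0;
}
```

In the real library the `memcpy` becomes a transfer into global shared or remote process-local memory; the displacement arithmetic is the same.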


International Parallel and Distributed Processing Symposium | 2006

Collective operations in NEC's high-performance MPI libraries

Hubert Ritzdorf; Jesper Larsson Träff

We give an overview of the algorithms and implementations in the high-performance MPI libraries MPI/SX and MPI/ES of some of the most important collective operations of MPI (the message passing interface). The infrastructure of MPI/SX makes it easy to incorporate new algorithms and algorithms for common special cases (e.g. a single SX node, or a single MPI process per SX node). Algorithms that are among the best known are employed, and special hardware features of the SX architecture and internode crossbar switch (IXS) are exploited wherever possible. We discuss in more detail the implementation of MPI_Barrier, MPI_Bcast, the MPI reduction collectives, MPI_Alltoall, and the gather/scatter collectives. Performance figures and comparisons to straightforward algorithms are given for a large SX-8 system, and for the Earth Simulator. The measurements show excellent absolute performance, and demonstrate the scalability of MPI/SX and MPI/ES to systems with large numbers of nodes.
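A representative "best known" collective algorithm is the binomial-tree broadcast: in round k, every rank below 2^k that already holds the data forwards it to rank + 2^k, so every process receives exactly once and the whole broadcast takes ceil(log2 p) rounds. The sketch below computes that schedule in plain C; it illustrates the class of algorithm, not the actual MPI/SX code, which is not public here.

```c
#include <assert.h>

/* Binomial-tree broadcast rooted at rank 0: the parent of a non-root
 * rank is that rank with its highest set bit cleared, because the
 * process owning the highest bit's "subcube" forwarded to it. */
static int bcast_parent(int rank)
{
    int hb = 1;
    while (hb * 2 <= rank) hb *= 2;   /* highest power of two <= rank */
    return rank - hb;
}

/* Number of communication rounds for p processes: ceil(log2(p)). */
static int bcast_rounds(int p)
{
    int r = 0, reach = 1;
    while (reach < p) { reach *= 2; r++; }
    return r;
}
```

A library like the one described layers special cases on top of such a schedule, e.g. treating all processes on one SX node as a single participant and broadcasting intra-node through shared memory.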


Conference on High Performance Computing (Supercomputing) | 2003

Fast Parallel Non-Contiguous File Access

Joachim Worringen; Jesper Larsson Träff; Hubert Ritzdorf

Many applications of parallel I/O perform non-contiguous file accesses: instead of accessing a single (large) block of data in a file, a number of (smaller) blocks of data scattered throughout the file needs to be accessed in each logical I/O operation. However, only a few file system interfaces directly support this kind of non-contiguous file access. In contrast, the most commonly used parallel programming interface, MPI, incorporates a flexible model of parallel I/O through its MPI-IO interface. With MPI-IO, arbitrary non-contiguous file accesses are supported in a uniform fashion by the use of derived MPI datatypes set up by the user to reflect the desired I/O pattern. Despite a considerable amount of recent work in this area, current MPI-IO implementations suffer from low performance of such non-contiguous accesses when compared to the performance of the storage system for contiguous accesses. In this paper we analyze an important bottleneck in the efficient handling of non-contiguous access patterns in current implementations of MPI-IO. We present a new technique, termed listless I/O, that can be incorporated into MPI-IO implementations like the well-known ROMIO implementation, and completely eliminates this bottleneck. We have implemented the technique in MPI/SX, the MPI implementation for the NEC SX-series of parallel vector computers. Results with a synthetic benchmark and an application kernel show that listless I/O is able to increase the bandwidth for non-contiguous file access by sometimes more than a factor of 500 when compared to the traditional approach.
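The address arithmetic an MPI-IO layer performs for such patterns can be shown in isolation. Under a strided fileview, the application sees a contiguous logical stream, but only `blocklen` out of every `stride` etypes in the file are visible; the sketch below maps a logical byte offset to its physical file offset under such a view. All names are ours for illustration, not ROMIO or MPI/SX internals.

```c
#include <assert.h>
#include <stddef.h>

/* Map a logical byte offset (contiguous, as the application sees its
 * data) to a physical file offset under a strided fileview: repeating
 * tiles expose 'blocklen' visible etypes out of every 'stride' etypes,
 * each etype being 'elemsize' bytes. */
static size_t logical_to_physical(size_t logical, size_t blocklen,
                                  size_t stride, size_t elemsize)
{
    size_t elem   = logical / elemsize;   /* logical etype index       */
    size_t tile   = elem / blocklen;      /* which strided tile        */
    size_t within = elem % blocklen;      /* etype within that tile    */
    return (tile * stride + within) * elemsize + logical % elemsize;
}
```

A list-based implementation materializes one (offset, length) pair per visible block before transferring; computing offsets on the fly, as above, is the spirit of avoiding that.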


Lecture Notes in Computer Science | 2003

Improving generic non-contiguous file access for MPI-IO

Joachim Worringen; Jesper Larsson Träff; Hubert Ritzdorf

We present a fundamental improvement of the generic techniques for non-contiguous file access in MPI-IO. The improvement consists in the replacement of the conventional data management algorithms based on a representation of the non-contiguous fileview as a list of ⟨offset, length⟩ tuples. The improvement is termed listless i/o as it instead makes use of space- and time-efficient datatype handling functionality that is completely free of lists for processing non-contiguous data in the file or in memory. Listless i/o has been implemented for both independent and collective file accesses and improves access performance by increasing the data throughput between user buffers and file buffers. Additionally, it reduces the memory footprint of the process performing non-contiguous I/O. In this paper we give results for a synthetic benchmark on a PC cluster using different file systems. We demonstrate improvements in I/O bandwidth that exceed a factor of 10.
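To see where the memory footprint comes from, consider what a list-based implementation must do: build and hold the full offset/length list, then coalesce adjacent extents into maximal contiguous runs before issuing I/O. The stand-alone C sketch below performs that coalescing step; the list itself, whose size grows with the number of non-contiguous pieces, is precisely the storage that a listless scheme avoids. Names are illustrative, not from the paper's code.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { size_t off, len; } extent;

/* Coalesce a sorted list of (offset, length) file extents in place
 * into maximal contiguous runs. Returns the new number of extents. */
static size_t coalesce(extent *e, size_t n)
{
    if (n == 0) return 0;
    size_t w = 0;                        /* index of last merged run */
    for (size_t i = 1; i < n; i++) {
        if (e[w].off + e[w].len == e[i].off)
            e[w].len += e[i].len;        /* adjacent: extend the run */
        else
            e[++w] = e[i];               /* gap: start a new run     */
    }
    return w + 1;
}
```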


Parallel Computing | 1996

Real applications on the new parallel system NEC Cenju-3

Rolf Hempel; Robin Calkin; Reinhold Hess; Wolfgang Joppich; Cornelis W. Oosterlee; Hubert Ritzdorf; Peter Wypior; Wolfgang Ziegler; Nobuhiko Koike; Takashi Washio; Udo Keller

The new massively parallel computer Cenju-3 of NEC has entered the market recently. NEC has set up a 64-processor machine at GMD. After the implementation of the PARMACS programming interface, large applications have been ported to the system. Early benchmark results show the performance of the Cenju-3 in a variety of application areas.


European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface | 1997

Implementation of MPI on NEC's SX-4 Multi-Node Architecture

Rolf Hempel; Hubert Ritzdorf; Falk Zimmermann

MPICH is a portable implementation of MPI, the international standard for message-passing programming. This paper describes the port of MPICH to the NEC SX-4 parallel vector-supercomputer. By fine-tuning the implementation to the underlying architecture, the message-passing performance could be greatly enhanced. Some of the performance optimizations which led to NEC's current product-level MPI library are presented. Finally, there is an outlook on future activities which will include further optimizations and functional extensions.


Parallel Computational Fluid Dynamics 1995: Implementations and Results Using Parallel Computers | 1996

Benchmarking the FLOWer code on different parallel and vector machines

Cornelis W. Oosterlee; Hubert Ritzdorf; H.M. Bleecke; B. Eisfeld

This chapter provides a comparison of wall clock times obtained with the parallel FLOWer code with the sequential code. It discusses two problems: a three-dimensional Euler flow around a NACA0012 wing and a flow around a wing-body configuration. Both grids are divided into several blocks, so that the performance on a vector computer can be compared with the performance on a parallel machine. Existing sequential production codes have been parallelized with a high-level communications library for industrial codes (CLIC). For three-dimensional test problems the performance of the computational fluid dynamics (CFD) code FLOWer on vector and parallel computers has been evaluated. A large wing-body Navier-Stokes test example with more than 6 million grid points is solved. The chapter presents the results on the multiple instruction multiple data (MIMD) computers: IBM SP2, NEC Cenju-3, and Intel Paragon. Wall clock times are also known for some vector machines, such as a Cray C90 with 12 processors, a Cray J90 with 8 processors, and an NEC SX-3.


European Conference on Parallel Processing | 1999

A PC Cluster with Application-Quality MPI

Maciej Golebiewski; Achim Basermann; Markus Baum; Rolf Hempel; Hubert Ritzdorf; Jesper Larsson Träff

This paper presents an implementation of MPI on a cluster of Linux-based, dual-processor PCs interconnected by a Myricom high speed network. The implementation uses MPICH for the high level protocol and FM/HPVM for the basic communications layer. It allows multiple processes and multiple users on the same PC, and passes an extensive test suite. Execution times for several application codes, ranging from simple communication kernels to large Fortran codes, show good performance. The result is a high-performance MPI interface with multi-user service for this PC cluster.


Archive | 1995

Experiences with a parallel multiblock multigrid solution technique for the Euler equations

Cornelis W. Oosterlee; Hubert Ritzdorf; A. Schüller; B. Steckel

The parallel solution of 2D steady compressible Euler equations with a multigrid method is investigated. The parallelization technique used is the grid partitioning strategy. The influence of splitting into many blocks on multigrid convergence rates is reduced with an extra interior boundary relaxation and an extra update of the overlap region. The finite volume discretization of the equations is based on the Godunov upwind approach, with Osher’s flux difference splitting for the convective terms. Second order accuracy is obtained with defect correction. Solution times of the multigrid algorithms are presented for several parallel MIMD computers.
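The overlap update described above can be sketched serially: in grid partitioning, each block keeps a layer of ghost cells on its boundaries, and after a relaxation sweep those ghost cells are refreshed from the neighbouring block's adjacent interior cells. In the parallel code each copy below is a message between processes; this is a one-dimensional, two-block illustration under our own naming, not the paper's solver.

```c
#include <assert.h>
#include <stddef.h>

/* Each block stores n interior cells plus one ghost cell per side:
 * [ghost, interior[0..n-1], ghost], i.e. n + 2 entries in total. */

/* Refresh the ghost (overlap) cells of two adjacent 1-D blocks from
 * the neighbour's nearest interior cell. */
static void exchange_overlap(double *left, double *right, size_t n)
{
    left[n + 1] = right[1];   /* left block's right ghost <- right's first cell */
    right[0]    = left[n];    /* right block's left ghost <- left's last cell   */
}
```

The extra interior boundary relaxation and extra overlap update mentioned in the abstract amount to performing such an exchange (and a local sweep near the cut) more than once per multigrid cycle, recovering convergence rates close to the unpartitioned solver's.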

Collaboration

Top Co-Authors

Rolf Hempel, Center for Information Technology
Falk Zimmermann, Center for Information Technology
Reinhold Hess, Center for Information Technology
Wolfgang Joppich, Center for Information Technology