Publication


Featured research published by Rudolf Berrendorf.


Concurrency and Computation: Practice and Experience | 1992

Evaluating the basic performance of the Intel iPSC/860 parallel computer

Rudolf Berrendorf; Jukka Helin

We evaluate the basic performance of the Intel iPSC/860 computer, which can have up to 128 Intel i860-based nodes connected together with a hypercube network topology. After giving a brief overview of the system, the properties and bottlenecks of the hardware architecture and software environment are discussed. Basic memory, scalar and vector performance of a single node is evaluated, and the communication performance and the overlap of computation and communication are analysed.
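
For a sense of what such a single-node measurement looks like, the sketch below times a daxpy-style loop to estimate memory and floating-point throughput. It is plain, portable C, not the paper's benchmark code (which targeted the i860 nodes directly); the vector length and repetition count are arbitrary illustrative values.

```c
/* Minimal sketch (not the paper's benchmark code): time a daxpy-style
 * loop to estimate single-node memory and floating-point throughput. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 20)   /* vector length; large enough to exceed caches */
#define REPS 100

int main(void) {
    double *x = malloc(N * sizeof(double));
    double *y = malloc(N * sizeof(double));
    if (!x || !y) return 1;
    for (size_t i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }

    double a = 3.0;
    clock_t start = clock();
    for (int r = 0; r < REPS; r++)
        for (size_t i = 0; i < N; i++)
            y[i] = a * x[i] + y[i];          /* 2 flops, 3 memory accesses */
    double seconds = (double)(clock() - start) / CLOCKS_PER_SEC;

    double mflops = 2.0 * N * REPS / seconds / 1e6;
    double mbytes = 3.0 * N * REPS * sizeof(double) / seconds / 1e6;
    printf("daxpy: %.1f MFLOP/s, %.1f MB/s\n", mflops, mbytes);
    free(x); free(y);
    return 0;
}
```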


Concurrency and Computation: Practice and Experience | 2000

Performance characteristics for OpenMP constructs on different parallel computer architectures

Rudolf Berrendorf; Guido Nieken

OpenMP is emerging as a quasi-standard for shared memory parallel programming on small SMP systems. To serve as a common programming interface for shared memory parallel programming, scalability to a larger number of nodes and support for different shared memory architectures have to be proven. In this paper we investigate how well the basic constructs of OpenMP are implemented on different parallel computer architectures.
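
A minimal sketch of how the overhead of one such construct can be measured, in the spirit of this kind of study: the same amount of work is timed once sequentially and once wrapped in repeated parallel regions, and the difference is attributed to the construct. This is an illustration, not the paper's benchmark suite; the delay function and iteration counts are made up for the example.

```c
/* Minimal sketch (not the paper's benchmark suite): estimate the overhead
 * of an OpenMP parallel region by comparing a timed reference loop with
 * the same loop wrapped in repeated parallel regions. */
#include <stdio.h>
#include <omp.h>

#define OUTER 1000

static void delay(int n) {          /* small amount of work per iteration */
    volatile double a = 0.0;
    for (int i = 0; i < n; i++) a += i * 0.5;
}

int main(void) {
    int work = 1000;

    double t0 = omp_get_wtime();
    for (int i = 0; i < OUTER; i++)
        delay(work);                            /* sequential reference */
    double t_ref = omp_get_wtime() - t0;

    t0 = omp_get_wtime();
    for (int i = 0; i < OUTER; i++) {
        #pragma omp parallel                    /* construct under test */
        delay(work);
    }
    double t_par = omp_get_wtime() - t0;

    /* Overhead per construct: extra time not explained by the work itself. */
    printf("parallel region overhead: %.2f microseconds\n",
           (t_par - t_ref) / OUTER * 1e6);
    return 0;
}
```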


international conference on supercomputing | 1991

Analyzing the performance of message passing MIMD Hypercubes: a study with the Intel iPSC/860

Jukka Helin; Rudolf Berrendorf

We describe how to evaluate and analyze the performance of distributed memory MIMD hypercubes and show the process in practice on the Intel iPSC/860 hypercube. The performance analysis has been divided into four levels: the computing performance of a single node, the communication performance between the nodes, the performance in overlapping computation with communication, and the performance in complete parallelized application programs. Conclusions about the performance of the iPSC/860 are drawn.
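
The node-to-node communication level can be illustrated with a simple ping-pong test between two nodes, sketched below. The original study used Intel's message-passing library on the iPSC/860; the sketch uses MPI purely for portability, and the message size and repetition count are arbitrary illustrative values.

```c
/* Minimal sketch of a node-to-node ping-pong test, one of the measurement
 * levels described above. MPI is used here only for illustration; the
 * original measurements did not use MPI. */
#include <stdio.h>
#include <mpi.h>

#define MSG_SIZE (1 << 16)   /* 64 KiB message */
#define REPS 1000

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    char buf[MSG_SIZE];

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < REPS; i++) {
        if (rank == 0) {
            MPI_Send(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double elapsed = MPI_Wtime() - t0;

    if (rank == 0)
        printf("round trip: %.1f us, bandwidth: %.1f MB/s\n",
               elapsed / REPS * 1e6,
               2.0 * MSG_SIZE * REPS / elapsed / 1e6);
    MPI_Finalize();
    return 0;
}
```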


international conference on parallel processing | 2006

Flexible I/O support for reconfigurable grid environments

Marc-André Hermanns; Rudolf Berrendorf; Marcel Birkner; Jan Seidel

With the growing computational power of current supercomputers, scientific computing applications can work on larger problems. The corresponding increase in dataset size is often correlated with an increase in the storage needed for the results. Current storage area networks (SANs) balance I/O load on multiple disks using high-speed networks, but they are integrated at the operating system level, demanding administrative intervention if the usage topology changes. While this is practical for single sites or fairly static grid environments, it is hard to extend to a user-defined per-job basis. Reconfigurable grid environments, where computing and storage resources are coupled on a per-job basis, need a more flexible approach for parallel I/O on remote locations. This paper gives a detailed overview of the abilities of the transparent remote access provided by tunnelfs, a part of the VIOLA parallel I/O project. We show how tunnelfs manages flexible and transparent access to remote I/O resources in a reconfigurable grid environment, supporting the definition of the amount and location of persistent storage services on a per-job basis.
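
From the application's point of view, transparent remote access means the usual MPI-IO calls keep working; a rough sketch is given below. The "tunnelfs:" filename prefix used to route the open call to the remote I/O driver is an assumption made for illustration and is not confirmed from the paper.

```c
/* Sketch of client-side use: with I/O forwarded through a remote I/O driver
 * such as tunnelfs, the application keeps using plain MPI-IO calls. The
 * "tunnelfs:" filename prefix below is a hypothetical selection mechanism
 * used only for illustration. */
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_File fh;
    /* The prefix routes the request to the remote I/O driver (assumed name). */
    MPI_File_open(MPI_COMM_WORLD, "tunnelfs:/remote/results.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    double local[1024] = {0};
    MPI_Offset offset = (MPI_Offset)rank * sizeof(local);
    /* Each process writes its block; the driver forwards it to the remote server. */
    MPI_File_write_at(fh, offset, local, 1024, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
```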


Lecture Notes in Computer Science | 2006

High-Bandwidth remote parallel I/O with the distributed memory filesystem MEMFS

Jan Seidel; Rudolf Berrendorf; Marcel Birkner; Marc-André Hermanns

The enormous advance in computational power of supercomputers enables scientific applications to process problems of increasing size. This is often correlated with an increasing amount of data stored in (parallel) filesystems. As the increase in bandwidth of common disk-based I/O devices cannot keep up with the evolution of computational power, access to this data becomes the bottleneck in many applications. MEMFS takes the approach of distributing I/O data among multiple dedicated remote servers on a user-level basis. It stores files in the accumulated main memory of these I/O nodes and is able to deliver this data with high bandwidth. We describe how MEMFS manages a memory-based distributed filesystem, how it stores data among the participating I/O servers, and how it assigns servers to application clients. Results are given for usage in a grid project with high-bandwidth WAN connections.
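
One way a memory-based filesystem can spread file data over its I/O servers is plain round-robin striping, sketched below. This is a generic illustration under assumed parameters (stripe size, contiguous packing per server), not necessarily the distribution scheme MEMFS actually implements.

```c
/* Generic round-robin striping sketch (an assumption, not necessarily the
 * scheme MEMFS uses): map a file offset to the I/O server whose memory
 * holds that stripe, plus the offset within the server's local buffer. */
#include <stdio.h>
#include <stdint.h>

#define STRIPE_SIZE (64 * 1024)   /* 64 KiB stripes (illustrative value) */

typedef struct {
    int     server;        /* index of the I/O server holding the stripe */
    int64_t local_offset;  /* position inside that server's memory region */
} stripe_location;

static stripe_location locate(int64_t file_offset, int num_servers) {
    int64_t stripe = file_offset / STRIPE_SIZE;
    stripe_location loc;
    loc.server = (int)(stripe % num_servers);
    /* Stripes owned by one server are packed contiguously in its memory. */
    loc.local_offset = (stripe / num_servers) * STRIPE_SIZE
                       + file_offset % STRIPE_SIZE;
    return loc;
}

int main(void) {
    int servers = 4;
    for (int64_t off = 0; off < 5 * STRIPE_SIZE; off += STRIPE_SIZE)
        printf("offset %lld -> server %d, local offset %lld\n",
               (long long)off, locate(off, servers).server,
               (long long)locate(off, servers).local_offset);
    return 0;
}
```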


parallel processing and applied mathematics | 2005

Remote parallel I/O in grid environments

Rudolf Berrendorf; Marc-André Hermanns; Jan Seidel

Although processor speed and memory bandwidth as well as the capacities of persistent storage devices have evolved rapidly in recent years, the bandwidth between memory and persistent storage devices could not match that pace. As current scientific applications tend to process an enormous amount of data at runtime, access to a local disk might become the main performance bottleneck. The communication infrastructure for local area networks has also evolved rapidly, so modern filesystems for supercomputing use storage area network solutions to distribute the load of application I/O to several special-purpose I/O nodes. These SAN solutions, however, are often bound to a specific organizational structure, such as different locations of the same company or several institutes of a university. This often implies a common user base and accounting information at each site. In a highly variant grid environment these demands might be hard to meet. This paper describes the definition of two ADIO devices for ROMIO, a publicly available MPI-IO implementation, to provide transparent access to remote parallel I/O and, additionally, access to remote I/O on files in the memory of a remote server. The architecture of these devices allows for a definition of remote servers on a per-job basis and can be configured by the user before runtime.
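
A per-job, user-configurable remote server could, for example, be passed to such a device through MPI_Info hints at file-open time, as in the sketch below. The hint keys "remote_host" and "remote_port" are hypothetical placeholders; the paper's ADIO devices define their own configuration mechanism.

```c
/* Sketch of per-job configuration via MPI_Info hints before opening a remote
 * file. The hint keys below are assumed names used only for illustration. */
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    MPI_Info info;
    MPI_Info_create(&info);
    /* User-supplied, per-job location of the remote I/O server (assumed keys). */
    MPI_Info_set(info, "remote_host", "io-server.example.org");
    MPI_Info_set(info, "remote_port", "7000");

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "/scratch/run42/output.dat",
                  MPI_MODE_CREATE | MPI_MODE_RDWR, info, &fh);

    /* ... parallel I/O through the standard MPI-IO calls ... */

    MPI_File_close(&fh);
    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}
```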


Programming Models for Massively Parallel Computers | 1995

Compiling SVM-Fortran for the Intel Paragon XP/S

Rudolf Berrendorf; M. Gerndt

SVM-Fortran is a language designed to program highly parallel systems with a global address space. A compiler for SVM-Fortran is described which generates code for parallel machines; our current target machine is the Intel Paragon XP/S with an SVM extension called ASVM. Performance numbers are given for applications and compared to results obtained with corresponding HPF versions.


joint international conference on vector and parallel processing parallel processing | 1994

A Comparison of Shared Virtual Memory and Message Passing Programming Techniques Based on a Finite Element Application

Rudolf Berrendorf; Michael Gerndt; Zakaria Lahjomri; Thierry Priol

This paper describes the methods used and the experience gained in implementing a finite element application on three different parallel computers with either message passing or shared virtual memory as the programming model. Designing a parallel finite element application using message passing requires finding a data domain decomposition to map data into the local memories of the processors. Since data accesses may be very irregular, communication patterns are unknown prior to the parallel execution, which makes parallelization a difficult task. We argue that the use of a shared virtual memory greatly simplifies the parallelization step. It is shown experimentally on a hypercube iPSC/2 that the use of the KOAN/Fortran-S programming environment, based on a shared virtual memory, allows a sequential application to be ported quickly and easily without a significant degradation in performance compared to the message passing version. Results for more recent parallel architectures, such as the Paragon XP/S for message passing and the KSR1 for shared virtual memory, are presented as well.
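
The irregularity argument can be made concrete with a small sketch: a finite element loop that gathers node values through a connectivity array, so the remote data a processor needs is only known at runtime. The code below is illustrative only and is not taken from the application studied in the paper.

```c
/* Sketch of the irregular access pattern discussed above (illustrative only):
 * node values are gathered through a connectivity array, so which remote data
 * a processor needs is data-dependent. Under shared virtual memory the loop
 * can run as written; with message passing the owners of the accessed
 * node_val entries must first be determined and the data exchanged. */
#include <stddef.h>

void element_loop(size_t num_elems, const int connect[][4],
                  const double *node_val, double *elem_result) {
    for (size_t e = 0; e < num_elems; e++) {
        double sum = 0.0;
        for (int k = 0; k < 4; k++)
            sum += node_val[connect[e][k]];   /* irregular, data-dependent index */
        elem_result[e] = 0.25 * sum;          /* e.g. average over element nodes */
    }
}
```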


european conference on parallel processing | 2017

New Efficient General Sparse Matrix Formats for Parallel SpMV Operations

Jan Philipp Ecker; Rudolf Berrendorf; Florian Mannuss

The Sparse Matrix-Vector Multiplication (SpMV) is an important building block in High Performance Computing. Performance improvements for the SpMV are often gained by developing new optimized sparse matrix formats, either by utilizing special sparsity patterns of a matrix or by taking bottlenecks of a hardware architecture into account. In this work a requirements analysis is done for sparse matrix formats with an emphasis on the parallel SpMV for large general sparse matrices. Based on these requirements, three new sparse matrix formats were developed, each combining several optimization techniques and addressing different optimization goals and hardware architectures. The CSR5 Bit Compressed (CSR5BC) format is an extension of the existing CSR5 format and optimized for GPUs. The other two formats, Hybrid Compressed Slice Storage (HCSS) and Local Group Compressed Sparse Row (LGCSR), are new formats optimized for multi-core and multi-processor architectures, including the Xeon Phi Knights Landing. Results show that all three storage formats deliver good parallel SpMV performance on their target architectures over a large set of test matrices compared to other well-performing formats in vendor and research libraries.
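
For reference, the sketch below shows SpMV over the standard CSR format, the common baseline that specialized formats are typically measured against; it is not an implementation of CSR5BC, HCSS, or LGCSR, and the scheduling clause is an arbitrary choice.

```c
/* Baseline CSR SpMV sketch (the standard format new formats are usually
 * compared against; not one of the paper's formats): y = A * x with A in
 * compressed sparse row form, parallelized over rows with OpenMP. */
#include <stddef.h>

void spmv_csr(size_t n_rows,
              const size_t *row_ptr,   /* length n_rows + 1 */
              const int    *col_idx,   /* length nnz */
              const double *values,    /* length nnz */
              const double *x,
              double       *y) {
    #pragma omp parallel for schedule(dynamic, 64)
    for (size_t i = 0; i < n_rows; i++) {
        double sum = 0.0;
        for (size_t j = row_ptr[i]; j < row_ptr[i + 1]; j++)
            sum += values[j] * x[col_idx[j]];
        y[i] = sum;
    }
}
```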


european conference on parallel processing | 2016

Performance Prediction and Ranking of SpMV Kernels on GPU Architectures

Christoph Lehnert; Rudolf Berrendorf; Jan Philipp Ecker; Florian Mannuss

Predicting the runtime of a sparse matrix-vector multiplication (SpMV) for different sparse matrix formats and thread mappings allows the dynamic selection of the most appropriate matrix format and thread mapping for a given matrix. This paper introduces two new, generally applicable performance models for SpMV, for linear and non-linear relationships, based on machine learning techniques. This approach supersedes the common manual development of an explicit performance model for a new architecture or for a new format based on empirical data. The two new models are compared to an existing explicit performance model on different GPUs. Results show that the quality of the performance predictions, the ranking of the alternatives, and the adaptability to other formats and architectures of the two machine learning techniques are better than those of the explicit performance model.
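
The selection step itself is simple once a model exists: predict the runtime of each candidate format for the matrix at hand, rank the candidates, and pick the minimum. The sketch below uses a toy linear model with made-up features and coefficients purely to illustrate that step; it is not one of the paper's trained models.

```c
/* Minimal sketch of model-based format selection (illustrative only): a toy
 * linear model predicts the SpMV runtime of each format from simple matrix
 * features, and the format with the lowest prediction is chosen. Features
 * and coefficients are assumptions, not the paper's trained models. */
#include <stdio.h>

typedef struct {
    const char *name;
    double c_row;   /* cost per row */
    double c_nnz;   /* cost per nonzero */
    double c_fix;   /* fixed kernel launch overhead */
} format_model;

static double predict(const format_model *m, double rows, double nnz) {
    return m->c_fix + m->c_row * rows + m->c_nnz * nnz;
}

int main(void) {
    /* Hypothetical coefficients, e.g. obtained from a training run. */
    format_model models[] = {
        { "CSR",      1.0e-8, 2.0e-9, 5.0e-6 },
        { "ELL",      0.5e-8, 3.0e-9, 4.0e-6 },
        { "SELL-C-s", 0.7e-8, 2.2e-9, 6.0e-6 },
    };
    double rows = 1.0e6, nnz = 2.5e7;   /* features of the matrix at hand */

    int best = 0;
    for (int i = 0; i < 3; i++) {
        double t = predict(&models[i], rows, nnz);
        printf("%-9s predicted %.3f ms\n", models[i].name, t * 1e3);
        if (t < predict(&models[best], rows, nnz)) best = i;
    }
    printf("selected format: %s\n", models[best].name);
    return 0;
}
```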

Collaboration


Dive into Rudolf Berrendorf's collaborations.

Top Co-Authors

Michael Gerndt (Forschungszentrum Jülich)
Jan Philipp Ecker (Bonn-Rhein-Sieg University of Applied Sciences)
Javed Razzaq (Bonn-Rhein-Sieg University of Applied Sciences)
Jukka Helin (Forschungszentrum Jülich)
Simon Eric Scholl (Bonn-Rhein-Sieg University of Applied Sciences)
Christoph Lehnert (Bonn-Rhein-Sieg University of Applied Sciences)