Katharina Benkert | Researchain

Publication

Featured research published by Katharina Benkert.

International Conference on Conceptual Structures | 2007

An Efficient Implementation of the Thomas-Algorithm for Block Penta-diagonal Systems on Vector Computers

Katharina Benkert; Rudolf Fischer

Simulations of supernovae give rise to linear systems of equations whose matrices are block penta-diagonal with small, dense blocks. For their efficient solution, a compact multiplication scheme based on a restructured version of the Thomas algorithm is presented, together with specifically adapted routines for LU factorization and for forward and backward substitution. On a NEC SX-8 vector system, runtime decreased by between 35% and 54% for block sizes ranging from 20 to 85, compared with the original code using BLAS and LAPACK routines.
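
For readers unfamiliar with the Thomas algorithm, the following is a minimal sketch of its scalar tridiagonal form: a forward elimination sweep (LU factorization fused with forward substitution) followed by backward substitution. It is not the paper's method; the block penta-diagonal variant has two sub- and two super-diagonals and replaces each scalar division with a small dense LU factorization, which is not shown here. The function name and argument layout are illustrative assumptions, and the sketch assumes no pivoting is needed (e.g. a diagonally dominant matrix).

```python
def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system with the scalar Thomas algorithm (sketch).

    a: sub-diagonal   (length n, a[0] unused)
    b: main diagonal  (length n)
    c: super-diagonal (length n, c[n-1] unused)
    d: right-hand side (length n)
    Assumes no pivoting is required (e.g. diagonally dominant matrix).
    """
    n = len(b)
    cp = [0.0] * n  # modified super-diagonal
    dp = [0.0] * n  # modified right-hand side

    # Forward sweep: eliminate the sub-diagonal (LU + forward substitution).
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom

    # Backward substitution.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


# Usage: [[2,1,0],[1,2,1],[0,1,2]] x = [4, 8, 8] has the solution x = [1, 2, 3].
x = thomas_solve([0.0, 1.0, 1.0], [2.0, 2.0, 2.0], [1.0, 1.0, 0.0], [4.0, 8.0, 8.0])
print([round(v, 10) for v in x])  # [1.0, 2.0, 3.0]
```

The work per row is constant, so the solve is O(n); in the block case each row instead costs a few small dense matrix operations, which is where the paper's compact multiplication scheme and adapted LU routines come in.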

International Parallel and Distributed Processing Symposium | 2008

Outlier detection in performance data of parallel applications

Katharina Benkert; Edgar Gabriel; Michael M. Resch

When an adaptive software component is employed to select the best-performing implementation of a communication operation at runtime, the correctness of its decision depends strongly on detecting and removing outliers in the data used for the comparison. This automatic decision is greatly complicated by the fact that the types and quantities of outliers depend on the network interconnect and on the nodes the batch scheduler assigns to the job. This paper evaluates four statistical methods for handling outliers: a standard interquartile range method, a heuristic derived from the trimmed mean, cluster analysis, and a method based on robust statistics. Using performance data from the Abstract Data and Communication Library (ADCL), we evaluate the correctness of the decisions made with each statistical approach on three fundamentally different network interconnects: a highly reliable InfiniBand network, a gigabit Ethernet network with larger performance variance, and a hierarchical gigabit Ethernet network.
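
To illustrate why outlier handling changes the decision, here is a minimal sketch of the first method named above, a standard interquartile range filter (Tukey's fences with the conventional factor k = 1.5), used to pick the implementation with the lowest mean time. The other three methods from the paper are not reproduced. The function names, the implementation labels `impl_a` and `impl_b`, and the timing values are hypothetical illustrations, not ADCL data.

```python
import statistics


def iqr_filter(samples, k=1.5):
    """Keep only samples inside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(samples, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [s for s in samples if lo <= s <= hi]


def pick_fastest(timings):
    """Return the implementation name with the lowest mean time after outlier removal."""
    cleaned = {name: statistics.mean(iqr_filter(ts)) for name, ts in timings.items()}
    return min(cleaned, key=cleaned.get)


# Hypothetical timings (seconds); impl_a has one extreme outlier (e.g. a noisy node).
timings = {
    "impl_a": [1.0, 1.02, 0.99, 1.01, 1.0, 1.03, 0.98, 9.0],
    "impl_b": [1.1, 1.12, 1.11, 1.1, 1.13, 1.1, 1.12, 1.11],
}
print(pick_fastest(timings))  # impl_a
```

Comparing raw means would pick `impl_b` here, because the single 9.0 s sample roughly doubles the mean of `impl_a`; removing it reverses the decision.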

IEEE International Conference on High Performance Computing Data and Analytics | 2012

Highly Efficient and Scalable Software for the Simulation of Turbulent Flows in Complex Geometries

Daniel F. Harlacher; Sabine Roller; Florian Hindenlang; Claus-Dieter Munz; Tim Kraus; Martin Fischer; Koen Geurts; Matthias Meinke; Tobias Klühspies; Volker Metsch; Katharina Benkert

This paper investigates the efficiency of simulations of compressible turbulent flows with noise generation in complex geometries. It analyzes two approaches and their suitability with respect to both solution quality and the turnaround times required in industrial design-of-experiments (DoE) processes. The first approach uses a high-order discontinuous Galerkin scheme; the efficiency of high-order schemes on coarser meshes is compared with that of lower-order schemes on finer meshes. The second approach is a second-order finite volume scheme that employs zonal coupling of LES and RANS to make the turbulence simulation more efficient. Both schemes are applied to three industrial test cases, which are described in the paper. Difficulties on HPC systems, especially load balancing, MPI, and I/O, are identified, and solutions are presented.

Journal of Algorithms & Computational Technology | 2008

The Abstract Data and Communication Library

Edgar Gabriel; Saber Feki; Katharina Benkert; Mohamad Chaarawi

Medical doctors are increasingly incorporating simulation tools into their day-to-day work in hospitals and medical centers. The software packages used in these environments face tremendous reliability requirements and must keep the turnaround time of a simulation short enough for the results to be useful. However, reaching performance goals for these applications is complicated by the wide range of hardware and software environments used in hospitals, which makes hardware-dependent optimizations difficult. The Abstract Data and Communication Library (ADCL) helps meet performance requirements by optimizing the communication operations of large-scale simulations at runtime, adapting to the current hardware and software environment. For each communication pattern, ADCL provides a large number of implementations and incorporates runtime selection logic to choose the implementation that yields the highest application performance on the current platform.
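
The runtime selection idea can be illustrated with a small sketch: during a learning phase, each candidate implementation is executed a fixed number of times and timed; afterwards, every call goes to the fastest one. This is a hypothetical toy in Python, not ADCL's API (ADCL targets MPI communication in compiled codes, and its actual selection logic may differ). The class name `RuntimeSelector`, the `trials` parameter, and the `fast`/`slow` candidates are all assumptions made for illustration.

```python
import time


class RuntimeSelector:
    """Hypothetical sketch of empirical runtime selection (not ADCL's API).

    Each candidate is run `trials` times in a learning phase; afterwards the
    candidate with the lowest accumulated time is used for every call.
    Assumes at least one candidate is given.
    """

    def __init__(self, candidates, trials=3):
        self.candidates = dict(candidates)          # name -> callable
        self.trials = trials
        self.elapsed = {name: 0.0 for name in self.candidates}
        self.counts = {name: 0 for name in self.candidates}
        self.winner = None

    def _next_to_test(self):
        # First candidate that still needs measurements, or None when done.
        for name, count in self.counts.items():
            if count < self.trials:
                return name
        return None

    def __call__(self, *args, **kwargs):
        if self.winner is not None:
            return self.candidates[self.winner](*args, **kwargs)
        name = self._next_to_test()
        start = time.perf_counter()
        result = self.candidates[name](*args, **kwargs)
        self.elapsed[name] += time.perf_counter() - start
        self.counts[name] += 1
        if self._next_to_test() is None:
            # Learning phase finished: fix the fastest candidate.
            self.winner = min(self.elapsed, key=self.elapsed.get)
        return result


def fast(n):
    return sum(i * i for i in range(n))


def slow(n):
    time.sleep(0.01)  # simulated extra latency of a poorly suited implementation
    return sum(i * i for i in range(n))


sel = RuntimeSelector({"fast": fast, "slow": slow}, trials=2)
for _ in range(6):
    sel(1000)
print(sel.winner)  # fast
```

Because every candidate returns the same result, the application sees no difference except in runtime; the price is the learning phase, during which slower candidates are also executed.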

Archive | 2006

Atomistic Simulations on Scalar and Vector Computers

Franz Gähler; Katharina Benkert

Large-scale atomistic simulations are feasible only with classical effective potentials. Nevertheless, even classical simulations often require some ab-initio computations, e.g. for developing potentials or validating results. Ab-initio and classical simulations use rather different algorithms and place different demands on the computer hardware. We present performance comparisons for the DFT code VASP and our classical molecular dynamics code IMD on different computer architectures, including both clusters of microprocessors and vector computers. VASP performs excellently on vector machines, whereas IMD is better suited to large clusters of microprocessors. We also report on our efforts to make IMD perform well on vector machines too.

Archive | 2010

Empirical Optimization of Collective Communications with ADCL

Katharina Benkert; Edgar Gabriel

The Abstract Data and Communication Library (ADCL) allows for auto-tuning of communication operations for parallel applications. This paper presents a new set of interfaces introduced in ADCL in order to support most MPI collective communication operations, and thus enable the optimization of one of the most widely used features of the MPI specification. The paper discusses semantic as well as implementation aspects, and evaluates the new interfaces using the NPB FT benchmark on a large selection of platforms and MPI libraries.

IEEE International Conference on High Performance Computing Data and Analytics | 2008

Teraflops Sustained Performance With Real World Applications

Sunil R. Tiyyagura; Panagiotis Adamidis; Rolf Rabenseifner; Peter Lammers; Stefan Borowski; F. Lippold; F. Svensson; Olaf Marxen; Stefan Haberhauer; Ari P. Seitsonen; J. Furthmüller; Katharina Benkert; Martin Galle; Thomas Bönisch; Uwe Küster; Michael M. Resch

This paper provides a comprehensive performance evaluation of the NEC SX-8 system at the High Performance Computing Center Stuttgart which has been in operation since July 2005. It provides a description of the installed hardware together with its performance for some synthetic benchmarks and five real world applications. All the applications achieved sustained Tflop/s performance. Additionally, the measurements presented show the ability of the system to solve not only large problems with a very high performance, but also medium sized problems with high efficiency using a large number of processors.

EuroMPI'10: Proceedings of the 17th European MPI Users' Group Meeting Conference on Recent Advances in the Message Passing Interface | 2010

Measuring execution times of collective communications in an empirical optimization framework

Katharina Benkert; Edgar Gabriel

An essential part of an empirical optimization library is the set of timing procedures used to determine the performance of different codelets. In this paper, we present four different timing methods for optimizing collective MPI communications and compare their accuracy using the FFT benchmark (FT) of the NAS Parallel Benchmarks on a variety of systems with different MPI implementations. We find that timing larger code portions with infrequent synchronizations performs well on all systems.
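
The contrast between synchronizing around every measured call and timing a larger code portion with a single synchronization can be sketched generically as follows. These two functions are illustrative assumptions and do not reproduce the four methods evaluated in the paper. The `sync` argument is a placeholder for a synchronization step; in an MPI program it might be a barrier such as mpi4py's `comm.Barrier()`, but here it defaults to a no-op so the sketch runs sequentially.

```python
import time


def time_per_call(op, reps, sync=lambda: None):
    """Synchronize before every call, time each call separately, return the mean."""
    total = 0.0
    for _ in range(reps):
        sync()  # placeholder for e.g. an MPI barrier
        start = time.perf_counter()
        op()
        total += time.perf_counter() - start
    return total / reps


def time_batched(op, reps, sync=lambda: None):
    """Synchronize once, time `reps` back-to-back calls, return the mean per call."""
    sync()  # a single synchronization for the whole code portion
    start = time.perf_counter()
    for _ in range(reps):
        op()
    return (time.perf_counter() - start) / reps


# Usage with a stand-in workload (results vary by machine):
work = lambda: sum(range(10_000))
print(time_per_call(work, 100), time_batched(work, 100))
```

The batched variant pays the synchronization cost once and amortizes timer overhead over many calls, at the cost of no longer seeing per-call variation.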

Archive | 2007

Molecular Dynamics on NEC Vector Systems

Katharina Benkert; Franz Gähler

Molecular dynamics codes are widely used on scalar architectures, where they exhibit good performance and scalability. For vector architectures, special algorithms such as Layered Link Cell and Grid Search have been developed. Nevertheless, the performance measured on the NEC SX-8 remains unsatisfactory. This paper studies the reasons for these performance deficits.
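
As background, the following is a minimal sketch of the standard link-cell (cell list) neighbor search that scalar molecular dynamics codes typically use. Particles are binned into cubic cells whose edge equals the cutoff radius, so interaction partners can lie only in the same or one of the 26 adjacent cells. The vector-oriented Layered Link Cell and Grid Search algorithms reorganize this work for long vector loops and are not shown. The function name and the non-periodic box are simplifying assumptions.

```python
import itertools
import math
from collections import defaultdict


def link_cell_pairs(positions, cutoff):
    """Return all index pairs (i, j), i < j, whose distance is below `cutoff`.

    Standard link-cell method in a non-periodic box (simplifying assumption):
    cells have edge length `cutoff`, so only neighboring cells are searched.
    """
    cells = defaultdict(list)
    for idx, (x, y, z) in enumerate(positions):
        key = (math.floor(x / cutoff), math.floor(y / cutoff), math.floor(z / cutoff))
        cells[key].append(idx)

    pairs = set()
    rc2 = cutoff * cutoff
    for (cx, cy, cz), members in cells.items():
        # Visit the cell itself and its 26 neighbors.
        for dx, dy, dz in itertools.product((-1, 0, 1), repeat=3):
            neighbors = cells.get((cx + dx, cy + dy, cz + dz), [])
            for i in members:
                for j in neighbors:
                    if i < j:
                        d2 = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
                        if d2 < rc2:
                            pairs.add((i, j))
    return pairs


# Usage: only the first two particles are within the cutoff of 1.0.
print(link_cell_pairs([(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (2.0, 0.0, 0.0)], 1.0))  # {(0, 1)}
```

For a roughly uniform density, this reduces the pair search from O(N²) to O(N); the short, irregular inner loops over cell members are exactly what makes the method awkward for vector hardware.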

TAEBC-2009 | 2008