Zoltán Szebenyi
RWTH Aachen University
Publication
Featured research published by Zoltán Szebenyi.
parallel computing | 2010
Markus Geimer; Pavel Saviankou; Alexandre Strube; Zoltán Szebenyi; Felix Wolf; Brian J. N. Wylie
Scalasca is an open-source toolset that can be used to analyze the performance behavior of parallel applications and to identify opportunities for optimization. Target applications include simulation codes from science and engineering based on the parallel programming interfaces MPI and/or OpenMP. Scalasca, which has been specifically designed for use on large-scale machines such as IBM Blue Gene and Cray XT, integrates runtime summaries suitable to obtain a performance overview with in-depth studies of concurrent behavior via event tracing. Although Scalasca was already successfully used with codes running with 294,912 cores on a 72-rack Blue Gene/P system, the current software design shows scalability limitations that adversely affect user experience and that will present a serious obstacle on the way to mastering larger scales in the future. In this paper, we outline how to address the two most important ones, namely the unification of local identifiers at measurement finalization as well as collating and displaying analysis reports.
ieee international conference on high performance computing data and analytics | 2009
Zoltán Szebenyi; Felix Wolf; Brian J. N. Wylie
The performance behavior of parallel simulations often changes considerably as the simulation progresses - with potentially process-dependent variations of temporal patterns. While call-path profiling is an established method of linking a performance problem to the context in which it occurs, call paths reveal little information about the temporal evolution of performance phenomena. However, generating call-path profiles separately for thousands of iterations may exceed available buffer space - especially when the call tree is large and more than one metric is collected. In this paper, we present a runtime approach for the semantic compression of call-path profiles based on incremental clustering of a series of single-iteration profiles that scales in terms of the number of iterations without sacrificing important performance details. Our approach offers low runtime overhead by using only a condensed version of the profile data when calculating distances and accounts for process-dependent variations by making all clustering decisions locally.
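The incremental clustering idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Cluster` type, the `condense` summary (total and maximum over call paths), the Manhattan distance, and the merge-by-weighted-mean rule are all assumptions chosen for illustration. Each new single-iteration profile starts as its own cluster, and once a fixed budget of clusters is exceeded, the two closest clusters are merged, so memory stays bounded regardless of the number of iterations. Everything operates on one process's data, mirroring the purely local clustering decisions.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple


@dataclass
class Cluster:
    profile: List[float]   # mean metric value per call path (hypothetical layout)
    count: int             # number of iterations merged into this cluster
    iterations: List[int]  # which iterations this cluster represents


def condense(profile: List[float]) -> Tuple[float, float]:
    # Assumed condensed form: total and maximum over all call paths.
    return (sum(profile), max(profile))


def distance(a: Cluster, b: Cluster) -> float:
    # Manhattan distance on the condensed profiles only (cheap to compute).
    ca, cb = condense(a.profile), condense(b.profile)
    return sum(abs(x - y) for x, y in zip(ca, cb))


def merge(a: Cluster, b: Cluster) -> Cluster:
    # Count-weighted mean of the two full profiles.
    n = a.count + b.count
    mean = [(x * a.count + y * b.count) / n
            for x, y in zip(a.profile, b.profile)]
    return Cluster(mean, n, a.iterations + b.iterations)


def add_iteration(clusters: List[Cluster], profile: List[float],
                  iteration: int, max_clusters: int) -> List[Cluster]:
    # The new iteration starts as its own cluster.
    clusters.append(Cluster(list(profile), 1, [iteration]))
    if len(clusters) > max_clusters:
        # Over budget: merge the closest pair (i < j by construction).
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = merge(clusters[i], clusters[j])
        del clusters[j]
    return clusters
```

Calling `add_iteration` once per completed iteration with a small `max_clusters` keeps at most that many representative profiles in memory; the trade-off is that iterations merged into one cluster are afterwards described only by their mean.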
Parallel Processing Letters | 2010
Brian J. N. Wylie; Markus Geimer; Bernd Mohr; David Böhme; Zoltán Szebenyi; Felix Wolf
Cray XT and IBM Blue Gene systems present current alternative approaches to constructing leadership computer systems, relying on applications being able to exploit very large configurations of processor cores. Associated analysis tools must also scale commensurately to isolate and quantify performance issues that manifest at the largest scales. In studying the scalability of the Scalasca performance analysis toolset to several hundred thousand MPI processes on XT5 and BG/P systems, we investigated a progressive execution performance deterioration of the well-known ASCI Sweep3D compact application. Scalasca runtime summarization analysis quantified MPI communication time that correlated with computational imbalance, and automated trace analysis confirmed growing amounts of MPI waiting times. Further instrumentation, measurement and analyses pinpointed a conditional section of highly imbalanced computation which amplified waiting times inherent in the associated wavefront communication that seriously degraded overall execution efficiency at very large scales. By employing effective data collation, management and graphical presentation, in a portable and straightforward-to-use toolset, Scalasca was thereby able to demonstrate performance measurements and analyses with 294,912 processes.
Parallel Tools Workshop | 2010
Markus Geimer; Felix Wolf; Brian J. N. Wylie; Daniel Becker; David Böhme; Wolfgang Frings; Marc-André Hermanns; Bernd Mohr; Zoltán Szebenyi
The number of processor cores on modern supercomputers is increasing from generation to generation, and as a consequence HPC applications are required to harness much higher degrees of parallelism to satisfy their growing demand for computing power. However, writing code that runs efficiently on large processor configurations remains a significant challenge. The situation is exacerbated by the rising number of cores imposing scalability demands not only on applications but also on the software tools needed for their development.
european conference on parallel processing | 2009
Zoltán Szebenyi; Brian J. N. Wylie; Felix Wolf
PEPC (Pretty Efficient Parallel Coulomb-solver) is a complex HPC application developed at the Jülich Supercomputing Centre, scaling to thousands of processors. This is a case study of challenges faced when applying the Scalasca parallel performance analysis toolset to this intricate example at relatively high processor counts. The Scalasca version used in this study has been extended to distinguish iteration/timestep phases to provide a better view of the underlying mechanisms of the application execution. The added value of the additional analyses and presentations is then assessed to determine requirements for possible future integration within Scalasca.
ieee international symposium on parallel distributed processing workshops and phd forum | 2010
Brian J. N. Wylie; David Böhme; Bernd Mohr; Zoltán Szebenyi; Felix Wolf
In studying the scalability of the Scalasca performance analysis toolset to several hundred thousand MPI processes on IBM Blue Gene/P, we investigated a progressive execution performance deterioration of the well-known ASCI Sweep3D compact application. Scalasca runtime summarization analysis quantified MPI communication time that correlated with computational imbalance, and automated trace analysis confirmed growing amounts of MPI waiting times. Further instrumentation, measurement and analyses pinpointed a conditional section of highly imbalanced computation which amplified waiting times inherent in the associated wavefront communication that seriously degraded overall execution efficiency at very large scales. By employing effective data collation, management and graphical presentation, Scalasca was thereby able to demonstrate performance measurements and analyses with 294,912 processes for the first time.
Parallel Tools Workshop | 2013
Daniel Lorenz; David Böhme; Bernd Mohr; Alexandre Strube; Zoltán Szebenyi
Scalasca is a performance analysis tool that parses the trace of an application run for certain patterns that indicate performance inefficiencies. In this paper, we present recently developed features in Scalasca. In particular, we describe two newly implemented analysis methods: the root cause analysis, which tries to identify the cause of a delay, and the critical path analysis, which analyzes the path of execution that determines the application runtime. Furthermore, we present time-series profiling, a method that allows users to explore the time-dependent behavior of an application. Finally, we extended the means of Scalasca and its output format CUBE to define and display topologies.
ieee international conference on high performance computing data and analytics | 2008
Felix Wolf; Brian J. N. Wylie; Erika Ábrahám; Daniel Becker; Wolfgang Frings; Karl Fürlinger; Markus Geimer; Marc André Hermanns; Bernd Mohr; Shirley Moore; Matthias Pfeifer; Zoltán Szebenyi
spec international performance evaluation workshop | 2008
Zoltán Szebenyi; Brian J. N. Wylie; Felix Wolf
6th International Parallel Tools Workshop | 2013
Daniel Lorenz; David Böhme; Alexandre Strube; Bernd Mohr; Zoltán Szebenyi