Publication


Featured research published by Zoltán Szebenyi.


Parallel Computing | 2010

Further improving the scalability of the Scalasca toolset

Markus Geimer; Pavel Saviankou; Alexandre Strube; Zoltán Szebenyi; Felix Wolf; Brian J. N. Wylie

Scalasca is an open-source toolset that can be used to analyze the performance behavior of parallel applications and to identify opportunities for optimization. Target applications include simulation codes from science and engineering based on the parallel programming interfaces MPI and/or OpenMP. Scalasca, which has been specifically designed for use on large-scale machines such as IBM Blue Gene and Cray XT, integrates runtime summaries suitable to obtain a performance overview with in-depth studies of concurrent behavior via event tracing. Although Scalasca was already successfully used with codes running with 294,912 cores on a 72-rack Blue Gene/P system, the current software design shows scalability limitations that adversely affect user experience and that will present a serious obstacle on the way to mastering larger scales in the future. In this paper, we outline how to address the two most important ones, namely the unification of local identifiers at measurement finalization as well as collating and displaying analysis reports.


IEEE International Conference on High Performance Computing, Data, and Analytics | 2009

Space-efficient time-series call-path profiling of parallel applications

Zoltán Szebenyi; Felix Wolf; Brian J. N. Wylie

The performance behavior of parallel simulations often changes considerably as the simulation progresses - with potentially process-dependent variations of temporal patterns. While call-path profiling is an established method of linking a performance problem to the context in which it occurs, call paths reveal only little information about the temporal evolution of performance phenomena. However, generating call-path profiles separately for thousands of iterations may exceed available buffer space - especially when the call tree is large and more than one metric is collected. In this paper, we present a runtime approach for the semantic compression of call-path profiles based on incremental clustering of a series of single-iteration profiles that scales in terms of the number of iterations without sacrificing important performance details. Our approach offers low runtime overhead by using only a condensed version of the profile data when calculating distances and accounts for process-dependent variations by making all clustering decisions locally.
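The idea of incremental clustering over a series of single-iteration profiles can be illustrated with a minimal sketch. This is not the authors' implementation: the `Cluster` type, the `condense` function (here simply the sum of all metric values), the absolute-difference distance, and the `max_clusters` limit are all assumptions chosen for illustration. The sketch keeps at most a fixed number of clusters and, whenever a new iteration pushes the count over the limit, merges the two clusters whose condensed values are closest.

```python
from dataclasses import dataclass
from typing import Dict, List

# A profile maps a call path to one metric value (e.g. time in seconds).
Profile = Dict[str, float]


@dataclass
class Cluster:
    mean: Profile    # average profile of all member iterations
    count: int       # number of iterations merged into this cluster
    first_iter: int  # index of the earliest member iteration


def condense(profile: Profile) -> float:
    """Hypothetical condensed form: the total metric value over all call paths."""
    return sum(profile.values())


def merge(a: Cluster, b: Cluster) -> Cluster:
    """Merge two clusters into one, weighting each mean by its member count."""
    total = a.count + b.count
    paths = set(a.mean) | set(b.mean)
    mean = {
        p: (a.mean.get(p, 0.0) * a.count + b.mean.get(p, 0.0) * b.count) / total
        for p in paths
    }
    return Cluster(mean, total, min(a.first_iter, b.first_iter))


def add_iteration(clusters: List[Cluster], profile: Profile,
                  index: int, max_clusters: int) -> None:
    """Add one iteration profile; merge the closest pair if over the limit."""
    clusters.append(Cluster(dict(profile), 1, index))
    if len(clusters) <= max_clusters:
        return
    best = None  # (distance, i, j) of the closest pair found so far
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            # Distances use only the condensed values, not the full profiles.
            d = abs(condense(clusters[i].mean) - condense(clusters[j].mean))
            if best is None or d < best[0]:
                best = (d, i, j)
    _, i, j = best
    clusters[i] = merge(clusters[i], clusters[j])
    del clusters[j]  # j > i, so index i is unaffected


def demo() -> List[Cluster]:
    """Cluster four made-up iterations locally on one process."""
    series = [{"main/solve": 1.0}, {"main/solve": 1.1},
              {"main/solve": 5.0}, {"main/solve": 5.2}]
    clusters: List[Cluster] = []
    for i, profile in enumerate(series):
        add_iteration(clusters, profile, i, max_clusters=2)
    return clusters
```

In `demo`, the two fast and the two slow iterations end up in separate clusters, so memory use stays bounded by `max_clusters` regardless of how many iterations are recorded. Running the procedure independently on each process mirrors the abstract's point that clustering decisions are made locally.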


Parallel Processing Letters | 2010

LARGE-SCALE PERFORMANCE ANALYSIS OF SWEEP3D WITH THE SCALASCA TOOLSET

Brian J. N. Wylie; Markus Geimer; Bernd Mohr; David Böhme; Zoltán Szebenyi; Felix Wolf

Cray XT and IBM Blue Gene systems present current alternative approaches to constructing leadership computer systems. Both rely on applications being able to exploit very large configurations of processor cores, and the associated analysis tools must scale commensurately to isolate and quantify performance issues that manifest at the largest scales. In studying the scalability of the Scalasca performance analysis toolset to several hundred thousand MPI processes on XT5 and BG/P systems, we investigated a progressive execution performance deterioration of the well-known ASCI Sweep3D compact application. Scalasca runtime summarization analysis quantified MPI communication time that correlated with computational imbalance, and automated trace analysis confirmed growing amounts of MPI waiting times. Further instrumentation, measurement and analyses pinpointed a conditional section of highly imbalanced computation which amplified waiting times inherent in the associated wavefront communication that seriously degraded overall execution efficiency at very large scales. By employing effective data collation, management and graphical presentation, in a portable and straightforward-to-use toolset, Scalasca was thereby able to demonstrate performance measurements and analyses with 294,912 processes.


Parallel Tools Workshop | 2010

Recent Developments in the Scalasca Toolset

Markus Geimer; Felix Wolf; Brian J. N. Wylie; Daniel Becker; David Böhme; Wolfgang Frings; Marc-André Hermanns; Bernd Mohr; Zoltán Szebenyi

The number of processor cores on modern supercomputers is increasing from generation to generation, and as a consequence HPC applications are required to harness much higher degrees of parallelism to satisfy their growing demand for computing power. However, writing code that runs efficiently on large processor configurations remains a significant challenge. The situation is exacerbated by the rising number of cores imposing scalability demands not only on applications but also on the software tools needed for their development.


European Conference on Parallel Processing | 2009

Scalasca Parallel Performance Analyses of PEPC

Zoltán Szebenyi; Brian J. N. Wylie; Felix Wolf

PEPC (Pretty Efficient Parallel Coulomb-solver) is a complex HPC application developed at the Jülich Supercomputing Centre, scaling to thousands of processors. This is a case study of challenges faced when applying the Scalasca parallel performance analysis toolset to this intricate example at relatively high processor counts. The Scalasca version used in this study has been extended to distinguish iteration/timestep phases to provide a better view of the underlying mechanisms of the application execution. The added value of the additional analyses and presentations is then assessed to determine requirements for possible future integration within Scalasca.


IEEE International Symposium on Parallel and Distributed Processing, Workshops and PhD Forum | 2010

Performance analysis of Sweep3D on Blue Gene/P with the Scalasca toolset

Brian J. N. Wylie; David Böhme; Bernd Mohr; Zoltán Szebenyi; Felix Wolf

In studying the scalability of the Scalasca performance analysis toolset to several hundred thousand MPI processes on IBM Blue Gene/P, we investigated a progressive execution performance deterioration of the well-known ASCI Sweep3D compact application. Scalasca runtime summarization analysis quantified MPI communication time that correlated with computational imbalance, and automated trace analysis confirmed growing amounts of MPI waiting times. Further instrumentation, measurement and analyses pinpointed a conditional section of highly imbalanced computation which amplified waiting times inherent in the associated wavefront communication that seriously degraded overall execution efficiency at very large scales. By employing effective data collation, management and graphical presentation, Scalasca was thereby able to demonstrate performance measurements and analyses with 294,912 processes for the first time.


Parallel Tools Workshop | 2013

Extending Scalasca’s Analysis Features

Daniel Lorenz; David Böhme; Bernd Mohr; Alexandre Strube; Zoltán Szebenyi

Scalasca is a performance analysis tool that parses the trace of an application run for certain patterns that indicate performance inefficiencies. In this paper, we present recently developed features in Scalasca. In particular, we describe two newly implemented analysis methods: root cause analysis, which tries to identify the cause of a delay, and critical path analysis, which analyses the path of execution that determines the application runtime. Furthermore, we present time-series profiling, a method for exploring the time-dependent behavior of an application. Finally, we extended the means of Scalasca and its output format CUBE to define and display topologies.


IEEE International Conference on High Performance Computing, Data, and Analytics | 2008

Usage of the SCALASCA Toolset for Scalable Performance Analysis of Large-Scale Parallel Applications

Felix Wolf; Brian J. N. Wylie; Erika Ábrahám; Daniel Becker; Wolfgang Frings; Karl Fürlinger; Markus Geimer; Marc André Hermanns; Bernd Mohr; Shirley Moore; Matthias Pfeifer; Zoltán Szebenyi


SPEC International Performance Evaluation Workshop | 2008

SCALASCA Parallel Performance Analyses of SPEC MPI2007 Applications

Zoltán Szebenyi; Brian J. N. Wylie; Felix Wolf


6th International Parallel Tools Workshop | 2013

Extending Scalasca's analysis features

Daniel Lorenz; David Böhme; Alexandre Strube; Bernd Mohr; Zoltán Szebenyi

Collaboration


Dive into Zoltán Szebenyi's collaboration.

Top Co-Authors

Felix Wolf

Technische Universität Darmstadt

Bernd Mohr

Forschungszentrum Jülich

Markus Geimer

Forschungszentrum Jülich

David Böhme

Lawrence Livermore National Laboratory

Daniel Lorenz

Forschungszentrum Jülich

Wolfgang Frings

Forschungszentrum Jülich
