Publication


Featured research published by Felix Wolf.


Parallel, Distributed and Network-Based Processing | 2003

Automatic performance analysis of hybrid MPI/OpenMP applications

Felix Wolf; Bernd Mohr

The EXPERT performance-analysis environment provides a complete tracing-based solution for automatic performance analysis of MPI, OpenMP, or hybrid applications running on parallel computers with SMP nodes. EXPERT describes performance problems using a high level of abstraction in terms of execution patterns that result from an inefficient use of the underlying programming model(s). The set of supported problems can be extended to meet application-specific needs. The analysis is carried out along three interconnected dimensions: class of performance behavior, call-tree position, and thread of execution. Each dimension is arranged in a hierarchy, so that the user can investigate the behavior on varying levels of detail. All three dimensions are interactively accessible using a single integrated view.


Parallel Tools Workshop | 2012

Score-P: A Joint Performance Measurement Run-Time Infrastructure for Periscope, Scalasca, TAU, and Vampir

Andreas Knüpfer; Christian Rössel; Dieter an Mey; Scott Biersdorff; Kai Diethelm; Dominic Eschweiler; Markus Geimer; Michael Gerndt; Daniel Lorenz; Allen D. Malony; Wolfgang E. Nagel; Yury Oleynik; Peter Philippen; Pavel Saviankou; Dirk Schmidl; Sameer Shende; Ronny Tschüter; Michael Wagner; Bert Wesarg; Felix Wolf

This paper gives an overview of the Score-P performance measurement infrastructure, which is being jointly developed by leading HPC performance tools groups. It motivates the advantages of the joint undertaking from both the developer and the user perspectives, and presents the design and components of the newly developed Score-P performance measurement infrastructure. Furthermore, it contains first evaluation results in comparison with existing performance tools and presents an outlook on the long-term cooperative development of the new system.


The Journal of Supercomputing | 2002

Design and Prototype of a Performance Tool Interface for OpenMP

Bernd Mohr; Allen D. Malony; Sameer Shende; Felix Wolf

This paper proposes a performance tools interface for OpenMP, similar in spirit to the MPI profiling interface in its intent to define a clear and portable API that makes OpenMP execution events visible to runtime performance tools. We present our design using a source-level instrumentation approach based on OpenMP directive rewriting. Rules to instrument each directive and their combination are applied to generate calls to the interface consistent with directive semantics and to pass context information (e.g., source code locations) in a portable and efficient way. Our proposed OpenMP performance API further allows user functions and arbitrary code regions to be marked and performance measurement to be controlled using new OpenMP directives. To prototype the proposed OpenMP performance interface, we have developed compatible performance libraries for the Expert automatic event trace analyzer [17, 18] and the TAU performance analysis framework [13]. The directive instrumentation transformations we define are implemented in a source-to-source translation tool called OPARI. Application examples are presented for both Expert and TAU to show the OpenMP performance interface and OPARI instrumentation tool in operation. When used together with the MPI profiling interface (as the examples also demonstrate), our proposed approach provides a portable and robust solution to performance analysis of OpenMP and mixed-mode (OpenMP+MPI) applications.


European Conference on Parallel Processing | 2003

KOJAK – A Tool Set for Automatic Performance Analysis of Parallel Programs

Bernd Mohr; Felix Wolf

Today’s parallel computers with SMP nodes provide both multithreading and message passing as their modes of parallel execution. As a consequence, performance analysis and optimization become more difficult and create a need for advanced performance tools that are custom-made for this class of computing environments. Current state-of-the-art tools provide valuable assistance in analyzing the performance of MPI and OpenMP programs by visualizing the run-time behavior and calculating statistics over the performance data. However, the developer of parallel programs is still required to filter out relevant parts from a huge amount of low-level information shown in numerous displays and map that information onto program abstractions without tool support.


IEEE International Conference on High Performance Computing Data and Analytics | 2009

Scalable massively parallel I/O to task-local files

Wolfgang Frings; Felix Wolf; Ventsislav Petkov

Parallel applications often store data in multiple task-local files, for example, to remember checkpoints, to circumvent memory limitations, or to record performance data. When operating at very large processor configurations, such applications often experience scalability limitations when the simultaneous creation of thousands of files causes metadata-server contention, when large file counts complicate file management, or when operations on those files even destabilize the file system. SIONlib is a parallel I/O library that addresses this problem by transparently mapping a large number of task-local files onto a small number of physical files via internal metadata handling and block alignment to ensure high performance. While requiring only minimal source code changes, SIONlib significantly reduces file creation overhead and simplifies file handling without penalizing read and write performance. We evaluate SIONlib's efficiency with up to 288 K tasks and report significant performance improvements in two application scenarios.


IEEE International Conference on High Performance Computing Data and Analytics | 2013

Using automated performance modeling to find scalability bugs in complex codes

Alexandru Calotoiu; Torsten Hoefler; Marius Poke; Felix Wolf

Many parallel applications suffer from latent performance limitations that may prevent them from scaling to larger machine sizes. Often, such scalability bugs manifest themselves only when an attempt to scale the code is actually being made, a point where remediation can be difficult. However, creating analytical performance models that would allow such issues to be pinpointed earlier is so laborious that application developers attempt it at most for a few selected kernels, running the risk of missing harmful bottlenecks. In this paper, we show how both coverage and speed of this scalability analysis can be substantially improved. Generating an empirical performance model automatically for each part of a parallel program, we can easily identify those parts that will reduce performance at larger core counts. Using a climate simulation as an example, we demonstrate that scalability bugs are not confined to those routines usually chosen as kernels.
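The idea of deriving an empirical scaling model from a handful of measurements can be illustrated with a minimal sketch. It is not the method from the paper: the hypothesis space (one term of the form p^a * log2(p)^b plus a constant), the candidate exponents, the function names, and the measurements are all assumptions chosen for illustration.

```python
import math

def fit_linear(xs, ys):
    """Least-squares fit of y = c0 + c1 * x; returns (c0, c1, residual sum of squares)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    c1 = sxy / sxx
    c0 = mean_y - c1 * mean_x
    rss = sum((c0 + c1 * x - y) ** 2 for x, y in zip(xs, ys))
    return c0, c1, rss

def best_model(procs, times, exponents=(0.5, 1.0, 1.5, 2.0), log_powers=(0, 1)):
    """Try each hypothesis t(p) = c0 + c1 * p^a * log2(p)^b and keep the best fit.

    The candidate sets are hypothetical defaults, not values from the paper.
    Returns (a, b, c0, c1) for the hypothesis with the smallest residual.
    """
    best = None
    for a in exponents:
        for b in log_powers:
            xs = [p ** a * math.log2(p) ** b for p in procs]
            c0, c1, rss = fit_linear(xs, times)
            if best is None or rss < best[0]:
                best = (rss, a, b, c0, c1)
    return best[1:]

# Made-up timings of one routine that follow 3 + 0.5 * p * log2(p).
procs = [2, 4, 8, 16, 32]
times = [3 + 0.5 * p * math.log2(p) for p in procs]
a, b, c0, c1 = best_model(procs, times)
```

Running such a fit for every routine in a program, rather than for a few hand-picked kernels, is what would let a routine with an unexpectedly large exponent stand out as a scalability-bug candidate.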


IEEE International Conference on High Performance Computing Data and Analytics | 1999

EARL - A Programmable and Extensible Toolkit for Analyzing Event Traces of Message Passing Programs

Felix Wolf; Bernd Mohr

This paper describes a new meta-tool named EARL, which consists of a new high-level trace analysis language and its interpreter and makes it easy to construct new trace analysis tools. Because of its programmability and flexibility, EARL can be used for a wide range of event trace analysis tasks. It is especially well suited for automatic and for application- or domain-specific trace analysis and program validation. We describe the abstract view of an event trace that the EARL interpreter provides to the user and give an overview of the EARL language. Finally, a set of EARL script examples is used to demonstrate the features of EARL.


Parallel Computing | 2009

A scalable tool architecture for diagnosing wait states in massively parallel applications

Markus Geimer; Felix Wolf; Brian J. N. Wylie; Bernd Mohr

When scaling message-passing applications to thousands of processors, their performance is often affected by wait states that occur when processes fail to reach synchronization points simultaneously. As a first step in reducing the performance impact, we have shown in our earlier work that wait states can be diagnosed by searching event traces for characteristic patterns. However, our initial sequential search method did not scale beyond several hundred processes. Here, we present a scalable approach, based on a parallel replay of the target application's communication behavior, that can efficiently identify wait states at the previously inaccessible scale of 65,536 processes and that has potential for even larger configurations. We explain how our new approach has been integrated into a comprehensive parallel tool architecture, which we use to demonstrate that wait states may consume a major fraction of the execution time at larger scales.
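To make the notion of a wait state concrete, the following minimal sketch searches a toy event trace for one such pattern, commonly called Late Sender: a receiver enters its receive call before the matching send has started and sits idle in between. This is a simple sequential search, not the parallel replay described in the abstract, and the `Event` record, its fields, and the sample trace are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    rank: int     # process that issued the call
    kind: str     # "send" or "recv"
    peer: int     # destination (for send) or source (for recv)
    tag: int      # message tag used for matching
    enter: float  # time at which the process entered the call

def late_sender_waits(events):
    """Return {receiver rank: total time spent waiting for late senders}."""
    # Queue the enter times of sends per (sender, receiver, tag), in time order.
    sends = {}
    for e in events:
        if e.kind == "send":
            sends.setdefault((e.rank, e.peer, e.tag), []).append(e.enter)
    for queue in sends.values():
        queue.sort()

    waits = {}
    receives = sorted((e for e in events if e.kind == "recv"), key=lambda e: e.enter)
    for e in receives:
        queue = sends.get((e.peer, e.rank, e.tag), [])
        if not queue:
            continue  # no matching send recorded in the trace
        send_enter = queue.pop(0)
        # The receiver waits only if it arrived before the sender did.
        waits[e.rank] = waits.get(e.rank, 0.0) + max(0.0, send_enter - e.enter)
    return waits

trace = [
    Event(rank=0, kind="send", peer=1, tag=0, enter=5.0),
    Event(rank=1, kind="recv", peer=0, tag=0, enter=2.0),  # waits 3.0 for rank 0
    Event(rank=1, kind="send", peer=0, tag=1, enter=6.0),
    Event(rank=0, kind="recv", peer=1, tag=1, enter=7.0),  # sender was on time
]
waits = late_sender_waits(trace)
```

A sequential search like this one has to hold the whole trace in one place, which is the scaling limit the abstract attributes to the earlier approach.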


European Conference on Parallel Processing | 2000

Automatic Performance Analysis of MPI Applications Based on Event Traces

Felix Wolf; Bernd Mohr

This article presents a class library for detecting typical performance problems in event traces of MPI applications. The library is implemented using the powerful high-level trace analysis language EARL and is embedded in the extensible tool component EXPERT described in this paper. One essential feature of EXPERT is a flexible plug-in mechanism which allows the user to easily integrate performance problem descriptions specific to a distinct parallel application without modifying the tool component.


Archive | 2011

Score-P: A Unified Performance Measurement System for Petascale Applications

Dieter an Mey; Scott Biersdorff; Christian H. Bischof; Kai Diethelm; Dominic Eschweiler; Michael Gerndt; Andreas Knüpfer; Daniel Lorenz; Allen D. Malony; Wolfgang E. Nagel; Yury Oleynik; Christian Rössel; Pavel Saviankou; Dirk Schmidl; Sameer Shende; Michael Wagner; Bert Wesarg; Felix Wolf

The rapidly growing number of cores on modern supercomputers imposes scalability demands not only on applications but also on the software tools needed for their development. At the same time, increasing application and system complexity makes the optimization of parallel codes more difficult, creating a need for scalable performance-analysis technology with advanced functionality. However, delivering such an expensive technology can hardly be accomplished by single tool developers and requires higher degrees of collaboration within the HPC community. The unified performance-measurement system Score-P is a joint effort of several academic performance-tool builders, funded under the BMBF program HPC-Software für skalierbare Parallelrechner in the SILC project (Skalierbare Infrastruktur zur automatischen Leistungsanalyse paralleler Codes). It is being developed with the objective of creating a common basis for several complementary optimization tools in the service of enhanced scalability, improved interoperability, and reduced maintenance cost.

Collaboration


Dive into Felix Wolf's collaborations.

Top Co-Authors

Bernd Mohr

Forschungszentrum Jülich

Markus Geimer

Forschungszentrum Jülich

Morris Riedel

Forschungszentrum Jülich

Achim Streit

Karlsruhe Institute of Technology

Alexandru Calotoiu

Technische Universität Darmstadt

Ali Jannesari

University of California
