
Publication


Featured research published by Joachim Protze.


IEEE International Conference on High Performance Computing, Data and Analytics | 2012

MPI runtime error detection with MUST: advances in deadlock detection

Tobias Hilbrich; Joachim Protze; Martin Schulz; Bronis R. de Supinski; Matthias S. Müller

The widely used Message Passing Interface (MPI) is complex and rich. As a result, application developers require automated tools to avoid and to detect MPI programming errors. We present the Marmot Umpire Scalable Tool (MUST) that detects such errors with significantly increased scalability. We present improvements to our graph-based deadlock detection approach for MPI, which cover future MPI extensions. Our enhancements also check complex MPI constructs that no previous graph-based detection approach handled correctly. Finally, we present optimizations for the processing of MPI operations that reduce runtime deadlock detection overheads. Existing approaches often require O(p) analysis time per MPI operation, for p processes. We empirically observe that our improvements lead to sub-linear or better analysis time per operation for a wide range of real world applications.
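The graph-based detection idea can be illustrated with a small sketch (not MUST's actual implementation, which is C/C++): blocked MPI processes and their dependencies form a wait-for graph, and under AND semantics (a blocked process needs all of its peers to make progress) any cycle implies deadlock.

```python
def find_cycle(wait_for):
    """Detect a cycle in a wait-for graph given as
    {process: [processes it waits for]}. With AND semantics, any
    cycle implies deadlock. Returns one cycle as a list, or None."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {p: WHITE for p in wait_for}
    stack = []

    def dfs(p):
        color[p] = GRAY
        stack.append(p)
        for q in wait_for.get(p, []):
            if color.get(q, WHITE) == GRAY:        # back edge: cycle found
                return stack[stack.index(q):] + [q]
            if color.get(q, WHITE) == WHITE:
                cyc = dfs(q)
                if cyc:
                    return cyc
        stack.pop()
        color[p] = BLACK
        return None

    for p in wait_for:
        if color[p] == WHITE:
            cyc = dfs(p)
            if cyc:
                return cyc
    return None

# Rank 0 blocks in MPI_Recv from rank 1 while rank 1 blocks in
# MPI_Recv from rank 0: the classic head-to-head receive deadlock.
deadlock = find_cycle({0: [1], 1: [0], 2: []})
```

The paper's contribution lies in keeping such analysis cheap per MPI operation; the sketch above only shows the underlying cycle criterion.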


IEEE International Conference on High Performance Computing, Data and Analytics | 2013

Distributed wait state tracking for runtime MPI deadlock detection

Tobias Hilbrich; Bronis R. de Supinski; Wolfgang E. Nagel; Joachim Protze; Christel Baier; Matthias S. Müller

The widely used Message Passing Interface (MPI) with its multitude of communication functions is prone to usage errors. Runtime error detection tools aid in the removal of these errors. We develop MUST as one such tool that provides a wide variety of automatic correctness checks. Its correctness checks can be run in a distributed mode, except for its deadlock detection. This limitation applies to a wide range of tools that either use centralized detection algorithms or a timeout approach. In order to provide scalable and distributed deadlock detection with detailed insight into deadlock situations, we propose a model for MPI blocking conditions that we use to formulate a distributed algorithm. This algorithm implements scalable MPI deadlock detection in MUST. Stress tests at up to 4,096 processes demonstrate the scalability of our approach. Finally, overhead results for a complex benchmark suite demonstrate an average runtime increase of 34% at 2,048 processes.
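A wait-state model can be sketched abstractly (a hypothetical encoding, not the paper's actual data structures): each blocked process carries a condition with AND semantics (e.g. MPI_Waitall) or OR semantics (e.g. MPI_Waitany) over the processes it depends on. Under the simplifying assumption that any unblocked process eventually satisfies its outgoing dependencies, repeatedly releasing processes whose condition is met leaves exactly the deadlocked set.

```python
def deadlocked(waits):
    """waits maps process -> ("AND" | "OR", set of processes it
    waits for); processes not in `waits` are not blocked. Returns
    the set of processes that can never be released."""
    blocked = dict(waits)
    changed = True
    while changed:
        changed = False
        for p, (mode, deps) in list(blocked.items()):
            live = deps - set(blocked)   # dependencies that are not blocked
            if (mode == "AND" and live == deps) or (mode == "OR" and live):
                del blocked[p]           # condition satisfied: p proceeds
                changed = True
    return set(blocked)

# An AND cycle deadlocks; an OR condition with one unblocked peer
# (process 2) escapes and then releases its waiter as well.
stuck = deadlocked({0: ("AND", {1}), 1: ("AND", {0})})
```

A distributed implementation must reach this fixed point without a central view, which is the hard part the paper addresses.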


International Parallel and Distributed Processing Symposium | 2016

ARCHER: Effectively Spotting Data Races in Large OpenMP Applications

Simone Atzeni; Ganesh Gopalakrishnan; Zvonimir Rakamarić; Dong H. Ahn; Ignacio Laguna; Martin Schulz; Gregory L. Lee; Joachim Protze; Matthias S. Müller

OpenMP plays a growing role as a portable programming model to harness on-node parallelism, yet existing data race checkers for OpenMP have high overheads and generate many false positives. In this paper, we propose the first OpenMP data race checker, ARCHER, that achieves high accuracy, low overheads on large applications, and portability. ARCHER incorporates scalable happens-before tracking, exploits structured parallelism via combined static and dynamic analysis, and modularly interfaces with OpenMP runtimes. ARCHER significantly outperforms TSan and Intel® Inspector XE, while providing the same or better precision. It has helped detect critical data races in the Hypre library that is central to many projects at Lawrence Livermore National Laboratory and elsewhere.
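The happens-before tracking at ARCHER's core can be sketched with vector clocks (an illustrative model, not ARCHER's TSan-based implementation): two accesses race exactly when neither is ordered before the other and at least one is a write.

```python
def happens_before(c1, c2):
    """True if vector clock c1 precedes c2 (pointwise <=, not equal)."""
    return all(a <= b for a, b in zip(c1, c2)) and c1 != c2

def is_race(acc1, acc2):
    """Each access is (vector_clock, is_write). A race requires at
    least one write and no happens-before order in either direction."""
    (c1, w1), (c2, w2) = acc1, acc2
    return (w1 or w2) and not happens_before(c1, c2) \
                      and not happens_before(c2, c1)

# Two concurrent writes from different OpenMP threads race; a read
# whose clock dominates the write (e.g. ordered by a barrier) does not.
concurrent_writes = is_race(((1, 0), True), ((0, 1), True))
```

Structured OpenMP parallelism lets a checker prune many such comparisons statically, which is where the combined static/dynamic approach gains its efficiency.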


International Parallel and Distributed Processing Symposium | 2012

Holistic Debugging of MPI Derived Datatypes

Joachim Protze; Tobias Hilbrich; Andreas Knüpfer; Bronis R. de Supinski; Matthias S. Müller

The Message Passing Interface (MPI) specifies an API that allows programmers to create efficient and scalable parallel applications. The standard defines multiple constraints for each function parameter. For performance reasons, no MPI implementation checks all of these constraints at runtime. Derived datatypes are an important concept of MPI and allow users to describe an application's data structures for efficient and convenient communication. Using existing infrastructure, we present scalable algorithms to detect usage errors of basic and derived MPI datatypes. We detect errors that include constraints for construction and usage of derived datatypes, matching their type signatures in communication, and detecting erroneous overlaps of communication buffers. We implement these checks in the MUST runtime error detection framework. We provide a novel representation of error locations to highlight usage errors. Further, approaches to buffer overlap checking can cause unacceptable overheads for non-contiguous datatypes. We present an algorithm that uses patterns in derived MPI datatypes to avoid these overheads without losing precision. Application results for the benchmark suites SPEC MPI2007 and NAS Parallel Benchmarks for up to 2048 cores show that our approach applies to a broad range of applications and that our extended overlap check improves performance by two orders of magnitude. Finally, we augment our runtime error detection component with a debugger extension to support in-depth analysis of the errors that we find as well as semantic errors. This extension to gdb provides information about MPI datatype handles and enables gdb -- and other debuggers based on gdb -- to display the content of a buffer as used in MPI communications.
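The overlap check can be sketched by modeling a derived datatype as its typemap, a list of (offset, length) byte blocks relative to a buffer base (an illustrative model; MUST itself is C/C++). A cheap extent test avoids the pairwise comparison in the common disjoint case, which is the spirit of the paper's pattern-based optimization.

```python
def buffers_overlap(base1, typemap1, base2, typemap2):
    """Check whether two communication buffers overlap. Each
    typemap is a list of (offset, length) blocks in bytes."""
    def extent(base, tm):
        return (base + min(o for o, _ in tm),
                base + max(o + n for o, n in tm))

    lo1, hi1 = extent(base1, typemap1)
    lo2, hi2 = extent(base2, typemap2)
    if hi1 <= lo2 or hi2 <= lo1:        # disjoint extents: cheap early exit
        return False
    for o1, n1 in typemap1:             # fall back to pairwise block test
        for o2, n2 in typemap2:
            a, b = base1 + o1, base1 + o1 + n1
            c, d = base2 + o2, base2 + o2 + n2
            if a < d and c < b:
                return True
    return False

# A strided vector type {(0,4), (8,4)} interleaves with {(4,4)}
# without touching it, but overlaps a block starting at offset 2.
hit = buffers_overlap(0, [(0, 4), (8, 4)], 0, [(2, 4)])
```

The naive pairwise loop is quadratic in the number of blocks; exploiting the repetition patterns of real MPI datatypes is what makes the full check affordable.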


International Workshop on OpenMP | 2014

Classification of Common Errors in OpenMP Applications

Jan Felix Münchhalfen; Tobias Hilbrich; Joachim Protze; Christian Terboven; Matthias S. Müller

With the increased core count in current HPC systems, node-level parallelization has become more important even on distributed memory systems. The evolution of HPC therefore requires programming models to be capable of not only reacting to errors, but also resolving them. We derive a classification of common OpenMP usage errors and evaluate them in terms of automatic detection by correctness-checking tools, the OpenMP runtime, and debuggers. After a short overview of the new features that were introduced in the OpenMP 4.0 standard, we discuss in more detail individual error cases that emerged due to the task construct of OpenMP 3.0 and the target construct of OpenMP 4.0. We further propose a default behavior to resolve the situation if the runtime is capable of handling the usage error. Beyond the specific error cases we discuss in this work, others can be readily integrated into our classification.


IEEE International Conference on High Performance Computing, Data and Analytics | 2014

Towards providing low-overhead data race detection for large OpenMP applications

Joachim Protze; Simone Atzeni; Dong H. Ahn; Martin Schulz; Ganesh Gopalakrishnan; Matthias S. Müller; Ignacio Laguna; Zvonimir Rakamarić; Gregory L. Lee

Neither static nor dynamic data race detection methods, by themselves, have proven to be sufficient for large HPC applications, as they often result in high runtime overheads and/or low race-checking accuracy. While combined static and dynamic approaches can fare better, creating such combinations, in practice, requires attention to many details. Specifically, existing state-of-the-art dynamic race detectors are aimed at low-level threading models, and cannot handle high-level models such as OpenMP. Further, they do not provide mechanisms by which static analysis methods can target selected regions of code with sufficient precision. In this paper, we present our solutions to both challenges. Specifically, we identify patterns within OpenMP runtimes that tend to mislead existing dynamic race checkers and provide mechanisms that help establish an explicit happens-before relation to prevent such misleading checks. We also implement a fine-grained blacklist mechanism to allow a runtime analyzer to exclude regions of code at line number granularity. We support race checking by adapting ThreadSanitizer, a mature data-race checker developed at Google that is now an integral part of Clang and GCC; and we have implemented our techniques within the state-of-the-art Intel OpenMP Runtime. Our results demonstrate that these techniques can significantly improve runtime analysis accuracy and overhead in the context of data race checking of OpenMP applications.
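The fine-grained blacklist idea can be sketched as follows (a hypothetical encoding for illustration; the paper's mechanism lives inside the runtime checker itself): entries map a file to a set of blacklisted line numbers, with an empty set suppressing the whole file, so that runtime-internal code can be excluded wholesale while application code is excluded only at specific lines.

```python
class Blacklist:
    """Suppress analysis at source-line granularity. An entry with
    an empty line set blacklists the entire file."""
    def __init__(self, entries):
        self.entries = {f: set(lines) for f, lines in entries.items()}

    def should_check(self, filename, line):
        lines = self.entries.get(filename)
        if lines is None:
            return True                  # file not blacklisted at all
        return bool(lines) and line not in lines

# Exclude the whole (hypothetical) runtime source file, but only
# one known-benign line of the application.
bl = Blacklist({"omp_runtime.c": set(), "app.c": {42}})
```

Excluding runtime internals this way is what prevents the misleading reports that low-level checkers otherwise produce on OpenMP programs.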


International Conference on Parallel Processing | 2013

Intralayer Communication for Tree-Based Overlay Networks

Tobias Hilbrich; Joachim Protze; Bronis R. de Supinski; Martin Schulz; Matthias S. Müller; Wolfgang E. Nagel

While various HPC tools use Tree-Based Overlay Networks (TBONs) to increase their scalability, some use cases do not map well to a tree-based hierarchy. We provide the concept of intralayer communication to improve this situation, where nodes in a specific hierarchy layer may exchange messages directly with each other. This concept targets data preprocessing that allows tool developers to avoid load imbalances in higher hierarchy levels. We implement intralayer communication within the Generic Tools Infrastructure (GTI) that provides TBON services, as well as a high-level abstraction to ease the creation of scalable runtime tools. An extension of GTI's abstractions allows simple and efficient use of intralayer communication. We demonstrate this capability with a runtime message matching tool for MPI's point-to-point communication, which we evaluate in an application study with up to 16,384 processes. Low overheads for two benchmark suites show the applicability of our approach, while a stress test demonstrates close to constant overheads across scales. The stress test measurements demonstrate that intralayer communication reduces application slowdown by two orders of magnitude at 2,048 processes, compared to a previous TBON-based implementation.


International Workshop on OpenMP | 2017

OpenMP Tools Interface: Synchronization Information for Data Race Detection

Joachim Protze; Jonas Hahnfeld; Dong H. Ahn; Martin Schulz; Matthias S. Müller

When it comes to data race detection, complete information about synchronization, concurrency, and memory accesses is needed. This information might be gathered at various levels of abstraction. For the best accuracy, this information should be collected at the abstraction level of the parallel programming paradigm. With the latest preview of the OpenMP specification, a tools interface (OMPT) was added to OpenMP. In this paper we discuss whether the synchronization information provided by OMPT is sufficient to apply accurate data race analysis to OpenMP applications. We further present implementation details and results for our data race detection tool, Archer, which derives the synchronization information from OMPT.


Proceedings of the 23rd European MPI Users' Group Meeting | 2016

Runtime Correctness Analysis of MPI-3 Nonblocking Collectives

Tobias Hilbrich; Matthias Weber; Joachim Protze; Bronis R. de Supinski; Wolfgang E. Nagel

The Message Passing Interface (MPI) includes nonblocking collective operations that support additional overlap between computation and communication. These new operations enable complex data movement between large numbers of processes. However, their asynchronous behavior hides and complicates the detection of defects in their use. We highlight a lack of correctness tool support for these operations and extend the MUST runtime MPI correctness tool to alleviate this complexity. We introduce a classification to summarize the types of correctness analyses that are applicable to MPI's nonblocking collectives. We identify complex wait-for dependencies in deadlock situations and incorrect use of communication buffers as the most challenging types of usage errors. We devise, demonstrate, and evaluate the applicability of correctness analyses for these errors. A scalable analysis mechanism allows our runtime approach to scale with the application. Benchmark measurements highlight the scalability and applicability of our approach at up to 4,096 application processes and with low overhead.
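The buffer-use class of errors can be sketched with a simple tracker (an illustrative model, not MUST's implementation): when a nonblocking collective such as MPI_Ibcast starts, its buffer's byte range becomes pending; a write that touches a pending range before the matching MPI_Wait is a defect.

```python
class NonblockingBufferCheck:
    """Flag writes to buffers still in use by pending nonblocking
    collectives, modeled as byte ranges."""
    def __init__(self):
        self.pending = {}                # request id -> (lo, hi) byte range

    def start(self, req, lo, hi):        # e.g. at MPI_Ibcast
        self.pending[req] = (lo, hi)

    def complete(self, req):             # e.g. at MPI_Wait
        self.pending.pop(req, None)

    def write(self, lo, hi):
        """Return ids of pending operations whose buffer this write touches."""
        return [r for r, (a, b) in self.pending.items() if lo < b and a < hi]

# Writing into the broadcast buffer before waiting is flagged;
# after completion the same write is allowed.
chk = NonblockingBufferCheck()
chk.start("ibcast#1", 0, 64)
```

The asynchrony the abstract mentions is exactly why a runtime tool is needed here: the erroneous write and the collective's completion can be arbitrarily far apart in the source code.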


OpenSHMEM and Related Technologies: Experiences, Implementations, and Technologies. Second Workshop, OpenSHMEM 2015, Revised Selected Papers, Volume 9397 | 2015

Dynamic Analysis to Support Program Development with the Textually Aligned Property for OpenSHMEM Collectives

Andreas Knüpfer; Tobias Hilbrich; Joachim Protze; Joseph Schuchart

The development of correct high performance computing applications is challenged by software defects that result from parallel programming. We present an automatic tool that provides novel correctness capabilities for application developers of OpenSHMEM applications. These applications follow a Single Program Multiple Data (SPMD) model of parallel programming. A strict form of SPMD programming requires that certain types of operations are textually aligned, i.e., they need to be called from the same source code line in every process. This paper proposes and demonstrates run-time checks that assert such behavior for OpenSHMEM collective communication calls. The resulting tool helps to check program consistency in an automatic and scalable fashion. We introduce the types of checks that we cover and include strict checks that help application developers to detect deviations from expected program behavior. Further, we discuss how we can utilize a parallel tool infrastructure to achieve a scalable and maintainable implementation for these checks. Finally, we discuss an extension of our checks towards further types of OpenSHMEM operations.
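The textual-alignment property can be sketched directly (an illustrative model of the check, not the tool's implementation): record the (file, line) source location of each collective call per PE, then compare the k-th call site across all PEs.

```python
def check_alignment(call_sites):
    """call_sites[pe] is the ordered list of (file, line) locations
    from which PE `pe` issued collective calls. Returns a list of
    (call_index, distinct_locations) for misaligned calls."""
    errors = []
    for i, sites in enumerate(zip(*call_sites)):
        if len(set(sites)) > 1:          # not textually aligned
            errors.append((i, sorted(set(sites))))
    return errors

# PE 2 reaches its first collective from a different source line
# than PEs 0 and 1: a violation of strict SPMD style.
skew = check_alignment([[("a.c", 10)], [("a.c", 10)], [("b.c", 7)]])
```

In a real tool the comparison would run scalably over a tool network rather than gathering all call sites centrally, which is the implementation concern the paper discusses.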

Collaboration


Dive into Joachim Protze's collaborations.

Top Co-Authors

Matthias S. Müller (Dresden University of Technology)
Tobias Hilbrich (Dresden University of Technology)
Martin Schulz (Lawrence Livermore National Laboratory)
Bronis R. de Supinski (Lawrence Livermore National Laboratory)
Wolfgang E. Nagel (Dresden University of Technology)
Dong H. Ahn (Lawrence Livermore National Laboratory)
Ignacio Laguna (Lawrence Livermore National Laboratory)
Andreas Knüpfer (Dresden University of Technology)