Publications


Featured research published by Tobias Hilbrich.


IEEE International Conference on High Performance Computing, Data and Analytics | 2012

MPI runtime error detection with MUST: advances in deadlock detection

Tobias Hilbrich; Joachim Protze; Martin Schulz; Bronis R. de Supinski; Matthias S. Müller

The widely used Message Passing Interface (MPI) is complex and rich. As a result, application developers require automated tools to avoid and to detect MPI programming errors. We present the Marmot Umpire Scalable Tool (MUST), which detects such errors with significantly increased scalability. We present improvements to our graph-based deadlock detection approach for MPI, which cover future MPI extensions. Our enhancements also check complex MPI constructs that no previous graph-based detection approach handled correctly. Finally, we present optimizations for the processing of MPI operations that reduce runtime deadlock detection overheads. Existing approaches often require O(p) analysis time per MPI operation for p processes. We empirically observe that our improvements lead to sub-linear or better analysis time per operation for a wide range of real-world applications.
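
As a concrete illustration (ours, not the paper's), the program below shows the classic error class that such deadlock detection targets: both ranks issue a blocking MPI_Send before any receive, which deadlocks whenever the message exceeds the implementation's eager buffering.

    /* deadlock.c: head-to-head blocking sends; run with exactly 2
     * processes. A runtime checker in the spirit of MUST reports this as
     * a (potential) deadlock; it manifests once the message is too large
     * for the MPI implementation to buffer eagerly. */
    #include <mpi.h>

    int main(int argc, char **argv) {
        static int buf[1 << 20];      /* 4 MiB: usually defeats eager buffering */
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        int peer = 1 - rank;          /* assumes exactly 2 processes */
        MPI_Send(buf, 1 << 20, MPI_INT, peer, 0, MPI_COMM_WORLD);  /* blocks */
        MPI_Recv(buf, 1 << 20, MPI_INT, peer, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);  /* never reached */
        MPI_Finalize();
        return 0;
    }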


International Conference on Supercomputing | 2009

A graph based approach for MPI deadlock detection

Tobias Hilbrich; Bronis R. de Supinski; Martin Schulz; Matthias S. Müller

The MPI standard defines several usage patterns that can lead to deadlock, some of which involve collective communications or non-deterministic operations such as wildcard receives. Further, some MPI programming deadlocks occur only for some MPI implementations or certain configurations. Many tools to detect MPI deadlocks exist; however, none precisely handles the increased complexity of deadlock detection created by the richness of the MPI standard, which requires a general deadlock model. We present the first general deadlock model for MPI, including a novel necessary and sufficient criterion, the OR-Knot, for deadlock in MPI programs. This model enables visualization of MPI deadlocks and motivates the design of a new deadlock detection mechanism. We compare our implementation of this mechanism to the ad hoc mechanism previously available in Umpire, which reflected MPI non-determinism and, thus, detected MPI deadlocks more completely than any other existing MPI deadlock detection tool. Overall, our results demonstrate that our mechanism improves performance by as much as two orders of magnitude while providing precise characterization of deadlocks.
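
A minimal sketch (ours) of the implementation- and schedule-dependent deadlocks this model captures: whether the program below completes depends on which sender the wildcard receive happens to match, exactly the non-determinism a timeout-free, model-based detector must reason about.

    /* wildcard.c: run with exactly 3 processes. If the MPI_ANY_SOURCE
     * receive matches rank 1, the program completes; if it matches
     * rank 2, the second receive waits forever for a message from
     * rank 2 that was already consumed. */
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, x = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) {
            MPI_Recv(&x, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);   /* matches rank 1 or rank 2 */
            MPI_Recv(&x, 1, MPI_INT, 2, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);   /* deadlocks for one match order */
        } else {
            MPI_Send(&rank, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }
        MPI_Finalize();
        return 0;
    }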


Parallel Tools Workshop | 2010

MUST: A Scalable Approach to Runtime Error Detection in MPI Programs

Tobias Hilbrich; Martin Schulz; Bronis R. de Supinski; Matthias S. Müller

The Message-Passing Interface (MPI) is large and complex. Therefore, programming MPI is error-prone. Several MPI runtime correctness tools address classes of usage errors, such as deadlocks or non-portable constructs. To our knowledge, none of these tools scales to more than about 100 processes. However, some current HPC systems use more than 100,000 cores, and future systems are expected to use far more. Since errors often depend on the task count used, we need correctness tools that scale to the full system size. We present a novel framework for scalable MPI correctness tools to address this need. Our fine-grained, module-based approach supports rapid prototyping and allows correctness tools built upon it to adapt to different architectures and use cases. The design uses PnMPI to instantiate a tool from a set of individual modules. We present an overview of our design, along with first performance results for a proof-of-concept implementation.
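
Tools in this space typically rest on the MPI profiling interface, which PnMPI virtualizes so that several independent modules can stack behind one application. A minimal sketch of one such module (our illustration, not MUST's actual code): the wrapper observes the call, then forwards it to the real implementation through the PMPI entry point.

    /* wrap_send.c: a single-module PMPI interposition layer. Compiled
     * into a library and linked (or stacked via PnMPI), it intercepts
     * every MPI_Send of the application. */
    #include <mpi.h>
    #include <stdio.h>

    int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Comm comm) {
        int rank;
        PMPI_Comm_rank(comm, &rank);  /* use PMPI_ to avoid re-interception */
        fprintf(stderr, "[tool] rank %d: MPI_Send(count=%d, dest=%d, tag=%d)\n",
                rank, count, dest, tag);
        return PMPI_Send(buf, count, datatype, dest, tag, comm);
    }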


International Parallel and Distributed Processing Symposium | 2012

GTI: A Generic Tools Infrastructure for Event-Based Tools in Parallel Systems

Tobias Hilbrich; Matthias S. Müller; Bronis R. de Supinski; Martin Schulz; Wolfgang E. Nagel

Runtime detection of semantic errors in MPI applications supports efficient and correct large-scale application development. However, current approaches scale to at most one thousand processes, and design limitations prevent increased scalability. The need for global knowledge for analyses such as type matching and deadlock detection presents a major challenge. We present a scalable tool infrastructure, the Generic Tool Infrastructure (GTI), that we use to implement MPI runtime error detection tools and that applies to other use cases. GTI supports simple offloading of tool processing onto extra processes or threads and provides a tree-based overlay network (TBON) for creating scalable tools that analyze global knowledge. We present its abstractions and code generation facilities, which ease many hurdles in tool development, including wrapper generation, tool communication, trace reductions, and filters. GTI ultimately allows tool developers to focus on implementing tool functionality instead of the surrounding infrastructure. Further, we demonstrate that GTI supports scalable tool development through a lost message detector and a phase profiler. The former provides a more scalable implementation of important base functionality for MPI correctness checking, while the latter demonstrates that GTI can serve as the basis for further types of tools. Experiments with up to 2,048 cores show that GTI's scalability features apply to both tools.
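
To make the TBON idea concrete, here is a hand-rolled sketch over an implicit binary tree of MPI ranks (an illustration of the concept, not the GTI API): every tool process aggregates its children's results before forwarding a single message to its parent, so the root sees a logarithmic communication depth instead of p incoming messages.

    /* tbon_reduce.c: tree-based aggregation of per-process event counts.
     * Children of rank r are ranks 2r+1 and 2r+2; the parent is (r-1)/2. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size;
        long sum, child_sum;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        sum = rank + 1;               /* stand-in for local tool data */
        for (int c = 2 * rank + 1; c <= 2 * rank + 2; c++)
            if (c < size) {           /* aggregate both children first */
                MPI_Recv(&child_sum, 1, MPI_LONG, c, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                sum += child_sum;
            }
        if (rank > 0)                 /* forward one value to the parent */
            MPI_Send(&sum, 1, MPI_LONG, (rank - 1) / 2, 0, MPI_COMM_WORLD);
        else
            printf("root: %ld events total\n", sum);
        MPI_Finalize();
        return 0;
    }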


IEEE International Conference on High Performance Computing, Data and Analytics | 2013

Distributed wait state tracking for runtime MPI deadlock detection

Tobias Hilbrich; Bronis R. de Supinski; Wolfgang E. Nagel; Joachim Protze; Christel Baier; Matthias S. Müller

The widely used Message Passing Interface (MPI) with its multitude of communication functions is prone to usage errors. Runtime error detection tools aid in the removal of these errors. We develop MUST as one such tool that provides a wide variety of automatic correctness checks. Its correctness checks can be run in a distributed mode, except for its deadlock detection. This limitation applies to a wide range of tools that either use centralized detection algorithms or a timeout approach. In order to provide scalable and distributed deadlock detection with detailed insight into deadlock situations, we propose a model for MPI blocking conditions that we use to formulate a distributed algorithm. This algorithm implements scalable MPI deadlock detection in MUST. Stress tests at up to 4,096 processes demonstrate the scalability of our approach. Finally, overhead results for a complex benchmark suite demonstrate an average runtime increase of 34% at 2,048 processes.
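
The modeling challenge is that MPI blocking conditions mix AND semantics (MPI_Waitall completes only when every request does) with OR semantics (MPI_Waitany or a wildcard receive needs any one match). A hedged sketch, with hypothetical names not taken from MUST, of how a wait state with either arc semantics could be represented and queried:

    /* wait_state.c: illustrative AND/OR wait-for records. A deadlock
     * detector searches for a set of processes none of which can
     * progress under these rules. */
    #include <stdbool.h>
    #include <stdio.h>

    typedef enum { ARC_AND, ARC_OR } arc_kind;

    typedef struct {
        arc_kind kind;        /* AND: all deps needed; OR: any one */
        int n_deps;
        const bool *completed;
    } wait_node;

    static bool can_progress(const wait_node *w) {
        int done = 0;
        for (int i = 0; i < w->n_deps; i++) done += w->completed[i];
        return w->kind == ARC_AND ? done == w->n_deps : done > 0;
    }

    int main(void) {
        bool flags[3] = { true, false, true };
        wait_node waitall = { ARC_AND, 3, flags };
        wait_node waitany = { ARC_OR, 3, flags };
        printf("waitall: %d, waitany: %d\n",
               can_progress(&waitall), can_progress(&waitany));
        return 0;
    }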


Proceedings of the 20th European MPI Users' Group Meeting (EuroMPI) | 2013

Runtime MPI collective checking with tree-based overlay networks

Tobias Hilbrich; Bronis R. de Supinski; Fabian Hänsel; Matthias S. Müller; Martin Schulz; Wolfgang E. Nagel

Runtime error detection tools detect many classes of MPI usage errors, including errors in collective communication calls. However, they often face scalability challenges. We present runtime checks for MPI collective operations that use a Tree-Based Overlay Network (TBON) for scalability and that provide full datatype matching. While we can use transitive correctness properties for most checks, some collective operations impose non-transitive correctness properties, e.g., MPI_Alltoallv, for which we use intralayer communication within the TBON to distribute datatype matching information. An overhead study with stress tests and two benchmark suites demonstrates applicability and scalability at 4,096, 2,048, and 16,384 processes, respectively.
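
A property is transitive when pairwise agreement along the tree implies global agreement; "every process passes the same root" is the standard example. As an illustration of why such checks are cheap (ours, not the paper's TBON implementation), the whole comparison collapses into a min/max reduction:

    /* root_check.c: if the minimum and maximum of the root arguments
     * differ, some process disagrees; transitivity is what lets
     * neighbor-wise comparisons (or a reduction) decide the global
     * property. Triggers with 4 or more processes. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, root, lo, hi;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        root = (rank == 3) ? 1 : 0;   /* injected error: rank 3 disagrees */
        MPI_Allreduce(&root, &lo, 1, MPI_INT, MPI_MIN, MPI_COMM_WORLD);
        MPI_Allreduce(&root, &hi, 1, MPI_INT, MPI_MAX, MPI_COMM_WORLD);
        if (rank == 0 && lo != hi)
            fprintf(stderr, "collective error: inconsistent root arguments\n");
        MPI_Finalize();
        return 0;
    }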


International Parallel and Distributed Processing Symposium | 2012

Holistic Debugging of MPI Derived Datatypes

Joachim Protze; Tobias Hilbrich; Andreas Knüpfer; Bronis R. de Supinski; Matthias S. Müller

The Message Passing Interface (MPI) specifies an API that allows programmers to create efficient and scalable parallel applications. The standard defines multiple constraints for each function parameter. For performance reasons, no MPI implementation checks all of these constraints at runtime. Derived datatypes are an important concept of MPI and allow users to describe an application's data structures for efficient and convenient communication. Using existing infrastructure, we present scalable algorithms to detect usage errors of basic and derived MPI datatypes. We detect errors that involve constraints on the construction and usage of derived datatypes, matching their type signatures in communication, and detecting erroneous overlaps of communication buffers. We implement these checks in the MUST runtime error detection framework. We provide a novel representation of error locations to highlight usage errors. Further, approaches to buffer overlap checking can cause unacceptable overheads for non-contiguous datatypes. We present an algorithm that uses patterns in derived MPI datatypes to avoid these overheads without losing precision. Application results for the benchmark suites SPEC MPI2007 and NAS Parallel Benchmarks for up to 2,048 cores show that our approach applies to a broad range of applications and that our extended overlap check improves performance by two orders of magnitude. Finally, we augment our runtime error detection component with a debugger extension to support in-depth analysis of the errors that we find, as well as of semantic errors. This extension to gdb provides information about MPI datatype handles and enables gdb, and other debuggers based on gdb, to display the content of a buffer as used in MPI communications.
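
A minimal illustration (ours) of the signature-matching error class these checks target: the sender transmits a strided vector of ints while the receiver posts floats, so the flattened type signatures disagree even though the element counts line up.

    /* type_mismatch.c: run with exactly 2 processes. The send type
     * flattens to (INT, INT, INT, INT), the receive to (FLOAT, FLOAT,
     * FLOAT, FLOAT); a datatype-matching check flags the pair. */
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank;
        int ibuf[8] = {0};
        float fbuf[4];
        MPI_Datatype vec;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Type_vector(4, 1, 2, MPI_INT, &vec); /* 4 ints, stride 2 */
        MPI_Type_commit(&vec);
        if (rank == 0)
            MPI_Send(ibuf, 1, vec, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(fbuf, 4, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);         /* signature mismatch */
        MPI_Type_free(&vec);
        MPI_Finalize();
        return 0;
    }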


International Workshop on OpenMP | 2014

Classification of Common Errors in OpenMP Applications

Jan Felix Münchhalfen; Tobias Hilbrich; Joachim Protze; Christian Terboven; Matthias S. Müller

With the increased core count of current HPC systems, node-level parallelization has become more important, even on distributed memory systems. The evolution of HPC therefore requires programming models to be capable of not only reacting to errors, but also resolving them. We derive a classification of common OpenMP usage errors and evaluate them in terms of automatic detection by correctness-checking tools, the OpenMP runtime, and debuggers. After a short overview of the new features introduced in the OpenMP 4.0 standard, we discuss in more detail individual error cases that emerged with the task construct of OpenMP 3.0 and the target construct of OpenMP 4.0. We further propose a default behavior to resolve the situation when the runtime is capable of handling the usage error. Beyond the specific error cases that we discuss in this work, further cases can be integrated directly into our classification.
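
One of the most common classes such a classification covers is the data race. The sketch below (our illustration, not from the paper) races on a shared accumulator; the standard fix is noted in a comment.

    /* race.c: unsynchronized read-modify-write of the shared variable
     * sum, a textbook OpenMP usage error. Compile with -fopenmp. */
    #include <stdio.h>

    int main(void) {
        long sum = 0;
        #pragma omp parallel for      /* fix: add reduction(+:sum) */
        for (long i = 0; i < 1000000; i++)
            sum += i;                 /* data race */
        printf("sum = %ld (result is likely wrong)\n", sum);
        return 0;
    }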


International Conference on Parallel Processing | 2013

Intralayer Communication for Tree-Based Overlay Networks

Tobias Hilbrich; Joachim Protze; Bronis R. de Supinski; Martin Schulz; Matthias S. Müller; Wolfgang E. Nagel

While various HPC tools use Tree-Based Overlay Networks (TBONs) to increase their scalability, some use cases do not map well to a tree-based hierarchy. To improve this situation, we introduce the concept of intralayer communication, in which nodes within a specific hierarchy layer may exchange messages directly with each other. This concept targets data preprocessing and allows tool developers to avoid load imbalances in higher hierarchy levels. We implement intralayer communication within the Generic Tools Infrastructure (GTI), which provides TBON services as well as a high-level abstraction that eases the creation of scalable runtime tools. An extension of GTI's abstractions allows simple and efficient use of intralayer communication. We demonstrate this capability with a runtime message matching tool for MPI's point-to-point communication, which we evaluate in an application study with up to 16,384 processes. Low overheads for two benchmark suites show the applicability of our approach, while a stress test demonstrates close-to-constant overheads across scales. The stress test measurements demonstrate that intralayer communication reduces application slowdown by two orders of magnitude at 2,048 processes, compared to a previous TBON-based implementation.
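
A hedged sketch of the intralayer idea (ours, not GTI's interface): before anything travels up the tree, the processes of one layer repartition their records by key among themselves, so matching work is balanced instead of piling up at one parent. Here a plain MPI_Alltoall over MPI_COMM_WORLD stands in for the layer's own communicator.

    /* intralayer.c: each layer process hands every record to the rank
     * that owns its key (owner of key k is rank k); after the exchange,
     * matching runs locally and only small aggregated results move up. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size;
        int send[64], recv[64];       /* assumes at most 64 layer processes */
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        for (int k = 0; k < size; k++)
            send[k] = rank * 100 + k; /* record destined for owner k */
        MPI_Alltoall(send, 1, MPI_INT, recv, 1, MPI_INT, MPI_COMM_WORLD);
        printf("rank %d now owns %d records\n", rank, size);
        MPI_Finalize();
        return 0;
    }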


EuroMPI'11: Proceedings of the 18th European MPI Users' Group Conference on Recent Advances in the Message Passing Interface | 2011

Order preserving event aggregation in TBONs

Tobias Hilbrich; Matthias S. Müller; Martin Schulz; Bronis R. de Supinski

Runtime tools for MPI applications must gather information from all processes at a tool front-end for presentation. Scalability requires that tools aggregate and reduce this information, so tool developers often use a Tree-Based Overlay Network (TBON). TBONs aggregate multiple associated events through a hierarchical communication structure. We present a novel algorithm that executes multiple aggregations while, at the same time, preserving relevant event orders. We implement this algorithm in our tool infrastructure, which provides TBON functionality as one of its services. We demonstrate that our approach provides scalability with experiments for up to 2,048 tasks.
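
The core difficulty, sketched below with an illustrative struct of our own (not the paper's data structures): a TBON node must interleave the already-ordered event streams of its children into one stream whose order is preserved, while still coalescing runs of aggregatable events.

    /* merge_events.c: merge two timestamp-ordered child streams and
     * aggregate consecutive events of the same type into one record. */
    #include <stdio.h>

    typedef struct { double t; int type; int count; } event;

    static int merge_aggregate(const event *a, int na,
                               const event *b, int nb, event *out) {
        int i = 0, j = 0, n = 0;
        while (i < na || j < nb) {
            /* take the earlier head; order across streams is preserved */
            event e = (j >= nb || (i < na && a[i].t <= b[j].t))
                          ? a[i++] : b[j++];
            if (n > 0 && out[n - 1].type == e.type)
                out[n - 1].count += e.count;  /* coalesce an aggregatable run */
            else
                out[n++] = e;
        }
        return n;
    }

    int main(void) {
        event a[] = { {1.0, 7, 1}, {3.0, 7, 1} };
        event b[] = { {2.0, 7, 1}, {4.0, 9, 1} };
        event out[4];
        int n = merge_aggregate(a, 2, b, 2, out);
        for (int k = 0; k < n; k++)
            printf("t=%.1f type=%d count=%d\n", out[k].t, out[k].type,
                   out[k].count);
        return 0;
    }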

Collaboration


Dive into Tobias Hilbrich's collaboration.

Top Co-Authors

Bronis R. de Supinski, Lawrence Livermore National Laboratory
Martin Schulz, Lawrence Livermore National Laboratory
Wolfgang E. Nagel, Dresden University of Technology
Holger Brunst, Dresden University of Technology
Matthias Weber, Dresden University of Technology
Andreas Knüpfer, Dresden University of Technology
Ronny Brendel, Dresden University of Technology