Matthias S. Müller | Researchain

Publication

Featured research published by Matthias S. Müller.

international conference on parallel architectures and compilation techniques | 2009

Memory Performance and Cache Coherency Effects on an Intel Nehalem Multiprocessor System

Daniel Molka; Daniel Hackenberg; Robert Schöne; Matthias S. Müller

Today's microprocessors have complex memory subsystems with several cache levels. The efficient use of this memory hierarchy is crucial to gain optimal performance, especially on multicore processors. Unfortunately, many implementation details of these processors are not publicly available. In this paper we present such fundamental details of the newly introduced Intel Nehalem microarchitecture with its integrated memory controller, Quick Path Interconnect, and ccNUMA architecture. Our analysis is based on sophisticated benchmarks to measure the latency and bandwidth between different locations in the memory subsystem. Special care is taken to control the coherency state of the data to gain insight into performance-relevant implementation details of the cache coherency protocol. Based on these benchmarks we present undocumented performance data and architectural properties.

ieee international conference on high performance computing data and analytics | 2012

MPI runtime error detection with MUST: advances in deadlock detection

Tobias Hilbrich; Joachim Protze; Martin Schulz; Bronis R. de Supinski; Matthias S. Müller

The widely used Message Passing Interface (MPI) is complex and rich. As a result, application developers require automated tools to avoid and to detect MPI programming errors. We present the Marmot Umpire Scalable Tool (MUST) that detects such errors with significantly increased scalability. We present improvements to our graph-based deadlock detection approach for MPI, which cover future MPI extensions. Our enhancements also check complex MPI constructs that no previous graph-based detection approach handled correctly. Finally, we present optimizations for the processing of MPI operations that reduce runtime deadlock detection overheads. Existing approaches often require O(p) analysis time per MPI operation, for p processes. We empirically observe that our improvements lead to sub-linear or better analysis time per operation for a wide range of real-world applications.

international conference on supercomputing | 2009

A graph based approach for MPI deadlock detection

Tobias Hilbrich; Bronis R. de Supinski; Martin Schulz; Matthias S. Müller

The MPI standard defines several usage patterns that can lead to deadlock, some of which involve collective communications or non-deterministic operations such as wildcard receives. Further, some MPI programming deadlocks only occur for some MPI implementations or certain configurations. Many tools to detect MPI deadlocks exist; however, none precisely handles the increased complexity of deadlock detection created by the richness of the MPI standard, which requires a general deadlock model. We present the first general deadlock model for MPI including a novel necessary and sufficient criterion, the OR-Knot, for deadlock in MPI programs. This model enables visualization of MPI deadlocks and motivates the design of a new deadlock detection mechanism. We compare our implementation of this mechanism to the ad-hoc mechanism previously available in Umpire, which reflected MPI non-determinism and, thus, more completely detected MPI deadlocks than any other existing MPI deadlock detection tool. Overall, our results demonstrate that our mechanism improves performance by as much as two orders of magnitude while providing precise characterization of deadlocks.
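The idea of a graph criterion for deadlock can be illustrated with a minimal sketch. It uses the classic knot criterion for a pure OR wait-for graph, where a blocked process can proceed as soon as any one of the processes it waits for makes progress, as with a wildcard receive. This is not the paper's OR-Knot, which covers the richer mix of wait conditions in MPI; the function names and the dictionary-based graph encoding are assumptions made for illustration.

```python
from collections import deque


def reachable(graph, start):
    """Return all nodes reachable from start by following one or more edges."""
    seen = set()
    queue = deque(graph.get(start, ()))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(graph.get(node, ()))
    return seen


def nodes_in_knot(graph):
    """Return the nodes that lie in a knot of a wait-for graph.

    graph maps each process to the processes it waits for (OR semantics).
    A node v is in a knot if it waits for someone and every node reachable
    from v can reach v again, so no path leads to a process that is free
    to make progress.
    """
    reach = {v: reachable(graph, v) for v in graph}
    return {
        v
        for v, targets in reach.items()
        if targets and all(v in reach.get(u, set()) for u in targets)
    }


# Processes 0 and 1 wait only for each other: both are deadlocked.
print(nodes_in_knot({0: [1], 1: [0]}))
# Process 0 may also be served by process 2, which is not blocked: no knot.
print(nodes_in_knot({0: [1, 2], 1: [0], 2: []}))
```

A plain cycle check would flag the second graph as well, which is why OR-style waits need a stronger criterion than cycle detection.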

international workshop on openmp | 2012

SPEC OMP2012 -- an application benchmark suite for parallel systems using OpenMP

Matthias S. Müller; John Baron; William C. Brantley; Huiyu Feng; Daniel Hackenberg; Robert Henschel; Gabriele Jost; Daniel Molka; Chris Parrott; Joe Robichaux; Pavel Shelepugin; G. Matthijs van Waveren; Brian Whitney; Kalyan Kumaran

This paper describes SPEC OMP2012, a benchmark developed by the SPEC High Performance Group. It consists of 15 OpenMP parallel applications from a wide range of fields. In addition to a performance metric based on the run time of the applications, the benchmark adds an optional energy metric. The accompanying run rules detail how the benchmarks are executed and the results reported. They also cover the energy measurements. The first set of results demonstrates scalability on three different platforms.

computational science and engineering | 2009

Tools for scalable parallel program analysis: Vampir NG, MARMOT, and DeWiz

Holger Brunst; Dieter Kranzlmüller; Matthias S. Müller; Wolfgang E. Nagel

Large-scale high-performance computing systems pose a tough obstacle for today's program analysis tools. Their demands in computational performance and memory capacity for processing program analysis data exceed the capabilities of standard workstations and traditional analysis tools. A comparison of the sophisticated approaches of Vampir NG (VNG), the Debugging Wizard DeWiz and the correctness-checking tool MARMOT provides novel ideas for scalable parallel program analysis. While VNG exploits the power of cluster architectures for near real-time performance analysis, DeWiz utilises distributed computing infrastructures for distinct analysis activities. MARMOT combines automatic runtime and partially distributed analysis.

international conference on green computing | 2010

Characterizing the energy consumption of data transfers and arithmetic operations on x86-64 processors

Daniel Molka; Daniel Hackenberg; Robert Schöne; Matthias S. Müller

The energy efficiency of computer systems is influenced by many interdependent aspects. To assess the efficiency, typical benchmarks characterize the total power consumption of a computer system under certain domain-specific workloads. For example, in the case of the SPECPower benchmark the workload is a typical web-server-specific Java application. The contribution of individual components is usually not considered in this class of benchmarks. The CPU makes the most significant contribution due to both its high peak power consumption and the high variability depending on the workload. Correlations of workload and energy consumption of parts of the processors are usually done with simulations rather than actual measurements. This is mainly a consequence of the limited time resolution of power meters that is usually orders of magnitude too low to observe variations in the time scale of microarchitectural events. Furthermore, it is usually not possible to solely measure power consumption of processors as they are supplied by multiple power lines that are not easily accessible and are often shared with other components. In this paper we present benchmarks and a measurement methodology that compensate for the time resolution of our power meter by applying a constant and well-defined workload to the system. Using this experimental setup we analyze x86-64 microarchitectures from AMD and Intel. We furthermore characterize the contribution of individual operations and data transfers to the total power consumption of the Intel system.
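A rough sketch of why a constant workload helps: if the same operation runs in a steady loop, a slow power meter still yields a meaningful average, and the energy per operation follows from simple arithmetic. The function below, its name, its parameters, and the subtraction of an idle baseline are all assumptions made for illustration; the abstract does not specify the exact calculation the paper uses.

```python
def energy_per_operation(power_samples_w, idle_power_w, duration_s, operations):
    """Estimate the energy attributable to one operation, in joules.

    power_samples_w: power readings (watts) taken while the constant
        workload runs; a low sampling rate is acceptable because the
        load does not change over time.
    idle_power_w: baseline power of the idle system (assumed known).
    duration_s: how long the workload ran, in seconds.
    operations: how many operations were executed in that time.
    """
    if not power_samples_w:
        raise ValueError("at least one power sample is required")
    if operations <= 0:
        raise ValueError("operations must be positive")
    mean_power_w = sum(power_samples_w) / len(power_samples_w)
    # Energy above idle = (average power - idle power) * elapsed time.
    extra_energy_j = (mean_power_w - idle_power_w) * duration_s
    return extra_energy_j / operations


# Hypothetical numbers: 150 W under load, 100 W idle, 10 s, 1e9 operations.
joules = energy_per_operation([150.0, 150.0, 150.0], 100.0, 10.0, 1_000_000_000)
print(f"{joules:.1e} J per operation")
```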

International Journal of Digital Earth | 2014

Scientific geodata infrastructures: challenges, approaches and directions

Lars Bernard; Stephan Mäs; Matthias S. Müller; Christin Henzen; Johannes Brauner

Based on various experiences in developing Geodata Infrastructures (GDIs) for scientific applications, this article proposes the concept of a Scientific GDI that can be used by scientists in environmental and earth sciences to share and disseminate their research results and related analysis methods. Scientific GDI is understood as an approach to tackle the science case in Digital Earth and to further enhance e-science for environmental research. Creating Scientific GDI to support the research community in efficiently exchanging data and methods related to the various scientific disciplines forming the basis of environmental studies poses numerous challenges for today's GDI developments. The paper summarizes requirements and recommendations on the publication of scientific geospatial data and on functionalities to be provided in Scientific GDI. Best practices and open issues for governance and policies of a Scientific GDI are discussed, and a research agenda for the next decade is derived from them.

Parallel Tools Workshop | 2010

MUST: A Scalable Approach to Runtime Error Detection in MPI Programs

Tobias Hilbrich; Martin Schulz; Bronis R. de Supinski; Matthias S. Müller

The Message-Passing Interface (MPI) is large and complex. Therefore, programming MPI is error-prone. Several MPI runtime correctness tools address classes of usage errors, such as deadlocks or non-portable constructs. To our knowledge, none of these tools scales to more than about 100 processes. However, some of the current HPC systems use more than 100,000 cores and future systems are expected to use far more. Since errors often depend on the task count used, we need correctness tools that scale to the full system size. We present a novel framework for scalable MPI correctness tools to address this need. Our fine-grained, module-based approach supports rapid prototyping and allows correctness tools built upon it to adapt to different architectures and use cases. The design uses PnMPI to instantiate a tool from a set of individual modules. We present an overview of our design, along with first performance results for a proof-of-concept implementation.

Computer Science - Research and Development | 2010

Quantifying power consumption variations of HPC systems using SPEC MPI benchmarks

Daniel Hackenberg; Robert Schöne; Daniel Molka; Matthias S. Müller; Andreas Knüpfer

The power consumption of an HPC system is not only a major concern due to the huge associated operational cost. It also poses high demands on the infrastructure required to operate such a system. The power consumption strongly depends on the executed workload and is influenced by the system hardware and software and its setup. In this paper we analyze the power consumption of a 32-node cluster across a wide range of parallel applications using the SPEC MPI2007 benchmark. By measuring the variations of the power consumed by different hardware nodes and processes of an application, we lay the groundwork to extrapolate the energy demand of large parallel HPC systems.

european pvm mpi users group meeting on recent advances in parallel virtual machine and message passing interface | 2008

Internal Timer Synchronization for Parallel Event Tracing

Jens Doleschal; Andreas Knüpfer; Matthias S. Müller; Wolfgang E. Nagel

Performance analysis and optimization is an important part of the development cycle of HPC applications. Among other prerequisites, it relies on highly precise timers, which are commonly not sufficiently synchronized, especially on distributed systems. Therefore, this paper presents a novel timer synchronization scheme especially adapted to parallel event tracing. It consists of two parts: first, recording synchronization information during run-time and, second, subsequent correction, i.e. transformation of asynchronous local time stamps to synchronous global time stamps.
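The two-part structure (record synchronization points, then correct time stamps afterwards) can be sketched with linear interpolation between two synchronization points. This is one simple correction model assumed here for illustration, not necessarily the scheme the paper proposes; the function name and the (local, global) pair layout are hypothetical.

```python
def make_corrector(sync_begin, sync_end):
    """Build a function that maps local time stamps to global time.

    sync_begin, sync_end: (local_time, global_time) pairs recorded during
        the run, for example at the start and end of tracing.
    The local clock is assumed to drift linearly between the two points.
    """
    local0, global0 = sync_begin
    local1, global1 = sync_end
    if local1 == local0:
        raise ValueError("synchronization points must differ in local time")
    # Ratio of global time elapsed to local time elapsed (clock drift).
    drift = (global1 - global0) / (local1 - local0)

    def to_global(local_ts):
        return global0 + (local_ts - local0) * drift

    return to_global


# Post-mortem correction of recorded local event time stamps.
correct = make_corrector((100.0, 0.0), (300.0, 100.0))
events_local = [100.0, 200.0, 300.0]
print([correct(ts) for ts in events_local])
```

Keeping the correction as a separate post-processing step means the running application only pays for recording the synchronization points.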
