Jens Doleschal

Dresden University of Technology

Publication

Featured research published by Jens Doleschal.

Parallel Tools Workshop | 2008

The Vampir Performance Analysis Tool-Set

Andreas Knüpfer; Holger Brunst; Jens Doleschal; Matthias Jurenz; Matthias Lieber; Holger Mickler; Matthias S. Müller; Wolfgang E. Nagel

This paper presents the Vampir tool-set for performance analysis of parallel applications. It consists of the run-time measurement system VampirTrace and the visualization tools Vampir and VampirServer. The paper describes the major features and outlines the underlying implementation that is necessary to provide low overhead and good scalability. Furthermore, it gives a short overview of the development history, future work, and related work.

European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface | 2008

Internal Timer Synchronization for Parallel Event Tracing

Jens Doleschal; Andreas Knüpfer; Matthias S. Müller; Wolfgang E. Nagel

Performance analysis and optimization is an important part of the development cycle of HPC applications. Among other prerequisites, it relies on highly precise timers, which are commonly not sufficiently synchronized, especially on distributed systems. Therefore, this paper presents a novel timer synchronization scheme especially adapted to parallel event tracing. It consists of two parts: first, recording synchronization information during run-time and, second, subsequent correction, i.e. transformation of asynchronous local time stamps to synchronous global time stamps.
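The two-part idea in this abstract (record synchronization points at run-time, then correct time stamps afterwards) can be illustrated with a minimal sketch. The abstract does not describe the actual correction scheme, so the code below assumes the simplest possible case: two hypothetical synchronization points per process and a constant clock drift between them, corrected by linear interpolation. The function name and the sample values are illustrative only.

```python
def make_corrector(sync_begin, sync_end):
    """Build a local-to-global time stamp mapping for one process.

    sync_begin and sync_end are (local_time, global_time) pairs recorded
    at run-time. Assumption: the clock drifts at a constant rate between
    the two points, so linear interpolation is sufficient.
    """
    (local0, global0), (local1, global1) = sync_begin, sync_end
    if local1 == local0:
        raise ValueError("synchronization points must differ in local time")
    drift = (global1 - global0) / (local1 - local0)

    def to_global(local_time):
        # Shift by the initial offset and scale by the measured drift.
        return global0 + (local_time - local0) * drift

    return to_global


# Hypothetical sync points: local clock starts 10 units behind the global
# clock and runs slower, so 1000 local units span 1500 global units.
correct = make_corrector((0.0, 10.0), (1000.0, 1510.0))
events_local = [0.0, 500.0, 1000.0]
events_global = [correct(t) for t in events_local]
```

With more than two synchronization points, the same mapping could be applied piecewise between neighbouring points; whether the paper does so is not stated in the abstract.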

High Performance Computing Systems and Applications | 2014

Selective runtime monitoring: Non-intrusive elimination of high-frequency functions

Michael Wagner; Jens Doleschal; Andreas Knüpfer; Wolfgang E. Nagel

High performance computing (HPC) systems are getting more and more powerful but also more and more complex. Supportive environments such as performance analysis tools are essential to assist developers in utilizing the computing resources of such complex systems. One of the most urgent challenges in event-based performance analysis is the enormous amount of collected data. In particular, the recording of high-frequency, short-running functions such as getter/setter class methods produces enormous amounts of data while at the same time contributing very little to an analysis of the overall application behavior. In this paper we address the impact of high-frequency function calls and present a method to minimize the amount of stored data for heavily used functions while still keeping outliers that have an impact on the application's behavior. We propose a hierarchical memory buffer that is capable of discarding recorded function calls when their duration is smaller than a pre-defined lower bound. We demonstrate the capabilities of our method with a prototype implementation that is based on the Open Trace Format 2, a state-of-the-art open source event trace library used by the performance analysis tools VAMPIR, SCALASCA, and TAU.
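The core rule stated in this abstract, discarding a recorded call when its duration falls below a lower bound while keeping longer outliers, can be sketched as follows. This is not the hierarchical memory buffer of the paper or any part of the OTF2 API; it is a simplified flat buffer with a call stack, and the class name, method names, and threshold are assumptions made for illustration.

```python
class FilteringRecorder:
    """Record enter/leave events and drop calls shorter than min_duration."""

    def __init__(self, min_duration):
        self.min_duration = min_duration
        self.events = []   # kept (kind, name, timestamp) records
        self._open = []    # stack of (name, enter_time, index into events)

    def enter(self, name, timestamp):
        # Remember where this call starts in the buffer so it can be undone.
        self._open.append((name, timestamp, len(self.events)))
        self.events.append(("enter", name, timestamp))

    def leave(self, name, timestamp):
        open_name, enter_time, index = self._open.pop()
        if open_name != name:
            raise ValueError("leave does not match the innermost open call")
        if timestamp - enter_time < self.min_duration:
            # Too short: rewind the buffer to just before the enter event.
            # Nested calls are at most as long, so they were already dropped.
            del self.events[index:]
        else:
            self.events.append(("leave", name, timestamp))


recorder = FilteringRecorder(min_duration=5)
recorder.enter("main", 0)
recorder.enter("get_x", 1)
recorder.leave("get_x", 2)     # duration 1: discarded
recorder.enter("get_x", 10)
recorder.leave("get_x", 30)    # duration 20: kept as an outlier
recorder.leave("main", 40)
```

After this run, the buffer holds only the enter and leave events of `main` and of the long `get_x` call.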

Proceedings of the First Workshop on Visual Performance Analysis | 2014

Visualization of performance data for MPI applications using circular hierarchies

Felix Schmitt; Robert Dietrich; Rene Kuss; Jens Doleschal; Andreas Knüpfer

One of the challenges for developers of highly parallel MPI applications running on distributed high performance computing systems is to understand the complex behavior of their applications. This requires identifying inefficiencies and optimizing them such that communication waiting times are reduced. This task can only be accomplished with the help of elaborate tools that provide insight into the details of the application using an automatic analysis or an intuitive visualization approach. While the former can only target a specific problem domain, the latter allows humans to discuss performance problems with a broader view and from multiple perspectives. We present a new visualization technique for performance data of MPI applications based on circular hierarchies. It intuitively presents communication patterns and allows developers to correlate those with arbitrary performance metrics. A hierarchy-aware layout increases scalability and helps to identify communication inefficiencies by analyzing and integrating the system's hardware topology. We discuss both our approach and its integration into the Score-P performance analysis workflow. Its applicability is presented with a real-world use case of the COSMO+SPECS+FD4 climate simulation code.

Proceedings of the 20th European MPI Users' Group Meeting on | 2013

Runtime message uniquification for accurate communication analysis on incomplete MPI event traces

Michael Wagner; Jens Doleschal; Wolfgang E. Nagel; Andreas Knüpfer

Communication analysis of parallel applications based on event traces depends on correct matching of associated MPI send and receive events. Selective monitoring techniques, however, may result in incomplete MPI event traces and, in that case, current matching strategies fail. In this paper we introduce an additional unique identifier for each message to make MPI events distinguishable from one another. This makes it possible to identify missing MPI events and match all remaining MPI events correctly. An overhead study with a real-life application and a benchmark suite demonstrates the applicability and benefits of this approach.
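To illustrate why a per-message identifier helps on incomplete traces, the sketch below matches send and receive events by identifier and reports events whose partner is missing. The event layout (dictionaries with a `msg_id` key) and the function name are hypothetical; the abstract does not say how the identifier is generated or stored, and this is not the paper's implementation.

```python
def match_by_message_id(sends, recvs):
    """Pair send and receive events that carry the same unique message id.

    Returns (matched pairs, ids of sends without a receive,
    ids of receives without a send).
    """
    send_by_id = {event["msg_id"]: event for event in sends}
    recv_by_id = {event["msg_id"]: event for event in recvs}

    common = sorted(send_by_id.keys() & recv_by_id.keys())
    matched = [(send_by_id[i], recv_by_id[i]) for i in common]

    # An id on one side only means the partner event was not recorded.
    sends_without_recv = sorted(send_by_id.keys() - recv_by_id.keys())
    recvs_without_send = sorted(recv_by_id.keys() - send_by_id.keys())
    return matched, sends_without_recv, recvs_without_send


# Hypothetical incomplete trace: the receive of message 2 and the send of
# message 4 were filtered out during monitoring.
sends = [{"msg_id": 1, "rank": 0}, {"msg_id": 2, "rank": 0}, {"msg_id": 3, "rank": 0}]
recvs = [{"msg_id": 1, "rank": 1}, {"msg_id": 3, "rank": 1}, {"msg_id": 4, "rank": 1}]
matched, lost_recvs, lost_sends = match_by_message_id(sends, recvs)
```

Order-based matching would instead pair the second send with the second receive, that is, message 2 with message 3, which is the kind of mismatch the identifier avoids.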

Parallel Tools Workshop | 2013

Generic Support for Remote Memory Access Operations in Score-P and OTF2

Andreas Knüpfer; Robert Dietrich; Jens Doleschal; Markus Geimer; Marc-André Hermanns; Christian Rössel; Ronny Tschüter; Bert Wesarg; Felix Wolf

Remote memory access (RMA) describes the ability of a process to access all or parts of the memory belonging to a remote process directly, without explicit participation of the remote side. There are a number of parallel programming models based on RMA operations that are relevant for High Performance Computing (HPC). On the one hand, Partitioned Global Address Space (PGAS) language extensions, e.g. Co-Array Fortran and UPC, use RMA operations as the underlying communication substrate. On the other hand, RMA programming APIs provide so-called one-sided data transfer primitives as an alternative to the classic two-sided message passing. In this paper, we describe how Score-P, a scalable performance measurement infrastructure for parallel applications, is extended to support trace-based performance analyses of RMA parallelization models. Emphasis is given to the generic event model we designed to record RMA operations in the OTF2 trace format across a range of one-sided APIs and libraries.

Concurrency and Computation: Practice and Experience | 2017

Using adaptive runtime filtering to support an event-based performance analysis

Jonas Stolle; Michael Wagner; Jens Doleschal; Felix Schmitt; Holger Brunst

Event-based performance monitoring and analysis are effective means when tuning parallel applications for optimal resource usage. In this article, we address the data capacity challenge that arises when applying the tracing methodology to large-scale parallel applications and long execution times. Existing approaches use static, pre-defined event filters to reduce the performance data to a manageable size. In contrast, we propose self-guided filters that automatically adapt to an application's runtime behaviour and therefore do not require any previous knowledge or application executions. Our contribution consists of four adaptive runtime filters, each of which targets a specific type of data redundancy. The filters focus on detecting identical events in loop iterations, constant events with no variation in time, and very short, highly frequent, typically not very meaningful events that have a severe impact on the total data volume. We evaluate our prototype implementation with five real-world applications and achieve a data reduction of two orders of magnitude while increasing execution time by less than 1%. Likewise, we show that the qualitative impact of our filters on performance analysis in state-of-the-art analysis tools can be reduced by adding feedback methods and statistical information to the filtered traces.

International Conference on High Performance Computing and Simulation | 2015

Tracing long running applications: A case study using Gromacs

Michael Wagner; Jens Doleschal; Andreas Knüpfer

Performance analysis is indispensable for developing applications that utilize the enormous capabilities of current HPC systems. While many recent tool studies focused on large scales, performance analysis of long-running applications has not received much attention. This paper investigates challenges that arise from monitoring long-running real-life applications, in particular, the disruptive bias of intermediate memory buffer flushes in the measurement environment. We propose a concept for in-memory event tracing that completely avoids intermediate memory buffer flushes. We evaluate to what extent such an in-memory event tracing workflow helps to overcome the critical properties, such as resulting trace size, application slowdown, and measurement bias. We utilize a prototype implementation, based on Score-P and OTF2, with the molecular dynamics package Gromacs, an application currently infeasible to monitor in a full production run.

Computational Science and Engineering | 2015

Adaptive Runtime Filtering: Reducing Trace Size and Bias in Event-Based Performance Analysis

Jonas Stolle; Michael Wagner; Jens Doleschal; Felix Schmitt; Holger Brunst

In this paper we address the problem of massive event trace sizes, one of the most urgent challenges in the performance analysis of large-scale parallel applications. Reducing trace sizes during the application runtime decreases application slowdown, eliminates measurement bias, and cuts down stress on the underlying file system. Previous approaches use static filters to decrease trace size, which rely on prior knowledge about the application or, otherwise, deliver poor results. In contrast, we propose runtime filters that automatically adapt to an application's runtime behavior and, therefore, do not require any prior knowledge. We present and compare four adaptive runtime filters: for regions that are leaf nodes in the call tree, for regions with similar duration, for activities within iterations, and for blocks of activities with repetitive behavior. We evaluate a prototype implementation of these filters based on the state-of-the-art trace collector Score-P and the Open Trace Format 2 trace library with five real-life applications and achieve a trace size reduction of up to two orders of magnitude and an additional overhead of less than one percent on average.

International Conference on Exascale Applications and Software | 2014

Towards Detailed Exascale Application Analysis — Selective Monitoring and Visualisation

Jens Doleschal; Thomas William; Bert Wesarg; Johannes Ziegenbalg; Holger Brunst; Andreas Knüpfer; Wolfgang E. Nagel

We introduce novel ideas involving aspect-oriented instrumentation, Multi-Faceted Program Monitoring, as well as novel techniques for a selective and detailed event-based application performance analysis, with an eye toward exascale. We give special attention to the spatial, temporal, and level-of-detail aspects of the three important phases of compile-time filtering, application execution, and runtime filtering. We use an event-based monitoring approach to allow selected and focused performance analysis.
