Andreas Knüpfer | Researchain

Publication

Featured research published by Andreas Knüpfer.

Parallel Tools Workshop | 2008

The Vampir Performance Analysis Tool-Set

Andreas Knüpfer; Holger Brunst; Jens Doleschal; Matthias Jurenz; Matthias Lieber; Holger Mickler; Matthias S. Müller; Wolfgang E. Nagel

This paper presents the Vampir tool-set for performance analysis of parallel applications. It consists of the run-time measurement system VampirTrace and the visualization tools Vampir and VampirServer. The paper describes the major features and outlines the underlying implementation that is necessary to provide low overhead and good scalability. Furthermore, it gives a short overview of the development history, future work, and related work.

International Conference on Computational Science | 2006

Introducing the Open Trace Format (OTF)

Andreas Knüpfer; Ronny Brendel; Holger Brunst; Hartmut Mix; Wolfgang E. Nagel

This paper introduces the new Open Trace Format (OTF). The first part gives a brief overview of trace format libraries in general and of existing formats and libraries and their features. The important requirements are then discussed, in particular efficient parallel and selective access to trace data. The following part presents the design decisions and features of OTF in detail. Finally, an early evaluation of OTF compares storage size for several examples and reports sequential and parallel I/O benchmarks. The conclusion summarizes the results and gives an outlook.

Parallel Tools Workshop | 2012

Score-P: A Joint Performance Measurement Run-Time Infrastructure for Periscope, Scalasca, TAU, and Vampir

Andreas Knüpfer; Christian Rössel; Dieter an Mey; Scott Biersdorff; Kai Diethelm; Dominic Eschweiler; Markus Geimer; Michael Gerndt; Daniel Lorenz; Allen D. Malony; Wolfgang E. Nagel; Yury Oleynik; Peter Philippen; Pavel Saviankou; Dirk Schmidl; Sameer Shende; Ronny Tschüter; Michael Wagner; Bert Wesarg; Felix Wolf

This paper gives an overview of the Score-P performance measurement infrastructure, which is being jointly developed by leading HPC performance tools groups. It motivates the advantages of the joint undertaking from both the developer and the user perspectives, and presents the design and components of the newly developed Score-P performance measurement infrastructure. Furthermore, it contains first evaluation results in comparison with existing performance tools and presents an outlook on the long-term cooperative development of the new system.

International Conference on Parallel Processing | 2005

Construction and compression of complete call graphs for post-mortem program trace analysis

Andreas Knüpfer; Wolfgang E. Nagel

Compressed complete call graphs (cCCGs) are a newly developed memory data structure for event-based program traces. The most important advantage over the linear lists or arrays traditionally used is the ability to apply lossy or lossless data compression. The compression scheme is completely transparent with respect to read access; decompression is not required. This approach is a new way to cope with today's challenges when analyzing enormous amounts of trace data. The article focuses on CCG construction and compression; querying and evaluation are briefly covered.

Archive | 2011

Score-P: A Unified Performance Measurement System for Petascale Applications

Dieter an Mey; Scott Biersdorff; Christian H. Bischof; Kai Diethelm; Dominic Eschweiler; Michael Gerndt; Andreas Knüpfer; Daniel Lorenz; Allen D. Malony; Wolfgang E. Nagel; Yury Oleynik; Christian Rössel; Pavel Saviankou; Dirk Schmidl; Sameer Shende; Michael Wagner; Bert Wesarg; Felix Wolf

The rapidly growing number of cores on modern supercomputers imposes scalability demands not only on applications but also on the software tools needed for their development. At the same time, increasing application and system complexity makes the optimization of parallel codes more difficult, creating a need for scalable performance-analysis technology with advanced functionality. However, delivering such an expensive technology can hardly be accomplished by single tool developers and requires higher degrees of collaboration within the HPC community. The unified performance-measurement system Score-P is a joint effort of several academic performance-tool builders, funded under the BMBF program "HPC-Software für skalierbare Parallelrechner" in the SILC project (Skalierbare Infrastruktur zur automatischen Leistungsanalyse paralleler Codes). It is being developed with the objective of creating a common basis for several complementary optimization tools in the service of enhanced scalability, improved interoperability, and reduced maintenance cost.

Parallel, Distributed and Network-Based Processing | 2005

High performance event trace visualization

Andreas Knüpfer; Holger Brunst; Wolfgang E. Nagel

This paper presents a joint effort to make huge event traces accessible for interactive program analysis. It combines a distributed software architecture with compressible data structures and customized query algorithms. The advanced technologies are discussed both theoretically and practically. Based on a proof-of-concept implementation, the response times for typical queries are presented to show the practical relevance of this combined approach.

Future Generation Computer Systems | 2006

Compressible memory data structures for event-based trace analysis

Andreas Knüpfer; Wolfgang E. Nagel

The paper presents a new compressible memory data structure for trace events. Its primary intention is to aid the analysis of huge traces by reducing the memory requirements significantly. Furthermore, customized evaluation algorithms reduce the computational effort. The data structure as well as algorithms for construction and evaluation are discussed in detail. Experiments with real-life traces demonstrate the theoretically derived capabilities of the new approach.

European Conference on Parallel Processing | 2014

DASH: Data Structures and Algorithms with Support for Hierarchical Locality

Karl Fürlinger; Colin W. Glass; José Gracia; Andreas Knüpfer; Jie Tao; Denis Hünich; Kamran Idrees; Matthias Maiterth; Yousri Mhedheb; Huan Zhou

DASH is a realization of the PGAS (partitioned global address space) model in the form of a C++ template library. Operator overloading is used to provide global-view PGAS semantics without the need for a custom PGAS (pre-)compiler. The DASH library is implemented on top of our runtime system DART, which provides an abstraction layer on top of existing one-sided communication substrates. DART contains methods to allocate memory in the global address space as well as collective and one-sided communication primitives. To support the development of applications that exploit a hierarchical organization, either on the algorithmic or on the hardware level, DASH features the notion of teams that are arranged in a hierarchy. Based on a team hierarchy, the DASH data structures support locality iterators as a generalization of the conventional local/global distinction found in many PGAS approaches.

Computer Science - Research and Development | 2010

Quantifying power consumption variations of HPC systems using SPEC MPI benchmarks

Daniel Hackenberg; Robert Schöne; Daniel Molka; Matthias S. Müller; Andreas Knüpfer

The power consumption of an HPC system is not only a major concern due to the huge associated operational cost. It also poses high demands on the infrastructure required to operate such a system. The power consumption strongly depends on the executed workload and is influenced by the system hardware and software and its setup. In this paper we analyze the power consumption of a 32-node cluster across a wide range of parallel applications using the SPEC MPI2007 benchmark. By measuring the variations of the power consumed by different hardware nodes and processes of an application, we lay the ground to extrapolate the energy demand of large parallel HPC systems.

High Performance Distributed Computing | 2012

Enabling event tracing at leadership-class scale through I/O forwarding middleware

Thomas Ilsche; Joseph Schuchart; Jason Cope; Dries Kimpe; Terry Jones; Andreas Knüpfer; Kamil Iskra; Robert B. Ross; Wolfgang E. Nagel; Stephen W. Poole

Event tracing is an important tool for understanding the performance of parallel applications. As concurrency increases in leadership-class computing systems, the quantity of performance log data can overload the parallel file system, perturbing the application being observed. In this work we present a solution for event tracing at leadership scales. We enhance the I/O forwarding system software to aggregate and reorganize log data prior to writing to the storage system, significantly reducing the burden on the underlying file system for this type of traffic. Furthermore, we augment the I/O forwarding system with a write buffering capability to limit the impact of artificial perturbations from log data accesses on traced applications. To validate the approach, we modify the Vampir tracing toolset to take advantage of this new capability and show that the approach increases the maximum traced application size by a factor of 5, to more than 200,000 processes.
