Andreas Knüpfer
Dresden University of Technology
Publications
Featured research published by Andreas Knüpfer.
Parallel Tools Workshop | 2008
Andreas Knüpfer; Holger Brunst; Jens Doleschal; Matthias Jurenz; Matthias Lieber; Holger Mickler; Matthias S. Müller; Wolfgang E. Nagel
This paper presents the Vampir tool-set for performance analysis of parallel applications. It consists of the run-time measurement system VampirTrace and the visualization tools Vampir and VampirServer. The paper describes the major features and outlines the underlying implementation that is necessary to provide low overhead and good scalability. Furthermore, it gives a short overview of the development history, future work, and related work.
International Conference on Computational Science | 2006
Andreas Knüpfer; Ronny Brendel; Holger Brunst; Hartmut Mix; Wolfgang E. Nagel
This paper introduces the new Open Trace Format (OTF). The first part gives a brief overview of trace format libraries in general and of existing formats and libraries and their features. The important requirements are then discussed, in particular efficient parallel and selective access to trace data. The following part presents the design decisions and features of OTF in detail. Finally, an early evaluation of OTF compares storage size for several examples and reports sequential and parallel I/O benchmarks. A conclusion summarizes the results and gives an outlook.
Parallel Tools Workshop | 2012
Andreas Knüpfer; Christian Rössel; Dieter an Mey; Scott Biersdorff; Kai Diethelm; Dominic Eschweiler; Markus Geimer; Michael Gerndt; Daniel Lorenz; Allen D. Malony; Wolfgang E. Nagel; Yury Oleynik; Peter Philippen; Pavel Saviankou; Dirk Schmidl; Sameer Shende; Ronny Tschüter; Michael Wagner; Bert Wesarg; Felix Wolf
This paper gives an overview of the Score-P performance measurement infrastructure, which is being jointly developed by leading HPC performance tools groups. It motivates the advantages of the joint undertaking from both the developer and the user perspectives, and presents the design and components of the newly developed Score-P performance measurement infrastructure. Furthermore, it contains first evaluation results in comparison with existing performance tools and presents an outlook on the long-term cooperative development of the new system.
International Conference on Parallel Processing | 2005
Andreas Knüpfer; Wolfgang E. Nagel
Compressed complete call graphs (cCCGs) are a newly developed in-memory data structure for event-based program traces. Their most important advantage over the linear lists or arrays traditionally used is the ability to apply lossy or lossless data compression. The compression scheme is completely transparent with respect to read access: decompression is not required. This approach is a new way to cope with today's challenges when analyzing enormous amounts of trace data. The article focuses on cCCG construction and compression; querying and evaluation are covered briefly.
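The idea of compressing a call graph while keeping it directly readable can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a hypothetical event format of `(kind, function, timestamp)` tuples, stores durations instead of absolute timestamps so that repeated calls compare equal, and covers only the lossless case by sharing identical subtrees. The names `CallNode` and `build_compressed_call_graph` are invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class CallNode:
    function: str
    duration: int      # leave time minus enter time (a relative value)
    children: tuple    # child CallNode objects, in call order


def build_compressed_call_graph(events):
    """Build a call graph from enter/leave events, sharing equal subtrees.

    events: iterable of (kind, function, timestamp), kind is "enter" or "leave".
    Returns (roots, pool), where pool holds one node per distinct subtree.
    """
    pool = {}      # structural key -> the single shared node for that subtree
    frames = []    # open calls as (function, enter_time, finished_children)
    roots = []
    for kind, function, timestamp in events:
        if kind == "enter":
            frames.append((function, timestamp, []))
        elif kind == "leave":
            name, start, children = frames.pop()
            if name != function:
                raise ValueError("unbalanced enter/leave events")
            # Children are already shared, so their identities describe
            # the whole subtree without comparing it recursively.
            key = (name, timestamp - start, tuple(id(c) for c in children))
            node = pool.get(key)
            if node is None:
                node = CallNode(name, timestamp - start, tuple(children))
                pool[key] = node
            (frames[-1][2] if frames else roots).append(node)
        else:
            raise ValueError(f"unknown event kind: {kind}")
    if frames:
        raise ValueError("trace ended with open calls")
    return roots, pool


# Hypothetical trace: main calls foo twice, and foo calls bar each time.
trace = [
    ("enter", "main", 0),
    ("enter", "foo", 1), ("enter", "bar", 2), ("leave", "bar", 3), ("leave", "foo", 5),
    ("enter", "foo", 6), ("enter", "bar", 7), ("leave", "bar", 8), ("leave", "foo", 10),
    ("leave", "main", 12),
]
roots, pool = build_compressed_call_graph(trace)
```

In this sketch the five recorded calls are stored as three distinct nodes, and a reader walks `children` as in an uncompressed tree, which mirrors the transparency property described in the abstract. A lossy variant would additionally treat subtrees as equal when their durations differ by less than a chosen bound.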
Archive | 2011
Dieter an Mey; Scott Biersdorff; Christian H. Bischof; Kai Diethelm; Dominic Eschweiler; Michael Gerndt; Andreas Knüpfer; Daniel Lorenz; Allen D. Malony; Wolfgang E. Nagel; Yury Oleynik; Christian Rössel; Pavel Saviankou; Dirk Schmidl; Sameer Shende; Michael Wagner; Bert Wesarg; Felix Wolf
The rapidly growing number of cores on modern supercomputers imposes scalability demands not only on applications but also on the software tools needed for their development. At the same time, increasing application and system complexity makes the optimization of parallel codes more difficult, creating a need for scalable performance-analysis technology with advanced functionality. However, delivering such an expensive technology can hardly be accomplished by single tool developers and requires higher degrees of collaboration within the HPC community. The unified performance-measurement system Score-P is a joint effort of several academic performance-tool builders, funded under the BMBF program "HPC-Software für skalierbare Parallelrechner" in the SILC project (Skalierbare Infrastruktur zur automatischen Leistungsanalyse paralleler Codes). It is being developed with the objective of creating a common basis for several complementary optimization tools in the service of enhanced scalability, improved interoperability, and reduced maintenance cost.
Parallel, Distributed and Network-Based Processing | 2005
Andreas Knüpfer; Holger Brunst; Wolfgang E. Nagel
This paper presents a joint effort to make huge event traces accessible for interactive program analysis. It combines a distributed software architecture with compressible data structures and customized query algorithms. These technologies are discussed both theoretically and practically. Based on a proof-of-concept implementation, response times for typical queries are presented to show the practical relevance of the combined approach.
Future Generation Computer Systems | 2006
Andreas Knüpfer; Wolfgang E. Nagel
The paper presents a new compressible in-memory data structure for trace events. Its primary purpose is to aid the analysis of huge traces by significantly reducing memory requirements. Furthermore, customized evaluation algorithms reduce the computational effort. The data structure as well as the algorithms for construction and evaluation are discussed in detail. Experiments with real-life traces demonstrate the theoretically derived capabilities of the new approach.
European Conference on Parallel Processing | 2014
Karl Fürlinger; Colin W. Glass; José Gracia; Andreas Knüpfer; Jie Tao; Denis Hünich; Kamran Idrees; Matthias Maiterth; Yousri Mhedheb; Huan Zhou
DASH is a realization of the PGAS (partitioned global address space) model in the form of a C++ template library. Operator overloading is used to provide global-view PGAS semantics without the need for a custom PGAS (pre-)compiler. The DASH library is implemented on top of our runtime system DART, which provides an abstraction layer on top of existing one-sided communication substrates. DART contains methods to allocate memory in the global address space as well as collective and one-sided communication primitives. To support the development of applications that exploit a hierarchical organization, either on the algorithmic or on the hardware level, DASH features the notion of teams that are arranged in a hierarchy. Based on a team hierarchy, the DASH data structures support locality iterators as a generalization of the conventional local/global distinction found in many PGAS approaches.
Computer Science - Research and Development | 2010
Daniel Hackenberg; Robert Schöne; Daniel Molka; Matthias S. Müller; Andreas Knüpfer
The power consumption of an HPC system is a major concern not only because of the huge associated operational cost. It also places high demands on the infrastructure required to operate such a system. The power consumption depends strongly on the executed workload and is influenced by the system hardware and software and their setup. In this paper we analyze the power consumption of a 32-node cluster across a wide range of parallel applications using the SPEC MPI2007 benchmark. By measuring the variations in the power consumed by different hardware nodes and by the processes of an application, we lay the groundwork for extrapolating the energy demand of large parallel HPC systems.
High Performance Distributed Computing | 2012
Thomas Ilsche; Joseph Schuchart; Jason Cope; Dries Kimpe; Terry Jones; Andreas Knüpfer; Kamil Iskra; Robert B. Ross; Wolfgang E. Nagel; Stephen W. Poole
Event tracing is an important tool for understanding the performance of parallel applications. As concurrency increases in leadership-class computing systems, the quantity of performance log data can overload the parallel file system, perturbing the application being observed. In this work we present a solution for event tracing at leadership scales. We enhance the I/O forwarding system software to aggregate and reorganize log data prior to writing to the storage system, significantly reducing the burden on the underlying file system for this type of traffic. Furthermore, we augment the I/O forwarding system with a write buffering capability to limit the impact of artificial perturbations from log data accesses on traced applications. To validate the approach, we modify the Vampir tracing toolset to take advantage of this new capability and show that the approach increases the maximum traced application size by a factor of 5, to more than 200,000 processes.
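The aggregation and write-buffering idea can be pictured with a small Python sketch. It is an illustration under stated assumptions, not the I/O forwarding software described in the paper: the class `AggregatingBuffer`, its `sink` callable standing in for the storage system, and the byte-count flush threshold are all hypothetical.

```python
from collections import defaultdict


class AggregatingBuffer:
    """Collect small per-process log records and emit few, larger writes."""

    def __init__(self, sink, capacity_bytes):
        self._sink = sink                  # callable that receives one bytes chunk
        self._capacity = capacity_bytes    # flush once this many bytes are pending
        self._pending = defaultdict(list)  # rank -> records in arrival order
        self._pending_bytes = 0

    def write(self, rank, record):
        # Buffer the record instead of forwarding it to storage right away.
        self._pending[rank].append(record)
        self._pending_bytes += len(record)
        if self._pending_bytes >= self._capacity:
            self.flush()

    def flush(self):
        if self._pending_bytes == 0:
            return
        # Reorganize: records of one rank become contiguous, ranks are ordered.
        chunk = b"".join(
            b"".join(self._pending[rank]) for rank in sorted(self._pending)
        )
        self._sink(chunk)
        self._pending.clear()
        self._pending_bytes = 0


# Hypothetical usage: five small records from three ranks reach the sink
# as only two writes.
storage_writes = []
buffer = AggregatingBuffer(storage_writes.append, capacity_bytes=8)
buffer.write(1, b"aa")
buffer.write(0, b"bb")
buffer.write(1, b"cc")
buffer.write(0, b"dd")   # 8 bytes pending, triggers a flush
buffer.write(2, b"e")
buffer.flush()           # final flush at the end of the run
```

The sketch trades memory for fewer and larger storage operations; a real forwarding layer would also have to handle concurrency, failures, and file layout, none of which is modeled here.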