Matthias Lieber
Dresden University of Technology
Publications
Featured research published by Matthias Lieber.
Parallel Tools Workshop | 2008
Andreas Knüpfer; Holger Brunst; Jens Doleschal; Matthias Jurenz; Matthias Lieber; Holger Mickler; Matthias S. Müller; Wolfgang E. Nagel
This paper presents the Vampir tool-set for performance analysis of parallel applications. It consists of the run-time measurement system VampirTrace and the visualization tools Vampir and VampirServer. The paper describes the major features and outlines the underlying implementation that is necessary to provide low overhead and good scalability. Furthermore, it gives a short overview of the development history, related work, and future work.
extreme science and engineering discovery environment | 2012
Robert Henschel; Matthias Lieber; Le-Shin Wu; Phillip M. Nista; Brian J. Haas; Richard D. LeDuc
RNA-sequencing is a technique to study RNA expression in biological material. It is quickly gaining popularity in the field of transcriptomics. Trinity is a software tool that was developed for efficient de novo reconstruction of transcriptomes from RNA-Seq data. In this paper we first conduct a performance study of Trinity and compare it to previously published data from 2011. The version from 2011 is much slower than many other de novo assemblers, and biologists have thus been forced to choose between quality and speed. We examine the runtime behavior of Trinity as a whole as well as its individual components and then optimize the most performance-critical parts. We find that standard best practices for HPC applications can also be applied to Trinity, especially on systems with large amounts of memory. When combining best practices for HPC applications with our specific performance optimizations, we decrease the runtime of Trinity by a factor of 3.9. This brings the runtime of Trinity in line with other de novo assemblers while maintaining superior quality. The purpose of this paper is to describe a series of improvements to Trinity, quantify the execution improvements achieved, and document the new version of the software.
parallel computing | 2010
Matthias Lieber; Verena Grützun; Ralf Wolke; Matthias S. Müller; Wolfgang E. Nagel
To study the complex interactions between cloud processes and the atmosphere, several atmospheric models have been coupled with detailed spectral cloud microphysics schemes. These schemes are computationally expensive, which limits their practical application. Additionally, our performance analysis of the model system COSMO-SPECS (atmospheric model of the Consortium for Small-scale Modeling coupled with SPECtral bin cloud microphysicS) shows a significant load imbalance due to the cloud model. To overcome this issue and enable dynamic load balancing, we propose separating the cloud scheme from the static partitioning of the atmospheric model. Using the framework FD4 (Four-Dimensional Distributed Dynamic Data structures), we show that this approach successfully eliminates the load imbalance and improves the scalability of the model system. We present a scalability analysis of the dynamic load balancing and coupling for two different supercomputers. The observed overhead is 6% on 1600 cores of an SGI Altix 4700 and less than 7% on 64Ki cores of a BlueGene/P system.
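The load imbalance that motivates this separation can be quantified with the standard max-over-average metric; a minimal sketch for illustration (the function name and example load values are assumptions, not taken from the paper):

```python
def imbalance(loads):
    """Load imbalance factor: maximum load divided by average load.
    A value of 1.0 means perfect balance; synchronization cost grows
    with this factor, since all processes wait for the most loaded one."""
    return max(loads) / (sum(loads) / len(loads))

# Example: four processes, one of which carries a heavy cloud column.
balanced = imbalance([10, 10, 10, 10])   # 1.0
skewed = imbalance([40, 10, 10, 10])     # one process dominates
```

Dynamic load balancing aims to move work so that this factor stays close to 1.0 as the simulated clouds evolve.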
Concurrency and Computation: Practice and Experience | 2015
Amnon Barak; Zvi Drezner; Ely Levy; Matthias Lieber; Amnon Shiloh
Management of forthcoming exascale clusters requires frequent collection of run-time information about the nodes and the running applications. This paper presents a new paradigm for providing online information to the management system of scalable clusters, consisting of a large number of nodes and one or more masters that manage these nodes. We describe the details of resilient gossip algorithms for sharing local information within subsets of nodes and for sending global information to a master, which holds information on all the nodes. The presented algorithms are decentralized, scalable, and resilient, working well even when some nodes fail, without needing any recovery protocol. The paper gives formal expressions for approximating the average ages of the local information at each node and the information collected by the master. It then shows that these results closely match the results of simulations and measurements on a real cluster. The paper also investigates the resilience of the algorithms and the impact on the average age when nodes or masters fail. The main outcome of this paper is that partitioning of large clusters can improve the quality of information available to the management system without increasing the number of messages per node.
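The notion of information age at the heart of such gossip schemes can be illustrated with a small round-based simulation; this is a generic push-gossip sketch under assumed parameters (node count, rounds, random peer selection), not the authors' algorithm:

```python
import random

def gossip_round(ages, rng):
    """One round of push gossip: every node sends its age vector to one
    random peer. ages[i][j] is the age (in rounds) of node i's latest
    information about node j; the receiver keeps the fresher entry."""
    n = len(ages)
    for i in range(n):
        j = rng.randrange(n - 1)
        if j >= i:
            j += 1  # choose a peer other than i
        ages[j] = [min(a, b) for a, b in zip(ages[j], ages[i])]

def simulate(n=32, rounds=12, seed=0):
    rng = random.Random(seed)
    # Initially each node knows only itself; other entries are unknown.
    ages = [[0 if i == j else float("inf") for j in range(n)] for i in range(n)]
    for _ in range(rounds):
        gossip_round(ages, rng)
        for i in range(n):
            ages[i] = [a + 1 for a in ages[i]]  # information ages each round
            ages[i][i] = 0                      # own state is always fresh
    # For a designated master (node 0): average age of the entries it holds.
    known = [a for a in ages[0] if a != float("inf")]
    return sum(known) / len(known), len(known)

avg_age, num_known = simulate()
```

Each node talks to a single peer per round, so the per-node message count stays constant while information about all nodes diffuses toward the master with bounded average age.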
international workshop on runtime and operating systems for supercomputers | 2014
Ely Levy; Amnon Barak; Amnon Shiloh; Matthias Lieber; Carsten Weinhold; Hermann Härtig
Gossip algorithms can provide online information about the availability and the state of the resources in supercomputers. These algorithms require minimal computing and storage capabilities at each node and, when properly tuned, they are not expected to overload the nodes or the network that connects these nodes. These properties make gossip interesting for future exascale systems. This paper examines the overhead of a decentralized gossip algorithm on the performance of parallel MPI applications running on up to 8192 nodes of an IBM BlueGene/Q supercomputer. The applications that were used in the experiments include PTRANS and MPI-FFT from the HPCC benchmark suite as well as the coupled weather and cloud simulation model COSMO-SPECS+FD4. In most cases, no gossip overhead was observed when the gossip messages were sent at intervals of 256 ms or more. As expected, the overhead that is observed at higher rates is sensitive to the communication pattern of the application and the amount of gossip information being circulated.
Software for Exascale Computing | 2016
Carsten Weinhold; Adam Lackorzynski; Jan Bierbaum; Martin Küttler; Maksym Planeta; Hermann Härtig; Amnon Shiloh; Ely Levy; Tal Ben-Nun; Amnon Barak; Thomas Steinke; Thorsten Schütt; Jan Fajerski; Alexander Reinefeld; Matthias Lieber; Wolfgang E. Nagel
The FFMK project designs, builds, and evaluates a system-software architecture to address the challenges expected in Exascale systems. In particular, these challenges include performance losses caused by the much larger impact of runtime variability within applications, hardware, and operating system (OS), as well as increased vulnerability to failures. The FFMK OS platform is built upon a multi-kernel architecture, which combines the L4Re microkernel and a virtualized Linux kernel into a noise-free, yet feature-rich execution environment. It further includes global, distributed platform management and system-level optimization services that transparently minimize checkpoint/restart overhead for applications. The project also researched algorithms to make collective operations fault tolerant in the presence of failing nodes. In this paper, we describe the basic components, algorithms, and services we developed in Phase 2 of the project.
high performance computing systems and applications | 2014
Matthias Lieber; Wolfgang E. Nagel
The decomposition of one-dimensional workload arrays into consecutive partitions is a core problem of many load balancing methods, especially those based on space-filling curves. While previous work has shown that heuristics can be parallelized, only sequential algorithms exist for the optimal solution. However, centralized partitioning will become infeasible in the exascale era due to the vast number of tasks to be mapped to millions of processors. In this work, we first introduce optimizations to a published exact algorithm. Further, we investigate a hierarchical approach which combines a parallel heuristic and an exact algorithm to form a scalable and high-quality 1D partitioning algorithm. We compare load balance, execution time, and task migration of the algorithms for up to 262 144 processes using real-life workload data. The results show a speed-up of 300 over an existing fast exact algorithm, while achieving nearly optimal load balance.
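The underlying problem, splitting a workload array into p consecutive parts so that the heaviest part is as light as possible, admits a compact exact solution via binary search on the bottleneck value. The sketch below is a textbook formulation for illustration (assuming positive integer weights), not the authors' optimized algorithm:

```python
def partition_1d(weights, p):
    """Exact 1D chain partitioning: split `weights` into at most `p`
    consecutive parts, minimizing the maximum part sum (the bottleneck).
    Binary search over candidate bottlenecks; each candidate is checked
    with a greedy pass that packs parts as full as the cap allows."""
    def parts_needed(cap):
        count, current = 1, 0
        for w in weights:
            if current + w > cap:
                count += 1
                current = w
            else:
                current += w
        return count

    lo, hi = max(weights), sum(weights)
    while lo < hi:
        mid = (lo + hi) // 2
        if parts_needed(mid) <= p:
            hi = mid  # feasible: try a smaller bottleneck
        else:
            lo = mid + 1
    return lo  # minimal achievable bottleneck

# Example: 5 tasks onto 2 processes; the best split is [1,2,3] | [4,5].
best = partition_1d([1, 2, 3, 4, 5], 2)  # 9
```

With a prefix-sum array, each feasibility check can be done in O(p log n) instead of O(n), which is one direction such exact algorithms take when n grows large.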
ICNAAM 2010: International Conference of Numerical Analysis and Applied Mathematics 2010 | 2010
Matthias Lieber; Verena Grützun; Ralf Wolke; Matthias S. Müller; Wolfgang E. Nagel
Promoted by the growing capability of high performance computers and the increasing knowledge about the underlying processes, more and more detailed simulation codes are being developed. This includes multiphase and multiphysics simulations as well as the coupling of multidisciplinary models. This paper introduces the framework FD4 (Four-Dimensional Distributed Dynamic Data structures), which enables highly scalable implementations of multiphase models. The separation of the data structures for the single phases allows the use of different partitionings and also dynamic load balancing, which is essential to efficiently utilize high performance computers. The application of FD4 to an atmospheric modeling system with a detailed description of cloud processes successfully eliminates the load imbalance with a moderate overhead of 5% on 32k processors.
IEEE Transactions on Multi-Scale Computing Systems | 2018
Jeronimo Castrillon; Matthias Lieber; Sascha Klüppelholz; Marcus Völp; Nils Asmussen; Uwe Aßmann; Franz Baader; Christel Baier; Gerhard P. Fettweis; Jochen Fröhlich; Andrés Goens; Sebastian Haas; Dirk Habich; Hermann Härtig; Mattis Hasler; Immo Huismann; Tomas Karnagel; Sven Karol; Akash Kumar; Wolfgang Lehner; Linda Leuschner; Siqi Ling; Steffen Märcker; Christian Menard; Johannes Mey; Wolfgang E. Nagel; Benedikt Nöthen; Rafael Peñaloza; Michael Raitza; Jörg Stiller
Many novel emerging technologies are being proposed and evaluated today, mostly at the device and circuit levels. It is unclear what the impact of different new technologies at the system level will be. What is clear, however, is that new technologies will make their way into systems and will increase the already high complexity of heterogeneous parallel computing platforms, making them ever more difficult to program. This paper discusses a programming stack for heterogeneous systems that combines and adapts well-understood principles from different areas, including capability-based operating systems, adaptive application runtimes, dataflow programming models, and model checking. We argue why we think that these principles built into the stack and the interfaces among the layers will also be applicable to future systems that integrate heterogeneous technologies. The programming stack is evaluated on a tiled heterogeneous multicore.
Proceedings of the 23rd European MPI Users' Group Meeting on | 2016
Matthias Lieber; Kerstin Gößner; Wolfgang E. Nagel
Dynamic load balancing with diffusive methods is known to provide minimal load transfer and to require communication between neighboring nodes only. These are very attractive properties for highly parallel systems. We compare diffusive methods with state-of-the-art geometrical and graph-based partitioning methods on thousands of nodes. When load balancing overheads, i.e., repartitioning time and migration cost, have to be minimized, diffusive methods provide substantial benefits.
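The neighbor-only communication pattern of first-order diffusion can be sketched as follows; the topology (a ring) and the diffusion parameter are illustrative assumptions, not taken from the paper:

```python
def diffusion_step(load, alpha=0.25):
    """One first-order diffusion step on a ring of processes.
    Each node exchanges load only with its two neighbors, moving a
    fraction `alpha` of each pairwise difference toward the lighter node.
    Total load is conserved by construction."""
    n = len(load)
    new = load[:]
    for i in range(n):
        for j in ((i - 1) % n, (i + 1) % n):
            new[i] += alpha * (load[j] - load[i])
    return new

# Starting from a severe imbalance, repeated steps flatten the load
# toward the mean using only neighbor-to-neighbor transfers.
load = [8.0, 0.0, 0.0, 0.0]
for _ in range(50):
    load = diffusion_step(load)
```

Because each step moves only small amounts of work between adjacent nodes, migration volume stays low, which is exactly the property that makes diffusive methods attractive when repartitioning overhead must be minimized.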