Is this you? Create Your Porfile

Thomas Moschny

Karlsruhe Institute of Technology

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Thomas Moschny is active.

Explore More

Publication

Featured researches published by Thomas Moschny.

international parallel and distributed processing symposium | 2003

Transparent distributed threads for Java

Bernhard Haumacher; Thomas Moschny; Jürgen Reuter; Walter F. Tichy

Remote method invocation in Java RMI allows the flow of control to pass across local Java threads and thereby span multiple virtual machines. However, the resulting distributed threads do not strictly follow the paradigm of their local Java counterparts for at least three reasons. Firstly, the absence of a global thread identity causes problems when reentering monitors. Secondly, blocks synchronized on remote objects do not work properly. Thirdly, the thread interruption mechanism for threads executing a remote call is broken. These problems make multi-threaded distributed programming complicated and error prone. We present a two-level solution: On the library level, we extend KaRMI (Philippsen et al. (2000)), a fast replacement for RMI, with global thread identities for eliminating problems with monitor reentry. Problem with synchronization on remote objects are solved with a facility for remote monitor acquisition. Our interrupt forwarding mechanism enables the application to get full control over its distributed threads. On the language level, we integrate these extensions with JavaPartys transparent remote objects (Philippsen et al. (1997)) to get transparent distributed threads. We finally evaluate our approach with benchmarks that show costs and benefits of our overall design.

Proceedings of the 1st international workshop on Multicore software engineering | 2008

Finding synchronization defects in java programs: extended static analyses and code patterns

Frank Otto; Thomas Moschny

Concurrent programming is getting more and more important. Managing concurrency requires the usage of synchronization mechanisms, which is error-prone. Well-known examples for synchronization defects are deadlocks and race conditions. Detecting such errors is known to be difficult. There are several approaches to identify potential errors, but they either produce a high number of false positives or suffer from high computational overhead, catching only a small number of defects. Our approach uses static analysis techniques combined with points-to and may-happen-in-parallel (MHP) information to reduce the number of false positives. Additionally, we present code patterns indicating possible synchronization problems. We have implemented our approach using the Java framework Soot. Our tool was tested with small code examples, an open source web server, and commercial software. First results show that the number of false positives is reduced significantly.

international conference on parallel processing | 2013

The DEEP Project - Pursuing Cluster-Computing in the Many-Core Era

Norbert Eicker; Thomas Lippert; Thomas Moschny; Estela Suarez

Homogeneous cluster architectures dominating high-performance computing (HPC) today are challenged, in particular when thinking about reaching Exascale by the end of the decade, by heterogeneous approaches utilizing accelerator elements. The DEEP (Dynamical Exascale Entry Platform) project aims for implementing a novel architecture for high-performance computing consisting of two components - a standard HPC Cluster and a cluster of many-core processors called Booster. In order to make the adaptation of application codes to this Cluster-Booster architecture as seamless as possible, DEEP provides a complete programming environment. It integrates the offloading functionality given by the MPI standard with an abstraction layer based on the task-based OmpSs programming paradigm. This paper presents the DEEP project with an emphasis on the DEEP programming environment.

international parallel and distributed processing symposium | 2016

Non-intrusive Migration of MPI Processes in OS-Bypass Networks

Simon Pickartz; Carsten Clauss; Stefan Lankes; Stephan Krempel; Thomas Moschny; Antonello Monti

Load balancing, maintenance, and energy efficiency are key challenges for upcoming supercomputers. An indispensable tool for the accomplishment of these tasks is the ability to migrate applications during runtime. Especially in HPC, where any performance hit is frowned upon, such migration mechanisms have to come with minimal overhead. This constraint is usually not met by current practice adding further abstraction layers to the software stack. In this paper, we propose a concept for the migration of MPI processes communicating over OS-bypass networks such as InfiniBand. While being transparent to the application, our solution minimizes the runtime overhead by introducing a protocol for the shutdown of individual connections prior to the migration. It is implemented on the basis of an MPI library and evaluated using virtual machines based on KVM. Our evaluation reveals that the runtime overhead is negligible small. The migration time itself is mainly determined by the particular migration mechanism, whereas the additional execution time of the presented protocol converges to 2ms per connection if more than a few dozen connections are shut down at a time.

parallel, distributed and network-based processing | 2008

Runtime Locality Optimizations of Distributed Java Applications

Christian Huetter; Thomas Moschny

In distributed Java environments, locality of objects and threads is crucial for the performance of parallel applications. We introduce dynamic locality optimizations in the context of JavaParty, a programming and runtime environment for parallel Java applications. Until now, an optimal distribution of the individual objects of an application has to be found manually, which has several drawbacks. Based on a former static approach, we develop a dynamic methodology for automatic locality optimizations. By measuring processing and communication times of remote method calls at runtime, a placement strategy can be computed that maps each object of the distributed system to its optimal virtual machine. Objects then are migrated between the processing nodes in order to realize this placement strategy. We evaluate our approach by comparing the performance of two benchmark applications with manually distributed versions. It is shown that our approach is particularly suitable for dynamic applications where the optimal object distribution varies at runtime.

cluster computing and the grid | 2006

Integrating logical and physical file models in the MPI-IO implementation for "Clusterfile"

Florin Isaila; David E. Singh; Jesús Carretero; Félix García; Gábor Szeder; Thomas Moschny

This paper presents the design and implementation of the MPI-IO interface for the Clusterfile parallel file system. The approach offers the opportunity of achieving a high correlation between the file access patterns of parallel applications and the physical file distribution. First, any physical file distribution can be expressed by means of MPI data types. Second, mechanisms such as views and collective I/O operations are portably implemented inside the file system, unifying the I/O scheduling strategies of the MPI-IO library and the file system. The experimental section demonstrates performance benefits of more than one order of magnitude.

Concurrency and Computation: Practice and Experience | 2016

The DEEP Project An alternative approach to heterogeneous cluster‐computing in the many‐core era

Norbert Eicker; Thomas Lippert; Thomas Moschny; Estela Suarez

Homogeneous cluster architectures, which used to dominate high‐performance computing (HPC), are challenged today by heterogeneous approaches utilizing accelerator or co‐processor devices. The DEEP (Dynamical Exascale Entry Platform) project is implementing a novel architecture for HPC, in which a standard HPC cluster is directly connected to a so‐called ‘Booster’: a cluster of many‐core processors. By these means heterogeneity is organized differently as in todays standard approach, where accelerators are added to each node of the cluster. In order to adapt application codes to this Cluster‐Booster architecture as seamless as possible, DEEP has developed a complete programming environment. It integrates the offloading functionality given by the Message Passing Interface standard with an abstraction layer based on the task‐based OmpSs programming paradigm. This paper presents the DEEP project with an emphasis on the DEEP programming environment. Copyright

international conference on supercomputing | 2017

Supporting automatic recovery in offloaded distributed programming models through MPI-3 techniques

Antonio J. Peña; Vicenç Beltran; Carsten Clauss; Thomas Moschny

In this paper we describe the design of fault tolerance capabilities for general-purpose offload semantics, based on the OmpSs programming model. Using ParaStation MPI, a production MPI-3.1 implementation, we explore the features that, being standard compliant, an MPI stack must support to provide the necessary fault tolerance guarantees, based on MPIs dynamic process management. Our results, including synthetic benchmarks and applications, reveal low runtime overhead and efficient recovery, demonstrating that the existing MPI standard provided us with sufficient mechanisms to implement an effective and efficient fault-tolerant solution.

parallel computing | 2004

Fast parallel I/O on parastation clusters

N. Eicker; Florin Isaila; Thomas Lippert; Thomas Moschny; Walter F. Tichy

7. Summary We have demonstrated that the ParaStation3 communication system speeds up the performance of parallel I/O on cluster computers such as ALiCE. I/O benchmarks with PVFS using Parastation over Myrinet achieve a throughput for write operations of up to 1 GB/s from a 32-processor compute partition, given a 32-processor PVFS I/O partition. These results outperform known benchmark results for PVFS on 1.28 Gbit Myrinet by more than a factor of 2, a fact that is mainly due to the superior communication features of ParaStation. Read performance from buffer cache reaches up to 2.2 GB/s, while reading from hard disk saturates at the cumulative hard disk performance. The I/O performance achieved with PVFS using ParaStation enables us to carry out extremely data-intensive eigenmode computations on ALiCE in the application field of lattice quantum chromodynamics. For the future, we plan to utilize the I/O system for storing and processing mass data in high energy physics data analysis on clusters.

Proceedings of the 1st COSH Workshop on Co-Scheduling of HPC Applications | 2016