Publications


Featured research published by Thomas Soddemann.


Facing the Multicore-Challenge | 2013

GASPI – A Partitioned Global Address Space Programming Interface

Thomas Alrutz; Jan Backhaus; Thomas Brandes; Vanessa End; Thomas Gerhold; Alfred Geiger; Daniel Grünewald; Vincent Heuveline; Jens Jägersküpper; Andreas Knüpfer; Olaf Krzikalla; Edmund Kügeler; Carsten Lojewski; Guy Lonsdale; Ralph Müller-Pfefferkorn; Wolfgang E. Nagel; Lena Oden; Franz-Josef Pfreundt; Mirko Rahn; Michael Sattler; Mareike Schmidtobreick; Annika Schiller; Christian Simmendinger; Thomas Soddemann; Godehard Sutmann; Henning Weber; Jan-Philipp Weiss

At the threshold of exascale computing, the limitations of the MPI programming model become more and more pronounced. HPC programmers have to design codes that can run and scale on systems with hundreds of thousands of cores. Setting up correspondingly many communication buffers and point-to-point communication links, and using bulk-synchronous communication phases, contradicts scalability in these dimensions. Moreover, the reliability of upcoming systems will decrease.
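GASPI's core primitive is the asynchronous, one-sided write paired with a notification, which avoids the matching receives and bulk-synchronous phases criticized above. A minimal sketch of that pattern, using Python threads and illustrative names rather than the actual GASPI C API:

```python
import threading

class Segment:
    """Toy 'partitioned global address space' segment: each rank owns a slice,
    but any rank may write into it remotely without a matching receive."""
    def __init__(self, size):
        self.data = [0] * size
        self.notifications = {}          # notification id -> flag
        self.cond = threading.Condition()

    def write_notify(self, offset, values, notify_id):
        # One-sided write: the target does not post a receive for this data.
        with self.cond:
            self.data[offset:offset + len(values)] = values
            self.notifications[notify_id] = True
            self.cond.notify_all()

    def notify_waitsome(self, notify_id):
        # The target only waits for the notification flag, not for a message.
        with self.cond:
            self.cond.wait_for(lambda: self.notifications.get(notify_id, False))

seg = Segment(8)

def producer():
    seg.write_notify(offset=4, values=[1, 2, 3], notify_id=7)

t = threading.Thread(target=producer)
t.start()
seg.notify_waitsome(7)   # returns once the remote write has landed
t.join()
print(seg.data[4:7])     # [1, 2, 3]
```

The point of the pattern is that communication is decoupled from synchronization: the writer pushes data whenever it is ready, and the target synchronizes only on the cheap notification.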


Computer Science - Research and Development | 2013

Using LAMA for efficient AMG on hybrid clusters

Jiri Kraus; Malte Förster; Thomas Brandes; Thomas Soddemann

In this paper, we describe the implementation of an AMG solver for hybrid clusters that exploits distributed- and shared-memory parallelization and uses the GPU accelerators available on each node. The solver has been written using LAMA (Library for Accelerated Math Applications). This library not only provides an easy-to-use framework for solvers that can run on different devices with different matrix formats, but also comes with features to optimize and hide communication and memory transfers between CPUs and GPUs. These features are explained, and their impact on the efficiency of the AMG solver is shown in this paper. The benchmark results show that efficient use of hybrid clusters is possible even for multi-level methods like AMG, where fast solutions are needed on all levels for multiple problem sizes.
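To illustrate why multi-level methods need fast solutions on every level, here is a generic textbook two-grid cycle for a 1D Poisson problem in Python/NumPy; this is a hedged sketch of the multigrid idea, not LAMA code:

```python
import numpy as np

def apply_A(u, h):
    """Discrete 1D Laplacian -u'' with homogeneous Dirichlet boundaries."""
    r = np.zeros_like(u)
    r[1:-1] = (2*u[1:-1] - u[:-2] - u[2:]) / h**2
    return r

def jacobi(u, f, h, sweeps, omega=2/3):
    """Weighted Jacobi smoothing: cheap, damps the high-frequency error."""
    for _ in range(sweeps):
        u[1:-1] = (1-omega)*u[1:-1] + omega*0.5*(u[:-2] + u[2:] + h**2*f[1:-1])
    return u

def two_grid_cycle(u, f, h):
    u = jacobi(u, f, h, 3)                      # pre-smoothing
    r = f - apply_A(u, h)
    rc = np.zeros((len(u) + 1)//2)              # restrict residual (full weighting)
    rc[1:-1] = 0.25*r[1:-2:2] + 0.5*r[2:-1:2] + 0.25*r[3::2]
    n, H = len(rc) - 2, 2*h
    A = (np.diag(2*np.ones(n)) - np.diag(np.ones(n-1), 1)
         - np.diag(np.ones(n-1), -1)) / H**2
    ec = np.zeros_like(rc)
    ec[1:-1] = np.linalg.solve(A, rc[1:-1])     # exact coarse-grid solve
    e = np.zeros_like(u)                        # prolongate: linear interpolation
    e[::2] = ec
    e[1::2] = 0.5*(ec[:-1] + ec[1:])
    return jacobi(u + e, f, h, 3)               # post-smoothing

n = 64
h = 1.0/n
f = np.sin(np.pi*np.linspace(0.0, 1.0, n + 1))
u = np.zeros(n + 1)
r0 = np.linalg.norm(f - apply_A(u, h))
for _ in range(5):
    u = two_grid_cycle(u, f, h)
print(np.linalg.norm(f - apply_A(u, h)) / r0)   # far below 1 after a few cycles
```

In a full AMG V-cycle, the exact coarse solve is replaced by recursion over a hierarchy of levels, which is exactly where the paper's communication-hiding on small problem sizes matters.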


Proceedings of the 2015 International Workshop on Code Optimisation for Multi and Many Cores | 2015

Hardware-Aware Automatic Code-Transformation to Support Compilers in Exploiting the Multi-Level Parallel Potential of Modern CPUs

Dustin Feld; Thomas Soddemann; Michael Jünger; Sven Mallach

Modern compilers offer more and more capabilities to automatically parallelize code regions if these match certain properties. However, there are several application kernels that, although rather simple transformations would suffice to make them match these properties, are either not parallelized at all by state-of-the-art compilers or could at least be improved with respect to their performance. This paper proposes a loop-tiling approach focusing on automatic vectorization and multi-core parallelization, with an emphasis on smart cache exploitation. The method is based on polyhedral code transformations applied as a pre-compilation step, and it is shown to help compilers generate more and better parallel code regions. It automatically adapts to hardware parameters such as the SIMD register width and cache sizes. Furthermore, it takes memory-access patterns into account and is capable of minimizing communication among tiles that are processed by different cores. An extensive computational study shows significant improvements in the number of vectorized instructions, cache miss rates, and running times for a range of application kernels. The method often outperforms the internal auto-parallelization techniques implemented in gcc and icc.
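As a toy illustration of the tiling idea (not the paper's polyhedral transformation), a cache-blocked transpose touches the source and destination in small square tiles so both stay cache-resident; the `tile` parameter stands in for a size derived from the cache parameters the paper adapts to automatically:

```python
def transpose_tiled(a, n, tile):
    """Cache-blocked transpose of an n x n matrix stored row-major in a
    flat list. Each tile of the source and the corresponding tile of the
    destination are small enough to fit in cache together, so the strided
    accesses of the transpose stay local."""
    out = [0.0] * (n * n)
    for ii in range(0, n, tile):            # iterate over tile origins
        for jj in range(0, n, tile):
            for i in range(ii, min(ii + tile, n)):   # work inside one tile
                for j in range(jj, min(jj + tile, n)):
                    out[j * n + i] = a[i * n + j]
    return out

n = 6
a = [float(k) for k in range(n * n)]
t = transpose_tiled(a, n, tile=4)
# sanity check: the tiled result equals the naive transpose
naive = [a[i * n + j] for j in range(n) for i in range(n)]
print(t == naive)  # True
```

In compiled code the same reordering additionally exposes contiguous inner loops to the vectorizer, which is the effect the paper measures.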


IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing | 2015

Multicore Processors and Graphics Processing Unit Accelerators for Parallel Retrieval of Aerosol Optical Depth From Satellite Data: Implementation, Performance, and Energy Efficiency

Jia Liu; Dustin Feld; Yong Xue; Jochen Garcke; Thomas Soddemann

Quantitative retrieval is a growing area in remote sensing due to the rapid development of remote instruments and retrieval algorithms. The aerosol optical depth (AOD) is a significant optical property of aerosols that is involved in further applications such as the atmospheric correction of remotely sensed surface features, the monitoring of volcanic eruptions, forest fires, and air quality, and even climate-change studies based on satellite data. AOD retrieval can be computationally expensive as a result of the huge amounts of remote sensing data and the compute-intensive algorithms. In this paper, we present two efficient implementations of an AOD retrieval algorithm for Moderate Resolution Imaging Spectroradiometer (MODIS) satellite data. We have employed two different high-performance computing architectures: multicore processors and a graphics processing unit (GPU). CUDA-C (Compute Unified Device Architecture C) has been used for the GPU implementation on NVIDIA graphics cards, and OpenMP (Open Multiprocessing) for thread parallelism in the multicore implementation. We observe a maximal overall speedup of 68.x on the GPU accelerator for the studied data, whereas the multicore processor achieves a reasonable 7.x speedup. Additionally, for the largest benchmark input dataset, the GPU implementation also shows a great advantage in terms of energy efficiency, with an overall consumption of 3.15 kJ compared to 58.09 kJ on a CPU with 1 thread and 38.39 kJ with 16 threads. Furthermore, the retrieval accuracy of all implementations has been checked and analyzed. Altogether, the GPU accelerator shows great advantages for AOD retrieval in both performance and energy-efficiency metrics. Nevertheless, the multicore processor provides easier programmability for the majority of today's programmers. Our work exploits the parallel implementations, the performance, and the energy-efficiency features of GPU accelerators and multicore processors. With this paper, we attempt to give suggestions to geoscientists who need efficient desktop solutions.
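The energy figures reported in the abstract can be turned into relative savings with a line of arithmetic:

```python
# Energy consumption reported for the largest benchmark input (in kJ).
energy_kj = {"gpu": 3.15, "cpu_1_thread": 58.09, "cpu_16_threads": 38.39}

# Relative saving = 1 - (GPU energy / CPU energy).
savings_vs_serial = 1 - energy_kj["gpu"] / energy_kj["cpu_1_thread"]
savings_vs_parallel = 1 - energy_kj["gpu"] / energy_kj["cpu_16_threads"]

print(f"GPU saves {savings_vs_serial:.0%} energy vs. 1 CPU thread")    # 95%
print(f"GPU saves {savings_vs_parallel:.0%} energy vs. 16 CPU threads") # 92%
```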


International Journal of Digital Earth | 2016

An efficient geosciences workflow on multi-core processors and GPUs: a case study for aerosol optical depth retrieval from MODIS satellite data

Jia Liu; Dustin Feld; Yong Xue; Jochen Garcke; Thomas Soddemann; Peiyuan Pan

Quantitative remote sensing retrieval algorithms help in understanding the dynamic aspects of Digital Earth. However, the Big Data and complex models in Digital Earth pose grand challenges for computation infrastructures. In this article, taking aerosol optical depth (AOD) retrieval as a case study, we exploit parallel computing methods for highly efficient geophysical parameter retrieval. We present an efficient geocomputation workflow for the AOD calculation from Moderate Resolution Imaging Spectroradiometer (MODIS) satellite data. According to their individual potential for parallelization, several procedures were adapted and implemented for successful parallel execution on multi-core processors and Graphics Processing Units (GPUs). The benchmarks in this paper validate the high parallel performance of the retrieval workflow, with speedups of up to 5.x on a multi-core processor with 8 threads and 43.x on a GPU. To specifically address the time-consuming model retrieval part, hybrid parallel patterns that combine the multi-core processor's and the GPU's compute power were implemented with static and dynamic workload distributions and evaluated on two systems with different CPU-GPU configurations. It is shown that only the dynamic hybrid implementation leads to a greatly enhanced overall exploitation of the heterogeneous hardware environment under varying circumstances.
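The difference between the static and dynamic hybrid patterns can be sketched with a simple simulation (illustrative code, not the paper's implementation): a static scheme fixes the split up front, while a dynamic scheme hands the next chunk to whichever device becomes free first, so the split adapts to the devices' actual throughput:

```python
def static_split(chunks, ratio):
    """Static distribution: a fixed split decided before the run starts."""
    k = int(len(chunks) * ratio)
    return chunks[:k], chunks[k:]

def dynamic_split(chunks, cost_gpu, cost_cpu):
    """Dynamic distribution: the device that is free next pulls the next
    chunk, so the split tracks the devices' real per-chunk costs."""
    t_gpu = t_cpu = 0.0
    gpu, cpu = [], []
    for c in chunks:
        if t_gpu <= t_cpu:           # GPU finishes its queue first
            gpu.append(c)
            t_gpu += cost_gpu(c)
        else:
            cpu.append(c)
            t_cpu += cost_cpu(c)
    return gpu, cpu, max(t_gpu, t_cpu)

chunks = list(range(16))
# Assumption for the demo: the GPU processes a chunk 4x faster than the CPU.
gpu, cpu, makespan = dynamic_split(chunks, lambda c: 1.0, lambda c: 4.0)
print(len(gpu), len(cpu))   # the faster device automatically takes most chunks
```

A static 50/50 split on the same assumptions would leave the GPU idle while the CPU grinds through its half, which is exactly the imbalance the dynamic pattern avoids.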


International Conference on Parallel Processing | 2009

A data management system for UNICORE 6

Tobias Schlauch; Anastasia Eifer; Thomas Soddemann; Andreas Schreiber

Data produced in scientific and industrial applications is growing exponentially, but most resource middleware systems lack appropriate support for data and metadata management. In particular, easy and intuitive retrieval of data for later use is a serious problem. In this context, the paper proposes a pragmatic approach to the management of distributed data, with a focus on appropriate means of data organization that improve data retrieval. The paper presents the key concepts and architecture of a dedicated data management system for sharing data located on heterogeneous storage resources. The specifics of different storage systems, such as data object names, data locations, and data access methods, are abstracted to allow transparent data access. Moreover, the system provides means for data structuring and organization by supporting custom data models and the annotation of individual metadata on data objects. The current development status of the system is illustrated by presenting an integration with the UNICORE Rich Client, which has been validated in the context of the AeroGrid project.
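The core abstraction, logical data object names resolved to physical locations plus searchable metadata, can be sketched in a few lines (hypothetical names and layout, not the actual UNICORE 6 interfaces):

```python
class DataCatalog:
    """Toy logical-name catalog: maps logical data object names to physical
    locations on heterogeneous storage and attaches searchable metadata,
    so retrieval does not depend on where or how the bytes are stored."""
    def __init__(self):
        self._entries = {}

    def register(self, logical_name, location, **metadata):
        self._entries[logical_name] = {"location": location, "meta": metadata}

    def resolve(self, logical_name):
        """Transparent access: clients never see the storage-specific path."""
        return self._entries[logical_name]["location"]

    def find(self, **query):
        """Metadata-driven retrieval: find objects by annotation, not path."""
        return [name for name, e in self._entries.items()
                if all(e["meta"].get(k) == v for k, v in query.items())]

cat = DataCatalog()
cat.register("run-042/mesh", "gridftp://siteA/store/a1b2",
             project="AeroGrid", kind="mesh")
cat.register("run-042/result", "sftp://siteB/out/9f3e",
             project="AeroGrid", kind="result")
print(cat.find(project="AeroGrid", kind="result"))  # ['run-042/result']
print(cat.resolve("run-042/mesh"))                  # gridftp://siteA/store/a1b2
```

Note how the two objects live on different storage systems with different access protocols, yet are found and resolved through one uniform interface.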


Archive | 2017

Energy-Efficiency and Performance Comparison of Aerosol Optical Depth Retrieval on Distributed Embedded SoC Architectures

Dustin Feld; Jochen Garcke; Jia Liu; Eric Schricker; Thomas Soddemann; Yong Xue

The Aerosol Optical Depth (AOD) is a significant optical property of aerosols. It is applied to the atmospheric correction of remotely sensed surface features, to the monitoring of volcanic eruptions, forest fires, and air quality in general, and to gathering data for climate predictions on the basis of satellite observations. We have developed an AOD retrieval workflow for processing satellite data not only with ordinary CPUs but also with parallel processors and GPU accelerators in a distributed hardware environment. This workflow includes pre-processing procedures that are followed by the runtime-dominating main retrieval method.


Archive | 2017

The LAMA Approach for Writing Portable Applications on Heterogeneous Architectures

Thomas Brandes; Eric Schricker; Thomas Soddemann

Ensuring the longevity and maintainability of modern software applications is mandatory for a proper return on investment. Since the hardware landscape is changing rapidly and will continue to do so, it is imperative to address these topics also in the High Performance Computing (HPC) domain, where applications traditionally have a long life-span. In recent years, we have observed a trend towards more and more heterogeneous systems in computing. Realizing the performance promises of the hardware vendors is a huge challenge for the software developer. Portability is the second challenge to be met in this context. In this paper we present our library LAMA (Library for Accelerated Math Applications), a framework for developing hardware-independent, high-performance code for heterogeneous computing systems. We created this library to address both challenges successfully in the realm of linear algebra and numerical mathematics. We introduce our solutions to heterogeneous memory and kernel management as well as our solutions to task parallelism. Finally, we present performance and scalability benchmarks, drawing a comparison to PETSc (the Portable, Extensible Toolkit for Scientific Computation developed at Argonne National Laboratory) for the example of a Conjugate Gradient (CG) solver.
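The CG solver used in the benchmarks is a standard algorithm; a matrix-free textbook sketch shows why a library like LAMA only needs each backend (CPU, GPU, distributed) to supply the matrix-vector product (plain Python, not LAMA's API):

```python
def cg(apply_A, b, tol=1e-10, max_iter=200):
    """Conjugate Gradient for A x = b with A symmetric positive definite.
    'apply_A' is the only backend-specific operation: everything else is
    vector arithmetic that ports unchanged across devices."""
    x = [0.0] * len(b)
    r = b[:]                       # residual for the zero initial guess
    p = r[:]
    rs = sum(v * v for v in r)
    for _ in range(max_iter):
        Ap = apply_A(p)
        alpha = rs / sum(pi * ai for pi, ai in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        rs_new = sum(v * v for v in r)
        if rs_new < tol**2:        # converged: residual norm below tol
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

# Small SPD test system: 1D Laplacian stencil [-1, 2, -1].
def apply_laplacian(v):
    n = len(v)
    return [2*v[i] - (v[i-1] if i > 0 else 0) - (v[i+1] if i < n-1 else 0)
            for i in range(n)]

x = cg(apply_laplacian, [1.0] * 8)
res = [bi - ai for bi, ai in zip([1.0] * 8, apply_laplacian(x))]
print(max(abs(v) for v in res) < 1e-8)   # True
```

Swapping `apply_A` for a GPU or distributed implementation leaves the solver untouched, which is the portability argument the paper makes for LAMA's kernel abstraction.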


International Conference on Cluster Computing | 2016

VarySched: A Framework for Variable Scheduling in Heterogeneous Environments

Tim Süß; Nils Döring; Ramy Gad; Lars Nagel; André Brinkmann; Dustin Feld; Thomas Soddemann; Stefan Lankes

Despite many efforts to better utilize the potential of GPUs and CPUs, it is still far from fully exploited. Although many tasks can easily be sped up by using accelerators, most existing schedulers are not flexible enough to really optimize the resource usage of the complete system. The main reasons are (i) that each processing unit requires specific program code, which is often not provided for every task, and (ii) that schedulers may follow the run-until-completion model and hence disallow resource changes during runtime. In this paper, we present VarySched, a configurable task scheduler framework tailored to efficiently utilize all available computing resources in a system. VarySched allows a more fine-grained task-to-resource placement, which is further enhanced by allowing tasks to migrate to another resource during their runtime. In addition, VarySched can manage multiple scheduling strategies (optimizing, for instance, throughput or energy efficiency) and switch between them at any time.
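The idea of pluggable scheduling strategies can be sketched as follows (illustrative names and numbers, not the VarySched API): each task advertises the resources it has code for, and the active strategy ranks them:

```python
# Each task lists the processing units it has an implementation for,
# with an estimated cost per unit; a task without GPU code simply has
# no 'gpu' entry, covering reason (i) from the abstract.
tasks = [
    {"name": "fft", "impls": {"cpu": {"time": 4.0, "energy": 2.0},
                              "gpu": {"time": 1.0, "energy": 3.0}}},
    {"name": "io",  "impls": {"cpu": {"time": 2.0, "energy": 1.0}}},
]

def throughput_strategy(impls):
    """Pick the resource with the lowest estimated runtime."""
    return min(impls, key=lambda r: impls[r]["time"])

def energy_strategy(impls):
    """Pick the resource with the lowest estimated energy cost."""
    return min(impls, key=lambda r: impls[r]["energy"])

def schedule(tasks, strategy):
    # The strategy is a plug-in: swapping it changes every placement
    # decision without touching the tasks or the scheduler loop.
    return {t["name"]: strategy(t["impls"]) for t in tasks}

print(schedule(tasks, throughput_strategy))  # {'fft': 'gpu', 'io': 'cpu'}
print(schedule(tasks, energy_strategy))      # {'fft': 'cpu', 'io': 'cpu'}
```

Switching the strategy at runtime, as VarySched allows, amounts to re-running the placement with a different plug-in while tasks may migrate to their newly chosen resource.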


Annales Des Télécommunications | 2010

UNICORE 6 - Recent and Future Advancements

Achim Streit; Piotr Bała; Alexander Beck-Ratzka; Krzysztof Benedyczak; Sandra Bergmann; Rebecca Breu; Jason Milad Daivandy; Bastian Demuth; Anastasia Eifer; André Giesler; Björn Hagemeier; Sonja Holl; Valentina Huber; Nadine Lamla; Daniel Mallmann; Ahmed Shiraz Memon; Mohammad Shahbaz Memon; Michael Rambadt; Morris Riedel; Mathilde Romberg; Bernd Schuller; Tobias Schlauch; Andreas Schreiber; Thomas Soddemann; Wolfgang Ziegler

Collaboration


Top co-authors of Thomas Soddemann:

Jia Liu (Chinese Academy of Sciences)
Yong Xue (Chinese Academy of Sciences)
Thomas Brandes (Center for Information Technology)