Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Clemens Grelck is active.

Publication


Featured research published by Clemens Grelck.


International Journal of Parallel Programming | 2006

SAC: a functional array language for efficient multi-threaded execution

Clemens Grelck; Sven-Bodo Scholz

We give an in-depth introduction to the design of our functional array programming language SaC, the main aspects of its compilation into host machine code, and its parallelisation based on multi-threading. The language design of SaC aims at combining high-level, compositional array programming with fully automatic resource management for highly productive code development and maintenance. We outline the compilation process that maps SaC programs to computing machinery. Here, our focus is on optimisation techniques that aim at restructuring entire applications from nested compositions of general fine-grained operations into specialised coarse-grained operations. We present our implicit parallelisation technology for shared memory architectures based on multi-threading and discuss further optimisation opportunities on this level of code generation. Both optimisation and parallelisation rigorously exploit the absence of side-effects and the explicit data flow characteristic of a functional setting.
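The restructuring described above, from nested compositions of fine-grained array operations into coarse-grained ones, can be illustrated very loosely outside SaC. The following Python sketch (function names `scale`, `offset`, `composed`, and `fused` are invented for illustration) shows what such a fusion buys: the composed form traverses the data twice and allocates an intermediate array, while the fused form makes a single pass.

```python
# Illustrative sketch (not SaC): composing fine-grained array operations
# allocates one intermediate array and traverses the data once per step.
def scale(xs, f):          # fine-grained operation: one traversal
    return [x * f for x in xs]

def offset(xs, d):         # another fine-grained operation: another traversal
    return [x + d for x in xs]

def composed(xs):          # nested composition: two passes, one intermediate
    return offset(scale(xs, 2.0), 1.0)

def fused(xs):             # the coarse-grained form a fusing compiler targets:
    return [x * 2.0 + 1.0 for x in xs]   # one traversal, no intermediate

data = [1.0, 2.0, 3.0]
assert composed(data) == fused(data)     # both yield [3.0, 5.0, 7.0]
```

In SaC this transformation happens automatically during compilation; the sketch only makes the before/after shapes explicit.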


Parallel Processing Letters | 2008

A Gentle Introduction to S-Net: Typed Stream Processing and Declarative Coordination of Asynchronous Components

Clemens Grelck; Sven-Bodo Scholz; Alexander V. Shafarenko

We present the design of S-NET, a coordination language and component technology based on stream processing. S-NET achieves a near-complete separation between application code, written in a conventional programming language, and coordination code, written in S-NET itself. S-NET boxes integrate existing sequential code as stream-processing components into streaming networks, whose construction is based on algebraic formulae built out of four network combinators. Subtyping on the level of boxes and networks and a tailor-made inheritance mechanism achieve flexible software reuse.
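The abstract does not spell out the four network combinators. As a loose illustration (in Python, not S-Net's own syntax, with all names invented), boxes can be modelled as functions from one record to a list of records, and combinators as higher-order functions over such boxes:

```python
# Loose Python analogue (not S-Net) of stream-network combinators,
# modelling a box as: record -> list of records.
def serial(a, b):                 # serial composition: feed A's output into B
    return lambda rec: [r2 for r1 in a(rec) for r2 in b(r1)]

def choice(a, b, pred):           # parallel composition: route each record by match
    return lambda rec: a(rec) if pred(rec) else b(rec)

def star(a, done):                # serial replication: repeat A until 'done' matches
    def net(rec):
        out = [rec]
        while not all(done(r) for r in out):
            out = [r2 for r in out for r2 in a(r)]
        return out
    return net

def split(a, key):                # index split: one conceptual branch per tag value
    return lambda rec: a(rec)     # sketch only; routing by key(rec) is elided

inc = lambda rec: [{**rec, "n": rec["n"] + 1}]
net = serial(star(inc, lambda r: r["n"] >= 3), inc)
assert net({"n": 0}) == [{"n": 4}]
```

Real S-Net combinators additionally come in deterministic and nondeterministic variants and operate on typed, asynchronous streams; the sketch only conveys the algebraic, compositional flavour.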


Journal of Functional Programming | 2005

Shared memory multiprocessor support for functional array processing in SAC

Clemens Grelck

Classical application domains of parallel computing are dominated by processing large arrays of numerical data. Whereas most functional languages focus on lists and trees rather than on arrays, SAC is tailor-made in design and in implementation for efficient high-level array processing. Advanced compiler optimizations yield performance levels that are often competitive with low-level imperative implementations. Based on SAC, we develop compilation techniques and runtime system support for the compiler-directed parallel execution of high-level functional array processing code on shared memory architectures. Competitive sequential performance gives us the opportunity to exploit the conceptual advantages of the functional paradigm for achieving real performance gains with respect to existing imperative implementations, not only in comparison with uniprocessor runtimes. While the design of SAC facilitates parallelization, the particular challenge of high sequential performance is that realization of satisfying speedups through parallelization becomes substantially more difficult. We present an initial compilation scheme and multi-threaded execution model, which we step-wise refine to reduce organizational overhead and to improve parallel performance. We close with a detailed analysis of the impact of certain design decisions on runtime performance, based on a series of experiments.


International Journal of Parallel Programming | 2010

Asynchronous Stream Processing with S-Net

Clemens Grelck; Sven-Bodo Scholz; Alexander V. Shafarenko

We present the rationale and design of S-Net, a coordination language for asynchronous stream processing. The language achieves a near-complete separation between the application code, written in any conventional programming language, and the coordination/communication code written in S-Net. Our approach supports a component technology with flexible software reuse. No extension of the conventional language is required. The interface between S-Net and the application code is in terms of one additional library function. The application code is componentised and presented to S-Net as a set of components, called boxes, each encapsulating a single tuple-to-tuple function. Apart from the boxes defined using an external compute language, S-Net features two built-in boxes: one for network housekeeping and one for data-flow style synchronisation. Streaming network composition under S-Net is based on four network combinators, which have both deterministic and nondeterministic versions. Flexible software reuse is comprehensive, with the box interfaces and even the network structure being subject to subtyping. We propose an inheritance mechanism, named flow inheritance, that is specifically geared towards stream processing. The paper summarises the essential language constructs and type concepts and gives a short application example.


International Conference on Conceptual Structures | 2010

Parallel signal processing with S-Net

Frank Penczek; Stephan Herhut; Clemens Grelck; Sven-Bodo Scholz; Alexander V. Shafarenko; Eric Lenormand

We argue that programming high-end stream-processing applications requires a form of coordination language that enables the designer to represent interactions between stream-processing functions asynchronously. We further argue that the level of abstraction that current programming tools engender should be drastically increased and present a coordination language and component technology that is suitable for that purpose. We demonstrate our approach on a real radar-data processing application from which we reuse all existing components and present speed-ups that we were able to achieve on contemporary multi-core hardware.


Parallel Processing Letters | 2003

SAC - From High-level Programming with Arrays to Efficient Parallel Execution.

Clemens Grelck; Sven-Bodo Scholz

SAC is a purely functional array processing language designed with numerical applications in mind. It supports generic, high-level program specifications in the style of APL. However, rather than providing a fixed set of built-in array operations, SAC provides means to specify such operations in the language itself in a way that still allows their application to arrays of any rank and size. This paper illustrates the major steps in compiling generic, rank- and shape-invariant SAC specifications into efficiently executable multithreaded code for parallel execution on shared memory multiprocessors. The effectiveness of the compilation techniques is demonstrated by means of a small case study on the PDE1 benchmark, which implements 3-dimensional red/black successive over-relaxation. Comparisons with HPF and ZPL show that despite the genericity of code, SAC achieves highly competitive runtime performance characteristics.
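The red/black update pattern at the heart of the PDE1 benchmark can be shown in miniature. The following Python sketch (not the 3-dimensional SAC kernel; a 1-D analogue with invented names) updates all even-indexed "red" interior points from their neighbours, then all odd-indexed "black" points, so each half-sweep reads freshly updated values:

```python
# 1-D analogue (not the 3-D PDE1 kernel) of red/black relaxation.
def red_black_sweep(u):
    for colour in (0, 1):                     # 0 = red pass, 1 = black pass
        for i in range(1, len(u) - 1):        # interior points only
            if i % 2 == colour:
                u[i] = 0.5 * (u[i - 1] + u[i + 1])
    return u

u = [0.0, 0.0, 0.0, 0.0, 1.0]                 # fixed boundary values 0 and 1
for _ in range(50):                           # iterate towards the solution
    red_black_sweep(u)
# converges to the linear profile [0.0, 0.25, 0.5, 0.75, 1.0]
```

The point of the colouring is that all points of one colour are independent of each other, which is what makes each half-sweep data parallel and hence amenable to the multithreaded code generation the paper describes.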


International Parallel and Distributed Processing Symposium | 2002

Implementing the NAS benchmark MG in SAC

Clemens Grelck

SAC is a purely functional array processing language designed with numerical applications in mind. It supports generic, high-level program specifications in the style of APL. However, rather than providing a fixed set of built-in array operations, SAC provides means to specify such operations in the language itself in a way that still allows their application to arrays of any dimension and size. This paper illustrates the specificational benefits of this approach by means of a high-level SAC implementation of the NAS benchmark MG, realizing 3-dimensional multigrid relaxation with periodic boundary conditions. Despite the high-level approach, experiments show that by means of aggressive compiler optimizations SAC manages to achieve performance characteristics in the range of low-level Fortran and C implementations. For benchmark size class A, SAC is outperformed by the serial Fortran-77 reference implementation of the benchmark by only 23%, whereas SAC itself outperforms a C implementation by the same figure. Furthermore, implicit parallelization of the SAC code for shared memory multiprocessors achieves a speedup of 7.6 with 10 processors. With these figures, SAC outperforms both automatic parallelization of the serial Fortran-77 reference implementation and an OpenMP solution based on C code.


International Parallel and Distributed Processing Symposium | 2007

Coordinating Data Parallel SAC Programs with S-Net

Clemens Grelck; Sven-Bodo Scholz; Alexander V. Shafarenko

We propose a two-layered approach for exploiting different forms of concurrency in complex systems: we specify computational components in our functional array language SAC, which exploits data-parallel properties of array processing code. The declarative stream processing language S-Net is used to orchestrate the collaborative behaviour of these components in a streaming network. We illustrate our approach with a hybrid implementation of a Sudoku puzzle solver as a representative of more complex search problems.


Implementation and Application of Functional Languages | 2008

Implementation architecture and multithreaded runtime system of S-NET

Clemens Grelck; Frank Penczek

S-NET is a declarative coordination language and component technology aimed at modern multi-core/many-core architectures and systems-on-chip. It builds on the concept of stream processing to structure networks of communicating asynchronous components, which can be implemented using a conventional (sequential) language. In this paper we present the architecture of our S-NET implementation. After sketching out the interplay between compiler and runtime system, we characterise the deployment and operational behaviour of our multithreaded runtime system for contemporary multi-core processors. Preliminary runtime figures demonstrate the effectiveness of our approach.


Parallel Computing | 2006

Merging compositions of array skeletons in SAC

Clemens Grelck; Sven-Bodo Scholz

The design of skeletons for expressing concurrent computations usually faces a conflict between software engineering demands and performance issues. Whereas the former favour versatile fine-grain skeletons that can be successively combined into larger programs, coarse-grain skeletons are more desirable from a performance perspective. We describe a way out of this dilemma for array skeletons. In the functional array language SAC we internally represent individual array skeletons by one or more meta skeletons, called WITH-loops. The design of WITH-loops is carefully chosen to be versatile enough to cope with a large variety of skeletons, yet simple enough to allow for compilation into efficiently executable (parallel) code. Furthermore, WITH-loops are closed with respect to three tailor-made optimisation techniques that systematically transform compositions of simple, computationally light-weight skeletons into a few complex and computationally heavier-weight skeletons.
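The idea of one meta skeleton subsuming many array skeletons can be sketched outside SAC. In the following Python illustration (`with_loop` and its parameters are invented; real WITH-loops are far richer, covering index partitions, genarray/modarray/fold variants, and shape invariance), a single generic loop form expresses both a map-like and a fold-like skeleton:

```python
# Rough Python model (not SAC's WITH-loop) of one meta skeleton that
# subsumes map-like and fold-like array skeletons over an index range.
def with_loop(n, body, combine, neutral):
    acc = neutral
    for i in range(n):                  # traverse the index space
        acc = combine(acc, i, body(i))  # fold each per-index result into acc
    return acc

xs = [3, 1, 4, 1, 5]

# map-like skeleton: build a new array element by element
doubled = with_loop(len(xs), lambda i: 2 * xs[i],
                    lambda acc, i, v: acc + [v], [])

# fold-like skeleton: reduce to a scalar
total = with_loop(len(xs), lambda i: xs[i],
                  lambda acc, i, v: acc + v, 0)

assert doubled == [6, 2, 8, 2, 10]
assert total == 14
```

Because both skeletons share the one loop form, a compiler can merge adjacent instances into a single traversal, which is the essence of the WITH-loop optimisations the paper develops.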

Collaboration


Dive into Clemens Grelck's collaborations.

Top Co-Authors

Frank Penczek (University of Hertfordshire)
Stephan Herhut (University of Hertfordshire)
Alex Shafarenko (University of Hertfordshire)
Raphael Poss (University of Amsterdam)
Raimund Kirner (University of Hertfordshire)