Sven-Bodo Scholz
Heriot-Watt University
Publications
Featured research published by Sven-Bodo Scholz.
International Journal of Parallel Programming | 2006
Clemens Grelck; Sven-Bodo Scholz
We give an in-depth introduction to the design of our functional array programming language SaC, the main aspects of its compilation into host machine code, and its parallelisation based on multi-threading. The language design of SaC aims at combining high-level, compositional array programming with fully automatic resource management for highly productive code development and maintenance. We outline the compilation process that maps SaC programs to computing machinery. Here, our focus is on optimisation techniques that aim at restructuring entire applications from nested compositions of general fine-grained operations into specialised coarse-grained operations. We present our implicit parallelisation technology for shared memory architectures based on multi-threading and discuss further optimisation opportunities on this level of code generation. Both optimisation and parallelisation rigorously exploit the absence of side-effects and the explicit data flow characteristic of a functional setting.
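To give a flavour of the language design, the following sketch (our illustration in SaC-style syntax, not code from the paper) defines a rank-generic element-wise increment with a single with-loop, applicable to arrays of any rank and shape:

    /* Illustrative sketch, not from the paper: a rank-generic
       element-wise increment in SaC-style syntax. */
    double[*] inc (double[*] a)
    {
      return with {
               /* one element computation per index vector iv into a */
               (0 * shape(a) <= iv < shape(a)) : a[iv] + 1.0;
             } : modarray(a);
    }

Since inc has no side-effects, the compiler is free to fuse such definitions with their call sites and to evaluate the with-loop body in parallel.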
Parallel Processing Letters | 2008
Clemens Grelck; Sven-Bodo Scholz; Alexander V. Shafarenko
We present the design of S-NET, a coordination language and component technology based on stream processing. S-NET achieves a near-complete separation between application code, written in a conventional programming language, and coordination code, written in S-NET itself. S-NET boxes integrate existing sequential code as stream-processing components into streaming networks, whose construction is based on algebraic formulae built out of four network combinators. Subtyping on the level of boxes and networks and a tailor-made inheritance mechanism achieve flexible software reuse.
International Journal of Parallel Programming | 2010
Clemens Grelck; Sven-Bodo Scholz; Alexander V. Shafarenko
We present the rationale and design of S-Net, a coordination language for asynchronous stream processing. The language achieves a near-complete separation between the application code, written in any conventional programming language, and the coordination/communication code written in S-Net. Our approach supports a component technology with flexible software reuse. No extension of the conventional language is required. The interface between S-Net and the application code is in terms of one additional library function. The application code is componentised and presented to S-Net as a set of components, called boxes, each encapsulating a single tuple-to-tuple function. Apart from the boxes defined using an external compute language, S-Net features two built-in boxes: one for network housekeeping and one for data-flow style synchronisation. Streaming network composition under S-Net is based on four network combinators, which have both deterministic and nondeterministic versions. Flexible software reuse is comprehensive, with the box interfaces and even the network structure being subject to subtyping. We propose an inheritance mechanism, named flow inheritance, that is specifically geared towards stream processing. The paper summarises the essential language constructs and type concepts and gives a short application example.
International Conference on Conceptual Structures | 2010
Frank Penczek; Stephan Herhut; Clemens Grelck; Sven-Bodo Scholz; Alexander V. Shafarenko; Eric Lenormand
We argue that programming high-end stream-processing applications requires a form of coordination language that enables the designer to represent interactions between stream-processing functions asynchronously. We further argue that the level of abstraction offered by current programming tools should be drastically increased, and we present a coordination language and component technology suitable for that purpose. We demonstrate our approach on a real radar-data processing application, reusing all of its existing components, and present the speed-ups we were able to achieve on contemporary multi-core hardware.
Workshop on Declarative Aspects of Multicore Programming | 2011
Jing Guo; Jeyarajan Thiyagalingam; Sven-Bodo Scholz
Over recent years, the use of Graphics Processing Units (GPUs) for general-purpose computing has become increasingly popular. The main reasons for this development are the attractive performance/price and performance/power ratios of these architectures. However, substantial performance gains from GPUs come at a price: they require extensive programming expertise and, typically, a substantial re-coding effort. Although the programming experience has been significantly improved by existing frameworks like CUDA and OpenCL, it is still a challenge to effectively utilise these devices. Directive-based approaches such as hiCUDA or OpenMP variants offer further improvements but have not eliminated the need for expertise in these complex architectures. Similarly, special-purpose programming languages such as Microsoft's Accelerator try to lower the barrier further. They provide the programmer with special GPU data structures and operations on them, which are then compiled into GPU code. In this paper, we take this trend towards a completely implicit, high-level approach yet another step further. We generate CUDA code from a MATLAB-like high-level functional array programming language, Single Assignment C (SaC). To do so, we identify which data structures and operations can be successfully mapped onto GPUs and transform existing programs accordingly. This paper presents the first runtime results from our GPU backend, together with the basic set of GPU-specific program optimisations that turned out to be essential. Despite our high-level program specifications, we show that our parallelising compiler achieves speedups between a factor of 5 and 50 for a number of benchmarks.
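As an illustration (our own sketch, not code from the paper), a perfectly data-parallel with-loop such as the one below is the kind of SaC operation a compiler can map to a CUDA kernel, assigning one GPU thread to each index vector iv:

    /* Illustrative sketch: element-wise vector addition in SaC-style
       syntax; a natural candidate for mapping to a CUDA kernel. */
    double[.] vadd (double[.] x, double[.] y)
    {
      return with {
               ([0] <= iv < shape(x)) : x[iv] + y[iv];
             } : genarray(shape(x), 0.0);
    }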
Parallel Processing Letters | 2003
Clemens Grelck; Sven-Bodo Scholz
SAC is a purely functional array processing language designed with numerical applications in mind. It supports generic, high-level program specifications in the style of APL. However, rather than providing a fixed set of built-in array operations, SAC provides the means to specify such operations in the language itself, in a way that still allows their application to arrays of any rank and size. This paper illustrates the major steps in compiling generic, rank- and shape-invariant SAC specifications into efficiently executable multithreaded code for parallel execution on shared memory multiprocessors. The effectiveness of the compilation techniques is demonstrated by means of a small case study on the PDE1 benchmark, which implements 3-dimensional red/black successive over-relaxation. Comparisons with HPF and ZPL show that, despite the genericity of the code, SAC achieves highly competitive runtime performance.
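For example, an APL-style reduction such as sum is not built in but can be defined within the language for arrays of any rank and size; the sketch below is our illustration in SaC-style syntax, not code from the paper:

    /* Illustrative sketch: a rank- and shape-invariant sum,
       specified in the language rather than built in. */
    double sum (double[*] a)
    {
      return with {
               (0 * shape(a) <= iv < shape(a)) : a[iv];
             } : fold(+, 0.0);
    }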
International Parallel and Distributed Processing Symposium | 2007
Clemens Grelck; Sven-Bodo Scholz; Alexander V. Shafarenko
We propose a two-layered approach to exploiting different forms of concurrency in complex systems: we specify computational components in our functional array language SAC, which exploits the data-parallel properties of array processing code. The declarative stream processing language S-Net is used to orchestrate the collaborative behaviour of these components in a streaming network. We illustrate our approach with a hybrid implementation of a Sudoku puzzle solver as a representative of more complex search problems.
Parallel Computing | 2006
Clemens Grelck; Sven-Bodo Scholz
The design of skeletons for expressing concurrent computations usually faces a conflict between software engineering demands and performance issues. Whereas the former favour versatile fine-grain skeletons that can be successively combined into larger programs, coarse-grain skeletons are more desirable from a performance perspective. We describe a way out of this dilemma for array skeletons. In the functional array language SAC we internally represent individual array skeletons by one or more meta skeletons, called WITH-loops. The design of WITH-loops is carefully chosen to be versatile enough to cope with a large variety of skeletons, yet simple enough to allow for compilation into efficiently executable (parallel) code. Furthermore, WITH-loops are closed with respect to three tailor-made optimisation techniques that systematically transform compositions of simple, computationally light-weight skeletons into a few complex and computationally heavier-weight skeletons.
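As a hypothetical illustration of this dilemma (our sketch, not an example from the paper, and assuming the element-wise arithmetic of the SaC standard library), consider an axpy-like composition of two fine-grained skeletons and the single WITH-loop that the optimisations conceptually produce from it:

    /* Fine-grained composition: multiplication and addition are
       separate skeletons, each traversing the whole array. */
    double[*] axpy (double a, double[*] x, double[*] y)
    {
      return a * x + y;
    }

    /* What the optimisations conceptually produce: one WITH-loop,
       i.e. a single coarse-grained traversal. */
    double[*] axpy_fused (double a, double[*] x, double[*] y)
    {
      return with {
               (0 * shape(x) <= iv < shape(x)) : a * x[iv] + y[iv];
             } : modarray(x);
    }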
International Conference on Parallel Processing | 2010
Frank Penczek; Stephan Herhut; Sven-Bodo Scholz; Alexander V. Shafarenko; Jungsook Yang; Chun-Yi Chen; Nader Bagherzadeh; Clemens Grelck
The development and implementation of the coordination language S-NET have been reported previously. In this paper we apply the S-NET design methodology to a computer graphics problem. We demonstrate (i) how a complete separation of concerns can be achieved between algorithm engineering and concurrency engineering, and (ii) that the S-NET implementation is capable of achieving performance that matches what can be achieved using low-level tools such as MPI. We find this remarkable, as under S-NET communication, concurrency and synchronisation are completely separated from algorithmic code. We argue that our approach delivers a flexible component technology which liberates application developers from the logistics of task and data management, while at the same time making it unnecessary for a distributed computing professional to acquire detailed knowledge of the application area.
Implementation and Application of Functional Languages | 2005
Clemens Grelck; Karsten Hinckfuß; Sven-Bodo Scholz
With-loops are versatile array comprehensions used in the functional array language SaC to implement aggregate array operations that are applicable to arrays of any rank and shape. We describe with-loop fusion as a novel optimisation technique to improve both the data locality of compiled code in general and the synchronisation behaviour of compiler-parallelised code in particular. Experiments demonstrate the impact of with-loop fusion on the runtime performance of compiled SaC code.
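A minimal sketch of the idea (our illustration in SaC-style syntax, not code from the paper): two with-loops traversing the same index space are combined into one multi-operator with-loop, so the argument array is read once rather than twice:

    /* Before fusion: two separate traversals of a. */
    b = with { (0 * shape(a) <= iv < shape(a)) : a[iv] + 1.0; } : modarray(a);
    c = with { (0 * shape(a) <= iv < shape(a)) : a[iv] * 2.0; } : modarray(a);

    /* After fusion (conceptually): a single traversal that
       computes both results at once. */
    b, c = with {
             (0 * shape(a) <= iv < shape(a)) : (a[iv] + 1.0, a[iv] * 2.0);
           } : (modarray(a), modarray(a));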