Jost Berthold | Researchain

Publication

Featured research published by Jost Berthold.

Practical Aspects of Declarative Languages | 2008

Hierarchical master-worker skeletons

Jost Berthold; Mischa Dieterle; Rita Loogen; Steffen Priebe

Master-worker systems are a well-known and often applicable scheme for the parallel evaluation of a pool of tasks, a work pool. The system consists of a master process managing a set of worker processes. After an initial phase with a fixed amount of tasks for each worker, further tasks are distributed in reply to results sent back by the workers. As this setup quickly leads to a bottleneck in the master process, the paper investigates techniques for hierarchically nesting the basic master-worker scheme. We present implementations of hierarchical master-worker skeletons, and how to automatically calculate parameters of the nested skeleton for good performance. Nesting master-worker systems is nontrivial especially in cases where new tasks are dynamically created from previous results (typically breadth- or depth-first tree search algorithms). We discuss how to handle dynamically growing pools in a hierarchy and present a declarative implementation for nested master-worker systems with dynamic task creation. The skeletons are experimentally evaluated with two typical test programs. We analyse their runtime behaviour and the effects of different hierarchies on runtimes via trace visualisations.
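
The flat scheme described in this abstract can be sketched as a sequential simulation. This is a minimal illustration in Python, not the paper's Eden skeleton: the names `master_worker`, `prefetch`, and `expand_node` are hypothetical, and workers are simulated round-robin instead of running in parallel.

```python
from collections import deque

def master_worker(num_workers, prefetch, worker_fn, initial_tasks):
    """Sequentially simulate a flat master-worker scheme (illustrative only).

    worker_fn(task) must return (result, new_tasks); new_tasks models
    dynamic task creation and is added to the master's pool.
    """
    if num_workers < 1 or prefetch < 1:
        raise ValueError("need at least one worker and a prefetch of one")
    pool = deque(initial_tasks)              # the master's work pool
    queues = [deque() for _ in range(num_workers)]
    pending = deque()                        # requests the master could not serve yet
    # Initial phase: every worker is given up to `prefetch` tasks.
    for w, queue in enumerate(queues):
        for _ in range(prefetch):
            if pool:
                queue.append(pool.popleft())
            else:
                pending.append(w)
    results = []
    handled = [0] * num_workers              # number of tasks each worker processed
    while any(queues):
        for w, queue in enumerate(queues):
            if not queue:
                continue
            result, new_tasks = worker_fn(queue.popleft())
            results.append(result)
            handled[w] += 1
            pool.extend(new_tasks)           # the pool may grow dynamically
            pending.append(w)                # each result doubles as a request
            while pool and pending:          # serve requests while tasks remain
                queues[pending.popleft()].append(pool.popleft())
    return results, handled

def expand_node(depth):
    # Hypothetical task: visit one node of a complete binary tree of depth 3.
    children = [depth + 1, depth + 1] if depth < 3 else []
    return 1, children

results, handled = master_worker(3, 2, expand_node, [0])
visited = sum(results)                       # number of tree nodes visited
```

Every result passes through the single `pool` in this sketch, which is the bottleneck that motivates nesting: in a hierarchy, a worker would itself be a master with its own pool.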

Parallel Processing Letters | 2003

Automatic skeletons in Template Haskell

Kevin Hammond; Jost Berthold; Rita Loogen

This paper uses Template Haskell to automatically select appropriate skeleton implementations in the Eden parallel dialect of Haskell. The approach allows implementation parameters to be statically tuned according to architectural cost models based on source analyses. This permits us to target a range of parallel architecture classes from a single source specification. A major advantage of the approach is that cost models are user-definable and can be readily extended to new data or computation structures etc.

European Conference on Parallel Processing | 2009

Implementing Parallel Google Map-Reduce in Eden

Jost Berthold; Mischa Dieterle; Rita Loogen

Recent publications have emphasised map-reduce as a general programming model (labelled Google map-reduce), and described existing high-performance implementations for large data sets. We present two parallel implementations for this Google map-reduce skeleton, one following earlier work, and one optimised version, in the parallel Haskell extension Eden. Eden's specific features, like lazy stream processing, dynamic reply channels, and nondeterministic stream merging, support the efficient implementation of the complex coordination structure of this skeleton. We compare the two implementations of the Google map-reduce skeleton in usage and performance, and deliver runtime analyses for example applications. Although very flexible, the Google map-reduce skeleton is often too general, and typical examples reveal a better runtime behaviour using alternative skeletons.
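
For readers unfamiliar with the programming model, the following is a sequential reference sketch of a Google-style map-reduce in Python. It is not one of the two Eden implementations from the paper; the function names and the word-count example are assumptions chosen for illustration.

```python
from collections import defaultdict

def map_reduce(map_fn, reduce_fn, inputs):
    """Sequential reference version of a Google-style map-reduce.

    map_fn(key, value)      -> iterable of (key2, value2) pairs
    reduce_fn(key2, values) -> reduced value for key2
    """
    groups = defaultdict(list)
    for key, value in inputs.items():
        for key2, value2 in map_fn(key, value):
            groups[key2].append(value2)      # group intermediate values by key
    return {key2: reduce_fn(key2, values) for key2, values in groups.items()}

def count_words(_name, text):
    # Emit one (word, 1) pair per word occurrence.
    return [(word, 1) for word in text.split()]

def total(_word, counts):
    return sum(counts)

docs = {"a.txt": "eden haskell eden", "b.txt": "haskell skeleton"}
word_counts = map_reduce(count_words, total, docs)
```

A parallel version would distribute the map calls and the per-key reductions over processes; the grouping step in the middle is what requires coordination between them.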

European Conference on Parallel Processing | 2003

High-Level Process Control in Eden

Jost Berthold; Ulrike Klusik; Rita Loogen; Steffen Priebe; Nils Weskamp

High-level control of parallel process behaviour simplifies the development of parallel software substantially by freeing the programmer from low-level process management and coordination details. The latter are handled by a sophisticated runtime system which controls program execution. In this paper we look behind the scenes and show how the enormous gap between high-level parallel language constructs and their low-level implementation has been bridged in the implementation of the parallel functional language Eden. The main idea has been to implement the process control in a functional language and to restrict the extensions of the low-level runtime system to a few selected primitive operations.

Functional High Performance Computing | 2012

Financial software on GPUs: between Haskell and Fortran

Cosmin E. Oancea; Christian Andreetta; Jost Berthold; Alain Frisch; Fritz Henglein

This paper presents a real-world pricing kernel for financial derivatives and evaluates the language and compiler tool chain that would allow expressive, hardware-neutral algorithm implementation and efficient execution on graphics-processing units (GPU). The language issues refer to preserving algorithmic invariants, e.g., inherent parallelism made explicit by map-reduce-scan functional combinators. Efficient execution is achieved by manually applying a series of generally-applicable compiler transformations that allow the generated OpenCL code to yield speedups as high as 70x and 540x on a commodity mobile and desktop GPU, respectively. Apart from the concrete speed-ups attained, our contributions are twofold: First, from a language perspective, we illustrate that even state-of-the-art auto-parallelization techniques are incapable of discovering all the requisite data parallelism when rendering the functional code in Fortran-style imperative array processing form. Second, from a performance perspective, we study which compiler transformations are necessary to map the high-level functional code to hand-optimized OpenCL code for GPU execution. We discover a rich optimization space with nontrivial trade-offs and cost models. Memory reuse in map-reduce patterns, strength reduction, branch divergence optimization, and memory access coalescing exhibit significant impact individually. When combined, they enable essentially full utilization of all GPU cores. Functional programming has played a crucial double role in our case study: Capturing the naturally data-parallel structure of the pricing algorithm in a transparent, reusable and entirely hardware-independent fashion; and supporting the correctness of the subsequent compiler transformations to a hardware-oriented target language by a rich class of universally valid equational properties.
Given the observed difficulty of automatically parallelizing imperative sequential code and the inherent labor of porting hardware-oriented and -optimized programs, our case study suggests that functional programming technology can facilitate high-level expression of leading-edge performant portable high-performance systems for massively parallel hardware architectures.

International Symposium on Functional and Logic Programming | 2010

A skeleton for distributed work pools in Eden

Mischa Dieterle; Jost Berthold; Rita Loogen

We present a flexible skeleton for implementing distributed work pools in our parallel functional language Eden. The skeleton manages a pool of tasks (work pool) in a distributed manner using a demand-driven work stealing approach for load balancing. All coordination is done locally within the worker processes. The latter are arranged in a ring topology and exchange additional channels to shortcut communication paths. The skeleton is suited for different types of algorithms, namely simple data-parallel ones, standard tree search algorithms like backtracking, and algorithms that use a global state, as needed for branch-and-bound. Runtime experiments reveal a stable runtime behaviour for the different algorithm classes as illustrated by activity profiles (timeline diagrams). Acceptable speedups can be achieved with low effort.
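
The idea of demand-driven work stealing in a ring can be illustrated with a small sequential simulation in Python. This is a sketch under stated assumptions, not the Eden skeleton: `ring_work_pool` and `visit` are hypothetical names, all tasks start at worker 0, an idle worker asks its ring successors one by one, and the shortcut channels and global state mentioned in the abstract are not modelled.

```python
from collections import deque

def ring_work_pool(num_workers, initial_tasks, process):
    """Simulate workers in a ring, each with a local task pool.

    process(task) must return (result, new_tasks). There is no master:
    an idle worker requests work from its ring successors in turn.
    """
    if num_workers < 1:
        raise ValueError("need at least one worker")
    pools = [deque() for _ in range(num_workers)]
    pools[0].extend(initial_tasks)           # assumption: all work starts at worker 0
    results, steals = [], 0
    while any(pools):
        for w in range(num_workers):
            if not pools[w]:
                # Demand-driven: only an idle worker asks for work.
                for step in range(1, num_workers):
                    victim = pools[(w + step) % num_workers]
                    if victim:
                        pools[w].append(victim.popleft())   # take the oldest task
                        steals += 1
                        break
            if pools[w]:
                result, new_tasks = process(pools[w].pop()) # newest local task first
                results.append(result)
                pools[w].extend(new_tasks)
    return results, steals

def visit(depth):
    # Hypothetical backtracking-style task on a binary tree of depth 3.
    children = [depth + 1, depth + 1] if depth < 3 else []
    return 1, children

ring_results, ring_steals = ring_work_pool(4, [0], visit)
```

Taking the newest task locally and the oldest task when stealing is a common work-stealing convention; the source does not state which policy the skeleton uses.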

Parallel Computing Technologies | 2009

Parallel FFT with Eden Skeletons

Jost Berthold; Mischa Dieterle; Oleg Lobachev; Rita Loogen

The paper investigates and compares skeleton-based Eden implementations of different FFT-algorithms on workstation clusters with distributed memory. Our experiments show that the basic divide-and-conquer versions suffer from an inherent input distribution and result collection problem. Advanced approaches like calculating FFT using a parallel map-and-transpose skeleton provide more flexibility to overcome these problems. Assuming a distributed access to input data and re-organising computation to return results in a distributed way improves the parallel runtime behaviour.
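
The input distribution and result collection problem of the divide-and-conquer versions is easier to see in code. The following is a plain sequential radix-2 FFT in Python, shown as a generic textbook sketch and not as the paper's Eden code; the function name `fft` is chosen here for illustration.

```python
import cmath

def fft(xs):
    """Recursive radix-2 divide-and-conquer FFT (length must be a power of two)."""
    n = len(xs)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(xs)
    # Divide: the input is split into even- and odd-indexed halves.
    even = fft(xs[0::2])
    odd = fft(xs[1::2])
    # Combine: both sub-results are needed in full to build the output.
    half = n // 2
    twiddled = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(half)]
    return ([even[k] + twiddled[k] for k in range(half)] +
            [even[k] - twiddled[k] for k in range(half)])

spectrum = fft([1, 1, 1, 1])
```

If the two recursive calls ran as separate processes, each level would have to ship half of its input out and gather a full result back, which is the overhead the abstract refers to.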

International Symposium on Parallel and Distributed Processing and Applications | 2008

Parallelism without Pain: Orchestrating Computational Algebra Components into a High-Performance Parallel System

Abdallah Al Zain; Philip W. Trinder; Kevin Hammond; Alexander Konovalov; Steve Linton; Jost Berthold

This paper describes a very high-level approach that aims to orchestrate sequential components written using high-level domain-specific programming into high-performance parallel applications. By achieving this goal, we hope to make parallel programming more accessible to experts in mathematics, engineering and other domains. A key feature of our approach is that parallelism is achieved without any modification to the underlying sequential computational algebra systems, or to the user-level components: rather, all orchestration is performed at an outer level, with sequential components linked through a standard communication protocol, the Symbolic Computing Software Composability Protocol, SCSCP. Despite the generality of our approach, our results show that we are able to achieve very good, and even, in some cases, super-linear, speedups on clusters of commodity workstations: up to a factor of 33.4 on a 28-processor cluster. We are, moreover, able to parallelise a wider variety of problems, and achieve higher performance than typical specialist parallel computational algebra implementations.
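
For reference, "super-linear" means the speedup exceeds the number of processors, so the parallel efficiency is above 1. Using the standard definition of efficiency as speedup divided by processor count (the symbols S, p, and E are notation introduced here, not taken from the paper), the figures quoted in the abstract give:

```latex
\[
E = \frac{S}{p} = \frac{33.4}{28} \approx 1.19 > 1
\]
```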

International Conference on Computational Science | 2004

Towards a Generalised Runtime Environment for Parallel Haskells

Jost Berthold

Implementations of parallel dialects (or: coordination languages) on a functional base (or: computation) language always have to extend complex runtime environments by the even more complex parallelism to maintain a high level of abstraction. Starting from two parallel dialects of the purely functional language Haskell and their implementations, we generalise the characteristics of Haskell-based parallel language implementations, abstracting over low-level details. This generalisation is the basis for a shared runtime environment which can support different coordination concepts and ease the implementation of new constructs by a well-defined API and a layered structure.

International Conference on Parallel Processing | 2009

Comparing and Optimising Parallel Haskell Implementations for Multicore Machines

Jost Berthold; Simon Marlow; Kevin Hammond; Abdallah Al Zain

In this paper, we investigate the differences and tradeoffs imposed by two parallel Haskell dialects running on multicore machines. GpH and Eden are both constructed using the highly-optimising sequential GHC compiler, and share thread scheduling, and other elements, from a common code base. The GpH implementation investigated here uses a physically-shared heap, which should be well-suited to multicore architectures. In contrast, the Eden implementation adopts an approach that has been designed for use on distributed-memory parallel machines: a system of multiple, independent heaps (one per core), with inter-core communication handled by message-passing rather than through shared heap cells. We report two main results. Firstly, we report on the effect of a number of optimisations that we applied to the shared-memory GpH implementation in order to address some performance issues that were revealed by our testing: for example, we implemented a work-stealing approach to task allocation. Our optimisations improved the performance of the shared-heap GpH implementation by as much as 30% on eight cores. Secondly, the shared heap approach is, rather surprisingly, not superior to a distributed heap implementation: both give similar performance results.