Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Ben Lippmeier is active.

Publication


Featured research published by Ben Lippmeier.


International Conference on Functional Programming (ICFP) | 2010

Regular, shape-polymorphic, parallel arrays in Haskell

Gabriele Keller; Manuel M. T. Chakravarty; Roman Leshchinskiy; Simon L. Peyton Jones; Ben Lippmeier

We present a novel approach to regular, multi-dimensional arrays in Haskell. The main highlights of our approach are that it (1) is purely functional, (2) supports reuse through shape polymorphism, (3) avoids unnecessary intermediate structures rather than relying on subsequent loop fusion, and (4) supports transparent parallelisation. We show how to embed two forms of shape polymorphism into Haskell's type system using type classes and type families. In particular, we discuss the generalisation of regular array transformations to arrays of higher rank, and introduce a type-safe specification of array slices. We discuss the runtime performance of our approach for three standard array algorithms. We achieve absolute performance comparable to handwritten C code. At the same time, our implementation scales well up to 8 processor cores.
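The shape-polymorphic indices the abstract describes can be sketched in plain Haskell. This is a minimal stand-in for the Repa library's actual `Z` and `(:.)` index types, with only a `Shape` class defined here for illustration; the real library provides a richer interface.

```haskell
{-# LANGUAGE TypeOperators, FlexibleInstances #-}

-- A minimal sketch of Repa-style shape-polymorphic indices.
-- Z is the rank-0 shape; (:.) adds one dimension to a shape.
data Z = Z
data tail :. head = tail :. head
infixl 3 :.

type DIM1 = Z :. Int
type DIM2 = DIM1 :. Int

-- Operations are written once, for every rank, via a type class.
class Shape sh where
  size    :: sh -> Int        -- total number of elements
  toIndex :: sh -> sh -> Int  -- row-major linear index within an extent

instance Shape Z where
  size Z = 1
  toIndex Z Z = 0

instance Shape sh => Shape (sh :. Int) where
  size (sh :. n) = size sh * n
  toIndex (sh :. n) (ix :. i) = toIndex sh ix * n + i

main :: IO ()
main = print (size (Z :. 2 :. 3 :: DIM2))  -- 6
```

Because `size` and `toIndex` recurse over the index structure, the same code serves vectors, matrices, and arrays of any higher rank.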


International Conference on Functional Programming (ICFP) | 2013

Optimising purely functional GPU programs

Trevor L. McDonell; Manuel M. T. Chakravarty; Gabriele Keller; Ben Lippmeier

Purely functional, embedded array programs are a good match for SIMD hardware, such as GPUs. However, the naive compilation of such programs quickly leads to both code explosion and an excessive use of intermediate data structures. The resulting slow-down is not acceptable on target hardware that is usually chosen to achieve high performance. In this paper, we discuss two optimisation techniques, sharing recovery and array fusion, that tackle code explosion and eliminate superfluous intermediate structures. Both techniques are well known from other contexts, but they present unique challenges for an embedded language compiled for execution on a GPU. We present novel methods for implementing sharing recovery and array fusion, and demonstrate their effectiveness on a set of benchmarks.
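The code-explosion problem that sharing recovery addresses can be seen in any deep embedding. The sketch below is illustrative, not Accelerate's implementation: host-level sharing of a subterm is invisible in the reified AST, so a naive traversal duplicates work exponentially in the nesting depth.

```haskell
-- A tiny deeply-embedded expression language.
data Exp = Lit Int | Add Exp Exp deriving Show

-- Host-level sharing: x occurs once in the source program...
square :: Exp -> Exp
square x = Add x x

-- ...but the reified tree contains two copies of it.
nodeCount :: Exp -> Int
nodeCount (Lit _)   = 1
nodeCount (Add a b) = 1 + nodeCount a + nodeCount b

main :: IO ()
main = do
  let e = square (square (Lit 1))
  -- 2^(depth+1) - 1 nodes, despite the host program sharing each subterm.
  print (nodeCount e)  -- 7
```

Sharing recovery detects such duplicated subterms and reintroduces explicit let-bindings before compilation, so each shared computation is emitted once.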


Haskell Symposium | 2012

Guiding parallel array fusion with indexed types

Ben Lippmeier; Manuel M. T. Chakravarty; Gabriele Keller; Simon L. Peyton Jones

We present a refined approach to parallel array fusion that uses indexed types to specify the internal representation of each array. Our approach aids the client programmer in reasoning about the performance of their program in terms of the source code. It also makes the intermediate code easier to transform at compile-time, resulting in faster compilation and more reliable runtimes. We demonstrate how our new approach improves both the clarity and performance of several end-user written programs, including a fluid flow solver and an interpolator for volumetric data.
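The idea of indexing an array type by its internal representation can be sketched with a GADT. This is a simplification of Repa 3's `Array r sh e`: the tag names are the library's, but the constructors here are illustrative, and a plain list stands in for an unboxed vector.

```haskell
{-# LANGUAGE GADTs #-}

-- Representation tags: D for delayed, U for manifest (unboxed).
data D
data U

-- The index r records, in the types, how each array is represented.
data Array r e where
  ADelayed  :: Int -> (Int -> e) -> Array D e
  AManifest :: [e]               -> Array U e  -- list stands in for a vector

-- map yields a delayed array: no intermediate structure is allocated.
amap :: (a -> b) -> Array r a -> Array D b
amap f (ADelayed n g) = ADelayed n (f . g)
amap f (AManifest xs) = ADelayed (length xs) (f . (xs !!))

-- Forcing a delayed array to a manifest one is an explicit step,
-- so the programmer can see where real work and allocation happen.
compute :: Array D e -> Array U e
compute (ADelayed n g) = AManifest (map g [0 .. n - 1])

toList :: Array r e -> [e]
toList (ADelayed n g) = map g [0 .. n - 1]
toList (AManifest xs) = xs

main :: IO ()
main = print (toList (compute (amap (* 2) (AManifest [1, 2, 3 :: Int]))))
```

Because fusion (composing delayed arrays) and materialisation (`compute`) are distinct in the types, the source program documents its own runtime behaviour.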


Haskell Symposium | 2011

Efficient parallel stencil convolution in Haskell

Ben Lippmeier; Gabriele Keller

Stencil convolution is a fundamental building block of many scientific and image processing algorithms. We present a declarative approach to writing such convolutions in Haskell that is both efficient at runtime and implicitly parallel. To achieve this we extend our prior work on the Repa array library with two new features: partitioned and cursored arrays. Combined with careful management of the interaction between GHC and its back-end code generator LLVM, we achieve performance comparable to the standard OpenCV library.
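A stencil convolution in its simplest form looks like the sketch below: a one-dimensional `[1,2,1]` stencil over plain lists, with out-of-range indices clamped to the border. The paper's library instead works on parallel unboxed arrays, and its partitioned arrays exist precisely so the inner region can skip the bounds clamping that this naive version performs at every element.

```haskell
-- Naive 1-D [1,2,1] stencil with clamped borders.
stencil121 :: [Int] -> [Int]
stencil121 xs = [ get (i - 1) + 2 * get i + get (i + 1) | i <- [0 .. n - 1] ]
  where
    n = length xs
    get i = xs !! max 0 (min (n - 1) i)  -- clamp out-of-range indices

main :: IO ()
main = print (stencil121 [1, 2, 3, 4])  -- [5,8,12,15]
```

Partitioning splits the array into border regions (where clamping is needed) and an inner region (where it is not), so the hot inner loop contains no bounds checks.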


Haskell Symposium | 2013

Data flow fusion with series expressions in Haskell

Ben Lippmeier; Manuel M. T. Chakravarty; Gabriele Keller; Amos Robinson

Existing approaches to array fusion can deal with straight-line producer-consumer pipelines, but cannot fuse branching data flows where a generated array is consumed by several different consumers. Branching data flows are common and natural to write, but a lack of fusion leads to the creation of an intermediate array at every branch point. We present a new array fusion system that handles branches, based on Waters's series expression framework, but extended to work in a functional setting. Our system also solves a related problem in stream fusion, namely the introduction of duplicate loop counters. We demonstrate speedup over existing fusion systems for several key examples.
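A branching data flow and its fused form can be sketched as follows. Computing both the mean and the maximum of one series is a branch: the producer feeds two consumers. The hand-fused single pass below shows the shape of loop the paper's system aims to produce automatically (the function names are illustrative).

```haskell
import Data.List (foldl')

-- Unfused: the series xs is consumed by three traversals.
meanMaxUnfused :: [Double] -> (Double, Double)
meanMaxUnfused xs = (sum xs / fromIntegral (length xs), maximum xs)

-- Hand-fused: one strict pass does the work of every branch.
meanMaxFused :: [Double] -> (Double, Double)
meanMaxFused xs = (s / fromIntegral n, m)
  where
    (s, n, m) = foldl' step (0, 0 :: Int, -1 / 0) xs
    step (s', n', m') x = (s' + x, n' + 1, max m' x)

main :: IO ()
main = print (meanMaxFused [1, 2, 3, 4])  -- (2.5,4.0)
```

Stream fusion handles each consumer separately but cannot merge them, so the producer is either re-run or materialised; fusing the branch point eliminates both costs.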


SIGPLAN Notices | 2013

Vectorisation avoidance

Gabriele Keller; Manuel M. T. Chakravarty; Roman Leshchinskiy; Ben Lippmeier; Simon L. Peyton Jones

Flattening nested parallelism is a vectorising code transform that converts irregular nested parallelism into flat data parallelism. Although the result has good asymptotic performance, flattening thoroughly restructures the code. Many intermediate data structures and traversals are introduced, which may or may not be eliminated by subsequent optimisation. We present a novel program analysis to identify parts of the program where flattening would only introduce overhead, without appropriate gain. We present empirical evidence that avoiding vectorisation in these cases leads to more efficient programs than if we had applied vectorisation and then relied on array fusion to eliminate intermediates from the resulting code.


International Conference on Functional Programming (ICFP) | 2012

Work efficient higher-order vectorisation

Ben Lippmeier; Manuel M. T. Chakravarty; Gabriele Keller; Roman Leshchinskiy; Simon L. Peyton Jones

Existing approaches to higher-order vectorisation, also known as flattening nested data parallelism, do not preserve the asymptotic work complexity of the source program. Straightforward examples, such as sparse matrix-vector multiplication, can suffer a severe blow-up in both time and space, which limits the practicality of this method. We discuss why this problem arises, identify the mis-handling of index space transforms as the root cause, and present a solution using a refined representation of nested arrays. We have implemented this solution in Data Parallel Haskell (DPH) and present benchmarks showing that realistic programs, which used to suffer the blow-up, now have the correct asymptotic work complexity. In some cases, the asymptotic complexity of the vectorised program is even better than the original.
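The flattened representation at the heart of vectorisation can be sketched as follows: a nested array becomes flat element data plus a segment descriptor recording the length of each subarray. The names here are illustrative simplifications of DPH's segmented arrays, and a segmented sum stands in for the segmented operations whose work complexity the paper is about.

```haskell
-- Flattening: [[1,2,3],[],[4,5]] becomes segment lengths [3,0,2]
-- over the flat element data [1,2,3,4,5].
newtype Segd = Segd [Int]

-- Segmented sum: one pass over the flat data, so the work is
-- proportional to the total number of elements.
sumSegs :: Segd -> [Int] -> [Int]
sumSegs (Segd lens) = go lens
  where
    go []       _  = []
    go (l : ls) ys = let (seg, rest) = splitAt l ys
                     in  sum seg : go ls rest

main :: IO ()
main = print (sumSegs (Segd [3, 0, 2]) [1, 2, 3, 4, 5])  -- [6,0,9]
```

The blow-up the paper fixes arises when index-space transforms force segmented operations to copy or expand this flat data unnecessarily, losing the work bound that the flat representation is meant to provide.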


Functional High-Performance Computing (FHPC) | 2016

Polarized data parallel data flow

Ben Lippmeier; Fil Mackay; Amos Robinson

We present an approach to writing fused data parallel data flow programs where the library API guarantees that the client programs run in constant space. Our constant space guarantee is achieved by observing that binary stream operators can be provided in several polarity versions. Each polarity version uses a different combination of stream sources and sinks, and some versions allow constant space execution while others do not. Our approach is embodied in the Repa Flow Haskell library, which we are currently using for production workloads at Vertigo.
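The two polarities can be sketched as a pull-based `Source` and a push-based `Sink`. This is a simplified, element-at-a-time illustration of the idea, not the Repa Flow API, which works over chunks of arrays; all names below are assumptions made for the sketch.

```haskell
import Data.IORef

newtype Source a = Source (IO (Maybe a))  -- pull polarity: ask for the next element
newtype Sink   a = Sink   (a -> IO ())    -- push polarity: accept one element

-- Draining a Source into a Sink runs in constant space:
-- only one element is live at a time, and no list is built.
drain :: Source a -> Sink a -> IO ()
drain src@(Source pull) snk@(Sink push) = do
  mx <- pull
  case mx of
    Nothing -> return ()
    Just x  -> push x >> drain src snk

sourceFromList :: [a] -> IO (Source a)
sourceFromList xs0 = do
  ref <- newIORef xs0
  return $ Source $ do
    xs <- readIORef ref
    case xs of
      []       -> return Nothing
      (y : ys) -> writeIORef ref ys >> return (Just y)

main :: IO ()
main = do
  src <- sourceFromList [1 .. 5 :: Int]
  drain src (Sink print)
```

The key observation is that a binary operator such as merge can be given several polarity versions (two Sources in, one Source out; one Source and one Sink; and so on), and only some combinations admit constant-space execution, which the library's types then guarantee.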


Functional High-Performance Computing (FHPC) | 2014

Fusing filters with integer linear programming

Amos Robinson; Ben Lippmeier; Gabriele Keller

The key to compiling functional, collection oriented array programs into efficient code is to minimise memory traffic. Simply fusing subsequent array operations into a single computation is not sufficient; we also need to cluster separate traversals of the same array into a single traversal. Previous work demonstrated how Integer Linear Programming (ILP) can be used to cluster the operators in a general data-flow graph into subgraphs, which can be individually fused. However, these approaches can only handle operations which preserve the size of the array, thereby missing out on some optimisation opportunities. This paper addresses this shortcoming by extending the ILP approach with support for size-changing operations, using an external ILP solver to find good clusterings.


Principles and Practice of Declarative Programming (PPDP) | 2017

Machine fusion: merging merges, more or less

Amos Robinson; Ben Lippmeier

Compilers for stream programs often rely on a fusion transformation to convert the implied dataflow network into low-level iteration based code. Different fusion transformations handle different sorts of networks, with the distinguishing criteria being whether the network may contain splits and joins, and whether the set of fusible operators can be extended. We present the first fusion system that simultaneously satisfies all three of these criteria: networks can contain splits, joins, and new operators can be added to the system without needing to modify the overall fusion transformation.

Collaboration


Dive into Ben Lippmeier's collaborations.

Top Co-Authors

Gabriele Keller, University of New South Wales
Amos Robinson, University of New South Wales
Roman Leshchinskiy, University of New South Wales
Trevor L. McDonell, University of New South Wales