
Publication


Featured research published by Michael Haidl.


Proceedings of the 2014 LLVM Compiler Infrastructure in HPC | 2014

PACXX: Towards a Unified Programming Model for Programming Accelerators Using C++14

Michael Haidl; Sergei Gorlatch

We present PACXX -- a unified programming model for many-core systems that comprise accelerators such as Graphics Processing Units (GPUs). One of the main difficulties of current GPU programming is that two distinct programming models are required: the host code for the CPU is written in C/C++ with a restricted, C-like API for memory management, while the device code for the GPU has to be written in a device-dependent, explicitly parallel programming model, e.g., OpenCL or CUDA. This leads to long, poorly structured, and error-prone code. In PACXX, both host and device programs are written in the same programming language -- the newest C++14 standard, with all modern features including type inference (auto), variadic templates, generic lambda expressions, as well as STL containers and algorithms. We implement PACXX by a custom compiler (based on the Clang front-end and LLVM IR) and a runtime system that together perform the major tasks of memory management and data synchronization automatically and transparently for the programmer. We evaluate our approach by comparing it to CUDA and OpenCL regarding program size and target performance.
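
The C++14 features named above (type inference, variadic templates, generic lambdas, STL algorithms) can be illustrated in plain standard C++. The snippet below is only a sketch of the coding style PACXX enables, not the PACXX API itself, and runs as ordinary host code; the names `scale`, `sum`, and `saxpy` are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Generic lambda (C++14): one definition works for any multiplicable types.
auto scale = [](auto x, auto factor) { return x * factor; };

// Variadic template: sums an arbitrary number of arguments.
template <typename T>
T sum(T v) { return v; }
template <typename T, typename... Ts>
T sum(T first, Ts... rest) { return first + sum(rest...); }

// SAXPY written with STL containers and algorithms -- the single-source
// style PACXX aims to run unchanged on CPU and GPU (here: plain host code).
std::vector<float> saxpy(float a, const std::vector<float>& x,
                         const std::vector<float>& y) {
  std::vector<float> out(x.size());
  std::transform(x.begin(), x.end(), y.begin(), out.begin(),
                 [a](float xi, float yi) { return a * xi + yi; });
  return out;
}
```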


Parallel Processing Letters | 2014

High-Level Programming of Stencil Computations on Multi-GPU Systems Using the SkelCL Library

Michel Steuwer; Michael Haidl; Stefan Breuer; Sergei Gorlatch

The implementation of stencil computations on modern, massively parallel systems with GPUs and other accelerators currently relies on manually-tuned coding using low-level approaches like OpenCL and CUDA. This makes development of stencil applications a complex, time-consuming, and error-prone task. We describe how stencil computations can be programmed in our SkelCL approach that combines high-level programming abstractions with competitive performance on multi-GPU systems. SkelCL extends the OpenCL standard by three high-level features: 1) pre-implemented parallel patterns (a.k.a. skeletons); 2) container data types for vectors and matrices; 3) an automatic data (re)distribution mechanism. We introduce two new SkelCL skeletons which specifically target stencil computations – MapOverlap and Stencil – describe their use for particular application examples, discuss their efficient parallel implementation, and report experimental results on systems with multiple GPUs. Our evaluation of three real-world applications shows that stencil code written with SkelCL is considerably shorter and offers performance competitive with hand-tuned OpenCL code.
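
A minimal sketch of what a MapOverlap-style skeleton abstracts, in plain C++ rather than the actual SkelCL API: each output element is computed from a neighborhood of the input, with out-of-range accesses clamped to the boundary (one possible boundary policy). The names `map_overlap` and `average3` are illustrative.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Plain-C++ sketch of a MapOverlap-style skeleton: every output element is
// computed by `f` from a neighborhood of radius `r` around the input element.
std::vector<float> map_overlap(
    const std::vector<float>& in, int r,
    const std::function<float(const std::vector<float>&, int, int)>& f) {
  std::vector<float> out(in.size());
  for (int i = 0; i < (int)in.size(); ++i) out[i] = f(in, i, r);
  return out;
}

// Example stencil: (2r+1)-point average with clamped boundary indices.
float average3(const std::vector<float>& v, int i, int r) {
  int n = (int)v.size();
  auto clamp = [n](int j) { return j < 0 ? 0 : (j >= n ? n - 1 : j); };
  float s = 0.0f;
  for (int d = -r; d <= r; ++d) s += v[clamp(i + d)];
  return s / (2 * r + 1);
}
```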


Optics Express | 2013

Linearly polarized emission from random lasers with anisotropically amplifying media

Sebastian Knitter; Michael Kues; Michael Haidl; Carsten Fallnich

Simulations of three-dimensional random lasers were performed by finite-difference time-domain integration of Maxwell's equations combined with rate equations providing gain. We investigated the frequency-dependent emission polarization of random lasers in the far-field of the sample and characterized the influence of anisotropic pumping in orthogonal polarizations. Under weak scattering, the polarization states of random lasing modes were random for isotropic pumping and linear under anisotropic pumping. These findings are in accordance with recent experimental observations. A crossover was observed towards very strong scattering, in which the scattering destroys the pump-induced polarization-anisotropy of the random lasing modes and randomizes (scrambles) the mode-polarization.
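
The paper's 3-D FDTD-with-gain simulations are far beyond a snippet, but the underlying leapfrog time stepping can be sketched in one dimension. Below is a minimal 1-D FDTD loop in normalized units with a soft Gaussian source; gain, rate equations, and random scatterers are all omitted, and `fdtd_1d` is an illustrative name, not code from the paper.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Minimal 1-D FDTD leapfrog loop in normalized units: the E and H fields
// are updated alternately each time step, and a soft Gaussian source is
// injected near the left boundary.
std::vector<double> fdtd_1d(int cells, int steps) {
  std::vector<double> Ez(cells, 0.0), Hy(cells, 0.0);
  const double c = 0.5;  // Courant number < 1 keeps the scheme stable
  for (int t = 0; t < steps; ++t) {
    for (int i = 0; i < cells - 1; ++i) Hy[i] += c * (Ez[i + 1] - Ez[i]);
    for (int i = 1; i < cells; ++i) Ez[i] += c * (Hy[i] - Hy[i - 1]);
    Ez[5] += std::exp(-(t - 30.0) * (t - 30.0) / 100.0);  // soft source
  }
  return Ez;
}
```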


The Journal of Supercomputing | 2017

A GPU parallelization of branch-and-bound for multiproduct batch plants optimization

Andrey Borisenko; Michael Haidl; Sergei Gorlatch

Branch-and-bound (B&B) is a popular approach to accelerating the solution of optimization problems, but its parallelization on graphics processing units (GPUs) is challenging because of B&B’s irregular data structures and poor computation/communication ratio. The contributions of this paper are as follows: (1) we develop two CUDA-based implementations (iterative and recursive) of B&B on systems with GPUs for a practical application scenario—optimal design of multi-product batch plants, with a particular example of a chemical-engineering system (CES); (2) we propose and implement several optimizations of our CUDA code by reducing branch divergence and by exploiting the properties of the GPU memory hierarchy; and (3) we evaluate our implementations and their optimizations on a modern GPU-based system and report our experimental results.
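
As a sketch of the B&B pattern the paper parallelizes (branching on decisions, pruning subtrees via an optimistic bound), here is a small sequential recursive B&B for the 0/1 knapsack problem; the paper's actual CES design problem and its CUDA mapping are not reproduced, and all names here are illustrative.

```cpp
#include <cassert>
#include <vector>

// Sequential recursive branch-and-bound for 0/1 knapsack: branch on
// take/skip decisions, prune when even an optimistic estimate (total value
// of all remaining items) cannot beat the incumbent best solution.
struct Item { int value, weight; };

void bnb(const std::vector<Item>& items, int i, int cap, int value,
         int& best) {
  if (value > best) best = value;
  if (i == (int)items.size()) return;
  int bound = value;  // optimistic bound: take every remaining item
  for (int j = i; j < (int)items.size(); ++j) bound += items[j].value;
  if (bound <= best) return;   // prune this subtree
  if (items[i].weight <= cap)  // branch 1: take item i
    bnb(items, i + 1, cap - items[i].weight, value + items[i].value, best);
  bnb(items, i + 1, cap, value, best);  // branch 2: skip item i
}

int knapsack(const std::vector<Item>& items, int cap) {
  int best = 0;
  bnb(items, 0, cap, 0, best);
  return best;
}
```

The irregularity the abstract mentions is visible even here: the two recursive branches do unpredictable amounts of work, which is what makes a GPU mapping hard.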


ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 2016

Multi-stage programming for GPUs in C++ using PACXX

Michael Haidl; Michel Steuwer; Tim Humernbrum; Sergei Gorlatch

Writing and optimizing programs for high performance on systems with Graphics Processing Units (GPUs) remains a challenging task even for expert programmers. A promising optimization technique is multi-stage programming -- evaluating parts of the program upfront on the CPU and embedding the computed values in the GPU code, thus allowing for more aggressive compiler optimizations. Unfortunately, such optimizations are not possible in CUDA, whereas to apply them in OpenCL, programmers are forced to manipulate the GPU source code as plain strings, which is error-prone and type-unsafe. In this paper, we describe PACXX -- our approach to GPU programming in C++, with the convenient features of the modern C++14 standard: type deduction, lambda expressions, and algorithms from the standard template library (STL). Using PACXX, a GPU program is written as a single C++ program, rather than two distinct host and kernel programs. We extend PACXX with an easy-to-use and type-safe API for multi-stage programming avoiding the pitfalls of string manipulation. Using just-in-time compilation techniques, PACXX generates efficient GPU code at runtime. Our evaluation shows that PACXX makes writing multi-stage code easier and safer than is currently possible in CUDA or OpenCL. With two application studies we demonstrate that multi-stage programs can significantly outperform equivalent non-staged versions. Furthermore, we show that PACXX generates code with high performance, comparable to industrial-strength OpenCL compilers.
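
The staging idea (evaluate a value upfront on the host, embed it as a constant in the kernel) can be sketched in plain C++14 with a lambda factory. The hypothetical `make_normalize_kernel` below is not the PACXX multi-stage API, only an illustration of value embedding.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Stage 1 runs on the host: the reciprocal of the problem size is computed
// once. Stage 2 is the "kernel": it captures that value as a constant, so
// the per-element code contains no division -- the kind of embedded constant
// a JIT compiler can fold into aggressive optimizations.
auto make_normalize_kernel(int n) {
  float inv_n = 1.0f / n;            // evaluated upfront, on the host
  return [inv_n](float x) {          // embedded as a constant in the kernel
    return x * inv_n;
  };
}

std::vector<float> normalize(const std::vector<float>& in) {
  auto kernel = make_normalize_kernel((int)in.size());
  std::vector<float> out;
  out.reserve(in.size());
  for (float x : in) out.push_back(kernel(x));
  return out;
}
```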


International Conference: Beyond Databases, Architectures and Structures | 2015

TripleID: A Low-Overhead Representation and Querying Using GPU for Large RDFs

Chantana Chantrapornchai; Chidchanok Choksuchat; Michael Haidl; Sergei Gorlatch

Resource Description Framework (RDF) is a commonly used format for semantic web processing. It contains strings representing terms and their relationships, which can be queried or inferred. RDF data is usually a large text file containing many millions of relationships. In this work, we propose a framework, TripleID, for processing queries over large RDF data. The framework utilises Graphics Processing Units (GPUs) to search RDF relations. The RDF data is first transformed into an encoded form suitable for storing in GPU memory. Then parallel threads on the GPU search for the required data. We show in the experiments that one GPU on a personal desktop can handle 100 million triple relations, while a traditional RDF processing tool can process up to 10 million triples. Furthermore, we can query sample relations within 0.18 s on 7 million triples with the GPU, while the traditional tool takes at least 6 s for 1.8 million triples.
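
A sketch of the TripleID representation, assuming the obvious dictionary-encoding scheme: each RDF term (string) is mapped to an integer ID once, triples are stored as compact ID tuples, and a query scans those tuples. The paper performs the scan with GPU threads; the serial loop below stands in for it, and `TripleStore` is an illustrative name.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Dictionary-encoded triple store: strings become integer IDs; triples
// become fixed-size ID tuples that are cheap to store and scan.
struct TripleStore {
  std::unordered_map<std::string, int> dict;
  std::vector<std::array<int, 3>> triples;

  int id(const std::string& term) {
    auto it = dict.find(term);
    if (it != dict.end()) return it->second;
    int fresh = (int)dict.size();
    dict.emplace(term, fresh);
    return fresh;
  }
  void add(const std::string& s, const std::string& p, const std::string& o) {
    triples.push_back({id(s), id(p), id(o)});
  }
  // Count triples matching a pattern; -1 in any position matches anything.
  int count(int s, int p, int o) const {
    int n = 0;
    for (const auto& t : triples)
      if ((s < 0 || t[0] == s) && (p < 0 || t[1] == p) && (o < 0 || t[2] == o))
        ++n;
    return n;
  }
};
```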


Proceedings of the Fourth Workshop on the LLVM Compiler Infrastructure in HPC | 2017

PACXXv2 + RV: An LLVM-based Portable High-Performance Programming Model

Michael Haidl; Simon Moll; Lars Klein; Huihui Sun; Sebastian Hack; Sergei Gorlatch

To achieve high performance on today's high-performance computing (HPC) systems, multiple programming models have to be used. An example of this burden on the developer is OpenCL: OpenCL's SPMD programming model must be used together with a host programming model, commonly C or C++. Different programming models require different compilers for code generation, which introduces challenges for the software developer, e.g., different compilers must be convinced to agree on basic properties like type layouts to avoid subtle bugs. Moreover, the resulting performance highly depends on the features of the compilers used and may vary unpredictably. We present PACXXv2 -- an LLVM-based, single-source, single-compiler programming model which integrates explicitly parallel SPMD programming into C++. Our novel CPU back-end provides portable and predictable performance on various state-of-the-art CPU architectures comprising Intel x86, IBM Power8 and ARM Cortex CPUs. We efficiently integrate the Region Vectorizer (RV) into our back-end and exploit its whole-function vectorization capabilities for our kernels. PACXXv2 utilizes C++ generalized attributes to transparently propagate information about memory allocations to the PACXX back-ends to enable additional optimizations. We demonstrate the high-performance capabilities of PACXXv2 together with RV on benchmarks from well-known benchmark suites and compare the performance of the generated code to Intel's OpenCL driver and POCL -- the portable OpenCL project based on LLVM.


New Trends in Software Methodologies, Tools and Techniques | 2015

Accelerating Keyword Search for Big RDF Web Data on Many-Core Systems

Chidchanok Choksuchat; Chantana Chantrapornchai; Michael Haidl; Sergei Gorlatch

Resource Description Framework (RDF) is the commonly used format for Semantic Web data. Huge amounts of RDF data on the Internet are used by search engines to answer user queries. Querying such big data requires suitable search methods backed by very high processing power, because traditional, sequential keyword matching on a semantic web server may take prohibitively long. In this paper, we aim at accelerating search in big RDF data by exploiting modern many-core architectures based on Graphics Processing Units (GPUs). We develop several implementations of RDF search for many-core architectures using two programming approaches: OpenMP for systems with CPUs and CUDA for systems comprising CPUs and GPUs. Experiments show that our approach is 20.5 times faster than the sequential search.
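
A sketch of the CPU-side parallelization idea, with `std::thread` standing in for the OpenMP worksharing the paper actually uses: the RDF lines are partitioned across workers and each worker counts keyword matches in its own chunk. The function name and partitioning are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <thread>
#include <vector>

// Partition the lines into contiguous chunks, one per worker thread; each
// worker writes into its own slot of `partial`, so no locking is needed.
int parallel_count(const std::vector<std::string>& lines,
                   const std::string& keyword, int workers) {
  std::vector<int> partial(workers, 0);
  std::vector<std::thread> pool;
  int chunk = ((int)lines.size() + workers - 1) / workers;
  for (int w = 0; w < workers; ++w) {
    pool.emplace_back([&, w] {
      int lo = w * chunk;
      int hi = std::min(lo + chunk, (int)lines.size());
      for (int i = lo; i < hi; ++i)
        if (lines[i].find(keyword) != std::string::npos) ++partial[w];
    });
  }
  for (auto& t : pool) t.join();
  int total = 0;
  for (int c : partial) total += c;
  return total;
}
```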


Proceedings of the Second Workshop on Optimizing Stencil Computations | 2014

HLSF: A High-Level, C++-Based Framework for Stencil Computations on Accelerators

Fabian Dütsch; Karim Djelassi; Michael Haidl; Sergei Gorlatch

The development of programs for modern systems with GPUs and other accelerators is a complex and error-prone task. Popular GPU programming approaches like CUDA and OpenCL require deep knowledge of the underlying architecture to achieve good performance. We present HLSF -- a high-level framework that greatly simplifies the development of stencil-based applications on systems with accelerators. The main novel features of HLSF are as follows: 1) it provides a high-level interface for stencils that hides from the programmer the low-level management of parallelism and memory on accelerators; 2) it allows the developer to write programs in pure C++ style, using all convenient features of the most recent C++14 standard. Our experimental evaluation shows that the framework significantly reduces the programming effort for stencil-based applications, while delivering performance competitive with CUDA and OpenCL.
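
As a sketch of the computation an HLSF-style stencil interface would abstract away, here is one Jacobi sweep of a 5-point stencil on a 2-D grid in plain C++ (boundary cells are copied unchanged); the accelerator mapping that HLSF automates is not shown, and `jacobi_step` is an illustrative name.

```cpp
#include <cassert>
#include <vector>

// One Jacobi sweep of a 5-point stencil on an nx-by-ny grid stored
// row-major: each interior cell becomes the average of its four
// neighbors; boundary cells keep their old values.
std::vector<double> jacobi_step(const std::vector<double>& g, int nx, int ny) {
  std::vector<double> out = g;
  for (int y = 1; y < ny - 1; ++y)
    for (int x = 1; x < nx - 1; ++x)
      out[y * nx + x] = 0.25 * (g[y * nx + x - 1] + g[y * nx + x + 1] +
                                g[(y - 1) * nx + x] + g[(y + 1) * nx + x]);
  return out;
}
```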


International Journal of Parallel Programming | 2018

High-Level Programming for Many-Cores Using C++14 and the STL

Michael Haidl; Sergei Gorlatch

Programming many-core systems with accelerators (e.g., GPUs) remains a challenging task, even for expert programmers. In the current, low-level approaches—OpenCL and CUDA—two distinct programming models are employed: the host code for the CPU is written in C/C++ with a restricted memory model, while the device code for the accelerator is written using a device-dependent model of CUDA or OpenCL. The programmer is responsible for explicitly specifying parallelism, memory transfers, and synchronization, and also for configuring the program and optimizing its performance for a particular many-core system. This leads to long, poorly structured and error-prone codes, often with suboptimal performance. We present PACXX—an alternative, unified programming approach for accelerators. In PACXX, both host and device programs are written in the same programming language—the newest C++14 standard with the Standard Template Library (STL), including all modern features: type inference (auto), variadic templates, generic lambda expressions, and the newly proposed parallel extensions of the STL. PACXX includes an easy-to-use and type-safe API for multi-stage programming which allows for aggressive runtime compiler optimizations. We implement PACXX by developing a custom compiler (based on the Clang and LLVM frameworks) and a runtime system that together perform memory management and synchronization automatically and transparently for the programmer. We evaluate our approach by comparing it to OpenCL regarding program size and target performance.
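
The STL-algorithm style described above can be shown with a one-line reduction. With the parallel STL extensions the abstract mentions (later standardized as C++17 execution policies), the same call could be dispatched to an accelerator; here it runs sequentially on the host, and `dot` is an illustrative name, not part of any PACXX API.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// A many-core-friendly computation expressed purely through an STL
// algorithm: the dot product as a single std::inner_product call.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
  return std::inner_product(a.begin(), a.end(), b.begin(), 0.0);
}
```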

Collaboration


An overview of Michael Haidl's collaborations.

Top Co-Authors

Andrey Borisenko

Tambov State Technical University


Ari Rasch

University of Münster
