Philipp Kegel | Researchain

Publication

Featured research published by Philipp Kegel.

IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum | 2011

SkelCL - A Portable Skeleton Library for High-Level GPU Programming

Michel Steuwer; Philipp Kegel; Sergei Gorlatch

While CUDA and OpenCL have made general-purpose programming for Graphics Processing Units (GPUs) popular, using these programming approaches remains complex and error-prone because they lack high-level abstractions. Systems with multiple GPUs, which are especially challenging, are not addressed at all by these low-level programming models. We propose SkelCL, a library providing so-called algorithmic skeletons that capture recurring patterns of parallel computation and communication, together with an abstract vector data type and constructs for specifying data distribution. We demonstrate that SkelCL greatly simplifies programming GPU systems. We report competitive performance results for SkelCL using both a simple Mandelbrot set computation and an industrial-strength medical imaging application. Because the library is implemented using OpenCL, it is portable across GPU hardware of different vendors.
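
The idea of an algorithmic skeleton can be illustrated with a minimal sketch in plain Python. This is not the SkelCL API (SkelCL is built on OpenCL), and the abstract does not name its skeletons; `map_skeleton`, `zip_skeleton`, and `reduce_skeleton` are hypothetical names chosen here for three patterns that commonly appear in the skeleton literature. Each skeleton fixes the pattern of computation and is customized with a user function.

```python
from functools import reduce as fold


def map_skeleton(f):
    """Pattern: apply f independently to every element of one vector."""
    def run(xs):
        return [f(x) for x in xs]
    return run


def zip_skeleton(f):
    """Pattern: combine two equally long vectors element by element."""
    def run(xs, ys):
        if len(xs) != len(ys):
            raise ValueError("vectors must have the same length")
        return [f(x, y) for x, y in zip(xs, ys)]
    return run


def reduce_skeleton(f, identity):
    """Pattern: collapse a vector to one value with a binary operator."""
    def run(xs):
        return fold(f, xs, identity)
    return run


# Customize the generic patterns with user functions (hypothetical example).
multiply = zip_skeleton(lambda x, y: x * y)
total = reduce_skeleton(lambda x, y: x + y, 0)


def dot(xs, ys):
    """Dot product expressed purely as a composition of two skeletons."""
    return total(multiply(xs, ys))


print(dot([1, 2, 3], [4, 5, 6]))  # prints 32
```

In this sketch the skeletons run sequentially; the point is only that the application code names patterns and supplies small functions, leaving the execution strategy to the library.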

Behavior Research Methods | 2009

Behavioral phenotyping of a murine model of Alzheimer's disease in a seminaturalistic environment using RFID tracking.

Lars Lewejohann; Anne Hoppmann; Philipp Kegel; Mareike Kritzler; Antonio Krüger; Norbert Sachser

Neurodegenerative disorders such as Alzheimer’s disease (AD) are increasingly threatening public health. Most animal models of AD consist of transgenic mice that are usually housed singly or in unisexual groups in small barren cages. Such restricted environments, however, prevent the mice from showing a variety of species-specific behaviors and consequently may constrain comprehensive behavioral phenotyping. On the other hand, allowing the animals to freely organize their lives in a spacious physically and socially enriched environment makes behavioral phenotyping laborious and time consuming. Radio frequency identification (RFID) using a network of antennae and small glass-coated transponders labeling each individual allows for gathering spatiotemporal information about a large number of individuals in parallel. The aim of this project was to use the RFID technique to facilitate the characterization of mice carrying a genetic disposition to develop AD-like pathology and of their wild-type conspecifics in a spacious seminaturalistic environment.

International Parallel and Distributed Processing Symposium | 2012

dOpenCL: Towards a Uniform Programming Approach for Distributed Heterogeneous Multi-/Many-Core Systems

Philipp Kegel; Michel Steuwer; Sergei Gorlatch

Modern computer systems are becoming increasingly heterogeneous by comprising multi-core CPUs, GPUs, and other accelerators. Current programming approaches for such systems usually require the application developer to use a combination of several programming models (e.g., MPI with OpenCL or CUDA) in order to exploit the full compute capability of a system. In this paper, we present dOpenCL (Distributed OpenCL) - a uniform approach to programming distributed heterogeneous systems with accelerators. dOpenCL extends the OpenCL standard, such that arbitrary computing devices installed on any node of a distributed system can be used together within a single application. dOpenCL allows moving data and program code to these devices in a transparent, portable manner. Since dOpenCL is designed as a fully-fledged implementation of the OpenCL API, it allows running existing OpenCL applications in a heterogeneous distributed environment without any modifications. We describe in detail the mechanisms that are required to implement OpenCL for distributed systems, including a device management mechanism for running multiple applications concurrently. Using three application studies, we compare the performance of dOpenCL with MPI+OpenCL and a standard OpenCL implementation.
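
The notion of using devices from any node within a single application can be pictured with a toy model in plain Python. This is only a conceptual sketch and not the dOpenCL or OpenCL API: `Device`, `UniformPlatform`, and `run_everywhere` are hypothetical names, and no network communication takes place. It shows one flat device list that hides which node hosts each device, so the application code stays the same for local and remote devices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """A compute device; `node` records where it is physically installed."""
    node: str
    name: str


class UniformPlatform:
    """Hypothetical stand-in for a platform that merges devices of many nodes."""

    def __init__(self, inventory):
        # inventory maps a node's hostname to the device names installed there.
        self._inventory = dict(inventory)

    def get_devices(self):
        """Return one flat device list, regardless of the hosting node."""
        return [
            Device(node, name)
            for node, names in sorted(self._inventory.items())
            for name in names
        ]


def run_everywhere(platform, kernel, data):
    """Application code: treats every device alike, local or remote."""
    return {device: kernel(data) for device in platform.get_devices()}


platform = UniformPlatform({"node-b": ["gpu0"], "node-a": ["cpu0", "gpu0"]})
for device, result in run_everywhere(platform, sum, [1, 2, 3]).items():
    print(device.node, device.name, result)
```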

European Conference on Parallel Processing | 2009

Using OpenMP vs. Threading Building Blocks for Medical Imaging on Multi-cores

Philipp Kegel; Maraike Schellmann; Sergei Gorlatch

We compare two parallel programming approaches for multi-core systems: the well-known OpenMP and the recently introduced Threading Building Blocks (TBB) library by Intel®. The comparison is made using the parallelization of a real-world numerical algorithm for medical imaging. We develop several parallel implementations, and compare them w.r.t. programming effort, programming style and abstraction, and runtime performance. We show that TBB requires a considerable program re-design, whereas with OpenMP simple compiler directives are sufficient. While TBB appears to be less appropriate for parallelizing existing implementations, it fosters a good programming style and higher abstraction level for newly developed parallel programs. Our experimental measurements on a dual quad-core system demonstrate that OpenMP slightly outperforms TBB in our implementation.

ACM/IEEE International Conference on Mobile Computing and Networking | 2008

Indoor tracking of laboratory mice via an RFID-tracking framework

Mareike Kritzler; Stephanie Jabs; Philipp Kegel; Antonio Krüger

In this paper, a solution for tracking laboratory mice in an indoor seminatural environment based on RFID technology is presented. A tracking framework is built in which combined sensors identify and track the mice continuously, 24 hours a day and 7 days a week. Besides the hardware setup for data collection, we present a software solution that prototypically implements an analytic module for the mouse movements.
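
To give a feel for what an analytic module over such tracking data might compute, here is a minimal sketch in Python. The event format (timestamp, antenna id, transponder id), the sample data, and the function `dwell_times` are assumptions made for illustration; the abstract does not describe the framework's actual data model or analyses.

```python
from collections import defaultdict

# Hypothetical read events: (timestamp in seconds, antenna id, transponder id).
EVENTS = [
    (0, "A1", "mouse-1"),
    (5, "A2", "mouse-2"),
    (12, "A2", "mouse-1"),
    (20, "A3", "mouse-1"),
    (26, "A1", "mouse-2"),
]


def dwell_times(events):
    """Sum, per mouse and antenna, the time until that mouse's next read."""
    by_mouse = defaultdict(list)
    for timestamp, antenna, tag in sorted(events):
        by_mouse[tag].append((timestamp, antenna))

    totals = defaultdict(dict)
    for tag, reads in by_mouse.items():
        # Pair each read with the following read of the same mouse.
        for (t0, antenna), (t1, _) in zip(reads, reads[1:]):
            totals[tag][antenna] = totals[tag].get(antenna, 0) + (t1 - t0)
    return {tag: dict(per_antenna) for tag, per_antenna in totals.items()}


print(dwell_times(EVENTS))
```

The last read of each mouse contributes no dwell time here, because the sketch has no later event to close the interval.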

International Parallel and Distributed Processing Symposium | 2012

Towards High-Level Programming of Multi-GPU Systems Using the SkelCL Library

Michel Steuwer; Philipp Kegel; Sergei Gorlatch

Application programming for GPUs (Graphics Processing Units) is complex and error-prone, because the popular approaches - CUDA and OpenCL - are intrinsically low-level and offer no special support for systems consisting of multiple GPUs. The SkelCL library presented in this paper is built on top of the OpenCL standard and offers pre-implemented recurring computation and communication patterns (skeletons) which greatly simplify programming for multi-GPU systems. The library also provides an abstract vector data type and a high-level data (re)distribution mechanism to shield the programmer from the low-level data transfers between the system's main memory and multiple GPUs. In this paper, we focus on the specific support in SkelCL for systems with multiple GPUs and use a real-world application study from the area of medical imaging to demonstrate the reduced programming effort and competitive performance of SkelCL as compared to OpenCL and CUDA. Besides, we illustrate how SkelCL adapts to large-scale, distributed heterogeneous systems in order to simplify their programming.
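
What a data distribution mechanism hides from the programmer can be sketched in plain Python, under stated assumptions. The code below is not SkelCL: `block_distribute` and `distributed_map` are hypothetical names, a block-wise split is only one plausible distribution, and threads stand in for GPUs. The vector is split into contiguous blocks, each block is processed separately, and the partial results are concatenated in order.

```python
from concurrent.futures import ThreadPoolExecutor


def block_distribute(data, num_devices):
    """Split data into num_devices contiguous blocks of near-equal size."""
    base, extra = divmod(len(data), num_devices)
    parts, start = [], 0
    for device in range(num_devices):
        size = base + (1 if device < extra else 0)
        parts.append(data[start:start + size])
        start += size
    return parts


def distributed_map(f, data, num_devices):
    """Apply f to every element, one worker per block (threads mimic devices)."""
    parts = block_distribute(data, num_devices)
    with ThreadPoolExecutor(max_workers=num_devices) as pool:
        # pool.map preserves the order of the blocks.
        results = pool.map(lambda part: [f(x) for x in part], parts)
        return [y for part in results for y in part]


print(block_distribute(list(range(7)), 3))  # prints [[0, 1, 2], [3, 4], [5, 6]]
print(distributed_map(lambda x: x * x, list(range(7)), 3))
```

The caller of `distributed_map` never sees the blocks, which mirrors the abstract's point that the programmer is shielded from low-level transfers.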

Journal of Parallel and Distributed Computing | 2013

dOpenCL: Towards uniform programming of distributed heterogeneous multi-/many-core systems

Philipp Kegel; Michel Steuwer; Sergei Gorlatch

Modern computer systems are becoming increasingly distributed and heterogeneous by comprising multi-core CPUs, GPUs, and other accelerators. Current programming approaches for such systems usually require the application developer to use a combination of several programming models (e.g., MPI with OpenCL or CUDA) in order to exploit the system's full performance potential. In this paper, we present dOpenCL (distributed OpenCL) - a uniform approach to programming distributed heterogeneous systems with accelerators. dOpenCL allows the user to run unmodified existing OpenCL applications in a heterogeneous distributed environment. We describe the challenges of implementing the OpenCL programming model for distributed systems, as well as its extension for running multiple applications concurrently. Using several example applications, we compare the performance of dOpenCL with MPI + OpenCL and standard OpenCL implementations.

Parallel Computing Technologies | 2011

Optimal design of multi-product batch plants using a parallel branch-and-bound method

Andrey Borisenko; Philipp Kegel; Sergei Gorlatch

In this paper, we develop and implement a parallel algorithm for a real-world application: finding optimal designs for multi-product batch plants. We describe two parallelization strategies, for shared-memory and distributed-memory systems, based on the branch-and-bound paradigm and implement them using OpenMP (Open Multi-Processing) and MPI (Message Passing Interface), respectively. Experimental results demonstrate that our approach provides competitive speedup on modern clusters of multi-core processors.
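
The branch-and-bound paradigm and one simple way to parallelize it can be sketched in Python. Everything specific below is an assumption: the toy problem (pick one unit per stage to meet a capacity demand at minimum cost), the data in `STAGES` and `DEMAND`, and the strategy of giving each top-level branch to its own worker. It is not the paper's plant model or its OpenMP/MPI implementation, and the workers here do not share their best-known bound.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy data: per stage, a list of (cost, capacity) options.
STAGES = [
    [(4, 2), (7, 5)],
    [(3, 1), (6, 4)],
    [(2, 1), (6, 3)],
]
DEMAND = 9  # required total capacity


def search(stage, cost, capacity, chosen, best):
    """Depth-first branch-and-bound; best[0] holds the incumbent (cost, choice)."""
    if stage == len(STAGES):
        if capacity >= DEMAND and cost < best[0][0]:
            best[0] = (cost, tuple(chosen))
        return
    # Optimistic bounds over the stages that are still undecided.
    min_rest_cost = sum(min(c for c, _ in opts) for opts in STAGES[stage:])
    max_rest_cap = sum(max(k for _, k in opts) for opts in STAGES[stage:])
    if cost + min_rest_cost >= best[0][0]:
        return  # bound: this subtree cannot beat the incumbent
    if capacity + max_rest_cap < DEMAND:
        return  # bound: this subtree cannot become feasible
    for index, (c, k) in enumerate(STAGES[stage]):  # branch
        chosen.append(index)
        search(stage + 1, cost + c, capacity + k, chosen, best)
        chosen.pop()


def solve_subtree(first_index):
    """Search the subtree in which stage 0 uses option first_index."""
    c, k = STAGES[0][first_index]
    best = [(float("inf"), None)]
    search(1, c, k, [first_index], best)
    return best[0]


def solve_parallel():
    """Give each top-level branch to a worker, then take the cheapest result."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(solve_subtree, range(len(STAGES[0]))))
    return min(results, key=lambda result: result[0])


print(solve_parallel())  # prints (15, (1, 1, 0))
```

Keeping a private incumbent per worker avoids synchronization but prunes less than a shared bound would; that trade-off is a design choice of this sketch only.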

Concurrency and Computation: Practice and Experience | 2011

Comparing programming models for medical imaging on multi-core systems

Philipp Kegel; Maraike Schellmann; Sergei Gorlatch

Multi-core processors offer a huge potential of parallelism but pose a challenge of program development for achieving high performance in real applications. We compare three popular parallel programming models, POSIX threads (Pthreads), OpenMP, and Threading Building Blocks (TBB), regarding their use for multi-core systems. We analyze how these models can be employed for implementing various parallelizations of a real-world application from the area of medical imaging, and we conduct extensive runtime experiments to measure performance. Our main contribution is a comprehensive comparison of Pthreads, OpenMP, and TBB with respect to the following criteria: program development effort, programming style, level of abstraction, and runtime performance on multi-cores.

New Trends in Software Methodologies, Tools and Techniques | 2012

A High-Level Programming Approach for Distributed Systems with Accelerators.

Michel Steuwer; Philipp Kegel; Sergei Gorlatch

Application programming for modern heterogeneous systems which comprise multiple accelerators (multi-core CPUs and GPUs) is complex and error-prone. Popular approaches, like OpenCL and CUDA, are low-level and offer no support for the two most complicated issues: 1) programming multiple GPUs within a stand-alone computer, and 2) managing distributed systems that integrate several such computers. In particular, distributed systems require application developers to use a mix of different programming models, e.g., MPI together with OpenCL or CUDA. We propose a uniform approach based on OpenCL for programming both stand-alone and distributed systems with GPUs. The implementation of the approach consists of two parts: 1) the SkelCL library for high-level application programming on heterogeneous stand-alone computers with multi-core CPUs and multiple GPUs, and 2) the dOpenCL middleware for transparent execution of OpenCL programs on several stand-alone computers connected over a network. Both SkelCL and dOpenCL are built on top of the OpenCL standard, which ensures their high portability across different kinds of processors and GPUs. The dOpenCL middleware extends OpenCL, such that arbitrary computing devices (multi-core CPUs and GPUs) in a distributed system can be used within a single application, with data and program code moved to these devices transparently. The SkelCL library offers a set of pre-implemented patterns (skeletons) of parallel computation and communication which greatly simplify programming for multi-GPU systems. The library also provides an abstract vector data type and a high-level data (re)distribution mechanism to shield the programmer from the low-level data transfers between a system's main memory and multiple GPUs. This paper describes dOpenCL and SkelCL and illustrates how they are used to simplify programming of heterogeneous distributed systems with accelerators.
