Publication


Featured research published by Enes Bajrovic.


international conference on conceptual structures | 2013

High-level Support for Hybrid Parallel Execution of C++ Applications Targeting Intel® Xeon Phi™ Coprocessors

Jiri Dokulil; Enes Bajrovic; Siegfried Benkner; Sabri Pllana; Martin Sandrieser; Beverly Bachmayer

The introduction of Intel® Xeon Phi™ coprocessors opened up new possibilities in the development of highly parallel applications. Even though the architecture allows developers to use familiar programming paradigms and techniques, high-level development of programs that utilize all available processors (host + coprocessors) in a system at the same time is a challenging task. In this paper we present a new high-level parallel library construct which makes it easy to apply a function to every member of an array in parallel. In addition, it supports the dynamic distribution of work between the host CPUs and one or more coprocessors. We describe the associated runtime support and use a physical simulation example to demonstrate that our library can facilitate the creation of C++ applications that benefit significantly from hybrid execution. Experimental results show that a single optimized source code is sufficient to exploit all of the host's CPU cores and the coprocessors simultaneously and efficiently.
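
The paper's library interface is not reproduced here, but the core idea, handing out array chunks through a shared atomic counter so that workers that finish early keep pulling more work, can be sketched in plain C++. The name hybrid_for_each and the chunk size below are illustrative; all workers run on the host in this sketch, whereas a real hybrid implementation would run a second group of workers on the coprocessor.

// Minimal host-only sketch (not the paper's library): a parallel for-each
// whose chunks are handed out dynamically via an atomic counter.
#include <algorithm>
#include <atomic>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

template <typename T, typename Func>
void hybrid_for_each(std::vector<T>& data, Func f, std::size_t chunk = 1024) {
    std::atomic<std::size_t> next{0};
    auto worker = [&]() {
        for (;;) {
            std::size_t begin = next.fetch_add(chunk);
            if (begin >= data.size()) break;
            std::size_t end = std::min(begin + chunk, data.size());
            for (std::size_t i = begin; i < end; ++i) f(data[i]);
        }
    };
    // All workers run on the host here; a hybrid implementation would add a
    // second worker group on the Xeon Phi pulling from the same counter.
    unsigned n = std::max(2u, std::thread::hardware_concurrency());
    std::vector<std::thread> threads;
    for (unsigned i = 0; i < n; ++i) threads.emplace_back(worker);
    for (auto& t : threads) t.join();
}

int main() {
    std::vector<double> v(1 << 20, 1.0);
    hybrid_for_each(v, [](double& x) { x = std::sqrt(x) * 2.0; });
}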


international conference on parallel processing | 2013

HyPHI - Task Based Hybrid Execution C++ Library for the Intel Xeon Phi Coprocessor

Jiri Dokulil; Enes Bajrovic; Siegfried Benkner; Martin Sandrieser; Beverly Bachmayer

The Intel Threading Building Blocks (TBB) C++ library introduced task parallelism to a wide audience of application developers. The library is easy to use and powerful, but it is limited to shared-memory machines. In this paper we present HyPHI, a novel library for the Intel Xeon Phi coprocessor for building applications that execute using a hybrid parallel model, exploiting parallelism across host CPUs and Xeon Phi coprocessors simultaneously. Our library currently provides hybrid for-each and map-reduce constructs. It hides the details of parallelization, work distribution and computation offloading from users while internally using TBB as its foundation. Despite the higher level of abstraction provided by our library, we show that for certain types of applications we outperform codes that rely on the built-in offload support currently provided by the Intel compiler. We have performed a set of experiments with the library and derived guidelines that help developers decide in which situations they should use the HyPHI library.
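
HyPHI's own interface is not shown here. As a rough stand-in, the following host-only sketch uses plain TBB, the library HyPHI builds on internally, to express the map-reduce half of the pattern.

// Host-side TBB map-reduce: sum of squares over a vector. This is plain
// TBB and runs on the host only; HyPHI layers hybrid (host + Xeon Phi)
// execution on top of this kind of building block.
#include <tbb/blocked_range.h>
#include <tbb/parallel_reduce.h>
#include <cstdio>
#include <vector>

int main() {
    std::vector<double> v(1 << 20, 0.5);
    double sum = tbb::parallel_reduce(
        tbb::blocked_range<std::size_t>(0, v.size()),
        0.0,
        // "map" part: process a chunk and accumulate a partial result
        [&](const tbb::blocked_range<std::size_t>& r, double acc) {
            for (std::size_t i = r.begin(); i != r.end(); ++i)
                acc += v[i] * v[i];
            return acc;
        },
        // "reduce" part: combine partial results
        [](double a, double b) { return a + b; });
    std::printf("sum of squares = %f\n", sum);
}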


EuroMPI'11 Proceedings of the 18th European MPI Users' Group conference on Recent advances in the message passing interface | 2011

Using MPI derived datatypes in numerical libraries

Enes Bajrovic; Jesper Larsson Träff

By way of example, this paper examines the potential of MPI user-defined datatypes for distributed data structure manipulation in numerical libraries. The three examples, namely gather/scatter of column-wise distributed two-dimensional matrices, matrix transposition, and redistribution of doubly cyclically distributed matrices as used in the Elemental dense matrix library, show that distributed data structures can be conveniently expressed with the derived-datatype mechanisms of MPI, while at the same time yielding worthwhile performance advantages over straightforward, hand-written implementations. Experiments have been performed on different systems with the MPICH2 and Open MPI library implementations; we report results for a SunFire X4100 system with the MVAPICH2 library. We point out cases where the current MPI collective interfaces do not provide sufficient functionality.
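
A minimal sketch of the first example, columnwise scatter of a row-major matrix, is shown below. It uses only standard MPI calls (MPI_Type_vector, MPI_Type_create_resized, MPI_Scatter); the matrix dimensions and the one-column-per-process layout are illustrative, not the paper's exact setup.

// Describe one column of an N x M row-major matrix as a derived datatype
// and scatter one column to each process.
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int N = 8;             // rows (illustrative)
    const int M = size;          // one column per process
    std::vector<double> matrix;  // only the root holds the full matrix
    if (rank == 0) {
        matrix.resize(N * M);
        for (int i = 0; i < N * M; ++i) matrix[i] = i;
    }

    // N blocks of 1 double with stride M: one column of the row-major matrix.
    MPI_Datatype column, column_resized;
    MPI_Type_vector(N, 1, M, MPI_DOUBLE, &column);
    // Resize the extent to one double so that column i starts at offset i.
    MPI_Type_create_resized(column, 0, sizeof(double), &column_resized);
    MPI_Type_commit(&column_resized);

    std::vector<double> my_column(N);
    MPI_Scatter(rank == 0 ? matrix.data() : nullptr, 1, column_resized,
                my_column.data(), N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    MPI_Type_free(&column_resized);
    MPI_Type_free(&column);
    MPI_Finalize();
}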


international conference on parallel processing | 2012

High-level support for pipeline parallelism on many-core architectures

Siegfried Benkner; Enes Bajrovic; Erich Marth; Martin Sandrieser; Raymond Namyst; Samuel Thibault

With the increasing diversity of many-core architectures, the challenges of parallel programming and code portability will rise sharply. The EU project PEPPHER addresses these issues with a component-based approach to application development on top of a task-parallel execution model. Central to this approach are multi-architectural components which encapsulate different implementation variants of application functionality tailored for different core types. An intelligent runtime system selects and dynamically schedules component implementation variants for efficient parallel execution on heterogeneous many-core architectures. On top of this model we have developed language, compiler and runtime support for a specific class of applications that can be expressed using the pipeline pattern. We propose C/C++ language annotations for specifying pipeline patterns and describe the associated compilation and runtime infrastructure. Experimental results indicate that our high-level approach achieves performance comparable to manual parallelization.
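
The proposed annotations themselves are not reproduced here. As a stand-in, the sketch below expresses a comparable three-stage pipeline (produce, transform, consume) with oneTBB's parallel_pipeline, keeping the outer stages serial and in order while the middle stage runs in parallel.

// Three-stage pipeline in oneTBB: serial producer, parallel transform,
// serial in-order consumer. Item count and token limit are illustrative.
#include <tbb/parallel_pipeline.h>
#include <cstdio>

int main() {
    int produced = 0;
    const int total = 100;
    tbb::parallel_pipeline(
        /*max_number_of_live_tokens=*/8,
        // Stage 1 (serial): produce work items.
        tbb::make_filter<void, int>(tbb::filter_mode::serial_in_order,
            [&](tbb::flow_control& fc) -> int {
                if (produced >= total) { fc.stop(); return 0; }
                return produced++;
            }) &
        // Stage 2 (parallel): transform each item independently.
        tbb::make_filter<int, int>(tbb::filter_mode::parallel,
            [](int x) { return x * x; }) &
        // Stage 3 (serial, in order): consume results.
        tbb::make_filter<int, void>(tbb::filter_mode::serial_in_order,
            [](int y) { std::printf("%d\n", y); }));
}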


hawaii international conference on system sciences | 2016

Tuning OpenCL Applications with the Periscope Tuning Framework

Enes Bajrovic; Robert Mijakovic; Jiri Dokulil; Siegfried Benkner; Michael Gerndt

Due to the complexity and diversity of new parallel architectures, automatic tuning of parallel applications has become increasingly important for achieving acceptable performance levels as well as performance portability. The European AutoTune project developed a tuning framework which closely integrates and automates performance analysis and performance tuning. The Periscope Tuning Framework relies on a flexible plugin mechanism providing tuning plugins for different tuning aspects. This paper presents plugins for tuning the execution time of OpenCL kernels on three different architectures, namely standard multicore CPUs, Xeon Phi coprocessors, and GPUs. We present OpenCL tuning via the flags used during offline kernel compilation as well as through the selection of the most appropriate NDRange configuration, which defines the organization of parallel threads used for kernel execution. Both tuning plugins show significant performance impact and a clear dependence on the target architecture, and thus improve performance portability via automatic tuning.
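
The plugin interfaces are not shown here, but the two tuning knobs they search over can be illustrated with plain OpenCL host code: the build options handed to kernel compilation and the NDRange (global and local work-group sizes) chosen at launch. The flag combination and local size below are illustrative values a tuner might try; error handling is omitted for brevity.

// Minimal OpenCL host program highlighting the two tuning knobs.
#include <CL/cl.h>
#include <vector>

static const char* kSrc =
    "__kernel void scale(__global float* a, float f) {"
    "  size_t i = get_global_id(0); a[i] *= f; }";

int main() {
    cl_platform_id platform; cl_device_id device;
    clGetPlatformIDs(1, &platform, nullptr);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, nullptr);
    cl_context ctx = clCreateContext(nullptr, 1, &device, nullptr, nullptr, nullptr);
    cl_command_queue q = clCreateCommandQueue(ctx, device, 0, nullptr);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &kSrc, nullptr, nullptr);
    // Tuning knob 1: compiler flags; a tuner would sweep flag combinations.
    clBuildProgram(prog, 1, &device, "-cl-fast-relaxed-math -cl-mad-enable",
                   nullptr, nullptr);
    cl_kernel kernel = clCreateKernel(prog, "scale", nullptr);

    const size_t n = 1 << 20;
    std::vector<float> host(n, 1.0f);
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                n * sizeof(float), host.data(), nullptr);
    float factor = 2.0f;
    clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf);
    clSetKernelArg(kernel, 1, sizeof(float), &factor);

    // Tuning knob 2: NDRange configuration; the local work-group size is
    // what the tuner varies per architecture (CPU, Xeon Phi, GPU).
    size_t global = n, local = 128;
    clEnqueueNDRangeKernel(q, kernel, 1, nullptr, &global, &local, 0, nullptr, nullptr);
    clEnqueueReadBuffer(q, buf, CL_TRUE, 0, n * sizeof(float), host.data(), 0, nullptr, nullptr);

    clReleaseMemObject(buf); clReleaseKernel(kernel); clReleaseProgram(prog);
    clReleaseCommandQueue(q); clReleaseContext(ctx);
}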


Software Quality Journal | 2018

A multi-aspect online tuning framework for HPC applications

Michael Gerndt; Siegfried Benkner; Eduardo César; Carmen B. Navarrete; Enes Bajrovic; Jiri Dokulil; Carla Guillén; Robert Mijakovic; Anna Sikora

Developing software applications for high-performance computing (HPC) requires careful optimizations targeting a myriad of increasingly complex, highly interrelated software, hardware and system components. The demands placed on minimizing energy consumption on extreme-scale HPC systems and the associated shift towards heterogeneous architectures add yet another level of complexity to program development and optimization. As a result, the software optimization process is often seen as daunting, cumbersome and time-consuming by software developers wishing to fully exploit HPC resources. To address these challenges, we have developed the Periscope Tuning Framework (PTF), an online automatic integrated tuning framework that combines performance analysis and performance tuning with respect to the myriad of tuning parameters available to today's software developers on modern HPC systems. This work introduces the architecture, tuning model and main infrastructure components of PTF, as well as its main tuning plugins and their evaluation.


complex, intelligent and software intensive systems | 2009

Experimental Study of Multithreading to Improve Memory Hierarchy Performance of Multi-core Processors for Scientific Applications

Enes Bajrovic; Eduard Mehofer

In this paper we study performance characteristics and parallelization strategies for recently shipped, powerful multi-core processors, the IBM Power6 and Sun T2 Plus, for high-end scientific computing. The central aspect is data locality. First, we investigate the impact of good and bad data locality by modifying data accesses. Next, we study the impact of multithreading with respect to data locality based on the data-parallel programming approach. The level of parallelism is increased by assigning multiple threads to one core in order to hide processor stalls caused by bad data locality. We measure the impact of data locality and multithreading in terms of execution times and bandwidth for synthetic micro-benchmarks, a matrix multiplication kernel, and an application from bioinformatics. The results indicate that substantial performance improvements can be obtained with minor effort by utilizing multithreading.
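
The paper's benchmarks are not reproduced here. The following small C++ micro-benchmark, with illustrative sizes, shows the good-versus-bad locality contrast the study starts from: the same sum computed with cache-friendly row-major and cache-unfriendly column-major access.

// Locality micro-benchmark: identical work, different access order.
#include <chrono>
#include <cstdio>
#include <vector>

int main() {
    const std::size_t N = 4096;
    std::vector<double> a(N * N, 1.0);

    auto time_it = [&](auto body) {
        auto t0 = std::chrono::steady_clock::now();
        double s = body();
        auto t1 = std::chrono::steady_clock::now();
        long long ms = std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count();
        std::printf("sum=%.0f  %lld ms\n", s, ms);
    };

    // Good locality: consecutive addresses, full cache-line reuse.
    time_it([&] {
        double s = 0.0;
        for (std::size_t i = 0; i < N; ++i)
            for (std::size_t j = 0; j < N; ++j) s += a[i * N + j];
        return s;
    });

    // Bad locality: stride-N accesses, one element used per cache line.
    time_it([&] {
        double s = 0.0;
        for (std::size_t j = 0; j < N; ++j)
            for (std::size_t i = 0; i < N; ++i) s += a[i * N + j];
        return s;
    });
}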


Archive | 2014

Automatic Performance Tuning of Pipeline Patterns for Heterogeneous Parallel Architectures

Enes Bajrovic; Siegfried Benkner


parallel computing | 2013

Autotuning of Pattern Runtimes for Accelerated Parallel Systems

Enes Bajrovic; Siegfried Benkner; Jiri Dokulil; Martin Sandrieser


arXiv: Distributed, Parallel, and Cluster Computing | 2012

Efficient Hybrid Execution of C++ Applications using Intel(R) Xeon Phi(TM) Coprocessor

Jiri Dokulil; Enes Bajrovic; Siegfried Benkner; Sabri Pllana; Martin Sandrieser; Beverly Bachmayer

Collaboration


Dive into Enes Bajrovic's collaboration.

Top Co-Authors
Anna Sikora

Autonomous University of Barcelona


Carmen B. Navarrete

Autonomous University of Madrid


Eduardo César

Autonomous University of Barcelona
