Pieter Bellens | Researchain

Publication

Featured research published by Pieter Bellens.

Conference on High Performance Computing (Supercomputing) | 2006

CellSs: a programming model for the cell BE architecture

Pieter Bellens; Josep M. Perez; Rosa M. Badia; Jesús Labarta

In this work we present Cell superscalar (CellSs), which addresses the automatic exploitation of the functional parallelism of a sequential program across the different processing elements of the Cell BE architecture. The focus is on the simplicity and flexibility of the programming model. Based on a simple annotation of the source code, a source-to-source compiler generates the necessary code, and a runtime library exploits the existing parallelism by building a task dependency graph at runtime. The runtime takes care of task scheduling and data handling between the different processors of this heterogeneous architecture. In addition, locality-aware task scheduling has been implemented to reduce the overhead of data transfers. The approach has been implemented and tested with a set of examples, and the results obtained so far are promising.
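The idea of building a task dependency graph at runtime can be illustrated with a minimal Python sketch. It is not the CellSs API or its implementation: the `TaskGraph` class, its `submit` and `waves` methods, and the task and data names are all hypothetical, and the sketch assumes each task simply declares which data it reads and writes. It also keeps every read-after-write, write-after-read and write-after-write dependency and performs no renaming.

```python
from collections import defaultdict


class TaskGraph:
    """Toy runtime (hypothetical) that records tasks and derives dependencies."""

    def __init__(self):
        self.names = []                    # task id -> task name
        self.deps = {}                     # task id -> set of predecessor ids
        self.last_writer = {}              # datum -> id of the task that last wrote it
        self.readers = defaultdict(list)   # datum -> ids that read it since that write

    def submit(self, name, inputs=(), outputs=()):
        """Register a task in program order and compute its predecessors."""
        tid = len(self.names)
        self.names.append(name)
        deps = set()
        for datum in inputs:
            if datum in self.last_writer:          # read-after-write
                deps.add(self.last_writer[datum])
        for datum in outputs:
            if datum in self.last_writer:          # write-after-write
                deps.add(self.last_writer[datum])
            deps.update(self.readers[datum])       # write-after-read
        self.deps[tid] = deps
        # Update the bookkeeping only after all dependencies are known.
        for datum in inputs:
            self.readers[datum].append(tid)
        for datum in outputs:
            self.last_writer[datum] = tid
            self.readers[datum] = []
        return tid

    def waves(self):
        """Group task names into waves; tasks in one wave are independent."""
        level = {}
        for tid in range(len(self.names)):
            level[tid] = 1 + max((level[d] for d in self.deps[tid]), default=-1)
        grouped = defaultdict(list)
        for tid, lv in level.items():
            grouped[lv].append(self.names[tid])
        return [grouped[lv] for lv in sorted(grouped)]


graph = TaskGraph()
a = graph.submit("init_a", outputs=["A"])
b = graph.submit("init_b", outputs=["B"])
c = graph.submit("multiply", inputs=["A", "B"], outputs=["C"])
d = graph.submit("scale", inputs=["C"], outputs=["C"])
print(graph.waves())
```

In this toy run the two initialisation tasks share no data and land in the same wave, while `multiply` and `scale` must follow in order. A real runtime would dispatch ready tasks to the available processing elements rather than compute waves up front.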

IBM Journal of Research and Development | 2007

CellSs: making it easier to program the cell broadband engine processor

Josep M. Perez; Pieter Bellens; Rosa M. Badia; Jesús Labarta

With the appearance of new multicore processor architectures, there is a need for new programming paradigms, especially for heterogeneous devices such as the Cell Broadband Engine™ (Cell/B.E.) processor. CellSs is a programming model that addresses the automatic exploitation of functional parallelism from a sequential application with annotations. The focus is on the flexibility and simplicity of the programming model. Although the concept and programming model are general enough to be extended to other devices, its current implementation has been tailored to the Cell/B.E. device. This paper presents an overview of CellSs and a newly implemented scheduling algorithm. An analysis of the results--both performance measures and a detailed analysis with performance analysis tools--was performed and is presented here.

International Journal of Parallel Programming | 2010

Extending OpenMP to Survive the Heterogeneous Multi-Core Era

Eduard Ayguadé; Rosa M. Badia; Pieter Bellens; Daniel Cabrera; Alejandro Duran; Roger Ferrer; Marc Gonzàlez; Francisco D. Igual; Daniel Jiménez-González; Jesus Labarta; Luis Martinell; Xavier Martorell; Rafael Mayo; Josep M. Perez; Judit Planas; Enrique S. Quintana-Ortí

This paper advances the state of the art in programming models for exploiting task-level parallelism on heterogeneous many-core systems, presenting a number of extensions to the OpenMP language inspired by the StarSs programming model. The proposed extensions allow the programmer to write portable code easily for a number of different platforms, relieving them of developing the specific code to off-load tasks to the accelerators and to synchronize tasks. Our results obtained from the StarSs instantiations for SMPs, the Cell, and GPUs report reasonable parallel performance. However, the real impact of our approach lies in the productivity gains it yields for the programmer.

Languages and Compilers for Parallel Computing | 2010

Optimizing the exploitation of multicore processors and GPUs with OpenMP and OpenCL

Roger Ferrer; Judit Planas; Pieter Bellens; Alejandro Duran; Marc Gonzàlez; Xavier Martorell; Rosa M. Badia; Eduard Ayguadé; Jesús Labarta

In this paper, we present OMPSs, a programming model based on OpenMP and StarSs that can also incorporate the use of OpenCL or CUDA kernels. We evaluate the proposal on three different architectures (SMP, Cell/B.E. and GPUs), showing the wide usefulness of the approach. The evaluation is done with four different benchmarks: Matrix Multiply, BlackScholes, Perlin Noise, and Julia Set. We compare the results with those obtained by running the same benchmarks written in OpenCL on the same architectures. The results show that OMPSs greatly outperforms the OpenCL environment. It is also more flexible in exploiting multiple accelerators and, thanks to the simplicity of its annotations, it increases programmer productivity.

IEEE International Conference on High Performance Computing Data and Analytics | 2009

CellSs: Scheduling techniques to better exploit memory hierarchy

Pieter Bellens; Josep M. Perez; Felipe Cabarcas; Alex Ramirez; Rosa M. Badia; Jesús Labarta

The main goal of Cell Superscalar (CellSs) is to provide a simple, flexible and easy programming approach for the Cell Broadband Engine (Cell/B.E.) that automatically exploits the inherent concurrency of applications at the task level. The CellSs environment is based on a source-to-source compiler that translates annotated C or Fortran code and a runtime library tailored for the Cell/B.E. that takes care of the concurrent execution of the application. The first efforts at task scheduling in CellSs were derived from very simple heuristics. This paper presents new scheduling techniques that have been developed for CellSs to improve an application's performance. Additionally, the design of a new scheduling algorithm is detailed and the algorithm is evaluated. The CellSs scheduler takes into account an extension of the memory hierarchy for the Cell/B.E., with a cache memory shared between the SPEs. All new scheduling practices have been evaluated and show better behavior of our system.

Advanced Concepts for Intelligent Vision Systems | 2011

Parallel implementation of the integral histogram

Pieter Bellens; Kannappan Palaniappan; Rosa M. Badia; Jesús Labarta

The integral histogram is a recently proposed preprocessing technique to compute histograms of arbitrary rectangular gridded (i.e. image or volume) regions in constant time. We formulate a general parallel version of the integral histogram and analyse its implementation in Star Superscalar (StarSs). StarSs provides a uniform programming and runtime environment and facilitates the development of portable code for heterogeneous parallel architectures. In particular, we discuss the implementation for the multi-core IBM Cell Broadband Engine (Cell/B.E.) and provide extensive performance measurements and tradeoffs using two different scan orders or histogram propagation methods. For 640 × 480 images, a tile or block size of 28 × 28 and 16 histogram bins, the parallel algorithm reaches better than real-time performance of more than 200 frames per second.
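The constant-time region query behind the integral histogram can be sketched in a few lines of sequential Python. This is a plain illustration of the general technique, not the parallel StarSs implementation from the paper: the function names, the half-open region convention and the `max_value` binning rule are assumptions made for the example.

```python
def integral_histogram(image, bins, max_value=256):
    """Build H where H[y][x][b] counts bin-b pixels in rows < y and columns < x.

    `image` is a list of rows of integer intensities in [0, max_value).
    The binning rule (value * bins // max_value) is an assumption of this sketch.
    """
    height, width = len(image), len(image[0])
    H = [[[0] * bins for _ in range(width + 1)] for _ in range(height + 1)]
    for y in range(height):
        for x in range(width):
            up, left, diag = H[y][x + 1], H[y + 1][x], H[y][x]
            cur = H[y + 1][x + 1]
            # Propagate the histograms of the neighbours (inclusion-exclusion).
            for k in range(bins):
                cur[k] = up[k] + left[k] - diag[k]
            # Add the current pixel to its bin.
            cur[image[y][x] * bins // max_value] += 1
    return H


def region_histogram(H, top, left, bottom, right):
    """Histogram of rows [top, bottom) and columns [left, right).

    Needs four lookups per bin, independent of the region size.
    """
    a, b, c, d = H[bottom][right], H[top][right], H[bottom][left], H[top][left]
    return [a[k] - b[k] - c[k] + d[k] for k in range(len(a))]


image = [
    [0, 64, 128, 192],
    [0, 0, 255, 255],
    [64, 64, 128, 128],
]
H = integral_histogram(image, bins=4)
whole = region_histogram(H, 0, 0, 3, 4)
corner = region_histogram(H, 1, 2, 3, 4)
print(whole, corner)
```

The construction visits every pixel once, while each later query costs four table lookups per bin regardless of how large the rectangle is. The row-by-row scan used here is only one possible propagation order.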

IEEE Micro | 2010

Parallel Programming Models for Heterogeneous Multi-Core Architectures

Roger Ferrer; Pieter Bellens; Vicenç Beltran; Marc Gonzàlez; Xavier Martorell; Rosa M. Badia; Eduard Ayguadé; Jae-seung Yeom; Scott Schneider; Konstantinos Koukos; Michail Alvanos; Dimitrios S. Nikolopoulos; Angelos Bilas

This article evaluates the scalability and productivity of six parallel programming models for heterogeneous architectures, and finds that task-based models using code and data annotations require the minimum programming effort while sustaining nearly best performance. However, achieving this result requires both extensions of programming models to control locality and granularity and proper interoperability with platform-specific optimizations.

International Conference / Workshop on Embedded Computer Systems: Architectures, Modeling and Simulation | 2009

Exploiting Locality on the Cell/B.E. through Bypassing

Pieter Bellens; Josep M. Perez; Rosa M. Badia; Jesús Labarta

Cell Superscalar (CellSs) provides a simple, flexible and easy programming approach for the Cell Broadband Engine (Cell/B.E.) that automatically exploits the inherent concurrency of applications at a function or task level. The CellSs environment is based on a source-to-source compiler that translates annotated C or Fortran code and a runtime library tailored for the Cell/B.E. that orchestrates the concurrent execution of the application. In the context of our parallel runtime, we analyse the effect of the bandwidth of the Element Interconnect Bus (EIB) on an application's performance. We introduce a technique called bypassing that potentially increases the observed bandwidth and improves the execution time for applications with a distributed computation pattern. Although the integration of bypassing with CellSs is work in progress, we present results for five fundamental linear algebra kernels to demonstrate the applicability of bypassing and to attempt to quantify the benefit that can be reaped.

IEEE International Conference on High Performance Computing Data and Analytics | 2011

Making the Best of Temporal Locality: Just-in-Time Renaming and Lazy Write-Back on the Cell/B.E.

Pieter Bellens; Josep M. Perez; Rosa M. Badia; Jesús Labarta

Cell Superscalar (CellSs) provides a simple, flexible and easy programming approach for the Cell Broadband Engine (Cell/B.E.) that automatically exploits the inherent concurrency of applications at a function or task level. The CellSs environment is based on a source-to-source compiler that translates annotated C or Fortran code and a runtime library tailored for the Cell/B.E. that orchestrates the concurrent execution of the application. We introduce a technique called bypassing that allows CellSs to perform core-to-core Direct Memory Access (DMA) transfers for generic applications. In this review we concisely summarize the bypassing practice and introduce two improvements: just-in-time renaming and lazy write-back. These extensions come at no additional cost and potentially increase performance by improving the perceived bandwidth of the Element Interconnect Bus (EIB). Experiments on five fundamental linear algebra kernels demonstrate the applicability of these techniques and quantify the benefit that can be reaped. We also present performance results for a first prototype of CellSs with bypassing.

ICNAAM 2010: International Conference of Numerical Analysis and Applied Mathematics 2010 | 2010

Particle Methods on Multicore Architectures: Experiences and Future Plans

Annika Schiller; Godehard Sutmann; Luis Martinell; Pieter Bellens; Rosa M. Badia

The demand for high performance and memory in computer simulations is still growing. Due to hardware constraints such as power consumption, heat dissipation and other physical limitations, the development trend in high performance computing (HPC) tends toward multicore designs. As new computational platforms become increasingly complicated and heterogeneous, there is a need for portable programming models that easily enable the exploitation of these architectures. Additionally, algorithms are needed that are able to match the platform-specific requirements and exploit their potential power. This work focuses on the particle-based algorithm Multiparticle Collision Dynamics (MPC) for the calculation of hydrodynamic properties of fluid and flow phenomena. This algorithm has already been ported to the Cell Broadband Engine (Cell/BE) by using the high-level programming model Cell Superscalar (CellSs). Performance results of the Cell/BE implementation and a recently developed OpenMP version are presente...
