Bruno Lavigueur | Researchain

Publication

Featured research published by Bruno Lavigueur.

international conference on hardware/software codesign and system synthesis | 2006

Parallel programming models for a multiprocessor SoC platform applied to networking and multimedia

Pierre G. Paulin; Chuck Pilkington; Michel Langevin; Essaid Bensoudane; Damien Lyonnard; Olivier Benny; Bruno Lavigueur; David Lo; Giovanni Beltrame; Vincent Gagné; Gabriela Nicolescu

The MultiFlex system is an application-to-platform mapping tool that integrates heterogeneous parallel components (H/W or S/W) into a homogeneous platform programming environment. This leads to higher quality designs through encapsulation and abstraction. Two high-level parallel programming models are supported by the following MultiFlex platform mapping tools: a distributed system object component (DSOC) object-oriented message passing model and a symmetrical multiprocessing (SMP) model using shared memory. We demonstrate the combined use of the MultiFlex multiprocessor mapping tools, supported by high-speed hardware-assisted messaging, context-switching, and dynamic scheduling using the StepNP demonstrator multiprocessor system-on-chip platform, for two representative applications: 1) an Internet traffic management application running at 2.5 Gb/s and 2) an MPEG4 video encoder (VGA resolution, at 30 frames/s). For these applications, a combination of the DSOC and SMP programming models was used in interoperable fashion. After optimization and mapping, processor utilization rates of 85%-91% were demonstrated for the traffic manager. For the MPEG4 encoder, the average processor utilization was 88%.

ACM Transactions on Design Automation of Electronic Systems | 2007

MPSoC memory optimization using program transformation

Youcef Bouchebaba; Bruno Girodias; Gabriela Nicolescu; El Mostapha Aboulhamid; Bruno Lavigueur; Pierre G. Paulin

Multiprocessor system-on-a-chip (MPSoC) architectures have received a lot of attention in the past years, but few advances in compilation techniques target these architectures. This is particularly true for the exploitation of data locality. Most of the compilation techniques for parallel architectures discussed in the literature are based on a single loop nest. This article presents new techniques that apply loop fusion and tiling to several loop nests and parallelize the resulting code across different processors. These two techniques reduce the number of memory accesses. However, they increase dependencies and thereby reduce the exploitable parallelism in the code. This article tries to address this contradiction. To optimize the memory space used by temporary arrays, smaller buffers are used as a replacement. Different strategies are studied to optimize the processing time spent accessing these buffers. The experiments show that these techniques yield a significant reduction in the number of data cache misses (30%) and in processing time (50%).
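
As a rough illustration of the idea in this abstract (not the authors' implementation), the sketch below fuses two hypothetical one-dimensional loop nests, tiles the fused loop, and replaces the full temporary array with a two-element circular buffer. The function names, the producer/consumer statements, and the tile size are all assumptions made for the example.

```python
def unfused(a):
    """Two separate loop nests that communicate through a full temporary array."""
    n = len(a)
    tmp = [0] * n
    for i in range(n):               # loop nest 1: producer
        tmp[i] = 2 * a[i]
    out = [0] * n
    for i in range(1, n):            # loop nest 2: consumer reads tmp[i] and tmp[i-1]
        out[i] = tmp[i] + tmp[i - 1]
    return out


def fused_tiled(a, tile=4):
    """Fused and tiled version: the temporary array shrinks to a 2-element buffer."""
    n = len(a)
    out = [0] * n
    buf = [0, 0]                     # circular buffer holding tmp[i-1] and tmp[i]
    for start in range(0, n, tile):  # loop over tiles
        for i in range(start, min(start + tile, n)):
            buf[i % 2] = 2 * a[i]    # producer statement
            if i >= 1:               # consumer statement, reading only the buffer
                out[i] = buf[i % 2] + buf[(i - 1) % 2]
    return out


data = list(range(10))
print(unfused(data) == fused_tiled(data, tile=3))
```

In this sketch the buffer carries a value from one tile into the next, so the tiles cannot simply be run independently. That is a small instance of the added dependencies the abstract describes.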

rapid system prototyping | 2006

Application-Level Memory Optimization for MPSoC

Bruno Girodias; Youcef Bouchebaba; Gabriela Nicolescu; El Mostapha Aboulhamid; Pierre G. Paulin; Bruno Lavigueur

Multiprocessor system-on-chip is one of the main drivers of the semiconductor industry revolution, enabling the integration of complex functionality on a single chip. Memory is becoming a key factor for significant improvements in embedded systems (power, performance and area). With the emergence of more embedded multimedia applications in the industry, this issue becomes increasingly vital. These applications often use multi-dimensional arrays to store intermediate results during multimedia processing tasks. A couple of key optimization techniques exist and have been demonstrated on SoC architectures. This paper presents these techniques and their impact on an MPSoC environment and brings forward improvements. These techniques allow for optimization of memory space, reduction of the number of cache misses and extensive improvement of processing time. In this paper's case study, these techniques yield an average increase of the data cache hit rate by 20% and an average decrease of processing time by 50%.

rapid system prototyping | 2010

MpAssign: A framework for solving the many-core platform mapping problem

Youcef Bouchebaba; Pierre G. Paulin; Ali Erdem Özcan; Bruno Lavigueur; Michel Langevin; Olivier Benny; Gabriela Nicolescu

Many-core platforms, providing large numbers of parallel execution resources, emerge as a response to the increasing computation needs of embedded applications. A major challenge raised by this trend is the efficient mapping of applications on parallel resources. This is a nontrivial problem because of the number of parameters to be considered for characterizing both the applications and the underlying platform architectures. Recently, several authors have proposed to use Multi-Objective Evolutionary Algorithms (MOEAs) to solve this problem within the context of mapping applications on Networks-on-Chip (NoC). However, these proposals have several limitations: (1) only a few meta-heuristics are explored (mainly NSGAII and SPEA2), (2) only a few cost functions are provided, and (3) they only deal with a small number of the application and architecture constraints. In this paper, we propose a new framework which avoids all of the problems cited above. Our framework allows designers to (1) explore several new meta-heuristics, (2) easily add a new cost function (or use an existing one) and (3) take into account any number of architecture and application constraints. The paper also presents experiments illustrating how our framework is applied to the problem of mapping streaming applications on a NoC-based many-core platform.
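
To make the multi-objective framing concrete, the following sketch scores every mapping of a tiny task pipeline onto two cores with two cost functions and keeps the non-dominated (Pareto-optimal) points. It is not MpAssign and not an evolutionary algorithm: it enumerates exhaustively and shows only the dominance test that MOEAs rely on. The cost functions, the hop-distance model, and all names are hypothetical.

```python
from itertools import product


def hops(core_a, core_b):
    """Assumed distance model: cores placed on a line, one hop per step."""
    return abs(core_a - core_b)


def comm_cost(mapping, edges):
    """Hypothetical cost 1: traffic volume weighted by hop distance."""
    return sum(vol * hops(mapping[src], mapping[dst]) for src, dst, vol in edges)


def max_load(mapping, work, n_cores):
    """Hypothetical cost 2: workload of the busiest core."""
    load = [0] * n_cores
    for task, core in enumerate(mapping):
        load[core] += work[task]
    return max(load)


def dominates(p, q):
    """p dominates q if it is no worse in every objective and differs from q."""
    return all(x <= y for x, y in zip(p, q)) and p != q


def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Three-task pipeline (0 -> 1 -> 2) mapped onto two cores.
edges = [(0, 1, 10), (1, 2, 10)]   # (source task, destination task, volume)
work = [4, 4, 4]                   # per-task workload
n_cores = 2

scores = {
    (comm_cost(m, edges), max_load(m, work, n_cores))
    for m in product(range(n_cores), repeat=len(work))
}
print(sorted(pareto_front(sorted(scores))))
```

Adding a cost function here means writing one more function and appending its value to the score tuple; the dominance test is unchanged.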

signal processing systems | 2009

Multiprocessor, Multithreading and Memory Optimization for On-Chip Multimedia Applications

Bruno Girodias; Youcef Bouchebaba; Gabriela Nicolescu; El Mostapha Aboulhamid; Pierre G. Paulin; Bruno Lavigueur

Multiprocessor System-on-Chip is one of the main drivers of the semiconductor industry revolution, enabling the integration of complex functionality on a single chip. The techniques for processor design and application optimization can be combined for more efficient design of these systems. Thus, the memory optimization techniques improving data locality can be combined with multithreading technology, improving the overall processor efficiency. The combination of these techniques is mainly challenged by the adaptation of memory optimization techniques to the high parallelism offered by multithreading environments. This paper presents an in-depth analysis of the impact of multiprocessor and multithreading environments on memory optimization techniques. A discussion is provided on the different types of parallelization (fine and coarse grain) and their influence on memory optimization techniques. Some improvements on existing memory optimization techniques are presented, as well as some adaptations necessary to use them in this type of environment.

APPT 2013 Revised Selected Papers of the 10th International Symposium on Advanced Parallel Processing Technologies - Volume 8299 | 2013

Programming Real-Time Image Processing for Manycores in a High-Level Language

Essayas Gebrewahid; Zain-ul-Abdin; Bertil Svensson; Verónica Gaspes; Bruno Jego; Bruno Lavigueur; Mathieu Robart

Manycore architectures are gaining attention as a means to meet the performance and power demands of high-performance embedded systems. However, their widespread adoption is sometimes constrained by the need for mastering proprietary programming languages that are low-level and hinder portability. We propose the use of the concurrent programming language occam-pi as a high-level language for programming an emerging class of manycore architectures. We show how to map occam-pi programs to the manycore architecture Platform 2012 (P2012). We describe the techniques used to translate the salient features of the language to the native programming model of the P2012. We present the results from a case study on a representative algorithm in the domain of real-time image processing: a complex algorithm for corner detection called Features from Accelerated Segment Test (FAST). Our results show that the occam-pi program is much shorter, is easier to adapt and has a competitive performance when compared to versions programmed in the native programming model of P2012 and in OpenCL.

ACM Transactions on Embedded Computing Systems | 2013

Parallel programming patterns for multi-processor SoC: Application to video processing

Pierre G. Paulin; Ali Erdem Özcan; Vincent Gagné; Bruno Lavigueur; Olivier Benny

Efficient, scalable and productive parallel programming is a major challenge for exploiting the future multi-processor SoC platforms. This article presents the MultiFlex programming environment which has been developed to address this challenge. It is targeted for use on Platform 2012, a scalable multi-processor fabric. The MultiFlex environment supports high-level simulation and iterative platform mapping, and includes tools for programming-model-aware debug, trace, visualization and analysis. This article focuses on the two classes of programming abstractions supported in MultiFlex. The first is a set of Parallel Programming Patterns (PPP) which offer a rich set of programming abstractions for implementing efficient data- and task-level parallel applications. The second is a Reactive Task Management (RTM) abstraction, which offers a lightweight C-based API to support dynamic dispatching of small-grain tasks on tightly coupled parallel processing resources. The use of the MultiFlex native programming model is illustrated through the capture and mapping of two representative video applications. The first is a high-quality rescaling (HQR) application on a multi-processor platform. We present the details of the optimization process which was required for mapping the HQR application, for which the reference code requires 350 GIPS (giga instructions per second), onto a 16-processor cluster. Our results show that the parallel implementation using the PPP model offers almost linear acceleration with respect to the number of processing elements. The second application is a high-definition VC-1 decoder. For this application, we illustrate two different parallel programming model variants, one using PPPs, the other based on RTM. These two versions are mapped onto two variants of a homogeneous version of the Platform 2012 multi-core fabric.

application specific systems architectures and processors | 2007

Two-level tiling for MPSoC architecture

Youcef Bouchebaba; Essaid Bensoudane; Bruno Lavigueur; Pierre G. Paulin; Gabriela Nicolescu

Multiprocessor system-on-a-chip (MPSoC) architectures have received a lot of attention in the past years, but few advances in compilation techniques target these architectures. This is particularly true for the exploitation of several levels of the memory hierarchy. Usually tiling is applied to one loop nest; in this paper we simultaneously apply loop fusion with two-level tiling to several loop nests in the context of an MPSoC architecture. The two-level tiling allows the simultaneous optimization of caches and registers. To optimize the memory space used by temporary arrays, buffers and registers are used as a replacement. The experiments show that these techniques yield a significant reduction in the number of data cache misses and in processing time.
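
The sketch below shows what two-level tiling looks like on a single, familiar loop nest (square matrix multiplication). It is a simplified stand-in rather than the paper's method, which additionally fuses several loop nests. The outer tile size stands for a cache-sized block and the inner one for a register-sized block; both sizes and the function names are assumptions, and Python itself does not expose caches or registers, so only the loop structure is meaningful here.

```python
def matmul_naive(A, B, n):
    """Reference triple loop for n x n matrices stored as lists of lists."""
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C


def matmul_two_level(A, B, n, cache_tile=4, reg_tile=2):
    """Same computation with two nested levels of tiling."""
    C = [[0] * n for _ in range(n)]
    # Level 1: blocks sized (by assumption) to fit in the data cache.
    for ii in range(0, n, cache_tile):
        for jj in range(0, n, cache_tile):
            for kk in range(0, n, cache_tile):
                i_end = min(ii + cache_tile, n)
                j_end = min(jj + cache_tile, n)
                k_end = min(kk + cache_tile, n)
                # Level 2: small blocks sized (by assumption) to fit in registers.
                for i0 in range(ii, i_end, reg_tile):
                    for j0 in range(jj, j_end, reg_tile):
                        for i in range(i0, min(i0 + reg_tile, i_end)):
                            for j in range(j0, min(j0 + reg_tile, j_end)):
                                acc = C[i][j]      # scalar stands in for a register
                                for k in range(kk, k_end):
                                    acc += A[i][k] * B[k][j]
                                C[i][j] = acc
    return C


n = 5
A = [[i + j for j in range(n)] for i in range(n)]
B = [[i * j + 1 for j in range(n)] for i in range(n)]
print(matmul_two_level(A, B, n) == matmul_naive(A, B, n))
```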

symposium on cloud computing | 2006

Integration of Configurable Processors in a Multiprocessor Platform

Simon Provost; Bruno Lavigueur; Guy Bois; Gabriela Nicolescu

To meet the flexibility constraints of current SoCs without losing performance, system designers need to avoid custom logic and replace it with multiple processors and custom instruction extensions. In this paper, we present a new codesign methodology that targets MPSoCs comprising multiple configurable processors. By applying this methodology to an MPEG-4 encoder, we show that the approach gives a speedup factor that is almost linear with the number of processors.

design, automation, and test in europe | 2006