Thierry Grandpierre | Researchain

Publication

Featured research published by Thierry Grandpierre.

Proceedings of the Seventh International Workshop on Hardware/Software Codesign (CODES'99) (IEEE Cat. No.99TH8450) | 1999

Optimized rapid prototyping for real-time embedded heterogeneous multiprocessors

Thierry Grandpierre; Christophe Lavarenne; Yves Sorel

This paper presents an enhancement of our Algorithm Architecture Adequation (AAA) prototyping methodology, which makes it possible to rapidly develop and optimize the implementation of a reactive real-time dataflow algorithm on an embedded heterogeneous multiprocessor architecture, predict its real-time behavior, and automatically generate the corresponding distributed and optimized static executive. It describes a new optimization heuristic that supports heterogeneous architectures and accurately takes into account inter-processor communications, which are usually neglected but may dramatically reduce multiprocessor performance.
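To illustrate why inter-processor communications matter in this kind of optimization, here is a minimal sketch of a greedy list-scheduling heuristic on a heterogeneous two-processor architecture. It is not the heuristic from the paper or from any AAA tool: the task graph, execution times, data volumes, and the `greedy_schedule` function are all hypothetical, chosen only to show how the predicted makespan changes when communication costs are counted.

```python
# Hypothetical example data (not taken from the paper).
# exec_time[task][proc]: the same task costs different amounts on
# different processors, which is what makes the architecture heterogeneous.
EXEC_TIME = {
    "A": {"P0": 2, "P1": 3},
    "B": {"P0": 4, "P1": 2},
    "C": {"P0": 3, "P1": 3},
    "D": {"P0": 2, "P1": 4},
}
# DEPS[task]: predecessors and the data volume carried by each dependence.
DEPS = {"A": {}, "B": {"A": 1}, "C": {"A": 3}, "D": {"B": 1, "C": 1}}


def greedy_schedule(exec_time, deps, comm_cost_per_unit):
    """Place each ready task on the processor giving the earliest finish time.

    A dependence between tasks on different processors delays the consumer
    by volume * comm_cost_per_unit (an assumed, simplistic cost model).
    Returns {task: (processor, start, finish)}.
    """
    proc_free = {p: 0 for p in next(iter(exec_time.values()))}
    placed = {}
    remaining = list(exec_time)
    while remaining:
        # A task is ready once all of its predecessors are scheduled.
        ready = [t for t in remaining if all(p in placed for p in deps[t])]
        best = None
        for task in ready:
            for proc in proc_free:
                data_ready = 0
                for pred, volume in deps[task].items():
                    pred_proc, _, pred_finish = placed[pred]
                    delay = 0 if pred_proc == proc else volume * comm_cost_per_unit
                    data_ready = max(data_ready, pred_finish + delay)
                start = max(proc_free[proc], data_ready)
                finish = start + exec_time[task][proc]
                if best is None or finish < best[0]:
                    best = (finish, task, proc, start)
        finish, task, proc, start = best
        placed[task] = (proc, start, finish)
        proc_free[proc] = finish
        remaining.remove(task)
    return placed


def makespan(schedule):
    """Total schedule length: the latest finish time of any task."""
    return max(finish for _, _, finish in schedule.values())


if __name__ == "__main__":
    for cost in (0, 2):
        sched = greedy_schedule(EXEC_TIME, DEPS, cost)
        print(f"comm cost per unit = {cost}: makespan = {makespan(sched)}")
```

On this toy graph, a schedule computed with communication costs set to zero predicts a shorter makespan than one that accounts for them, which is the kind of optimistic prediction the abstract warns about.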

The Journal of Supercomputing | 2004

A methodology to implement real-time applications onto reconfigurable circuits

Linda Kaouane; Mohamed Akil; Thierry Grandpierre; Yves Sorel

This paper presents an extension of the AAA rapid prototyping methodology for the optimized implementation of real-time applications onto reconfigurable circuits. This extension is based on a unified model of factorized data dependence graphs, used both to specify the application algorithm and to deduce the possible implementations onto reconfigurable hardware. This is formalized in terms of graph transformations. This seamless transformation flow has been implemented in a CAD software tool called SynDEx-IC.

Advanced Concepts for Intelligent Vision Systems | 2008

Parallel Algorithm for Concurrent Computation of Connected Component Tree

Petr Matas; Eva Dokladalova; Mohamed Akil; Thierry Grandpierre; L. Najman; Martin Poupa; Vjaceslav Georgiev

The paper proposes a new parallel connected-component-tree construction algorithm based on line-independent building and progressive merging of partial 1-D trees. Two parallelization strategies were developed: the parallelism maximization strategy, which balances the workload of the processes, and the communication minimization strategy, which minimizes communication among the processes. Because it does not use a hierarchical queue, the new algorithm can process any pixel data type. It needs only the input and output buffers and a small stack. A speedup of 3.57 compared to the sequential algorithm was obtained on a 4-core Opteron shared-memory ccNUMA architecture. A performance comparison with the existing state of the art is also discussed.

Neural Computing and Applications | 2010

Implementation of an LVQ neural network with a variable size: algorithmic specification, architectural exploration and optimized implementation on FPGA devices

Mohamed Boubaker; Mohamed Akil; Khaled Ben Khalifa; Thierry Grandpierre; Mohamed Hedi Bedoui

This paper presents an optimization methodology for the implementation of a Learning Vector Quantization (LVQ) neural network in a Field Programmable Gate Array (FPGA) device. Starting from an algorithmic specification in the form of a Factorized and Conditioned Data Dependence Graph (GFCDD), we suggest a design methodology for the LVQ-dedicated architecture. This formal methodology is called AAA, “Algorithm Architecture Adequation”. Using graph transformations, it allows the generation of an optimized circuit implementation at the Register Transfer Level (RTL). It is associated with the SynDEx-IC software tool. Based on this formal methodology, we are able to explore and generate various LVQ network implementations by varying the LVQ sizes while minimizing the hardware resources and the design time. In addition, real-time constraints must be respected to ensure a reliable classification of vigilance states in humans from electroencephalographic (EEG) signals. To validate our approach, the optimized LVQ implementation was tested on two types of Virtex devices.

Journal of Real-time Image Processing | 2012

Real-time dynamic tone-mapping operator on GPU

Mohamed Akil; Thierry Grandpierre; Laurent Perroton

This article presents the parallel implementation on a GPU of a real-time dynamic tone-mapping operator. The operator we describe in this article is generic and may be used by any application. However, the goal of our work is to integrate this operator into the graphic rendering process of a car driving simulator; thus, we studied its real-time implementation. The tone-mapping operator outputs a low dynamic range (LDR) image that keeps as much as possible of the contrast and luminance of the original input high dynamic range (HDR) image. It performs the mapping between the luminances of the original scene and the output device’s display values. We address the problem of mapping HDR images to standard displays. In this case, the tone mapping compresses the luminance ratio. Several tone-mapping operators can be found in the literature, as well as some parallelizations. However, they use either static operators or adaptations of static operators. We have adapted the dynamic operator of Irawan and parallelized it on GPU. Algorithmic optimizations have been performed, and we have explored different strategies for distributing the computation between the CPU and the GPU. We have chosen to implement on the GPU the color space conversions and the interpolation of the histogram, which are the most time-consuming steps on the CPU (1–2 s per 1,002 × 666 image). All of these optimizations increase the processing rate and the number of HDR-quality images displayed to LDR per second. This operator has been implemented on a RISC processor Pentium 4 at 3.6 GHz and a GPU Nvidia 8800 GTX (728 MB, 518 GFLOPS). The execution speed has been multiplied by a factor of 15 compared to the naive implementation of the algorithm. The display rate reaches 30 images per second, which fulfills our goal for a real-time video rate of 25 images per second.
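As a rough illustration of the two steps the abstract singles out (a color space conversion to luminance, followed by histogram interpolation), here is a minimal static sketch using histogram equalization in the log-luminance domain. It is not Irawan's dynamic operator and not the authors' GPU code; the function names, the bin count, and the choice of Rec. 709 luminance weights are assumptions made for illustration, and the sketch outputs display luminance only, not a full color image.

```python
import math


def luminance(rgb):
    """Relative luminance of a linear RGB triple (Rec. 709 weights, assumed here)."""
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def tone_map(hdr_pixels, bins=64, out_max=255):
    """Map HDR pixels to display values in [0, out_max].

    Builds a histogram of log-luminance, accumulates it into a cumulative
    distribution, and linearly interpolates inside each bin so that the
    mapping is continuous and never decreases with luminance.
    """
    eps = 1e-6  # avoids log(0) for black pixels
    logs = [math.log(luminance(p) + eps) for p in hdr_pixels]
    lo, hi = min(logs), max(logs)
    if hi == lo:
        return [out_max // 2] * len(logs)  # flat image: any constant works

    width = (hi - lo) / bins
    hist = [0] * bins
    for v in logs:
        hist[min(int((v - lo) / width), bins - 1)] += 1

    # cdf[i] is the fraction of pixels that fall below the lower edge of bin i.
    cdf = [0.0]
    seen = 0
    for count in hist:
        seen += count
        cdf.append(seen / len(logs))

    out = []
    for v in logs:
        x = (v - lo) / width
        i = min(int(x), bins - 1)
        y = cdf[i] + (x - i) * (cdf[i + 1] - cdf[i])  # interpolate within the bin
        out.append(min(out_max, max(0, round(y * out_max))))
    return out


if __name__ == "__main__":
    # Hypothetical scene spanning six orders of magnitude in luminance.
    scene = [(0.01, 0.01, 0.01), (1.0, 1.0, 1.0), (100.0, 80.0, 60.0), (1e4, 1e4, 1e4)]
    print(tone_map(scene))
```

Both loops treat every pixel independently once the histogram is known, which is why steps of this kind are natural candidates for a GPU.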

International Conference on Computer Vision | 2015

Real-time H264/AVC high definition video encoder on a multicore DSP TMS320C6678

Nejmeddine Bahri; Nidhameddine Belhadj; Med Ali Ben Ayed; Nouri Masmoudi; Thierry Grandpierre; Mohamed Akil

In this paper, the newest Texas Instruments multicore DSP, the TMS320C6678, is used to build a real-time H264/AVC high definition (HD) embedded video encoder. We exploit the high computing performance offered by this eight-core DSP to meet the real-time encoding requirement. To enhance the encoding speed, a Frame Level Parallelism (FLP) approach is applied. A master core is reserved to handle data transfers to and from the DSP. A multithreading algorithm combined with a ping-pong buffer technique is used to optimize the standard FLP approach and hide communication overhead. Experimental results show that our enhanced FLP implementation achieves real-time HD (1280×720) video encoding, reaching an encoding speed of up to 26 frames per second (f/s). Experiments also show that our parallel implementation, running on seven C6678 DSP cores at 1 GHz each, accelerates the encoding run-time by a factor of 6.38 without inducing any quality degradation or bit-rate increase.
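The ping-pong buffer idea mentioned above can be sketched with two threads and two buffers: while the worker processes the frame in one buffer, the master fills the other, so transfer time overlaps with computation. This is a generic illustration only. Python threads stand in for DSP cores, and `run_ping_pong` and the `encode` callback are hypothetical names, not part of the authors' implementation or of any Texas Instruments API.

```python
import queue
import threading


def run_ping_pong(frames, encode):
    """Pass frames from a master thread to a worker through two buffers.

    `free` holds indices of buffers the master may overwrite; `full` holds
    indices of buffers ready for the worker. With two buffers, the master
    can load the next frame while the worker is still busy with the
    current one.
    """
    buffers = [None, None]
    free = queue.Queue()
    full = queue.Queue()
    free.put(0)
    free.put(1)
    results = []

    def master():
        for frame in frames:
            i = free.get()        # block until a buffer is available
            buffers[i] = frame    # stands in for the data transfer
            full.put(i)
        full.put(None)            # sentinel: no more frames

    def worker():
        while True:
            i = full.get()
            if i is None:
                break
            results.append(encode(buffers[i]))
            free.put(i)           # hand the buffer back to the master

    threads = [threading.Thread(target=master), threading.Thread(target=worker)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Hypothetical "encoder" that just tags each frame number.
    print(run_ping_pong(range(5), lambda frame: f"encoded-{frame}"))
```

With a single worker and FIFO queues, the output order matches the input order. A frame-level parallel encoder with several workers would additionally need to reorder results and respect inter-frame dependencies, which this sketch does not attempt.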

International Conference on Electronics, Circuits, and Systems | 2010

Latency and power optimization in AAA methodology for integrated circuits

Yaroub Elloumi; Mohamed Akil; Thierry Grandpierre; Mohamed Hedi Bedoui

Field Programmable Gate Arrays (FPGA) are flexible, so they are commonly used in many high-speed applications. However, power constraints are among the most important limiting factors when implementing high-speed adaptable applications. This work addresses the optimization of execution time and power consumption. We propose a new design methodology that extends the Algorithm-Architecture-Adequacy (AAA) methodology. It provides an implementation that meets real-time constraints and allows the designer to optimize power consumption or hardware resources. The extension has been implemented in the AAA software tool called Synchronized Distributed Executive for Integrated Circuits (SynDEx-IC). The experimental results show that this tool provides the architecture that consumes the least power among those explored, with average power reduced by 15.75%.

Sensor and Transducers Journal | 2011