Emmanuel Casseau | Researchain

Publication

Featured research published by Emmanuel Casseau.

signal processing systems | 2011

Efficient multicore scheduling of dataflow process networks

Hervé Yviquel; Emmanuel Casseau; Matthieu Wipliez; Mickaël Raulet

Although multi-core processors are now available everywhere, few applications are able to truly exploit their multiprocessing capabilities. Dataflow programming attempts to solve this problem by expressing explicit parallelism within an application. In this paper, we describe two scheduling strategies for executing a dataflow program on a single-core processor. We also describe an extension of these strategies to multi-core architectures using distributed schedulers and lock-free communications. We show the efficiency of these scheduling strategies on MPEG-4 Simple Profile and MPEG-4 Advanced Video Coding decoders.
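
The abstract does not name its two strategies, so the following is only a generic illustration of what scheduling a dataflow network on one core involves. It is a minimal sketch of a round-robin scheduler, assuming a hypothetical `Actor` class whose instances consume one token from an input FIFO and produce one token on an output FIFO; none of these names come from the paper.

```python
from collections import deque


class Actor:
    """Hypothetical dataflow actor: consumes one token, applies fn, emits one token."""

    def __init__(self, name, fn, inbox, outbox):
        self.name = name
        self.fn = fn
        self.inbox = inbox
        self.outbox = outbox

    def can_fire(self):
        # An actor is fireable when at least one input token is available.
        return len(self.inbox) > 0

    def fire(self):
        self.outbox.append(self.fn(self.inbox.popleft()))


def run_round_robin(actors):
    """Visit actors in a fixed cyclic order until a full pass fires nothing."""
    fired = True
    while fired:
        fired = False
        for actor in actors:
            if actor.can_fire():
                actor.fire()
                fired = True


# Two-stage pipeline: double each token, then add one.
source = deque([1, 2, 3])
middle = deque()
sink = deque()
pipeline = [
    Actor("double", lambda x: 2 * x, source, middle),
    Actor("increment", lambda x: x + 1, middle, sink),
]
run_round_robin(pipeline)
result = list(sink)
print(result)
```

Each FIFO here has a single producer and a single consumer, which is the property that typically lets multi-core implementations replace locks with lock-free queues.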

IEEE Transactions on Very Large Scale Integration Systems | 2008

Dynamic Memory Access Management for High-Performance DSP Applications Using High-Level Synthesis

Bertrand Le Gal; Emmanuel Casseau; Sylvain Huet

Multimedia applications such as video and image processing are often characterized by a huge number of data accesses. In many digital signal processing applications, array access patterns are regular and periodic. In these cases, optimized architectures using pipelined memory access controllers can be generated. In this paper, we focus on implementing memory interfacing modules that can be automatically generated from a high-level synthesis tool and that can efficiently handle predictable address patterns as well as random ones (i.e., dynamic address computations). The benefits of balancing dynamic address computations from the datapath to dedicated computation units in the memory controller are also analyzed, as are operator bitwidth optimization and data locality, to save power and reduce latency.

international conference on computer aided design | 2007

A design flow dedicated to multi-mode architectures for DSP applications

Cyrille Chavet; Caaliph Andriamisaina; Philippe Coussy; Emmanuel Casseau; Emmanuel Juin; Pascal Urard; Eric Martin

This paper addresses the design of multi-mode architectures for digital signal processing applications. We present a dedicated design flow and its associated high-level synthesis tool, named GAUT. Given a unified description of a set of time-wise mutually exclusive tasks and their associated throughput constraints, a single RTL hardware architecture optimized in area is generated. In order to reduce the register, steering logic (multiplexers) and controller (decoding logic) complexities, we propose a joint-scheduling algorithm, which maximizes the similarities between control steps, and specific binding approaches for both functional units and storage elements, which maximize the similarities between the datapaths. We show through a set of test cases that our approach offers significant area savings relative to the state of the art.

ACM Transactions on Embedded Computing Systems | 2006

A formal method for hardware IP design and integration under I/O and timing constraints

Philippe Coussy; Emmanuel Casseau; Pierre Bomel; Adel Baganne; Eric Martin

IP integration, which is one of the most important SoC design steps, requires taking into account communication and timing constraints. In that context, design and reuse can be improved using IP cores described at a high abstraction level. In this paper, we present an IP design approach that relies on three main phases: (1) constraint modeling, (2) IP constraint analysis steps for feasibility checking, and (3) synthesis. We propose a set of techniques dedicated to the digital signal processing domain that lead to an optimized IP core integration. Based on a generic architecture of components, the method we propose provides automatic generation of IP cores designed under integration constraints. We show the effectiveness of our approach with a DCT core design case study.

signal processing systems | 2015

Embedded Multi-Core Systems Dedicated to Dynamic Dataflow Programs

Hervé Yviquel; Alexandre Sanchez; Pekka Jääskeläinen; Jarmo Takala; Mickaël Raulet; Emmanuel Casseau

Multimedia applications and embedded platforms are both becoming very complex in order to improve user experience. Thus, multimedia developers need high-level methods to automate time-consuming and error-prone tasks. Dynamic dataflow modeling is attractive to describe complex applications, such as video codecs, at a high level of abstraction. This paper presents a dataflow-based design approach to implement video codecs on embedded multi-core platforms. First, we introduce a custom architecture model to design low-power multi-core chips based on distributed memory and Transport-Triggered Architecture processor cores. Then, we describe software synthesis techniques to improve dynamic dataflow implementations. This methodology has been implemented in open-source tools and demonstrated on video decoders based on the MPEG-4 Visual standard and the new High Efficiency Video Coding standard. The simulations achieve real-time decoding (40 FPS) of high-definition (720p) MPEG-4 Visual video sequences on a custom multi-core platform clocked at 1 GHz, which is an improvement of more than 100% over previously proposed implementations.

international symposium on parallel and distributed processing and applications | 2013

Towards run-time actor mapping of dynamic dataflow programs onto multi-core platforms

Hervé Yviquel; Emmanuel Casseau; Mickaël Raulet; Pekka Jääskeläinen; Jarmo Takala

The emergence of massively parallel architectures, along with the necessity of new parallel programming models, has revived interest in dataflow programming due to its ability to express concurrency. Although dynamic dataflow programming can be considered a flexible approach for the development of scalable applications, there are still some open problems concerning the execution of such programs. In this paper, we propose a low-cost methodology to map dynamic dataflow programs onto any multi-core platform. By translating the mapping into an equivalent graph partitioning problem, our approach finds interesting mapping solutions in a few milliseconds, which makes it feasible to repeat the mapping at regular intervals. Consequently, good load balancing over the targeted platform can be maintained even with such unpredictable applications. We conduct experiments across three MPEG video decoders, including one based on the new High Efficiency Video Coding standard. Those dataflow-based video decoders are executed on two different platforms: a desktop multi-core processor, and an embedded platform composed of tiny interconnected Very Long Instruction Word-style processors. Our entire design flow is based on open-source tools. We present the influence of the number of processors on the performance and show that our method obtains a maximum decoding rate for 16 processors.
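
To make the load-balancing goal concrete, here is a minimal sketch of mapping weighted actors onto cores. It is not the paper's partitioning method: it uses a plain greedy heuristic (heaviest actor first, onto the least-loaded core) and ignores inter-actor communication, which a real graph partitioner would also weigh. The actor names and workload figures are invented for illustration.

```python
import heapq


def map_actors(workloads, num_cores):
    """Greedy mapping: assign the heaviest remaining actor to the least-loaded core."""
    # Min-heap of (current_load, core_id) so the least-loaded core pops first.
    heap = [(0, core) for core in range(num_cores)]
    heapq.heapify(heap)
    mapping = {}
    # Sort by decreasing cost; ties are broken by name for determinism.
    for actor, cost in sorted(workloads.items(), key=lambda kv: (-kv[1], kv[0])):
        load, core = heapq.heappop(heap)
        mapping[actor] = core
        heapq.heappush(heap, (load + cost, core))
    loads = [0] * num_cores
    for actor, core in mapping.items():
        loads[core] += workloads[actor]
    return mapping, loads


# Hypothetical per-actor workloads, e.g. measured during the previous time slice.
workloads = {"parser": 4, "idct": 7, "motion_comp": 6, "deblock": 3, "display": 2}
mapping, loads = map_actors(workloads, 2)
print(mapping)
print(loads)
```

Because the heuristic runs in O(n log n) for n actors, it can be rerun cheaply whenever fresh workload measurements arrive, which is the property the abstract relies on for remapping at run time.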

conference on design and architectures for signal and image processing | 2010

Scheduling, binding and routing system for a run-time reconfigurable operator based multimedia architecture

Erwan Raffin; Christophe Wolinski; François Charot; Krzysztof Kuchcinski; Stéphane Guyetant; Stéphane Chevobbe; Emmanuel Casseau

This paper presents a system for application scheduling, binding and routing for a run-time reconfigurable operator based multimedia architecture (ROMA). We use constraint programming to formalize our architecture model together with a specific application program. For this purpose we use an abstract representation of our architecture, which models memories, reconfigurable operator cells and communication networks. We also model network topology. The use of constraint programming makes it possible to model the application scheduling, binding and routing as well as architectural and temporal constraints in a single model and solve them simultaneously. We have used several multimedia applications from the Mediabench set to evaluate our system. In 78% of cases, our system provides results that are proved optimal.

international conference on systems | 2009

High-level synthesis for the design of FPGA-based signal processing systems

Emmanuel Casseau; Bertrand Le Gal

High-level synthesis (HLS), which maps algorithms to architectures, is an attractive way to reduce design time substantially. While HLS tools were originally developed for ASIC technologies, HLS now draws wide interest from FPGA designers. However, with most HLS techniques, traditional resource sharing models are very inaccurate for FPGAs: for example, multiplexers can be very expensive with such technologies. Resource usage optimizations and dedicated resource binding have to be applied. In this paper, an HLS process that takes data width into account and combines scheduling and binding to carefully account for interconnect cost is presented. Experimental results show that our approach achieves significant reductions in area (34%) and dynamic power (28%) compared to a traditional synthesis.

signal processing systems | 2011

Stochastic modeling for floating-point to fixed-point conversion

Andrei Banciu; Emmanuel Casseau; Daniel Menard; Thierry Michel

The floating-point to fixed-point transformation process is error-prone and time-consuming, as the distortion introduced by the limited data size is difficult to evaluate. In this paper, a method to estimate the range of variables in LTI systems with respect to the corresponding overflow probability is presented. Furthermore, we show that the quantization noise evaluation can be realized using the same approach. The variance and the probability density function of the error are computed. The results obtained for several typical applications are presented.
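
For context, the quantities discussed in this abstract can be illustrated with the classical deterministic analysis, which is the conservative baseline that a stochastic method improves on, not the method of the paper itself. The sketch below assumes an FIR filter (an LTI system with a finite impulse response) and uses two textbook results: the worst-case output magnitude is the maximum input magnitude times the L1 norm of the impulse response, and rounding to a step q adds noise of variance q²/12 under the usual uniform-error model. The filter coefficients and function names are made up for the example.

```python
import math


def worst_case_bound(impulse_response, max_input):
    """Largest possible |output| of an FIR filter: max_input * sum(|h[k]|)."""
    return max_input * sum(abs(h) for h in impulse_response)


def integer_bits(bound):
    """Integer bits (sign bit excluded) of a two's-complement format that can hold +bound."""
    return max(0, math.floor(math.log2(bound)) + 1)


def rounding_noise_variance(fraction_bits):
    """Variance of uniform rounding noise with step q = 2**-fraction_bits, i.e. q**2 / 12."""
    q = 2.0 ** -fraction_bits
    return q * q / 12.0


h = [0.5, 1.0, 0.5]                          # hypothetical 3-tap filter
bound = worst_case_bound(h, max_input=1.0)   # guaranteed no overflow below this
bits = integer_bits(bound)
noise = rounding_noise_variance(8)
print(bound, bits, noise)
```

The worst-case bound guarantees zero overflow but is often pessimistic; accepting a small overflow probability, as the abstract proposes, generally allows fewer integer bits.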

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2010

High-Level Synthesis for Designing Multimode Architectures

Caaliph Andriamisaina; Philippe Coussy; Emmanuel Casseau; Cyrille Chavet

This paper addresses the design of multimode architectures for digital signal and image processing applications. We present a dedicated design flow and its associated high-level synthesis tool, named GAUT. Given a unified description of a set of time-wise mutually exclusive tasks and their associated throughput constraints, a single register transfer level hardware architecture optimized in area is generated. In order to reduce the register, steering logic, and controller complexities, this paper proposes a joint-scheduling algorithm, which maximizes the similarities between the control steps, and specific binding approaches for both operators and storage elements, which maximize the similarities between the datapaths. It is shown through a set of test cases that the proposed approach offers significant area savings and low performance penalties compared to both state-of-the-art techniques and dedicated mono-mode architectures.
