Kevin Martin
Centre national de la recherche scientifique
Publication
Featured research published by Kevin Martin.
signal processing systems | 2017
Thanh Dinh Ngo; Kevin Martin; Jean-Philippe Diguet
Considering the evolution towards highly variable dataflow applications, driven by the increasing impact of dynamic actors, we must target at runtime the best matching between dataflow graphs and heterogeneous multiprocessor platforms. The mapping must therefore be dynamically adapted depending on the data and on the communication loads between the computation cores. This is typically the case for mobile devices that run multimedia applications. The problem of mapping a dataflow application, i.e. a network of computational actors, onto a multiprocessor platform can be modeled as a partitioning problem where the cells are the dataflow actors and the partitions are the processors. While the benefit of executing a computational part on one processor rather than another is usually well established, the migration overhead is usually not considered. This paper presents a dynamic mapping algorithm that is performed at runtime, based on a single-move possibility that jointly considers the cost and benefit of possible migrations. The method is first applied to a set of randomly generated benchmarks with different features and different scenarios. It is then applied to an MPEG4 simple profile video decoder with different input sequences. The results systematically show that the runtime mapping significantly improves on the initial mapping. It is fast enough to be executed at runtime in order to track the best mapping according to data variations. The other observation is that not considering the migration cost of the new mapping could lead to worse performance than the original one.
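The single-move idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the makespan objective, the `horizon` parameter (the number of iterations over which a benefit is assumed to be collected), and all names below are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Move:
    actor: str
    target: str
    gain: float  # net gain = benefit over the horizon - migration cost


def makespan(mapping: Dict[str, str], cost: Dict[str, Dict[str, float]],
             procs: List[str]) -> float:
    """Load of the busiest processor (assumed objective, hypothetical)."""
    return max(
        sum(cost[a][p] for a, where in mapping.items() if where == p)
        for p in procs
    )


def best_single_move(mapping: Dict[str, str],
                     cost: Dict[str, Dict[str, float]],
                     migration_cost: Dict[str, float],
                     procs: List[str],
                     horizon: int = 1) -> Optional[Move]:
    """Try moving each actor to each other processor, one at a time.

    A move is kept only if its benefit outweighs its migration cost,
    so cost and benefit are considered jointly. Returns None when no
    single move pays off.
    """
    base = makespan(mapping, cost, procs)
    best: Optional[Move] = None
    for actor, current in mapping.items():
        for target in procs:
            if target == current:
                continue
            trial = dict(mapping)
            trial[actor] = target
            benefit = (base - makespan(trial, cost, procs)) * horizon
            gain = benefit - migration_cost[actor]
            if gain > 0 and (best is None or gain > best.gain):
                best = Move(actor, target, gain)
    return best


# Hypothetical example: three actors initially mapped on the same core.
procs = ["cpu0", "cpu1"]
cost = {
    "a": {"cpu0": 4.0, "cpu1": 4.0},
    "b": {"cpu0": 3.0, "cpu1": 3.0},
    "c": {"cpu0": 2.0, "cpu1": 2.0},
}
mapping = {"a": "cpu0", "b": "cpu0", "c": "cpu0"}
```

With a cheap migration the heaviest actor is moved; when its migration cost exceeds the benefit, another actor (or no actor at all) is selected, which reflects the abstract's observation that ignoring migration cost can make the new mapping worse than the original.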
design automation conference | 2016
Kevin Martin; Mostafa Rizk; Martha Johanna Sepúlveda; Jean-Philippe Diguet
NoC-based architectures overcome the limitations of traditional buses by exploiting parallelism and offering large bandwidths. However, NoC adoption also increases communication latency, which is especially penalising for dataflow (DF) applications. We introduce the notifying memories (NM) concept to reduce this overhead. Our original approach eliminates useless memory requests. This paper demonstrates NM in the context of video coding applications implemented with dynamic DF. We have conducted cycle-accurate SystemC simulations of the NoC on an MPEG4 decoder to evaluate NM efficiency. The results show significant reductions in latency (78%) and injection rate (60%), power savings (49%), and a throughput improvement (16%).
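A toy model can make the notion of "useless memory requests" concrete. The sketch below is an assumption-laden illustration, not the paper's hardware design: it assumes a baseline in which the consumer polls the memory every cycle, and compares it with a consumer that only issues a request after the memory has notified it of a write. The function names and the cycle model are hypothetical.

```python
from typing import Iterable, Tuple


def polling_requests(write_cycles: Iterable[int],
                     total_cycles: int) -> Tuple[int, int]:
    """Assumed baseline: the consumer sends one request per cycle.

    Returns (requests issued, tokens actually read).
    """
    writes = set(write_cycles)
    pending = requests = reads = 0
    for cycle in range(total_cycles):
        if cycle in writes:
            pending += 1          # producer stored a token this cycle
        requests += 1             # consumer asks whether data is ready
        if pending:
            pending -= 1
            reads += 1            # request was useful
    return requests, reads


def notified_requests(write_cycles: Iterable[int],
                      total_cycles: int) -> Tuple[int, int]:
    """Notification model: one request per write notification."""
    requests = reads = 0
    for cycle in sorted(set(write_cycles)):
        if cycle < total_cycles:
            requests += 1         # issued only after a notification
            reads += 1
    return requests, reads
```

In this toy model, every polling request that finds no data is wasted traffic on the interconnect, whereas the notified consumer issues exactly one request per token.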
conference on design and architectures for signal and image processing | 2014
Thanh Dinh Ngo; Daniel Sepulveda; Kevin Martin; Jean-Philippe Diguet
Mapping a dataflow application onto a heterogeneous multiprocessor platform can no longer be static. It has to adapt dynamically depending on the data and on the communication between the computation cores. This is typically the case for mobile devices that run multimedia applications. This paper presents an algorithm fast enough to be executed at run-time. In addition to computation cost, our approach relies on a communication model to estimate the delay for transmitting data. The algorithm is compared with the METIS tool for random dataflow graphs and two video decoders, MPEG4-SP and HEVC, considering heterogeneous multiprocessor platforms composed of 4 to 8 processors and 6 accelerators. Results on a Zynq platform show that our algorithm is about 40x faster than the METIS tool for the same throughput (frames per second) on a platform with 8 processors and 6 accelerators.
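The abstract does not specify the communication model. As a hedged illustration only, the sketch below assumes a common linear model (a fixed latency plus payload size divided by bandwidth) and assumes that edges between actors on the same processor cost nothing; every name and parameter is hypothetical.

```python
def transfer_delay(tokens: int, token_bytes: int,
                   latency_s: float, bandwidth_bytes_per_s: float) -> float:
    """Assumed linear model: fixed latency plus payload / bandwidth."""
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return latency_s + tokens * token_bytes / bandwidth_bytes_per_s


def edge_cost(src_proc: str, dst_proc: str, tokens: int, token_bytes: int,
              latency_s: float, bandwidth_bytes_per_s: float) -> float:
    """Communication cost of one dataflow edge under a given mapping.

    Assumption: actors mapped on the same processor exchange data for free.
    """
    if src_proc == dst_proc:
        return 0.0
    return transfer_delay(tokens, token_bytes, latency_s,
                          bandwidth_bytes_per_s)
```

A mapping algorithm could add such edge costs to the computation cost of each candidate placement, so that moving an actor away from its neighbours is penalised by the extra transfer time.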
ieee international symposium on parallel & distributed processing, workshops and phd forum | 2013
Pierre Bomel; Kevin Martin; Jean-Philippe Diguet
This paper targets the time-consuming problem of user I/Os and debugging in MPSoCs. It introduces the concept of dynamic allocation of virtual UARTs to implement standard I/Os in various hardware and software contexts on a hybrid FPGA. It discusses the advantages and limitations of this abstraction and presents an implementation on the hybrid Xilinx Zynq device. Multiple experiments illustrate how hardware (processors) and software heterogeneities can be handled. As a result, a real MJPEG decoder is debugged and validated in a couple of hours thanks to standard I/Os and virtual UARTs.
design automation conference | 2018
Rodrigo Cataldo; Ramon Fernandes; Kevin Martin; Johanna Sepulveda; Altamiro Amadeu Susin; César A. M. Marcon; Jean-Philippe Diguet
Parallel applications are essential for efficiently using the computational power of a Multiprocessor System-on-Chip (MPSoC). Unfortunately, these applications do not scale effortlessly with the number of cores because of synchronization operations that take away valuable computational time and restrict the parallelization gains. Moreover, synchronization is also a bottleneck due to sequential access to shared memory. We address this issue and introduce "Subutai", a hardware/software (HW/SW) architecture designed to distribute essential synchronization mechanisms over the Network-on-Chip (NoC). It includes Network Interfaces (NIs), drivers and a custom library for a NoC-based MPSoC architecture that speeds up the essential synchronization primitives of any legacy parallel application. In addition, we provide a fast simulation tool for parallel applications and a HW architecture of the NI. Experimental results with the PARSEC benchmark show an average application speedup of 2.05 compared to the same architecture running legacy SW solutions, for a 36% overhead in HW architecture.
model driven engineering languages and systems | 2015
Paola Vallejo; Mickaël Kerboeuf; Kevin Martin; Jean-Philippe Babau
The legacy code of a tool handling domain-specific data gathers valuable expertise. However, in many cases, it must be rewritten to make it apply to structurally incompatible data. We investigate a co-evolution approach to avoid this update by making the call context meet the legacy tool's definition domain. The data conforming to the call context co-evolve into data conforming to the definition domain. Once processed by the tool, they can be put back into their original context thanks to a specific reverse transformation which enables the recovery of elements that had been initially removed. This approach is applied to Orcc, a compiler for dataflow applications. Orcc requires many common functions that are expected to be adapted to its own context. Our approach is an effective way to reuse them instead of rewriting them.
conference on design and architectures for signal and image processing | 2015
Kevin Martin; Yvan Eustache; Jean-Philippe Diguet; Thanh Dinh Ngo; Emmanuel Casseau; Yaset Oliva
In this demo we will present a design flow for multi-core based embedded systems. Namely, we implement a kernel capable of modifying the system at run time to increase data throughput. The design flow starts with the Dynamic Dataflow and RVC-CAL (Reconfigurable Video Coding Cal Actor Language) descriptions of an application and goes up to the deployment of the system onto the hardware platform. As a use case, we implement an MPEG-4 decoder algorithm onto a multi-core heterogeneous system deployed onto the Zynq platform from Xilinx.
digital systems design | 2014
Pierre Bomel; Kevin Martin; Jean-Philippe Diguet
When partially reconfigurable FPGA-based systems allow processors to be dynamically hot-plugged, the number of possible software configurations increases and the dynamic sharing of hardware peripherals becomes problematic. Moreover, the debugging of application processes, which needs physical devices to communicate with remote users or debuggers, is a critical service that becomes extremely difficult to implement. This work puts forward the concept of virtual devices to reduce software complexity and isolate system services from applications. It is illustrated by a methodology that makes the design of debug paths easier. Several experiments show that heterogeneous systems of up to 24 hot-pluggable processors can take advantage of virtual devices.
conference on design and architectures for signal and image processing | 2014
Yaset Oliva; Emmanuel Casseau; Kevin Martin; Pierre Bomel; Jean-Philippe Diguet; Hervé Yviquel; Mickaël Raulet; Erwan Raffin; Laurent Morin
This paper presents the implementation of a video decoding application starting from its dataflow and CAL representations. Our objective is to demonstrate the ability of the Open RVC-CAL Compiler (Orcc) to generate code for embedded systems. For the demonstration, the video application is an MPEG-4 Part 2 decoder. The targeted architecture is a multi-core heterogeneous system deployed onto the Zynq platform from Xilinx.
conference on design and architectures for signal and image processing | 2014
Kevin Martin
Since their first appearance, FPGAs have been widely used, studied, and enhanced. This makes them attractive components for low-volume devices, for prototyping, and for efficient implementations that exploit their parallelism. Design on FPGAs also relies on methods and tools that conveniently improve productivity. In this session, an FPGA implementation of a flexible synchronizer for cognitive radio applications is presented. An FPGA is used for prototyping memory controllers and a predictive cache for image processing algorithms. A methodology and tool for the automatic generation of dataflow-based reconfigurable accelerators is presented and used for MPEG Reconfigurable Video Coding applications. A new soft-core is implemented in a Zynq to show the benefit of offloading operating system services.