Marc Duranton
Philips
Publication
Featured research published by Marc Duranton.
IEEE Transactions on Neural Networks | 1992
Nicolas Mauduit; Marc Duranton; Jean Gobert; Jacques-Ariel Sirat
Neural network simulations on a parallel architecture are reported. The architecture is scalable and flexible enough to be useful for simulating various kinds of networks and paradigms. The computing device is based on an existing coarse-grain parallel framework (INMOS transputers), improved with finer-grain parallel abilities through VLSI chips, and is called the Lneuro 1.0 (for LEP neuromimetic) circuit. The modular architecture of the circuit makes it possible to build various kinds of boards to match the expected range of applications or to increase the power of the system by adding more hardware. The resulting machine remains reconfigurable to accommodate a specific problem to some extent. A small-scale machine has been realized using 16 Lneuros, to experimentally test the behavior of this architecture. Results are presented on an integer version of Kohonen feature maps. The speedup factor increases regularly with the number of clusters involved (to a factor of 80). Some ways to improve this family of neural network simulation machines are also investigated.
symposium on principles of programming languages | 2006
Albert Cohen; Marc Duranton; Christine Eisenbeis; Claire Pagetti; Florence Plateau; Marc Pouzet
The design of high-performance stream-processing systems is a fast-growing domain, driven by markets such as high-end TV, gaming, 3D animation and medical imaging. It is also a surprisingly demanding task, with respect to the algorithmic and conceptual simplicity of streaming applications. It needs the close cooperation between numerical analysts, parallel programming experts, real-time control experts and computer architects, and incurs a very high level of quality assurance and optimization. In search for improved productivity, we propose a programming model and language dedicated to high-performance stream processing. This language builds on the synchronous programming model and on domain knowledge -- the periodic evolution of streams -- to allow correct-by-construction properties to be proven by the compiler. These properties include resource requirements and delays between input and output streams. Automating this task avoids tedious and error-prone engineering, due to the combinatorics of the composition of filters with multiple data rates and formats. Correctness of the implementation is also difficult to assess with traditional (asynchronous, simulation-based) approaches. This language is thus provided with a relaxed notion of synchronous composition, called n-synchrony: two processes are n-synchronous if they can communicate in the ordinary (0-)synchronous model with a FIFO buffer of size n. Technically, we extend a core synchronous data-flow language with a notion of periodic clocks, and design a relaxed clock calculus (a type system for clocks) to allow non strictly synchronous processes to be composed or correlated. This relaxation is associated with two sub-typing rules in the clock calculus. Delay, buffer insertion and control code for these buffers are automatically inferred from the clock types through a systematic transformation into a standard synchronous program.
We formally define the semantics of the language and prove the soundness and completeness of its clock calculus and synchronization transformation. Finally, the language is compared with existing formalisms.
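The n-synchrony idea in this abstract can be illustrated with a small sketch. It is not the paper's clock calculus: it assumes purely periodic clocks written as binary strings (`"1"` means the process is active at that instant), and the function names `ones_prefix_counts` and `fifo_size` are hypothetical. Under those assumptions, a producer and a consumer can be composed when both have the same rate and no read ever precedes the matching write; the FIFO size n is then the largest backlog observed over one common period.

```python
from math import gcd


def ones_prefix_counts(pattern, length):
    """Cumulative number of active instants ('1') of a periodic clock."""
    counts, total = [], 0
    for i in range(length):
        total += pattern[i % len(pattern)] == "1"
        counts.append(total)
    return counts


def fifo_size(producer, consumer):
    """Smallest FIFO size n that lets the two clocks communicate.

    Returns None when the clocks cannot be composed in this simplified
    model (different rates, or a read before the matching write).
    """
    # One common period is enough: with equal rates the backlog repeats.
    span = len(producer) * len(consumer) // gcd(len(producer), len(consumer))
    written = ones_prefix_counts(producer, span)
    read = ones_prefix_counts(consumer, span)
    if written[-1] != read[-1]:
        return None  # rates differ: the buffer would overflow or starve
    backlog = [w - r for w, r in zip(written, read)]
    if min(backlog) < 0:
        return None  # the consumer would read a value not yet produced
    return max(backlog)


print(fifo_size("10", "10"))      # identical clocks: 0-synchronous
print(fifo_size("1100", "0011"))  # two values wait before being read
```

In this simplified model, identical clocks need no buffer (the ordinary synchronous case), while a consumer that lags the producer needs a buffer as large as the maximum lag.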
compilers architecture and synthesis for embedded systems | 2010
Cupertino Miranda; Antoniu Pop; Philippe Dumont; Albert Cohen; Marc Duranton
Tuning applications for multicore systems involves subtle concurrency concepts and target-dependent optimizations. This paper advocates for a streaming execution model, called ER, where persistent processes communicate and synchronize through a multi-consumer processing applications, we demonstrate the scalability and efficiency advantages of streaming compared to data-driven scheduling. To exploit these benefits in compilers for parallel languages, we propose an intermediate representation enabling the compilation of data-flow tasks into streaming processes. This intermediate representation also facilitates the application of classical compiler optimizations to concurrent programs.
Archive | 1990
J. B. Theeten; Marc Duranton; N. Mauduit; J.-A. Sirat
Neural network simulations are often limited because of the time required for both the learning and the evaluation phase of the simulation. Our parallel digital Lneuro circuit drastically reduces these times by updating synaptic coefficients related to one neuron in parallel. Contributions of ‘input’ neurons to one output neuron are also computed in parallel.
asia and south pacific design automation conference | 2014
Fabien Clermidy; Rodolphe Héliot; Alexandre Valentian; Christian Gamrat; Olivier Bichler; Marc Duranton; Bilel Belhadj; Olivier Temam
This paper aims at presenting how new technologies can overcome classical implementation issues of Neural Networks. Resistive memories such as Phase Change Memories and Conductive-Bridge RAM can be used for obtaining low-area synapses thanks to programmable resistance also called Memristors. Similarly, the high capacitance of Through Silicon Vias can be used to greatly improve analog neurons and reduce their area. The very same devices can also be used for improving connectivity of Neural Networks as demonstrated by an application. Finally, some perspectives are given on the usage of 3D monolithic integration for better exploiting the third dimension and thus obtaining systems closer to the brain.
digital systems design | 2003
Om Prakash Gangwal; Johan Janssen; Selliah Rathnam; Erwin B. Bellers; Marc Duranton
Media processing system-on-chips (SoCs) mainly consist of audio encoding/decoding (e.g. AC-3, MP3), video encoding/decoding (e.g. H263, MPEG-2) and video pixel processing functions (e.g. de-interlacing, noise reduction). Video pixel processing functions have very high computational demands, as they require a large number of computations on large amounts of data (note that the data are pixels of completely decoded pictures). In this paper, we focus on video pixel processing functions. Usually, these functions are implemented in dedicated hardware. However, flexibility (by means of programmability or reconfigurability) is needed to introduce the latest innovative algorithms, to allow differentiation of products, and to allow bug fixing after fabricating chips. It is impossible to fulfill the computational requirements of these functions with current programmable media processors. To achieve efficient implementations for flexible solutions, we study, in this paper, the application characteristics of some representative video pixel processing functions. The characteristics considered are granularity of operations, amount and kind of data accesses, and degree of parallelism present in these functions. We observe that, from a computational granularity point of view, many functions can be expressed in terms of kernels, e.g. Median3 (i.e. median of three values), finite impulse response (FIR) filters, table lookups (LUT), etc., that are coarser-grained than ALU, Mult, MAC, etc. Regarding the kind of data accesses, we categorize these functions as having regular, regular with some data rearrangement, or irregular data access patterns. Furthermore, the degree of parallelism present in these functions is expressed in terms of data level parallelism (DLP) and instruction/operation level parallelism (ILP). We show with an example that these properties can be exploited to make specialized programmable processors.
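To make the notion of coarse-grained kernels concrete, here is a minimal software sketch of two of the kernels named in this abstract, Median3 and an FIR filter. It is purely illustrative and says nothing about how the paper's processors implement them; the function names `median3`, `median3_row` and `fir` are hypothetical. Each output pixel depends only on a small window of inputs, which is the regular access pattern and data-level parallelism the abstract refers to.

```python
def median3(a, b, c):
    """Median of three values using only min/max operations."""
    return max(min(a, b), min(max(a, b), c))


def median3_row(pixels):
    """Slide Median3 over one row; every output is independent (DLP)."""
    return [median3(pixels[i], pixels[i + 1], pixels[i + 2])
            for i in range(len(pixels) - 2)]


def fir(samples, taps):
    """FIR filter over the fully overlapping positions only.

    Output i is sum_k taps[k] * samples[i + n - 1 - k], with n = len(taps).
    """
    n = len(taps)
    return [sum(taps[k] * samples[i + n - 1 - k] for k in range(n))
            for i in range(len(samples) - n + 1)]


print(median3_row([1, 9, 2, 3]))     # the outlier 9 is suppressed
print(fir([1, 2, 3, 4], [1, 2, 1]))  # small smoothing filter
```

A single Median3 or FIR output bundles several comparisons or multiply-accumulates, which is why such kernels count as coarser-grained than individual ALU or MAC operations.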
embedded software | 2005
Albert Cohen; Marc Duranton; Christine Eisenbeis; Claire Pagetti; Florence Plateau; Marc Pouzet
We propose a programming model dedicated to real-time video-streaming applications for embedded media devices, including high-definition TVs. This model is built on the synchronous programming model extended with domain-specific knowledge --- periodic evolution of streams --- to allow correct-by-construction properties of the application to be proven by the compiler. These properties include buffer requirements and delays between input and output streams. Such properties are tedious to analyze by hand, due to the combinatorics of video filters, multiple data rates and formats. We show how to extend a core synchronous data-flow language with a notion of periodic clocks, and to design a relaxed clock calculus (a type system for clocks) to allow non strictly synchronous processes to be composed. This relaxation is associated with a subtyping rule in the clock calculus. Delay, buffer insertion and control code for these buffers are automatically inferred from the clock types through a systematic program transformation.
international conference on microelectronics | 1996
Marc Duranton
Real-time and embedded applications of image processing like pattern recognition, shape analysis etc. (using classical or less classical methods such as neural networks) are computer-intensive tasks that lead to complex systems. Furthermore, the skyrocketing demand for those techniques has led to a flurry of algorithms that must be rapidly implemented, evaluated and finally tuned to real-world cases. This is why LEP has developed the fully programmable vectorial processor L-Neuro 2.3, which is a parallel chip composed of an array of twelve DSPs (Digital Signal Processors). It can be used for neurocomputing, fuzzy logic applications, real-time image processing, digital signal processing and all applications that can take advantage of cooperating DSPs. The now available chip is able to perform up to 2 Giga arithmetic operations per second, and has a peak throughput of 1.5 Gigabytes per second.
compilers, architecture, and synthesis for embedded systems | 2014
Bilel Belhadj; Alexandre Valentian; Pascal Vivet; Marc Duranton; Liqiang He; Olivier Temam
3D stacking is a promising technology (low latency/power/area, high bandwidth); its main shortcoming is increased power density. Simultaneously, motivated by energy constraints, architectures are evolving towards greater customization, with tasks delegated to accelerators. Due to the widespread use of machine-learning algorithms and the re-emergence of neural networks (NNs) as the preferred such algorithms, NN accelerators are receiving increased attention. They turn out to be well matched to 3D stacking: inherently 3D structures with a low power density and high across-layer bandwidth requirements. We present what is, to the best of our knowledge, the first 3D stacked NN accelerator.
european conference on parallel processing | 2002
Albert Cohen; Daniela Genius; Abdesselem Kortebi; Zbigniew Chamski; Marc Duranton; Paul Feautrier
This paper aims at modeling video stream applications with structured data and multiple clocks. Multi-Periodic Process Networks (MPPN) are real-time process networks with an adaptable degree of synchronous behavior and a hierarchical structure. MPPN help to describe stream-processing applications and deduce resource requirements such as parallel functional units, throughput and buffer sizes.