Alexandru Plesco
École normale supérieure de Lyon
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Alexandru Plesco.
design, automation, and test in europe | 2013
Christophe Alias; Alain Darte; Alexandru Plesco
Some data- and compute-intensive applications can be accelerated by offloading portions of codes to platforms such as GPGPUs or FPGAs. However, to get high performance for these kernels, it is mandatory to restructure the application, to generate adequate communication mechanisms for the transfer of remote data, and to make good usage of the memory bandwidth. In the context of the high-level synthesis (HLS), from a C program, of hardware accelerators on FPGA, we show how to automatically generate optimized remote accesses for an accelerator communicating to an external DDR memory. Loop tiling is used to enable block communications, suitable for DDR memories. Pipelined communication processes are generated to overlap communications and computations, thereby hiding some latencies, in a way similar to double buffering. Finally, not only intra-tile but also inter-tile data reuse is exploited to avoid remote accesses when data are already available in the local memory. Our first contribution is to show how to generate the sets of data to be read from (resp. written to) the external memory just before (resp. after) each tile so as to reduce communications and reuse data as much as possible in the accelerator. The main difficulty arises when some data may be (re)defined in the accelerator and should be kept locally. Our second contribution is an optimized code generation scheme, entirely at source-level, i.e., in C, that allows us to compile all the necessary glue (the communication processes) with the same HLS tool as for the computation kernel. Both contributions use advanced polyhedral techniques for program analysis and transformation. Experiments with Altera HLS tools demonstrate how to use our techniques to efficiently map C kernels to FPGA.
applied reconfigurable computing | 2011
Christophe Alias; Bogdan Pasca; Alexandru Plesco
Recent increase in the complexity of the circuits has brought high-level synthesis tools as a must in the digital circuit design. However, these tools come with several limitations, and one of them is the efficient use of pipelined arithmetic operators. This paper explains how to generate efficient hardware with floating-point pipelined operators for regular codes with perfect loop nests. The part to be mapped to the operator is identified, then the program is scheduled so that each intermediate result is produced exactly at the time it is needed by the operator, avoiding pipeline stalling and temporary buffers. Finally, we show how to generate the VHDL code for the control unit and how to link it with specialized pipelined floating-point operators generated using the open-source FloPoCo tool. The method has been implemented in the Bee research compiler and experimental results on DSP kernels show promising results with a minimum of 94% efficient utilization of the pipelined operators for a complex kernel.
application specific systems architectures and processors | 2011
F. de Dinechin; Jean-Michel Muller; Bogdan Pasca; Alexandru Plesco
Solving the Table Makers Dilemma, for a given function and a given target floating-point format, requires testing the value of the function, with high precision, at a very large number of consecutive values. We give an algorithm that allows for performing such computations on a very regular architecture, and present an FPGA implementation of that algorithm.
ACM Transactions on Architecture and Code Optimization | 2017
Christophe Alias; Alexandru Plesco
Hardware accelerators generated by polyhedral synthesis techniques make extensive use of affine expressions (affine functions and convex polyhedra) in control and steering logic. Since the control is pipelined, these affine objects must be evaluated at the same time for different values, which forbids aggressive reuse of operators. In this article, we propose a method to factorize a collection of affine expressions without preventing pipelining. Our key contributions are (i) to use semantic factorizations exploiting arithmetic properties of addition and multiplication and (ii) to rely on a cost function whose minimization ensures correct usage of FPGA resources. Our algorithm is totally parameterized by the cost function, which can be customized to fit a target FPGA. Experimental results on a large pool of linear algebra kernels show a significant improvement compared to traditional low-level RTL optimizations. In particular, we show how our method reduces resource consumption by revealing hidden strength reductions.
Symposium en Architecture de machines (Sympa 2008) | 2007
Alexandru Plesco; Tanguy Risset
Archive | 2011
Christophe Alias; Alain Darte; Alexandru Plesco
application-specific systems, architectures, and processors | 2010
Christophe Alias; Alain Darte; Alexandru Plesco
Archive | 2011
Christophe Alias; Alain Darte; Alexandru Plesco
Archive | 2016
Christophe Alias; Fabrice Rastello; Alexandru Plesco
Archive | 2016
Christophe Alias; Fabrice Rastello; Alexandru Plesco