Florent de Dinechin
Institut national des sciences appliquées de Lyon
Publications
Featured research published by Florent de Dinechin.
Design, Automation and Test in Europe | 2017
Matei Istoan; Florent de Dinechin
This article presents the new framework for semi-automatic circuit pipelining that will be used in future releases of the FloPoCo generator. From a single description of an operator or datapath, optimized implementations are obtained automatically for a wide range of FPGA targets and a wide range of frequency/latency trade-offs. Compared to previous versions of FloPoCo, the level of abstraction has been raised, enabling easier development, shorter generator code, and better pipeline optimization. The proposed approach is also more flexible than fully automatic pipelining approaches based on retiming: In the proposed technique, the incremental construction of the pipeline along with the circuit graph enables architectural design decisions that depend on the pipeline.
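The frequency-directed pipelining idea described above can be illustrated with a small sketch. This is not FloPoCo's actual API (FloPoCo is a C++ generator); it is a hypothetical model of the core decision: given the combinational delay of each component along a datapath and a target clock period, insert a pipeline register whenever the accumulated delay would exceed the period.

```python
def place_registers(delays, period):
    """Greedy frequency-directed pipelining sketch (hypothetical model,
    not FloPoCo code).

    delays: combinational delay of each component along the datapath (ns)
    period: target clock period (ns)
    Returns the indices of components preceded by a pipeline register;
    the latency of the resulting pipeline is len(result) cycles.
    """
    cuts = []
    acc = 0.0
    for i, d in enumerate(delays):
        if d > period:
            raise ValueError(f"component {i} alone exceeds the target period")
        if acc + d > period:   # adding this component would violate timing
            cuts.append(i)     # so insert a register before it
            acc = d            # this component starts a new cycle
        else:
            acc += d
    return cuts
```

For delays `[0.4, 0.5, 0.3, 0.6]` and a 1 ns period, the sketch inserts one register (before component 2); relaxing the period to 2 ns removes it and yields a purely combinational circuit, which is the frequency/latency trade-off mentioned in the abstract.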
Conference on Ph.D. Research in Microelectronics and Electronics | 2017
Andrea Bocco; Yves Durand; Florent de Dinechin
The Universal NUMber, or UNUM, is a variable-length floating-point format conceived as a replacement for the formats defined in the IEEE 754 standard. Through an internal algebra based on interval arithmetic, UNUM keeps track of precision during operations, offering better result reliability than IEEE 754. This work discusses the implementation of UNUM arithmetic and reports hardware implementation results for some of the UNUM operators.
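The interval-arithmetic algebra underlying UNUM can be sketched with a toy model. This is not the UNUM format itself (which uses variable-length fields and a ubit), and a real implementation must round lower bounds down and upper bounds up; this sketch ignores directed rounding and only shows how an interval tracks the growing uncertainty of a result.

```python
class Interval:
    """Toy closed-interval arithmetic, as used inside UNUM-style formats
    to track the precision of a result. Real hardware rounds the bounds
    outward; this sketch omits directed rounding."""
    def __init__(self, lo, hi):
        assert lo <= hi
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        # bounds of a sum: add lower to lower, upper to upper
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        # bounds of a product: extremes among the four endpoint products
        p = [self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi]
        return Interval(min(p), max(p))

    def width(self):
        # how much uncertainty the result carries
        return self.hi - self.lo
```

For example, multiplying `Interval(-1, 2)` by `Interval(3, 4)` yields `Interval(-4, 8)`: the width of the result makes the loss of precision explicit, which is the property the abstract refers to.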
Archive | 2018
Jean-Michel Muller; Nicolas Brunie; Florent de Dinechin; Claude-Pierre Jeannerod; Mioara Joldes; Vincent Lefèvre; Guillaume Melquiond; Nathalie Revol; Serge Torres
While the previous chapters have made clear that it is common practice to verify floating-point algorithms with pen-and-paper proofs, this practice can lead to subtle bugs. Indeed, floating-point arithmetic introduces numerous special cases, and examining all the details would be tedious. As a consequence, the verification process tends to focus on the main parts of the correctness proof, so that it does not grow out of reach.
Archive | 2018
Jean-Michel Muller; Nicolas Brunie; Florent de Dinechin; Claude-Pierre Jeannerod; Mioara Joldes; Vincent Lefèvre; Guillaume Melquiond; Nathalie Revol; Serge Torres
Among the many operations that the IEEE 754 standards specify (see Chapter 3), we will focus here and in the next two chapters on the five basic arithmetic operations: addition, subtraction, multiplication, division, and square root. We will also study the fused multiply-add (FMA) operator. We review here some of the known properties and algorithms used to implement each of those operators. Chapter 8 and Chapter 9 will detail some examples of actual implementations in, respectively, hardware and software.
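One well-known property of correctly rounded floating-point addition, used throughout analyses of these basic operations, is that its rounding error is itself a floating-point number and can be recovered exactly. Knuth's TwoSum algorithm computes this error using only ordinary additions and subtractions; the sketch below assumes round-to-nearest binary arithmetic.

```python
def two_sum(a, b):
    """Error-free transformation of addition: returns (s, e) with
    s = fl(a + b) and a + b = s + e exactly, assuming round-to-nearest
    binary floating-point arithmetic (Knuth's TwoSum)."""
    s = a + b
    a_virtual = s - b        # the part of s that came from a
    b_virtual = s - a_virtual  # the part of s that came from b
    a_err = a - a_virtual    # what was lost from a
    b_err = b - b_virtual    # what was lost from b
    return s, a_err + b_err
```

With `a = 1e16` and `b = 1.0`, the rounded sum is `1e16` and the recovered error is exactly `1.0`: the information discarded by rounding is not lost, a fact exploited by many of the algorithms discussed in these chapters.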
Field Programmable Logic and Applications | 2017
Yohann Uguen; Florent de Dinechin; Steven Derrien
FPGAs are well known for their ability to perform non-standard computations not supported by classical microprocessors. Many libraries of highly customizable application-specific IPs have exploited this capability. However, using such IPs usually requires handcrafted HDL, hence significant design effort. High-Level Synthesis (HLS) lowers the design effort thanks to the use of C/C++ dialects for programming FPGAs. However, the high-level C language becomes a hindrance when one wants to express non-standard computations: this language was designed for programming microprocessors and carries many restrictions inherited from that paradigm. This is especially true when computing in floating-point, whose data types and evaluation semantics are defined by the IEEE-754 and C11 standards. If the high-level specification is a computation on the reals, then HLS imposes a very restricted implementation space. This work attempts to bridge FPGA application-specific efficiency and HLS ease of use. It specifically targets the ubiquitous floating-point summation-reduction pattern. A source-to-source compiler transforms selected floating-point additions into sequences of simpler operators using non-standard arithmetic formats. This improves performance and accuracy for several benchmarks, while keeping the ease of use of a high-level C description.
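The kind of non-standard summation operator alluded to above can be modeled in software. Instead of rounding after every addition, a single very wide fixed-point accumulator absorbs every double exactly, and rounding happens once at the end; the sketch below is a hypothetical model using Python big integers with the binary point at 2^-1074 (the smallest subnormal double). The operators actually generated for FPGAs are much narrower, tailored to the data, but the principle is the same.

```python
from fractions import Fraction

SCALE = 2 ** 1074   # 2**-1074 is the smallest subnormal double, so every
                    # finite double is an integer multiple of it

def exact_sum(xs):
    """Sum finite doubles in a wide fixed-point accumulator, rounding
    only once at the end (software model of a non-standard summation
    operator; not the paper's actual generated hardware)."""
    acc = 0
    for x in xs:
        n, d = x.as_integer_ratio()     # d is a power of two <= 2**1074
        acc += n * (SCALE // d)         # exact: no rounding per addition
    return float(Fraction(acc, SCALE))  # single final rounding

values = [1e16, 1.0, -1e16]
naive = sum(values)        # the 1.0 is lost when added to 1e16: gives 0.0
exact = exact_sum(values)  # the accumulator keeps it: gives 1.0
```

The naive rounded-at-every-step sum returns 0.0 on this input while the wide accumulator returns 1.0, illustrating why deferring rounding improves both accuracy and, in hardware, the critical path of the reduction.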
Field-Programmable Custom Computing Machines | 2017
Yohann Uguen; Florent de Dinechin; Steven Derrien
Many case studies have demonstrated the potential of Field-Programmable Gate Arrays (FPGAs) as accelerators for a wide range of applications. FPGAs offer massive parallelism and programmability at the bit level. This enables programmers to exploit a range of techniques that avoid many bottlenecks of classical von Neumann computing. However, development costs for FPGAs are orders of magnitude higher than for classical programming. A solution would be the use of High-Level Synthesis (HLS) tools, which use C as a hardware description language. However, the C language was designed to be executed on general-purpose processors, not to generate hardware. Its datatypes and operators are limited to a small number (more or less matching the hardware operators present in mainstream processors), and HLS tools inherit these limitations. To better exploit the freedom offered by hardware and FPGAs, HLS vendors have enriched the C language with integer and fixed-point types of arbitrary size. Still, the operations on these types remain limited to the basic arithmetic and logic ones. In floating point, the current situation is even worse. The operator set is limited, and the sizes are restricted to 32 and 64 bits. Besides, most recent compilers, including the HLS ones, attempt to follow established standards, in particular C11 and IEEE-754. This ensures bit-exact compatibility with software, but greatly reduces the freedom of optimization by the compiler. For instance, a floating-point addition is not associative even though its real equivalent is. In the present work we attempt to give the compiler more freedom. For this, we sacrifice the strict respect of the IEEE-754 and C11 standards, but we replace it with the strict respect of a high-level accuracy specification expressed by the programmer through a pragma. The case study in this work is a program transformation that applies to floating-point additions on a loop's critical path.
It decomposes them into elementary steps, resizes the corresponding subcomponents to guarantee some user-specified accuracy, and merges and reorders these components to improve performance. The result of this complex sequence of optimizations could not be obtained from an operator generator alone, since it involves global loop information. For this purpose, we used a compilation flow involving one or several source-to-source transformations operating on the code given to HLS tools (Figure 1). The proposed transformation already works very well on 3 of the 10 FPMarks, where it improves both latency and accuracy by an order of magnitude for comparable area. For 2 more benchmarks, the latency is not improved (but not degraded either) due to current limitations of HLS tools; addressing this is short-term future work. The main result of this work is that HLS tools also have the potential to generate efficient designs handling floating-point computations in a completely non-standard way. In the longer term, we believe that HLS flows can not only import application-specific operators from the FPGA literature but also improve them using high-level, program-level information.
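The non-associativity that prevents a standards-compliant compiler from reordering a summation, mentioned in the abstract above, takes two lines to demonstrate: the grouping of floating-point additions changes the result, so C11/IEEE-754 semantics forbid the reordering that the accuracy pragma re-enables.

```python
# Floating-point addition is not associative, so an IEEE-754/C11-compliant
# compiler may not regroup a summation: the grouping changes the result.
a, b, c = 1.0, 1e100, -1e100

left  = (a + b) + c   # a is absorbed: fl(1.0 + 1e100) = 1e100, so result 0.0
right = a + (b + c)   # b + c cancels exactly to 0.0, so result 1.0
```

Under real-number semantics both groupings equal 1.0; under IEEE-754 semantics they differ, which is exactly the optimization freedom the programmer trades back in through the accuracy specification.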
Circuits, Systems, and Signal Processing | 2017
Hatam Abdoli; Hooman Nikmehr; Naser Movahedinia; Florent de Dinechin
Symposium on Computer Arithmetic | 2018
Martin Kumm; Oscar Gustafsson; Florent de Dinechin; Johannes Kappauf; Peter Zipf
Archive | 2017
Yohann Uguen; Florent de Dinechin
Archive | 2017
Yohann Uguen; Florent de Dinechin; Steven Derrien