Guillaume Revy

École normale supérieure de Lyon

Publication

Featured research published by Guillaume Revy.

symposium on computer arithmetic | 2011

Automatic Generation of Fast and Certified Code for Polynomial Evaluation

Guillaume Revy

Designing an efficient floating-point implementation of a function based on polynomial evaluation requires finding an evaluation code that is accurate enough and exploits the features of the target architecture as fully as possible. This article introduces CGPE, a tool dealing with the generation of fast and certified codes for the evaluation of bivariate polynomials. First we discuss the issue underlying the evaluation scheme combinatorics before giving an overview of the CGPE tool. The approach we propose consists of two steps: the generation of evaluation schemes, using heuristics to quickly find some with low latency, and the selection, which mainly consists in automatically checking their scheduling on the given target and validating their accuracy. Then, we present ongoing development and ideas for possible improvements of the whole process. Finally, we illustrate the use of CGPE on some examples, and show how it allows us to generate fast and certified codes in a few seconds and thus to reduce the development time of libms like FLIP.
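To make the notion of an evaluation scheme concrete, here is a minimal sketch contrasting two classical schemes for a univariate polynomial: Horner's rule (a purely sequential chain) and Estrin's scheme (independent pairs at each level, hence more instruction-level parallelism). It is not CGPE's heuristic and does not handle the bivariate case from the abstract; the function names `horner` and `estrin` are chosen here for illustration only.

```python
from fractions import Fraction


def horner(coeffs, x):
    """Evaluate sum(coeffs[i] * x**i) with one dependent multiply-add per coefficient."""
    acc = coeffs[-1]
    for c in reversed(coeffs[:-1]):
        acc = acc * x + c
    return acc


def estrin(coeffs, x):
    """Evaluate the same polynomial by pairing neighbours level by level.

    Within one level, every pair can be computed independently of the others.
    """
    level = list(coeffs)
    power = x
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)  # pad so that every term has a partner
        level = [level[i] + level[i + 1] * power for i in range(0, len(level), 2)]
        power = power * power  # x, x^2, x^4, ...
    return level[0]


# Exact rational arithmetic: both schemes compute the same value.
coeffs = [Fraction(1), Fraction(-1, 2), Fraction(1, 3), Fraction(-1, 4), Fraction(1, 5)]
x = Fraction(3, 8)
assert horner(coeffs, x) == estrin(coeffs, x)
```

In exact arithmetic the two schemes agree; in floating-point or fixed-point arithmetic they round differently, which is why each candidate scheme needs its own accuracy check.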

forum on specification and design languages | 2006

UML/XML-Based Approach to Hierarchical AMS Synthesis

Ian O'Connor; Faress Tissafi-Drissi; Guillaume Revy

This chapter explores the suitability of unified modeling language (UML) techniques for defining hierarchical relationships in analogue and mixed signal (AMS) circuit blocks, and extensible markup language (XML) for storing soft AMS intellectual property (IP) design rules and firm AMS IP design data. Both aspects are essential to raising the abstraction level in synthesis of this class of block in SoCs. The various facets of AMS IP are discussed, and explicit mappings to concepts in UML are demonstrated. Then, through a simple example block, these concepts are applied and the successful modification of an existing analogue synthesis tool to incorporate these ideas is proven. The central data format of this tool is XML, and several examples are given showing how this metalanguage can be used in both AMS soft-IP creation and firm-IP synthesis.

field-programmable logic and applications | 2010

Multiplicative Square Root Algorithms for FPGAs

Florent de Dinechin; Mioara Joldes; Bogdan Pasca; Guillaume Revy

Most current square root implementations for FPGAs use a digit recurrence algorithm which is well suited to their LUT structure. However, recent computing-oriented FPGAs include embedded multipliers and RAM blocks which can also be used to implement quadratic convergence algorithms, very high radix digit recurrences, or polynomial approximation algorithms. The cost of these solutions is evaluated and compared, and a complete implementation of a polynomial approach is presented within the open-source FloPoCo framework. This polynomial approach allows a shorter latency and higher frequency than the digit recurrence approach, and improves over previous multiplicative approaches. However, the cost of IEEE-compliant correct rounding is shown to be very high.
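The difference between a digit recurrence and a quadratically convergent iteration can be sketched in software on integers. The example below is an illustration under that assumption only: it is not the FloPoCo hardware design, and the function names are hypothetical. The first function decides one result bit per step; the second uses Newton's iteration, which needs far fewer steps but relies on multiplication or division at each one.

```python
import math


def isqrt_digit_recurrence(n):
    """Radix-2 digit recurrence: try each result bit from most to least significant."""
    if n < 0:
        raise ValueError("n must be non-negative")
    root = 0
    bit = 1 << (n.bit_length() // 2)  # at least as large as the root's top bit
    while bit:
        candidate = root | bit
        if candidate * candidate <= n:
            root = candidate  # keep the bit when the square still fits
        bit >>= 1
    return root


def isqrt_newton(n):
    """Newton's iteration x <- (x + n // x) // 2, started from above the root."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return 0
    x = 1 << ((n.bit_length() + 1) // 2)  # initial guess >= sqrt(n)
    while True:
        y = (x + n // x) // 2
        if y >= x:
            return x  # the sequence stopped decreasing: x is floor(sqrt(n))
        x = y


# Both methods agree with the standard library on a small range.
assert all(isqrt_digit_recurrence(n) == isqrt_newton(n) == math.isqrt(n) for n in range(1000))
```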

international symposium on industrial embedded systems | 2007

Faster floating-point square root for integer processors

Claude-Pierre Jeannerod; Hervé Knochel; Christophe Monat; Guillaume Revy

This paper presents some work in progress on fast and accurate floating-point arithmetic software for ST200-based embedded systems. We show how to use some key architectural features to design codes that achieve correct rounding-to-nearest without sacrificing efficiency. This is illustrated with the square root function, whose implementation given here is over 35% faster than the previous best one for such systems.

symposium on computer arithmetic | 2009

A New Binary Floating-Point Division Algorithm and Its Software Implementation on the ST231 Processor

Claude-Pierre Jeannerod; Hervé Knochel; Christophe Monat; Guillaume Revy; Gilles Villard

This paper deals with the design and implementation of low latency software for binary floating-point division with correct rounding to nearest. The approach we present here targets a VLIW integer processor of the ST200 family, and is based on fast and accurate programs for evaluating some particular bivariate polynomials. We start by giving approximation and evaluation error conditions that are sufficient to ensure correct rounding. Then we describe the heuristics used to generate such evaluation programs, as well as those used to automatically validate their accuracy. Finally, we propose, for the binary32 format, a complete C implementation of the resulting division algorithm. With the ST200 compiler and compared to previous implementations, our approach yields a speed-up by a factor of almost 1.8.

international conference on pervasive and embedded computing and communication systems | 2014

Code Size and Accuracy-Aware Synthesis of Fixed-Point Programs for Matrix Multiplication

Matthieu Martel; Amine Najahi; Guillaume Revy

In digital signal processing, many primitives boil down to a matrix multiplication. In order to enable savings in time, energy consumption, and on-chip area, these primitives are often implemented in fixed-point arithmetic. Various conflicting goals complicate the process of writing fixed-point codes, such as numerical accuracy, run-time latency, and size of the codes. In this article, we introduce a new methodology to automate the synthesis of small and accurate codes for matrix multiplication in fixed-point arithmetic. Our approach relies on a heuristic to merge matrix rows or columns in order to reduce the synthesized code size, while guaranteeing a target accuracy. We suggest a merging strategy based on finding closest pairs of vectors, which makes it possible to address in a few seconds problems such as the synthesis of small and accurate codes for matrix multiplication of size 64 and more. Finally, we illustrate its efficiency on a set of benchmarks, and we show that it reduces the synthesized code size by more than 50% while maintaining good numerical properties.

parallel symbolic computation | 2010

Techniques and tools for implementing IEEE 754 floating-point arithmetic on VLIW integer processors

Claude-Pierre Jeannerod; Jean-Michel Muller; Guillaume Revy; Christian Bertin; Jingyan Jourdan-Lu; Hervé Knochel; Christophe Monat

Recently, some high-performance IEEE 754 single precision floating-point software has been designed, which aims at best exploiting some features (integer arithmetic, parallelism) of the STMicroelectronics ST200 Very Long Instruction Word (VLIW) processor. We review here the techniques and software tools used or developed for this design and its implementation, and how they allowed very high instruction-level parallelism (ILP) exposure. Those key points include a hierarchical description of function evaluation algorithms, the exploitation of the standard encoding of floating-point data, the automatic generation of fast and accurate polynomial evaluation schemes, and some compiler optimizations.

application-specific systems, architectures, and processors | 2015

Range reduction based on Pythagorean triples for trigonometric function evaluation

Hugues de Lassus Saint-Genies; David Defour; Guillaume Revy

Software evaluation of elementary functions usually requires three steps: a range reduction, a polynomial evaluation, and a reconstruction step. These evaluation schemes are designed to give the best performance for a given accuracy, which requires a fine control of errors. One of the main issues is to minimize the number of sources of error and/or their influence on the final result. The work presented in this article addresses this problem as it removes one source of error for the evaluation of trigonometric functions. We propose a method that eliminates rounding errors from tabulated values used in the second range reduction for the sine and cosine evaluation. When targeting correct rounding, we show that such tables are smaller and make the reconstruction step less expensive than existing methods. This approach relies on Pythagorean triples generators. Finally, we show how to generate tables indexed by up to 10 bits in a reasonable time and with little memory consumption.
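The property that makes Pythagorean triples attractive here is that a triple (a, b, c) with a² + b² = c² gives a point whose sine and cosine, b/c and a/c, are exact ratios of integers. The sketch below enumerates primitive triples with Euclid's formula and picks the one whose angle is nearest to a target. It illustrates that property only and is not the table-generation algorithm of the article; `primitive_triples`, `closest_triple`, and the bound `max_m` are hypothetical names introduced for this example.

```python
import math


def primitive_triples(max_m):
    """Yield primitive triples (a, b, c) from Euclid's formula with 1 <= n < m <= max_m."""
    for m in range(2, max_m + 1):
        for n in range(1, m):
            # m and n coprime and of opposite parity give a primitive triple
            if (m - n) % 2 == 1 and math.gcd(m, n) == 1:
                yield (m * m - n * n, 2 * m * n, m * m + n * n)


def closest_triple(theta, max_m=64):
    """Return the enumerated triple whose angle atan2(b, a) is nearest to theta."""
    return min(
        primitive_triples(max_m),
        key=lambda t: abs(math.atan2(t[1], t[0]) - theta),
    )


a, b, c = closest_triple(math.pi / 6)
assert a * a + b * b == c * c  # the identity holds exactly, in integers
```

Because a, b, and c are integers, the identity sin² + cos² = 1 holds with no rounding error for the tabulated point; only the angle itself is approximate.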

symbolic and numeric algorithms for scientific computing | 2014

Automated Synthesis of Target-Dependent Programs for Polynomial Evaluation in Fixed-Point Arithmetic

Amine Najahi; Guillaume Revy

The design of both fast and numerically accurate programs is a real challenge. Thus, the CGPE tool was introduced to assist programmers in synthesizing fast and numerically certified codes in fixed-point arithmetic for the particular case of polynomial evaluation. For performance purposes, this tool produces programs using exclusively unsigned arithmetic and addition/subtraction or multiplication operations, thus requiring some constraints on the fixed-point operands. These choices are well-suited when dealing with the implementation of certain mathematical functions; however, they prevent tackling a broader class of polynomial evaluation problems. In this paper, we first present a rigorous arithmetic model for CGPE that takes into account signed arithmetic. Then, in order to make the most out of advanced instructions, we enhance this tool with a multi-criteria instruction selection module. This allows us to optimize the generated codes according to different criteria, like operation count, evaluation latency, or accuracy. Finally, we illustrate this technique on operation count, and we show that it yields an average reduction of up to 22.3% of the number of operations in the synthesized codes of some functions. We also give explicit practical examples to show the impact of using accuracy-based rather than latency-based instruction selection.

symposium on computer arithmetic | 2011

How to Square Floats Accurately and Efficiently on the ST231 Integer Processor

Claude-Pierre Jeannerod; Jingyan Jourdan-Lu; Christophe Monat; Guillaume Revy

We consider the problem of computing IEEE floating-point squares by means of integer arithmetic. We show how to exploit the specific properties of squaring in order to design and implement algorithms that have much lower latency than those for general multiplication, while still guaranteeing correct rounding. Our algorithms are parameterized by the floating-point format, aim at high instruction-level parallelism (ILP) exposure, and cover all rounding modes. We show further that their C implementation for the binary32 format yields efficient codes for targets like the ST231 VLIW integer processor from STMicroelectronics, with a latency at least 1.75x smaller than that of general multiplication in the same context.
