Christophe Monat | Researchain

Publication

Featured researches published by Christophe Monat.

conference on advanced signal processing algorithms architectures and implementations | 2004

A floating-point library for integer processors

Christian Bertin; Nicolas Brisebarre; Benoit Dupont de Dinechin; Claude-Pierre Jeannerod; Christophe Monat; Jean-Michel Muller; Saurabh-Kumar Raina; Arnaud Tisserand

This paper presents a C library for the software support of single precision floating-point (FP) arithmetic on processors without FP hardware units such as VLIW or DSP processor cores for embedded applications. This library provides several levels of compliance to the IEEE 754 FP standard. The complete specifications of the standard can be used or just some relaxed characteristics such as restricted rounding modes or computations without denormal numbers. This library is evaluated on the ST200 VLIW processors from STMicroelectronics.
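
As a rough illustration of one relaxed mode mentioned in this abstract (computing without denormal numbers), the sketch below flushes a subnormal binary32 encoding to a zero of the same sign using integer operations only. The function name `flush_subnormal_to_zero` is hypothetical and is not taken from the library itself; it only assumes the standard binary32 layout (1 sign bit, 8 exponent bits, 23 fraction bits).

```c
#include <stdint.h>

/* Hypothetical helper, not the library's API: flush a subnormal
   binary32 encoding to a zero of the same sign. */
uint32_t flush_subnormal_to_zero(uint32_t bits)
{
    uint32_t exponent = (bits >> 23) & 0xFFu;  /* 8-bit biased exponent */
    uint32_t fraction = bits & 0x007FFFFFu;    /* 23-bit fraction field */

    if (exponent == 0 && fraction != 0)
        return bits & 0x80000000u;             /* keep only the sign bit */
    return bits;                               /* zero, normal, infinity, NaN unchanged */
}
```

Skipping subnormal handling in this way removes the normalization step that a fully compliant software operation would otherwise need on its inputs.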

international symposium on industrial embedded systems | 2007

Faster floating-point square root for integer processors

Claude-Pierre Jeannerod; Hervé Knochel; Christophe Monat; Guillaume Revy

This paper presents some work in progress on fast and accurate floating-point arithmetic software for ST200-based embedded systems. We show how to use some key architectural features to design codes that achieve correct rounding-to-nearest without sacrificing efficiency. This is illustrated with the square root function, whose implementation given here is over 35% faster than the previously best one for such systems.
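
In integer-only code, rounding to nearest typically ends with a step that discards low-order bits of a wider intermediate result. The sketch below shows that generic step with ties resolved to even; it is not the square root code from the paper, and the name `round_shift_rne` is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: drop the k low bits of x, rounding to nearest
   with ties going to the even result. */
uint64_t round_shift_rne(uint64_t x, unsigned k)
{
    assert(k >= 1 && k < 64);

    uint64_t q    = x >> k;                   /* truncated result */
    uint64_t rem  = x & ((1ULL << k) - 1);    /* discarded bits */
    uint64_t half = 1ULL << (k - 1);          /* exactly one half unit */

    if (rem > half || (rem == half && (q & 1)))
        q += 1;                               /* round up; ties only when q is odd */
    return q;
}
```

For example, `round_shift_rne(5, 1)` is a tie between 2 and 3 and returns the even value 2, while `round_shift_rne(7, 1)` is a tie between 3 and 4 and returns 4.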

symposium on computer arithmetic | 2009

A New Binary Floating-Point Division Algorithm and Its Software Implementation on the ST231 Processor

Claude-Pierre Jeannerod; Hervé Knochel; Christophe Monat; Guillaume Revy; Gilles Villard

This paper deals with the design and implementation of low latency software for binary floating-point division with correct rounding to nearest. The approach we present here targets a VLIW integer processor of the ST200 family, and is based on fast and accurate programs for evaluating some particular bivariate polynomials. We start by giving approximation and evaluation error conditions that are sufficient to ensure correct rounding. Then we describe the heuristics used to generate such evaluation programs, as well as those used to automatically validate their accuracy. Finally, we propose, for the binary32 format, a complete C implementation of the resulting division algorithm. With the ST200 compiler and compared to previous implementations, the speed-up observed with our approach is almost a factor of 1.8.

parallel symbolic computation | 2010

Techniques and tools for implementing IEEE 754 floating-point arithmetic on VLIW integer processors

Claude-Pierre Jeannerod; Jean-Michel Muller; Guillaume Revy; Christian Bertin; Jingyan Jourdan-Lu; Hervé Knochel; Christophe Monat

Recently, some high-performance IEEE 754 single precision floating-point software has been designed, which aims at best exploiting some features (integer arithmetic, parallelism) of the STMicroelectronics ST200 Very Long Instruction Word (VLIW) processor. We review here the techniques and software tools used or developed for this design and its implementation, and how they allowed very high instruction-level parallelism (ILP) exposure. Those key points include a hierarchical description of function evaluation algorithms, the exploitation of the standard encoding of floating-point data, the automatic generation of fast and accurate polynomial evaluation schemes, and some compiler optimizations.
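
One of the key points listed above, exploiting the standard encoding of floating-point data, can be sketched as follows: a binary32 value is split into its sign, biased exponent, and fraction fields with plain integer shifts and masks. This is a minimal sketch that assumes `float` is the IEEE 754 binary32 format on the target; the type `binary32_fields` and the function `unpack_binary32` are hypothetical names, not part of the software described in the abstract.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three binary32 fields. */
typedef struct {
    uint32_t sign;             /* 1 bit */
    uint32_t biased_exponent;  /* 8 bits, bias 127 */
    uint32_t fraction;         /* 23 bits */
} binary32_fields;

/* Assumes float is IEEE 754 binary32. */
binary32_fields unpack_binary32(float x)
{
    uint32_t bits;
    binary32_fields f;

    memcpy(&bits, &x, sizeof bits);       /* reinterpret the encoding safely */
    f.sign            = bits >> 31;
    f.biased_exponent = (bits >> 23) & 0xFFu;
    f.fraction        = bits & 0x007FFFFFu;
    return f;
}
```

Once the fields are separate integers, operations on exponents and significands can proceed independently, which is one way such code exposes instruction-level parallelism.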

symposium on computer arithmetic | 2005

Division by constant for the ST100 DSP microprocessor

Jean-Michel Muller; Arnaud Tisserand; B. de Dinechin; Christophe Monat

Algorithms for Euclidean (i.e., integer) division by a constant operation are presented. They allow fast computation for some values of the divisor (known at compile time) or also when both quotient and modulus are required. These algorithms are based on the multiply-accumulate instruction and the 40-bit arithmetic available in DSPs such as the ST100 DSP from STMicroelectronics. The results are demonstrated in the case of standard speech coding applications.
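
The general idea of division by a compile-time constant can be sketched as a multiplication by a precomputed reciprocal followed by a shift. The example below is a generic illustration for the divisor 10 and 16-bit dividends, not the multiply-accumulate or 40-bit scheme of the paper; the names `divmod10_result` and `divmod10_u16` are hypothetical. The constant 52429 equals (2^19 + 2) / 10, which gives the exact quotient for every dividend below 2^18, and the product fits in 32 bits for all 16-bit inputs.

```c
#include <stdint.h>

/* Hypothetical result type holding both quotient and remainder. */
typedef struct {
    uint16_t quotient;
    uint16_t remainder;
} divmod10_result;

/* Divide a 16-bit unsigned value by the constant 10 without a
   division instruction: multiply by 52429 = (2^19 + 2) / 10,
   then shift right by 19. */
divmod10_result divmod10_u16(uint16_t n)
{
    uint32_t q = ((uint32_t)n * 52429u) >> 19;
    divmod10_result r;

    r.quotient  = (uint16_t)q;
    r.remainder = (uint16_t)(n - q * 10u);   /* remainder from one multiply-subtract */
    return r;
}
```

Returning both fields from one call mirrors the case mentioned in the abstract where quotient and modulus are required together.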

Microelectronic Engineering | 2000

DSP-MCU processor optimization for portable applications

Benoit Dupont de Dinechin; Christophe Monat; Patrick Blouet; Christian Bertin

Existing portable systems such as digital cellular phones are designed around a Micro-Controller Unit (MCU), a Digital Signal Processor (DSP), and Dedicated Hardware Blocks (DHBs). The next generation of portable systems requires an extended battery-powered life, lower manufacturing costs, shorter time-to-market delays, and higher digital signal processing performance with the flexibility of software implementation. These requirements can be met by generalizing the DSP with the VLIW and EPIC instruction-level parallel processing techniques. The resulting DSP-MCU processors allow high-performance digital signal processing to be implemented in software. Unlike traditional DSPs, DSP-MCU processors enable C/C++ compilers to generate high-performance and compact code, and effectively support Real-Time Operating Systems (RTOS). This paper discusses the architecture and implementation requirements of the next-generation DSP-MCU processors for portable applications, in particular in the telecommunications area.

international symposium on industrial embedded systems | 2012

Non-generic floating-point software support for embedded media processing

Claude-Pierre Jeannerod; Jingyan Jourdan-Lu; Christophe Monat

This paper presents some work in progress on the design and implementation of efficient floating-point software support for embedded integer processors. We provide quantitative evidence of the benefits of supporting various non-generic (that is, fused, specialized, or paired) operations in addition to the five basic arithmetic operations: for individual calls, speedups range from 1.12 to 4.86, while on DSP kernels and benchmarks, our approach allows us to be up to 1.59x faster.

symposium on computer arithmetic | 2011

How to Square Floats Accurately and Efficiently on the ST231 Integer Processor

Claude-Pierre Jeannerod; Jingyan Jourdan-Lu; Christophe Monat; Guillaume Revy

We consider the problem of computing IEEE floating-point squares by means of integer arithmetic. We show how to exploit the specific properties of squaring in order to design and implement algorithms that have much lower latency than those for general multiplication, while still guaranteeing correct rounding. Our algorithms are parameterized by the floating-point format, aim at high instruction-level parallelism (ILP) exposure, and cover all rounding modes. We show further that their C implementation for the binary32 format yields efficient codes for targets like the ST231 VLIW integer processor from ST Microelectronics, with a latency at least 1.75x smaller than that of general multiplication in the same context.

international symposium on industrial embedded systems | 2017

More accurate complex multiplication for embedded processors

Claude-Pierre Jeannerod; Christophe Monat; Laurent Thévenoux

This paper presents some work in progress on the development of fast and accurate support for complex floating-point arithmetic on embedded processors. Focusing on the case of multiplication, we describe algorithms and implementations for computing both the real and imaginary parts with high relative accuracy. We show that, in practice, such accuracy guarantees can be achieved with reasonable overhead compared with conventional algorithms (which are those offered by current implementations and for which the real or imaginary part of a product can have no correct digit at all). For example, the average execution-time overheads when computing an FFT on the ARM Cortex-A53 and -A57 processors range from 1.04x to 1.17x only, while arithmetic costs suggest overheads from 1.5x to 1.8x.

IEEE Transactions on Computers | 2011