Damien Stehlé
University of Sydney School of Mathematics and Statistics
Publications
Featured research published by Damien Stehlé.
Archive | 2010
Jean-Michel Muller; Nicolas Brisebarre; Florent de Dinechin; Claude-Pierre Jeannerod; Vincent Lefèvre; Guillaume Melquiond; Nathalie Revol; Damien Stehlé; Serge Torres
While the previous chapters have made clear that it is common practice to certify floating-point algorithms with pen-and-paper proofs, this practice can lead to subtle bugs. Indeed, floating-point arithmetic introduces numerous special cases, and examining all the details would be tedious. As a consequence, the certification process tends to focus on the main parts of the correctness proof, so that it does not grow out of reach.
Archive | 2010
Jean-Michel Muller; Nicolas Brisebarre; Florent de Dinechin; Claude-Pierre Jeannerod; Vincent Lefèvre; Guillaume Melquiond; Nathalie Revol; Damien Stehlé; Serge Torres
Though very useful in many situations, the fixed-precision floating-point formats that are available in hardware or software in our computers may sometimes prove insufficient. There are admittedly rare cases when the binary64/decimal64 or binary128/decimal128 floating-point numbers of the IEEE 754 standard are too crude as approximations of the real numbers. This may occur, for example, when dealing with ill-conditioned numerical problems: internal computations with very high precision may be needed to obtain a meaningful final result.
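As a rough illustration of these limits (my example, not the chapter's), the following C snippet shows binary64 absorbing a term of 2^-60 entirely, the kind of loss that motivates computing internally with higher precision:

#include <stdio.h>

int main(void) {
    /* binary64 has a 53-bit significand, so adding 2^-60 to 1.0
       changes nothing: the term is entirely absorbed by rounding. */
    double tiny = 0x1p-60;   /* 2^-60, exactly representable */
    double x = 1.0 + tiny;
    printf("1.0 + 2^-60 == 1.0 ? %s\n", x == 1.0 ? "yes" : "no");  /* yes */
    return 0;
}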
Archive | 2010
Jean-Michel Muller; Nicolas Brisebarre; Florent de Dinechin; Claude-Pierre Jeannerod; Vincent Lefèvre; Guillaume Melquiond; Nathalie Revol; Damien Stehlé; Serge Torres
The previous chapters have given an overview of interesting properties and algorithms that can be built on an IEEE 754-compliant floating-point arithmetic. In this chapter, we discuss the practical issues encountered when trying to implement such algorithms in actual computers using actual programming languages. In particular, we discuss the relationship between standard compliance, portability, accuracy, and performance. This chapter is useful to programmers wishing to obtain standard-compliant behavior from their programs, but also for understanding how performance may be improved by relaxing standard compliance, and what traps one may fall into.
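One small illustration of this compliance/performance trade-off (my sketch, not the chapter's): floating-point addition is not associative, which is precisely why a standard-compliant compiler may not reorder sums the way "fast math" modes (e.g., GCC's -ffast-math) are allowed to:

#include <stdio.h>

int main(void) {
    /* Reassociating these additions changes the result, so a
       standard-compliant compiler must evaluate them as written. */
    double a = 1e16, b = -1e16, c = 1.0;
    printf("(a + b) + c = %g\n", (a + b) + c);  /* 1 */
    printf("a + (b + c) = %g\n", a + (b + c));  /* 0: c is absorbed by b */
    return 0;
}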
Archive | 2010
Jean-Michel Muller; Nicolas Brisebarre; Florent de Dinechin; Claude-Pierre Jeannerod; Vincent Lefèvre; Guillaume Melquiond; Nathalie Revol; Damien Stehlé; Serge Torres
The fused multiply-add (FMA) instruction makes it possible to evaluate ab + c, where a, b, and c are floating-point numbers, with one final rounding only.
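To make the single-rounding property concrete, here is a small C99 sketch (inputs chosen by me for illustration) using the standard fma() function, contrasted with the doubly rounded expression a * b + c:

#include <stdio.h>
#include <math.h>   /* fma() is standard since C99 */

int main(void) {
    double a = 1.0 + 0x1p-27;   /* 1 + 2^-27 */
    double b = 1.0 - 0x1p-27;   /* 1 - 2^-27 */
    double c = -1.0;

    /* The exact value of a*b + c is -2^-54. */
    double with_fma = fma(a, b, c);   /* one rounding: -2^-54 exactly */
    double without  = a * b + c;      /* a*b rounds to 1.0, so result is 0 */

    printf("fma    : %a\n", with_fma);
    printf("mul+add: %a\n", without);
    return 0;
}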
Archive | 2010
Jean-Michel Muller; Nicolas Brisebarre; Florent de Dinechin; Claude-Pierre Jeannerod; Vincent Lefèvre; Guillaume Melquiond; Nathalie Revol; Damien Stehlé; Serge Torres
The elementary functions are the most common mathematical functions: sine, cosine, tangent and their inverses, exponentials and logarithms of radices e, 2, or 10, etc. They appear everywhere in scientific computing; thus being able to evaluate them quickly and accurately is important for many applications. Various very different methods have been used for evaluating them: polynomial or rational approximations, shift-and-add algorithms, table-based methods, etc. The choice of method greatly depends on whether the function will be implemented in hardware or in software, on the target precision (for instance, table-based methods are very good for low precision, but unrealistic for very high precision), and on the required performance (in terms of speed, accuracy, memory consumption, size of code, etc.). With regard to performance, one will also resort to different methods depending on whether one wishes to optimize average performance or worst-case performance.
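As a toy illustration of the polynomial-approximation approach (a truncated Taylor series in Horner form; this sketch is mine, and real libraries instead use minimax coefficients together with careful argument reduction):

#include <stdio.h>
#include <math.h>

/* Illustrative only: approximate exp(x) near 0 with a degree-4
   Taylor polynomial, evaluated in Horner form. */
static double exp_poly(double x) {
    /* 1 + x + x^2/2 + x^3/6 + x^4/24 */
    return 1.0 + x * (1.0 + x * (0.5 + x * (1.0/6.0 + x * (1.0/24.0))));
}

int main(void) {
    double x = 0.1;
    printf("poly: %.17g\n", exp_poly(x));
    printf("exp : %.17g\n", exp(x));
    return 0;
}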
Archive | 2010
Jean-Michel Muller; Nicolas Brisebarre; Florent de Dinechin; Claude-Pierre Jeannerod; Vincent Lefèvre; Guillaume Melquiond; Nathalie Revol; Damien Stehlé; Serge Torres
Our main focus in this chapter is the IEEE 754-2008 Standard for Floating-Point Arithmetic [267], a revision and merge of the earlier IEEE 754-1985 [12] and IEEE 854-1987 [13] standards. A paper written in 1981 by Kahan, Why Do We Need a Floating-Point Standard? [315], depicts the rather messy situation of floating-point arithmetic before the 1980s. Anybody who takes the view that the current standard is too constraining and that circuit and system manufacturers could build much more efficient machines without it should read that paper and think about it. Even if there were at that time a few reasonably good environments, the various systems available then were so different that writing portable yet reasonably efficient numerical software was extremely difficult. For instance, as pointed out in [553], sometimes a programmer had to insert multiplications by 1.0 to make a program work reliably.
Archive | 2010
Jean-Michel Muller; Nicolas Brisebarre; Florent de Dinechin; Claude-Pierre Jeannerod; Vincent Lefèvre; Guillaume Melquiond; Nathalie Revol; Damien Stehlé; Serge Torres
In this chapter, we focus on the computation of sums and dot products, and on the evaluation of polynomials in IEEE 754 floating-point arithmetic. Such calculations arise in many fields of numerical computing. Computing sums is required, e.g., in numerical integration and the computation of means and variances. Dot products appear everywhere in numerical linear algebra. Polynomials are used to approximate many functions (see Chapter 11).
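A classical example of the kind of summation algorithm studied in this chapter is compensated (Kahan) summation; the C sketch below is my own illustration, not code from the chapter. It recovers the rounding error of each addition and feeds it back into the next term:

#include <stdio.h>

int main(void) {
    /* Sum 0.1 ten million times. Plain recursive summation drifts;
       Kahan summation stays within about one ulp of the exact sum of
       the rounded 0.1 values (roughly 1000000.0000000001). */
    const int n = 10000000;
    double naive = 0.0, s = 0.0, c = 0.0;
    for (int i = 0; i < n; i++) {
        naive += 0.1;
        double y = 0.1 - c;    /* compensated term */
        double t = s + y;      /* low-order bits of y may be lost here... */
        c = (t - s) - y;       /* ...but are recovered in c */
        s = t;
    }
    printf("naive: %.17g\n", naive);
    printf("kahan: %.17g\n", s);
    return 0;
}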
Archive | 2010
Jean-Michel Muller; Nicolas Brisebarre; Florent de Dinechin; Claude-Pierre Jeannerod; Vincent Lefèvre; Guillaume Melquiond; Nathalie Revol; Damien Stehlé; Serge Torres
Continued fractions make it possible to build very good (indeed, the best possible, in a sense that will be made explicit by Theorems 49 and 50) rational approximations to real numbers. As such, they naturally appear in many problems of number theory, discrete mathematics, and computer science. Since floating-point numbers are rational approximations to real numbers, it is not surprising that continued fractions play a role in some areas of floating-point arithmetic.
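To make the convergent construction concrete, here is an illustrative C sketch (mine, not the chapter's) that computes the first few continued fraction convergents of π via the standard recurrence h_n = a_n·h_{n-1} + h_{n-2}, k_n = a_n·k_{n-1} + k_{n-2}; the partial quotients are extracted naively in binary64, which is only reliable for the first few terms:

#include <stdio.h>
#include <math.h>

int main(void) {
    /* Expected convergents of pi: 3/1, 22/7, 333/106, 355/113, 103993/33102. */
    double x = 3.14159265358979323846;
    long h1 = 1, h2 = 0;   /* h_{n-1}, h_{n-2} */
    long k1 = 0, k2 = 1;   /* k_{n-1}, k_{n-2} */
    for (int n = 0; n < 5; n++) {
        long a = (long)floor(x);          /* partial quotient a_n */
        long h = a * h1 + h2;
        long k = a * k1 + k2;
        printf("a_%d = %ld, convergent = %ld/%ld\n", n, a, h, k);
        h2 = h1; h1 = h;
        k2 = k1; k1 = k;
        x = 1.0 / (x - (double)a);        /* grows inaccurate after a few steps */
    }
    return 0;
}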
Archive | 2010
Jean-Michel Muller; Nicolas Brisebarre; Florent de Dinechin; Claude-Pierre Jeannerod; Vincent Lefèvre; Guillaume Melquiond; Nathalie Revol; Damien Stehlé; Serge Torres
As we have seen in previous chapters (especially in Chapters 2 and 4), requiring correctly rounded arithmetic operations has a number of advantages.
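One such advantage can be made concrete with Dekker's Fast2Sum transformation (the sketch below is my illustration): when |a| ≥ |b| and addition and subtraction are correctly rounded, the pair (s, t) represents a + b exactly:

#include <stdio.h>

int main(void) {
    /* Fast2Sum: s = fl(a+b) and t is the exact rounding error, so
       s + t == a + b with no error. This identity relies on correct
       rounding of the three operations. */
    double a = 1e16, b = 1.0;   /* |a| >= |b| */
    double s = a + b;           /* rounded sum: 1e16 */
    double z = s - a;
    double t = b - z;           /* exact rounding error: 1 */
    printf("s = %.17g, t = %.17g\n", s, t);
    return 0;
}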
Archive | 2010
Jean-Michel Muller; Nicolas Brisebarre; Florent de Dinechin; Claude-Pierre Jeannerod; Vincent Lefèvre; Guillaume Melquiond; Nathalie Revol; Damien Stehlé; Serge Torres
As stated in the Introduction, a radix-β floating-point number x is, roughly speaking, a number of the form m · β^e, where β is the radix of the floating-point system, m (with |m| < β) is the significand of x, and e is its exponent. And yet portability, accuracy, and the ability to prove interesting and useful properties, as well as to design smart algorithms, require more rigorous definitions and much care in the specifications. This is the first purpose of this chapter. The second is to deal with basic problems: rounding, exceptions, properties of real arithmetic that no longer hold in floating-point arithmetic, best choices for the radix, and radix conversions.
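For β = 2, the standard C function frexp() exposes essentially this decomposition, under the slightly different normalization 0.5 ≤ |m| < 1; a minimal illustration:

#include <stdio.h>
#include <math.h>

int main(void) {
    /* frexp() writes x as m * 2^e with 0.5 <= |m| < 1, the same idea
       as the m * beta^e form above specialized to beta = 2. */
    int e;
    double m = frexp(6.125, &e);
    printf("6.125 = %.17g * 2^%d\n", m, e);  /* 0.765625 * 2^3 */
    return 0;
}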