Benoit Dupont de Dinechin

Publication

Featured research published by Benoit Dupont de Dinechin.

symposium on code generation and optimization | 2009

Revisiting Out-of-SSA Translation for Correctness, Code Quality and Efficiency

Benoit Boissinot; Alain Darte; Fabrice Rastello; Benoit Dupont de Dinechin; Christophe Guillon

Static single assignment (SSA) form is an intermediate program representation in which many code optimizations can be performed with fast and easy-to-implement algorithms. However, some of these optimizations create situations where the SSA variables arising from the same original variable now have overlapping live ranges. This complicates the translation out of SSA code into standard code. There are three issues to consider: correctness, code quality (elimination of copies), and algorithm efficiency (speed and memory footprint). Briggs et al. proposed patches to correct the initial approach of Cytron et al. A cleaner and more general approach was proposed by Sreedhar et al., along with techniques to reduce the number of generated copies. We propose a new approach based on coalescing and a precise view of interferences, in which correctness and optimizations are separated. Our approach is provably correct and simpler to implement, with no patches or particular cases as in previous solutions, while reducing the number of generated copies. Also, experiments with SPEC CINT2000 show that it is 2x faster and 10x less memory-consuming than the Method III of Sreedhar et al., which makes it suitable for just-in-time compilation.

conference on advanced signal processing algorithms architectures and implementations | 2004

A floating-point library for integer processors

Christian Bertin; Nicolas Brisebarre; Benoit Dupont de Dinechin; Claude-Pierre Jeannerod; Christophe Monat; Jean-Michel Muller; Saurabh-Kumar Raina; Arnaud Tisserand

This paper presents a C library for the software support of single precision floating-point (FP) arithmetic on processors without FP hardware units such as VLIW or DSP processor cores for embedded applications. This library provides several levels of compliance to the IEEE 754 FP standard. The complete specifications of the standard can be used or just some relaxed characteristics such as restricted rounding modes or computations without denormal numbers. This library is evaluated on the ST200 VLIW processors from STMicroelectronics.
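As an illustration of what a relaxed compliance level can look like, here is a minimal sketch of single-precision multiplication using only integer operations. It is not the library's code (the library is written in C, and its API is not described here); the function names `f32_bits`, `bits_f32` and `soft_mul` are hypothetical. The sketch assumes round-to-nearest-even only, flushes subnormal inputs and results to zero, and does not handle NaN or infinity inputs.

```python
import struct

def f32_bits(x):
    """Return the IEEE 754 binary32 bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_f32(b):
    """Return the float whose binary32 bit pattern is b."""
    return struct.unpack("<f", struct.pack("<I", b))[0]

def soft_mul(a, b):
    """Multiply two binary32 bit patterns with integer operations only.

    Relaxed behaviour (assumptions of this sketch): zero and subnormal
    operands are treated as zero, subnormal results are flushed to zero,
    NaN and infinity inputs are not handled, rounding is nearest-even.
    """
    sign = (a ^ b) & 0x80000000
    ea, eb = (a >> 23) & 0xFF, (b >> 23) & 0xFF
    if ea == 0 or eb == 0:
        return sign                      # signed zero
    ma = (a & 0x7FFFFF) | 0x800000       # restore the implicit leading one
    mb = (b & 0x7FFFFF) | 0x800000
    prod = ma * mb                       # value in [2**46, 2**48)
    exp = ea + eb - 127
    if prod & (1 << 47):                 # product in [2, 4): renormalize
        exp += 1
        shift = 24
    else:
        shift = 23
    mant = prod >> shift                 # 24-bit significand
    rem = prod & ((1 << shift) - 1)      # discarded bits
    half = 1 << (shift - 1)
    if rem > half or (rem == half and (mant & 1)):
        mant += 1                        # round to nearest, ties to even
        if mant == 1 << 24:              # rounding carried out of 24 bits
            mant >>= 1
            exp += 1
    if exp <= 0:
        return sign                      # underflow: flush to zero
    if exp >= 255:
        return sign | 0x7F800000         # overflow: signed infinity
    return sign | (exp << 23) | (mant & 0x7FFFFF)

product = bits_f32(soft_mul(f32_bits(1.5), f32_bits(2.0)))
```

Dropping subnormal handling removes the normalization loops that a fully compliant implementation needs, which is the kind of trade-off the relaxed modes expose.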

international conference on parallel processing | 2006

SCAN: a heuristic for near-optimal software pipelining

Florent Blachot; Benoit Dupont de Dinechin; Guillaume Huard

Software pipelining is a classic compiler optimization that improves the performance of inner loops on instruction-level parallel processors. In the context of embedded computing, applications are compiled prior to manufacturing the system, so it is possible to invest large amounts of time in compiler optimizations. Traditionally, software pipelining is performed by heuristics such as iterative modulo scheduling. Optimal software pipelining can be formulated as an integer linear program; however, these formulations can take exponential time to solve. As a result, the size of loops that can be optimally software pipelined is quite limited. In this article, we present the SCAN heuristic, which makes it possible to benefit from the integer linear programming formulations of software pipelining even on loops of significant size. The principle of the SCAN heuristic is to iteratively constrain the software pipelining problem until the integer linear programming formulation is solvable in reasonable time. We applied the SCAN heuristic to a multimedia benchmark for the ST200 VLIW processor. We show that it almost always computes an optimal solution for loops that are intractable by classic integer linear programming approaches. This improves performance by up to 33.3% over the heuristic modulo scheduling of the production ST200 compiler.

asian symposium on programming languages and systems | 2011

A non-iterative data-flow algorithm for computing liveness sets in strict SSA programs

Benoit Boissinot; Florian Brandner; Alain Darte; Benoit Dupont de Dinechin; Fabrice Rastello

We revisit the problem of computing liveness sets (the sets of variables live-in and live-out of basic blocks) for programs in strict static single assignment (SSA). In strict SSA, also known as SSA with dominance property, the definition of a variable always dominates all its uses. We exploit this property and the concept of loop-nesting forest to design a fast two-phase data-flow algorithm: a first pass traverses the control-flow graph (CFG), propagating liveness information backwards; a second pass traverses a loop-nesting forest, updating liveness sets within loops. The algorithm is proved correct even for irreducible CFGs. We analyze its algorithmic complexity and evaluate its efficiency on SPECINT 2000. Compared to traditional iterative data-flow approaches, which perform updates until a fixed point is reached, our algorithm is 2 times faster on average. Other approaches are possible that propagate from uses to definitions, one variable at a time, instead of unioning sets as in data-flow analysis. Our algorithm is 1.43 times faster than the fastest alternative on average, when sets are represented as bitsets and for optimized programs, which have non-trivial live-ranges and a larger number of variables.
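For reference, the traditional iterative baseline mentioned in this abstract can be sketched as follows. This is the textbook fixed-point formulation, not the paper's two-phase algorithm; the function name `liveness`, the dictionary-based CFG encoding and the example program are assumptions made for illustration, and phi-functions are left out for simplicity.

```python
def liveness(succ, use, defs):
    """Iterative backward data-flow liveness analysis.

    succ: block -> list of successor blocks
    use:  block -> variables read in the block before any write to them
    defs: block -> variables written in the block
    Returns (live_in, live_out), each mapping block -> set of variables.
    """
    live_in = {b: set() for b in succ}
    live_out = {b: set() for b in succ}
    changed = True
    while changed:                       # repeat until a fixed point
        changed = False
        for b in succ:
            out = set()
            for s in succ[b]:
                out |= live_in[s]        # union over successors
            inn = use[b] | (out - defs[b])
            if out != live_out[b] or inn != live_in[b]:
                live_out[b], live_in[b] = out, inn
                changed = True
    return live_in, live_out

# Hypothetical three-block program: entry -> loop -> exit, with a back edge
# on loop. 'a' and 'n' are defined in entry, 't' is a temporary in loop.
succ = {"entry": ["loop"], "loop": ["loop", "exit"], "exit": []}
use = {"entry": set(), "loop": {"a", "n"}, "exit": {"a"}}
defs = {"entry": {"a", "n"}, "loop": {"t"}, "exit": set()}
live_in, live_out = liveness(succ, use, defs)
```

The `while changed` loop is the repeated update that the abstract contrasts with a fixed number of passes.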

european conference on parallel processing | 2008

Inter-block Scoreboard Scheduling in a JIT Compiler for VLIW Processors

Benoit Dupont de Dinechin

We present a postpass instruction scheduling technique suitable for Just-In-Time (JIT) compilers targeted to VLIW processors. Its key features are: reduced compilation time and memory requirements; satisfaction of scheduling constraints along all program paths; and the ability to preserve existing prepass schedules, including software pipelines. This is achieved by combining two ideas: instruction scheduling similar to the dynamic scheduler of an out-of-order superscalar processor; and the satisfaction of inter-block scheduling constraints by propagating them across the control-flow graph until a fixed point is reached. We implemented this technique in a Common Language Infrastructure JIT compiler for the ST200 VLIW processors and the ARM processors.

compiler construction | 2014

Using the SSA-Form in a Code Generator

Benoit Dupont de Dinechin

In high-end compilers such as Open64, GCC or LLVM, the Static Single Assignment (SSA) form is a structural part of the target-independent program representation that supports most of the code optimizations. However, aggressive compilation also requires that optimizations that are more effective with the SSA form be applied to the target-specific program representations operated by the code generator, that is, the set of compiler phases after and including instruction selection.

Microelectronic Engineering | 2000

DSP-MCU processor optimization for portable applications

Benoit Dupont de Dinechin; Christophe Monat; Patrick Blouet; Christian Bertin

Existing portable systems such as digital cellular phones are designed around a Micro-Controller Unit (MCU), a Digital Signal Processor (DSP), and Dedicated Hardware Blocks (DHBs). The next generation of portable systems requires an extended battery-powered life, lower manufacturing costs, shorter time-to-market delays, and higher digital signal processing performance with the flexibility of software implementation. These requirements can be met by generalizing the DSP with the VLIW and EPIC instruction-level parallel processing techniques. The resulting DSP-MCU processors allow high-performance digital signal processing to be implemented in software. Unlike traditional DSPs, DSP-MCU processors enable C/C++ compilers to generate high-performance and compact code, and effectively support Real-Time Operating Systems (RTOS). This paper discusses the architecture and implementation requirements of the next-generation DSP-MCU processors for portable applications, in particular in the telecommunications area.

ACM Transactions on Embedded Computing Systems | 2011

Efficient Spilling Reduction for Software Pipelined Loops in Presence of Multiple Register Types in Embedded VLIW Processors

Sid Touati; Frederic Brault; Karine Deschinkel; Benoit Dupont de Dinechin

Integrating register allocation and software pipelining of loops is an active research area. We focus on techniques that precondition the dependence graph before software pipelining in order to ensure that no register spill instructions are inserted by the register allocator in the software pipelined loop. If spilling is not necessary for the input code, preconditioning techniques insert dependence arcs so that the maximum register pressure MAXLIVE achieved by any loop schedule is below the number of available registers, without hurting the initiation interval if possible. When a solution exists, a spill-free software pipeline is guaranteed to exist. Existing preconditioning techniques consider one register type (register class) at a time [Deschinkel and Touati 2008]. In this article, we extend preconditioning techniques so that multiple register types are considered simultaneously. First, we generalize the existing theory of register pressure minimization for cyclic scheduling. Second, we implement our method inside the production compiler of the ST2xx VLIW family, and we demonstrate its efficiency on industry benchmarks (FFMPEG, MEDIABENCH, SPEC2000, SPEC2006). We demonstrate a high spill reduction rate without a significant initiation interval loss.
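To make the notion of register pressure concrete, the sketch below counts the values simultaneously live in the steady state of one given software-pipelined schedule. It only evaluates a single schedule for a single register type; the MAXLIVE bound in the abstract is taken over all schedules, and the preconditioning method itself is not shown. The function name `max_live`, the half-open lifetime convention `[start, end)` and the example lifetimes are assumptions.

```python
def max_live(lifetimes, ii):
    """Peak number of simultaneously live values in a modulo schedule.

    lifetimes: list of (start, end) cycles, half-open, for the values
               produced by one loop iteration
    ii:        initiation interval (cycles between successive iterations)
    Iterations overlap every ii cycles, so a value live at cycle c
    occupies a register at steady-state slot c % ii.
    """
    pressure = [0] * ii
    for start, end in lifetimes:
        for cycle in range(start, end):
            pressure[cycle % ii] += 1
    return max(pressure)

# Hypothetical loop body: one value live over cycles 0..2, another at cycle 1.
needed = max_live([(0, 3), (1, 2)], ii=2)
```

A lifetime longer than the initiation interval counts more than once in the same slot, because instances of that value from successive iterations are live at the same time.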

compiler construction | 1999

Extending Modulo Scheduling with Memory Reference Merging

Benoit Dupont de Dinechin

We describe an extension of modulo scheduling, called “memory reference merging”, which improves the management of cache bandwidth on microprocessors such as the DEC Alpha 21164. The principle is to schedule together memory references that are likely to be merged in a read buffer (LOADs), or a write buffer (STOREs). This technique has been used over several years on the Cray T3E block scheduler, and was later generalized to the Cray T3E software pipeliner. Experiments on the Cray T3E demonstrate the benefits of memory reference merging.

Archive | 2003