Christophe Guillon | Researchain

Publication

Featured researches published by Christophe Guillon.

symposium on code generation and optimization | 2009

Revisiting Out-of-SSA Translation for Correctness, Code Quality and Efficiency

Benoit Boissinot; Alain Darte; Fabrice Rastello; Benoit Dupont de Dinechin; Christophe Guillon

Static single assignment (SSA) form is an intermediate program representation in which many code optimizations can be performed with fast and easy-to-implement algorithms. However, some of these optimizations create situations where the SSA variables arising from the same original variable now have overlapping live ranges. This complicates the translation out of SSA code into standard code. There are three issues to consider: correctness, code quality (elimination of copies), and algorithm efficiency (speed and memory footprint). Briggs et al. proposed patches to correct the initial approach of Cytron et al. A cleaner and more general approach was proposed by Sreedhar et al., along with techniques to reduce the number of generated copies. We propose a new approach based on coalescing and a precise view of interferences, in which correctness and optimizations are separated. Our approach is provably correct and simpler to implement, with no patches or particular cases as in previous solutions, while reducing the number of generated copies. Also, experiments with SPEC CINT2000 show that it is 2x faster and 10x less memory-consuming than the Method III of Sreedhar et al., which makes it suitable for just-in-time compilation.

compilers, architecture, and synthesis for embedded systems | 2000

Code generator optimizations for the ST120 DSP-MCU core

B. Dupont de Dinechin; F. de Ferri; Christophe Guillon; A. Stoutchinin

The ST120 Digital Signal Processor Micro-Controller Unit (DSP–MCU) core was designed by STMicroelectronics in order to meet the ever-increasing digital signal processing requirements of portable and consumer applications. Like other recent high-end DSP–MCU cores, the ST120 blends traditional DSP features with modern Instruction-Level Parallelism (ILP) capabilities. Compiler management of the ST120 features presents a unique challenge for code generation. The ST120 Linear Assembly Optimizer (LAO) effectively exploits instruction-level parallelism, while enabling compact code size. In this paper, we focus on the LAO implementation of the SSA representation, the IF-conversion, the SLIW scheduling, and the LAO improvements to register allocation. This includes solutions to problems that arise when compiler optimizations are applied to assembly-level, already predicated code.

languages and compilers for parallel computing | 2006

Register allocation: what does the NP-completeness proof of Chaitin et al. really prove? or revisiting register allocation: why and how

Florent Bouchez; Alain Darte; Christophe Guillon; Fabrice Rastello

Register allocation is one of the most studied problems in compilation. It is considered NP-complete since Chaitin et al., in 1981, modeled the problem of assigning temporary variables to k machine registers as the problem of coloring, with k colors, the interference graph associated to the variables. The fact that this graph can be arbitrary proves the NP-completeness of this formulation. However, this original proof does not really show where the complexity of register allocation comes from. Recently, the re-discovery that interference graphs of SSA programs can be colored in polynomial time raised the question: Can we use SSA to do register allocation in polynomial time, without contradicting Chaitin et al.'s NP-completeness result? To address this question and, more generally, the complexity of register allocation, we revisit Chaitin et al.'s proof to identify the interactions between spilling (load/store insertion), coalescing/splitting (removal/insertion of register moves), critical edges (property of the control flow), and coloring (assignment to registers). In particular, we show that, in general, it is easy to decide if temporary variables can be assigned to k registers or if some spilling is necessary. In other words, the real complexity does not come from the coloring itself (as a misinterpretation of Chaitin et al.'s proof may suggest) but comes from critical edges and from the optimizations of spilling and coalescing.
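As a rough illustration of why deciding "k registers or spill" can be easy, the sketch below handles only the simplest case: straight-line code, where each temporary's live range is a single interval. This is an assumption made for the example and is a narrower setting than the SSA programs discussed in the abstract; the function name `assign_registers` and the sample ranges are hypothetical, and the greedy scan is a standard interval-coloring technique rather than the authors' method.

```python
def assign_registers(live_ranges, k):
    """Greedy coloring of interval live ranges with k registers.

    live_ranges maps a temporary's name to a half-open interval (start, end).
    Returns a dict name -> register index, or None when more than k
    temporaries are live at once (some spilling would be necessary).
    """
    free = list(range(k))   # registers currently unused
    active = []             # (end, name) for temporaries still live
    assignment = {}
    # Visit temporaries in order of increasing start point.
    for name, (start, end) in sorted(live_ranges.items(), key=lambda kv: kv[1][0]):
        still_live = []
        for other_end, other in active:
            if other_end <= start:
                free.append(assignment[other])  # range ended: recycle register
            else:
                still_live.append((other_end, other))
        active = still_live
        if not free:
            return None     # k + 1 temporaries live simultaneously
        assignment[name] = free.pop()
        active.append((end, name))
    return assignment


def interferes(r1, r2):
    """Two half-open intervals interfere when they overlap."""
    return r1[0] < r2[1] and r2[0] < r1[1]


# Hypothetical live ranges for four temporaries.
ranges = {"a": (0, 4), "b": (1, 3), "c": (3, 6), "d": (4, 7)}
two_regs = assign_registers(ranges, 2)
one_reg = assign_registers(ranges, 1)
```

With these sample ranges at most two temporaries are live at any point, so two registers suffice and one register does not. The hard parts named in the abstract (spilling choices, coalescing, critical edges) are deliberately left out of this sketch.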

compilers, architecture, and synthesis for embedded systems | 2004

Procedure placement using temporal-ordering information: dealing with code size expansion

Christophe Guillon; Fabrice Rastello; Thierry Bidault; Florent Bouchez

In a direct-mapped instruction cache, all instructions that have the same memory address modulo the cache size share a common and unique cache slot. Instruction cache conflicts can be partially handled at link time by procedure placement. Pettis and Hansen give in [1] an algorithm that reorders procedures in memory by aggregating them in a greedy fashion. The Gloy and Smith algorithm [2] greatly decreases the number of conflict misses but increases the code size by allowing gaps between procedures. The latter contains two main stages: the cache-placement phase assigns modulo addresses to minimize cache conflicts; the memory-placement phase assigns final memory addresses under the modulo placement constraints, and minimizes the code size expansion. In this paper: (1) we state the NP-completeness of the cache-placement problem; (2) we provide an optimal algorithm to the memory-placement problem with complexity O(n min(n, L) log*(n)) (n is the number of procedures, L the cache size); (3) we take final program size into consideration during the cache-placement phase. Our modifications to the Gloy and Smith algorithm give on average a code size expansion of 8% over the original program size, while the initial algorithm gave an expansion of 177%. The cache miss reduction is nearly the same as that of the Gloy and Smith solution, with a 35% cache miss reduction.
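The modulo mapping and the gap trade-off described above can be sketched in a few lines. This is a toy model under stated assumptions: one cache slot per byte address, a 64-byte cache, and three hypothetical procedures; the helper names `cache_slots` and `conflicting_slots` are invented for the illustration and do not come from the paper.

```python
def cache_slots(start, size, cache_size):
    """Cache slots used by a procedure occupying [start, start + size)."""
    return {addr % cache_size for addr in range(start, start + size)}


def conflicting_slots(proc_a, proc_b, cache_size):
    """Number of cache slots shared by two (start, size) procedures."""
    return len(cache_slots(*proc_a, cache_size) & cache_slots(*proc_b, cache_size))


CACHE_SIZE = 64          # assumed cache size for the example

# Hypothetical layout: A and C are packed back to back.
proc_a = (0, 16)         # (start address, size)
proc_c = (16, 48)

# Packed placement: B follows C directly and wraps onto A's slots.
proc_b_packed = (64, 16)
# Placement with a 16-byte gap: B now shares slots with C instead of A.
proc_b_gapped = (80, 16)

packed_ab = conflicting_slots(proc_a, proc_b_packed, CACHE_SIZE)
gapped_ab = conflicting_slots(proc_a, proc_b_gapped, CACHE_SIZE)
gapped_cb = conflicting_slots(proc_c, proc_b_gapped, CACHE_SIZE)

# Code size grows by the size of the gap.
packed_size = proc_b_packed[0] + proc_b_packed[1]
gapped_size = proc_b_gapped[0] + proc_b_gapped[1]
```

If A and B alternate frequently at run time while C is rarely executed, the gapped layout removes the costly A/B conflict at the price of 16 extra bytes, which is the kind of code size expansion the abstract sets out to limit.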

compiler construction | 2011

Dynamic elimination of overflow tests in a trace compiler

Rodrigo Sol; Christophe Guillon; Fernando Magno Quintão Pereira; Mariza Andrade da Silva Bigonha

Trace compilation is a technique used by just-in-time (JIT) compilers such as TraceMonkey, the JavaScript engine in the Mozilla Firefox browser. Contrary to traditional JIT machines, a trace compiler works on only part of the source program, normally a linear path inside a heavily executed loop. Because the trace is compiled during the interpretation of the source program, the JIT compiler has access to runtime values. This observation gives the compiler the possibility of producing binary code specialized to these values. In this paper we explore this opportunity to provide an analysis that removes unnecessary overflow tests from JavaScript programs. Our optimization uses range analysis to show that some operations cannot produce overflows. The analysis is linear in size and space on the number of instructions present in the input trace, and it is more effective than traditional range analyses, because we have access to values known only at execution time. We have implemented our analysis on top of Firefox's TraceMonkey, and have tested it on over 1000 scripts from several industrial strength benchmarks, including the scripts present in the top 100 most visited webpages in the Alexa index. We generate binaries to either x86 or the embedded microprocessor ST40-300. On average, we eliminate 91.82% of the overflows in the programs present in the TraceMonkey test suite. This optimization provides an average code size reduction of 8.83% on ST40 and 6.63% on x86. Our optimization increases TraceMonkey's runtime by 2.53%.
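The idea of using runtime values to prove that an addition cannot overflow can be sketched with plain interval arithmetic. The code below is a simplified illustration, assuming 32-bit signed integers and a single addition; the names `add_range` and `needs_overflow_test` and the loop bound of 1000 are hypothetical, and this is not the analysis implemented in TraceMonkey.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def add_range(a, b):
    """Interval of x + y when x lies in a and y lies in b (inclusive bounds)."""
    return (a[0] + b[0], a[1] + b[1])


def needs_overflow_test(result_range):
    """True when the result may fall outside the 32-bit signed range."""
    return result_range[0] < INT32_MIN or result_range[1] > INT32_MAX


# Static view: without knowing the loop bound, the counter i could be
# any non-negative 32-bit value, so i + 1 might overflow.
static_i = (0, INT32_MAX)
static_needs_test = needs_overflow_test(add_range(static_i, (1, 1)))

# Trace-time view: the compiler observes that the loop bound is 1000
# (an assumed runtime value), so i stays in [0, 1000] on this trace.
runtime_i = (0, 1000)
runtime_needs_test = needs_overflow_test(add_range(runtime_i, (1, 1)))
```

In this sketch the static range forces the compiler to keep the overflow test on `i + 1`, while the range derived from the observed bound shows the test can be dropped.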

design automation conference | 2010

Compilation and virtualization in the HiPEAC vision

Christian Bertin; Christophe Guillon; Koen De Bosschere

This paper describes the HiPEAC vision of embedded virtualization as it has developed during two years of discussion among the members of the HiPEAC cluster on binary translation and virtualization. We start from system virtualization and process virtualization and we gradually develop a vision in which the two merge into one virtualization layer for embedded systems. Such a unified virtualization offers solutions for consolidation, performance optimization, software engineering and dealing with legacy hardware components. Four adoption requirements are identified: support for real-time execution, low performance overhead, virtualization of accelerator cores, and trustworthiness. Finally, we define four research challenges: full virtualization of heterogeneous multi-core platforms, portable performance for heterogeneous multi-cores, virtual machine management interfaces, and standards for embedded virtualization.

software and compilers for embedded systems | 2011

Decoupled graph-coloring register allocation with hierarchical aliasing

André Luiz Camargos Tavares; Quentin Colombet; Mariza Andrade da Silva Bigonha; Christophe Guillon; Fernando Magno Quintão Pereira; Fabrice Rastello

Recent results have shown how to do graph-coloring-based register allocation in a way that decouples spilling from register assignment. This decoupled approach has the main advantage of simplifying the implementation of register allocators. However, the decoupled model, as described in previous works, faces many problems when dealing with register aliasing, a phenomenon typical in architectures usually seen in embedded systems, such as ARM. In this paper we introduce the semi-elementary form, a program representation that brings decoupled register allocation to architectures with register aliasing. The semi-elementary form is much smaller than program representations used by previous decoupled solutions, thus leading to register allocators that perform better in terms of time and space. Furthermore, this representation reduces the number of copies that traditional allocators insert into assembly programs. We have empirically validated our results by showing how our representation improves two well-known graph-coloring-based allocators, namely the Iterated Register Coalescer (IRC) and Bouchez et al.'s brute force (BF) method, both augmented with Smith et al.'s extensions to handle aliasing. Running our techniques on SPEC CPU 2000, we have reduced the number of nodes in the interference graphs by a factor of 4 to 5, hence speeding up allocation time by a factor of 3 to 5. Additionally, the semi-elementary form reduces by 8% the number of copies that IRC leaves uncoalesced.

Journal of Embedded Computing | 2005