Fabien Coelho
Mines ParisTech
Publication
Featured research published by Fabien Coelho.
Scientific Programming | 1997
Corinne Ancourt; Fabien Coelho; François Irigoin; Ronan Keryell
High Performance Fortran (HPF) was developed to support data parallel programming for single-instruction multiple-data (SIMD) and multiple-instruction multiple-data (MIMD) machines with distributed memory. The programmer is provided a familiar uniform logical address space and specifies the data distribution by directives. The compiler then exploits these directives to allocate arrays in the local memories, to assign computations to elementary processors, and to migrate data between processors when required. We show here that linear algebra is a powerful framework to encode HPF directives and to synthesize distributed code with space-efficient array allocation, tight loop bounds, and vectorized communications for INDEPENDENT loops. The generated code includes traditional optimizations such as guard elimination, message vectorization and aggregation, and overlap analysis. The systematic use of an affine framework makes it possible to prove the compilation scheme correct.
International Conference on Parallel Processing | 2013
Karel De Vogeleer; Gerard Memmi; Pierre Jouvelot; Fabien Coelho
This paper provides both theoretical and experimental evidence for the existence of an Energy/Frequency Convexity Rule, which relates energy consumption and CPU frequency on mobile devices. We monitored a typical smartphone running a specific computing-intensive kernel of multiple nested loops written in C using a high-resolution power gauge. Data gathered during a week-long acquisition campaign suggest that energy consumed per input element is strongly correlated with CPU frequency, and, more interestingly, the curve exhibits a clear minimum over a 0.2 GHz to 1.6 GHz window. We provide and motivate an analytical model for this behavior, which fits well with the data. Our work should be of clear interest to researchers focusing on energy usage and minimization for mobile devices, and provide new insights for optimization opportunities.
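To see why an energy-versus-frequency curve can have an interior minimum, here is a minimal sketch built on a common textbook model, not on the analytical model fitted in the paper. It assumes power is a static term plus a dynamic term growing as the cube of frequency, and that run time is inversely proportional to frequency. The function names and the constants `p_static`, `c`, and `work` are hypothetical values chosen only for illustration.

```python
# Textbook-style assumption (NOT the paper's fitted model):
#   power(f)  = p_static + c * f**3   (f in GHz)
#   time(f)   = work / f
#   energy(f) = power(f) * time(f) = work * (p_static / f + c * f**2)

def energy(f_ghz, p_static=0.5, c=0.4, work=1.0):
    """Energy for a fixed amount of work at frequency f_ghz (arbitrary units)."""
    return (p_static + c * f_ghz ** 3) * (work / f_ghz)

def optimal_frequency(p_static=0.5, c=0.4):
    """Closed-form minimizer: dE/df = 0 gives f**3 = p_static / (2 * c)."""
    return (p_static / (2.0 * c)) ** (1.0 / 3.0)

# Compare the closed form with a brute-force scan of the 0.2 to 1.6 GHz window.
grid = [0.2 + 0.01 * i for i in range(141)]
f_best = min(grid, key=energy)
f_star = optimal_frequency()
print(f"grid minimum near {f_best:.2f} GHz, closed form {f_star:.3f} GHz")
```

Under these assumptions, low frequencies waste energy on static power during a long run, while high frequencies waste it on dynamic power, so the minimum falls strictly inside the window.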
Electronic Notes in Theoretical Computer Science | 2010
Corinne Ancourt; Fabien Coelho; François Irigoin
Modular static analyzers use procedure abstractions, a.k.a. summarizations, to ensure that their execution time increases linearly with the size of analyzed programs. A similar abstraction mechanism is also used within a procedure to perform a bottom-up analysis. For instance, a sequence of instructions is abstracted by combining the abstractions of its components, or a loop is abstracted using the abstraction of its loop body: fixed point iterations for a loop can be replaced by a direct computation of the transitive closure of the loop body abstraction. More specifically, our abstraction mechanism uses affine constraints, i.e. polyhedra, to specify pre- and post-conditions as well as state transformers. We present an algorithm to compute the transitive closure of such a state transformer, and we illustrate its performance on various examples. Our algorithm is simple, based on discrete differentiation and integration: it is very different from the usual abstract interpretation fixed point computation based on widening. Experiments are carried out using previously published examples. We obtain the same results directly, without using any heuristic.
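As a toy illustration of the differentiation and integration idea, consider a loop body that performs `i = i + 1; j = j + 2`. The per-iteration differences are constant (1 and 2); summing them over k >= 0 iterations gives i' = i + k and j' = j + 2k, and eliminating k yields the affine closure i' >= i and j' - j = 2(i' - i). The sketch below checks this hand-derived closure by simulation. It is an assumption-laden example, not the paper's algorithm, and all function names are hypothetical.

```python
def loop_body(state):
    """One iteration of the example loop body: i += 1; j += 2."""
    i, j = state
    return (i + 1, j + 2)

def closure_holds(pre, post):
    """Hand-derived affine closure: i' >= i and j' - j == 2 * (i' - i)."""
    (i, j), (i2, j2) = pre, post
    return i2 >= i and (j2 - j) == 2 * (i2 - i)

def reachable(start, max_iters):
    """States reached after 0, 1, ..., max_iters iterations of the body."""
    states = [start]
    for _ in range(max_iters):
        states.append(loop_body(states[-1]))
    return states

start = (3, 5)
# Every reachable state must satisfy the closure relation.
print(all(closure_holds(start, s) for s in reachable(start, 20)))
```

The closure is obtained directly, with no fixed point iteration and no widening step, which is the contrast the abstract draws.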
ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 1997
Fabien Coelho
Array remappings are useful to many applications on distributed-memory parallel machines. They are available in High Performance Fortran, a Fortran-based data-parallel language. This paper describes techniques to handle dynamic mappings through simple array copies: array remappings are translated into copies between statically mapped distinct versions of the array. It discusses the language restrictions required to do so. The remapping graph, which captures all remapping and liveness information, is presented, as well as additional data-flow optimizations that can be performed on this graph so as to avoid useless remappings at run time. Such useless remappings appear for arrays that are not used after a remapping. Live array copies are also kept to avoid other flow-dependent useless remappings. Finally, the code generation and runtime required by our scheme are discussed. These techniques are implemented in our prototype HPF compiler.
Journal of Parallel and Distributed Computing | 1996
Fabien Coelho; Corinne Ancourt
Applications with varying array access patterns require array mappings to be changed dynamically on distributed-memory parallel machines. HPF (High Performance Fortran) provides such remappings explicitly through realign and redistribute directives and implicitly at procedure calls and returns. However, such features are left out of HPF 2.0 for efficiency reasons. This paper presents a new technique for compiling HPF remappings onto message-passing parallel architectures. First, useless remappings that appear naturally are removed. Second, the SPMD generated code takes advantage of replication to shorten the remapping time. Communication is proved optimal: a minimal number of messages, containing only the required data, is sent over the network. The technique is fully implemented in our HPF compiler and was tested on a DEC Alpha farm.
International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation | 2014
Karel De Vogeleer; Gerard Memmi; Pierre Jouvelot; Fabien Coelho
We introduce and experimentally validate a new macro-level model of the CPU temperature/power relationship within nanometer-scale application processors or system-on-chips. By adopting a holistic view, this model is able to take into account many of the physical effects that occur within such systems. Together with two algorithms described in the paper, our results can be used, for instance by engineers designing power or thermal management units, to cancel the temperature-induced bias on power measurements. This will help them gather temperature-neutral power data while running multiple instances of their benchmarks. Power requirements and system failure rates can also be decreased by controlling the CPU's thermal behavior. Even though the temperature/power relationship is usually assumed to be exponential, publicly available physical temperature/power measurements to back up this assumption are lacking, something our paper corrects. Via measurements on two pertinent platforms sporting nanometer-scale application processors, we show that the power/temperature relationship is indeed very likely exponential over a 20°C to 85°C temperature range. Our data suggest that, for application processors operating between 20°C and 50°C, a quadratic model is still accurate and a linear approximation is acceptable.
European Conference on Parallel Processing | 2002
Youcef Bouchebaba; Fabien Coelho
Our aim is to minimize the electrical energy used during the execution of signal processing applications that consist of a sequence of loop nests. This energy is mostly used to transfer data among the various levels of the memory hierarchy. To minimize these transfers, we transform these programs by simultaneously using loop permutation, tiling, loop fusion with shifting, and memory reuse. Each input nest uses a stencil of data produced in the previous nest, and the references to the same array are equal, up to a shift. All transformations described in this paper have been implemented in PIPS, our optimizing compiler, and cache miss reductions have been measured.
Compiler Construction | 2003
Thi Viet Nga Nguyen; François Irigoin; Corinne Ancourt; Fabien Coelho
One of the most common programming errors is the use of a variable before its definition. This undefined value may produce incorrect results, memory violations, unpredictable behaviors and program failure. To detect this kind of error, two approaches can be used: compile-time analysis and run-time checking. However, compile-time analysis is far from perfect because of complicated data and control flows as well as arrays with non-linear, indirection subscripts, etc. On the other hand, dynamic checking, although supported by hardware and compiler techniques, is costly due to heavy code instrumentation while information available at compile-time is not taken into account. This paper presents a combination of an efficient compile-time analysis and a source code instrumentation for run-time checking. All kinds of variables are checked by PIPS, a Fortran research compiler for program analyses, transformation, parallelization and verification. Uninitialized array elements are detected by using imported array region, an efficient inter-procedural array data flow analysis. If exact array regions cannot be computed and compile-time information is not sufficient, array elements are initialized to a special value and their utilization is accompanied by a value test to assert the legality of the access. In comparison to the dynamic instrumentation, our method greatly reduces the number of variables to be initialized and to be checked. Code instrumentation is only needed for some array sections, not for the whole array. Tests are generated as early as possible. In addition, programs can be proved to be free from used-before-set errors statically at compile-time or, on the contrary, have real undefined errors. Experiments on SPEC95 CFP show encouraging results on analysis cost and run-time overheads.
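The run-time fallback described above (initialize elements to a special value, then test on use) can be sketched as follows. This is a minimal illustration in Python rather than Fortran, and it is not the instrumentation that PIPS generates. The NaN sentinel, the `UsedBeforeSet` exception, and `checked_read` are hypothetical names introduced here.

```python
import math

# Hypothetical sentinel: NaN is assumed never to be a legitimate stored value.
UNDEFINED = float("nan")

class UsedBeforeSet(Exception):
    """Raised when an element is read before any assignment."""

def checked_read(array, index):
    """Instrumented read: test the value before letting it be used."""
    value = array[index]
    if isinstance(value, float) and math.isnan(value):
        raise UsedBeforeSet(f"element {index} read before being set")
    return value

# Only the section that static analysis could not prove initialized
# would need this treatment; here the whole toy array gets it.
a = [UNDEFINED] * 4
a[0] = 1.0
a[1] = 2.0

total = checked_read(a, 0) + checked_read(a, 1)  # both elements were set
try:
    checked_read(a, 2)                            # never assigned
    caught = False
except UsedBeforeSet:
    caught = True
print(total, caught)
```

The cost of such checks is why the abstract stresses restricting instrumentation to the array sections that compile-time analysis cannot resolve.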
Symposium on Frontiers of Massively Parallel Computation | 1995
Fabien Coelho
The MIMD distributed-memory architecture is the architecture of choice for massively parallel machines. It ensures scalability, but at the expense of programming ease. New languages such as HPF were introduced to solve this problem: the user advises the compiler about data distribution and parallel computations through directives. This paper focuses on the compilation of I/O communications for HPF. Data must be efficiently collected to or updated from I/O nodes with vectorized messages, for any possible mapping. The problem is solved using standard polyhedron scanning techniques. The code generation issues to handle the different cases are addressed. Then the method is improved and extended to parallel I/Os. This work suggests new HPF directives for parallel I/Os.
International Conference on Progress in Cryptology | 2008
Fabien Coelho
Proof-of-work schemes are economic measures to deter denial-of-service attacks: service requesters compute moderately hard functions the results of which are easy to check by the provider. We present such a new scheme for solution-verification protocols. Although most schemes to date are probabilistic unbounded iterative processes with high variance of the requester effort, our Merkle tree scheme is deterministic with an almost constant effort and null variance, and is computation-optimal.
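The asymmetry the abstract relies on (costly to build, cheap to check) can be seen in a plain Merkle tree. The sketch below is not the proof-of-work protocol from the paper; it only shows the underlying data structure, where building the tree hashes every leaf while checking one leaf needs a logarithmic number of hashes. SHA-256 and all function names are assumptions made for illustration.

```python
import hashlib

def h(data: bytes) -> bytes:
    """Hash helper; SHA-256 is an arbitrary choice for this sketch."""
    return hashlib.sha256(data).digest()

def build_tree(leaves):
    """Return all levels, from hashed leaves up to [root]."""
    n = len(leaves)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch requires a power-of-two leaf count")
    levels = [[h(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels

def auth_path(levels, index):
    """Sibling hashes from the leaf level up to just below the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])  # sibling differs in the lowest bit
        index //= 2
    return path

def verify(leaf, index, path, root):
    """Recompute the root from one leaf and its authentication path."""
    node = h(leaf)
    for sibling in path:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

leaves = [bytes([i]) for i in range(8)]
levels = build_tree(leaves)
root = levels[-1][0]
path = auth_path(levels, 5)
print(verify(leaves[5], 5, path, root), len(path))
```

With 8 leaves the requester computes 15 hashes to build the tree, while the verifier recomputes only 4 (one leaf hash plus 3 path steps) per checked leaf.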