Benoît Pradelle
Versailles Saint-Quentin-en-Yvelines University
Publication
Featured research published by Benoît Pradelle.
Computer Science - Research and Development | 2014
Abdelhafid Mazouz; Alexandre Laurent; Benoît Pradelle; William Jalby
Dynamic Voltage and Frequency Scaling (DVFS) has emerged as one of the most important techniques to reduce energy consumption in computing systems. The main idea exploited by DVFS controllers is to reduce the CPU frequency in memory-bound phases, usually significantly reducing energy consumption. However, depending on the CPU model, transitions between CPU frequencies may incur varying delays. Such delays are often optimistically ignored in DVFS controllers, whereas knowledge of them could enhance the quality of frequency-setting decisions. The current article presents an experimental study on the measurement of frequency transition latencies. The measurement methodology is presented, accompanied by evaluations on three Intel machines reflecting three distinct micro-architectures. Overall, we show for our experimental setup that, while changing the CPU frequency upward leads to higher transition delays, changing it downward leads to smaller or similar transition delays across the set of available frequencies.
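The latency estimation described above can be sketched as follows. This is an illustrative simplification, not the paper's exact methodology: given timestamped CPU-frequency samples collected after a frequency switch was requested (e.g. polled from a hypothetical cpufreq readout), the transition latency is the delay until the target frequency is first observed.

```python
# Illustrative sketch (assumed, simplified): estimate a frequency transition
# latency from timestamped frequency samples taken after the switch request.
def transition_latency_us(request_time_us, samples, target_khz):
    """samples: list of (timestamp_us, freq_khz) pairs sorted by timestamp."""
    for t, f in samples:
        if t >= request_time_us and f == target_khz:
            return t - request_time_us
    return None  # target frequency never observed in the sampling window

# Example: switch requested at t=100 us; 2.4 GHz first observed at t=154 us.
samples = [(90, 1200000), (120, 1200000), (154, 2400000), (200, 2400000)]
print(transition_latency_us(100, samples, 2400000))  # 54
```

On real hardware the sampling itself perturbs the measurement, which is part of why a careful methodology is needed.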
ieee international conference on green computing and communications | 2013
Jean-Philippe Halimi; Benoît Pradelle; Amina Guermouche; Nicolas Triquenaux; Alexandre Laurent; Jean Christophe Beyler; William Jalby
Several solutions have been considered to reduce the energy consumption of computers. Among them, Dynamic Voltage and Frequency Scaling (DVFS) emerged as an effective way to enhance energy efficiency by adapting the processor frequency to the workload. We propose Forest, a new DVFS controller designed to efficiently exploit the recent technologies introduced in processors. Forest is a runtime DVFS controller able to estimate the energy savings it can achieve from power gains, evaluated offline using power probes embedded in modern CPUs, and from speedups measured at runtime for the current workload. It does not use any performance model but rather directly measures the effect of frequency transitions on energy. Using this methodology, Forest can achieve energy savings on the whole system under user-defined slowdown constraints. In our experiments, Forest achieves more than 39% CPU energy savings compared to the default Linux DVFS controller, with a slowdown below the 5% requested by the user.
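The selection logic described above can be sketched in a few lines. This is an assumed simplification, not Forest's actual algorithm: combine offline power measurements P(f) with speedups s(f) measured at runtime, and pick the frequency minimizing energy E(f) = P(f) · T_ref / s(f) subject to a user-defined slowdown bound relative to the fastest setting.

```python
# Simplified sketch (assumed): pick the lowest-energy CPU frequency whose
# slowdown relative to the fastest frequency stays under a user bound.
def pick_frequency(power_w, speedup, t_ref_s, max_slowdown=0.05):
    """power_w, speedup: dicts keyed by frequency (GHz)."""
    best_f, best_e = None, float("inf")
    s_max = max(speedup.values())
    for f in power_w:
        t = t_ref_s / speedup[f]
        if t / (t_ref_s / s_max) - 1.0 > max_slowdown:
            continue  # violates the slowdown constraint
        e = power_w[f] * t  # energy = power * time
        if e < best_e:
            best_f, best_e = f, e
    return best_f, best_e

# Memory-bound case: 1.2 GHz barely slows the program, so it wins on energy.
best = pick_frequency({2.4: 40.0, 1.2: 22.0}, {2.4: 1.0, 1.2: 0.97}, 10.0)
print(best[0])  # 1.2
```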
International Green Computing Conference | 2014
Jean-Philippe Halimi; Benoît Pradelle; Amina Guermouche; William Jalby
Dynamic Voltage and Frequency Scaling (DVFS) is commonly used to save energy in computing systems. However, when it comes to parallel programs, existing DVFS controllers only reduce the frequency while or before waiting in blocking communications. As a consequence, energy savings are only possible for the program tasks outside the critical path and when the workload is imbalanced. We propose a new runtime DVFS controller, FoREST-mn. It takes advantage of both the low CPU usage of some program phases and the communication slack to save more energy with parallel programs. The DVFS control then becomes more complex, but energy savings are obtained even when the workload is balanced. The resulting slowdown is carefully controlled and constrained by a user-defined threshold. We implemented the presented strategies and evaluated them on 4 compute nodes totaling 64 cores. FoREST-mn achieves significant CPU energy savings on the NAS programs, up to 34% on MG, while efficiently bounding the resulting slowdown.
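The notion of communication slack mentioned above can be made concrete with a toy computation. This is an assumed illustration, not FoREST-mn's algorithm: in a phase that ends in a blocking communication, a task's slack is its gap to the slowest task, and that slack bounds how much it can be slowed down (for example by lowering its frequency) without lengthening the phase.

```python
# Toy illustration (assumed): slack and maximum safe slowdown per task in a
# phase that synchronizes on a blocking communication at the end.
def per_task_slack(durations_s):
    critical = max(durations_s)          # the slowest task sets the pace
    return [critical - d for d in durations_s]

def max_safe_slowdown(durations_s):
    # Each task may be slowed until it finishes with the critical task.
    critical = max(durations_s)
    return [critical / d for d in durations_s]

print(per_task_slack([4.0, 2.0, 3.0]))     # [0.0, 2.0, 1.0]
print(max_safe_slowdown([4.0, 2.0, 3.0]))  # [1.0, 2.0, ~1.33]
```

With a balanced workload all slacks are zero, which is why FoREST-mn also exploits low-CPU-usage phases rather than slack alone.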
international conference on parallel processing | 2016
Benoît Pradelle; Benoît Meister; Muthu Manikandan Baskaran; Athanasios Konstantinidis; Thomas Henretty; Richard Lethin
Computers across the board, from embedded to future exascale computers, are consistently designed with deeper memory hierarchies. While this opens up exciting opportunities for improving software performance and energy efficiency, it also makes it increasingly difficult to efficiently exploit the hardware. Advanced compilation techniques are a possible solution to this difficult problem and, among them, the polyhedral compilation technology provides a pathway for performing advanced automatic parallelization and code transformations. However, the polyhedral model is also known for its poor scalability with respect to the number of dimensions in the polyhedra that are used for representing programs. Although current compilers can cope with this limitation when targeting shallow hierarchies, polyhedral optimizations often become intractable as soon as deeper hardware hierarchies are considered. We address this problem by introducing two new operators for polyhedral compilers: focalisation and defocalisation. When applied in the compilation flow, the new operators reduce the dimensionality of polyhedra, which drastically simplifies the mathematical problems solved during the compilation. We prove that the presented operators preserve the original program semantics, allowing them to be safely used in compilers. We implemented the operators in a production compiler, which drastically improved its performance and scalability when targeting deep hierarchies.
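To see why dimensionality dominates the cost of polyhedral reasoning, consider the classical Fourier-Motzkin elimination, which projects one dimension out of a polyhedron {x : Ax ≤ b}. This is a standard textbook operator, shown only as background intuition; it is not the focalisation/defocalisation operator introduced in the paper. Each eliminated dimension can square the number of constraints, which is exactly the blow-up that makes high-dimensional polyhedra expensive for compilers.

```python
from fractions import Fraction

# Background illustration (not the paper's operator): Fourier-Motzkin
# elimination of variable k from a constraint system sum(c_i * x_i) <= b.
def eliminate(constraints, k):
    """constraints: list of (coeffs, bound); returns constraints with x_k projected out."""
    lower, upper, rest = [], [], []
    for c, b in constraints:
        if c[k] > 0:
            upper.append((c, b))
        elif c[k] < 0:
            lower.append((c, b))
        else:
            rest.append((c, b))
    out = list(rest)
    for cl, bl in lower:
        for cu, bu in upper:
            # Scale the pair so the x_k coefficients cancel exactly.
            s_l, s_u = Fraction(cu[k]), Fraction(-cl[k])
            c = [s_l * a + s_u * d for a, d in zip(cl, cu)]
            out.append((c, s_l * bl + s_u * bu))
    return out

# Project y out of {0 <= x <= 2, 0 <= y <= 2, x + y <= 3}: coefficients [cx, cy].
square = [([1, 0], 2), ([-1, 0], 0), ([0, 1], 2), ([0, -1], 0), ([1, 1], 3)]
print(eliminate(square, 1))
```

Note the quadratic pairing of lower and upper bounds in the nested loop: this growth per eliminated dimension is what the focalisation operators avoid by shrinking the representation instead.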
european conference on parallel processing | 2014
Abdelhafid Mazouz; Benoît Pradelle; William Jalby
Achieving or proving energy efficiency necessarily relies on the ability to perform power measurements at some point. To simplify power measurements at the CPU level, recent processors support model-based energy accounting interfaces such as Intel RAPL or AMD APM. Though such interfaces are an attractive option for energy characterization, their accuracy and reliability have to be verified before using them. We propose a new statistical validation methodology for CPU power estimators that does not require any complex hardware instrumentation. The methodology relies only on a single full-system AC power meter and is able to make statistically relevant decisions about the probes' reliability. We also present an experimental evaluation using two Intel machines equipped with a RAPL interface and investigate the impact of parameters such as the CPU frequency or the number of active cores on probe accuracy.
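The core idea of validating a probe against a reference meter can be sketched as below. This is a deliberately simplified stand-in for the paper's statistical test: it accepts the probe when the mean absolute relative error against the reference AC-meter readings stays under a tolerance.

```python
# Simplified sketch (assumed, not the paper's statistical methodology):
# decide whether a CPU energy probe tracks a reference meter closely enough.
def probe_is_reliable(probe_j, reference_j, tol=0.05):
    """probe_j, reference_j: paired energy readings in joules."""
    errs = [abs(p - r) / r for p, r in zip(probe_j, reference_j)]
    mape = sum(errs) / len(errs)  # mean absolute relative error
    return mape <= tol, mape

ok, err = probe_is_reliable([98.0, 102.0, 100.0], [100.0, 100.0, 100.0])
print(ok)  # True
```

A real validation must also account for the non-CPU share of the full-system AC measurement and for measurement variance, which is where the statistical machinery of the paper comes in.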
2013 International Green Computing Conference Proceedings | 2013
Nicolas Triquenaux; Alexandre Laurent; Benoît Pradelle; Jean Christophe Beyler; William Jalby
In the past, engineers focused on maximizing application performance. Nowadays, as energy prices and ecological concerns rise, administrators and engineers are considering energy consumption as a new optimization criterion. One well-known way to decrease energy consumption is Dynamic Voltage and Frequency Scaling (DVFS), which lets users modify the CPU frequency to reduce energy consumption. Although the technique is effective, its real potential is often difficult to predict. This paper presents UtoPeak, a tool able to compute the sequence of frequencies providing the lowest energy consumption for a given program execution. Along with the ideal frequency sequence, UtoPeak predicts, ahead of time, the potential impact of DVFS on energy consumption. The paper shows how UtoPeak predicts the potential gains for known benchmark suites. UtoPeak's accuracy in predicting the best energy consumption is 95.6% on average on the SPEC2006 and NAS benchmark suites.
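The "ideal frequency sequence" idea can be sketched as follows. This is an assumed illustration, not UtoPeak's implementation: given a per-phase energy profile measured at every available frequency, a lower bound on consumption is obtained by independently picking the cheapest frequency for each phase.

```python
# Illustrative sketch (assumed): choose, per program phase, the frequency
# with the lowest measured energy, yielding the best achievable sequence.
def best_sequence(energy):
    """energy: list of {freq_ghz: joules} dicts, one dict per program phase."""
    seq = [min(phase, key=phase.get) for phase in energy]
    total = sum(phase[f] for phase, f in zip(energy, seq))
    return seq, total

profile = [{2.4: 10.0, 1.2: 7.5},   # memory-bound phase: low frequency wins
           {2.4: 5.0, 1.2: 9.0}]    # compute-bound phase: high frequency wins
print(best_sequence(profile))  # ([1.2, 2.4], 12.5)
```

This per-phase minimum ignores frequency transition costs, which a practical tool must also account for.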
ieee high performance extreme computing conference | 2017
Muthu Manikandan Baskaran; Tom Henretty; Benoît Pradelle; M. Harper Langston; David Bruns-Smith; Richard Lethin
Tensor decompositions are a powerful technique for enabling comprehensive and complete analysis of real-world data. Data analysis through tensor decompositions involves intensive computations over large-scale irregular sparse data. Optimizing the execution of such data-intensive computations is key to reducing the time-to-solution (or response time) in real-world data analysis applications. As high-performance computing (HPC) systems are increasingly used for data analysis applications, it becomes important to optimize sparse tensor computations and execute them efficiently on modern and advanced HPC systems. In addition to utilizing the large processing capability of HPC systems, it is crucial to improve memory performance (memory usage, communication, synchronization, memory reuse, and data locality) in HPC systems. In this paper, we present multiple optimizations that are targeted towards faster and memory-efficient execution of large-scale tensor analysis on HPC systems. We demonstrate that our techniques achieve reduction in memory usage and execution time of tensor decomposition methods when they are applied on multiple datasets of varied size and structure from different application domains. We achieve up to 11× reduction in memory usage and up to 7× improvement in performance. More importantly, we enable the application of large tensor decompositions on some important datasets on a multi-core system that would not have been feasible without our optimization.
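A typical kernel inside the sparse tensor decompositions discussed above is the MTTKRP (matricized tensor times Khatri-Rao product). The sketch below is a plain, unoptimized stand-in for the paper's optimized kernels: the tensor is stored in coordinate (COO) form, so both work and memory scale with the number of nonzeros rather than the dense tensor size.

```python
# Sketch of a core sparse-tensor kernel (mode-0 MTTKRP for a 3-way tensor),
# shown as a simplified illustration of what the optimized HPC kernels do.
def mttkrp_mode0(coords, vals, B, C, n_rows, rank):
    """M[i][r] = sum over nonzeros X[i,j,k] of X[i,j,k] * B[j][r] * C[k][r]."""
    M = [[0.0] * rank for _ in range(n_rows)]
    for (i, j, k), v in zip(coords, vals):
        for r in range(rank):
            M[i][r] += v * B[j][r] * C[k][r]
    return M

# Two nonzeros: X[0,0,0] = 2 and X[1,1,0] = 3, with rank-1 factors B and C.
print(mttkrp_mode0([(0, 0, 0), (1, 1, 0)], [2.0, 3.0],
                   [[1.0], [2.0]], [[4.0]], n_rows=2, rank=1))  # [[8.0], [24.0]]
```

Memory reuse and data locality in this loop nest are exactly where the paper's optimizations pay off.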
ieee high performance extreme computing conference | 2016
Benoît Pradelle; Muthu Manikandan Baskaran; Thomas Henretty; Benoît Meister; Athanasios Konstantinidis; Richard Lethin
In the last decade, the scope of software optimization expanded to encompass energy consumption on top of the classical runtime-minimization objective. In that context, several optimizations have been developed to improve software energy efficiency. However, these optimizations commonly rely on long profiling steps and are often implemented as unstable runtime systems, which limits their applicability. We propose in this paper a new energy model and two associated energy optimizations that can be performed at compilation time, without having to profile the compiled programs. The model predicts the energy consumption of programs at compilation time using the precise software representation available in the polyhedral model. The energy model is then used at the core of two compiler optimizations to generate more efficient programs. The model and the two optimizations have been implemented in the R-Stream compiler.
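The general shape of a compile-time energy model can be illustrated as below. The coefficients and structure here are assumptions for illustration, not R-Stream's actual model: from operation counts that a polyhedral compiler can derive statically, predict E = P_static · T + Σ count(op) · e(op).

```python
# Minimal sketch of a static energy model (assumed coefficients/structure):
# static power over the predicted runtime plus per-operation dynamic energy.
def predict_energy_j(op_counts, energy_per_op_j, p_static_w, time_s):
    dynamic = sum(op_counts[op] * energy_per_op_j[op] for op in op_counts)
    return p_static_w * time_s + dynamic

e = predict_energy_j({"flop": 1_000_000, "load": 200_000},
                     {"flop": 1e-9, "load": 5e-9},
                     p_static_w=10.0, time_s=0.5)
print(e)  # ~5.002 J: 5.0 J static + 0.002 J dynamic
```

The polyhedral representation is what makes the operation counts exact at compile time, instead of requiring profiling runs.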
Proceedings of the 5th Workshop on Extreme-Scale Programming Tools | 2016
Muthu Manikandan Baskaran; Benoît Pradelle; Benoît Meister; Athanasios Konstantinidis; Richard A. Lethin
Hardware scaling and low-power considerations associated with the quest for exascale and extreme-scale computing are driving system designers to consider new runtime and execution models, such as event-driven-task (EDT) models, that enable more concurrency and reduce the amount of synchronization. Further, for performance, productivity, and code sustainability reasons, there is an increasing demand for auto-parallelizing compiler technologies to automatically produce code for EDT-based runtimes. However, achieving scalable performance in extreme-scale systems with auto-generated codes is a non-trivial challenge. Some of the key requirements for achieving good scalable performance across many EDT-based systems are: (1) scalable dynamic creation of the task-dependence graph and spawning of tasks, (2) scalable creation and management of data and communications, and (3) dynamic scheduling of tasks and movement of data for scalable asynchronous execution. In this paper, we develop capabilities within R-Stream, an automatic source-to-source optimization compiler, for automatic generation and optimization of code and data management targeted towards the Open Community Runtime (OCR), an exascale-ready asynchronous task-based runtime. We demonstrate the effectiveness of our techniques through performance improvements on various benchmarks and proxy application kernels that are relevant to the extreme-scale computing community.
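The event-driven-task idea behind requirement (1) can be sketched in miniature. This is a toy illustration, not the OCR API: a task becomes runnable only once all its dependences are satisfied, so execution order is driven by dependence events rather than by global synchronization barriers.

```python
from collections import deque

# Toy sketch (assumed, not OCR's interface): tasks fire when their last
# dependence is satisfied, yielding one valid asynchronous execution order.
def edt_schedule(deps):
    """deps: {task: set of prerequisite tasks}; returns one valid order."""
    pending = {t: set(d) for t, d in deps.items()}
    ready = deque(t for t, d in pending.items() if not d)  # no dependences
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)  # "execute" the task
        for u, d in pending.items():
            if t in d:
                d.remove(t)       # dependence satisfied: an event fires
                if not d:
                    ready.append(u)  # last dependence gone: spawn the task
    return order

deps = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
print(edt_schedule(deps))  # one valid order, e.g. ['a', 'b', 'c', 'd']
```

In a real EDT runtime the ready queue is distributed and tasks are spawned dynamically, which is what makes scalable graph creation non-trivial.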
Computer Science - Research and Development | 2014
Benoît Pradelle; Nicolas Triquenaux; Jean Christophe Beyler; William Jalby