Publication


Featured research published by Guillaume Huard.


Cluster Computing and the Grid | 2005

A batch scheduler with high level components

Nicolas Capit; G. Da Costa; Yiannis Georgiou; Guillaume Huard; Cyrille Martin; G. Mounie; Pierre Neyron; Olivier Richard

In this article, we present the design choices and the evaluation of OAR, a batch scheduler for large clusters. This batch scheduler is based on an original design that emphasizes low software complexity through the use of high-level tools. The global architecture is built upon the scripting language Perl and the relational database engine MySQL. The goal of the OAR project is to prove that it is possible today to build a complex resource management system with such tools without sacrificing efficiency and scalability. Currently, our system offers most of the important features implemented by other batch schedulers, such as priority scheduling (by queues), reservations, backfilling and some global computing support. Despite the use of high-level tools, our experiments show that our system performs close to other systems. Furthermore, OAR currently manages 700 nodes (a metropolitan grid) and has shown good efficiency and robustness.
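The queue-based priority scheduling with backfilling that the abstract mentions can be sketched as follows. This is an illustrative toy, not OAR's actual code, and it uses greedy backfilling that omits the reservation guarantees a production policy would enforce:

```python
# Toy sketch of priority-queue scheduling with greedy backfilling
# (illustrative only; not OAR's implementation).

def schedule_now(jobs, total_nodes):
    """jobs: (name, nodes_needed) pairs, highest priority first.
    A job too big for the currently free nodes waits, but smaller
    jobs behind it may backfill into the remaining nodes."""
    free = total_nodes
    started = []
    for name, need in jobs:
        if need <= free:
            started.append(name)
            free -= need
    return started
```

For instance, with 8 free nodes and jobs a(4), b(6), c(2) in priority order, b waits while a starts and c backfills.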


High Performance Distributed Computing | 2009

TakTuk, adaptive deployment of remote executions

Benoit Claudel; Guillaume Huard; Olivier Richard

This article deals with TakTuk, a middleware that efficiently deploys parallel remote executions on large-scale grids (thousands of nodes). This tool is mostly intended for interactive use: distributed machine administration and parallel application development. Thus, it has to minimize the time required to complete the whole deployment process. To achieve this minimization, we propose and validate a remote execution deployment model inspired by the real-world behavior of standard remote execution protocols (rsh and ssh). From this model and from existing work in networking, we deduce an optimal deployment algorithm for the homogeneous case. Unfortunately, this optimal algorithm does not translate directly to the heterogeneous case. Therefore, we derive from the theoretical solution a heuristic based on dynamic work-stealing that adapts to heterogeneity (processors, links, load, etc.). The underlying principle of this heuristic is the same as that of the optimal algorithm: to deploy nodes as soon as possible. Experiments assess TakTuk's efficiency and show that it scales well to thousands of nodes. Compared to similar tools, TakTuk ranks among the best performers while offering more features and versatility. In particular, TakTuk is the only tool really suited to remote execution deployment on grids or more heterogeneous platforms.
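The "deploy nodes as soon as possible" principle can be illustrated with a minimal model: every node that completes its own deployment immediately starts deploying others, so the reached population doubles each round. This sketch only counts rounds; real TakTuk layers work-stealing on top to absorb heterogeneity:

```python
# Sketch of exponential fan-out deployment: each already-deployed node
# launches one new remote execution per round (illustrative model, not
# TakTuk's actual algorithm, which adapts via work-stealing).

def rounds_to_deploy(n):
    """Rounds needed to reach n nodes, starting from one root node,
    when every deployed node starts one new node per round."""
    deployed, rounds = 1, 0
    while deployed < n:
        deployed *= 2
        rounds += 1
    return rounds
```

Under this model a thousand-node deployment completes in about ten rounds, which is why deploying from every reached node matters at grid scale.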


Future Generation Computer Systems | 2010

Triva: Interactive 3D visualization for performance analysis of parallel applications

Lucas Mello Schnorr; Guillaume Huard; Philippe Olivier Alexandre Navaux

The successful execution of parallel applications in grid infrastructures depends directly on a performance analysis that takes into account the grid characteristics, such as the network topology and resource location. This paper presents Triva, a software analysis tool that implements a novel technique to visualize the behavior of parallel applications. The proposed technique uses 3D graphics to show the application behavior together with a description of the resources, highlighting communication patterns, the network topology and a visual representation of the logical organization of the resources. We have used a real grid infrastructure to execute and trace applications composed of thousands of processes.


International Conference on Parallel Processing | 2006

SCAN: a heuristic for near-optimal software pipelining

Florent Blachot; Benoit Dupont de Dinechin; Guillaume Huard

Software pipelining is a classic compiler optimization that improves the performance of inner loops on instruction-level parallel processors. In the context of embedded computing, applications are compiled prior to manufacturing the system, so it is possible to invest large amounts of time in compiler optimizations. Traditionally, software pipelining is performed by heuristics such as iterative modulo scheduling. Optimal software pipelining can be formulated as an integer linear program, but these formulations can take exponential time to solve. As a result, the size of loops that can be optimally software pipelined is quite limited. In this article, we present the SCAN heuristic, which makes it possible to benefit from the integer linear programming formulations of software pipelining even on loops of significant size. The principle of the SCAN heuristic is to iteratively constrain the software pipelining problem until the integer linear programming formulation is solvable in reasonable time. We applied the SCAN heuristic to a multimedia benchmark for the ST200 VLIW processor. We show that it almost always computes an optimal solution for loops that are intractable by classic integer linear programming approaches. This improves performance by up to 33.3% over the heuristic modulo scheduling of the production ST200 compiler.
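The driving idea, constraining the problem until the exact solver becomes tractable, can be sketched generically. Here `solve_ilp` is a stand-in for an ILP solver and the shrinking "horizon" constraint is purely illustrative; the paper constrains the modulo scheduling formulation itself:

```python
# Generic sketch of the SCAN principle: tighten the problem until the
# exact solver finishes within a time budget (names are illustrative,
# not from the paper's implementation).

def scan(solve_ilp, horizon, budget, shrink=0.8):
    """Shrink the allowed schedule horizon until solve_ilp succeeds
    within `budget` seconds; return the first schedule found, or None."""
    while horizon >= 1:
        schedule = solve_ilp(horizon, budget)  # None on timeout
        if schedule is not None:
            return schedule
        horizon = int(horizon * shrink)
    return None
```

The trade-off is that each added constraint may exclude the true optimum, which is why the paper reports near-optimal rather than guaranteed-optimal results.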


International Conference on Supercomputing | 2011

Controlling cache utilization of HPC applications

Swann Perarnau; Marc Tchiboukdjian; Guillaume Huard

This paper discusses the use of software cache partitioning techniques to study and improve the cache behavior of HPC applications. Most existing studies use this partitioning to solve quality-of-service issues, like the fair distribution of a shared cache among running processes. We believe that, in the HPC context of a single application being studied/optimized on the system, with a single thread per core, cache partitioning can be used in new and interesting ways. First, we propose an implementation of software cache partitioning using the well-known page coloring technique. This implementation differs from existing ones by giving control of the partitioning to the application programmer. Developed on the most popular OS in HPC (Linux), this cache control scheme has low overhead both in memory and CPU while being simple to use. Second, we illustrate how this user-controlled cache partitioning can lead to efficient measurements of the cache behavior of a parallel scientific visualization application. While current tools require expensive binary instrumentation of an application to obtain its working sets, our method only needs a few unmodified runs on the target platform. Finally, we discuss the use of our scheme to optimize memory-intensive applications by isolating each of their critical data structures into dedicated cache partitions. This isolation allows the analysis of each structure's cache requirements and leads to new and significant optimization strategies. To the best of our knowledge, no other existing tool enables such tuning of HPC applications.
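Page coloring, the technique the abstract builds on, assigns a "color" to each physical page according to the group of cache sets it maps to; giving a data structure pages of dedicated colors isolates it in its own slice of the cache. A minimal sketch of the color computation, with example cache parameters (the real values depend on the target CPU):

```python
# Sketch of the page-coloring arithmetic behind software cache
# partitioning. The cache geometry below is an example configuration,
# not taken from the paper.

PAGE_SIZE = 4096         # bytes per page
LINE_SIZE = 64           # bytes per cache line
CACHE_SIZE = 2 * 2**20   # 2 MiB shared last-level cache
WAYS = 8                 # associativity

SETS = CACHE_SIZE // (LINE_SIZE * WAYS)   # 4096 cache sets
SETS_PER_PAGE = PAGE_SIZE // LINE_SIZE    # 64 consecutive sets per page
NUM_COLORS = SETS // SETS_PER_PAGE        # 64 distinct colors

def page_color(phys_addr):
    """Color of the page holding phys_addr: which group of cache
    sets its lines fall into. Pages of different colors never
    compete for the same cache sets."""
    return (phys_addr // PAGE_SIZE) % NUM_COLORS
```

A kernel that lets the programmer request pages of chosen colors can thus carve the shared cache into disjoint partitions without hardware support.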


ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 2010

KRASH: reproducible CPU load generation on many-core machines

Swann Perarnau; Guillaume Huard

In this article we present KRASH, a tool for the reproducible generation of system-level CPU load. This tool is intended for shared-memory machines equipped with multiple CPU cores, which are usually exploited concurrently by several users. The objective of KRASH is to enable parallel application developers to validate their resource-use strategies on a partially loaded machine by replaying an observed load concurrently with their application. To reach this objective, we present a method for CPU load generation that behaves as realistically as possible: the resulting load is similar to the load that would be produced by concurrent processes run by other users. Nevertheless, contrary to a simple run of a CPU-intensive application, KRASH is not sensitive to system scheduling decisions. The main benefit brought by KRASH is this reproducibility: no matter how many processes are present in the system, the load generated by our tool strictly respects a given load profile. This last characteristic proves hard to achieve with simple methods because the system scheduler is supposed to share the resources fairly among running processes. Our first contribution is a method that cooperates with the system scheduler to produce a CPU load that conforms to a desired load profile. We argue that this cooperation with the system scheduler is mandatory for the generator to reach good reproducibility, high precision and low intrusiveness. Taking advantage of Linux kernel capabilities, we implemented this method in KRASH (Kernel for Reproduction and Analysis of System Heterogeneity). Our experiments show that KRASH provides a precise reproduction of the desired load and induces a very low overhead on the system. Our second contribution is a qualitative and quantitative study that compares KRASH to other tools dealing with system-level CPU load generation.
To our knowledge, KRASH is the only tool that implements the generation of a dynamic load profile (a load varying with time). When used to generate a constant load, KRASH's results are among the most realistic. Furthermore, KRASH provides more flexibility than other tools.
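The naive alternative that KRASH improves upon can be sketched as a simple duty-cycle load generator: alternate busy and idle phases inside each period so that average utilisation follows the profile. Unlike KRASH, which cooperates with the Linux scheduler, this user-space sketch is exactly the kind of approach that is sensitive to scheduling decisions:

```python
# Naive duty-cycle CPU load generator (illustrative baseline, not
# KRASH's kernel-assisted method): busy-wait for a fraction of each
# period, then sleep for the rest.

import time

def generate_load(profile, period=0.1):
    """profile: list of target CPU fractions in [0, 1], one per period."""
    for target in profile:
        busy_until = time.perf_counter() + target * period
        while time.perf_counter() < busy_until:
            pass                               # busy phase: burn CPU
        time.sleep((1.0 - target) * period)    # idle phase
```

Because the busy-wait competes fairly with every other runnable process, the achieved load drifts as soon as the machine is shared, which is the reproducibility problem the paper addresses.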


Cluster Computing and the Grid | 2009

Towards Visualization Scalability through Time Intervals and Hierarchical Organization of Monitoring Data

Lucas Mello Schnorr; Guillaume Huard; Philippe Olivier Alexandre Navaux

Highly distributed systems such as grids are used today for the execution of large-scale parallel applications. The behavior analysis of these applications is not trivial. The complexity arises because of the event correlation among processes, external influences like time-sharing mechanisms and the saturation of network links, and also the amount of data that registers the application behavior. Almost all visualization tools for the analysis of parallel applications offer a space-time representation of the application behavior. This paper presents a novel technique that combines traces from grid applications with a treemap visualization of the data. With this combination, we dynamically create an annotated hierarchical structure that represents the application behavior for the selected time interval. The experiments in the grid show that our technique can readily be used for the analysis of large-scale parallel applications with thousands of processes.
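The hierarchical structure feeding such a treemap can be sketched as a bottom-up aggregation: per-process values measured over the selected time interval are summed up through the machine hierarchy (e.g. cluster, node, process). The tree layout and names below are illustrative, not the paper's data model:

```python
# Sketch of bottom-up aggregation over a resource hierarchy, the kind
# of annotated structure a treemap view is built from (illustrative
# structure; not the paper's implementation).

def aggregate(tree, values):
    """tree: {name: subtree or None}; leaves (None) are processes.
    values: leaf name -> time spent in some state over the interval.
    Returns name -> aggregated time for every node of the hierarchy."""
    totals = {}
    def visit(name, subtree):
        if subtree is None:                  # leaf: a single process
            totals[name] = values.get(name, 0.0)
        else:                                # inner node: sum children
            totals[name] = sum(visit(c, s) for c, s in subtree.items())
        return totals[name]
    for name, subtree in tree.items():
        visit(name, subtree)
    return totals
```

Each inner node's total then determines the area of its treemap rectangle, so thousands of processes collapse into a readable picture.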


Parallel Computing | 2012

A hierarchical aggregation model to achieve visualization scalability in the analysis of parallel applications

Lucas Mello Schnorr; Guillaume Huard; Philippe Olivier Alexandre Navaux

The analysis of large-scale parallel applications today faces several issues, such as the observation and identification of unusual process behavior, of the expected state of the application, and so on. Performance visualization tools offer a wide spectrum of techniques to visually analyze the monitoring data collected from these applications. The problem is that most of these techniques were not conceived to deal with a high number of processes in large-scale scenarios. A common example is the space-time view, widely used in the performance visualization area but limited in how much data can be analyzed at the same time. The work presented in this article addresses the problem of visualization scalability in the analysis of parallel applications through a combination of a temporal integration technique, an aggregation model and treemap representations. Results show that our approach can be used to analyze applications composed of several thousands of processes in large-scale and dynamic scenarios.


Grid Computing | 2008

3D approach to the visualization of parallel applications and Grid monitoring information

Lucas Mello Schnorr; Guillaume Huard; Philippe Olivier Alexandre Navaux

Parallel computing is increasingly used to provide more performance to applications that need tremendous computational power. The main characteristics of distributed parallel machines are heterogeneity, dynamism and size. They directly influence the way application and platform monitoring tasks are performed, especially when analyzing a large quantity of information collected on a topologically complex machine. This paper describes our efforts to provide parallel programmers and grid users with a new way to visualize monitoring data. Using 3D graphics and information visualization techniques, we aim at bringing rich topological information to the rendered scene. This results in an immersive and human-readable representation of complex monitoring data, suited to grid environments. We first review known techniques from the information visualization field, especially those that address the case of hierarchical information, and we discuss their use in our context. Then, we propose a new 3D approach that combines the classical space-time visualization of application traces with a representation of the application's communication pattern. Finally, we present experimental results obtained through the visualization of parallel applications in our prototype.


International Conference on Cluster Computing | 2014

A spatiotemporal data aggregation technique for performance analysis of large-scale execution traces

Damien Dosimont; Robin Lamarche-Perrin; Lucas Mello Schnorr; Guillaume Huard; Jean-Marc Vincent

Analysts commonly use execution traces collected at runtime to understand the behavior of an application running on distributed and parallel systems. These traces are inspected post mortem using various visualization techniques that, however, do not scale properly for a large number of events. This issue, mainly due to human perception limitations, is also the result of bounded screen resolutions preventing the proper drawing of many graphical objects. This paper proposes a new visualization technique overcoming such limitations by providing a concise overview of the trace behavior as the result of a spatiotemporal data aggregation process. The experimental results show that this approach can help the quick and accurate detection of anomalies in traces containing up to two hundred million events.

Collaboration


Dive into Guillaume Huard's collaborations.

Top Co-Authors

Lucas Mello Schnorr
Universidade Federal do Rio Grande do Sul

Philippe Olivier Alexandre Navaux
Universidade Federal do Rio Grande do Sul

Denis Trystram
Institut Universitaire de France