Thomas Herault
University of Paris-Sud
Publication
Featured research published by Thomas Herault.
cluster computing and the grid | 2008
Camille Coti; Thomas Herault; Sylvain Peyronnet; Ala Rezmerita; Franck Cappello
Institutional grids aggregate clusters belonging to different administrative domains to build a single parallel machine. Running an MPI application over an institutional grid raises many challenges. One of the first problems to solve is connectivity between nodes that do not belong to the same administrative domain. Techniques based on communication relays and dynamic port opening, among others, have been proposed. In this work, we propose a set of Grid or Web Services to abstract this connectivity service, and we evaluate the performance of this new level of communication for establishing the connectivity of an MPI application over an experimental grid.
international conference on stabilization safety and security of distributed systems | 2006
Thomas Herault; Pierre Lemarinier; Olivier Peres; Laurence Pilard; Joffroy Beauquier
We introduce a self-stabilizing algorithm that builds and maintains a spanning tree topology on any large-scale system. We assume that the existing topology is a complete graph and that nodes may arrive or leave at any time. To cope with the large number of processes of a grid or a peer-to-peer system, we limit the memory usage of each process to a small constant number of variables, combining this with previous results concerning failure detectors and resource discovery.
2006 International Conference on Research, Innovation and Vision for the Future | 2006
Akim Demaille; Thomas Herault; Sylvain Peyronnet
Sensor networks consist of miniature, low-cost systems with limited computation power and energy. Thanks to the low cost of the devices, one can spread a huge number of sensors over a given area to monitor, for example, physical changes in the environment. Typical applications are in the areas of defense, environment, and design of ad-hoc networks. In this paper, we address the problem of verifying the correctness of such networks through a case study. We model a simple sensor network whose aim is to detect an event in a bounded area (such as a fire in a forest). The behavior of the network is probabilistic, so we verify it with the Approximate Probabilistic Model Checker (APMC), a tool that approximately checks the correctness of extremely large probabilistic systems.
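The idea of approximately checking a probabilistic property can be illustrated with a generic Monte Carlo sketch. It is not APMC's interface or the model from the paper: the toy sensor model, the function names, and all parameters below are assumptions made for illustration. The sample count comes from the standard Chernoff-Hoeffding bound, which guarantees an estimate within `epsilon` of the true probability with confidence at least `1 - delta`.

```python
import math
import random


def required_samples(epsilon: float, delta: float) -> int:
    """Chernoff-Hoeffding sample count for error epsilon and confidence 1 - delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def event_detected(rng: random.Random, n_sensors: int = 20, radius: float = 0.15) -> bool:
    """One random run of a toy model (hypothetical, not the paper's model).

    Sensors and a single event are placed uniformly in the unit square; the
    event counts as detected if at least one sensor lies within `radius` of it.
    """
    ex, ey = rng.random(), rng.random()
    for _ in range(n_sensors):
        sx, sy = rng.random(), rng.random()
        if math.hypot(sx - ex, sy - ey) <= radius:
            return True
    return False


def estimate_detection_probability(epsilon: float = 0.02, delta: float = 0.01, seed: int = 0):
    """Estimate P(event detected) by sampling independent runs of the toy model."""
    rng = random.Random(seed)
    n = required_samples(epsilon, delta)
    hits = sum(event_detected(rng) for _ in range(n))
    return hits / n, n


if __name__ == "__main__":
    probability, samples = estimate_detection_probability()
    print(f"estimated detection probability: {probability:.3f} ({samples} samples)")
```

The number of samples depends only on `epsilon` and `delta`, not on the size of the state space, which is why sampling-based checking scales to very large probabilistic systems.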
Lecture Notes in Computer Science | 2001
Joffroy Beauquier; Thomas Herault; Elad Michael Schiller
The paper presents a technique for achieving stabilization in distributed systems. This technique, called agent-stabilization, uses an external tool, the agent, that can be considered as a special message created by a lower layer. Basically, an agent performs a traversal of the network and if necessary, modifies the local states of the nodes, yielding stabilization.
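As a rough illustration of the traversal-and-repair idea, the toy sketch below has a single agent walk a network depth-first and overwrite any node's local state that is inconsistent with the path the agent took. This is not the algorithm from the paper: the choice of a parent pointer as the local state, the function name `agent_traversal`, and the example network are all assumptions.

```python
def agent_traversal(adjacency, state, root):
    """Depth-first traversal by a single agent starting at `root`.

    `state[node]` is the node's local parent pointer, which may be corrupted.
    The agent sets it to the node it arrived from whenever it differs, and
    returns the number of corrections it made.
    """
    corrections = 0
    visited = set()
    stack = [(root, None)]  # (node to visit, node the agent came from)
    while stack:
        node, came_from = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        if state.get(node) != came_from:
            state[node] = came_from  # repair the corrupted local state
            corrections += 1
        for neighbor in adjacency[node]:
            if neighbor not in visited:
                stack.append((neighbor, node))
    return corrections


if __name__ == "__main__":
    # Hypothetical 4-node network; nodes 0 and 1 start with corrupted pointers.
    adjacency = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
    state = {0: 3, 1: 2, 2: 0, 3: 1}
    print(agent_traversal(adjacency, state, root=0), state)
    print(agent_traversal(adjacency, state, root=0))
```

In this toy setting, a second traversal makes no corrections, because the first one already left every parent pointer consistent with the agent's path.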
international conference on cluster computing | 2006
William Hoarau; Pierre Lemarinier; Thomas Herault; Eric Vallejo Rodriguez; Sébastien Tixeuil; Franck Cappello
One of the topics of paramount importance in the development of cluster and grid middleware is the impact of faults, since their occurrence in grid infrastructures and in large-scale distributed systems is common. MPI (Message Passing Interface) is a popular abstraction for programming distributed and parallel applications. FAIL (FAult Injection Language) is an abstract language for fault occurrence description capable of expressing complex and realistic fault scenarios. In this paper, we investigate the possibility of using FAIL to inject faults in a fault-tolerant MPI implementation. Our middleware, FAIL-MPI, is used to carry out quantitative and qualitative fault and stress testing.
CoreGRID Workshop - Making Grids Work | 2008
Fatiha Bouabache; Thomas Herault; Gilles Fedak; Franck Cappello
As High Performance platforms (Clusters, Grids, etc.) continue to grow in size, the average time between failures decreases to a critical level. An efficient and reliable fault tolerance protocol plays a key role in High Performance Computing. Rollback recovery is the most common fault tolerance technique used in High Performance Computing, especially in MPI applications. This technique relies on the reliability of the checkpoint storage: most rollback recovery protocols assume that the checkpoint server machines are reliable. However, in a grid environment any unit can fail at any moment, including components used to connect different administrative domains. Such a failure leads to the loss of a whole set of machines, including the more reliable machines used to store the checkpoints in this administrative domain. It is thus not safe to rely on the high MTBF of specific machines to store the checkpoint images.
Modeling and verification of parallel processes. Summer school | 2001
Thomas Herault; Pierre Lemarinier
Archive | 2008
Gilles Fedak; Jean-Patrick Gelas; Thomas Herault; Victor Iniesta; Derrick Kondo; Laurent Lefèvre; Paul Malecot; Lucas Nussbaum; Ala Rezmerita; Olivier Richard
Automatic Verification of Critical Systems | 2006
Michaël Cadilhac; Thomas Herault; Richard Lassaigne; Sylvain Peyronnet; Sébastien Tixeuil
Archive | 2018
George Bosilca; Aurelien Bouteiller; Thomas Herault; Valentin Le Fèvre; Yves Robert; Jack Dongarra