Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Matthieu Gallet is active.

Publication


Featured research published by Matthieu Gallet.


Grid Computing | 2010

Analysis and modeling of time-correlated failures in large-scale distributed systems

Nezih Yigitbasi; Matthieu Gallet; Derrick Kondo; Alexandru Iosup; Dick H. J. Epema

The analysis and modeling of the failures bound to occur in today's large-scale production systems is invaluable in providing the understanding needed to make these systems fault-tolerant yet efficient. Many previous studies have modeled failures without taking into account their time-varying behavior, under the assumption that failures are independent and identically distributed. However, the presence of time correlations between failures (such as peak periods with an increased failure rate) refutes this assumption and can have a significant impact on the effectiveness of fault-tolerance mechanisms. For example, a proactive fault-tolerance mechanism is more effective if failures are periodic or predictable; similarly, the performance of checkpointing, redundancy, and scheduling solutions depends on the frequency of failures. In this study we analyze and model the time-varying behavior of failures in large-scale distributed systems. Our study is based on nineteen failure traces obtained from (mostly) production large-scale distributed systems, including grids, P2P systems, DNS servers, web servers, and desktop grids. We first investigate the time correlation of failures, and find that many of the studied traces exhibit strong daily patterns and high autocorrelation. Then, we derive a model that focuses on the peak failure periods occurring in real large-scale distributed systems. Our model characterizes the duration of peaks, the peak inter-arrival time, the inter-arrival time of failures during peaks, and the duration of failures during peaks; for each, we determine the best-fitting probability distribution from a set of candidate distributions and present the parameters of the best fit. Lastly, we validate our model against the nineteen real failure traces, and find that the failures it characterizes are responsible on average for over 50%, and up to 95%, of the downtime of these systems.
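The distribution-fitting step described in the abstract can be sketched as follows: fit each candidate distribution by maximum likelihood and keep the one with the highest log-likelihood. The synthetic data, the two-candidate set, and the coarse grid search over the Weibull shape are all illustrative assumptions for this sketch, not the study's actual traces or procedure.

```python
import math
import random

random.seed(0)
# Hypothetical inter-arrival times of failures during a peak period.
data = [random.expovariate(0.5) for _ in range(1000)]

def loglik_expon(data):
    # The MLE for the exponential rate is n / sum(x).
    rate = len(data) / sum(data)
    return sum(math.log(rate) - rate * x for x in data)

def loglik_weibull(data, shape):
    # For a fixed shape k, the MLE scale is (mean of x^k)^(1/k).
    n = len(data)
    scale = (sum(x ** shape for x in data) / n) ** (1.0 / shape)
    return sum(math.log(shape / scale)
               + (shape - 1.0) * math.log(x / scale)
               - (x / scale) ** shape
               for x in data)

# Coarse grid search over the Weibull shape; keep the best log-likelihood.
fits = {
    "exponential": loglik_expon(data),
    "weibull": max(loglik_weibull(data, k) for k in (0.5, 0.75, 1.0, 1.5, 2.0)),
}
best = max(fits, key=fits.get)
```

Since the Weibull family contains the exponential (shape 1), the two fits are essentially tied on this synthetic exponential sample; on real traces, the paper reports the best-fitting distribution and its parameters for each model component.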


European Conference on Parallel Processing | 2010

A model for space-correlated failures in large-scale distributed systems

Matthieu Gallet; Nezih Yigitbasi; Bahman Javadi; Derrick Kondo; Alexandru Iosup; Dick H. J. Epema

Distributed systems such as grids, peer-to-peer systems, and even Internet DNS servers have grown significantly in size and complexity in the last decade. This rapid growth has allowed distributed systems to serve a large and increasing number of users, but has also made resource and system failures inevitable. Moreover, perhaps as a result of system complexity, in distributed systems a single failure can trigger several more failures within a short time span, forming a group of time-correlated failures. To eliminate or alleviate the significant effects of failures on performance and functionality, techniques for dealing with failures require good failure models. However, not many such models are available, and the available models are valid for only a few, or even a single, distributed system. In contrast, in this work we propose a model that considers groups of time-correlated failures and is valid for many types of distributed systems. Our model has three components: the group size, the group inter-arrival time, and the resource downtime caused by the group. To validate this model, we use failure traces corresponding to fifteen distributed systems. We find that space-correlated failures are dominant in terms of resource downtime in seven of the fifteen studied systems. For each of these seven systems, we provide a set of model parameters that can be used in research studies or for tuning distributed systems. Lastly, as a result of our work, six of the studied traces have been made available through the Failure Trace Archive (http://fta.inria.fr).
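A minimal sketch of how the model's three components could drive a synthetic failure-trace generator; the exponential distributions and all parameter values below are hypothetical placeholders, not the fitted parameters reported for the seven systems.

```python
import random

random.seed(42)

def sample_failure_groups(n_groups, mean_interarrival=3600.0,
                          mean_size=5.0, mean_downtime=600.0):
    """Generate a synthetic trace of failure groups from the model's three
    components: group inter-arrival time, group size, and group downtime."""
    t = 0.0
    trace = []
    for _ in range(n_groups):
        t += random.expovariate(1.0 / mean_interarrival)     # next group arrives
        size = 1 + int(random.expovariate(1.0 / mean_size))  # resources affected
        downtime = random.expovariate(1.0 / mean_downtime)   # seconds of downtime
        trace.append((t, size, downtime))
    return trace

trace = sample_failure_groups(100)
total_downtime = sum(d for _, _, d in trace)
```

A generator like this is how model parameters from a study can be reused: plug the published per-system parameters into the three components to replay realistic failure behavior in a simulator.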


European Conference on Parallel Processing | 2010

Non-clairvoyant scheduling of multiple bag-of-tasks applications

Henri Casanova; Matthieu Gallet; Frédéric Vivien

The bag-of-tasks application model, albeit simple, arises in many application domains and has received a lot of attention in the scheduling literature. Previous works propose either theoretically sound solutions that rely on unrealistic assumptions, or ad hoc heuristics with no performance guarantees. This work attempts to bridge this gap through the design of non-clairvoyant heuristics based on solid theoretical foundations. The performance achieved by these heuristics is studied via simulations, with a view to comparing them both to previously proposed solutions and to theoretical upper bounds on achievable performance. An interesting theoretical result of this work is that a straightforward on-demand heuristic delivers asymptotically optimal performance when either the communications or the computations can be neglected.
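The "straightforward on-demand heuristic" mentioned above can be illustrated with a small simulation: whenever a worker becomes idle, it immediately grabs another task. The round-robin cycling over applications and the cost model are assumptions made for this sketch, not necessarily the paper's exact policy, and communications are ignored.

```python
import heapq

def on_demand_schedule(bags, worker_speeds):
    """Simulate an on-demand policy: the earliest idle worker grabs one task,
    cycling round-robin over the bag-of-tasks applications. Returns the makespan."""
    queues = [list(b) for b in bags]                    # remaining costs per app
    events = [(0.0, w) for w in range(len(worker_speeds))]
    heapq.heapify(events)                               # (idle time, worker id)
    app, makespan = 0, 0.0
    while any(queues):
        t, w = heapq.heappop(events)                    # earliest idle worker
        while not queues[app]:                          # skip exhausted apps
            app = (app + 1) % len(queues)
        cost = queues[app].pop(0)
        app = (app + 1) % len(queues)
        done = t + cost / worker_speeds[w]
        makespan = max(makespan, done)
        heapq.heappush(events, (done, w))
    return makespan

# Two applications (ten unit tasks, five double tasks) on workers of speeds 1 and 2.
makespan = on_demand_schedule([[1.0] * 10, [2.0] * 5], [1.0, 2.0])
```

The makespan can never beat the bound total work / aggregate speed (here 20/3); the asymptotic-optimality result says that, with negligible communications, on-demand scheduling approaches this bound as the number of tasks grows.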


International Parallel and Distributed Processing Symposium | 2009

Efficient scheduling of task graph collections on heterogeneous resources

Matthieu Gallet; Loris Marchal; Frédéric Vivien

In this paper, we focus on scheduling jobs on computing Grids. In our model, a Grid job is made of a large collection of input data sets, which must all be processed by the same task graph or workflow, thus resulting in a collection-of-task-graphs problem. We are looking for a competitive scheduling algorithm that does not require complex control, so we only consider single-allocation strategies. In addition to a mixed linear programming approach to finding an optimal allocation, we present several heuristic schemes. Then, using simulations, we compare the performance of our heuristics to that of HEFT, a classical Grid scheduling policy. The results show that some of our static-scheduling policies take advantage of their platform and application knowledge and outperform HEFT, especially under communication-intensive scenarios. In particular, one of our heuristics, DELEGATE, almost always achieves the best performance while having lower running times than HEFT.
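HEFT, the baseline the heuristics are compared against, is a classic list-scheduling algorithm: rank tasks by upward rank, then map each task to the processor minimizing its earliest finish time. A compact sketch (without HEFT's insertion-based slot search) on a toy four-task DAG with made-up costs:

```python
def heft(succ, comp, comm):
    """succ: task -> successors; comp[task][p]: run time on processor p;
    comm[(u, v)]: transfer cost if u and v run on different processors."""
    pred = {t: [] for t in comp}
    for u, vs in succ.items():
        for v in vs:
            pred[v].append(u)

    # Upward rank: average run time plus the heaviest path below the task.
    rank = {}
    def upward(t):
        if t not in rank:
            avg = sum(comp[t]) / len(comp[t])
            rank[t] = avg + max((comm[(t, s)] + upward(s)
                                 for s in succ.get(t, [])), default=0.0)
        return rank[t]
    for t in comp:
        upward(t)

    n_proc = len(next(iter(comp.values())))
    proc_free = [0.0] * n_proc
    finish, place = {}, {}
    for t in sorted(comp, key=lambda t: -rank[t]):   # decreasing upward rank
        best = None
        for p in range(n_proc):
            ready = max((finish[u] + (comm[(u, t)] if place[u] != p else 0.0)
                         for u in pred[t]), default=0.0)
            eft = max(ready, proc_free[p]) + comp[t][p]
            if best is None or eft < best[0]:
                best = (eft, p)
        finish[t], place[t] = best
        proc_free[place[t]] = finish[t]
    return finish, place

# Toy diamond DAG a -> {b, c} -> d on two heterogeneous processors.
succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
comp = {"a": [2.0, 3.0], "b": [3.0, 2.0], "c": [4.0, 3.0], "d": [2.0, 1.0]}
comm = {("a", "b"): 1.0, ("a", "c"): 1.0, ("b", "d"): 1.0, ("c", "d"): 1.0}
finish, place = heft(succ, comp, comm)
makespan = max(finish.values())
```

HEFT decides processor by processor at run time; the paper's static single-allocation heuristics instead fix one allocation for the whole collection up front, which is what lets them exploit platform and application knowledge.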


ACM Symposium on Parallel Algorithms and Architectures | 2010

Computing the throughput of probabilistic and replicated streaming applications

Anne Benoit; Fanny Dufossé; Matthieu Gallet; Yves Robert; Bruno Gaujal

In this paper, we investigate how to compute the throughput of probabilistic and replicated streaming applications. We are given (i) a streaming application whose dependence graph is a linear chain; (ii) a one-to-many mapping of the application onto a fully heterogeneous target, where a processor is assigned at most one application stage, but where a stage can be replicated onto a set of processors; and (iii) a set of IID (Independent and Identically Distributed) variables to model each computation and communication time in the mapping. How can we compute the throughput of the application, i.e., the rate at which data sets can be processed? We consider two execution models: the STRICT model, where the actions of each processor are sequentialized, and the OVERLAP model, where a processor can compute and communicate in parallel. The problem is easy when application stages are not replicated, i.e., each is assigned to a single processor: in that case the throughput is dictated by the critical hardware resource. However, when stages are replicated, i.e., assigned to several processors, the problem becomes surprisingly complicated: even in the deterministic case, the optimal throughput may be lower than the smallest internal resource throughput. To the best of our knowledge, the problem has never been considered in the probabilistic case. The first main contribution of the paper is a general method (although of exponential cost) to compute the throughput when the mapping parameters follow IID exponential laws. This method is based on the analysis of timed Petri nets deduced from the application mapping; it turns out that these Petri nets exhibit a regular structure in the OVERLAP model, which reduces the cost and yields a polynomial algorithm. The second main contribution of the paper is a set of bounds on the throughput when stage parameters are arbitrary IID and NBUE (New Better than Used in Expectation) variables: the throughput is bounded from below by the exponential case and from above by the deterministic case.
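For the unreplicated case, where the abstract notes that throughput is dictated by the critical hardware resource, a minimal sketch contrasting the STRICT and OVERLAP models on a deterministic chain; all stage and link times below are illustrative values.

```python
def bottleneck_throughput(comp_times, comm_times, overlap=True):
    """Throughput of a linear-chain mapping without replication.
    OVERLAP: computation and communication proceed in parallel, so every
    resource is an independent bottleneck. STRICT: each processor serializes
    its incoming communication, computation, and outgoing communication."""
    if overlap:
        cycle = max(comp_times + comm_times)
    else:
        comm = [0.0] + list(comm_times) + [0.0]   # pad the chain endpoints
        cycle = max(comm[i] + c + comm[i + 1] for i, c in enumerate(comp_times))
    return 1.0 / cycle

# Three stages on three processors, two inter-stage links (seconds per data set).
thr_overlap = bottleneck_throughput([2.0, 5.0, 3.0], [1.0, 4.0])
thr_strict = bottleneck_throughput([2.0, 5.0, 3.0], [1.0, 4.0], overlap=False)
```

On this instance the OVERLAP throughput is 1/5 (the slowest single resource) while the STRICT throughput drops to 1/10 (the middle processor must serialize both communications around its computation), which is exactly why the two execution models are treated separately.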


International Conference on Parallel and Distributed Systems | 2008

Allocating Series of Workflows on Computing Grids

Matthieu Gallet; Loris Marchal; Frédéric Vivien

In this paper, we focus on scheduling jobs on computing Grids. In our model, a Grid job is made of a large collection of input data sets, which must all be processed by the same task graph or workflow, thus resulting in a series-of-workflows problem. We are looking for a solution that is efficient with regard to both throughput and latency, while avoiding solutions requiring complex control; we thus only consider single-allocation strategies. We present an algorithm based on mixed linear programming to find an optimal allocation, for different routing policies depending on how much latitude we have over communications. Then, using simulations, we compare our allocations to reference heuristics. The results show that our algorithm almost always finds an allocation with good throughput and low latency, and that it outperforms the reference heuristics, especially under communication-intensive scenarios.
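The linear-programming step can be illustrated in relaxed form. This hypothetical sketch ignores communications and latency: minimize the period T (the inverse of the throughput) subject to processor capacity and full allocation of each task type. The cost matrix is made up, and `scipy.optimize.linprog` stands in for the paper's mixed linear program.

```python
import numpy as np
from scipy.optimize import linprog

# Per-instance processing time of task type t on processor p (made-up values).
w = np.array([[2.0, 4.0],
              [3.0, 1.0]])
n_tasks, n_procs = w.shape
n = n_tasks * n_procs             # variables x[t,p], flattened, plus T
c = np.zeros(n + 1)
c[-1] = 1.0                       # objective: minimize the period T

# Capacity: for each processor p, sum_t w[t,p] * x[t,p] - T <= 0.
A_ub = np.zeros((n_procs, n + 1))
for p in range(n_procs):
    for t in range(n_tasks):
        A_ub[p, t * n_procs + p] = w[t, p]
    A_ub[p, -1] = -1.0
b_ub = np.zeros(n_procs)

# Full allocation: for each task type t, sum_p x[t,p] = 1.
A_eq = np.zeros((n_tasks, n + 1))
for t in range(n_tasks):
    A_eq[t, t * n_procs:(t + 1) * n_procs] = 1.0
b_eq = np.ones(n_tasks)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq)
period = res.x[-1]                # optimal period; throughput is 1 / period
```

A single-allocation strategy would additionally force each x[t,p] to be 0 or 1 (one processor per task type), which is where the "mixed" in mixed linear programming comes from.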


Introduction to Scheduling, ISBN 978-1-138-11772-3, pp. 187-218 | 2017

Divisible Load Scheduling

Matthieu Gallet; Yves Robert; Frédéric Vivien


Algorithmica | 2014

Computing the Throughput of Probabilistic and Replicated Streaming Applications

Anne Benoit; Matthieu Gallet; Bruno Gaujal; Yves Robert


Archive | 2009

Algorithms - Scheduling I: On Scheduling DAGs to Maximize Area

Gennaro Cordasco; Arnold L. Rosenberg; Matthieu Gallet; Loris Marchal; Frédéric Vivien; Anne Benoit; Yves Robert; F. Vivien; Fanny Dufossé; Gregory M. Striemer; Ali Akoglu; John Paul Walters; Vidyananth Balu; Suryaprakash Kompalli; Vipin Chaudhary; Rohan Darole; Michael Boyer; David Tarjan; Scott T. Acton; Kevin Skadron


Archive | 2007

Scientific Foundations - Scheduling Strategies and Algorithm Design for Heterogeneous Platforms

Anne Benoit; Leila Ben Saad; Sékou Diakité; Alexandru Dobrila; Fanny Dufossé; Matthieu Gallet; Mathias Jacquelin; Loris Marchal; Jean-Marc Nicod; Laurent Philippe; Veronika Rehn-Sonigo; Paul Renaud-Goud; Clément Rezvoy; Yves Robert; Bernard Tourancheau; Frédéric Vivien

Collaboration


Dive into Matthieu Gallet's collaborations.

Top Co-Authors

Frédéric Vivien
École normale supérieure de Lyon

Anne Benoit
Centre national de la recherche scientifique

Veronika Rehn-Sonigo
École normale supérieure de Lyon

Yves Robert
French Institute for Research in Computer Science and Automation

Jean-Marc Nicod
Centre national de la recherche scientifique

Laurent Philippe
Centre national de la recherche scientifique

Clément Rezvoy
École Normale Supérieure