Olivier Beaumont
University of Bordeaux
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Olivier Beaumont.
international conference on parallel processing | 2011
Olivier Beaumont; Lionel Eyraud-Dubois; Young Won
Several Network Coordinate Systems have been proposed to predict unknown network distances between a large number of Internet nodes by using only a small number of measurements. These systems focus on predicting latency, and they are not adapted to the prediction of available bandwidth. But end-to-end path available bandwidth is an important metric for the performance optimisation in many high throughput distributed applications, such as video streaming and file sharing networks. In this paper, we propose to perform available bandwidth prediction with the last-mile model, in which each node is characterised by its incoming and outgoing capacities. This model has been used in several theoretical works for distributed applications. We design decentralised heuristics to compute the capacities of each node so as to minimise the prediction error. We show that our algorithms can achieve a competitive accuracy even with asymmetric and erroneous end-to-end measurement datasets. A comparison with existing models (Vivaldi, Sequoia, PathGuru, DMF) is provided. Simulation results also show that our heuristics can provide good quality predictions even when using a very small number of measurements.
international parallel and distributed processing symposium | 2013
Olivier Beaumont; Lionel Eyraud-Dubois; Hubert Larchevêque
We consider several reliability problems that arise when allocating applications to processing resources in a Cloud computing platform. More specifically, we assume on the one hand that each computing resource is associated to a capacity constraint and to a probability of failure. On the other hand, we assume that each service runs as a set of independent instances of identical Virtual Machines, and that the Service Level Agreement between the Cloud provider and the client states that a minimal number of instances of the service should run with a given probability. In this context, given the capacity and failure probabilities of the machines, and the capacity and reliability demands of the services, the question for the cloud provider is to find an allocation of the instances of the services (possibly using replication) onto machines satisfying all types of constraints during a given time period. In this paper, our goal is to assess the impact of the reliability constraint on the complexity of resource allocation problems. We consider several variants of this problem, depending on the number of services and whether their reliability demand is individual or global. We prove several fundamental complexity results (#P and NP-completeness results) and we provide several optimal and approximation algorithms. In particular, we prove that a basic randomized allocation algorithm, that is easy to implement, provides optimal or quasi-optimal results in several contexts, and we show through simulations that it also achieves very good results in more general settings.
international parallel and distributed processing symposium | 2016
Emmanuel Agullo; Olivier Beaumont; Lionel Eyraud-Dubois; Suraj Kumar
Our goal is to provide an analysis and comparison of static and dynamic strategies for task graph scheduling on platforms consisting of heterogeneous and unrelated resources, such as GPUs and CPUs. Static scheduling strategies, that have been used for years, suffer several weaknesses. First, it is well known that underlying optimization problems are NP-Complete, what limits the capability of finding optimal solutions to small cases. Second, parallelism into processing nodes makes it difficult to precisely predict the performance of both communications and computations, due to shared resources and co-scheduling effects. Recently, to cope with this limitations, many dynamic task-graph based runtime schedulers (StarPU, StarSs, QUARK, PaRSEC, ) have been proposed. Dynamic schedulers base their allocation and scheduling decisions on the one side on dynamic information such as the set of available tasks, the location of data and the state of the resources and on the other hand on static information such as task priorities computed from the whole task graph. Our analysis is deep but we concentrate on a single kernel, namely Cholesky factorization of dense matrices on platforms consisting of GPUs and CPUs. This application encompasses many important characteristics in our context. Indeed it consists in a phase where the number of available tasks if large, where the careful use of resources is critical, and in a phase with few tasks available, where the choice of the task to be executed is crucial. In this paper, we analyze the performance of static and dynamic strategies and we propose a set of intermediate strategies, by adding more static (resp. dynamic) features into dynamic (resp. static) strategies. Our conclusions are somehow unexpected in the sense that we prove that static-based strategies are very efficient, even in a context where performance estimations are not very good.
international parallel and distributed processing symposium | 2012
Olivier Beaumont; Nicolas Bonichon; Ludovic Courtès; Eelco Dolstra; Xavier Hanin
In this paper, we consider the problem of scheduling a special kind of mixed data-parallel applications arising in the context of Continuous Integration. Continuous integration (CI) is a software engineering technique, which consists in re-building and testing interdependent software components as soon as developers modify them. The CI tool is able to provide quick feedback to the developers, which allows them to fix the bug soon after it has been introduced. The CI process can be described as a DAG where nodes represent package build tasks, and edges represent dependencies among these packages, build tasks themselves can in turn be run in parallel. Thus, CI can be viewed as a mixed data-parallel application. A crucial point for a successful CI process is its ability to provide quick feedback. Thus, make span minimization is the main goal. Our contribution is twofold. First we provide and analyze a large dataset corresponding to a build DAG. Second, we compare the performance of several scheduling heuristics on this dataset.
international parallel and distributed processing symposium | 2013
Olivier Beaumont; Hubert Larchevêque; Loris Marchal
Divisible Load Theory (DLT) has received a lot of attention in the past decade. A divisible load is a perfect parallel task, that can be split arbitrarily and executed in parallel on a set of possibly heterogeneous resources. The success of DLT is strongly related to the existence of many optimal resource allocation and scheduling algorithms, what strongly differs from general scheduling theory. Moreover, recently, close relationships have been underlined between DLT, that provides a fruitful theoretical framework for scheduling jobs on heterogeneous platforms, and MapReduce, that provides a simple and efficient programming framework to deploy applications on large scale distributed platforms. The success of both have suggested to extend their framework to non-linear complexity tasks. In this paper, we show that both DLT and MapReduce are better suited to workloads with linear complexity. In particular, we prove that divisible load theory cannot directly be applied to quadratic workloads, such as it has been proposed recently. We precisely state the limits for classical DLT studies and we review and propose solutions based on a careful preparation of the dataset and clever data partitioning algorithms. In particular, through simulations, we show the possible impact of this approach on the volume of communications generated by MapReduce, in the context of Matrix Multiplication and Outer Product algorithms.
international parallel and distributed processing symposium | 2010
Olivier Beaumont; Lionel Eyraud-Dubois; Shailesh Kumar Agrawal
We consider the classical problem of broadcasting a large message at an optimal rate in a large scale distributed network under the multi-port communication model. In this context, we are interested in both building an overlay network and providing an explicit algorithm for scheduling the communications. From an optimization point of view, we aim both at maximizing the throughput (i.e., the rate at which nodes receive the message) and minimizing the degree of the participating nodes, i.e., the number of TCP connections they must handle simultaneously. The main novelties of our approach are the introduction of this degree constraint and the classification of the set of participating nodes into two parts: open nodes that stay in the open-Internet and “guarded” nodes that lie behind firewalls or NATs. Two guarded nodes cannot communicate directly, but rather need to use an open node as a gateway for transmitting a message. In the case without guarded nodes, we prove that it is possible to reach the optimal throughput, at the price of a quasi-optimal (up to a small additive increase) degree of the participating nodes. In presence of guarded nodes, our main contributions are a closed form formula for the optimal cyclic throughput and the proof that the optimal solution may require arbitrarily large degrees. In the acyclic case, we propose an algorithm that reaches the optimal throughput with low degree. Then, we prove a worst case ratio between the optimal acyclic and cyclic throughput and show through simulations that this ratio is on average very close to 1, what makes acyclic solutions efficient both in terms of throughput maximization and degree minimization.
SIROCCO '08 Proceedings of the 15th international colloquium on Structural Information and Communication Complexity | 2008
Olivier Beaumont; Nicolas Bonichon; Philippe Duchon; Hubert Larchevêque
In this paper, we consider the clustering of resources on large scale platforms. More precisely, we target parallel applications consisting of independant tasks, where each task is to be processed on a different cluster. In this context, each cluster should be large enough so as to hold and process a task, and the maximal distance between two hosts belonging to the same cluster should be small in order to minimize latencies of intra-cluster communications. This corresponds to maximum bin covering with an extra distance constraint. We describe a distributed approximation algorithm that computes resource clustering with coordinates in i¾? in O(log2n) steps and O(nlogn) messages, where nis the overall number of hosts. We prove that this algorithm provides an approximation ratio of
IEEE Transactions on Parallel and Distributed Systems | 2014
Olivier Beaumont; Nicolas Bonichon; Lionel Eyraud-Dubois; Przemyslaw Uznanski; Shailesh Kumar Agrawal
frac{1}{3}
international parallel and distributed processing symposium | 2010
Olivier Beaumont; Hejer Rejeb
.
ieee international conference on high performance computing, data, and analytics | 2010
Olivier Beaumont; Arnold L. Rosenberg
We consider the problem of broadcasting a large message in a large scale distributed network under the multi-port communication model. We are interested in building an overlay network, with the aim of maximizing the throughput and minimizing the degree of the participating nodes. We consider a classification of participating nodes into two parts: open nodes that stay in the open-Internet and guarded nodes that lie behind firewalls or NATs, with the constraint that two guarded nodes cannot communicate directly. Without guarded nodes, we prove that it is possible to reach the optimal throughput with a quasi-optimal (up to a small additive increase) degree of the participating nodes. In presence of guarded nodes, we provide a closed form formula for the optimal cyclic throughput and we observe that the optimal solution may require arbitrarily large degrees. In the acyclic case, we propose an algorithm that reaches the optimal acyclic throughput with low degree. Then, we prove a worst case 5/7 ratio between the optimal acyclic and cyclic throughput and show through simulations that this ratio is on average very close to 1, what makes acyclic solutions efficient both in terms of throughput maximization and degree minimization.