Bertrand Simon
École normale supérieure de Lyon
Publication
Featured research published by Bertrand Simon.
symposium on principles of database systems | 2016
Michael A. Bender; Jonathan W. Berry; Rob Johnson; Thomas M. Kroeger; Samuel McCauley; Cynthia A. Phillips; Bertrand Simon; Shikha Singh; David Zage
We present history-independent alternatives to a B-tree, the primary indexing data structure used in databases. A data structure is history independent (HI) if it is impossible to deduce any information by examining the bit representation of the data structure that is not already available through the API. We show how to build a history-independent cache-oblivious B-tree and a history-independent external-memory skip list. One of the main contributions is a data structure we build along the way: a history-independent packed-memory array (PMA). The PMA supports efficient range queries, one of the most important operations for answering database queries. Our HI PMA matches the asymptotic bounds of prior non-HI packed-memory arrays and sparse tables. Specifically, a PMA maintains a dynamic set of elements in sorted order in a linear-sized array. Inserts and deletes take an amortized O(log² N) element moves with high probability. Simple experiments with our implementation of HI PMAs corroborate our theoretical analysis. Comparisons to regular PMAs give preliminary indications that the practical cost of adding history independence is not too large. Our HI cache-oblivious B-tree bounds match those of prior non-HI cache-oblivious B-trees. Searches take O(log_B N) I/Os; inserts and deletes take O((log² N)/B + log_B N) amortized I/Os with high probability; and range queries returning k elements take O(log_B N + k/B) I/Os. Our HI external-memory skip list achieves optimal bounds with high probability, analogous to in-memory skip lists: O(log_B N) I/Os for point queries and amortized O(log_B N) I/Os for inserts/deletes. Range queries returning k elements run in O(log_B N + k/B) I/Os. In contrast, the best possible high-probability bounds for inserting into the folklore B-skip list, which promotes elements with probability 1/B, are just Θ(log N) I/Os. This is no better than the bounds one gets from running an in-memory skip list in external memory.
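To make the definition of history independence concrete, here is a minimal sketch contrasting two toy layouts (the class names are hypothetical and not from the paper). An append-only list exposes the order in which keys arrived, whereas a dense sorted array is canonical: its memory layout depends only on the current set. The dense array is history independent but can shift Θ(N) elements per insert, which is the cost a PMA avoids by leaving gaps; this sketch does not reproduce the paper's randomized HI PMA.

```python
import bisect


class AppendOnlyList:
    """NOT history independent: the layout records the order of operations."""

    def __init__(self):
        self.slots = []

    def insert(self, x):
        if x not in self.slots:
            self.slots.append(x)

    def delete(self, x):
        if x in self.slots:
            self.slots.remove(x)


class CanonicalSortedArray:
    """History independent: the layout is a function of the current set only."""

    def __init__(self):
        self.slots = []

    def insert(self, x):
        i = bisect.bisect_left(self.slots, x)
        if i == len(self.slots) or self.slots[i] != x:
            self.slots.insert(i, x)  # may shift many elements (no gaps)

    def delete(self, x):
        i = bisect.bisect_left(self.slots, x)
        if i < len(self.slots) and self.slots[i] == x:
            self.slots.pop(i)


def layout_after(cls, ops):
    """Apply (operation_name, key) pairs and return the resulting memory layout."""
    ds = cls()
    for op, key in ops:
        getattr(ds, op)(key)
    return list(ds.slots)


# Two histories that end with the same set {1, 2, 3}.
history_a = [("insert", 3), ("insert", 1), ("insert", 2)]
history_b = [("insert", 1), ("insert", 2), ("insert", 5), ("delete", 5), ("insert", 3)]
```

Here `layout_after(AppendOnlyList, history_a)` gives `[3, 1, 2]` while `history_b` gives `[1, 2, 3]`, so the bits reveal insertion order; the canonical array yields `[1, 2, 3]` for both histories.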
IEEE Transactions on Parallel and Distributed Systems | 2018
Loris Marchal; Bertrand Simon; Oliver Sinnen; Frédéric Vivien
Scientific workloads are often described by directed acyclic task graphs (DAGs). Indeed, DAGs serve both as a theoretical model and as the structure employed by dynamic runtime schedulers to handle HPC applications. A natural problem is then to compute a makespan-minimizing schedule of a given graph. In this paper, we are motivated by task graphs arising from multifrontal factorizations of sparse matrices and therefore work under the following practical model. Tasks are malleable (i.e., a single task can be allotted a time-varying number of processors), and their speedup is perfect up to a first threshold, then increases linearly, but not perfectly, up to a second threshold, where it levels off and remains constant. After proving the NP-hardness of minimizing the makespan of DAGs under this model, we study several heuristics. We propose model-optimized variants of PropScheduling, widely used in linear algebra application scheduling, and of FlowFlex. We also propose GreedyFilling, a novel heuristic designed for our speedup model, and we show that PropScheduling and GreedyFilling are 2-approximation algorithms. In an evaluation using synthetic data sets and task graphs arising from multifrontal factorization, the proposed optimized variants and GreedyFilling significantly outperform the traditional algorithms, with GreedyFilling performing particularly well on balanced graphs.
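The three-regime speedup model described above can be written as a piecewise-linear function. The following is a minimal sketch; the parameter names `p1` (end of perfect speedup), `p2` (saturation point) and `alpha` (slope of the imperfect regime) are our own hypothetical notation, not taken from the paper.

```python
def speedup(p, p1, p2, alpha):
    """Speedup of a malleable task on p (possibly fractional) processors.

    Assumed parameters: 0 < p1 <= p2 are the two thresholds and
    0 < alpha < 1 is the slope between them.
    """
    if not 0 < p1 <= p2:
        raise ValueError("need 0 < p1 <= p2")
    if p <= p1:
        return p                          # perfect speedup
    if p <= p2:
        return p1 + alpha * (p - p1)      # linear, but not perfect
    return p1 + alpha * (p2 - p1)         # levelled off: extra processors are wasted


def exec_time(work, p, p1, p2, alpha):
    """Time to run `work` units of sequential work on p > 0 processors."""
    return work / speedup(p, p1, p2, alpha)
```

With `p1=4`, `p2=8`, `alpha=0.5`, the speedup is 2 on two processors, 5 on six, and stays at 6 beyond eight processors. The function is continuous at both thresholds, and the flat third regime is why allotting more than `p2` processors to a single task gains nothing.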
international parallel and distributed processing symposium | 2017
Loris Marchal; Samuel McCauley; Bertrand Simon; Frédéric Vivien
Scientific applications are usually described as directed acyclic graphs, where nodes represent tasks and edges represent dependencies between tasks. For some applications, such as the multifrontal method of sparse matrix factorization, this graph is a tree: each task produces a single output, which is used by a single task (its parent in the tree). We focus on the case where the data manipulated by tasks are large, as is especially true of the multifrontal method. To process a task, both its inputs and its output must fit in main memory. Moreover, the output of each task must be stored between its production and its use by the parent task. It may therefore happen, during an execution, that not all data fit together in memory. In particular, this is the case if the total available memory is smaller than the minimum memory required to process the whole tree. In such a case, some data have to be temporarily written to disk and read back afterwards. These input/output (I/O) operations are very expensive, hence the need to minimize them. We revisit this open problem in this paper. Specifically, our goal is to minimize the total volume of I/O while processing a given task tree. We first formalize and generalize known results, then prove that existing solutions can be arbitrarily worse than optimal. Finally, we propose a novel heuristic algorithm, based on the optimal tree traversal for memory minimization. We demonstrate good performance of this new heuristic through simulations on both synthetic trees and realistic trees built from actual sparse matrices.
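The memory requirement that triggers I/O can be illustrated by computing the peak memory of a traversal. This minimal sketch assumes a simplified model (processing a task needs all its children's outputs plus its own output in memory, nothing else) and uses Liu's classic child ordering for the best postorder traversal. It is not the paper's heuristic, which builds on the optimal (not necessarily postorder) traversal; `Task` and `best_postorder_peak` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    out: float                                    # size of the task's output data
    children: list = field(default_factory=list)  # child tasks (inputs)


def best_postorder_peak(task):
    """Peak memory of the best postorder traversal of the subtree rooted at task.

    Simplified model (an assumption): processing a task needs all of its
    children's outputs plus its own output in memory.
    """
    sub = [(best_postorder_peak(c), c.out) for c in task.children]
    # Liu's rule for postorders: visit children by non-increasing (peak - output).
    sub.sort(key=lambda s: s[0] - s[1], reverse=True)
    peak, stored = 0.0, 0.0
    for child_peak, child_out in sub:
        peak = max(peak, stored + child_peak)  # earlier siblings' outputs stay resident
        stored += child_out
    return max(peak, stored + task.out)        # finally process the task itself


# Root with a deep child (output 1, whose own child outputs 10) and a leaf (output 5).
tree = Task(1, [Task(1, [Task(10)]), Task(5)])
```

For `tree`, visiting the deep subtree first gives a peak of 11, whereas the opposite order would hold the 5-unit output while that subtree peaks at 11, i.e. 16. If the available memory is below the returned peak, this particular traversal must write some data to disk.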
latin american symposium on theoretical informatics | 2016
Michael A. Bender; Rezaul Alam Chowdhury; Alexander Conway; Pramod Ganapathi; Rob Johnson; Samuel McCauley; Bertrand Simon; Shikha Singh
We revisit classical sieves for computing primes and analyze their performance in the external-memory model. Most prior sieves are analyzed in the RAM model, where the focus is on minimizing both the total number of operations and the size of the working set. The hope is that if the working set fits in RAM, then the sieve will have good I/O performance, though such an outcome is by no means guaranteed by a small working-set size.
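The working-set idea mentioned above is easiest to see in the classic segmented sieve of Eratosthenes, which keeps only the base primes up to √N and one segment in memory at a time. This is a minimal sketch of that textbook sieve, not the external-memory algorithms analyzed in the paper; `segment_size` is a hypothetical tuning parameter.

```python
import math


def small_primes(limit):
    """Plain sieve of Eratosthenes for primes <= limit (limit >= 1)."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            # Cross off p*p, p*p + p, ... in one slice assignment.
            sieve[p * p::p] = bytearray(len(range(p * p, limit + 1, p)))
    return [i for i in range(limit + 1) if sieve[i]]


def segmented_sieve(n, segment_size):
    """All primes <= n, using O(sqrt(n) + segment_size) working memory."""
    if segment_size < 1:
        raise ValueError("segment_size must be positive")
    if n < 2:
        return []
    base = small_primes(math.isqrt(n))
    primes = list(base)
    low = math.isqrt(n) + 1
    while low <= n:
        high = min(low + segment_size - 1, n)
        seg = bytearray([1]) * (high - low + 1)  # only this segment is "in RAM"
        for p in base:
            start = max(p * p, ((low + p - 1) // p) * p)  # first multiple >= low
            for m in range(start, high + 1, p):
                seg[m - low] = 0
        primes.extend(low + i for i, flag in enumerate(seg) if flag)
        low = high + 1
    return primes
```

For example, `segmented_sieve(30, 10)` returns the ten primes up to 29. Note that each segment rescans every base prime, so a small working set alone does not bound the number of block transfers, which is the point the abstract makes.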
fun with algorithms | 2016
Michael A. Bender; Samuel McCauley; Bertrand Simon; Shikha Singh; Frédéric Vivien
This paper formalizes a resource-allocation problem that is all too familiar to the seasoned program-committee member. For each submission j that the PC member has the honor of reviewing, there is a choice. The PC member can spend the time to review submission j in detail on his/her own at a cost of C_j. Alternatively, the PC member can spend the time to identify and contact peers, hoping to recruit them as subreviewers, at a cost of 1 per subreviewer. These potential subreviewers have a certain probability of rejecting each review request, and this probability increases as time goes on. Once the PC member runs out of time or unasked experts, he/she is forced to review the paper without outside assistance. This paper gives optimal solutions to several variations of the scheduling-reviewers problem. Most of the solutions in this paper are based on an iterated log function of C_j. In particular, with k rounds, the optimal solution sends the k-iterated log of C_j requests in the first round, the (k − 1)-iterated log in the second round, and so forth. One of the contributions of this paper is solving this problem exactly, even when rejection probabilities may increase. Naturally, PC members must make an integral number of subreview requests. This paper gives, as an intermediate result, a linear-time algorithm to transform the artificial problem in which one can send fractional requests into the less-artificial problem in which one sends an integral number of requests. Finally, this paper considers the case where the PC member knows nothing about the probability that a potential subreviewer agrees to review the paper. This paper gives an approximation algorithm for this case, whose bounds improve as the number of rounds increases.
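The shape of the k-round solution (k-iterated log of C_j in round 1, then the (k − 1)-iterated log, and so on) can be sketched as follows. This only illustrates the leading-order pattern stated in the abstract: the base-2 logarithm is our assumption, constants and lower-order terms are not reproduced, and the values are fractional (the paper's rounding algorithm is not shown). Function names are hypothetical.

```python
import math


def iterated_log2(x, k):
    """Apply log2 to x, k times (the k-iterated logarithm)."""
    for _ in range(k):
        if x <= 0:
            raise ValueError("iterated logarithm undefined for this cost and depth")
        x = math.log2(x)
    return x


def request_schedule(cost, rounds):
    """Fractional number of requests per round: log^(k) C, log^(k-1) C, ..., log C."""
    return [iterated_log2(cost, rounds - i) for i in range(rounds)]
```

For a review cost of 2^16 and three rounds, the schedule is `[2.0, 4.0, 16.0]`: few requests early, when acceptance is more likely, and more in later rounds.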
european conference on parallel processing | 2015
Abdou Guermouche; Loris Marchal; Bertrand Simon; Frédéric Vivien
Scientific workloads are often described by directed acyclic task graphs. This is in particular the case for multifrontal factorization of sparse matrices (the focus of this paper), whose task graph is structured as a tree of parallel tasks. Prasanna and Musicus [19, 20] advocated using the concept of malleable tasks to model parallel tasks involved in matrix computations. In this powerful model, each task is processed on a time-varying number of processors. Following Prasanna and Musicus, we consider malleable tasks whose speedup is p^α, where p is the fractional share of processors on which a task executes, and α (0 < α ≤ 1) is a task-independent parameter. First, we use experiments on multicore platforms to motivate the relevance of this model for our application. Then, we study the optimal time-minimizing allocation proposed by Prasanna and Musicus using optimal control theory. We greatly simplify their proofs by resorting only to pure scheduling arguments. Building on the insight gained from these new proofs, we extend the study to distributed (homogeneous or heterogeneous) multicore platforms. We prove the NP-completeness of the corresponding scheduling problem, and we then propose some approximation algorithms.
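As we understand the Prasanna and Musicus allocation for trees under speedup p^α, parallel subtrees can be replaced by a single task of "equivalent length" (Σ L_i^(1/α))^α, series composition simply adds lengths, and the optimal makespan on P processors is L_eq / P^α, with each child subtree receiving a processor share proportional to L_eq^(1/α). The following is a minimal sketch of that computation under those assumptions; `Node`, `equivalent_length`, `makespan` and `processor_shares` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    length: float                                 # sequential work of this task
    children: list = field(default_factory=list)  # subtrees that must finish first


def equivalent_length(node, alpha):
    """Length of a single task equivalent to the subtree, for speedup p**alpha."""
    if not node.children:
        return node.length
    # Children run concurrently: combine them as (sum L_i^(1/alpha))^alpha.
    parallel = sum(equivalent_length(c, alpha) ** (1 / alpha)
                   for c in node.children) ** alpha
    return node.length + parallel                  # then the node runs after them


def makespan(root, alpha, p):
    """Makespan of the tree on p processors under this allocation."""
    return equivalent_length(root, alpha) / p ** alpha


def processor_shares(node, alpha, p):
    """Constant share of the p processors given to each child subtree."""
    weights = [equivalent_length(c, alpha) ** (1 / alpha) for c in node.children]
    total = sum(weights)
    return [p * w / total for w in weights]


# Two unit-length tasks in parallel under a zero-length root.
fork = Node(0.0, [Node(1.0), Node(1.0)])
```

With α = 0.5 and four processors, each leaf gets two processors and finishes after 1/√2 time units; with α = 1 (perfect speedup), the makespan is simply total work divided by processors, 0.5.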
2014 Joint Rail Conference | 2014
Bertrand Simon; Brigitte Jaumard; Thai Hoa Le
Avoiding or preventing deadlocks in simulation tools for train scheduling remains a critical issue, especially when combined with the objective of minimizing, e.g., the travel times of the trains. In this paper, we revisit the deadlock avoidance and detection problem, and propose a new deadlock avoidance algorithm, called DEADAALG, based on a resource reservation mechanism. The DEADAALG algorithm is proved to be exact, i.e., it either detects an unavoidable deadlock resulting from the input data or provides a deadlock-free train schedule, computed by the scheduling algorithm SIMTRAS. Moreover, we show that SIMTRAS is a polynomial-time algorithm with O(|S| × |T|² log |T|) time complexity, where T is the set of trains and S is the set of sections in the railway topology. Numerical experiments are conducted on the Vancouver-Calgary single-track corridor of Canadian Pacific. We then show that SIMTRAS is very efficient and provides schedules whose quality is comparable to that of an exact optimization algorithm, in tens of seconds for up to 30 trains/day over a planning period of 60 days.
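The abstract does not detail DEADAALG's reservation mechanism, so as a generic illustration of what a deadlock means on a single-track line, here is a minimal wait-for-graph check: each train waits for the train occupying the section it wants next, and a cycle means no train can ever move. This is a standard cycle test, not DEADAALG or SIMTRAS; `find_deadlock` and its dictionary inputs are hypothetical.

```python
def find_deadlock(occupant, wants):
    """Detect a circular wait among trains.

    occupant: section -> train currently holding it
    wants:    train -> section it needs next
    Returns the list of trains forming a wait cycle, or None.
    """
    # Each train waits for at most one other train (the holder of its next section).
    waits_for = {}
    for train, section in wants.items():
        holder = occupant.get(section)
        if holder is not None and holder != train:
            waits_for[train] = holder
    for start in waits_for:
        path = []
        t = start
        while t in waits_for and t not in path:  # follow the chain of waits
            path.append(t)
            t = waits_for[t]
        if t in path:                            # came back to a train on the path
            return path[path.index(t):]
    return None
```

For two opposing trains, `find_deadlock({"s1": "A", "s2": "B"}, {"A": "s2", "B": "s1"})` returns `["A", "B"]`. A reservation-based avoidance scheme aims to never admit a train into a section that could lead to such a cycle, rather than detecting it afterwards.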
international parallel and distributed processing symposium | 2018
Loris Marchal; Hanna Nagy; Bertrand Simon; Frédéric Vivien
Archive | 2018
Bertrand Simon
Archive | 2018
Louis-Claude Canon; Loris Marchal; Bertrand Simon; Frédéric Vivien