Publication


Featured research published by Henri Casanova.


International Symposium on Parallel and Distributed Computing | 2007

A Comparison of Scheduling Approaches for Mixed-Parallel Applications on Heterogeneous Platforms

Tchimou N'Takpé; Frédéric Suter; Henri Casanova

Mixed-parallel applications can take advantage of large-scale computing platforms, but scheduling them efficiently on such platforms is challenging. In this paper we compare the two main approaches proposed for solving this scheduling problem on a heterogeneous collection of homogeneous clusters. We first modify previously proposed algorithms for both approaches and show that our modifications lead to significant improvements. We then compare the modified algorithms in simulation over a wide range of application and platform conditions. We find that although both approaches have advantages, one of them is likely the more appropriate choice for the majority of users.


International Symposium on Computer Architecture | 2012

A case for random shortcut topologies for HPC interconnects

Michihiro Koibuchi; Hiroki Matsutani; Hideharu Amano; D. Frank Hsu; Henri Casanova

As the scales of parallel applications and platforms increase, the negative impact of communication latencies on performance becomes large. Fortunately, modern High Performance Computing (HPC) systems can exploit low-latency topologies of high-radix switches. In this context, we propose the use of random shortcut topologies, which are generated by augmenting classical topologies with random links. Using graph analysis we find that these topologies, when compared to non-random topologies of the same degree, lead to drastically reduced diameter and average shortest path length. The best results are obtained when adding random links to a ring topology, meaning that good random shortcut topologies can easily be generated for arbitrary numbers of switches. Using flit-level discrete-event simulation we find that random shortcut topologies achieve throughput comparable to, and latency lower than, that of existing non-random topologies such as hypercubes and tori. Finally, we discuss and quantify practical challenges for random shortcut topologies, including routing scalability and longer physical cable lengths.
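
The core idea, augmenting a base topology with random links and measuring the effect on diameter and average shortest path length (ASPL), can be reproduced at toy scale with a short graph-analysis sketch. This is an illustrative reconstruction, not the authors' code; the switch count, shortcut count, and seed are arbitrary assumptions.

```python
# Illustrative sketch (not the paper's generator): augment a ring of switches
# with random shortcut links and compare diameter / ASPL against the plain ring.
import random
import networkx as nx

def random_shortcut_ring(num_switches, num_shortcuts, seed=0):
    rng = random.Random(seed)
    g = nx.cycle_graph(num_switches)          # base ring topology
    added = 0
    while added < num_shortcuts:
        u, v = rng.sample(range(num_switches), 2)
        if not g.has_edge(u, v):              # keep the graph simple
            g.add_edge(u, v)
            added += 1
    return g

ring = nx.cycle_graph(64)
shortcut = random_shortcut_ring(64, 64)

for name, g in [("ring", ring), ("ring + random shortcuts", shortcut)]:
    print(name,
          "diameter =", nx.diameter(g),
          "ASPL = %.2f" % nx.average_shortest_path_length(g))
```

On this toy example the shortcut variant typically cuts the 64-switch ring's diameter from 32 to well under 10, which is the small-world effect the paper quantifies at much larger scales with flit-level simulation.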


IEEE Transactions on Parallel and Distributed Systems | 2009

Scheduling Parallel Task Graphs on (Almost) Homogeneous Multicluster Platforms

Pierre-François Dutot; Tchimou N'Takpé; Frédéric Suter; Henri Casanova

Applications structured as parallel task graphs exhibit both data and task parallelism and arise in many domains. Scheduling these applications efficiently on parallel platforms has been a long-standing challenge. In the case of a single homogeneous platform, such as a cluster, results have been obtained both in theory (guaranteed algorithms) and in practice (pragmatic heuristics). Due to task parallelism, these applications are well suited for execution on distributed platforms that span multiple clusters, possibly in multiple institutions. However, the only available results in this context are non-guaranteed heuristics. In this paper, we develop a scheduling algorithm, MCGAS, which is applicable to multicluster platforms that are almost homogeneous. Such platforms are often found as large subsets of multicluster platforms. Our novel contribution is that MCGAS computes task allocations so that a (tunable) performance guarantee is provided. Since a performance guarantee does not necessarily imply good average performance in practice, we also compare MCGAS with a recently proposed non-guaranteed algorithm. Using simulation over a wide range of experimental scenarios, we find that MCGAS leads to better average application makespans than its competitor.


Cluster Computing and the Grid | 2006

Scalable Grid Application Scheduling via Decoupled Resource Selection and Scheduling

Yang Zhang; Anirban Mandal; Henri Casanova; Andrew A. Chien; Yang-Suk Kee; Ken Kennedy; Charles Koelbel

Over the past years grid infrastructures have been deployed at larger and larger scales, with envisioned deployments incorporating tens of thousands of resources. As a result, application scheduling algorithms, although polynomial, can become unscalable and thus unusable in large-scale environments. One reason for this lack of scalability is that these algorithms perform implicit resource selection. One can achieve better scalability by performing explicit resource selection independently from scheduling, in a decoupled approach. Furthermore, we hypothesize that by selecting resources judiciously one can achieve performance similar to or even better than that of the non-decoupled approach, which we call the one-step approach. Leveraging the Virtual Grid abstraction, we demonstrate that the decoupled approach is indeed both scalable and effective in large-scale and highly heterogeneous resource environments.
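
The decoupling described here, selecting a small resource subset first and scheduling only within it, can be sketched as follows. All names (Resource, Task, select_resources, greedy_schedule) and the selection criterion are hypothetical; the paper's step 1 uses the Virtual Grid selection abstraction rather than a simple "k fastest" filter.

```python
# Illustrative sketch of decoupled resource selection and scheduling.
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    speed: float            # relative compute speed
    available_at: float = 0.0

@dataclass
class Task:
    name: str
    work: float              # computation amount

def select_resources(resources, k):
    """Step 1: explicit resource selection -- keep only the k fastest."""
    return sorted(resources, key=lambda r: r.speed, reverse=True)[:k]

def greedy_schedule(tasks, resources):
    """Step 2: schedule over the (small) selected set only."""
    schedule = {}
    for task in sorted(tasks, key=lambda t: t.work, reverse=True):
        res = min(resources, key=lambda r: r.available_at + task.work / r.speed)
        start = res.available_at
        res.available_at = start + task.work / res.speed
        schedule[task.name] = (res.name, start)
    return schedule

resources = [Resource(f"r{i}", speed=1.0 + (i % 5)) for i in range(1000)]
tasks = [Task(f"t{i}", work=10.0 * (1 + i % 3)) for i in range(50)]
selected = select_resources(resources, k=16)   # scheduling cost now scales with k, not 1000
schedule = greedy_schedule(tasks, selected)
print(len(schedule), "tasks scheduled on", len(selected), "resources")
```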


International Parallel and Distributed Processing Symposium | 2006

Using virtual grids to simplify application scheduling

Richard Y. Huang; Henri Casanova; Andrew A. Chien

Users and developers of grid applications have access to increasing numbers of resources. While more resources generally mean higher capabilities for an application, they also raise the issue of application scheduling scalability. First, even polynomial-time scheduling heuristics may take a prohibitively long time to compute a schedule. Second, and perhaps more critically, it may not be possible to gather all the resource information needed by a scheduling algorithm in a scalable manner. Our application focus is scientific workflows, which can be represented as directed acyclic graphs (DAGs). Our claim is that, in future resource-rich environments, simple scheduling algorithms may be sufficient to achieve good workflow performance. We introduce a scalable scheduling approach that uses a resource abstraction called a virtual grid (VG). Our simulations of a range of typical DAG structures and resources demonstrate that a simple greedy scheduling heuristic combined with the virtual grid abstraction is as effective as, and more scalable than, more complex heuristic DAG scheduling algorithms on large-scale platforms.
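
A minimal version of the kind of greedy DAG scheduler evaluated here could look like the sketch below. It is not the authors' implementation: the task graph, work amounts, and resource speeds are made up, and the paper couples such a heuristic with a virtual grid (VG) resource abstraction rather than a flat resource dictionary.

```python
# Minimal greedy list scheduler for a DAG: once a task's predecessors are
# done, place it on the resource that finishes it earliest.
def greedy_dag_schedule(deps, work, speeds):
    """deps: task -> list of predecessor tasks
       work: task -> computation amount
       speeds: resource -> relative speed"""
    finish = {}                          # task -> finish time
    free_at = {r: 0.0 for r in speeds}   # resource -> time it becomes free
    remaining = {t: len(p) for t, p in deps.items()}
    ready = [t for t, n in remaining.items() if n == 0]
    schedule = {}
    while ready:
        task = ready.pop(0)
        ready_time = max([finish[p] for p in deps[task]], default=0.0)
        # pick the resource with the earliest finish time for this task
        res = min(speeds, key=lambda r: max(free_at[r], ready_time) + work[task] / speeds[r])
        start = max(free_at[res], ready_time)
        finish[task] = start + work[task] / speeds[res]
        free_at[res] = finish[task]
        schedule[task] = (res, start)
        for t, preds in deps.items():    # release newly ready successors
            if task in preds:
                remaining[t] -= 1
                if remaining[t] == 0:
                    ready.append(t)
    return schedule

deps = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
work = {"a": 4.0, "b": 2.0, "c": 3.0, "d": 1.0}
speeds = {"fast": 2.0, "slow": 1.0}
print(greedy_dag_schedule(deps, work, speeds))
```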


High Performance Distributed Computing | 2006

On the Harmfulness of Redundant Batch Requests

Henri Casanova

Most parallel computing resources are controlled by batch schedulers that place requests for computation in a queue until access to compute nodes is granted. Queue waiting times are notoriously hard to predict, making it difficult for users not only to estimate when their applications may start, but also to pick, among multiple batch-scheduled resources, the one that will produce the shortest turnaround time. As a result, an increasing number of users resort to redundant requests: several requests are simultaneously submitted to multiple batch schedulers on behalf of a single job; once one of these requests is granted access to compute nodes, the others are canceled. Using simulation as well as experiments with a production batch scheduler, we investigate whether redundant requests are harmful in terms of (i) schedule performance and fairness, (ii) system load, and (iii) system predictability. We find that the two main issues with redundant requests are load on the middleware and unfairness towards users who do not use redundant requests, both of which depend on the number of users who use redundant requests and on the amount of request redundancy these users employ.
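
The mechanism under study, submitting the same job to several queues and canceling the losers, can be mimicked with a toy model. This is purely illustrative: the exponential wait-time distribution and one-hour mean are assumptions, and it captures only the individual gain and the k-fold request load, not the fairness effects the paper measures with a production batch scheduler.

```python
# Toy model (not the paper's simulator): each site's queue wait is a random
# draw; a redundant user submits to k sites, starts at the minimum wait, and
# cancels the rest.
import random

random.seed(1)

def queue_wait():
    return random.expovariate(1.0 / 3600.0)    # assumed mean wait: one hour

def turnaround(redundancy):
    waits = [queue_wait() for _ in range(redundancy)]
    return min(waits)

for k in (1, 2, 4, 8):
    samples = [turnaround(k) for _ in range(10000)]
    mean_wait = sum(samples) / len(samples)
    print(f"redundancy {k}: mean wait {mean_wait/60:.1f} min, {k}x requests per job")
```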


High-Performance Computer Architecture | 2013

Layout-conscious random topologies for HPC off-chip interconnects

Michihiro Koibuchi; Ikki Fujiwara; Hiroki Matsutani; Henri Casanova

As the scales of parallel applications and platforms increase, the negative impact of communication latencies on performance becomes large. Random network topologies can be used to achieve low hop counts between nodes and thus low latency. However, random topologies lead to increased aggregate cable length and cable packaging complexity on a machine room floor. In this work we propose two new methods for generating random topologies and their physical layout on a floorplan: randomize links after optimizing the physical layout, or optimize the layout after randomizing links. The first method randomly swaps link endpoints in a given non-random topology for which a good physical layout is known. The resulting topology has the same cable length and cable packaging as the original topology, but achieves lower communication latency. The second method creates a random topology whose random links are picked so that they do not lead to long physical cables, and then solves a constrained optimization problem to compute a physical layout that minimizes aggregate cable length. We quantitatively compare these two methods using both graph analysis and cycle-accurate network simulation, including comparisons with previously proposed random topologies and non-random topologies.
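
A rough sketch of the first idea only: start from a non-random topology with a known layout and randomly rewire link endpoints while preserving switch degrees. The torus size, swap count, and use of a generic degree-preserving swap are assumptions; in particular, this ignores the cable-length constraint the paper enforces when choosing which endpoints may be swapped.

```python
# Illustrative only: randomize links of a 2-D torus by degree-preserving
# endpoint swaps and compare diameter / ASPL with the original topology.
import networkx as nx

base = nx.grid_2d_graph(8, 8, periodic=True)        # 2-D torus of 64 switches
randomized = base.copy()
nx.connected_double_edge_swap(randomized, nswap=100, seed=0)

for name, g in [("torus", base), ("torus + endpoint swaps", randomized)]:
    print(name, "diameter =", nx.diameter(g),
          "ASPL = %.2f" % nx.average_shortest_path_length(g))
```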


IEEE Transactions on Parallel and Distributed Systems | 2012

Dynamic Fractional Resource Scheduling versus Batch Scheduling

Mark Stillwell; Frédéric Vivien; Henri Casanova

We propose a novel job scheduling approach for homogeneous cluster computing platforms. Its key feature is the use of virtual machine technology to share fractional node resources in a precise and controlled manner. Other VM-based scheduling approaches have focused primarily on technical issues or extensions to existing batch scheduling systems, while we take a more aggressive approach and seek to find heuristics that maximize an objective metric correlated with job performance. We derive absolute performance bounds and develop algorithms for the online nonclairvoyant version of our scheduling problem. We further evaluate these algorithms in simulation against both synthetic and real-world HPC workloads and compare our algorithms to standard batch scheduling approaches. We find that our approach improves over batch scheduling by orders of magnitude in terms of job stretch, while leading to comparable or better resource utilization. Our results demonstrate that virtualization technology coupled with lightweight online scheduling strategies can afford dramatic improvements in performance for executing HPC workloads.
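
As a rough illustration of what sharing fractional node resources means, the sketch below allocates CPU fractions on a single oversubscribed node so that every job receives the same proportion of what it requested. The job data, the single-node setting, and the equal-proportion rule are assumptions made for illustration; the paper's objective metric and heuristics are more involved.

```python
# Toy single-node example of fractional resource sharing (illustrative only).
def fractional_allocations(requests, capacity=1.0):
    total = sum(requests.values())
    scale = min(1.0, capacity / total)        # 1.0 means no contention
    return {job: need * scale for job, need in requests.items()}

requests = {"job1": 0.6, "job2": 0.5, "job3": 0.3}   # sums to 1.4 > capacity
for job, frac in fractional_allocations(requests).items():
    print(f"{job}: requested {requests[job]:.2f}, granted {frac:.2f}")
```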


IEEE International Conference on High Performance Computing, Data and Analytics | 2014

Using group replication for resilience on exascale systems

Marin Bougeret; Henri Casanova; Yves Robert; Frédéric Vivien; Dounia Zaidouni

High performance computing applications must be resilient to faults. The traditional fault-tolerance solution is checkpoint-recovery, by which application state is saved to and recovered from secondary storage throughout execution. It has been shown that, even when using an optimal checkpointing strategy, the checkpointing overhead precludes high parallel efficiency at large scale. Additional fault-tolerance mechanisms must thus be used. One such mechanism is replication, that is, multiple processors performing the same computation so that a processor failure does not necessarily imply an application failure. Despite the resource waste it entails, replication can lead to higher parallel efficiency at large scale when compared to using only checkpoint-recovery. We propose to execute and checkpoint multiple application instances concurrently, an approach we term group replication. For exponentially distributed failures we give an upper bound on the expected application execution time. This bound corresponds to a particular checkpointing period that we derive. For general failures, we propose a dynamic programming algorithm to determine non-periodic checkpoint dates as well as an empirical periodic checkpointing solution whose period is found via a numerical search. Using simulation we evaluate our proposed approaches, including comparison to the non-replication case, for both exponential and Weibull failure distributions. Our broad finding is that group replication is useful in a range of realistic application and checkpointing overhead scenarios for future exascale platforms.
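
For context on why checkpointing alone breaks down at scale: in the non-replicated case the classical first-order approximation for the checkpointing period is Young's formula, T = sqrt(2 C M) for checkpoint cost C and platform MTBF M. The snippet below evaluates it for a few node counts; the per-node MTBF and checkpoint cost are assumed values, and this is the textbook approximation, not the bound or the period derived in the paper for group replication.

```python
# Classical first-order checkpointing period (Young's approximation), shown
# only to illustrate how the period shrinks as the platform MTBF drops at
# scale. The paper derives its own period for group replication.
import math

def young_period(checkpoint_cost_s, platform_mtbf_s):
    return math.sqrt(2.0 * checkpoint_cost_s * platform_mtbf_s)

node_mtbf_s = 10 * 365 * 24 * 3600         # assumed: 10 years per node
checkpoint_cost_s = 600                    # assumed: 10-minute checkpoint
for nodes in (10_000, 100_000, 1_000_000):
    platform_mtbf_s = node_mtbf_s / nodes  # i.i.d. exponential node failures
    period = young_period(checkpoint_cost_s, platform_mtbf_s)
    print(f"{nodes:>9} nodes: platform MTBF {platform_mtbf_s/3600:.2f} h, "
          f"period {period/3600:.2f} h")
```

At a million nodes the computed period exceeds the platform MTBF itself, which is the regime where additional mechanisms such as replication become attractive.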


Parallel and Distributed Computing: Applications and Technologies | 2012

Cabinet Layout Optimization of Supercomputer Topologies for Shorter Cable Length

Ikki Fujiwara; Michihiro Koibuchi; Henri Casanova

As the scale of supercomputers increases, total cable length becomes enormous, e.g., up to thousands of kilometers. Recent high-radix switches with dozens of ports make switch layout and system packaging more complex. In this work, we study the optimization of the physical layout of switch topologies on a machine room floor with the goal of reducing cable length. For a given topology, we use graph clustering algorithms to group switches logically into cabinets so that the number of inter-cabinet cables is small. We then map the cabinets onto the physical floor space so as to minimize total cable length, by modeling and solving the mapping problem as a facility location problem. Our evaluation results show that, when compared to standard clustering/mapping approaches and for popular network topologies, our clustering approach can reduce the number of inter-cabinet cables by up to 40.3% and our mapping approach can reduce the inter-rack cable length by up to 39.6%.
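
The first step described here, grouping switches into cabinets so that few cables cross cabinet boundaries, can be approximated with an off-the-shelf graph clustering. The sketch below uses a generic modularity-based clustering on an assumed hypercube topology as a stand-in for the paper's algorithms; it ignores cabinet capacity limits and does not show the facility-location step that places cabinets on the floor.

```python
# Illustrative sketch of the clustering step only: partition a switch graph
# into groups and count how many cables cross group boundaries.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

topology = nx.hypercube_graph(6)             # 64 switches, 192 cables
cabinets = greedy_modularity_communities(topology)

cabinet_of = {}
for i, group in enumerate(cabinets):
    for switch in group:
        cabinet_of[switch] = i

inter_cabinet = sum(1 for u, v in topology.edges()
                    if cabinet_of[u] != cabinet_of[v])
print(f"{len(cabinets)} cabinets, "
      f"{inter_cabinet}/{topology.number_of_edges()} inter-cabinet cables")
```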

Collaboration


Dive into Henri Casanova's collaborations.

Top Co-Authors

Michihiro Koibuchi (National Institute of Informatics)
Frédéric Vivien (École normale supérieure de Lyon)
Ikki Fujiwara (National Institute of Informatics)
Yves Robert (École normale supérieure de Lyon)
Julien Herrmann (École normale supérieure de Lyon)