Vitus J. Leung
Sandia National Laboratories
Publications
Featured research published by Vitus J. Leung.
international conference on cluster computing | 2002
Vitus J. Leung; Esther M. Arkin; Michael A. Bender; David P. Bunde; Jeanette Johnston; Alok Lal; Joseph S. B. Mitchell; Cynthia A. Phillips; Steven S. Seiden
The Computational Plant, or Cplant, is a commodity-based supercomputer under development at Sandia National Laboratories. This paper describes resource-allocation strategies to achieve processor locality for parallel jobs in Cplant and other supercomputers. Users of Cplant and other Sandia supercomputers submit parallel jobs to a job queue. When a job is scheduled to run, it is assigned to a set of processors. To obtain maximum throughput, jobs should be allocated to localized clusters of processors to minimize communication costs and to avoid bandwidth contention caused by overlapping jobs. This paper introduces new allocation strategies and performance metrics based on space-filling curves and one-dimensional allocation strategies. These algorithms are general and simple. Preliminary simulations and Cplant experiments indicate that both space-filling curves and one-dimensional packing improve processor locality compared to the sorted free-list strategy previously used on Cplant. These new allocation strategies are implemented in the new release of the Cplant System Software, Version 2.0, which was phased into the Cplant systems at Sandia by May 2002.
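A minimal sketch of the space-filling-curve idea described in this abstract, assuming a 2D mesh whose side length is a power of two (an illustration, not the Cplant implementation): free processors are ordered by their Hilbert-curve index, and a job is placed on the window of free processors whose indices span the smallest range.

def xy2d(n, x, y):
    """Map mesh coordinates (x, y) to their index along a Hilbert curve.
    n is the side length of the mesh and must be a power of two."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the curve remains contiguous.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def allocate(free_procs, k, n):
    """Choose k free processors whose Hilbert indices span the smallest window,
    a simple stand-in for one-dimensional packing along the curve."""
    if k > len(free_procs):
        return None  # not enough free processors for this job
    indexed = sorted((xy2d(n, x, y), (x, y)) for x, y in free_procs)
    best_window, best_span = None, None
    for i in range(len(indexed) - k + 1):
        span = indexed[i + k - 1][0] - indexed[i][0]
        if best_span is None or span < best_span:
            best_span = span
            best_window = [coord for _, coord in indexed[i:i + k]]
    return best_window


# Example: an 8x8 mesh where roughly two thirds of the processors are free.
free = [(x, y) for x in range(8) for y in range(8) if (x + y) % 3 != 0]
print(allocate(free, 5, 8))

Ordering the mesh along the curve reduces the 2D locality problem to a 1D one, which is what allows one-dimensional packing heuristics to be reused, as the abstract describes.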
ieee international conference on high performance computing data and analytics | 2014
James A. Ang; Richard F. Barrett; R.E. Benner; D. Burke; Cy P. Chan; Jeanine Cook; David Donofrio; Simon D. Hammond; Karl Scott Hemmert; Suzanne M. Kelly; H. Le; Vitus J. Leung; David Resnick; Arun Rodrigues; John Shalf; Dylan T. Stark; Didem Unat; Nicholas J. Wright
To achieve exascale computing, fundamental hardware architectures must change. This will significantly impact scientific applications that run on current high performance computing (HPC) systems, many of which codify years of scientific domain knowledge and refinements for contemporary computer systems. To adapt to exascale architectures, developers must be able to reason about new hardware and determine what programming models and algorithms will provide the best blend of performance and energy efficiency in the future. An abstract machine model is designed to expose to application developers and system software only those aspects of the machine that are important or relevant to performance and code structure. These models are intended as communication aids between application developers and hardware architects during the co-design process. A proxy architecture is a parameterized version of an abstract machine model, with parameters added to elucidate potential speeds and capacities of key hardware components. These more detailed architectural models enable discussion between the developers of analytic models and simulators and computer hardware architects, and they support application performance analysis, system software development, and the identification of hardware optimization opportunities. In this paper, we present a set of abstract machine models and show how they might be used to help software developers prepare for exascale. We then apply parameters to one of these models to demonstrate how a proxy architecture can enable a more concrete exploration of how well application codes map onto future architectures.
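As a rough, purely illustrative companion to this abstract (the parameter names and numbers below are hypothetical, not taken from the paper), a proxy architecture can be viewed as an abstract machine model plus concrete parameters, which is already enough for back-of-the-envelope performance reasoning.

from dataclasses import dataclass

@dataclass
class ProxyNode:
    """A parameterized single-node model: an abstract machine model plus
    illustrative numbers (all values here are made up for the example)."""
    cores: int = 64                  # lightweight cores per node
    flops_per_core: float = 16e9     # peak FLOP/s per core
    dram_bandwidth: float = 200e9    # bytes/s to main memory
    dram_capacity: float = 64e9      # bytes of main memory

    def time_estimate(self, flops, bytes_moved):
        """Crude roofline-style bound: a kernel is limited either by
        compute throughput or by memory bandwidth."""
        compute_time = flops / (self.cores * self.flops_per_core)
        memory_time = bytes_moved / self.dram_bandwidth
        return max(compute_time, memory_time)

# Example: a stream-like kernel moving 8 GB while doing 1 GFLOP is clearly
# bandwidth-bound on this hypothetical node.
node = ProxyNode()
print(node.time_estimate(flops=1e9, bytes_moved=8e9))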
Algorithmica | 2008
Michael A. Bender; David P. Bunde; Erik D. Demaine; Sándor P. Fekete; Vitus J. Leung; Henk Meijer; Cynthia A. Phillips
We give processor-allocation algorithms for grid architectures, where the objective is to select processors from a set of available processors to minimize the average number of communication hops. The associated clustering problem is as follows: Given n points in ℜ^d, find a size-k subset with minimum average pairwise L1 distance. We present a natural approximation algorithm and show that it is a $\frac{7}{4}$-approximation for two-dimensional grids; in d dimensions, the approximation guarantee is $2-\frac{1}{2d}$, which is tight. We also give a polynomial-time approximation scheme (PTAS) for constant dimension d, and we report on experimental results.
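One natural heuristic for the clustering problem stated above is sketched below; it is consistent with the abstract's description but should not be read as the paper's exact algorithm or analysis. For every input point, form the candidate cluster consisting of that point and its k-1 nearest neighbors under the L1 metric, and keep the candidate with the smallest average pairwise distance.

from itertools import combinations

def l1(p, q):
    """L1 (Manhattan) distance between two points in R^d."""
    return sum(abs(a - b) for a, b in zip(p, q))

def avg_pairwise_l1(cluster):
    """Average pairwise L1 distance of a cluster (requires at least 2 points)."""
    pairs = list(combinations(cluster, 2))
    return sum(l1(p, q) for p, q in pairs) / len(pairs)

def best_ball_cluster(points, k):
    """For each point, take it plus its k-1 nearest neighbors (L1) and
    return the candidate cluster with the smallest average pairwise distance."""
    best = None
    for center in points:
        cluster = sorted(points, key=lambda p: l1(center, p))[:k]
        cost = avg_pairwise_l1(cluster)
        if best is None or cost < best[0]:
            best = (cost, cluster)
    return best[1]

# Example on a few 2D points; k must be at least 2.
pts = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6), (2, 2)]
print(best_ball_cluster(pts, 3))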
ACM Transactions on Mathematical Software | 2010
Carl Ollivier-Gooch; Lori Freitag Diachin; Mark S. Shephard; Timothy J. Tautges; Jason A. Kraftcheck; Vitus J. Leung; Xiaojuan Luo; Mark C. Miller
Eighth Annual Water Distribution Systems Analysis Symposium (WDSA) | 2008
Jonathan W. Berry; Robert D. Carr; William Eugene Hart; Vitus J. Leung; Cindy A. Phillips; Jean-Paul Watson
international parallel and distributed processing symposium | 2004
David P. Bunde; Vitus J. Leung; Jens Mache
Motivated by observations about job runtimes on the Cplant system, we use a trace-driven microsimulator to begin characterizing the performance of different classes of allocation algorithms on jobs with different communication patterns in space-shared parallel systems with mesh topology. We show that relative performance varies considerably with communication pattern. The paging strategy using the Hilbert space-filling curve and the best-fit heuristic performed best across several communication patterns.
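For a sense of what a best-fit heuristic can look like once processors are ordered along a curve, here is a hedged sketch (an illustration of the heuristic class, not the simulator's code): group the free processors into maximal runs of consecutive curve indices and place the job in the smallest run that holds it, falling back to the largest runs when none does.

def runs_of_free(free_indices):
    """Group sorted curve indices of free processors into maximal consecutive runs."""
    runs, current = [], [free_indices[0]]
    for idx in free_indices[1:]:
        if idx == current[-1] + 1:
            current.append(idx)
        else:
            runs.append(current)
            current = [idx]
    runs.append(current)
    return runs


def best_fit(free_indices, k):
    """Return k curve indices for a job: the smallest free run that fits it,
    or pieces of the largest runs when no single run is big enough."""
    if len(free_indices) < k:
        return None  # not enough free processors
    runs = runs_of_free(sorted(free_indices))
    fitting = [run for run in runs if len(run) >= k]
    if fitting:
        return min(fitting, key=len)[:k]
    chosen = []
    for run in sorted(runs, key=len, reverse=True):
        chosen.extend(run)
        if len(chosen) >= k:
            return chosen[:k]


# Examples: free processors sit at curve positions with gaps.
print(best_fit([0, 1, 2, 5, 6, 7, 8, 9, 12], 3))   # -> [0, 1, 2]
print(best_fit([0, 1, 5, 6, 12], 4))               # -> pieces of the two largest runs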
workshop on algorithms and data structures | 2005
Michael A. Bender; David P. Bunde; Erik D. Demaine; Sándor P. Fekete; Vitus J. Leung; Henk Meijer; Cynthia A. Phillips
Journal of Scheduling | 2003
Sandy Irani; Vitus J. Leung
In this paper, we consider the on-line scheduling of jobs that may be competing for mutually exclusive resources. We model the conflicts between jobs with a conflict graph, so that the set of all concurrently running jobs must form an independent set in the graph. This model is natural and general enough to have applications in a variety of settings; however, we are motivated by the following two specific applications: traffic intersection control and session scheduling in high-speed local area networks with spatial reuse. Our results focus on two special classes of graphs motivated by our applications: bipartite graphs and interval graphs. The cost function we use is maximum response time. In all of the upper bounds, we devise algorithms which maintain a set of invariants which bound the accumulation of jobs on cliques (in the case of bipartite graphs, edges) in the graph. The lower bounds show that the invariants maintained by the algorithms are tight to within a constant factor. For a specific graph which arises in the traffic intersection control problem, we show a simple algorithm which achieves the optimal competitive ratio.
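The scheduling model above can be made concrete with a small simulation sketch. The greedy round-based scheduler below is for illustration only; the paper's algorithms maintain carefully chosen invariants and come with competitive-ratio guarantees that this sketch does not.

def greedy_independent_batch(pending, conflicts):
    """Pick a maximal conflict-free set of jobs, oldest first. `conflicts`
    maps each job id to the set of job ids it cannot run alongside."""
    batch = []
    for job in pending:
        ok = all(other not in conflicts.get(job, set())
                 and job not in conflicts.get(other, set())
                 for other in batch)
        if ok:
            batch.append(job)
    return batch


def run_schedule(arrivals, conflicts):
    """Round-based simulation: each round, run one conflict-free batch of the
    currently pending jobs and record the round in which every job finishes."""
    pending, finish_round, rounds = [], {}, 0
    for jobs in arrivals:
        pending.extend(jobs)
        rounds += 1
        for job in greedy_independent_batch(pending, conflicts):
            finish_round[job] = rounds
            pending.remove(job)
    while pending:  # drain jobs still waiting after the last arrival
        rounds += 1
        for job in greedy_independent_batch(pending, conflicts):
            finish_round[job] = rounds
            pending.remove(job)
    return finish_round


# Example: jobs A and B conflict (two crossing traffic streams); C and D do not.
conflicts = {"A": {"B"}, "B": {"A"}, "C": set()}
print(run_schedule([["A", "B", "C"], ["D"]], conflicts))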
job scheduling strategies for parallel processing | 2009
Ojaswirajanya Thebe; David P. Bunde; Vitus J. Leung
Archive | 2014
Adrian Tate; Amir Kamil; Anshu Dubey; Armin Groblinger; Brad Chamberlain; Brice Goglin; Harold C. Edwards; Chris J. Newburn; David Padua; Didem Unat; Emmanuel Jeannot; Frank Hannig; Gysi Tobias; Hatem Ltaief; James C. Sexton; Jesús Labarta; John Shalf; Karl Fuerlinger; Kathryn O'Brien; Leonidas Linardakis; Maciej Besta; Marie-Christine Sawley; Mark James Abraham; Mauro Bianco; Miquel Pericàs; Naoya Maruyama; Paul H. J. Kelly; Peter Messmer; Robert B. Ross; Romain Ciedat
The goal of the workshop and this report is to identify common themes and standardize concepts for locality-preserving abstractions for exascale programming models. Current software tools are built on the premise that computing is the most expensive component; we are rapidly moving to an era in which computing is cheap and massively parallel while data movement dominates energy and performance costs. In order to respond to exascale systems (the next generation of high performance computing systems), the scientific computing community needs to refactor its applications to align with the emerging data-centric paradigm. Applications must be evolved to express information about data locality. Unfortunately, current programming environments offer few ways to do so. They ignore the cost of communication and simply rely on hardware cache coherency to virtualize data movement. With the increasing importance of task-level parallelism on future systems, task models have to support constructs that express data locality and affinity. At the system level, communication libraries implicitly assume that all processing elements are equidistant from each other. In order to take advantage of emerging technologies, application developers need a set of programming abstractions to describe data locality for the new computing ecosystem. The new programming paradigm should be more data-centric and should allow developers to describe how to decompose and how to lay out data in memory. Fortunately, there are many emerging concepts, such as constructs for tiling, data layout, array views, task and thread affinity, and topology-aware communication libraries, for managing data locality. There is an opportunity to identify commonalities in strategy that enable us to combine the best of these concepts into a comprehensive approach to expressing and managing data locality in exascale programming systems. These programming model abstractions can expose crucial information about data locality to the compiler and runtime system, enabling performance-portable code. The research question is to identify the right level of abstraction, which may involve techniques that range from template libraries all the way to completely new languages.
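One of the emerging concepts listed above, tiling, can be illustrated with a toy helper (a hypothetical construct, not an API from any of the surveyed frameworks): iterating over a 2D index space tile by tile keeps each working set small, which is exactly the kind of locality information these abstractions aim to let programmers express.

def tiles(rows, cols, tile_rows, tile_cols):
    """Yield (row_range, col_range) pairs covering a rows x cols index space
    in cache-friendly tiles; a toy stand-in for a tiling abstraction."""
    for r0 in range(0, rows, tile_rows):
        for c0 in range(0, cols, tile_cols):
            yield (range(r0, min(r0 + tile_rows, rows)),
                   range(c0, min(c0 + tile_cols, cols)))

def tiled_transpose(a, rows, cols, tile=32):
    """Transpose a row-major matrix (list of lists) one tile at a time so that
    both the reads and the writes of each tile fit in fast memory."""
    out = [[0] * rows for _ in range(cols)]
    for row_rng, col_rng in tiles(rows, cols, tile, tile):
        for i in row_rng:
            for j in col_rng:
                out[j][i] = a[i][j]
    return out

# Example: transpose a small matrix with 2x2 tiles.
m = [[1, 2, 3], [4, 5, 6]]
print(tiled_transpose(m, 2, 3, tile=2))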