Vitus J. Leung
Sandia National Laboratories
Publications
Featured research published by Vitus J. Leung.
international conference on cluster computing | 2002
Vitus J. Leung; Esther M. Arkin; Michael A. Bender; David P. Bunde; Jeanette Johnston; Alok Lal; Joseph S. B. Mitchell; Cynthia A. Phillips; Steven S. Seiden
The Computational Plant, or Cplant, is a commodity-based supercomputer under development at Sandia National Laboratories. This paper describes resource-allocation strategies to achieve processor locality for parallel jobs in Cplant and other supercomputers. Users of Cplant and other Sandia supercomputers submit parallel jobs to a job queue. When a job is scheduled to run, it is assigned to a set of processors. To obtain maximum throughput, jobs should be allocated to localized clusters of processors to minimize communication costs and to avoid bandwidth contention caused by overlapping jobs. This paper introduces new allocation strategies and performance metrics based on space-filling curves and one-dimensional allocation strategies. These algorithms are general and simple. Preliminary simulations and Cplant experiments indicate that both space-filling curves and one-dimensional packing improve processor locality compared to the sorted free-list strategy previously used on Cplant. These new allocation strategies are implemented in the new release of the Cplant System Software, Version 2.0, which was phased into the Cplant systems at Sandia by May 2002.
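A minimal sketch of the space-filling-curve idea described in this abstract, assuming a 2D mesh whose side length is a power of two (an illustration, not the Cplant implementation): free processors are ordered by their Hilbert-curve index, and a job is placed on the window of free processors whose indices span the smallest range.

def xy2d(n, x, y):
    """Map mesh coordinates (x, y) to their index along a Hilbert curve.
    n is the side length of the mesh and must be a power of two."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the curve remains contiguous.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def allocate(free_procs, k, n):
    """Choose k free processors whose Hilbert indices span the smallest window,
    a simple stand-in for one-dimensional packing along the curve."""
    if k > len(free_procs):
        return None  # not enough free processors for this job
    indexed = sorted((xy2d(n, x, y), (x, y)) for x, y in free_procs)
    best_window, best_span = None, None
    for i in range(len(indexed) - k + 1):
        span = indexed[i + k - 1][0] - indexed[i][0]
        if best_span is None or span < best_span:
            best_span = span
            best_window = [coord for _, coord in indexed[i:i + k]]
    return best_window


# Example: an 8x8 mesh where roughly two thirds of the processors are free.
free = [(x, y) for x in range(8) for y in range(8) if (x + y) % 3 != 0]
print(allocate(free, 5, 8))

Ordering the mesh along the curve reduces the 2D locality problem to a 1D one, which is what allows one-dimensional packing heuristics to be reused, as the abstract describes.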
ieee international conference on high performance computing data and analytics | 2014
James A. Ang; Richard F. Barrett; R.E. Benner; D. Burke; Cy P. Chan; Jeanine Cook; David Donofrio; Simon D. Hammond; Karl Scott Hemmert; Suzanne M. Kelly; H. Le; Vitus J. Leung; David Resnick; Arun Rodrigues; John Shalf; Dylan T. Stark; Didem Unat; Nicholas J. Wright
To achieve exascale computing, fundamental hardware architectures must change. This will significantly impact scientific applications that run on current high performance computing (HPC) systems, many of which codify years of scientific domain knowledge and refinements for contemporary computer systems. To adapt to exascale architectures, developers must be able to reason about new hardware and determine what programming models and algorithms will provide the best blend of performance and energy efficiency in the future. An abstract machine model is designed to expose to application developers and system software only those aspects of the machine that are important or relevant to performance and code structure. These models are intended as communication aids between application developers and hardware architects during the co-design process. A proxy architecture is a parameterized version of an abstract machine model, with parameters added to elucidate potential speeds and capacities of key hardware components. These more detailed architectural models enable discussion between the developers of analytic models and simulators and computer hardware architects, and they support application performance analysis, system software development, and the identification of hardware optimization opportunities. In this paper, we present a set of abstract machine models and show how they might be used to help software developers prepare for exascale. We then apply parameters to one of these models to demonstrate how a proxy architecture can enable a more concrete exploration of how well application codes map onto future architectures.
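As a rough, purely illustrative companion to this abstract (the parameter names and numbers below are hypothetical, not taken from the paper), a proxy architecture can be viewed as an abstract machine model plus concrete parameters, which is already enough for back-of-the-envelope performance reasoning.

from dataclasses import dataclass

@dataclass
class ProxyNode:
    """A parameterized single-node model: an abstract machine model plus
    illustrative numbers (all values here are made up for the example)."""
    cores: int = 64                  # lightweight cores per node
    flops_per_core: float = 16e9     # peak FLOP/s per core
    dram_bandwidth: float = 200e9    # bytes/s to main memory
    dram_capacity: float = 64e9      # bytes of main memory

    def time_estimate(self, flops, bytes_moved):
        """Crude roofline-style bound: a kernel is limited either by
        compute throughput or by memory bandwidth."""
        compute_time = flops / (self.cores * self.flops_per_core)
        memory_time = bytes_moved / self.dram_bandwidth
        return max(compute_time, memory_time)

# Example: a stream-like kernel moving 8 GB while doing 1 GFLOP is clearly
# bandwidth-bound on this hypothetical node.
node = ProxyNode()
print(node.time_estimate(flops=1e9, bytes_moved=8e9))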
Algorithmica | 2008
Michael A. Bender; David P. Bunde; Erik D. Demaine; Sándor P. Fekete; Vitus J. Leung; Henk Meijer; Cynthia A. Phillips
We give processor-allocation algorithms for grid architectures, where the objective is to select processors from a set of available processors to minimize the average number of communication hops. The associated clustering problem is as follows: Given n points in ℜ^d, find a size-k subset with minimum average pairwise L1 distance. We present a natural approximation algorithm and show that it is a $\frac{7}{4}$-approximation for two-dimensional grids; in d dimensions, the approximation guarantee is $2-\frac{1}{2d}$, which is tight. We also give a polynomial-time approximation scheme (PTAS) for constant dimension d, and we report on experimental results.
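One natural heuristic for the clustering problem stated above is sketched below; it is consistent with the abstract's description but should not be read as the paper's exact algorithm or analysis. For every input point, form the candidate cluster consisting of that point and its k-1 nearest neighbors under the L1 metric, and keep the candidate with the smallest average pairwise distance.

from itertools import combinations

def l1(p, q):
    """L1 (Manhattan) distance between two points in R^d."""
    return sum(abs(a - b) for a, b in zip(p, q))

def avg_pairwise_l1(cluster):
    """Average pairwise L1 distance of a cluster (requires at least 2 points)."""
    pairs = list(combinations(cluster, 2))
    return sum(l1(p, q) for p, q in pairs) / len(pairs)

def best_ball_cluster(points, k):
    """For each point, take it plus its k-1 nearest neighbors (L1) and
    return the candidate cluster with the smallest average pairwise distance."""
    best = None
    for center in points:
        cluster = sorted(points, key=lambda p: l1(center, p))[:k]
        cost = avg_pairwise_l1(cluster)
        if best is None or cost < best[0]:
            best = (cost, cluster)
    return best[1]

# Example on a few 2D points; k must be at least 2.
pts = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6), (2, 2)]
print(best_ball_cluster(pts, 3))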
ACM Transactions on Mathematical Software | 2010
Carl Ollivier-Gooch; Lori Freitag Diachin; Mark S. Shephard; Timothy J. Tautges; Jason A. Kraftcheck; Vitus J. Leung; Xiaojuan Luo; Mark C. Miller
Eighth Annual Water Distribution Systems Analysis Symposium (WDSA) | 2008
Jonathan W. Berry; Robert D. Carr; William Eugene Hart; Vitus J. Leung; Cindy A. Phillips; Jean-Paul Watson
international parallel and distributed processing symposium | 2004
David P. Bunde; Vitus J. Leung; Jens Mache
Motivated by observations about job runtimes on the Cplant system, we use a trace-driven microsimulator to begin characterizing the performance of different classes of allocation algorithms on jobs with different communication patterns in space-shared parallel systems with mesh topology. We show that relative performance varies considerably with communication pattern. The paging strategy using the Hilbert space-filling curve and the best-fit heuristic performed best across several communication patterns.
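For a sense of what a best-fit heuristic can look like once processors are ordered along a curve, here is a hedged sketch (an illustration of the heuristic class, not the simulator's code): group the free processors into maximal runs of consecutive curve indices and place the job in the smallest run that holds it, falling back to the largest runs when none does.

def runs_of_free(free_indices):
    """Group sorted curve indices of free processors into maximal consecutive runs."""
    runs, current = [], [free_indices[0]]
    for idx in free_indices[1:]:
        if idx == current[-1] + 1:
            current.append(idx)
        else:
            runs.append(current)
            current = [idx]
    runs.append(current)
    return runs


def best_fit(free_indices, k):
    """Return k curve indices for a job: the smallest free run that fits it,
    or pieces of the largest runs when no single run is big enough."""
    if len(free_indices) < k:
        return None  # not enough free processors
    runs = runs_of_free(sorted(free_indices))
    fitting = [run for run in runs if len(run) >= k]
    if fitting:
        return min(fitting, key=len)[:k]
    chosen = []
    for run in sorted(runs, key=len, reverse=True):
        chosen.extend(run)
        if len(chosen) >= k:
            return chosen[:k]


# Examples: free processors sit at curve positions with gaps.
print(best_fit([0, 1, 2, 5, 6, 7, 8, 9, 12], 3))   # -> [0, 1, 2]
print(best_fit([0, 1, 5, 6, 12], 4))               # -> pieces of the two largest runs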
workshop on algorithms and data structures | 2005
Michael A. Bender; David P. Bunde; Erik D. Demaine; Sándor P. Fekete; Vitus J. Leung; Henk Meijer; Cynthia A. Phillips
Journal of Scheduling | 2003
Sandy Irani; Vitus J. Leung
In this paper, we consider the on-line scheduling of jobs that may be competing for mutually exclusive resources. We model the conflicts between jobs with a conflict graph, so that the set of all concurrently running jobs must form an independent set in the graph. This model is natural and general enough to have applications in a variety of settings; however, we are motivated by the following two specific applications: traffic intersection control and session scheduling in high-speed local area networks with spatial reuse. Our results focus on two special classes of graphs motivated by our applications: bipartite graphs and interval graphs. The cost function we use is maximum response time. In all of the upper bounds, we devise algorithms which maintain a set of invariants which bound the accumulation of jobs on cliques (in the case of bipartite graphs, edges) in the graph. The lower bounds show that the invariants maintained by the algorithms are tight to within a constant factor. For a specific graph which arises in the traffic intersection control problem, we show a simple algorithm which achieves the optimal competitive ratio.
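The scheduling model above can be made concrete with a small simulation sketch. The greedy round-based scheduler below is for illustration only; the paper's algorithms maintain carefully chosen invariants and come with competitive-ratio guarantees that this sketch does not.

def greedy_independent_batch(pending, conflicts):
    """Pick a maximal conflict-free set of jobs, oldest first. `conflicts`
    maps each job id to the set of job ids it cannot run alongside."""
    batch = []
    for job in pending:
        ok = all(other not in conflicts.get(job, set())
                 and job not in conflicts.get(other, set())
                 for other in batch)
        if ok:
            batch.append(job)
    return batch


def run_schedule(arrivals, conflicts):
    """Round-based simulation: each round, run one conflict-free batch of the
    currently pending jobs and record the round in which every job finishes."""
    pending, finish_round, rounds = [], {}, 0
    for jobs in arrivals:
        pending.extend(jobs)
        rounds += 1
        for job in greedy_independent_batch(pending, conflicts):
            finish_round[job] = rounds
            pending.remove(job)
    while pending:  # drain jobs still waiting after the last arrival
        rounds += 1
        for job in greedy_independent_batch(pending, conflicts):
            finish_round[job] = rounds
            pending.remove(job)
    return finish_round


# Example: jobs A and B conflict (two crossing traffic streams); C and D do not.
conflicts = {"A": {"B"}, "B": {"A"}, "C": set()}
print(run_schedule([["A", "B", "C"], ["D"]], conflicts))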
job scheduling strategies for parallel processing | 2009
Ojaswirajanya Thebe; David P. Bunde; Vitus J. Leung
Archive | 2014
Adrian Tate; Amir Kamil; Anshu Dubey; Armin Groblinger; Brad Chamberlain; Brice Goglin; Harold C. Edwards; Chris J. Newburn; David Padua; Didem Unat; Emmanuel Jeannot; Frank Hannig; Gysi Tobias; Hatem Ltaief; James C. Sexton; Jesús Labarta; John Shalf; Karl Fuerlinger; Kathryn O'Brien; Leonidas Linardakis; Maciej Besta; Marie-Christine Sawley; Mark James Abraham; Mauro Bianco; Miquel Pericàs; Naoya Maruyama; Paul H. J. Kelly; Peter Messmer; Robert B. Ross; Romain Ciedat
The goal of the workshop and this report is to identify common themes and standardize concepts for locality-preserving abstractions for exascale programming models. Current software tools are built on the premise that computing is the most expensive component; we are rapidly moving to an era in which computing is cheap and massively parallel while data movement dominates energy and performance costs. In order to respond to exascale systems (the next generation of high performance computing systems), the scientific computing community needs to refactor its applications to align with the emerging data-centric paradigm. Applications must be evolved to express information about data locality. Unfortunately, current programming environments offer few ways to do so. They ignore the cost of communication and simply rely on hardware cache coherency to virtualize data movement. With the increasing importance of task-level parallelism on future systems, task models have to support constructs that express data locality and affinity. At the system level, communication libraries implicitly assume that all processing elements are equidistant from each other. In order to take advantage of emerging technologies, application developers need a set of programming abstractions to describe data locality for the new computing ecosystem. The new programming paradigm should be more data-centric and should allow developers to describe how to decompose and how to lay out data in memory. Fortunately, there are many emerging concepts, such as constructs for tiling, data layout, array views, task and thread affinity, and topology-aware communication libraries, for managing data locality. There is an opportunity to identify commonalities in strategy that enable us to combine the best of these concepts into a comprehensive approach to expressing and managing data locality in exascale programming systems. These programming model abstractions can expose crucial information about data locality to the compiler and runtime system, enabling performance-portable code. The research question is to identify the right level of abstraction, which may involve techniques that range from template libraries all the way to completely new languages.
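One of the emerging concepts listed above, tiling, can be illustrated with a toy helper (a hypothetical construct, not an API from any of the surveyed frameworks): iterating over a 2D index space tile by tile keeps each working set small, which is exactly the kind of locality information these abstractions aim to let programmers express.

def tiles(rows, cols, tile_rows, tile_cols):
    """Yield (row_range, col_range) pairs covering a rows x cols index space
    in cache-friendly tiles; a toy stand-in for a tiling abstraction."""
    for r0 in range(0, rows, tile_rows):
        for c0 in range(0, cols, tile_cols):
            yield (range(r0, min(r0 + tile_rows, rows)),
                   range(c0, min(c0 + tile_cols, cols)))

def tiled_transpose(a, rows, cols, tile=32):
    """Transpose a row-major matrix (list of lists) one tile at a time so that
    both the reads and the writes of each tile fit in fast memory."""
    out = [[0] * rows for _ in range(cols)]
    for row_rng, col_rng in tiles(rows, cols, tile, tile):
        for i in row_rng:
            for j in col_rng:
                out[j][i] = a[i][j]
    return out

# Example: transpose a small matrix with 2x2 tiles.
m = [[1, 2, 3], [4, 5, 6]]
print(tiled_transpose(m, 2, 3, tile=2))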