
Publication


Featured research published by Arthur B. Maccabe.


programming language design and implementation | 1990

The program dependence web: a representation supporting control-, data-, and demand-driven interpretation of imperative languages

Karl J. Ottenstein; Robert A. Ballance; Arthur B. Maccabe

The Program Dependence Web (PDW) is a program representation that can be directly interpreted using control-, data-, or demand-driven models of execution. A PDW combines a single-assignment version of the program with explicit operators that manage the flow of data values. The PDW can be viewed as an augmented Program Dependence Graph. Translation to the PDW representation provides the basis for projects to compile Fortran onto dynamic dataflow architectures and simulators. A second application of the PDW is the construction of various compositional semantics for program dependence graphs.
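
The single-assignment idea at the heart of the PDW can be illustrated with a small fragment. The sketch below is a hand-written, illustrative rendering only, using C and a ternary select in place of the paper's gating operators and notation: every value receives a fresh name, and the merge of the two branch values becomes an explicit, data-driven operation.

```c
/* Illustrative only: a hand-written single-assignment rendering of a small
 * imperative fragment, in the spirit of the gated single-assignment form
 * underlying the PDW (this is not the paper's notation or tooling). */
#include <stdio.h>

int main(void) {
    int a = 3, b = 4;

    /* Imperative original:
     *   if (a > b) x = a - b; else x = b - a;
     *   y = x * 2;
     */

    /* Single-assignment version: every value gets a fresh name, and the
     * merge of the two branch values is made explicit.  In a PDW-like
     * representation this merge would be a gating operator driven by the
     * predicate (a > b) rather than by control flow. */
    int p   = (a > b);        /* predicate computed as a data value      */
    int x_t = a - b;          /* value produced on the true branch       */
    int x_f = b - a;          /* value produced on the false branch      */
    int x   = p ? x_t : x_f;  /* explicit select, standing in for a gate */
    int y   = x * 2;

    printf("y = %d\n", y);
    return 0;
}
```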


parallel computing | 2000

Massively parallel computing using commodity components

Ron Brightwell; Lee Ann Fisk; David S. Greenberg; Trammell Hudson; Michael J. Levenhagen; Arthur B. Maccabe; Rolf Riesen

The Computational Plant (Cplant) project at Sandia National Laboratories is developing a large-scale, massively parallel computing resource from a cluster of commodity computing and networking components. We are combining the benefits of commodity cluster computing with our expertise in designing, developing, using, and maintaining large-scale, massively parallel processing (MPP) machines. In this paper, we present the design goals of the cluster and an approach to developing a commodity-based computational resource capable of delivering performance comparable to production-level MPP machines. We provide a description of the hardware components of a 96-node Phase I prototype machine and discuss the experiences with the prototype that led to the hardware choices for a 400-node Phase II production machine. We give a detailed description of the management and runtime software components of the cluster and offer computational performance data as well as performance measurements of functions that are critical to the management of large systems. © 2000 Elsevier Science B.V. All rights reserved.


international parallel and distributed processing symposium | 2006

Infiniband scalability in Open MPI

Galen M. Shipman; Timothy S. Woodall; Richard L. Graham; Arthur B. Maccabe; Patrick G. Bridges

Infiniband is becoming an important interconnect technology in high performance computing. Efforts in large scale Infiniband deployments are raising scalability questions in the HPC community. Open MPI, a new open source implementation of the MPI standard targeted for production computing, provides several mechanisms to enhance Infiniband scalability. Initial comparisons with MVAPICH, the most widely used Infiniband MPI implementation, show similar performance but with much better scalability characteristics. Specifically, small message latency is improved by up to 10% in medium/large jobs and memory usage per host is reduced by as much as 300%. In addition, Open MPI provides predictable latency that is close to optimal without sacrificing bandwidth performance.
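
Small-message latency figures like those cited above are typically obtained with a ping-pong microbenchmark. The following is a generic sketch of that measurement, not the benchmark used in the paper; the iteration count and message size are arbitrary.

```c
/* A minimal ping-pong sketch of the kind commonly used to measure
 * small-message MPI latency; illustrative only, not the paper's benchmark.
 * Run with: mpirun -np 2 ./pingpong */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int iters = 10000;
    char byte = 0;
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();

    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }

    double t1 = MPI_Wtime();
    if (rank == 0)
        printf("one-way latency: %.2f us\n", (t1 - t0) / (2.0 * iters) * 1e6);

    MPI_Finalize();
    return 0;
}
```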


international conference on cluster computing | 2002

COMB: a portable benchmark suite for assessing MPI overlap

William Lawry; Christopher Wilson; Arthur B. Maccabe; Ron Brightwell

This paper describes a portable benchmark suite that assesses the ability of cluster networking hardware and software to overlap MPI communication and computation. The Communication Offload MPI-based Benchmark, or COMB, uses two methods to characterize the ability of messages to make progress concurrently with computational processing on the host processor(s). COMB measures the relationship between MPI communication bandwidth and host CPU availability.
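
A rough sketch of the underlying measurement idea: post a nonblocking operation, perform a fixed amount of computation while the transfer is (ideally) progressed by the network, and compare against the same computation with no communication in flight. This is a much-simplified illustration, not the COMB suite itself; the message size and work loop are arbitrary.

```c
/* Illustrative sketch of measuring communication/computation overlap with
 * nonblocking MPI, in the spirit of (but much simpler than) COMB.
 * Rank 0 posts a nonblocking receive, computes while the message is
 * (ideally) progressed by the network, then waits.  Run with -np 2. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N (1 << 22)   /* 4 MiB message (arbitrary) */

static double busy_work(long loops) {
    volatile double x = 1.0;
    for (long i = 0; i < loops; i++)
        x = x * 1.0000001 + 0.0000001;
    return x;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char *buf = calloc(N, 1);
    MPI_Request req;

    if (rank == 0) {
        /* Baseline: computation time with no communication in flight. */
        double c0 = MPI_Wtime();
        busy_work(50 * 1000 * 1000L);
        double compute_only = MPI_Wtime() - c0;

        /* Same computation with a receive outstanding. */
        MPI_Irecv(buf, N, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &req);
        double t0 = MPI_Wtime();
        busy_work(50 * 1000 * 1000L);
        double compute_during = MPI_Wtime() - t0;
        MPI_Wait(&req, MPI_STATUS_IGNORE);

        printf("CPU availability during transfer: %.1f%%\n",
               100.0 * compute_only / compute_during);
    } else if (rank == 1) {
        MPI_Send(buf, N, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}
```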


Archive | 1994

SUNMOS for the Intel Paragon - a brief user's guide

Arthur B. Maccabe; R. Riesen; K.S. McCurley; Stephen R. Wheat

SUNMOS is an acronym for Sandia/UNM Operating System. It was originally developed for the nCUBE-2 MIMD supercomputer between January and December of 1991. Between April and August of 1993, SUNMOS was ported to the Intel Paragon. This document provides a quick overview of how to compile and run jobs using the SUNMOS environment on the Paragon. The primary goal of SUNMOS is to provide high performance message passing and process support. As an example of its capabilities, SUNMOS Release 1.4 occupies approximately 240K of memory on a Paragon node, and is able to send messages at bandwidths of 165 megabytes per second with latencies as low as 42 microseconds using Intel NX calls. By contrast, Release 1.2 of OSF/1 for the Paragon occupies approximately 7 megabytes of memory on a node, has a peak bandwidth of 65 megabytes per second, and latencies as low as 42 microseconds (the communication numbers are reported elsewhere in these proceedings).


conference on high performance computing (supercomputing) | 1993

A massively parallel adaptive finite element method with dynamic load balancing

Karen D. Devine; Joseph E. Flaherty; Stephen R. Wheat; Arthur B. Maccabe

The authors construct massively parallel adaptive finite element methods for the solution of hyperbolic conservation laws. Spatial discretization is performed by a discontinuous Galerkin finite element method using a basis of piecewise Legendre polynomials. Temporal discretization utilizes a Runge-Kutta method. Dissipative fluxes and projection limiting prevent oscillations near solution discontinuities. The resulting method is of high order and may be parallelized efficiently on MIMD computers. The authors demonstrate parallel efficiency through computations on a 1024-processor nCUBE/2 hypercube. They present results using adaptive p-refinement to reduce the computational cost of the method, and tiling, a dynamic, element-based data migration system that maintains global load balance of the adaptive method by overlapping neighborhoods of processors that each perform local balancing.
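
The tiling idea, purely local balancing over overlapping processor neighborhoods that nevertheless drives the global load toward balance, can be illustrated with a toy serial simulation. The sketch below is conceptual only; the 1D neighborhoods, loads, and migration rule are made up and are not the paper's algorithm or data structures.

```c
/* A toy, serial illustration of the local balancing idea behind "tiling":
 * each processor compares its load with the average over its neighborhood
 * and sheds work toward a lighter neighbor.  Concept sketch only. */
#include <stdio.h>

#define NPROCS 8

int main(void) {
    /* Element counts per processor after adaptive p-refinement (made up). */
    int load[NPROCS] = {120, 40, 95, 30, 200, 60, 80, 15};

    /* A few sweeps of purely local balancing over 1D neighborhoods
     * {i-1, i, i+1}; global balance emerges because neighborhoods overlap. */
    for (int sweep = 0; sweep < 50; sweep++) {
        for (int i = 0; i < NPROCS; i++) {
            int lo = (i == 0) ? i : i - 1;
            int hi = (i == NPROCS - 1) ? i : i + 1;
            int sum = 0;
            for (int j = lo; j <= hi; j++) sum += load[j];
            int avg = sum / (hi - lo + 1);

            /* Migrate surplus elements to the lighter neighbor. */
            if (load[i] > avg) {
                int surplus = load[i] - avg;
                int target = (load[lo] < load[hi]) ? lo : hi;
                if (target != i) {
                    load[i] -= surplus;
                    load[target] += surplus;
                }
            }
        }
    }

    for (int i = 0; i < NPROCS; i++)
        printf("proc %d: %d elements\n", i, load[i]);
    return 0;
}
```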


international conference on cluster computing | 2006

Efficient Data-Movement for Lightweight I/O

Ron A. Oldfield; Patrick M. Widener; Arthur B. Maccabe; Lee Ward; Todd Kordenbrock

Efficient data movement is an important part of any high-performance I/O system, but it is especially critical for the current and next-generation of massively parallel processing (MPP) systems. In this paper, we discuss how the scale, architecture, and organization of current and proposed MPP systems impact the design of the data-movement scheme for the I/O system. We also describe and analyze the approach used by the lightweight file systems (LWFS) project, and we compare that approach to more conventional data-movement protocols used by small and mid-range clusters. Our results indicate that the data-movement strategy used by LWFS clearly outperforms conventional data-movement protocols, particularly as data sizes increase.


ieee international conference on high performance computing data and analytics | 2003

Design, Implementation, and Performance of MPI on Portals 3.0

Ron Brightwell; Rolf Riesen; Arthur B. Maccabe

This paper describes an implementation of the Message Passing Interface (MPI) on the Portals 3.0 data movement layer. Portals 3.0 provides low-level building blocks that are flexible enough to support higher-level message passing layers, such as MPI, very efficiently. Portals 3.0 is also designed to allow for programmable network interface cards to offload message processing from the host processor, allowing for the ability to overlap computation and MPI communication. We describe the basic building blocks in Portals 3.0, show how they can be put together to implement MPI, and describe the protocols of our MPI implementation. We look at several key operations within the implementation and describe the effects that a Portals 3.0 implementation has on scalability and performance. We also present preliminary performance results from our implementation for Myrinet.
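
As generic background for the protocol discussion, most MPI implementations split point-to-point traffic into an eager path for small messages and a rendezvous path for large ones. The sketch below shows only that common pattern, not the actual Portals 3.0-based protocol described in the paper; the threshold and helper functions are hypothetical stand-ins.

```c
/* Generic illustration of the eager/rendezvous split used by most MPI
 * implementations for point-to-point messages.  This is NOT the Portals
 * 3.0-based protocol from the paper, just the common pattern such an
 * implementation has to express on top of its transport building blocks. */
#include <stdio.h>

#define EAGER_LIMIT 8192   /* bytes; hypothetical threshold */

/* Hypothetical stand-ins for the underlying transport operations. */
static void send_eager(const void *buf, size_t len) {
    (void)buf;
    printf("eager: %zu bytes sent along with the match header\n", len);
}

static void send_rendezvous(const void *buf, size_t len) {
    (void)buf;
    printf("rendezvous: %zu bytes moved only after the receive is matched\n", len);
}

/* Send-side protocol choice: small messages go immediately (and may land
 * in an unexpected-message buffer at the receiver); large messages wait
 * for the matching receive so data can move directly into user memory. */
static void protocol_send(const void *buf, size_t len) {
    if (len <= EAGER_LIMIT)
        send_eager(buf, len);
    else
        send_rendezvous(buf, len);
}

int main(void) {
    static char small[256], large[1 << 20];
    protocol_send(small, sizeof small);
    protocol_send(large, sizeof large);
    return 0;
}
```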


Concurrency and Computation: Practice and Experience | 2005

Architectural specification for massively parallel computers: an experience and measurement‐based approach

Ron Brightwell; William J. Camp; Benjamin Cole; Erik P. DeBenedictis; Robert W. Leland; James L. Tomkins; Arthur B. Maccabe

In this paper, we describe the hardware and software architecture of the Red Storm system developed at Sandia National Laboratories. We discuss the evolution of this architecture and provide reasons for the different choices that have been made. We contrast our approach of leveraging high‐volume, mass‐market commodity processors to that taken for the Earth Simulator. We present a comparison of benchmarks and application performance that support our approach. We also project the performance of Red Storm and the Earth Simulator. This projection indicates that the Red Storm architecture is a much more cost‐effective approach to massively parallel computing. Published in 2005 by John Wiley & Sons, Ltd.


Archive | 2012

The Portals 4.0 network programming interface.

Ronald B. Brightwell; Kevin Pedretti; Kyle Bruce Wheeler; Karl Scott Hemmert; Rolf Riesen; Keith D. Underwood; Arthur B. Maccabe; Trammell Hudson

This report presents a specification for the Portals 4.0 network programming interface. Portals 4.0 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4.0 is well suited to massively parallel processing and embedded systems. Portals 4.0 represents an adaptation of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4.0 is targeted to the next generation of machines employing advanced network interface architectures that support enhanced offload capabilities.
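
Portals is built around one-sided put/get data movement. To avoid misquoting the Portals C API from memory, the sketch below illustrates the analogous idea using standard MPI one-sided communication (MPI_Win_create/MPI_Put) instead: one rank writes directly into memory another rank has exposed, with no matching receive posted.

```c
/* One-sided data movement in the spirit of a Portals put, shown with
 * standard MPI RMA rather than the Portals API: rank 0 writes directly
 * into rank 1's exposed buffer.  Run with: mpirun -np 2 ./put_demo */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int target_buf = -1;        /* memory exposed to remote puts */
    MPI_Win win;
    MPI_Win_create(&target_buf, sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);
    if (rank == 0) {
        int value = 42;
        /* One-sided write into rank 1's window at offset 0. */
        MPI_Put(&value, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
    }
    MPI_Win_fence(0, win);

    if (rank == 1)
        printf("rank 1 received %d via a one-sided put\n", target_buf);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```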

Collaboration


Dive into Arthur B. Maccabe's collaborations.

Top Co-Authors

Ron Brightwell
Sandia National Laboratories

Trammell Hudson
Sandia National Laboratories

James Horey
Oak Ridge National Laboratory

Ronald B. Brightwell
Sandia National Laboratories

Patrick M. Widener
Sandia National Laboratories

Stephen R. Wheat
Sandia National Laboratories

Wenbin Zhu
University of New Mexico