Ralph H. Castain
Los Alamos National Laboratory
Publication
Featured research published by Ralph H. Castain.
Lecture Notes in Computer Science | 2004
Edgar Gabriel; Graham E. Fagg; George Bosilca; Thara Angskun; Jack J. Dongarra; Jeffrey M. Squyres; Vishal Sahay; Prabhanjan Kambadur; Andrew Lumsdaine; Ralph H. Castain; David Daniel; Richard L. Graham; Timothy S. Woodall
A large number of MPI implementations are currently available, each of which emphasizes different aspects of high-performance computing or is intended to solve a specific research problem. The result is a myriad of incompatible MPI implementations, all of which require separate installation, and the combination of which presents significant logistical challenges for end users. Building upon prior research, and influenced by experience gained from the code bases of the LAM/MPI, LA-MPI, and FT-MPI projects, Open MPI is an all-new, production-quality MPI-2 implementation that is fundamentally centered around component concepts. Open MPI provides a unique combination of novel features previously unavailable in an open-source, production-quality implementation of MPI. Its component architecture provides a stable platform for third-party research and enables the run-time composition of independent software add-ons. This paper presents a high-level overview of the goals, design, and implementation of Open MPI.
International Conference on Cluster Computing | 2006
Richard L. Graham; Galen M. Shipman; Ralph H. Castain; George Bosilca; Andrew Lumsdaine
The growth in the number of generally available, distributed, heterogeneous computing systems places increasing importance on the development of user-friendly tools that enable application developers to efficiently use these resources. Open MPI provides support for several aspects of heterogeneity within a single, open-source MPI implementation. Through careful abstractions, heterogeneous support maintains efficient use of uniform computational platforms. We describe Open MPI's architecture for heterogeneous network and processor support. A key design feature of this implementation is its transparency to the application developer while maintaining very high levels of performance. This is demonstrated with the results of several numerical experiments.
Journal of Parallel and Distributed Computing | 2006
Sameer Shivle; Howard Jay Siegel; Anthony A. Maciejewski; Prasanna Sugavanam; Tarun Banka; Ralph H. Castain; Kiran Chindam; Steve Dussinger; Prakash Pichumani; Praveen Satyasekaran; William W. Saylor; David Sendek; J. Sousa; Jayashree Sridharan; Jose Velazco
An ad hoc grid is a heterogeneous computing and communication system that allows a group of mobile devices to accomplish a mission, often in a hostile environment. Energy management is a major concern in ad hoc grids. The problem studied here focuses on statically assigning resources in an ad hoc grid to an application composed of communicating subtasks. The goal of the allocation is to minimize the average percentage of energy consumed by the application to execute across the machines in the ad hoc grid, while meeting an application execution time constraint. This pre-computed allocation is then used when the application is deployed in a mission. Six different heuristic approaches of varying time complexities have been designed and compared via simulations to solve this ad hoc grid allocation problem. Also, a lower bound based on the performance metric has been designed to compare the performance of the heuristics developed.
International Parallel and Distributed Processing Symposium | 2004
Sameer Shivle; Ralph H. Castain; Howard Jay Siegel; Anthony A. Maciejewski; Tarun Banka; Kiran Chindam; Steve Dussinger; Prakash Pichumani; Praveen Satyasekaran; William W. Saylor; David Sendek; J. Sousa; Jayashree Sridharan; Prasanna Sugavanam; Jose Velazco
Summary form only given. An ad hoc grid is a heterogeneous computing and communication system without a fixed infrastructure; all of its components are mobile. Energy management is a major concern in an ad hoc grid. One important aspect of energy management is to minimize the energy consumption during a mission. In an ad hoc grid, communication and computations are deeply intertwined, and any energy optimization must consider both types of activities together rather than separately. The mapping (defined as matching and scheduling) of tasks onto machines with varied computational capabilities has been shown, in general, to be an NP-complete problem. Therefore, heuristic techniques are required to efficiently map tasks to machines in an ad hoc grid so as to minimize the energy consumed due to communication and computation. This research evaluates and compares energy management issues for resource allocation in ad hoc grids using six static heuristics.
Future Generation Computer Systems | 2008
Ralph H. Castain; Timothy S. Woodall; David Daniel; Jeffrey M. Squyres; Brian Barrett; Graham Edward Fagg
The Open Run-Time Environment (OpenRTE), a spin-off from the Open MPI project, was developed to support distributed high-performance computing applications operating in a heterogeneous environment. The system transparently provides support for interprocess communication, resource discovery and allocation, and process launch across a variety of platforms. In addition, users can launch their applications remotely from their desktop, disconnect from them, and reconnect at a later time to monitor progress. This paper will describe the capabilities of the OpenRTE system, describe its architecture, and discuss future directions for the project.
Lecture Notes in Computer Science | 2004
Timothy S. Woodall; Richard L. Graham; Ralph H. Castain; David Daniel; Mitchel W. Sukalski; Graham E. Fagg; Edgar Gabriel; George Bosilca; Thara Angskun; Jack J. Dongarra; Jeffrey M. Squyres; Vishal Sahay; Prabhanjan Kambadur; Andrew Lumsdaine
TEG is a new component-based methodology for point-to-point messaging. Developed as part of the Open MPI project, TEG provides a configurable fault-tolerant capability for high-performance messaging that utilizes multi-network interfaces where available. Initial performance comparisons with other MPI implementations show comparable ping-pong latencies, but with bandwidths up to 30% higher.
Lecture Notes in Computer Science | 2004
Timothy S. Woodall; Richard L. Graham; Ralph H. Castain; David Daniel; Mitchel W. Sukalski; Graham E. Fagg; Edgar Gabriel; George Bosilca; Thara Angskun; Jack J. Dongarra; Jeffrey M. Squyres; Vishal Sahay; Prabhanjan Kambadur; Andrew Lumsdaine
TEG is a new methodology for point-to-point messaging developed as a part of the Open MPI project. Initial performance measurements are presented, showing comparable ping-pong latencies in a single-NIC configuration, but with bandwidths up to 30% higher than those achieved by other leading MPI implementations. Homogeneous dual-NIC configurations further improved performance, but the heterogeneous case requires continued investigation.
Lecture Notes in Computer Science | 2005
Ralph H. Castain; Timothy S. Woodall; David Daniel; Jeffrey M. Squyres; Graham E. Fagg
The Open Run-Time Environment (OpenRTE)—a spin-off from the Open MPI project—was developed to support distributed high-performance computing applications operating in a heterogeneous environment. The system transparently provides support for interprocess communication, resource discovery and allocation, and process launch across a variety of platforms. In addition, users can launch their applications remotely from their desktop, disconnect from them, and reconnect at a later time to monitor progress. This paper will describe the capabilities of the OpenRTE system, describe its architecture, and discuss future directions for the project.
The Journal of Supercomputing | 2007
Ralph H. Castain; Jeffrey M. Squyres
Meeting the future computing needs of the scientific community will likely require the development of petascale computing environments based on the integration of significant numbers of processors into large-scale clusters, and the (possibly heterogeneous) aggregation of multiple clusters for use by individual and/or synchronized applications. Despite the best of efforts, such complex systems dictate that applications must expect to encounter failures of their computing resources and/or networks during the course of execution. The Open Run-Time Environment (OpenRTE) has been designed to support high-performance computing applications in such environments. Gaining acceptance by the user community requires that OpenRTE not only meet basic functional requirements, but also provide users with (a) a transparent interface that avoids the need to customize applications when moving between specific computing and/or communication resources; (b) effective strategies that can be selected at run-time for dealing with faults; (c) transparent support for inter-process communication, resource discovery and allocation, and process launch across a variety of platforms; and (d) the ability to launch their applications remotely from their desktop, disconnect from them, and reconnect at a later time to monitor progress. This paper provides an updated description of OpenRTE and discusses its relation to the current grid protocols. In addition, we introduce the concept of resilient computing, a next-generation approach to fault tolerance, and describe how OpenRTE will utilize this concept in the future.
International Parallel and Distributed Processing Symposium | 2004
Ralph H. Castain; William W. Saylor; Howard Jay Siegel
Summary form only given. An ad hoc computing grid is characterized not only by constraints on the available energy and communications bandwidth associated with each participating device, but also by the dynamic nature of the grid itself. This is caused by the mobile nature of the assets connected to the grid (computing devices, sensors, and users), plus the fragility of interconnecting communication links. The challenge, therefore, is to efficiently and robustly manage both computational and communication resources in this dynamic, unpredictable environment. We report on one potential solution that combines Lagrangian techniques with the receding horizon concept used in modern robust control systems.