Ralph H. Castain | Researchain

Publication

Featured research published by Ralph H. Castain.

Lecture Notes in Computer Science | 2004

Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation

Edgar Gabriel; Graham E. Fagg; George Bosilca; Thara Angskun; Jack J. Dongarra; Jeffrey M. Squyres; Vishal Sahay; Prabhanjan Kambadur; Andrew Lumsdaine; Ralph H. Castain; David Daniel; Richard L. Graham; Timothy S. Woodall

A large number of MPI implementations are currently available, each of which emphasizes different aspects of high-performance computing or is intended to solve a specific research problem. The result is a myriad of incompatible MPI implementations, all of which require separate installation, and the combination of which presents significant logistical challenges for end users. Building upon prior research, and influenced by experience gained from the code bases of the LAM/MPI, LA-MPI, and FT-MPI projects, Open MPI is an all-new, production-quality MPI-2 implementation that is fundamentally centered around component concepts. Open MPI provides a unique combination of novel features previously unavailable in an open-source, production-quality implementation of MPI. Its component architecture both provides a stable platform for third-party research and enables the run-time composition of independent software add-ons. This paper presents a high-level overview of the goals, design, and implementation of Open MPI.

International Conference on Cluster Computing | 2006

Open MPI: A High-Performance, Heterogeneous MPI

Richard L. Graham; Galen M. Shipman; Ralph H. Castain; George Bosilca; Andrew Lumsdaine

The growth in the number of generally available, distributed, heterogeneous computing systems places increasing importance on the development of user-friendly tools that enable application developers to efficiently use these resources. Open MPI provides support for several aspects of heterogeneity within a single, open-source MPI implementation. Through careful abstractions, heterogeneous support maintains efficient use of uniform computational platforms. We describe Open MPI's architecture for heterogeneous network and processor support. A key design feature of this implementation is its transparency to the application developer while maintaining very high levels of performance. This is demonstrated with the results of several numerical experiments.

Journal of Parallel and Distributed Computing | 2006

Static allocation of resources to communicating subtasks in a heterogeneous ad hoc grid environment

Sameer Shivle; Howard Jay Siegel; Anthony A. Maciejewski; Prasanna Sugavanam; Tarun Banka; Ralph H. Castain; Kiran Chindam; Steve Dussinger; Prakash Pichumani; Praveen Satyasekaran; William W. Saylor; David Sendek; J. Sousa; Jayashree Sridharan; Jose Velazco

An ad hoc grid is a heterogeneous computing and communication system that allows a group of mobile devices to accomplish a mission, often in a hostile environment. Energy management is a major concern in ad hoc grids. The problem studied here focuses on statically assigning resources in an ad hoc grid to an application composed of communicating subtasks. The goal of the allocation is to minimize the average percentage of energy consumed by the application to execute across the machines in the ad hoc grid, while meeting an application execution time constraint. This pre-computed allocation is then used when the application is deployed in a mission. Six different heuristic approaches of varying time complexities have been designed and compared via simulations to solve this ad hoc grid allocation problem. Also, a lower bound based on the performance metric has been designed to compare the performance of the heuristics developed.

International Parallel and Distributed Processing Symposium | 2004

Static mapping of subtasks in a heterogeneous ad hoc grid environment

Sameer Shivle; Ralph H. Castain; Howard Jay Siegel; Anthony A. Maciejewski; Tarun Banka; Kiran Chindam; Steve Dussinger; Prakash Pichumani; Praveen Satyasekaran; William W. Saylor; David Sendek; J. Sousa; Jayashree Sridharan; Prasanna Sugavanam; Jose Velazco

Summary form only given. An ad hoc grid is a heterogeneous computing and communication system without a fixed infrastructure; all of its components are mobile. Energy management is a major concern in an ad hoc grid. One important aspect of energy management is to minimize the energy consumption during a mission. In an ad hoc grid, communication and computations are deeply intertwined, and any energy optimization must consider both types of activities together rather than separately. The mapping (defined as matching and scheduling) of tasks onto machines with varied computational capabilities has been shown, in general, to be an NP-complete problem. Therefore, heuristic techniques are required to efficiently map tasks to machines in an ad hoc grid so as to minimize the energy consumed due to communication and computation. This research evaluates and compares energy management issues for resource allocation in ad hoc grids using six static heuristics.

Future Generation Computer Systems | 2008

The Open Run-Time Environment (OpenRTE): A transparent multicluster environment for high-performance computing

Ralph H. Castain; Timothy S. Woodall; David Daniel; Jeffrey M. Squyres; Brian Barrett; Graham Edward Fagg

The Open Run-Time Environment (OpenRTE), a spin-off from the Open MPI project, was developed to support distributed high-performance computing applications operating in a heterogeneous environment. The system transparently provides support for interprocess communication, resource discovery and allocation, and process launch across a variety of platforms. In addition, users can launch their applications remotely from their desktop, disconnect from them, and reconnect at a later time to monitor progress. This paper will describe the capabilities of the OpenRTE system, describe its architecture, and discuss future directions for the project.

Lecture Notes in Computer Science | 2004

TEG: A High-Performance, Scalable, Multi-network Point-to-Point Communications Methodology

Timothy S. Woodall; Richard L. Graham; Ralph H. Castain; David Daniel; Mitchel W. Sukalski; Graham E. Fagg; Edgar Gabriel; George Bosilca; Thara Angskun; Jack J. Dongarra; Jeffrey M. Squyres; Vishal Sahay; Prabhanjan Kambadur; Andrew Lumsdaine

TEG is a new component-based methodology for point-to-point messaging. Developed as part of the Open MPI project, TEG provides a configurable fault-tolerant capability for high-performance messaging that utilizes multi-network interfaces where available. Initial performance comparisons with other MPI implementations show comparable ping-pong latencies, but with bandwidths up to 30% higher.

Lecture Notes in Computer Science | 2004

Open MPI's TEG Point-to-Point Communications Methodology: Comparison to Existing Implementations

TEG is a new methodology for point-to-point messaging developed as a part of the Open MPI project. Initial performance measurements are presented, showing comparable ping-pong latencies in a single-NIC configuration, but with bandwidths up to 30% higher than those achieved by other leading MPI implementations. Homogeneous dual-NIC configurations further improved performance, but the heterogeneous case requires continued investigation.

Lecture Notes in Computer Science | 2005

The open run-time environment (OpenRTE): a transparent multi-cluster environment for high-performance computing

Ralph H. Castain; Timothy S. Woodall; David Daniel; Jeffrey M. Squyres; Graham E. Fagg

The Open Run-Time Environment (OpenRTE)—a spin-off from the Open MPI project—was developed to support distributed high-performance computing applications operating in a heterogeneous environment. The system transparently provides support for interprocess communication, resource discovery and allocation, and process launch across a variety of platforms. In addition, users can launch their applications remotely from their desktop, disconnect from them, and reconnect at a later time to monitor progress. This paper will describe the capabilities of the OpenRTE system, describe its architecture, and discuss future directions for the project.

The Journal of Supercomputing | 2007

Creating a transparent, distributed, and resilient computing environment: the OpenRTE project

Ralph H. Castain; Jeffrey M. Squyres

Meeting the future computing needs of the scientific community will likely require the development of petascale computing environments based on the integration of significant numbers of processors into large-scale clusters, and the (possibly heterogeneous) aggregation of multiple clusters for use by individual and/or synchronized applications. Despite the best of efforts, such complex systems dictate that applications must expect to encounter failures of their computing resources and/or networks during the course of execution. The Open Run-Time Environment (OpenRTE) has been designed to support high-performance computing applications in such environments. Gaining acceptance by the user community requires that OpenRTE not only meet basic functional requirements, but also provide users with (a) a transparent interface that avoids the need to customize applications when moving between specific computing and/or communication resources; (b) effective strategies that can be selected at run-time for dealing with faults; (c) transparent support for inter-process communication, resource discovery and allocation, and process launch across a variety of platforms; and (d) the ability to launch their applications remotely from their desktop, disconnect from them, and reconnect at a later time to monitor progress. This paper provides an updated description of OpenRTE and discusses its relation to the current grid protocols. In addition, we introduce the concept of resilient computing, a next-generation approach to fault tolerance, and describe how OpenRTE will utilize this concept in the future.

International Parallel and Distributed Processing Symposium | 2004

Application of Lagrangian receding horizon techniques to resource management in ad hoc grid environments

Ralph H. Castain; William W. Saylor; Howard Jay Siegel

Summary form only given. An ad hoc computing grid is characterized not only by constraints on the available energy and communications bandwidth associated with each participating device, but also by the dynamic nature of the grid itself. This is caused by the mobile nature of the assets connected to the grid (computing devices, sensors, and users), plus the fragility of interconnecting communication links. The challenge, therefore, is to efficiently and robustly manage both computational and communication resources in this dynamic, unpredictable environment. We report on one potential solution that combines Lagrangian techniques with the receding horizon concept used in modern robust control systems.
