Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Galen M. Shipman is active.

Publication


Featured research published by Galen M. Shipman.


International Conference on Cluster Computing | 2006

Open MPI: A High-Performance, Heterogeneous MPI

Richard L. Graham; Galen M. Shipman; Ralph H. Castain; George Bosilca; Andrew Lumsdaine

The growth in the number of generally available, distributed, heterogeneous computing systems places increasing importance on the development of user-friendly tools that enable application developers to efficiently use these resources. Open MPI provides support for several aspects of heterogeneity within a single, open-source MPI implementation. Through careful abstractions, heterogeneous support maintains efficient use of uniform computational platforms. We describe Open MPI's architecture for heterogeneous network and processor support. A key design feature of this implementation is its transparency to the application developer, maintained alongside very high levels of performance. This is demonstrated with the results of several numerical experiments.
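
For illustration only (this sketch is not from the paper): heterogeneity support of this kind is transparent at the application level, so the same MPI program runs unchanged across mixed platforms while the library handles any needed conversions. A minimal mpi4py example:

    # Minimal sketch (assumes an MPI installation and mpi4py): the
    # application code is identical on every node; the MPI layer, e.g.
    # Open MPI, deals with platform differences underneath.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    # Each rank reports the (possibly different) platform it runs on.
    info = {"rank": rank, "host": MPI.Get_processor_name()}

    # Gather onto rank 0 without any platform-specific code.
    for entry in comm.gather(info, root=0) or []:
        print(entry)

Run with, e.g., mpirun -np 4 python hello_heterogeneous.py.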


International Symposium on Performance Analysis of Systems and Software | 2011

A semi-preemptive garbage collector for solid state drives

Junghee Lee; Young-Jae Kim; Galen M. Shipman; H. Sarp Oral; Feiyi Wang; Jongman Kim

NAND flash memory is a preferred storage medium for various platforms ranging from embedded systems to enterprise-scale systems. Flash devices do not have any mechanical moving parts and provide low-latency access. They also require less power compared to rotating media. Unlike hard disks, flash devices use out-of-place update operations, and they require a garbage collection (GC) process to reclaim invalid pages and create free blocks. This GC process is a major cause of performance degradation when running concurrently with other I/O operations, as internal bandwidth is consumed to reclaim these invalid pages. The invocation of the GC process is generally governed by a low watermark on free blocks and other internal device metrics that different workloads meet at different intervals. This results in I/O performance that is highly dependent on workload characteristics. In this paper, we examine the GC process and propose a semi-preemptive GC scheme that can preempt ongoing GC processing and service pending I/O requests in the queue. Moreover, we further enhance flash performance by pipelining internal GC operations and merging them with pending I/O requests whenever possible. Our experimental evaluation of this semi-preemptive GC scheme with realistic workloads demonstrates both improved performance and reduced performance variability. Write-dominant workloads show up to a 66.56% improvement in average response time with an 83.30% reduction in response-time variance compared to the non-preemptive GC scheme.
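
A sketch of the core idea (hypothetical names, not the authors' FTL implementation): insert preemption points between page copies so that pending I/O is serviced part-way through a collection instead of waiting for the whole block to be reclaimed.

    # Illustrative Python sketch of semi-preemptive GC; all names are
    # hypothetical and the real logic lives inside the drive's FTL.
    from collections import deque

    io_queue = deque()  # pending host I/O requests (callables here)

    class Page:
        def __init__(self, data, valid=True):
            self.data, self.valid = data, valid

    class Block:
        def __init__(self, pages=()):
            self.pages = list(pages)
        def write(self, page):
            self.pages.append(page)
        def erase(self):
            self.pages.clear()

    def service_pending_io():
        # Preemption point: drain any I/O that arrived during collection.
        while io_queue:
            io_queue.popleft()()

    def semi_preemptive_gc(victim, free_block):
        # Move valid pages one at a time, serving queued I/O in between,
        # then erase the victim block to reclaim it.
        for page in victim.pages:
            if page.valid:
                free_block.write(page)
            service_pending_io()
        victim.erase()

    io_queue.append(lambda: print("read serviced mid-GC"))
    semi_preemptive_gc(Block([Page("a"), Page("b", valid=False)]), Block())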


European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface | 2008

MPI Support for Multi-core Architectures: Optimized Shared Memory Collectives

Richard L. Graham; Galen M. Shipman

With local core counts on the rise, taking advantage of shared memory to optimize collective operations can improve performance. We study several on-host shared-memory-optimized algorithms for MPI_Bcast, MPI_Reduce, and MPI_Allreduce, using tree-based and reduce-scatter algorithms. For small data operations with relatively large synchronization costs, fan-in/fan-out algorithms generally perform best. For large messages, data manipulation constitutes the largest cost, and reduce-scatter algorithms are best for reductions. These optimizations improve performance by up to a factor of three. Memory- and cache-sharing effects require deliberate process layout and careful radix selection for tree-based methods.
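
The radix-selection point can be made concrete with a small sketch (illustrative only, not the paper's algorithms): in a radix-k fan-out tree, a larger radix gives a shallower tree, at the cost of each node sending to more children per step.

    # Sketch of radix selection for a tree-based broadcast: compare tree
    # depth (number of fan-out steps) across radices for 16 processes.
    def kary_children(rank, radix, nprocs):
        # Children of `rank` in a radix-k tree rooted at rank 0.
        first = rank * radix + 1
        return [c for c in range(first, first + radix) if c < nprocs]

    def depth_of(rank, radix):
        # Hops from the root: the parent of rank c is (c - 1) // radix.
        depth = 0
        while rank > 0:
            rank = (rank - 1) // radix
            depth += 1
        return depth

    nprocs = 16
    for radix in (2, 4, 8):
        depth = max(depth_of(r, radix) for r in range(nprocs))
        print(f"radix {radix}: {depth} steps, "
              f"root's children {kary_children(0, radix, nprocs)}")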


International Parallel and Distributed Processing Symposium | 2006

Infiniband scalability in Open MPI

Galen M. Shipman; Timothy S. Woodall; Richard L. Graham; Arthur B. Maccabe; Patrick G. Bridges

InfiniBand is becoming an important interconnect technology in high performance computing. Efforts in large-scale InfiniBand deployments are raising scalability questions in the HPC community. Open MPI, a new open-source implementation of the MPI standard targeted for production computing, provides several mechanisms to enhance InfiniBand scalability. Initial comparisons with MVAPICH, the most widely used InfiniBand MPI implementation, show similar performance but with much better scalability characteristics. Specifically, small-message latency is improved by up to 10% in medium/large jobs and memory usage per host is reduced by as much as 300%. In addition, Open MPI provides predictable latency that is close to optimal without sacrificing bandwidth performance.
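
Back-of-the-envelope arithmetic (all numbers below are assumptions for illustration, not measurements from the paper) shows why per-peer receive buffering becomes a memory problem at scale, and hence why such scalability mechanisms matter:

    # Illustrative arithmetic: with dedicated, pre-posted receive buffers
    # per peer, MPI memory grows linearly with job size. Numbers are
    # assumed, not taken from the paper.
    buf_size = 64 * 1024   # bytes per pre-posted receive buffer (assumed)
    bufs_per_peer = 8      # buffers pre-posted per connection (assumed)

    for nprocs in (64, 1024, 8192):
        per_process = (nprocs - 1) * bufs_per_peer * buf_size
        print(f"{nprocs:5d} ranks -> {per_process / 2**20:7.0f} MiB "
              "of receive buffers per process")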


Petascale Data Storage Workshop | 2010

Workload characterization of a leadership class storage cluster

Young-Jae Kim; Raghul Gunasekaran; Galen M. Shipman; David A. Dillow; Zhe Zhang; Bradley W. Settlemyer

Understanding workload characteristics is critical for optimizing and improving the performance of current systems and software, and for architecting new storage systems based on observed workload patterns. In this paper, we characterize the scientific workloads of the world's fastest HPC (High Performance Computing) storage cluster, Spider, at the Oak Ridge Leadership Computing Facility (OLCF). Spider provides an aggregate bandwidth of over 240 GB/s with over 10 petabytes of RAID 6 formatted capacity. OLCF's flagship petascale simulation platform, Jaguar, and other large HPC clusters, with more than 250 thousand compute cores in total, depend on Spider for their I/O needs. We characterize the system utilization, the demands of reads and writes, idle time, and the distribution of read requests relative to write requests for the storage system observed over a period of 6 months. From this study we develop synthesized workloads, and we show that the read and write I/O bandwidth usage as well as the inter-arrival time of requests can be modeled as a Pareto distribution.
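
The Pareto-distribution modeling step can be sketched in a few lines (the shape and scale below are hypothetical, not the fitted values from the study). numpy's sampler draws from the Lomax (Pareto II) distribution, so classical Pareto samples are obtained by shifting and scaling:

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical parameters: shape alpha and minimum inter-arrival x_m.
    alpha, x_m = 1.5, 0.01
    inter_arrivals = (rng.pareto(alpha, 100_000) + 1) * x_m  # seconds

    # Heavy tail: the mean is pulled far above the median by rare,
    # very long gaps between requests.
    print("mean:", inter_arrivals.mean(), "median:", np.median(inter_arrivals))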


Future Generation Computer Systems | 2014

The Earth System Grid Federation: An open infrastructure for access to distributed geospatial data

Luca Cinquini; Daniel J. Crichton; Chris A. Mattmann; John Harney; Galen M. Shipman; Feiyi Wang; Rachana Ananthakrishnan; Neill Miller; Sebastian Denvil; Mark Morgan; Zed Pobre; Gavin M. Bell; Charles Doutriaux; Robert S. Drach; Dean N. Williams; Philip Kershaw; Stephen Pascoe; Estanislao Gonzalez; Sandro Fiore; Roland Schweitzer

The Earth System Grid Federation (ESGF) is a multi-agency, international collaboration that aims to develop the software infrastructure needed to facilitate and empower the study of climate change on a global scale. The ESGF's architecture employs a system of geographically distributed peer nodes, which are independently administered yet united by the adoption of common federation protocols and application programming interfaces (APIs). The cornerstones of its interoperability are the peer-to-peer messaging that is continuously exchanged among all nodes in the federation; a shared architecture and API for search and discovery; and a security infrastructure based on industry standards (OpenID, SSL, GSI and SAML). The ESGF software stack integrates custom components (for data publishing, searching, user interface, security and messaging), developed collaboratively by the team, with popular application engines (Tomcat, Solr) available from the open source community. The full ESGF infrastructure has now been adopted by multiple Earth science projects and allows access to petabytes of geophysical data, including the entire output of Phase 5 of the Coupled Model Intercomparison Project (CMIP5) used by the Intergovernmental Panel on Climate Change (IPCC) Fifth Assessment Report (AR5), along with a suite of satellite observations (obs4MIPs) and reanalysis data sets (ana4MIPs). This paper presents ESGF as a successful example of integrating disparate open source technologies into a cohesive, broadly functional system, and describes our experience in building and operating a distributed and federated infrastructure to serve the needs of the global climate science community.
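
A federated search is exercised with a plain HTTP request against any index node. In the sketch below, the host, endpoint path, and parameter names are written from memory and should be treated as assumptions to verify against the current ESGF search API documentation:

    import requests

    # Hedged sketch of an ESGF search query; endpoint and parameters are
    # assumptions, and the Solr-style response shape may differ by version.
    resp = requests.get(
        "https://esgf-node.llnl.gov/esg-search/search",
        params={
            "project": "CMIP5",
            "variable": "tas",  # near-surface air temperature
            "limit": 5,
            "format": "application/solr+json",
        },
        timeout=30,
    )
    for doc in resp.json()["response"]["docs"]:
        print(doc.get("id"))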


PVM/MPI '07: Proceedings of the 14th European Conference on Recent Advances in Parallel Virtual Machine and Message Passing Interface | 2007

A case for standard non-blocking collective operations

Torsten Hoefler; Prabhanjan Kambadur; Richard L. Graham; Galen M. Shipman; Andrew Lumsdaine

In this paper we make the case for adding standard non-blocking collective operations to the MPI standard. The non-blocking point-to-point and blocking collective operations currently defined by MPI provide important performance and abstraction benefits. To allow these benefits to be simultaneously realized, we present an application programming interface for non-blocking collective operations in MPI. Microbenchmark and application-based performance results demonstrate that non-blocking collective operations offer not only improved convenience, but improved performance as well, when compared to manual use of threads with blocking collectives.
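
The interface argued for here was subsequently standardized in MPI-3, so the pattern can now be written directly (a minimal mpi4py sketch, assuming an MPI-3 implementation): start the collective, overlap it with independent work, and complete it only when the result is needed.

    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    send = np.full(4, comm.Get_rank(), dtype="d")
    recv = np.empty(4, dtype="d")

    # Start a non-blocking reduction (MPI_Iallreduce under the hood).
    req = comm.Iallreduce(send, recv, op=MPI.SUM)

    # Overlap: any computation independent of `recv` can proceed here.
    local = np.sin(send).sum()

    # Complete the collective when its result is actually required.
    req.Wait()
    print(comm.Get_rank(), recv, local)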


Advanced Structural and Chemical Imaging | 2015

Big data and deep data in scanning and electron microscopies: deriving functionality from multidimensional data sets

Alex Belianinov; Rama K. Vasudevan; Evgheni Strelcov; Chad A. Steed; Sang Mo Yang; Alexander Tselev; Stephen Jesse; Michael D. Biegalski; Galen M. Shipman; Christopher T. Symons; Albina Y. Borisevich; Richard K. Archibald; Sergei V. Kalinin

The development of electron and scanning probe microscopies in the second half of the twentieth century has produced spectacular images of the internal structure and composition of matter with nanometer, molecular, and atomic resolution. Largely, this progress was enabled by computer-assisted methods of microscope operation, data acquisition, and analysis. Advances in imaging technology at the beginning of the twenty-first century have opened the proverbial floodgates on the availability of high-veracity information on structure and functionality. From the hardware perspective, high-resolution imaging methods now routinely resolve atomic positions with approximately picometer precision, allowing for quantitative measurements of individual bond lengths and angles. Similarly, functional imaging often leads to multidimensional data sets containing partial or full information on properties of interest, acquired as a function of multiple parameters (time, temperature, or other external stimuli). Here, we review several recent applications of big and deep data analysis methods to visualize, compress, and translate this multidimensional structural and functional data into physically and chemically relevant information.
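
One representative "deep data" step can be sketched as follows (the synthetic data and the choice of scikit-learn are mine, not the authors'): unfold a multidimensional acquisition into a pixels-by-channels matrix and extract a few principal components, yielding per-pixel component maps.

    import numpy as np
    from sklearn.decomposition import PCA

    # Synthetic stand-in for a hyperspectral acquisition: x, y, spectrum.
    rng = np.random.default_rng(0)
    cube = rng.normal(size=(64, 64, 200))

    # Unfold to (pixels, channels), then reduce to a few components.
    unfolded = cube.reshape(-1, cube.shape[-1])
    pca = PCA(n_components=4)
    scores = pca.fit_transform(unfolded)

    # Reshape the scores back into per-pixel component maps.
    maps = scores.reshape(64, 64, 4)
    print(pca.explained_variance_ratio_)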


High Performance Interconnects | 2011

The Common Communication Interface (CCI)

Scott Atchley; David A. Dillow; Galen M. Shipman; Patrick Geoffray; Jeffrey M. Squyres; George Bosilca; Ronald G. Minnich

There are many APIs for connecting and exchanging data between network peers. Each interface varies wildly based on metrics including performance, portability, and complexity. Specifically, many interfaces make design or implementation choices emphasizing some of the more desirable metrics (e.g., performance) while sacrificing others (e.g., portability). As a direct result, software developers building large, network-based applications are forced to choose a specific network API based on a complex, multi-dimensional set of criteria. Such trade-offs inevitably result in an interface that fails to deliver some desirable features. In this paper, we introduce a novel interface that both supports many features that have become standard (or otherwise generally expected) in other communication interfaces, and strives to export a small, yet powerful, interface. This new interface draws upon years of experience, ranging from network-oriented software development best practices to systems-level implementations. The goal is to create a relatively simple, high-level communication interface with low barriers to adoption while still providing important features such as scalability, resiliency, and performance. The result is the Common Communication Interface (CCI): an intuitive API that is portable, efficient, scalable, and robust to meet the needs of network-intensive applications common in HPC and cloud computing.


Measurement and Modeling of Computer Systems | 2012

D-factor: a quantitative model of application slow-down in multi-resource shared systems

Seung-Hwan Lim; Jae-Seok Huh; Young-Jae Kim; Galen M. Shipman; Chita R. Das

Scheduling multiple jobs onto a platform enhances system utilization by sharing resources. The benefits from higher resource utilization include reduced cost to construct, operate, and maintain a system, which often includes energy consumption. Maximizing these benefits, while satisfying performance limits, comes at a price -- resource contention among jobs increases job completion time. In this paper, we analyze the slow-downs of jobs due to contention for multiple resources in a system, referred to as the dilation factor. We observe that multiple-resource contention creates non-linear dilation factors of jobs. From this observation, we establish a general quantitative model for dilation factors of jobs in multi-resource systems. A job is characterized by vector-valued loading statistics, and the dilation factors of a job set are given by a quadratic function of their loading vectors. We demonstrate how to systematically characterize a job, maintain the data structure used to calculate the dilation factor (the loading matrix), and calculate the dilation factor of each job. We validated the accuracy of the model with multiple processes running on a native Linux server, virtualized servers, and multiple MapReduce workloads co-scheduled in a cluster. Evaluation with measured data shows that the D-factor model has an error margin of less than 16%. We also show that the model can be integrated with an existing on-line scheduler to minimize the makespan of workloads.
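
The abstract fixes only the shape of the model, a quadratic function of loading vectors through a loading matrix, so the snippet below is one plausible instantiation with hypothetical numbers, not the paper's calibrated model:

    import numpy as np

    # Hypothetical loading matrix M: interference weights between the
    # resource dimensions (CPU, disk, network). Assumed, not calibrated.
    M = np.array([[0.2, 0.5, 0.1],
                  [0.5, 0.9, 0.3],
                  [0.1, 0.3, 0.4]])

    # Each job is characterized by a loading vector over the resources.
    jobs = {"cpu_bound": np.array([0.9, 0.1, 0.1]),
            "io_bound":  np.array([0.1, 0.8, 0.2])}

    def dilation(job, co_runners):
        # Slow-down factor >= 1; 1.0 means no contention at all.
        return 1.0 + sum(jobs[job] @ M @ jobs[c] for c in co_runners)

    print("cpu_bound with io_bound:", dilation("cpu_bound", ["io_bound"]))
    print("io_bound with cpu_bound:", dilation("io_bound", ["cpu_bound"]))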

Collaboration


Dive into Galen M. Shipman's collaborations.

Top Co-Authors

David A. Dillow, Oak Ridge National Laboratory
Feiyi Wang, Oak Ridge National Laboratory
H. Sarp Oral, Oak Ridge National Laboratory
Dean N. Williams, Lawrence Livermore National Laboratory
Ross Miller, Oak Ridge National Laboratory
Chad A. Steed, Oak Ridge National Laboratory
Junghee Lee, University of Texas at San Antonio
Patrick S. McCormick, Los Alamos National Laboratory