Publication


Featured research published by Robert Latham.


International Conference on Cluster Computing | 2009

Scalable I/O forwarding framework for high-performance computing systems

Nawab Ali; Philip H. Carns; Kamil Iskra; Dries Kimpe; Samuel Lang; Robert Latham; Robert B. Ross; Lee Ward; P. Sadayappan

Current leadership-class machines suffer from a significant imbalance between their computational power and their I/O bandwidth. While Moore's law ensures that the computational power of high-performance computing systems increases with every generation, the same is not true for their I/O subsystems. The scalability challenges faced by existing parallel file systems with respect to the increasing number of clients, coupled with the minimalistic compute node kernels running on these machines, call for a new I/O paradigm to meet the requirements of data-intensive scientific applications. I/O forwarding is a technique that attempts to bridge the increasing performance and scalability gap between the compute and I/O components of leadership-class machines by shipping I/O calls from compute nodes to dedicated I/O nodes. The I/O nodes perform operations on behalf of the compute nodes and can reduce file system traffic by aggregating, rescheduling, and caching I/O requests. This paper presents an open, scalable I/O forwarding framework for high-performance computing systems. We describe an I/O protocol and API for shipping function calls from compute nodes to I/O nodes, and we present a quantitative analysis of the overhead associated with I/O forwarding.
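
The framework in the paper defines its own forwarding protocol and API; the sketch below only illustrates the core idea in plain MPI: a compute rank packs a small request header and its payload, ships both to a dedicated I/O rank, and that rank performs the POSIX write on its behalf. The io_request struct, the tag scheme, and the file name out.dat are all hypothetical, not the paper's interface.

```c
/* Illustrative sketch of I/O forwarding (not the paper's actual API):
 * compute ranks ship write requests to one dedicated I/O rank, which
 * performs the POSIX I/O on their behalf. Assumes homogeneous nodes so
 * the request struct can be sent as raw bytes. */
#include <mpi.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

typedef struct {
    int32_t op;      /* 1 = write */
    int64_t offset;  /* file offset in bytes */
    int64_t length;  /* payload length in bytes */
} io_request;

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int io_rank = size - 1;              /* last rank plays "I/O node" */

    if (rank == io_rank) {
        int fd = open("out.dat", O_CREAT | O_WRONLY, 0644);
        for (int i = 0; i < size - 1; i++) {   /* one request per compute rank */
            io_request req;
            MPI_Status st;
            char buf[64];
            MPI_Recv(&req, sizeof req, MPI_BYTE, MPI_ANY_SOURCE, 0,
                     MPI_COMM_WORLD, &st);
            MPI_Recv(buf, (int)req.length, MPI_BYTE, st.MPI_SOURCE, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            pwrite(fd, buf, req.length, req.offset);  /* I/O on the sender's behalf */
        }
        close(fd);
    } else {
        char payload[64];
        snprintf(payload, sizeof payload, "rank %d data\n", rank);
        io_request req = { 1, (int64_t)rank * 64, (int64_t)strlen(payload) };
        MPI_Send(&req, sizeof req, MPI_BYTE, io_rank, 0, MPI_COMM_WORLD);
        MPI_Send(payload, (int)req.length, MPI_BYTE, io_rank, 1, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```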


IEEE International Conference on High Performance Computing Data and Analytics | 2009

I/O performance challenges at leadership scale

Samuel Lang; Philip H. Carns; Robert Latham; Robert B. Ross; Kevin Harms; William E. Allcock

Today's top high performance computing systems run applications with hundreds of thousands of processes, contain hundreds of storage nodes, and must meet massive I/O requirements for capacity and performance. These leadership-class systems face daunting challenges to deploying scalable I/O systems. In this paper we present a case study of the I/O challenges to performance and scalability on Intrepid, the IBM Blue Gene/P system at the Argonne Leadership Computing Facility. Listed among the top 5 fastest supercomputers of 2008, Intrepid runs computational science applications with intensive demands on the I/O system. We show that Intrepid's file and storage systems sustain high performance under varying workloads as the applications scale with the number of processes.


International Conference on Cluster Computing | 2009

24/7 Characterization of petascale I/O workloads

Philip H. Carns; Robert Latham; Robert B. Ross; Kamil Iskra; Samuel Lang; Katherine Riley

Developing and tuning computational science applications to run on extreme scale systems are increasingly complicated processes. Challenges such as managing memory access and tuning message-passing behavior are made easier by tools designed specifically to aid in these processes. Tools that can help users better understand the behavior of their application with respect to I/O have not yet reached the level of utility necessary to play a central role in application development and tuning. This deficiency in the tool set means that we have a poor understanding of how specific applications interact with storage. Worse, the community has little knowledge of what sorts of access patterns are common in today's applications, leading to confusion in the storage research community as to the pressing needs of the computational science community. This paper describes the Darshan I/O characterization tool. Darshan is designed to capture an accurate picture of application I/O behavior, including properties such as patterns of access within files, with the minimum possible overhead. This characterization can shed important light on the I/O behavior of applications at extreme scale. Darshan can also enable researchers to gain greater insight into the overall patterns of access exhibited by such applications, helping the storage community to understand how to best serve current computational science applications and better predict the needs of future applications. In this work we demonstrate Darshan's ability to characterize the I/O behavior of four scientific applications and show that it induces negligible overhead for I/O intensive jobs with as many as 65,536 processes.
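
Darshan itself intercepts MPI-IO and POSIX calls and records compact per-file counters rather than a full trace. The sketch below only illustrates that counter-accumulation idea; the traced_write wrapper, counter layout, and file name are hypothetical and do not reflect Darshan's actual implementation or API.

```c
/* Minimal sketch of the counter-accumulation idea behind an I/O
 * characterization tool: wrap the write path and keep compact per-file
 * statistics instead of a full trace. Illustration only, not Darshan. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

typedef struct {
    char   name[256];
    long   write_calls;
    long   bytes_written;
    size_t max_write_size;
} file_counters;

static file_counters stats = { "out.dat", 0, 0, 0 };

/* Hypothetical wrapper that an instrumentation layer would install
 * around write(); here it is simply called directly. */
static ssize_t traced_write(int fd, const void *buf, size_t count)
{
    stats.write_calls++;
    stats.bytes_written += (long)count;
    if (count > stats.max_write_size)
        stats.max_write_size = count;
    return write(fd, buf, count);
}

int main(void)
{
    const char msg[] = "hello\n";
    traced_write(STDOUT_FILENO, msg, strlen(msg));
    fprintf(stderr, "%s: %ld writes, %ld bytes, max %zu\n",
            stats.name, stats.write_calls, stats.bytes_written,
            stats.max_write_size);
    return 0;
}
```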


ACM Transactions on Storage | 2011

Understanding and Improving Computational Science Storage Access through Continuous Characterization

Philip H. Carns; Kevin Harms; William E. Allcock; Charles Bacon; Samuel Lang; Robert Latham; Robert B. Ross

Computational science applications are driving a demand for increasingly powerful storage systems. While many techniques are available for capturing the I/O behavior of individual application trial runs and specific components of the storage system, continuous characterization of a production system remains a daunting challenge for systems with hundreds of thousands of compute cores and multiple petabytes of storage. As a result, these storage systems are often designed without a clear understanding of the diverse computational science workloads they will support.


International Conference on Parallel Processing | 2011

Compressing the incompressible with ISABELA: in-situ reduction of spatio-temporal data

Sriram Lakshminarasimhan; Neil Shah; Stephane Ethier; Scott Klasky; Robert Latham; Robert B. Ross; Nagiza F. Samatova

Modern large-scale scientific simulations running on HPC systems generate data on the order of terabytes during a single run. To lessen the I/O load during a simulation run, scientists are forced to capture data infrequently, thereby making data collection an inherently lossy process. Yet, lossless compression techniques are hardly suitable for scientific data due to its inherently random nature; for the applications used here, they offer less than a 10% compression ratio. They also impose significant overhead during decompression, making them unsuitable for data analysis and visualization that require repeated data access. To address this problem, we propose an effective method for In-situ Sort-And-B-spline Error-bounded Lossy Abatement (ISABELA) of scientific data that is widely regarded as effectively incompressible. With ISABELA, we apply a preconditioner to seemingly random and noisy data along spatial resolution to achieve an accurate fitting model that guarantees a ≥ 0.99 correlation with the original data. We further take advantage of temporal patterns in scientific data to compress data by ≈ 85%, while introducing only a negligible overhead on simulations in terms of runtime. ISABELA significantly outperforms existing lossy compression methods, such as Wavelet compression. Moreover, besides being a communication-free and scalable compression technique, ISABELA is an inherently local decompression method; that is, it does not need to decode the entire dataset, making it attractive for random access.
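
The central step in the method is a sort-based preconditioner: each window of data is sorted into a monotone, smooth sequence (with the permutation stored) that is then fit with B-splines under an error bound. The sketch below is only a minimal illustration of that idea: linear interpolation between a handful of knots stands in for the B-spline fit, the window size and knot count are arbitrary, and error bounding is omitted.

```c
/* Sketch of sort-based preconditioning on one window of data: sort the
 * values (storing the permutation), keep a few knots from the monotone
 * sequence, then reconstruct by interpolating and undoing the sort. */
#include <stdio.h>
#include <stdlib.h>

#define N     32    /* window size (illustrative) */
#define KNOTS 5     /* knots kept per window (illustrative) */

static const double *sort_key;
static int cmp_idx(const void *a, const void *b)
{
    double da = sort_key[*(const int *)a], db = sort_key[*(const int *)b];
    return (da > db) - (da < db);
}

int main(void)
{
    double data[N], sorted[N], knots[KNOTS], recon[N];
    int    perm[N];

    for (int i = 0; i < N; i++) {               /* noisy-looking input */
        data[i] = (double)((i * 2654435761u) % 1000) / 1000.0;
        perm[i] = i;
    }

    sort_key = data;                            /* sort indices by value */
    qsort(perm, N, sizeof perm[0], cmp_idx);
    for (int i = 0; i < N; i++)
        sorted[i] = data[perm[i]];              /* monotone, smooth sequence */

    for (int k = 0; k < KNOTS; k++)             /* keep only a few knots */
        knots[k] = sorted[k * (N - 1) / (KNOTS - 1)];

    /* "Decompress": interpolate between knots, then undo the permutation. */
    for (int i = 0; i < N; i++) {
        double pos = (double)i * (KNOTS - 1) / (N - 1);
        int    k   = (int)pos;
        double t   = pos - k;
        double v   = (k + 1 < KNOTS) ? (1 - t) * knots[k] + t * knots[k + 1]
                                     : knots[k];
        recon[perm[i]] = v;
    }

    for (int i = 0; i < 4; i++)
        printf("x[%d]=%.3f  ~  %.3f\n", i, data[i], recon[i]);
    return 0;
}
```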


High-Performance Computer Architecture | 2006

High performance file I/O for the Blue Gene/L supercomputer

Hao Yu; Ramendra K. Sahoo; C. Howson; George S. Almasi; José G. Castaños; Manish Gupta; José E. Moreira; Jeffrey J. Parker; Thomas Eugene Engelsiepen; Robert B. Ross; Rajeev Thakur; Robert Latham; William Gropp

Parallel I/O plays a crucial role for most data-intensive applications running on massively parallel systems like Blue Gene/L, which promises to deliver enormous computational capability. We designed and implemented a highly scalable parallel file I/O architecture for Blue Gene/L, which leverages the benefit of the hierarchical and functional partitioning design of the system software with separate computational and I/O cores. The architecture exploits the scalability of GPFS (General Parallel File System) at the backend, while using MPI I/O as an interface between the application I/O and the file system. We demonstrate the impact of our high performance I/O solution for Blue Gene/L with a comprehensive evaluation that consists of a number of widely used parallel I/O benchmarks and I/O intensive applications. Our design and implementation not only delivers at least an order-of-magnitude speedup in I/O bandwidth for a real-scale application, HOMME (achieving aggregate bandwidths of 1.8 GB/s and 2.3 GB/s for write and read accesses, respectively), but also supports high-level parallel I/O data interfaces such as parallel HDF5 and parallel NetCDF scaling up to a large number of processors.
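
The architecture routes application I/O through MPI-IO on top of GPFS. For reference, a minimal MPI-IO collective write in C looks like the sketch below; the file name and block size are arbitrary, and real applications would typically layer HDF5 or parallel NetCDF on top of this interface.

```c
/* Minimal MPI-IO collective write: each rank writes its block of a
 * shared file at a rank-dependent offset. The file name is arbitrary. */
#include <mpi.h>

#define BLOCK 1024

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int buf[BLOCK];
    for (int i = 0; i < BLOCK; i++)
        buf[i] = rank;

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "shared.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Collective call: all ranks participate, allowing the MPI-IO layer
     * to aggregate and reorder requests before they hit the file system. */
    MPI_Offset offset = (MPI_Offset)rank * BLOCK * sizeof(int);
    MPI_File_write_at_all(fh, offset, buf, BLOCK, MPI_INT, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
```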


IEEE Conference on Mass Storage Systems and Technologies | 2011

Understanding and improving computational science storage access through continuous characterization

Philip H. Carns; Kevin Harms; William E. Allcock; Charles Bacon; Samuel Lang; Robert Latham; Robert B. Ross

Computational science applications are driving a demand for increasingly powerful storage systems. While many techniques are available for capturing the I/O behavior of individual application trial runs and specific components of the storage system, continuous characterization of a production system remains a daunting challenge for systems with hundreds of thousands of compute cores and multiple petabytes of storage. As a result, these storage systems are often designed without a clear understanding of the diverse computational science workloads they will support.


IEEE International Conference on High Performance Computing Data and Analytics | 2011

ISABELA-QA: query-driven analytics with ISABELA-compressed extreme-scale scientific data

Sriram Lakshminarasimhan; John Jenkins; Zhenhuan Gong; Hemanth Kolla; S. Ku; Stephane Ethier; J.H. Chen; Choong-Seock Chang; Scott Klasky; Robert Latham; Robert B. Ross; Nagiza F. Samatova

Efficient analytics of scientific data from extreme-scale simulations is quickly becoming a top priority. The increasing simulation output data sizes demand a paradigm shift in how analytics is conducted. In this paper, we argue that query-driven analytics over compressed - rather than original, full-size - data is a promising strategy for meeting storage- and I/O-bound application challenges. As a proof of principle, we propose a parallel query processing engine, called ISABELA-QA, that is designed and optimized for knowledge-prior-driven analytical processing of spatio-temporal, multivariate scientific data that is initially compressed, in situ, by our ISABELA technology. With ISABELA-QA, the total data storage requirement is less than 23%-30% of the original data, up to eight-fold less than existing state-of-the-art data management technologies that require storing both the original data and the index. Since ISABELA-QA operates on the metadata generated by our compression technology, its underlying indexing technology for efficient query processing is lightweight; it requires less than 3% of the original data, unlike existing database indexing approaches that require 30%-300% of the original data. Moreover, ISABELA-QA is specifically optimized to retrieve the actual values rather than spatial regions for the variables that satisfy user-specified range queries - a functionality that is critical for high-accuracy data analytics. To the best of our knowledge, this is the first technology that enables query-driven analytics over compressed spatio-temporal floating-point double- or single-precision data, while offering a lightweight memory and disk storage footprint with parallel, scalable, multi-node, multi-core, GPU-based query processing.
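
The engine answers range queries against compact metadata derived from the compressed windows rather than against the full data. The sketch below illustrates only that filtering idea, using per-window minimum/maximum values to rule windows out before touching their contents; the window layout and data are made up, and the real system's index and ISABELA decoding are not shown.

```c
/* Sketch of query-driven analytics over compressed windows: keep
 * per-window min/max metadata and touch only the windows whose value
 * range intersects the query. The in-memory windows here stand in for
 * data that would otherwise be decompressed on demand. */
#include <stdio.h>

#define WINDOWS 4
#define WSIZE   8

int main(void)
{
    double data[WINDOWS][WSIZE];
    double wmin[WINDOWS], wmax[WINDOWS];

    for (int w = 0; w < WINDOWS; w++) {         /* fill and index windows */
        wmin[w] = 1e300; wmax[w] = -1e300;
        for (int i = 0; i < WSIZE; i++) {
            data[w][i] = w * 10.0 + i;
            if (data[w][i] < wmin[w]) wmin[w] = data[w][i];
            if (data[w][i] > wmax[w]) wmax[w] = data[w][i];
        }
    }

    double lo = 12.0, hi = 23.0;                /* range query [lo, hi] */
    for (int w = 0; w < WINDOWS; w++) {
        if (wmax[w] < lo || wmin[w] > hi)       /* metadata rules window out */
            continue;
        for (int i = 0; i < WSIZE; i++)         /* "decompress" and filter */
            if (data[w][i] >= lo && data[w][i] <= hi)
                printf("window %d: %.1f\n", w, data[w][i]);
    }
    return 0;
}
```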


International Conference on Data Engineering | 2012

ISOBAR Preconditioner for Effective and High-throughput Lossless Data Compression

Eric R. Schendel; Ye Jin; Neil Shah; J.H. Chen; Choong-Seock Chang; S. Ku; Stephane Ethier; Scott Klasky; Robert Latham; Robert B. Ross; Nagiza F. Samatova

Efficient handling of large volumes of data is a necessity for exascale scientific applications and database systems. To address the growing imbalance between the amount of available storage and the amount of data being produced by high speed (FLOPS) processors on the system, data must be compressed to reduce the total amount of data placed on the file systems. General-purpose lossless compression frameworks, such as zlib and bzip2, are commonly used on datasets requiring lossless compression. Quite often, however, many scientific data sets compress poorly, referred to as hard-to-compress datasets, due to the negative impact of highly entropic content represented within the data. An important problem in improving lossless data compression is to identify the hard-to-compress information and subsequently optimize the compression techniques at the byte level. To address this challenge, we introduce the In-Situ Orthogonal Byte Aggregate Reduction Compression (ISOBAR-compress) methodology as a preconditioner of lossless compression to identify and optimize the compression efficiency and throughput of hard-to-compress datasets.
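
The preconditioner works at the byte level: it analyzes the byte columns of the floating-point representation and routes only the columns that look compressible to the lossless compressor. The sketch below is a rough illustration of that analysis, using a distinct-value count per byte column as a stand-in for the paper's frequency-based test; the data, threshold, and column split are all hypothetical.

```c
/* Sketch of byte-column analysis in the spirit of a lossless-compression
 * preconditioner: view an array of doubles as an 8-column byte matrix and
 * flag columns with few distinct values as compressible. */
#include <stdio.h>
#include <string.h>

#define N 256

int main(void)
{
    double vals[N];
    for (int i = 0; i < N; i++)
        vals[i] = 1000.0 + i * 0.001;           /* smooth, hypothetical data */

    unsigned char bytes[N][sizeof(double)];
    for (int i = 0; i < N; i++)
        memcpy(bytes[i], &vals[i], sizeof(double));

    for (size_t col = 0; col < sizeof(double); col++) {
        int seen[256] = {0}, distinct = 0;
        for (int i = 0; i < N; i++)
            if (!seen[bytes[i][col]]++)
                distinct++;
        /* Threshold is arbitrary; the paper uses frequency analysis. */
        printf("byte column %zu: %3d distinct values -> %s\n",
               col, distinct,
               distinct <= N / 4 ? "compressible" : "hard to compress");
    }
    return 0;
}
```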


International Conference on Parallel Processing | 2009

End-to-End Study of Parallel Volume Rendering on the IBM Blue Gene/P

Tom Peterka; Hongfeng Yu; Robert B. Ross; Kwan-Liu Ma; Robert Latham

In addition to their role as simulation engines, modern supercomputers can be harnessed for scientific visualization. Their extensive concurrency, parallel storage systems, and high-performance interconnects can mitigate the expanding size and complexity of scientific datasets and prepare for in situ visualization of these data. In ongoing research on parallel volume rendering on the IBM Blue Gene/P (BG/P), we measure the performance of disk I/O, rendering, and compositing on large datasets, and evaluate bottlenecks with respect to system-specific I/O and communication patterns. To extend the scalability of the direct-send image compositing stage of the volume rendering algorithm, we limit the number of compositing cores when many small messages are exchanged. To improve the data-loading stage of the volume renderer, we study the I/O signatures of the algorithm in detail. The results of this research affirm that a distributed-memory computing architecture such as BG/P is a scalable platform for large visualization problems.
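
In direct-send compositing, each rank owns one slice of the final image and receives the corresponding fragments from every other rank. The sketch below illustrates that exchange-and-blend pattern with MPI_Alltoall over (value, alpha) pixel pairs, assuming ranks are already ordered back to front; the slice size, pixel format, and fake rendering are simplifications, not the paper's implementation.

```c
/* Sketch of direct-send image compositing: each rank "renders" a full-size
 * partial image, sends slice j to rank j, and blends the fragments it
 * receives for its own slice with the over operator. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define SLICE 4                        /* pixels owned per rank */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int width = SLICE * size;          /* full image width in pixels */
    float *local = malloc(2 * width * sizeof(float));        /* (value, alpha) */
    float *recv  = malloc(2 * SLICE * size * sizeof(float));

    for (int p = 0; p < width; p++) {  /* fake partial rendering */
        local[2 * p]     = (float)rank / (float)size;
        local[2 * p + 1] = 0.5f;       /* semi-transparent fragment */
    }

    /* Slice j of every local image goes to rank j. */
    MPI_Alltoall(local, 2 * SLICE, MPI_FLOAT,
                 recv,  2 * SLICE, MPI_FLOAT, MPI_COMM_WORLD);

    float out[SLICE], out_a[SLICE];
    for (int p = 0; p < SLICE; p++) { out[p] = 0.0f; out_a[p] = 0.0f; }

    for (int src = 0; src < size; src++)       /* blend back to front */
        for (int p = 0; p < SLICE; p++) {
            float v = recv[2 * (src * SLICE + p)];
            float a = recv[2 * (src * SLICE + p) + 1];
            out[p]   = v * a + out[p] * (1.0f - a);
            out_a[p] = a + out_a[p] * (1.0f - a);
        }

    if (rank == 0)
        printf("rank 0 slice pixel 0: value %.3f alpha %.3f\n", out[0], out_a[0]);

    free(local); free(recv);
    MPI_Finalize();
    return 0;
}
```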

Collaboration


Dive into Robert Latham's collaborations.

Top Co-Authors

Robert B. Ross, Argonne National Laboratory
Philip H. Carns, Argonne National Laboratory
Rajeev Thakur, Argonne National Laboratory
Samuel Lang, Argonne National Laboratory
Shane Snyder, Argonne National Laboratory
Nagiza F. Samatova, North Carolina State University
Scott Klasky, Oak Ridge National Laboratory
Kevin Harms, Argonne National Laboratory