Publication


Featured research published by Samuel Lang.


International Conference on Cluster Computing | 2009

Scalable I/O forwarding framework for high-performance computing systems

Nawab Ali; Philip H. Carns; Kamil Iskra; Dries Kimpe; Samuel Lang; Robert Latham; Robert B. Ross; Lee Ward; P. Sadayappan

Current leadership-class machines suffer from a significant imbalance between their computational power and their I/O bandwidth. While Moore's law ensures that the computational power of high-performance computing systems increases with every generation, the same is not true for their I/O subsystems. The scalability challenges faced by existing parallel file systems with respect to the increasing number of clients, coupled with the minimalistic compute node kernels running on these machines, call for a new I/O paradigm to meet the requirements of data-intensive scientific applications. I/O forwarding is a technique that attempts to bridge the increasing performance and scalability gap between the compute and I/O components of leadership-class machines by shipping I/O calls from compute nodes to dedicated I/O nodes. The I/O nodes perform operations on behalf of the compute nodes and can reduce file system traffic by aggregating, rescheduling, and caching I/O requests. This paper presents an open, scalable I/O forwarding framework for high-performance computing systems. We describe an I/O protocol and API for shipping function calls from compute nodes to I/O nodes, and we present a quantitative analysis of the overhead associated with I/O forwarding.
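
To make the forwarding idea concrete, the sketch below shows what shipping a single write() call to an I/O node could look like at the wire level: a small fixed header naming the operation, handle, offset, and length, followed by the payload. The message layout and the iofw_* names are assumptions for illustration only; they are not the protocol or API actually defined by the paper.

```c
/* Hypothetical sketch of shipping a write() call to an I/O node.
 * The message layout and iofw_* names are illustrative, not the
 * paper's actual forwarding protocol. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    uint32_t op;        /* operation code, e.g. 1 = WRITE */
    uint64_t handle;    /* file handle resolved on the I/O node */
    uint64_t offset;    /* file offset of the request */
    uint64_t length;    /* number of payload bytes that follow */
} iofw_request;

/* Pack a write request header and its payload into a wire buffer. */
static size_t iofw_pack_write(uint8_t *buf, uint64_t handle,
                              uint64_t offset,
                              const void *data, uint64_t len)
{
    iofw_request req = { 1, handle, offset, len };
    memcpy(buf, &req, sizeof req);          /* fixed header */
    memcpy(buf + sizeof req, data, len);    /* payload */
    return sizeof req + len;
}

int main(void)
{
    uint8_t wire[256];
    const char data[] = "checkpoint block";
    size_t n = iofw_pack_write(wire, 42, 4096, data, sizeof data);

    /* An I/O node would unpack the header and issue the real write. */
    iofw_request req;
    memcpy(&req, wire, sizeof req);
    printf("op=%u handle=%llu offset=%llu len=%llu (%zu wire bytes)\n",
           (unsigned)req.op, (unsigned long long)req.handle,
           (unsigned long long)req.offset,
           (unsigned long long)req.length, n);
    return 0;
}
```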


IEEE International Conference on High Performance Computing Data and Analytics | 2009

I/O performance challenges at leadership scale

Samuel Lang; Philip H. Carns; Robert Latham; Robert B. Ross; Kevin Harms; William E. Allcock

Today's top high performance computing systems run applications with hundreds of thousands of processes, contain hundreds of storage nodes, and must meet massive I/O requirements for capacity and performance. These leadership-class systems face daunting challenges to deploying scalable I/O systems. In this paper we present a case study of the I/O challenges to performance and scalability on Intrepid, the IBM Blue Gene/P system at the Argonne Leadership Computing Facility. Listed among the top 5 fastest supercomputers of 2008, Intrepid runs computational science applications with intensive demands on the I/O system. We show that Intrepid's file and storage systems sustain high performance under varying workloads as the applications scale with the number of processes.


International Conference on Cluster Computing | 2009

24/7 Characterization of petascale I/O workloads

Philip H. Carns; Robert Latham; Robert B. Ross; Kamil Iskra; Samuel Lang; Katherine Riley

Developing and tuning computational science applications to run on extreme scale systems are increasingly complicated processes. Challenges such as managing memory access and tuning message-passing behavior are made easier by tools designed specifically to aid in these processes. Tools that can help users better understand the behavior of their application with respect to I/O have not yet reached the level of utility necessary to play a central role in application development and tuning. This deficiency in the tool set means that we have a poor understanding of how specific applications interact with storage. Worse, the community has little knowledge of what sorts of access patterns are common in today's applications, leading to confusion in the storage research community as to the pressing needs of the computational science community. This paper describes the Darshan I/O characterization tool. Darshan is designed to capture an accurate picture of application I/O behavior, including properties such as patterns of access within files, with the minimum possible overhead. This characterization can shed important light on the I/O behavior of applications at extreme scale. Darshan can also enable researchers to gain greater insight into the overall patterns of access exhibited by such applications, helping the storage community to understand how to best serve current computational science applications and better predict the needs of future applications. In this work we demonstrate Darshan's ability to characterize the I/O behavior of four scientific applications and show that it induces negligible overhead for I/O intensive jobs with as many as 65,536 processes.
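
The sketch below illustrates the counter-based style of characterization the abstract describes: instead of logging a full trace, each intercepted write updates a handful of cheap per-file counters (operation count, bytes moved, how many accesses were sequential). The iostat_* names and the counter set are illustrative assumptions; Darshan's real implementation wraps I/O calls and keeps considerably richer per-file records.

```c
/* Simplified illustration of counter-based I/O characterization in the
 * spirit of Darshan: update cheap counters per write rather than
 * recording a full trace. Names (iostat_*) are hypothetical. */
#include <stdio.h>
#include <string.h>

typedef struct {
    unsigned long writes;        /* number of write operations */
    unsigned long bytes;         /* total bytes written */
    unsigned long seq_writes;    /* writes contiguous with the previous one */
    long last_end;               /* end offset of the previous write */
} iostat;

static void iostat_record_write(iostat *s, long offset, long len)
{
    s->writes++;
    s->bytes += (unsigned long)len;
    if (offset == s->last_end)   /* detects a sequential access pattern */
        s->seq_writes++;
    s->last_end = offset + len;
}

int main(void)
{
    iostat s;
    memset(&s, 0, sizeof s);
    s.last_end = -1;

    /* Simulate three contiguous 1 MiB writes, then one strided write. */
    iostat_record_write(&s, 0,        1 << 20);
    iostat_record_write(&s, 1 << 20,  1 << 20);
    iostat_record_write(&s, 2 << 20,  1 << 20);
    iostat_record_write(&s, 16 << 20, 1 << 20);

    printf("writes=%lu bytes=%lu sequential=%lu\n",
           s.writes, s.bytes, s.seq_writes);
    return 0;
}
```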


ACM Transactions on Storage | 2011

Understanding and Improving Computational Science Storage Access through Continuous Characterization

Philip H. Carns; Kevin Harms; William E. Allcock; Charles Bacon; Samuel Lang; Robert Latham; Robert B. Ross

Computational science applications are driving a demand for increasingly powerful storage systems. While many techniques are available for capturing the I/O behavior of individual application trial runs and specific components of the storage system, continuous characterization of a production system remains a daunting challenge for systems with hundreds of thousands of compute cores and multiple petabytes of storage. As a result, these storage systems are often designed without a clear understanding of the diverse computational science workloads they will support.


International Parallel and Distributed Processing Symposium | 2009

Small-file access in parallel file systems

Philip H. Carns; Samuel Lang; Robert B. Ross; Murali Vilayannur; Julian M. Kunkel; Thomas Ludwig

Today's computational science demands have resulted in ever larger parallel computers, and storage systems have grown to match these demands. Parallel file systems used in this environment are increasingly specialized to extract the highest possible performance for large I/O operations, at the expense of other potential workloads. While some applications have adapted to I/O best practices and can obtain good performance on these systems, the natural I/O patterns of many applications result in generation of many small files. These applications are not well served by current parallel file systems at very large scale. This paper describes five techniques for optimizing small-file access in parallel file systems for very large scale systems. These five techniques are all implemented in a single parallel file system (PVFS) and then systematically assessed on two test platforms. A microbenchmark and the mdtest benchmark are used to evaluate the optimizations at an unprecedented scale. We observe as much as a 905% improvement in small-file create rates, 1,106% improvement in small-file stat rates, and 727% improvement in small-file removal rates, compared to a baseline PVFS configuration on a leadership computing platform using 16,384 cores.
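
One plausible technique in this family is sketched below: keep the contents of sufficiently small files inline with the metadata object ("stuffing"), so that reading a small file costs a single server interaction rather than separate metadata and data accesses. The structure, the 4 KiB threshold, and the small_write name are assumptions for illustration; the paper's five PVFS optimizations are not reproduced here.

```c
/* Hedged sketch of a "stuffed file" optimization for small files:
 * hold the first few KiB inline with the metadata record. The layout
 * and threshold are illustrative assumptions, not the exact PVFS
 * design evaluated in the paper. */
#include <stdio.h>
#include <string.h>

#define STUFF_LIMIT 4096   /* assumed inline-data threshold */

typedef struct {
    long size;                       /* logical file size */
    int  stuffed;                    /* data held inline? */
    unsigned char inline_data[STUFF_LIMIT];
    /* a real record would also list datafile handles for large files */
} inode_rec;

/* Write: stuff small files inline; otherwise fall back to striping. */
static int small_write(inode_rec *ino, const void *buf, long len)
{
    if (len <= STUFF_LIMIT) {
        memcpy(ino->inline_data, buf, (size_t)len);
        ino->size = len;
        ino->stuffed = 1;
        return 1;                    /* served by the metadata object */
    }
    ino->stuffed = 0;                /* would migrate to striped objects */
    ino->size = len;
    return 0;
}

int main(void)
{
    inode_rec ino = {0};
    const char msg[] = "tiny output record";
    if (small_write(&ino, msg, sizeof msg))
        printf("stuffed %ld bytes inline: %s\n", ino.size, ino.inline_data);
    return 0;
}
```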


IEEE Conference on Mass Storage Systems and Technologies | 2010

Enabling active storage on parallel I/O software stacks

Seung Woo Son; Samuel Lang; Philip H. Carns; Robert B. Ross; Rajeev Thakur; Berkin Özisikyilmaz; Prabhat Kumar; Wei-keng Liao; Alok N. Choudhary

As data sizes continue to increase, the concept of active storage is well suited to many data analysis kernels. Nevertheless, while this concept has been investigated and deployed in a number of forms, enabling it from the parallel I/O software stack has been largely unexplored. In this paper, we propose and evaluate an active storage system that allows data analysis, mining, and statistical operations to be executed from within a parallel I/O interface. In our proposed scheme, common analysis kernels are embedded in parallel file systems. We expose the semantics of these kernels to parallel file systems through an enhanced runtime interface so that execution of embedded kernels is possible on the server. In order to allow complete server-side operations without file format or layout manipulation, our scheme adjusts the file I/O buffer to the computational unit boundary on the fly. Our scheme also uses server-side collective communication primitives for reduction and aggregation using interserver communication. We have implemented a prototype of our active storage system and demonstrate its benefits using four data analysis benchmarks. Our experimental results show that our proposed system improves the overall performance of all four benchmarks by 50.9% on average and that the compute-intensive portion of the k-means clustering kernel can be improved by 58.4% through GPU offloading when executed with a larger computational load. We also show that our scheme consistently outperforms the traditional storage model across a wide variety of input dataset sizes, numbers of nodes, and computational loads.
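
The sketch below illustrates two ideas from the abstract: a reduction kernel (a sum over doubles) executed against a server's local portion of a file, and the adjustment of the I/O range to the computational unit boundary so the kernel never sees a partial element. The function names are hypothetical and the example is serial; the paper's runtime interface and inter-server collectives are not shown.

```c
/* Hedged sketch of a server-side "active storage" reduction: align a
 * byte range to the computational unit (sizeof(double)), then apply a
 * sum kernel to the covered elements. Names are illustrative. */
#include <stdio.h>

/* Clip [start, end) so it covers only whole computational units. */
static void align_to_unit(long *start, long *end, long unit)
{
    *start = (*start + unit - 1) / unit * unit;  /* round start up */
    *end   = *end / unit * unit;                 /* round end down */
}

/* Sum kernel executed on the server over an aligned buffer. */
static double sum_kernel(const double *buf, long n)
{
    double acc = 0.0;
    for (long i = 0; i < n; i++)
        acc += buf[i];
    return acc;
}

int main(void)
{
    double data[8] = {1, 2, 3, 4, 5, 6, 7, 8};

    /* Pretend this server holds bytes [3, 61) of the file: unaligned. */
    long start = 3, end = 61, unit = sizeof(double);
    align_to_unit(&start, &end, unit);

    /* After alignment, bytes [8, 56) -> elements 1..6 of the array. */
    double result = sum_kernel(data + start / unit, (end - start) / unit);
    printf("server-side partial sum over [%ld,%ld): %g\n",
           start, end, result);
    return 0;
}
```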


IEEE Conference on Mass Storage Systems and Technologies | 2011

Understanding and improving computational science storage access through continuous characterization

Philip H. Carns; Kevin Harms; William E. Allcock; Charles Bacon; Samuel Lang; Robert Latham; Robert B. Ross

Computational science applications are driving a demand for increasingly powerful storage systems. While many techniques are available for capturing the I/O behavior of individual application trial runs and specific components of the storage system, continuous characterization of a production system remains a daunting challenge for systems with hundreds of thousands of compute cores and multiple petabytes of storage. As a result, these storage systems are often designed without a clear understanding of the diverse computational science workloads they will support.


Petascale Data Storage Workshop | 2007

GIGA+: scalable directories for shared file systems

Swapnil Patil; Garth A. Gibson; Samuel Lang; Milo Polte

There is an increasing use of high-performance computing (HPC) clusters with thousands of compute nodes that, with the advent of multi-core CPUs, will pose a significant challenge for storage systems: the ability to scale to handle I/O generated by applications executing in parallel in tens of thousands of threads. One such challenge is building scalable directories for cluster storage, i.e., directories that can store billions to trillions of entries and handle hundreds of thousands of operations per second.
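
A rough sketch of the partitioned-directory idea follows: each filename is hashed, and a per-directory bitmap records which hash partitions have been created by splits, so a lookup can walk the split history to find the partition holding an entry. This is a simplified approximation of GIGA+'s incremental splitting, with invented names and a toy split rule, not the actual design.

```c
/* Minimal, simplified sketch of GIGA+-style directory partitioning:
 * hash a filename and descend the split history (a bitmap of existing
 * partitions) until reaching a partition that has not split further. */
#include <stdint.h>
#include <stdio.h>

#define MAX_PARTITIONS 64

/* FNV-1a string hash (any uniform hash would do). */
static uint64_t hash_name(const char *s)
{
    uint64_t h = 14695981039346656037ULL;
    for (; *s; s++)
        h = (h ^ (unsigned char)*s) * 1099511628211ULL;
    return h;
}

/* bitmap[i] != 0 means partition i exists (created by a split). */
static int lookup_partition(const unsigned char *bitmap, const char *name)
{
    uint64_t h = hash_name(name);
    int p = 0;                            /* every directory starts at P0 */
    for (int depth = 0; (1 << depth) < MAX_PARTITIONS; depth++) {
        int child = p + (1 << depth);     /* partition p would split into */
        if (!(h & (1ULL << depth)) || child >= MAX_PARTITIONS
            || !bitmap[child])
            break;                        /* stop at an unsplit partition */
        p = child;
    }
    return p;
}

int main(void)
{
    unsigned char bitmap[MAX_PARTITIONS] = {0};
    bitmap[0] = bitmap[1] = bitmap[2] = 1;  /* P0 has split off P1, P2 */

    const char *names[] = { "checkpoint.0001", "mesh.h5", "log.txt" };
    for (int i = 0; i < 3; i++)
        printf("%-16s -> partition P%d\n", names[i],
               lookup_partition(bitmap, names[i]));
    return 0;
}
```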


IEEE International Conference on High Performance Computing Data and Analytics | 2011

Server-side I/O coordination for parallel file systems

Huaiming Song; Yanlong Yin; Xian-He Sun; Rajeev Thakur; Samuel Lang

Parallel file systems have become a common component of modern high-end computers to mask the ever-increasing gap between disk data access speed and CPU computing power. However, while working well for certain applications, current parallel file systems lack the ability to effectively handle concurrent I/O requests with data synchronization needs, whereas concurrent I/O is the norm in data-intensive applications. Recognizing that an I/O request will not complete until all involved file servers in the parallel file system have completed their parts, in this paper we propose a server-side I/O coordination scheme for parallel file systems. The basic idea is to coordinate file servers to serve one application at a time in order to reduce the completion time, while maintaining server utilization and fairness. A window-wide coordination concept is introduced to serve this purpose. We present the proposed I/O coordination algorithm and a corresponding analysis of average completion time. We also implement a prototype of the proposed scheme under the PVFS2 file system and MPI-IO environment. Experimental results demonstrate that the proposed scheme can reduce average completion time by 8% to 46% and provide higher I/O bandwidth than the default data access strategies adopted by PVFS2 for heavy I/O workloads. Experimental results also show that the server-side I/O coordination scheme has good scalability.
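
The sketch below shows the flavor of such coordination: within an arrival window, a server reorders its queued requests so that all requests from one application are served together, using an ordering (window, then application ID, then offset) that every server can compute independently. The io_req fields and the comparator are illustrative assumptions, not the PVFS2 prototype's actual scheduling logic.

```c
/* Hedged sketch of server-side I/O coordination: within each window,
 * sort queued requests so one application is served at a time. Every
 * server applying the same deterministic order achieves coordination
 * without extra messaging. Names are illustrative. */
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int app_id;     /* which application issued the request */
    int window;     /* arrival window assigned by the server */
    long offset;    /* file offset, served in order within an app */
} io_req;

/* Order by window first, then application, then offset. */
static int cmp(const void *a, const void *b)
{
    const io_req *x = a, *y = b;
    if (x->window != y->window) return x->window - y->window;
    if (x->app_id != y->app_id) return x->app_id - y->app_id;
    return (x->offset > y->offset) - (x->offset < y->offset);
}

int main(void)
{
    /* Interleaved arrivals from apps 7 and 3 in the same window. */
    io_req q[] = {
        {7, 0, 0}, {3, 0, 4096}, {7, 0, 8192}, {3, 0, 0}, {7, 0, 4096},
    };
    int n = sizeof q / sizeof q[0];

    qsort(q, n, sizeof q[0], cmp);   /* coordinate: one app at a time */

    for (int i = 0; i < n; i++)
        printf("serve app %d offset %ld (window %d)\n",
               q[i].app_id, q[i].offset, q[i].window);
    return 0;
}
```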


IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing | 2011

A Segment-Level Adaptive Data Layout Scheme for Improved Load Balance in Parallel File Systems

Huaiming Song; Yanlong Yin; Xian-He Sun; Rajeev Thakur; Samuel Lang

Parallel file systems are designed to mask the ever-increasing gap between CPU and disk speeds via parallel I/O processing. While they have become an indispensable component of modern high-end computing systems, their inadequate performance is a critical issue facing the HPC community today. Conventionally, a parallel file system stripes a file across multiple file servers with a fixed stripe size. The stripe size is a vital performance parameter, but the optimal value is often application dependent, and determining it is a difficult research problem. Based on the observation that many applications have different data-access clusters in one file, with each cluster having a distinct data access pattern, we propose in this paper a segmented data layout scheme for parallel file systems. The basic idea behind the segmented approach is to divide a file logically into segments such that an optimal stripe size can be identified for each segment. A five-step method is introduced to conduct the segmentation, to identify the appropriate stripe size for each segment, and to carry out the segmented data layout scheme automatically. Experimental results show that the proposed layout scheme is feasible and effective, improving performance by up to 163% for writing and 132% for reading on the widely used IOR and IOzone benchmarks.
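
As an illustration of how a segmented layout changes data placement, the sketch below maps a logical file offset to a server by first locating the enclosing segment and then applying that segment's own stripe size in round-robin fashion. The segment table, boundaries, and stripe sizes are invented for the example; the paper's five-step segmentation method is not reproduced.

```c
/* Hedged sketch of a segment-level data layout: each segment of the
 * file is striped across the servers with its own stripe size, so the
 * server holding a logical offset depends on its segment. Boundaries
 * and sizes below are invented for illustration. */
#include <stdio.h>

#define NSERVERS 4

typedef struct {
    long start;        /* first logical offset of the segment */
    long stripe_size;  /* stripe size chosen for this segment */
} segment;

/* Segments sorted by start; the last entry covers the rest of the file. */
static const segment layout[] = {
    { 0,       64 * 1024 },     /* header region: small strided writes */
    { 1 << 20, 1024 * 1024 },   /* bulk region: large sequential writes */
};

static int server_for_offset(long off)
{
    int nseg = sizeof layout / sizeof layout[0];
    int s = 0;
    while (s + 1 < nseg && off >= layout[s + 1].start)
        s++;                                   /* find enclosing segment */
    long rel = off - layout[s].start;          /* offset within segment */
    return (int)(rel / layout[s].stripe_size % NSERVERS);
}

int main(void)
{
    long probes[] = { 0, 100 * 1024, (1 << 20) + 3 * 1024 * 1024 };
    for (int i = 0; i < 3; i++)
        printf("offset %8ld -> server %d\n",
               probes[i], server_for_offset(probes[i]));
    return 0;
}
```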

Collaboration


Dive into Samuel Lang's collaborations.

Top Co-Authors

Robert B. Ross, Argonne National Laboratory
Philip H. Carns, Argonne National Laboratory
Robert Latham, Argonne National Laboratory
Kevin Harms, Argonne National Laboratory
Rajeev Thakur, Argonne National Laboratory
William E. Allcock, Argonne National Laboratory
Charles Bacon, Argonne National Laboratory
Huaiming Song, Illinois Institute of Technology
Justin M. Wozniak, Argonne National Laboratory