John M. Bent
EMC Corporation
Publications
Featured research published by John M. Bent.
ieee international conference on high performance computing data and analytics | 2009
John M. Bent; Garth A. Gibson; Gary Grider; Ben McClelland; Paul Nowoczynski; James Nunez; Milo Polte; Meghan Wingate
Parallel applications running across thousands of processors must protect themselves from inevitable system failures. Many applications insulate themselves from failures by checkpointing. For many applications, checkpointing into a single shared file is most convenient. With such an approach, the writes are often small and not aligned with file system boundaries. Unfortunately for these applications, this preferred data layout results in pathologically poor performance from the underlying file system, which is optimized for large, aligned writes to non-shared files. To address this fundamental mismatch, we have developed a virtual parallel log-structured file system, PLFS. PLFS remaps an application's preferred data layout into one which is optimized for the underlying file system. Through testing on PanFS, Lustre, and GPFS, we have seen that this layer of indirection and reorganization can reduce checkpoint time by an order of magnitude for several important benchmarks and real applications without any application modification.
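The core of this remapping can be pictured with a small sketch: each writing process appends its data, however small or unaligned, to its own log, and a per-process index records where each fragment belongs in the logical shared file. The Python below is an illustrative toy under those assumptions, not PLFS's actual implementation or API; the class and file names are invented for the example.

```python
# Toy sketch (not PLFS itself) of remapping an N-to-1 shared-file
# checkpoint into per-process append-only logs plus an index.
import os

class LogStructuredFile:
    def __init__(self, container, rank):
        os.makedirs(container, exist_ok=True)
        self.data_path = os.path.join(container, f"data.{rank}")
        self.index = []              # (logical_offset, length, physical_offset)
        self.data = open(self.data_path, "wb")
        self.physical_offset = 0

    def write(self, logical_offset, payload):
        # Every write, however small or unaligned, becomes a sequential
        # append to this process's own log; only the index remembers
        # where the bytes belong in the logical shared file.
        self.data.write(payload)
        self.index.append((logical_offset, len(payload), self.physical_offset))
        self.physical_offset += len(payload)

    def read(self, logical_offset, length):
        # Reads consult the index to find the fragment holding the bytes.
        self.data.flush()
        with open(self.data_path, "rb") as f:
            for lof, ln, pof in self.index:
                if lof <= logical_offset < lof + ln:
                    skip = logical_offset - lof
                    f.seek(pof + skip)
                    return f.read(min(length, ln - skip))
        return b""

# Usage: rank 3 writes a small, unaligned record into the logical file.
ckpt = LogStructuredFile("checkpoint.plfs", rank=3)
ckpt.write(logical_offset=1000003, payload=b"state-bytes")
print(ckpt.read(1000003, 11))
```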
high performance distributed computing | 2002
John M. Bent; Venkateshwaran Venkataramani; Nick LeRoy; Alain Roy; Joseph Stanley; Andrea C. Arpaci-Dusseau; Remzi H. Arpaci-Dusseau; Miron Livny
We present NeST, a flexible, software-only storage appliance designed to meet the storage needs of the Grid. NeST has three key features that make it well-suited for deployment in a Grid environment. First, NeST provides a generic data transfer architecture that supports multiple data transfer protocols (including GridFTP and NFS), and allows for the easy addition of new protocols. Second, NeST is dynamic, adapting itself on-the-fly so that it runs effectively on a wide range of hardware and software platforms. Third, NeST is Grid-aware, implying that features that are necessary for integration into the Grid, such as storage space guarantees, mechanisms for resource and data discovery, user authentication, and quality of service, are a part of the NeST infrastructure.
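The generic data transfer architecture can be sketched as a protocol registry behind a common transfer interface, so that supporting a new protocol means registering one more handler rather than modifying the storage core. The Python below is only a conceptual sketch with hypothetical names; it does not reflect NeST's actual interfaces, and real handlers for GridFTP or NFS would wrap the corresponding libraries.

```python
# Conceptual sketch of a protocol-agnostic transfer layer; names are
# hypothetical and do not correspond to NeST's real interfaces.
import shutil
from abc import ABC, abstractmethod

_PROTOCOLS = {}

def register(scheme):
    # Adding a protocol is just registering another handler.
    def wrap(cls):
        _PROTOCOLS[scheme] = cls
        return cls
    return wrap

class TransferProtocol(ABC):
    @abstractmethod
    def get(self, remote_path, local_path): ...
    @abstractmethod
    def put(self, local_path, remote_path): ...

@register("file")
class LocalCopy(TransferProtocol):
    # Stand-in handler; real ones would wrap GridFTP, NFS, HTTP, etc.
    def get(self, remote_path, local_path):
        shutil.copy(remote_path, local_path)
    def put(self, local_path, remote_path):
        shutil.copy(local_path, remote_path)

def fetch(url, local_path):
    # Dispatch on the URL scheme to whichever protocol is registered.
    scheme, _, path = url.partition("://")
    _PROTOCOLS[scheme]().get(path, local_path)

# fetch("file:///tmp/input.dat", "input.dat") would dispatch to LocalCopy.
```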
conference on high performance computing (supercomputing) | 2001
Douglas Thain; John M. Bent; Andrea C. Arpaci-Dusseau; Remzi H. Arpaci-Dusseau; Miron Livny
Grid applications have demanding I/O needs. Schedulers must bring jobs and data in close proximity in order to satisfy throughput, scalability, and policy requirements. Most systems accomplish this by making either jobs or data mobile. We propose a system that allows jobs and data to meet by binding execution and storage sites together into I/O communities which then participate in the wide-area system. The relationships between participants in a community may be expressed by the ClassAd framework. Extensions to the framework allow community members to express indirect relations. We demonstrate our implementation of I/O communities by improving the performance of a key high-energy physics simulation on an international distributed system.
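As a rough illustration of the matchmaking idea, a job is bound to an execution site that belongs to the same I/O community as a storage site holding its input data. The toy Python below only mimics the flavor of that matching; it is not the ClassAd language, and all names and attributes are invented.

```python
# Toy matchmaking sketch: a job runs where execution and storage sites
# holding its input have been bound into the same I/O community.
# Attribute names and URLs are invented for illustration.
job = {"name": "cms-sim-042", "needs_dataset": "cms-geometry"}

communities = [
    {"site": "wisc-cluster", "storage": "nest://wisc", "datasets": {"cms-geometry"}},
    {"site": "infn-cluster", "storage": "nest://infn", "datasets": {"atlas-hits"}},
]

def match(job, communities):
    # Return the first community whose storage already holds the input.
    for c in communities:
        if job["needs_dataset"] in c["datasets"]:
            return c["site"], c["storage"]
    return None

print(match(job, communities))   # ('wisc-cluster', 'nest://wisc')
```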
high performance distributed computing | 2003
Douglas Thain; John M. Bent; Andrea C. Arpaci-Dusseau; Remzi H. Arpaci-Dusseau; Miron Livny
We present a study of six batch-pipeline scientific workloads that are candidates for execution on computational grids. Whereas other studies focus on the behavior of single applications, this study characterizes workloads composed of pipelines of sequential processes that use file storage for communication. We share measurements of the memory, CPU, and I/O requirements of individual components, as well as analyses of I/O sharing within complete batches. We conclude with a discussion of the ramifications of these workloads for end-to-end scalability and overall system design.
petascale data storage workshop | 2008
Grant Mackey; Saba Sehrish; John M. Bent; Julio Lopez; Salman Habib; Jun Wang
In this work we present a scientific application that we have implemented with Hadoop MapReduce. We also discuss other scientific fields of supercomputing that could benefit from a MapReduce implementation. We recognize that Hadoop has potential benefit for more applications than simply data mining, but that it is not a panacea for all data-intensive applications. We provide an example of how the halo finding application, when applied to large astrophysics datasets, benefits from the Hadoop architecture. The halo finding application uses a friends-of-friends algorithm to quickly cluster large sets of particles into output files which visualization software can interpret. The current implementation requires that large datasets be moved from storage to computation resources for every simulation of astronomy data. Our Hadoop implementation performs halo finding in place on the datasets, which removes the time-consuming process of transferring data between resources.
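The clustering step itself is simple to state: particles closer together than a linking length belong to the same halo, and halos are the connected components of that "friends of friends" relation. The sketch below shows a naive, single-node O(n^2) version using union-find to make the idea concrete; the paper's contribution is the distributed Hadoop implementation, which this toy code does not reproduce.

```python
# Toy friends-of-friends (FoF) halo finder: particles within one
# linking length of each other end up in the same halo.
import itertools
import math

def friends_of_friends(particles, linking_length):
    parent = list(range(len(particles)))

    def find(i):
        # Path-halving union-find.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    # Link every pair of "friends" (naive O(n^2); real finders use
    # spatial partitioning to scale to large datasets).
    for i, j in itertools.combinations(range(len(particles)), 2):
        if math.dist(particles[i], particles[j]) <= linking_length:
            union(i, j)

    # Halos are the connected components of the friendship graph.
    halos = {}
    for i in range(len(particles)):
        halos.setdefault(find(i), []).append(i)
    return list(halos.values())

points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 5.0, 5.0)]
print(friends_of_friends(points, linking_length=0.2))  # [[0, 1], [2]]
```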
high performance distributed computing | 2010
Saba Sehrish; Grant Mackey; Jun Wang; John M. Bent
Due to the explosive growth in the size of scientific data sets, data-intensive computing is an emerging trend in computational science. Many application scientists are looking to integrate data-intensive computing into compute-intensive High Performance Computing facilities, particularly for data analytics. We have observed several scientific applications which must migrate their data from an HPC storage system to a data-intensive one. There is a gap between the data semantics of HPC storage and data-intensive systems; hence, once migrated, the data must be further refined and reorganized. This reorganization requires at least two complete scans through the data set and then at least one MapReduce program to prepare the data before analyzing it. Running multiple MapReduce phases causes significant overhead for the application in the form of excessive I/O operations. For every MapReduce application that must be run in order to complete the desired data analysis, a distributed read and write operation on the file system must be performed. Our contribution is to extend MapReduce to eliminate the multiple scans and also reduce the number of pre-processing MapReduce programs. We have added additional expressiveness to the MapReduce language to allow users to specify the logical semantics of their data such that 1) the data can be analyzed without running multiple data pre-processing MapReduce programs, and 2) the data can be simultaneously reorganized as it is migrated to the data-intensive file system. Using our augmented MapReduce system, MapReduce with Access Patterns (MRAP), we have demonstrated up to 33% throughput improvement in one real application, and up to 70% in an I/O kernel of another application.
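One way to picture the added expressiveness is an interface where the user declares the logical layout of the HPC data, for example a strided pattern, and the framework reassembles complete logical records for map() directly from the original file instead of requiring a preprocessing MapReduce pass. The Python below is a hypothetical illustration of that idea only; it is not the MRAP API, and the pattern description and function names are invented.

```python
# Hypothetical illustration of declaring an access pattern so logical
# records can be read straight from HPC-layout data. Not the MRAP API.
from dataclasses import dataclass

@dataclass
class StridedPattern:
    # One logical record = `count` chunks of `chunk_size` bytes,
    # with consecutive chunks `stride` bytes apart (e.g., one variable
    # interleaved among many in a checkpoint file).
    start: int
    chunk_size: int
    stride: int
    count: int

def logical_records(path, pattern):
    """Yield reassembled logical records directly from the original file."""
    with open(path, "rb") as f:
        record_start = pattern.start
        while True:
            chunks = []
            for i in range(pattern.count):
                f.seek(record_start + i * pattern.stride)
                chunk = f.read(pattern.chunk_size)
                if len(chunk) < pattern.chunk_size:
                    return
                chunks.append(chunk)
            yield b"".join(chunks)
            record_start += pattern.count * pattern.stride

# A map function would consume these records directly, instead of first
# running a MapReduce job whose only purpose is reorganization.
```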
high performance distributed computing | 2013
Jun He; John M. Bent; Aaron Torres; Gary Grider; Garth A. Gibson; Carlos Maltzahn; Xian-He Sun
The I/O bottleneck in high-performance computing is becoming worse as application data continues to grow. In this work, we explore how patterns of I/O within these applications can significantly affect the effectiveness of the underlying storage systems and how these same patterns can be utilized to improve many aspects of the I/O stack and mitigate the I/O bottleneck. We offer three main contributions in this paper. First, we develop and evaluate algorithms by which I/O patterns can be efficiently discovered and described. Second, we implement one such algorithm to reduce the metadata quantity in a virtual parallel file system by up to several orders of magnitude, thereby increasing the performance of writes and reads by up to 40 and 480 percent respectively. Third, we build a prototype file system with pattern-aware prefetching and evaluate it to show a 46 percent reduction in I/O latency. Finally, we believe that efficient pattern discovery and description, coupled with the observed predictability of complex patterns within many high-performance applications, offers significant potential to enable many additional I/O optimizations.
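The metadata reduction can be illustrated with the simplest case: when a process's writes follow a constant stride, a long run of per-write index entries collapses into a single pattern entry. The sketch below detects such runs; it is a simplified stand-in for, not a reproduction of, the discovery algorithms evaluated in the paper.

```python
# Simplified pattern discovery: collapse runs of constant-stride write
# offsets into compact (start, stride, count) entries.
def compress_offsets(offsets):
    patterns, i = [], 0
    while i < len(offsets):
        start, count = offsets[i], 1
        stride = offsets[i + 1] - offsets[i] if i + 1 < len(offsets) else 0
        # Extend the run while the gap between consecutive offsets
        # keeps matching the stride.
        while (i + count < len(offsets)
               and offsets[i + count] - offsets[i + count - 1] == stride):
            count += 1
        patterns.append((start, stride, count))
        i += count
    return patterns

# 1000 strided writes become one pattern entry instead of 1000 index rows.
writes = [4096 * k for k in range(1000)]
print(compress_offsets(writes))   # [(0, 4096, 1000)]
```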
petascale data storage workshop | 2009
Milo Polte; Jay F. Lofstead; John M. Bent; Garth A. Gibson; Scott Klasky; Qing Liu; Manish Parashar; Norbert Podhorszki; Karsten Schwan; Meghan Wingate; Matthew Wolf
As HPC applications run on increasingly high process counts on larger and larger machines, both the frequency of checkpoints needed for fault tolerance [14] and the resolution and size of Data Analysis Dumps are expected to increase proportionally. In order to maintain an acceptable ratio of time spent performing useful computation work to time spent performing I/O, write bandwidth to the underlying storage system must increase proportionally to this increase in the checkpoint and computation size. Unfortunately, popular scientific self-describing file formats such as netCDF [8] and HDF5 [3] are designed with a focus on portability and flexibility. Extra care and careful crafting of the output structure and API calls are required to optimize for write performance using these APIs. To provide sufficient write bandwidth to continue to support the demands of scientific applications, the HPC community has developed a number of I/O middleware layers that structure output into write-optimized file formats. However, the obvious concern with any write-optimized file format is a corresponding penalty on reads. In the log-structured file system [13], for example, a file generated by random writes could be written efficiently, but reading the file back sequentially later would result in very poor performance. Simulation results require efficient read-back for visualization and analytics, and though most checkpoint files are never used, the efficiency of a restart is very important in the face of inevitable failures. The utility of middleware that improves write speed would be greatly diminished if it sacrificed acceptable read performance. In this paper we examine the read performance of two write-optimized middleware layers on large parallel machines and compare it to reading data natively in popular file formats.
ieee conference on mass storage systems and technologies | 2012
John M. Bent; Sorin Faibish; James P. Ahrens; Gary Grider; John Patchett; Percy Tzelnic; Jon Woodring
In the petascale era, the storage stack used by the extreme scale high performance computing community is fairly homogeneous across sites. On the compute edge of the stack, file system clients or IO forwarding services direct IO over an interconnect network to a relatively small set of IO nodes. These nodes forward the requests over a secondary storage network to a spindle-based parallel file system. Unfortunately, this architecture will become unviable in the exascale era. As the density growth of disks continues to outpace increases in their rotational speeds, disks are becoming increasingly cost-effective for capacity but decreasingly so for bandwidth. Fortunately, new storage media such as solid state devices are filling this gap; although not cost-effective for capacity, they are so for performance. This suggests that the storage stack at exascale will incorporate solid state storage between the compute nodes and the parallel file systems. There are three natural places into which to position this new storage layer: within the compute nodes, the IO nodes, or the parallel file system. In this paper, we argue that the IO nodes are the appropriate location for HPC workloads and show results from a prototype system that we have built accordingly. Running a pipeline of computational simulation and visualization, we show that our prototype system reduces total time to completion by up to 30%.
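The role of the solid-state tier on the IO nodes can be sketched as a two-stage write path: checkpoint bursts are absorbed by fast local storage and acknowledged immediately, while a background drain copies the data to the spindle-based parallel file system at whatever bandwidth it can sustain, overlapping with the application's next compute phase. The Python below is a conceptual toy under those assumptions, not the prototype described in the paper; the paths and threading model are illustrative.

```python
# Conceptual toy of a burst buffer on the IO nodes; not the paper's
# prototype. Directory names are placeholders.
import os
import queue
import shutil
import threading

class BurstBuffer:
    def __init__(self, ssd_dir, pfs_dir):
        os.makedirs(ssd_dir, exist_ok=True)
        os.makedirs(pfs_dir, exist_ok=True)
        self.ssd_dir, self.pfs_dir = ssd_dir, pfs_dir
        self.pending = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def absorb(self, name, payload):
        # Fast path: the compute side is released as soon as the burst
        # hits the solid-state tier.
        with open(os.path.join(self.ssd_dir, name), "wb") as f:
            f.write(payload)
        self.pending.put(name)

    def _drain(self):
        # Slow path: copy to the spindle-based parallel file system in
        # the background, overlapped with the next compute phase.
        while True:
            name = self.pending.get()
            shutil.copy(os.path.join(self.ssd_dir, name),
                        os.path.join(self.pfs_dir, name))
            self.pending.task_done()

bb = BurstBuffer("ssd_tier", "parallel_fs")
bb.absorb("ckpt.0001", b"checkpoint bytes")   # returns immediately
bb.pending.join()                             # wait for the drain (demo only)
```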
ieee conference on mass storage systems and technologies | 2012
John M. Bent; Gary Grider; Brett Michael Kettering; Adam Manzanares; Meghan McClelland; Aaron Torres; Alfred Torrez
There do not yet exist any truly parallel file systems. Those that make the claim fall short when it comes to providing adequate concurrent write performance at large scale. This limitation causes major usability headaches in HPC. Users need two capabilities missing from current parallel file systems. One, they need low-latency interactivity. Two, they need high bandwidth for large parallel IO; this capability must be insensitive to IO patterns and should not require tuning. No existing parallel file systems provide these features. Frighteningly, exascale renders these features even less attainable from currently available parallel file systems. Fortunately, there is a path forward.