
Publication


Featured research published by Avery Ching.


International Conference on Cluster Computing | 2002

Noncontiguous I/O through PVFS

Avery Ching; Alok N. Choudhary; Wei-keng Liao; Robert B. Ross; William Gropp

With the tremendous advances in processor and memory technology, I/O has risen to become the bottleneck in high-performance computing for many applications. The development of parallel file systems has helped to ease the performance gap, but I/O still remains an area needing significant performance improvement. Research has found that noncontiguous I/O access patterns in scientific applications, combined with the current file system methods for performing these accesses, lead to unacceptable performance for large data sets. To enhance the performance of noncontiguous I/O, we have created list I/O, a native noncontiguous I/O interface. We have used the Parallel Virtual File System (PVFS) to implement our ideas. Our research and experimentation show that list I/O outperforms current noncontiguous I/O access methods in most I/O situations and can substantially enhance the performance of real-world scientific applications.
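The core idea of list I/O can be illustrated with a small sketch: a noncontiguous request is described as one list of (offset, length) pairs and serviced in a single pass, rather than one file system call per contiguous piece. This is a hypothetical simplification for illustration, not the PVFS implementation; the coalescing of adjacent pairs is one reason a single list-described request can beat piecewise access.

```python
import io

def list_io_read(f, pairs):
    """Service a noncontiguous read described by (offset, length) pairs
    in one traversal of the list, instead of one request per piece."""
    # Coalesce adjacent pairs so back-to-back regions become one access.
    merged = []
    for off, length in sorted(pairs):
        if merged and merged[-1][0] + merged[-1][1] == off:
            merged[-1][1] += length
        else:
            merged.append([off, length])
    chunks = []
    for off, length in merged:
        f.seek(off)
        chunks.append(f.read(length))
    return b"".join(chunks)

# Example: three requested regions collapse into two actual accesses.
f = io.BytesIO(b"abcdefghij")
data = list_io_read(f, [(0, 2), (2, 2), (6, 2)])
```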


Cluster Computing and the Grid | 2003

Noncontiguous I/O accesses through MPI-IO

Avery Ching; Alok N. Choudhary; Kenin Coloma; Wei-keng Liao; Robert B. Ross; William Gropp

I/O performance remains a weakness of parallel computing systems today. While this weakness is partly attributed to rapid advances in other system components, the I/O interfaces available to programmers and the I/O methods supported by file systems have traditionally not matched the types of I/O operations that scientific applications perform, particularly noncontiguous accesses. The MPI-IO interface allows rich descriptions of the I/O patterns desired by scientific applications, and implementations such as ROMIO have taken advantage of this ability while remaining limited by underlying file system methods. A method of noncontiguous data access, list I/O, was recently implemented in the Parallel Virtual File System (PVFS). We implement support for this interface in the ROMIO MPI-IO implementation. Through a suite of noncontiguous I/O tests, we compare ROMIO list I/O to current ROMIO noncontiguous access methods and find that the list I/O interface provides performance benefits in many noncontiguous cases.


International Parallel and Distributed Processing Symposium | 2007

An Implementation and Evaluation of Client-Side File Caching for MPI-IO

Wei-keng Liao; Avery Ching; Kenin Coloma; Alok N. Choudhary; Lee Ward

Client-side file caching has long been recognized as a file system enhancement to reduce the amount of data transfer between application processes and I/O servers. However, caching also introduces cache coherence problems when a file is simultaneously accessed by multiple processes. Existing coherence controls tend to treat the client processes independently and ignore the aggregate I/O access pattern. This causes serious performance degradation for parallel I/O applications. In this paper we discuss our new implementation and present an extended performance evaluation on the GPFS and Lustre parallel file systems. In addition to comparing our methods to traditional approaches, we examine the performance of MPI-IO caching under direct I/O mode to bypass the underlying file system cache. We also investigate the performance impact of two file domain partitioning methods on MPI collective I/O operations: one that creates a balanced workload and one that aligns accesses to the file system stripe size. In our experiments, alignment results in better performance by reducing file lock contention. When the cache page size is set to a multiple of the stripe size, MPI-IO caching inherits the same advantage and produces significantly improved I/O bandwidth.
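The two partitioning strategies contrasted in the abstract can be sketched abstractly: an even split balances bytes per process, while an aligned split rounds domain boundaries up to a stripe multiple so that each file stripe is touched by at most one process, avoiding lock contention. This is a hypothetical sketch of the idea, not the paper's implementation:

```python
def even_domains(file_size, nprocs):
    """Balanced partitioning: each process gets ~file_size/nprocs bytes."""
    base, rem = divmod(file_size, nprocs)
    domains, start = [], 0
    for r in range(nprocs):
        size = base + (1 if r < rem else 0)
        domains.append((start, start + size))
        start += size
    return domains

def aligned_domains(file_size, nprocs, stripe):
    """Stripe-aligned partitioning: interior boundaries are rounded up to
    stripe multiples, so no two processes share a file stripe."""
    per_proc = -(-file_size // nprocs)  # ceiling division
    domains, start = [], 0
    for r in range(nprocs):
        end = min(file_size, (start + per_proc + stripe - 1) // stripe * stripe)
        if r == nprocs - 1:
            end = file_size
        domains.append((start, end))
        start = end
    return domains
```

With a 1000-byte file, four processes, and a 128-byte stripe, the aligned split produces boundaries at 256, 512, and 768 — slightly unbalanced, but each stripe has a single writer, which is what reduces lock contention.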


International Parallel and Distributed Processing Symposium | 2006

Evaluating I/O characteristics and methods for storing structured scientific data

Avery Ching; Alok N. Choudhary; Wei-keng Liao; Lee Ward; Neil Pundit

Many large-scale scientific simulations generate large, structured multi-dimensional datasets. Data is stored at various intervals on high-performance I/O storage systems for checkpointing, post-processing, and visualization. Data storage is very I/O intensive and can dominate the overall running time of an application, depending on the characteristics of the I/O access pattern. Our NCIO benchmark determines how I/O characteristics greatly affect performance (up to 2 orders of magnitude) and provides scientific application developers with guidelines for improvement. In this paper, we examine the impact of various I/O parameters and methods when using the MPI-IO interface to store structured scientific data in an optimized parallel file system.


International Conference on Cluster Computing | 2006

A New Flexible MPI Collective I/O Implementation

Kenin Coloma; Avery Ching; Alok N. Choudhary; Wei-keng Liao; Robert B. Ross; Rajeev Thakur; Lee Ward

The MPI-IO standard creates a huge opportunity to break out of the traditional file system I/O methods. As a software layer between the user and the file system, an MPI-IO library can potentially optimize I/O on behalf of the user with little to no user intervention. This is all possible because of the rich data description and communication infrastructure MPI-2 offers. Powerful data descriptions and some of the other desirable features of MPI-2, however, make MPI-IO challenging to implement. By creating a new collective I/O implementation that allows developers to easily experiment with new optimizations or combinations of different techniques, research can proceed faster and be quickly and reliably deployed.
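Collective I/O implementations such as ROMIO's commonly use a two-phase scheme: processes first exchange data so that each "aggregator" holds one contiguous file region, which is then written in a single large request. The single-process sketch below simulates that redistribution step with plain lists; the names and structure are illustrative assumptions, not ROMIO code:

```python
def two_phase_write(requests, nagg, file_size):
    """Redistribute per-process (offset, bytes) requests onto aggregators,
    each of which owns one contiguous file domain and issues one write."""
    domain = -(-file_size // nagg)  # bytes per aggregator (ceiling)
    buffers = [bytearray(domain) for _ in range(nagg)]
    # Phase 1: "communicate" each byte to the aggregator owning its offset.
    for off, data in requests:
        for i, b in enumerate(data):
            pos = off + i
            buffers[pos // domain][pos % domain] = b
    # Phase 2: each aggregator writes its contiguous domain in one request.
    out = bytearray(file_size)
    for agg, buf in enumerate(buffers):
        start = agg * domain
        out[start:start + domain] = buf[: file_size - start]
    return bytes(out)

# Three interleaved process requests become two contiguous aggregator writes.
result = two_phase_write([(0, b"ab"), (4, b"cd"), (2, b"xy")], nagg=2, file_size=6)
```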


Conference on High Performance Computing (Supercomputing) | 2007

Using MPI file caching to improve parallel write performance for large-scale scientific applications

Wei-keng Liao; Avery Ching; Kenin Coloma; Arifa Nisar; Alok N. Choudhary; Jacqueline H. Chen; Ramanan Sankaran; Scott Klasky

Typical large-scale scientific applications periodically write checkpoint files to save the computational state throughout execution. Existing parallel file systems improve such write-only I/O patterns through the use of client-side file caching and write-behind strategies. In distributed environments where files are rarely accessed by more than one client concurrently, file caching has achieved significant success; however, in parallel applications where multiple clients manipulate a shared file, cache coherence control can serialize I/O. We have designed a thread-based caching layer for the MPI I/O library, which adds a portable caching system closer to user applications so that more information about the application's I/O patterns is available for better coherence control. We demonstrate the impact of our caching solution on parallel write performance with a comprehensive evaluation that includes a set of widely used I/O benchmarks and production application I/O kernels.


Conference on High Performance Computing (Supercomputing) | 2007

Noncontiguous locking techniques for parallel file systems

Avery Ching; Wei-keng Liao; Alok N. Choudhary; Robert B. Ross; Lee Ward

Many parallel scientific applications use high-level I/O APIs that offer atomic I/O capabilities. Atomic I/O in current parallel file systems is often slow when multiple processes simultaneously access interleaved, shared files. Current atomic I/O solutions are not optimized for handling noncontiguous access patterns because current locking systems have a fixed file system block-based granularity and do not leverage high-level access pattern information. In this paper we present a hybrid lock protocol that takes advantage of new list and datatype byte-range lock description techniques to enable high performance atomic I/O operations for these challenging access patterns. We implement our scalable distributed lock manager (DLM) in the PVFS parallel file system and show that these techniques improve locking throughput over a naive noncontiguous locking approach by several orders of magnitude in an array of lock-only tests. Additionally, in two scientific I/O benchmarks, we show the benefits of avoiding false sharing with our byte-range granular DLM when compared against a block-based lock system implementation.
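The false-sharing problem the abstract targets can be shown with a toy lock manager: under byte-range granularity, two requests conflict only if their byte ranges actually overlap, whereas a block-granular scheme reports a conflict whenever two disjoint ranges merely touch the same block. This is a hypothetical sketch; the paper's distributed lock manager is far more sophisticated:

```python
class ByteRangeLockManager:
    """Toy lock manager granting (start, end) byte-range locks; a request
    is granted only if it overlaps no range held by a different owner."""

    def __init__(self):
        self.held = []  # list of (start, end, owner), end exclusive

    def try_lock(self, start, end, owner):
        for s, e, o in self.held:
            if o != owner and start < e and s < end:  # genuine overlap
                return False
        self.held.append((start, end, owner))
        return True

def block_conflict(start_a, end_a, start_b, end_b, block=4096):
    """Block-granular check: ranges 'conflict' if they touch the same
    block, even when the byte ranges themselves are disjoint."""
    blocks_a = range(start_a // block, (end_a - 1) // block + 1)
    blocks_b = set(range(start_b // block, (end_b - 1) // block + 1))
    return any(b in blocks_b for b in blocks_a)
```

For example, two processes writing bytes [0, 100) and [100, 200) of an interleaved shared file never conflict under byte-range locks, yet a 4 KiB block-based scheme would serialize them because both ranges fall in block 0.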


International Parallel and Distributed Processing Symposium | 2007

Improving MPI Independent Write Performance Using A Two-Stage Write-Behind Buffering Method

Wei-keng Liao; Avery Ching; Kenin Coloma; Alok N. Choudhary; Mahmut T. Kandemir

Many large-scale production applications often have very long execution times and require periodic data checkpoints in order to save the state of the computation for program restart and/or tracing application progress. These write-only operations often dominate the overall application runtime, which makes them a good optimization target. Existing approaches for write-behind data buffering at the MPI I/O level have been proposed, but challenges still exist in addressing system-level I/O issues. We propose a two-stage write-behind buffering scheme for handling checkpoint operations. The first stage of buffering accumulates write data for better network utilization, and the second stage aligns write requests to the file stripe boundaries. Aligned I/O requests avoid file lock contention that can seriously degrade I/O performance. We present our performance evaluation using the BTIO benchmarks on both the GPFS and Lustre file systems. With the two-stage buffering, the performance of BTIO through MPI independent I/O is significantly improved and even surpasses that of collective I/O.
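The two stages can be sketched as a pair of buffering steps: stage one accumulates small writes until a threshold is reached (better network utilization), and stage two flushes only whole stripe-sized pieces, carrying any unaligned tail forward so every request reaching the file system is stripe-aligned. This toy model assumes sequential appends and invented names; the actual library handles general offsets:

```python
class TwoStageWriteBehind:
    """Toy write-behind buffer: stage 1 accumulates writes; stage 2
    flushes only whole stripes so all issued requests are aligned."""

    def __init__(self, flush_threshold, stripe):
        self.flush_threshold = flush_threshold
        self.stripe = stripe
        self.stage1 = bytearray()
        self.flushed = []  # stripe-aligned requests actually issued

    def write(self, data):
        self.stage1 += data                    # stage 1: accumulate
        if len(self.stage1) >= self.flush_threshold:
            self._flush_aligned()

    def _flush_aligned(self):                  # stage 2: align to stripes
        whole = len(self.stage1) // self.stripe * self.stripe
        if whole:
            self.flushed.append(bytes(self.stage1[:whole]))
            del self.stage1[:whole]            # keep unaligned tail buffered

    def close(self):
        if self.stage1:                        # final partial stripe
            self.flushed.append(bytes(self.stage1))
            self.stage1 = bytearray()
```

With an 8-byte stripe and a 10-byte threshold, three 4-byte writes trigger one aligned 8-byte flush, and the 4-byte tail is held back until `close()` — the file system never sees an unaligned request mid-run.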


High Performance Distributed Computing | 2006

Exploring I/O Strategies for Parallel Sequence-Search Tools with S3aSim

Avery Ching; Wu-chun Feng; Heshan Lin; Xiaosong Ma; Alok N. Choudhary

Parallel sequence-search tools are rising in popularity among computational biologists. With the rapid growth of sequence databases, database segmentation is the trend of the future for such search tools. While I/O currently is not a significant bottleneck for parallel sequence-search tools, future technologies including faster processors, customized computational hardware such as FPGAs, improved search algorithms, and exponentially growing databases point to an increasing need for efficient parallel I/O in future parallel sequence-search tools. Our paper focuses on examining different I/O strategies for these future tools on a modern parallel file system (PVFS2). Because implementing and comparing various I/O algorithms in every search tool is labor-intensive and time-consuming, we introduce S3aSim, a general simulation framework for sequence search that allows us to quickly implement, test, and profile various I/O strategies. We examine a variety of I/O strategies (e.g., master-writing and various worker-writing strategies using individual and collective I/O methods) for storing result data in sequence-search tools such as mpiBLAST, pioBLAST, and parallel HMMer. Our experiments fully detail the interaction of computing and I/O within a full application simulation, as opposed to typical I/O-only benchmarks.


Cluster Computing and the Grid | 2006

Scalable Approaches for Supporting MPI-IO Atomicity

Peter M. Aarestad; Avery Ching; George K. Thiruvathukal; Alok N. Choudhary

Scalable atomic and parallel access to noncontiguous regions of a file is essential to exploit high-performance I/O as required by large-scale applications. Parallel I/O frameworks such as MPI I/O conceptually allow I/O to be defined on regions of a file using derived datatypes. Access to regions of a file can be automatically computed on a per-processor basis using the datatype, resulting in a list of (offset, length) pairs. We describe three approaches for implementing lock serving (whole-file, region, and byte-range locking) and compare them using three noncontiguous I/O benchmarks. We present the details of the lock server architecture and describe the implementation of a fully functional prototype that makes use of a lightweight message passing library and red/black trees.

Collaboration

Top co-authors of Avery Ching and their affiliations:

Kenin Coloma (Northwestern University)
Lee Ward (Sandia National Laboratories)
Robert B. Ross (Argonne National Laboratory)
Arifa Nisar (Northwestern University)
Mahmut T. Kandemir (Pennsylvania State University)
Chung-Hsing Hsu (Los Alamos National Laboratory)