Wei-keng Liao
Northwestern University
Publications
Featured research published by Wei-keng Liao.
international conference on cluster computing | 2002
Avery Ching; Alok N. Choudhary; Wei-keng Liao; Robert B. Ross; William Gropp
With the tremendous advances in processor and memory technology, I/O has risen to become the bottleneck in high-performance computing for many applications. The development of parallel file systems has helped to ease the performance gap, but I/O still remains an area needing significant performance improvement. Research has found that noncontiguous I/O access patterns in scientific applications, combined with the methods current file systems use to perform these accesses, lead to unacceptable performance for large data sets. To enhance performance of noncontiguous I/O, we have created list I/O, a native version of noncontiguous I/O. We have used the Parallel Virtual File System (PVFS) to implement our ideas. Our research and experimentation show that list I/O outperforms current noncontiguous I/O access methods in most I/O situations and can substantially enhance the performance of real-world scientific applications.
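As a rough illustration of the idea behind list I/O, the sketch below describes a noncontiguous access as one list of (offset, length) pairs instead of issuing a separate request per piece. The function names `strided_to_list` and `read_list` are hypothetical and are not taken from PVFS; an in-memory `bytes` buffer stands in for the file.

```python
def strided_to_list(start, block_len, stride, count):
    """Describe a regular strided access as a list of (offset, length) pairs."""
    return [(start + i * stride, block_len) for i in range(count)]


def read_list(buf, regions):
    """Serve every region of one list request in a single call (assumed model)."""
    return b"".join(buf[off:off + length] for off, length in regions)


data = bytes(range(32))  # stand-in for file contents
regions = strided_to_list(start=2, block_len=4, stride=8, count=3)
result = read_list(data, regions)
```

Here the three 4-byte pieces are requested together, which is the property the abstract attributes to list I/O; how the request is actually encoded and served in PVFS is not shown.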
international conference on cluster computing | 2003
Ching; Choudhary; Wei-keng Liao; Ross; Gropp
Parallel scientific applications store and retrieve very large, structured datasets. Directly supporting these structured accesses is an important step in providing high-performance I/O solutions for these applications. High-level interfaces such as HDF5 and Parallel netCDF provide convenient APIs for accessing structured datasets, and the MPI-IO interface also supports efficient access to structured data. However, parallel file systems do not traditionally support such access. In this work we present an implementation of structured data access support in the context of the Parallel Virtual File System (PVFS). We call this support datatype I/O because of its similarity to MPI datatypes. This support is built by using a reusable datatype-processing component from the MPICH2 MPI implementation. We describe how this component is leveraged to efficiently process structured data representations resulting from MPI-IO operations. We quantitatively assess the solution using three test applications. We also point to further optimizations in the processing path that could be leveraged for even more efficient operation.
international parallel and distributed processing symposium | 2006
Avery Ching; Alok N. Choudhary; Wei-keng Liao; Lee Ward; Neil Pundit
Many large-scale scientific simulations generate large, structured multi-dimensional datasets. Data is stored at various intervals on high-performance I/O storage systems for checkpointing, post-processing, and visualization. Data storage is very I/O intensive and can dominate the overall running time of an application, depending on the characteristics of the I/O access pattern. Our NCIO benchmark determines how I/O characteristics greatly affect performance (up to 2 orders of magnitude) and provides scientific application developers with guidelines for improvement. In this paper, we examine the impact of various I/O parameters and methods when using the MPI-IO interface to store structured scientific data in an optimized parallel file system.
international conference on cluster computing | 2006
Kenin Coloma; Avery Ching; Alok N. Choudhary; Wei-keng Liao; Robert B. Ross; Rajeev Thakur; Lee Ward
The MPI-IO standard creates a huge opportunity to break out of the traditional file system I/O methods. As a software layer between the user and the file system, an MPI-IO library can potentially optimize I/O on behalf of the user with little to no user intervention. This is all possible because of the rich data description and communication infrastructure MPI-2 offers. Powerful data descriptions and some of the other desirable features of MPI-2, however, make MPI-IO challenging to implement. By creating a new collective I/O implementation that allows developers to easily experiment with new optimizations or combinations of different techniques, research can proceed faster and be quickly and reliably deployed.
conference on high performance computing (supercomputing) | 2007
Wei-keng Liao; Avery Ching; Kenin Coloma; Arifa Nisar; Alok N. Choudhary; Jacqueline H. Chen; Ramanan Sankaran; Scott Klasky
Typical large-scale scientific applications periodically write checkpoint files to save the computational state throughout execution. Existing parallel file systems improve such write-only I/O patterns through the use of client-side file caching and write-behind strategies. In distributed environments where files are rarely accessed by more than one client concurrently, file caching has achieved significant success; however, in parallel applications where multiple clients manipulate a shared file, cache coherence control can serialize I/O. We have designed a thread-based caching layer for the MPI I/O library, which adds a portable caching system closer to user applications so that more information about the application's I/O patterns is available for better coherence control. We demonstrate the impact of our caching solution on parallel write performance with a comprehensive evaluation that includes a set of widely used I/O benchmarks and production application I/O kernels.
international conference on cluster computing | 2002
Jianwei Li; Wei-keng Liao; Alok N. Choudhary; Valerie E. Taylor
In this paper we investigate the data access patterns and file I/O behaviors of a production cosmology application that uses the adaptive mesh refinement (AMR) technique for its domain decomposition. This application was originally developed using the Hierarchical Data Format (HDF version 4) I/O library, and since HDF4 does not provide parallel I/O facilities, the global file I/O operations were carried out by one of the allocated processors. When the number of processors becomes large, the I/O performance of this design degrades significantly due to the high communication cost and sequential file access. In this work, we present two additional I/O implementations, using MPI-IO and parallel HDF version 5, and analyze their impact on the I/O performance of this typical AMR application. Based on the I/O patterns discovered in this application, we also discuss the interaction between user-level parallel I/O operations and different parallel file systems and point out the advantages and disadvantages. The performance results presented in this work are obtained from an SGI Origin2000 using XFS, an IBM SP using GPFS, and a Linux cluster using PVFS.
conference on high performance computing (supercomputing) | 2007
Avery Ching; Wei-keng Liao; Alok N. Choudhary; Robert B. Ross; Lee Ward
Many parallel scientific applications use high-level I/O APIs that offer atomic I/O capabilities. Atomic I/O in current parallel file systems is often slow when multiple processes simultaneously access interleaved, shared files. Current atomic I/O solutions are not optimized for handling noncontiguous access patterns because current locking systems have a fixed file system block-based granularity and do not leverage high-level access pattern information. In this paper we present a hybrid lock protocol that takes advantage of new list and datatype byte-range lock description techniques to enable high performance atomic I/O operations for these challenging access patterns. We implement our scalable distributed lock manager (DLM) in the PVFS parallel file system and show that these techniques improve locking throughput over a naive noncontiguous locking approach by several orders of magnitude in an array of lock-only tests. Additionally, in two scientific I/O benchmarks, we show the benefits of avoiding false sharing with our byte-range granular DLM when compared against a block-based lock system implementation.
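The false-sharing effect mentioned above can be seen in a small sketch: two processes write disjoint, interleaved byte ranges, yet their locks collide once the ranges are rounded to whole blocks. The 4096-byte block size, the function names, and the access pattern are assumptions chosen for illustration; this is not the hybrid lock protocol from the paper.

```python
def blocks_touched(ranges, block_size):
    """Return the set of block indices covered by half-open byte ranges."""
    blocks = set()
    for start, end in ranges:
        if end > start:
            blocks.update(range(start // block_size, (end - 1) // block_size + 1))
    return blocks


def byte_ranges_overlap(a, b):
    """True if any half-open range in a shares a byte with a range in b."""
    return any(s1 < e2 and s2 < e1 for s1, e1 in a for s2, e2 in b)


BLOCK = 4096  # hypothetical lock granularity in bytes
# Two processes write interleaved 1 KiB pieces of a shared file.
p0 = [(0, 1024), (2048, 3072)]
p1 = [(1024, 2048), (3072, 4096)]

block_conflict = bool(blocks_touched(p0, BLOCK) & blocks_touched(p1, BLOCK))
byte_conflict = byte_ranges_overlap(p0, p1)
```

With block-granular locks both processes need block 0, so `block_conflict` is true even though `byte_conflict` is false: no byte is actually shared.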
ieee international conference on high performance computing data and analytics | 2004
Avery Ching; Alok N. Choudhary; Wei-keng Liao; Robert B. Ross; William Gropp
Modern data-intensive structured datasets constantly undergo manipulation and migration through parallel scientific applications. Directly supporting these time-consuming operations is an important step in providing high-performance I/O solutions for modern large-scale applications. High-level interfaces such as HDF5 and Parallel netCDF provide convenient APIs for accessing structured datasets, and the MPI-IO interface also supports efficient access to structured data. Parallel file systems do not traditionally support such structured access from these higher-level interfaces. In this work, we present two contributions. First, we demonstrate an implementation of structured data access support in the context of the Parallel Virtual File System (PVFS). We call this support datatype I/O because of its similarity to MPI datatypes. This support is built with a reusable datatype-processing component from the MPICH2 MPI implementation. The second contribution of this work is a comparison of I/O characteristics of modern high-performance noncontiguous I/O methods. We use our I/O characteristics comparison to assess all the methods using three test applications. We also point to further optimisations that could be leveraged for even more efficient operation.
international parallel and distributed processing symposium | 2007
Wei-keng Liao; Avery Ching; Kenin Coloma; Alok N. Choudhary; Mahmut T. Kandemir
Many large-scale production applications have very long execution times and require periodic data checkpoints in order to save the state of the computation for program restart and/or tracing application progress. These write-only operations often dominate the overall application runtime, which makes them a good optimization target. Approaches for write-behind data buffering at the MPI I/O level have been proposed, but challenges still exist for addressing system-level I/O issues. We propose a two-stage write-behind buffering scheme for handling checkpoint operations. The first stage of buffering accumulates write data for better network utilization, and the second stage of buffering enables the alignment of the write requests to the file stripe boundaries. Aligned I/O requests avoid file lock contention that can seriously degrade I/O performance. We present our performance evaluation using BTIO benchmarks on both GPFS and Lustre file systems. With the two-stage buffering, the performance of BTIO through MPI independent I/O is significantly improved and even surpasses that of collective I/O.
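The alignment step can be pictured with a minimal sketch that splits one write request so that no piece crosses a file stripe boundary. Only the boundary arithmetic is shown; the function name, the 4096-byte stripe size, and the example request are assumptions, and the buffering stages themselves are not modeled.

```python
def split_at_stripe_boundaries(offset, length, stripe_size):
    """Split one write request into pieces that never cross a stripe boundary."""
    pieces = []
    end = offset + length
    while offset < end:
        # First stripe boundary strictly after the current offset.
        next_boundary = (offset // stripe_size + 1) * stripe_size
        piece_end = min(end, next_boundary)
        pieces.append((offset, piece_end - offset))
        offset = piece_end
    return pieces


pieces = split_at_stripe_boundaries(offset=3000, length=6000, stripe_size=4096)
```

Each resulting (offset, length) piece falls within a single stripe, so under this simplified model two writers targeting different stripes would not contend for the same stripe-level lock.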
international parallel and distributed processing symposium | 2005
Kenin Coloma; Alok N. Choudhary; Avery Ching; Wei-keng Liao; Seung Woo Son; Mahmut T. Kandemir; Lee Ward
The I/O patterns of large-scale scientific applications can often be characterized as small, noncontiguous, and regular. From a performance and power perspective, this is perhaps the worst kind of I/O for a disk. Two approaches to mitigating the mechanical limitations of disks are write-back caches and software-directed power management. Previous distributed caches are plagued by synchronization and scalability issues. The direct access cache (DAChe) system is a user-level distributed cache that addresses both of these problems. Although past work on managing disk power at run time was effective, one should be able to improve on those results by adopting a proactive scheme.
